Do you have a UNIX computer that is unusually slow, but you do not know why? Let me introduce a few UNIX commands that can help.
The first UNIX command is called the system activity reporter or "sar". Sar will report the history of a UNIX computer’s load for several system activities. Let’s take a look at this example:
$ sar
HP-UX machinename B.11.11 U 9000/800 08/13/07
00:00:00    %usr    %sys    %wio   %idle
00:20:00       4       2       0      64
00:40:00       4       2       0      90
01:00:00       5       3       0      92
01:20:00       6       4       1      88
01:40:00       6       3       2      85
02:00:00       5       3       3      80
02:20:00       7       3       2      87
02:40:00       6       3       2      80
03:00:00       5       3       2      80
03:20:00       5       3       3      81
03:40:00       6       3       3      78
04:00:00       8       3       6      72
04:20:00       6       3       5      74
Average        6       3       1      81
The first row gives basic information about the machine, such as the OS, machine name, and OS version. The interesting data is in the columns that follow. The first column is the timestamp at which the data was collected; in this example, that is every 20 minutes. The second column (%usr) is the percentage of CPU time spent running user processes. The third column (%sys) is the percentage of CPU time spent in system mode, that is, kernel work done on behalf of processes, including daemons. The fourth column (%wio) is the percentage of time the CPU sat idle while processes waited for hardware reads or writes (input and output, also known as I/O). The final column (%idle) is the percentage of time the CPU was otherwise idle; in other words, how often the system had no work to do.
If the idle percentage is consistently below 20% for long periods of time, then the server’s users will notice slower than normal performance from the machine. They will then contact the system administrator to say that something is wrong, and running the "sar" UNIX command is a good first step in finding out why the server is slow.
If the idle percentage is low, and there is therefore a server issue, look at the other columns for more information. Next, check the last two rows of data. Use the averages in the last row to determine whether the problem is a recurring issue that needs a long-term solution or just a one-time occurrence. You may need to look at the sar history of several days to determine this. Use the second-to-last row, which is the most recent sample, to determine whether the machine is currently performing well or badly. If the most recent idle percentage is high, then nothing presently running is using a lot of system resources.
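The checks described above can be sketched in a few lines of Python. This is only an illustration, assuming the sar output has been saved as text in the five-column layout shown earlier; the helper names (parse_sar_cpu, low_idle_samples), the 20% default threshold, and the sample figures are all made up for the example and are not part of sar itself.

```python
def parse_sar_cpu(text):
    """Parse plain `sar` CPU output into (samples, average).

    samples is a list of (time, usr, sys, wio, idle) tuples; average is a
    (usr, sys, wio, idle) tuple, or None if there is no Average row.
    Assumes the five-column layout shown in the article.
    """
    samples, average = [], None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # banner line, blank line, or unexpected layout
        label = fields[0]
        try:
            usr, sys_, wio, idle = (float(v) for v in fields[1:])
        except ValueError:
            continue  # column header row (%usr %sys %wio %idle)
        if label == "Average":
            average = (usr, sys_, wio, idle)
        else:
            samples.append((label, usr, sys_, wio, idle))
    return samples, average


def low_idle_samples(samples, threshold=20.0):
    """Return the samples whose %idle is below the threshold."""
    return [s for s in samples if s[4] < threshold]


# Hypothetical data, not taken from a real machine.
SAMPLE = """\
HP-UX machinename B.11.11 U 9000/800 08/13/07
00:00:00 %usr %sys %wio %idle
00:20:00 60 25 5 10
00:40:00 55 20 10 15
01:00:00 5 3 0 92
Average 40 16 5 39
"""

samples, average = parse_sar_cpu(SAMPLE)
busy = low_idle_samples(samples)
latest = samples[-1]

print("intervals below 20% idle:", [s[0] for s in busy])
print("most recent idle:", latest[4])
print("average idle:", average[3])
```

Here the two early intervals are flagged, while the most recent sample and the average suggest the slowdown has already passed.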
If the system or user percentages are high, then there are probably one or more rogue processes eating up a lot of system resources. To list the processes running on the machine, sorted with the highest load first, run the "top" UNIX command (exit this tool by pressing Control-C). Here is an example of this command:
$ top
Load averages: 0.06, 0.06, 0.07
845 processes: 813 sleeping, 22 running, 6 stopped, 6 zombies
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.06   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 1    0.06   0.0%   0.0%   1.0%  99.0%   0.0%   0.0%   0.0%   0.0%
 2    0.04   1.0%   0.0%   0.0%  98.0%   0.0%   0.0%   0.0%   0.0%
 3    0.06   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.06   1.0%   0.0%   0.0%  99.0%   0.0%   0.0%   0.0%   0.0%
Memory: 1093116K (283720K) real, 1320320K (332308K) virtual, 10717400K free Page# 1/30
CPU TTY      PID USERNAME PRI NI   SIZE    RES STATE  TIME %WCPU  %CPU COMMAND
  3 pts/th 18029 jim      154 20 39732K 18844K sleep  3:24  0.69  0.69 perl
  3 ?         39 root     152 20  1888K  1888K run    2:36  0.65  0.65 vxfsd
  3 ?        752 root     154 20  4180K  1756K sleep 34:04  0.64  0.64 autom
  0 ?       9074 john     154 20 16812K  8892K sleep  2:09  0.63  0.63 Xvnc
  1 pts/11 28419 phil     178 20    32K    32K zomb   0:00  0.42  0.42 perl
The upper part of the output shows the status of each CPU and their averages, along with the number of zombie processes (processes that have exited but whose parent has not yet collected their exit status) and other useful information. A zombie no longer uses CPU time or memory and cannot itself be killed; if zombies keep accumulating, look at the parent process that is failing to clean them up. The next section shows how much memory the computer has and is using, including virtual memory. This helps show whether the system needs more RAM installed, especially if there is a lot of thrashing between RAM and a disk drive. Finally, the bottom section lists the processes, with the most load-intensive first. Note that this tool refreshes its data frequently, so look for processes that are consistently high and that can be killed without any bad effects. If you do not know how to kill a UNIX process, then please speak with a system administrator. Be careful not to terminate legitimate processes that normally have a high load. If no particular process or set of processes is running amok, then it is probably time to upgrade your RAM or the entire machine.
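Because the display changes from one refresh to the next, it can help to compare several snapshots instead of trusting a single screen. The following Python sketch assumes that a few snapshots of the process table have been saved as text in the 13-column layout shown above. The helper names (parse_top_processes, consistently_busy, zombie_pids), the 50% threshold, and most of the sample rows are invented for illustration.

```python
def parse_top_processes(text):
    """Return {pid: (user, state, cpu_pct, command)} for one top snapshot.

    Only rows after the "CPU TTY PID ..." header are read, so the
    per-CPU summary table earlier in the output is ignored.
    """
    procs = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split(None, 12)
        if fields[:3] == ["CPU", "TTY", "PID"]:
            in_table = True
            continue
        if not in_table or len(fields) != 13:
            continue
        try:
            pid, cpu_pct = int(fields[2]), float(fields[11])
        except ValueError:
            continue  # not a process row
        procs[pid] = (fields[3], fields[8], cpu_pct, fields[12].strip())
    return procs


def consistently_busy(snapshots, threshold=50.0):
    """Return PIDs whose %CPU is at or above threshold in every snapshot."""
    parsed = [parse_top_processes(s) for s in snapshots]
    if not parsed:
        return []
    candidates = set(parsed[0])
    for procs in parsed:
        candidates &= {pid for pid, p in procs.items() if p[2] >= threshold}
    return sorted(candidates)


def zombie_pids(snapshot):
    """Return PIDs that top reports in the "zomb" state."""
    procs = parse_top_processes(snapshot)
    return sorted(pid for pid, p in procs.items() if p[1] == "zomb")


HEADER = "CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND\n"

# Hypothetical snapshots taken some time apart.
SNAP1 = HEADER + """\
0 pts/1 4242 alice 241 20 39732K 18844K run 93:24 97.10 96.80 crunch
1 ? 39 root 152 20 1888K 1888K run 2:36 61.00 60.50 vxfsd
2 pts/2 5151 bob 154 20 16812K 8892K sleep 2:09 0.63 0.63 Xvnc
"""
SNAP2 = HEADER + """\
0 pts/1 4242 alice 241 20 39732K 18844K run 95:02 98.00 97.90 crunch
1 ? 39 root 152 20 1888K 1888K run 2:37 0.65 0.65 vxfsd
1 pts/11 28419 phil 178 20 32K 32K zomb 0:00 0.42 0.42 perl
"""

print("busy in every snapshot:", consistently_busy([SNAP1, SNAP2]))
print("zombies in latest snapshot:", zombie_pids(SNAP2))
```

In this made-up data, vxfsd spikes once and then settles, so only the process that stays busy across both snapshots is reported as a candidate for closer inspection.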
Finally, if the I/O wait percentage (%wio) is high, then the hardware is doing so many reads and writes that application processes are consistently waiting for it to finish. This is a common issue with high-traffic websites, where the high rate of webpage reads taxes the capability of the hardware. Once you have determined that the I/O load is high, you must determine which drive(s) have a high load by running the "iostat" UNIX command. Here is an example of this command:
$ iostat
device    bps    sps   msps
c2t0d0      0    0.0    1.0
c2t1d0      0    0.0    1.0
c4t1d1      0    0.0    1.0
c6t1d1      0    0.0    1.0
c8t2d0      0    0.0    1.0
c6t2d0      0    0.0    1.0
c4t2d1      0    0.0    1.0
This command lists each device by its internal name, followed by the kilobytes transferred per second (bps), the number of seeks per second (sps), and the milliseconds per average seek (msps). Look for unusually high numbers for any particular device; to know what counts as unusual, you should keep historical records of this data so that you can spot values that are higher than normal. If the computer transfers a lot of data, such as downloading and uploading files, then closely monitor the transfer rate (bps). If the computer reads many small pieces of information scattered throughout different databases, then look at the other two columns. Either way, this will quickly reveal any overloaded drives that need upgrading or whose work should be offloaded to drives with lower loads.
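One simple way to use such history is to compare the current transfer rate of each device with its own past average. The Python sketch below assumes iostat output saved as text in the four-column layout shown above. The helper names (parse_iostat, unusually_busy), the factor of 2, and all the sample figures are assumptions made for the example, and a real baseline would need many more samples.

```python
from statistics import mean


def parse_iostat(text):
    """Return {device: (bps, sps, msps)} from four-column iostat output."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "device":
            continue  # header, blank line, or unexpected layout
        try:
            rows[fields[0]] = tuple(float(v) for v in fields[1:])
        except ValueError:
            continue
    return rows


def unusually_busy(history, current, factor=2.0):
    """Return devices whose current bps exceeds factor times their past mean.

    history is a list of parsed snapshots; current is one parsed snapshot.
    Devices with no history are skipped, since there is no baseline.
    """
    flagged = []
    for device, (bps, _sps, _msps) in current.items():
        past = [snap[device][0] for snap in history if device in snap]
        if past and bps > factor * mean(past):
            flagged.append(device)
    return sorted(flagged)


# Hypothetical history: three earlier samples, then the current one.
HISTORY = [
    parse_iostat("device bps sps msps\nc2t0d0 100 4.0 1.0\nc2t1d0 50 2.0 1.0\n"),
    parse_iostat("device bps sps msps\nc2t0d0 120 5.0 1.0\nc2t1d0 60 2.5 1.0\n"),
    parse_iostat("device bps sps msps\nc2t0d0 110 4.5 1.0\nc2t1d0 40 1.5 1.0\n"),
]
CURRENT = parse_iostat(
    "device bps sps msps\nc2t0d0 130 5.0 1.0\nc2t1d0 400 30.0 1.0\n"
)

print("devices well above their usual bps:", unusually_busy(HISTORY, CURRENT))
```

The same comparison could be made on the sps or msps columns for workloads dominated by many small reads.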
That pretty much summarizes this introduction to monitoring, analyzing, and improving UNIX system loads. Don’t forget to read the manual pages for these commands to see additional options: "man sar", "man top", and "man iostat".