Understanding the Top command on Linux

Aug 28, 2012

Article by AlexioBash, originally published in Italian on his website about ArchLinux.

Knowing what is happening in “real time” on your systems is, in my opinion, the basis for using and optimizing your OS. On ArchLinux, or rather on GNU/Linux in general, the top command can help us: it is a very useful system monitor that is really easy to use, and it also allows us to understand why our OS is struggling and which processes use the most resources. The command to run in the terminal is:

$ top

And we’ll get a screen similar to the one on the right:

Let’s now go through each row of this output and explain the information shown on the screen.

1st Row – top

This first line indicates in order:

current time (11:37:19)
uptime of the machine (up 1 day, 1:25)
user sessions logged in (3 users)
average load on the system (load average: 0.02, 0.12, 0.07); the three values refer to the last minute, the last 5 minutes and the last 15 minutes.
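
As a rough illustration, the first row can be split into the four pieces of information listed above. This is a minimal sketch: the sample line is assembled from the values quoted in this article, and the regular expression assumes the usual English-language layout of top's summary line, which may differ between top versions and locales.

```python
import re

# Sample summary line assembled from the values quoted in the article.
SAMPLE = "top - 11:37:19 up 1 day,  1:25,  3 users,  load average: 0.02, 0.12, 0.07"

# Assumed layout: "top - <time> up <uptime>, <n> users, load average: a, b, c"
SUMMARY_RE = re.compile(
    r"top - (?P<time>\d{2}:\d{2}:\d{2}) up (?P<uptime>.+?),\s+"
    r"(?P<users>\d+) users?,\s+"
    r"load average: (?P<l1>[\d.]+), (?P<l5>[\d.]+), (?P<l15>[\d.]+)"
)


def parse_summary(line):
    """Split top's first row into time, uptime, users and load averages."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a top summary line: %r" % line)
    return {
        "time": match.group("time"),
        # Collapse the column padding that top inserts inside the uptime.
        "uptime": " ".join(match.group("uptime").split()),
        "users": int(match.group("users")),
        # Averages over the last 1, 5 and 15 minutes, in that order.
        "load_average": (
            float(match.group("l1")),
            float(match.group("l5")),
            float(match.group("l15")),
        ),
    }


summary = parse_summary(SAMPLE)
print(summary)
```

To capture a real line instead of the sample, top can be run once in batch mode with `top -b -n 1`.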

2nd Row – tasks

The second row gives the following information:

Total number of processes (73 total)
Processes running (2 running)
Processes sleeping (71 sleeping)
Processes stopped (0 stopped)
Zombie processes, that is, terminated processes still waiting to be cleaned up by their parent process (0 zombie)
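
The second row can be read the same way. The sketch below parses a sample "Tasks" line built from the numbers above and checks that, in this sample, the per-state counts add up to the total. The exact line layout is an assumption based on the classic top output.

```python
import re

# Sample tasks line assembled from the values quoted in the article.
SAMPLE = "Tasks:  73 total,   2 running,  71 sleeping,   0 stopped,   0 zombie"


def parse_tasks(line):
    """Return a dict such as {'total': 73, 'running': 2, ...}."""
    return {name: int(count) for count, name in re.findall(r"(\d+) (\w+)", line)}


tasks = parse_tasks(SAMPLE)

# Sum every state except the total itself.
states_sum = sum(count for name, count in tasks.items() if name != "total")

print(tasks)
print("states add up to total:", states_sum == tasks["total"])
```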

3rd Row – cpu

The third row shows how the CPU is used. If you sum all the percentages, the total is 100% of the CPU time. Let’s see what these values indicate, in order:

Percentage of CPU time used by user processes (0.3%us)
Percentage of CPU time used by system (kernel) processes (0.0%sy)
Percentage of CPU time used by user processes with a modified nice priority (0.0%ni)
Percentage of CPU time spent idle (99.4%id)
Percentage of CPU time spent waiting for I/O operations (0.0%wa)
Percentage of CPU time spent serving hardware interrupts (0.3%hi, hardware IRQ)
Percentage of CPU time spent serving software interrupts (0.0%si, software interrupts)
Percentage of CPU time “stolen” from this virtual machine by the hypervisor for other tasks, such as running another virtual machine; this will be 0 on desktops and servers that are not virtualized (0.0%st, steal time)
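
To see why the percentages sum to 100%, it may help to derive them by hand. The kernel exposes cumulative CPU time counters on the first line of /proc/stat (in the order user, nice, system, idle, iowait, irq, softirq, steal), and percentages like the ones above can be obtained from the difference between two snapshots. The sketch below uses two hypothetical snapshots, chosen so that the result matches the example values in this article; it illustrates the idea and is not top's actual implementation.

```python
# Short labels used by top, in the order the counters appear in /proc/stat:
# user, nice, system, idle, iowait, irq, softirq, steal.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")

# Hypothetical snapshots of the first line of /proc/stat, taken a moment apart.
BEFORE = "cpu  1000 0 500 90000 100 20 30 0 0 0"
AFTER = "cpu  1003 0 500 90994 100 23 30 0 0 0"


def parse_cpu_line(line):
    """Map the first eight counters of a 'cpu' line to top's labels."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))


def cpu_percentages(before, after):
    """Share of each counter in the time elapsed between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


percentages = cpu_percentages(parse_cpu_line(BEFORE), parse_cpu_line(AFTER))
for name in FIELDS:
    print("%5.1f%%%s" % (percentages[name], name))
```

Because every value is a share of the same elapsed total, the eight percentages always add up to 100.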

4th and 5th Rows – memory usage

The fourth and fifth rows show the use of physical memory (RAM) and swap, respectively. In this order: total memory, memory in use, free memory, and finally buffers (on the RAM row) or cached (on the swap row).
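
A low "free" value on these rows is not necessarily a problem, because Linux uses otherwise idle RAM for buffers and cache that can be reclaimed when applications need memory. The sketch below applies the classic "free + buffers + cached" estimate to hypothetical values in KiB; none of the numbers come from the article, and newer kernels also report a more careful MemAvailable estimate in /proc/meminfo.

```python
# Hypothetical values in KiB, in the order shown on the memory rows.
MEM_TOTAL = 2048000
MEM_USED = 1996800
MEM_FREE = 51200
BUFFERS = 204800
CACHED = 1228800


def effectively_free(free, buffers, cached):
    """Memory that is free or can be reclaimed from buffers and cache."""
    return free + buffers + cached


def percent(part, total):
    """Express part as a percentage of total."""
    return 100.0 * part / total


reclaimable = effectively_free(MEM_FREE, BUFFERS, CACHED)

print("reported free:    %5.1f%%" % percent(MEM_FREE, MEM_TOTAL))
print("effectively free: %5.1f%%" % percent(reclaimable, MEM_TOTAL))
```

With these numbers only a small fraction of RAM is reported as free, yet most of it could still be handed to applications.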

Following Rows – Process list

Finally, sorted by CPU usage by default, comes the list of processes currently on the system. Let’s see what information we can get from the different columns:

PID – the ID of the process (4522)
USER – the user who owns the process (root)
PR – the priority of the process (15)
NI – the “NICE” value of the process (0)
VIRT – virtual memory used by the process (132m)
RES – physical memory used by the process (14m)
SHR – shared memory of the process (3204)
S – the status of the process: S=sleeping, R=running, Z=zombie (S)
%CPU – the percentage of CPU used by this process (0.3)
%MEM – the percentage of RAM used by this process (0.7)
TIME+ – the total CPU time used by this process (0:17.75)
COMMAND – the name of the process (bb_monitor.pl)
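
The columns above can also be read programmatically. The following sketch splits a sample row, built from the example values in this list, into named fields and normalizes two of them. It assumes that memory columns without a suffix are in KiB and that TIME+ has the form minutes:seconds.hundredths; the helper names are illustrative only.

```python
COLUMNS = ("PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
           "S", "%CPU", "%MEM", "TIME+", "COMMAND")

# Sample row assembled from the example values listed in the article.
SAMPLE_ROW = " 4522 root      15   0  132m  14m 3204 S  0.3  0.7   0:17.75 bb_monitor.pl"


def parse_row(row):
    """Split one process row into a dict keyed by column name."""
    # Limit the number of splits so a command containing spaces stays whole.
    values = row.strip().split(None, len(COLUMNS) - 1)
    return dict(zip(COLUMNS, values))


def to_kib(field):
    """Convert a memory field such as '132m' or '3204' to KiB."""
    units = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field)


def cpu_time_seconds(time_plus):
    """Convert a TIME+ value of the form 'minutes:seconds.hundredths'."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


process = parse_row(SAMPLE_ROW)
print(process["COMMAND"], "owned by", process["USER"])
print("resident memory in KiB:", to_kib(process["RES"]))
print("CPU time in seconds:", cpu_time_seconds(process["TIME+"]))
```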

Conclusions

Now that we have seen in detail all the information that the “top” command returns, it will be easier to understand the reason for an excessive load and/or a slowdown of the system.

A good alternative to “top” is “htop”, an evolution of top with some really impressive features.

17 Responses to “Understanding the Top command on Linux”

imo says:

Wednesday August 29th, 2012 at 03:56 PM

‘atop’ is *way* better and should replace top as a standard

Frederick Wrigley says:

Wednesday August 29th, 2012 at 05:55 PM

I use top, but more often use htop, which is like top on steroids. It’s likely in most repos for any distro.

Duncan says:

Thursday August 30th, 2012 at 01:53 AM

OK, so you say what each line “means”, but at that level, the information is pretty much there for the reading in the labels… or the top manpage, anyway.

The sort of questions I remember asking myself when I first started were things like:

OK, we have “Load”. I know what average means and I /think/ I know what load means, and it goes up as the system gets busier, and down as the system is less busy, but what does “0.02” (the load 1 minute average in the example above) actually MEAN? It OBVIOUSLY can’t be percent CPU load, because that’s reported too, and it’s WAYY too low for that!

I KNOW it’s load average, because I can READ the labels. And it’s easy enough to find that the three numbers are 1/5/15 minute averages. What I did NOT know was what load average was.

FWIW, if I understand correctly, and without getting /too/ technical: “load”, on Linux/Unix, refers to the number of threads that would be ready to run if given CPU time at that instant, as opposed to those that couldn’t run because they’re waiting for some event. For instance, as I type this, firefox is spending a lot of time waiting for me to type the next letter in the textbox, so it’s mostly idle and not runnable, thus not contributing to load very much. The instantaneous load can be seen on top’s task line as “running”. (Note that at the moment the measurement is taken, top itself is running, so top should always report at least one running, more if anything else is running at that instant as well. Also note that “running” actually means “runnable”: it’s quite possible to have a “running” load well above the actual number of CPU cores available on your system, so it’s NOT reporting actually RUNNING, but “runnable”, despite the label.) For those who like a file interface, it’s also available as the “procs_running” line near the bottom of /proc/stat. Also see /proc/loadavg for the averages, a runnable/total threads ratio as the 4th number, and I don’t know what the 5th is.
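
As a rough illustration of the /proc/loadavg format mentioned above, the sketch below parses a hypothetical line into its five fields. According to the proc(5) manual page, the fifth field is the PID of the most recently created process. The sample values and function names are made up for the example.

```python
# Hypothetical content of /proc/loadavg; the values are illustrative only.
SAMPLE = "0.02 0.12 0.07 2/120 4522"


def parse_loadavg(text):
    """Split a /proc/loadavg line into its five documented fields."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load_1min": float(one),
        "load_5min": float(five),
        "load_15min": float(fifteen),
        # Currently runnable scheduling entities (processes and threads).
        "runnable": int(runnable),
        # Scheduling entities that currently exist on the system.
        "total": int(total),
        # PID of the process that was created most recently.
        "last_pid": int(last_pid),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())


info = parse_loadavg(SAMPLE)
print(info)
```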

“Load average” indicates how many threads were runnable on average during the time in question, and a one-minute load average of 1.00, under ideal conditions, would mean that exactly one cpu core was running at 100% during the last full minute, which on a single-core system would be 100% CPU utilization (ideally). Of course in actuality, a 1.00 one minute load average isn’t likely to keep a single CPU core at 100% usage for the full minute, because for a small fraction of that minute there was likely a 2.0 instantaneous load as the average was updated and as other system tasks did their thing, which means there was also a fraction of time when nothing was ready to run and that core was idle. So to get close to 100% CPU utilization on a single core in practice takes a bit more, say a 1.5 load average.

And of course on a modern multi-core, the load necessary to fully utilize all cores goes up along with the number of cores. On my 6-core bulldozer, for instance, a load average of 6.00 would be the (ideal/theoretical) minimum required to keep the CPU fully occupied, but in practice, a load average of 9 or so would be more efficient at it. But get a load average more than about double your number of cores (and of course for CPUs that have it, there’s hyperthreading to figure in as well, noting that each hyperthread is counted like a physical core would be, tho there’s differences…), and your CPU is probably a bottleneck (altho it can be a storage bottleneck as well, since i/o-wait is counted as load). Additionally, the higher the load average goes, the more real CPU cycles the kernel takes to manage scheduling, so while the kernel can /manage/ a load of several hundred per core as long as memory and other resources don’t run out, it’s spending a lot of time switching tasks in and out instead of actually doing them, and is therefore going to take longer in wall-time to complete those tasks, than it would have if load were limited to say 3 per core. (With tasks like kernel builds, it’s possible to tell the system how many jobs to run in parallel. A kernel build is a really nice example here, as if allowed to schedule unlimited jobs and given enough memory, it can parallelize several hundred, perhaps over a thousand. I do that here routinely and get up to about 600, tho it takes my full 16 gig of ram and goes into swap to do so. But while it’s fun to watch the runnables climb, even if limited a bit so it doesn’t swap TOO much, letting it schedule say 20 parallel make jobs per core, 120 jobs total on my 6-core, does take longer in wall time to complete than something more reasonable, say four jobs per core, 24 jobs total on my 6-core.)
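
The rule of thumb described in this comment can be sketched as a small helper that relates a load average to the number of cores. The thresholds (1 per core to keep the CPU occupied, more than about 2 per core as a probable bottleneck) come from the comment itself and are heuristics rather than hard limits. The function names are hypothetical.

```python
import os


def load_per_core(load, cores):
    """Normalize a load average by the number of (logical) cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def describe_load(load, cores):
    """Classify a load average using the comment's rule of thumb."""
    ratio = load_per_core(load, cores)
    if ratio < 1.0:
        return "cores not fully occupied"
    if ratio <= 2.0:
        return "cores fully occupied"
    return "probable CPU (or I/O) bottleneck"


def describe_current_load():
    """Apply the heuristic to this machine's 1-minute load average."""
    # os.getloadavg() is only available on Unix-like systems;
    # os.cpu_count() counts logical CPUs, so hyperthreads are included.
    one_minute = os.getloadavg()[0]
    return describe_load(one_minute, os.cpu_count() or 1)


# The 6-core examples from the comment.
for load in (0.02, 6.0, 9.0, 13.0):
    print(load, "->", describe_load(load, 6))
```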

OK, so people reading this comment can now /understand/ what the load average numbers are, and can make better guesses about the tasks line (tho the zombie entry needs more explanation). But what about the rest? The title of the article suggests the reader will /understand/ top when he’s done reading it. Hardly! If they can read labels (and if they couldn’t, how could they read this article) and had run the command before, they barely know more about it than when they started, let alone UNDERSTAND it. Now the article needs to be expanded to do pretty much what I did for load, to nearly every single entry (the uptime, etc, seems pretty self explanatory). What does user vs system vs nice vs wait vs… actually MEAN, for instance. What do nice and priority mean in the Linux scheduling context, and how do they relate to each other? If the memory line says I have 0 free memory, does that mean the system’s about to crash? (Hint: No, the “free” in “free memory” doesn’t mean what one might intuitively /think/ it means, in this context.) We have the names and the numbers, now we just need to UNDERSTAND them, something the article title and opening blurb suggest the article will help with, but which it unfortunately did a rather poor job at. Where’s the “understand” part? =:^(

- Mike says:
  
  Saturday August 22nd, 2015 at 02:59 AM
  
  Thank you for making this article actually worth reading. I was very frustrated in the uselessness of it until I got to your comment.
  
- Steve says:
  
  Friday January 1st, 2016 at 01:11 AM
  
  Thanks, I learned stuff.
  
  Interesting that the author has responded to other comments but not this one 😛
  
woo says:

Thursday August 30th, 2012 at 02:50 PM

there are some lines in Italian that slipped through untranslated:

%CPU – indica la percentuale del processo sul carico sulla cpu (0.3)
%MEM – indica la percentuale del processo sul carico della RAM (0.7)
TIME+ – indica il tempo di attività del processo (0:17.75)
COMMAND – indica il nome del processo (bb_monitor.pl)

I _do_ understand them 🙂 but for the sake of consistency, please translate those…

- linuxari says:
  
  Thursday August 30th, 2012 at 03:25 PM
  
  Thanks Woo.
  Fixed
  
ammaro says:

Sunday December 2nd, 2012 at 03:18 PM

Thanks for your explanations, but which is better, htop or top?

- linuxari says:
  
  Sunday December 2nd, 2012 at 07:55 PM
  
  You can find top on every Linux system, whereas htop has to be installed separately; IMO htop gives much more information.
  
euro-space.net says:

Saturday March 29th, 2014 at 04:40 PM

Useful article, it helps diagnose server load problems. Using it together with atop is highly recommended.

Kenny says:

Monday October 13th, 2014 at 01:56 PM

Thank you.

PC says:

Wednesday November 19th, 2014 at 12:41 AM

Thank you..

TecGeeks says:

Wednesday March 8th, 2017 at 11:12 AM

If a server has multiple processors, say 3, will the CPU usage show as 300% in the top command, or will it be consolidated under 100%? And how can we see the usage of each individual CPU if the server has multiple processors?

Ashwin says:

Thursday April 27th, 2017 at 06:01 AM

Thanks for sharing! Keep it up

LTTR says:

Wednesday November 15th, 2017 at 03:59 PM

Big help with figuring out what the linux ‘top’ column headers all meant. Thank you!!

Vidmate 2018 New Update says:

Saturday January 6th, 2018 at 09:00 AM

I LIKE NICE POST.
