Article by AlexioBash, published on his portal dedicated to ArchLinux
Knowing what is happening in “real time” on your system is, in my opinion, the key to getting the most out of your OS and optimizing it. On ArchLinux, or rather on GNU/Linux in general, the “top” command comes to our aid: a very useful and easy-to-use system monitor that also lets us understand why our OS is suffering slowdowns. The command to run in the terminal is:
$ top
and we will get a screen similar to the one shown on the right:
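As a rough sketch, the summary area at the top of that screen looks something like this (the values are the sample ones discussed below; the memory figures are left as “…” because they are not part of the example numbers):

top - 11:37:19 up 1 day,  1:25,  3 users,  load average: 0.02, 0.12, 0.07
Tasks:  73 total,   2 running,  71 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.0%sy,  0.0%ni, 99.4%id,  0.0%wa,  0.3%hi,  0.0%si,  0.0%st
Mem:  … total, … used, … free, … buffers
Swap: … total, … used, … free, … cached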
Let's now go through each line of the information we find on this screen.
1st Line – TOP
The first line shows, in order:
- the current time (11:37:19)
- the machine's uptime (up 1 day, 1:25)
- the users currently logged in (3 users)
- the system load average (load average: 0.02, 0.12, 0.07); the three values refer to the last minute, the last 5 minutes and the last 15 minutes (a quick way to check these values is shown right below)
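The same uptime and load average figures are also available without opening top; for example (the output format can vary slightly between versions, and the values below are just the illustrative ones from above):

$ uptime
 11:37:19 up 1 day,  1:25,  3 users,  load average: 0.02, 0.12, 0.07
$ cat /proc/loadavg
0.02 0.12 0.07 2/73 4522

In /proc/loadavg the fourth field is the ratio of currently runnable to total scheduling entities, and the last field is the PID of the most recently created process. As a rule of thumb, a load average persistently above your number of CPU cores (see nproc) means the CPU is not keeping up.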
2nd Line – Tasks
The second line shows, in order (an example of reproducing this breakdown outside of top follows the list):
- total number of processes (73 total)
- processes currently running (2 running)
- sleeping processes (71 sleeping)
- stopped processes (0 stopped)
- zombie processes, i.e. processes that have already terminated and are waiting for their parent process to collect their exit status (0 zombie)
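If you want a similar breakdown from a script or a plain shell, one quick, approximate way is to count the first letter of the state column reported by ps (the counts shown are only illustrative):

$ ps -eo stat= | cut -c1 | sort | uniq -c
     71 S
      2 R

Here S, R, T and Z correspond to the sleeping, running, stopped and zombie states counted by top; the numbers will rarely match exactly, since processes change state all the time.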
3rd Line – CPU
The third line shows how the CPU time is being spent. If you look carefully, the sum of all the percentages adds up to 100% of the CPU. Let's see, in order, what they indicate (a note on reading this line on multi-core machines follows the list):
- percentage of CPU used by user processes (0.3%us)
- percentage of CPU used by system (kernel) processes (0.0%sy)
- percentage of CPU used by processes run with a modified nice priority (0.0%ni)
- percentage of CPU idle time (99.4%id)
- percentage of CPU time spent waiting for I/O operations (0.0%wa)
- percentage of CPU time spent servicing hardware interrupts (0.3%hi – Hardware IRQ)
- percentage of CPU time spent servicing software interrupts (0.0%si – Software Interrupts)
- the amount of CPU time “stolen” from this virtual machine by the hypervisor for other tasks (such as running another virtual machine); this will be 0 on desktops and servers that are not running as virtual machines (0.0%st – Steal Time)
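On multi-core machines the Cpu(s) line aggregates all cores onto a single 0–100% scale. Inside top you can switch to a per-core view with the interactive keys (these are the standard procps keys; press "h" inside top to see the help for your version):

$ top
  press "1"  ->  toggle one separate line per CPU core instead of the combined Cpu(s) line
  press "t"  ->  toggle/cycle how the task and CPU summary lines are displayed

Note that the per-process %CPU column further down is, by default, measured against a single core (so-called Irix mode), so a busy multi-threaded process can legitimately show more than 100% on a multi-core machine.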
4th and 5th Lines – Memory Usage
The fourth and fifth lines show the usage of physical memory (RAM) and of swap, respectively. In order: total memory, used, free, and buffers/cached. On this topic you can read the following in-depth article.
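The same figures are reported by the free command, which is handy for a quick check. The key point is that memory used for buffers and cache is reclaimed automatically when applications need it, so a low “free” value on its own does not mean the system is about to run out of RAM:

$ free -m

free -m prints the Mem and Swap lines with the total, used, free, buffers and cached columns expressed in MiB (the exact column names vary between versions of free).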
Following Lines – Process List
Finally, the processes currently running are listed, ordered by load. Let's see what the columns indicate, in order (the keys for re-sorting this list are shown after it):
- PID – the process ID (4522)
- USER – the user who owns the process (root)
- PR – the priority the process is running at (15)
- NI – the “nice” value of the process (0)
- VIRT – the virtual memory used by the process (132m)
- RES – the physical memory used by the process (14m)
- SHR – the memory shared by the process (3204)
- S – the state of the process: S=sleeping, R=running, Z=zombie (S)
- %CPU – the percentage of CPU used by the process (0.3)
- %MEM – the percentage of RAM used by the process (0.7)
- TIME+ – the total CPU time the process has used (0:17.75)
- COMMAND – the name of the process (bb_monitor.pl)
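By default the list is sorted by CPU usage, but top lets you re-sort it and act on processes directly from the keyboard. A few of the standard interactive keys (press "h" inside top for the complete list in your version):

$ top
  press "P"  ->  sort by %CPU (the default)
  press "M"  ->  sort by %MEM
  press "T"  ->  sort by TIME+
  press "k"  ->  kill a process (top asks for the PID and the signal to send)
  press "r"  ->  renice a process (change its NI value)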
Conclusion
Now that we have looked in detail at all the information the “top” command returns, it will be easier to understand the reason for an excessive load and/or slowdown of the system.
A worthwhile alternative to “top” is “htop”, an evolution of top with some truly impressive features.
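On ArchLinux, htop is not installed by default but is available in the official repositories (the package is simply named htop), so trying it out is a one-liner:

$ sudo pacman -S htop
$ htop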
‘atop’ is *way* better and should replace top as a standard
I use top, but more often use htop, which is like top on steroids. It’s likely in most repos for any distro.
OK, so you say what each line “means”, but at that level, the information is pretty much there for the reading in the labels… or the top manpage, anyway.
The sort of questions I remember asking myself when I first started were things like:
OK, we have “Load”. I know what average means and I /think/ I know what load means, and it goes up as the system gets busier, and down as the system is less busy, but what does “0.02” (the load 1 minute average in the example above) actually MEAN? It OBVIOUSLY can’t be percent CPU load, because that’s reported too, and it’s WAYY too low for that!
I KNOW it’s load average, because I can READ the labels. And it’s easy enough to find that the three numbers are 1/5/15 minute averages. What I did NOT know was what load average was.
FWIW, if I understand correctly, and without getting /too/ technical: “load”, on Linux/Unix, refers to the number of threads that would be ready to run if given CPU time at that instant, as opposed to those that couldn’t run, because they’re waiting for some event. For instance, as I type this, firefox is spending a lot of time waiting for me to type the next letter in the textbox, so it’s mostly idle and not runnable, thus not contributing to load very much. The instantaneous load can be seen on top’s task line as “running” (note that at the moment the measurement is taken, top itself is running, so top should always report at least one running, more if anything else is running at that instant as well; also note that “running” actually means “runnable”, it’s quite possible to have a “running” load well above the actual number of CPU cores available on your system, so it’s NOT reporting actually RUNNING, but “runnable”, despite the label). For those who like a file interface, it’s also available as the “procs_running” line near the bottom of /proc/stat. Also see /proc/loadavg for the averages, a runnable/total threads ratio as the 4th number, and I don’t know what the 5th is.
“Load average” indicates how many threads were runnable on average during the time in question, and a one-minute load average of 1.00, under ideal conditions, would mean that exactly one cpu core was running at 100% during the last full minute, which on a single-core system would be 100% CPU utilization (ideally). Of course in actuality, a 1.00 one minute load average isn’t likely to keep a single CPU core at 100% usage for the full minute, because for a small fraction of that minute there was likely a 2.0 instantaneous load as the average was updated and as other system tasks did their thing, which means there was also a fraction of time when nothing was ready to run and that core was idle. So to get close to 100% CPU utilization on a single core in practice takes a bit more, say a 1.5 load average.
And of course on a modern multi-core, the load necessary to fully utilize all cores goes up along with the number of cores. On my 6-core bulldozer, for instance, a load average of 6.00 would be the (ideal/theoretical) minimum required to keep the CPU fully occupied, but in practice, a load average of 9 or so would be more efficient at it. But get a load average more than about double your number of cores (and of course for CPUs that have it, there’s hyperthreading to figure in as well, noting that each hyperthread is counted like a physical core would be, tho there’s differences…), and your CPU is probably a bottleneck (altho it can be a storage bottleneck as well, since i/o-wait is counted as load). Additionally, the higher the load average goes, the more real CPU cycles the kernel takes to manage scheduling, so while the kernel can /manage/ a load of several hundred per core as long as memory and other resources don’t run out, it’s spending a lot of time switching tasks in and out instead of actually doing them, and is therefore going to take longer in wall-time to complete those tasks than it would have if load were limited to say 3 per core. (With tasks like kernel builds, it’s possible to tell the system how many jobs to run in parallel. A kernel build is a really nice example here, as if allowed to schedule unlimited jobs and given enough memory, it can run several hundred in parallel, perhaps over a thousand. I do that here routinely and get up to about 600, tho it takes my full 16 gig of ram and goes into swap to do so. But while it’s fun to watch the runnables climb, even if limited a bit so it doesn’t swap TOO much, letting it schedule say 20 parallel make jobs per core, 120 jobs total on my 6-core, does take longer in wall time to complete than something more reasonable, say four jobs per core, 24 jobs total on my 6-core.)
OK, so people reading this comment can now /understand/ what the load average numbers are, and can make better guesses about the tasks line (tho the zombie entry needs more explanation). But what about the rest? The title of the article suggests the reader will /understand/ top when he’s done reading it. Hardly! If they can read labels (and if they couldn’t, how could they read this article) and had run the command before, they barely know more about it than when they started, let alone UNDERSTAND it. Now the article needs to be expanded to do pretty much what I did for load, to nearly every single entry (the uptime, etc, seems pretty self explanatory). What does user vs system vs nice vs wait vs… actually MEAN, for instance? What do nice and priority mean in the Linux scheduling context, and how do they relate to each other? If the memory line says I have 0 free memory, does that mean the system’s about to crash? (Hint: No, the “free” in “free memory” doesn’t mean what one might intuitively /think/ it means, in this context.) We have the names and the numbers, now we just need to UNDERSTAND them, something the article title and opening blurb suggest the article will help with, but which it unfortunately did a rather poor job at. Where’s the “understand” part? =:^(
Thank you for making this article actually worth reading. I was very frustrated by the uselessness of it until I got to your comment.
Thanks, I learned stuff.
Interesting that the author has responded to other comments but not this one 😛
there are some lines in Italian that slipped through untranslated:
%CPU – indica la percentuale del processo sul carico sulla cpu (0.3)
%MEM – indica la percentuale del processo sul carico della RAM (0.7)
TIME+ – indica il tempo di attività del processo (0:17.75)
COMMAND – indica il nome del processo (bb_monitor.pl)
I _do_ understand them 🙂 but for the sake of consistency, please translate those…
Thanks Woo.
Fixed
thanks for your explanations, but which is better htop or top
You can find top on every Linux system, while you have to install htop, which IMO gives much more information.
Useful article, it helps diagnose server load problems. Using it together with atop is highly recommended.
Thank you.
Thank you..
If a server has multiple processors, say 3, will CPU usage in top show up to 300% or is it consolidated under 100%? And how can we see individual CPU usage if the server has multiple processors?
Thanks for sharing! Keep it up
Big help with figuring out what the linux ‘top’ column headers all meant. Thank you!!
I LIKE NICE POST.