Article by Santosh Sivaraj, originally published on fossix.org
Introduction
Computer systems have become complex beasts that are not easily tamed. Modern operating systems are too big and complex for anyone to learn everything in depth. Most aspiring system programmers do not have a picture of what happens in the system when they type ./a.out. This article attempts to provide that picture, along with enough detail for a Linux newcomer to move on to more detailed books for further learning. It is only a starter, so that newcomers who pick up the bigger books are not overwhelmed for lack of a sense of the natural flow of a Linux system.
This paper tries to explain the life of a process with all the low-level details laid out. It is an attempt to explain, to new Linux users and to myself, the complete life of a process, covering both kernel and user-space aspects. The explanations and details apply to Linux-based operating systems only; other systems may use different mechanisms, which I may not be aware of and do not intend to discuss.
The Birth Of a Process
A process is born when a program is executed. So let us back-track a little and start from the birth of the program. A program is born when a programmer has a need for it. I now need a program so that I can create a process out of it and explain what happens along the way. The following sample code is used to explain various concepts in the rest of the article.
    #include <stdio.h>
    #include <math.h>

    int main()
    {
        float d;

        d = cos(20);
        printf("%f\n", d);
    }
The source is trivial: it just finds the cosine of a number and prints it. Compiling the program gives us an executable, from which we will start the journey of tracking the process down.
# cc sample-source.c -lm
which should give us a.out. Note that we are linking with the math library. Executing it creates a process, on which our study will be based.
The Program and the Shell
# ./a.out
When we type ./a.out in the shell, the shell first creates a copy of itself using the fork() system call. The new child process then overlays itself with the executable image given to it through the exec family of system calls, such as execv(). We will go into each of these system calls in the coming sections. Roughly, what the shell does is shown in the following listing.
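    #include <stdio.h>
    #include <unistd.h>

    int shell_exec(char *command)
    {
        pid_t pid;

        pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* child: replace this process image with the command */
            execlp(command, command, (char *)NULL);
            perror(command);    /* reached only if execlp() failed */
            _exit(127);
        }
        /* parent process continues running */
        return 0;
    }

    int main(int count, char **command)
    {
        if (count < 2) {
            fprintf(stderr, "Need a command to execute\n");
            return 1;
        }
        return shell_exec(command[1]);
    }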
The above code is a major simplification of what a real shell does; a real shell also handles pipes, permissions, job control and more.
The fork system call
As we know, system calls take us from user-land to kernel-land. As mentioned earlier, this article describes even the very obvious details, so pardon the gory details. This section discusses what happens in the kernel when a process starts up.
As seen in the last code listing, the shell calls fork() and then one of the exec family of system calls to overlay the command image onto the newly created child process' address space. Once the fork() system call is called, the kernel creates a copy of the executing process, during which the following happens:
- fork() creates a new stack for the child and copies shared resources such as open file descriptors.
- The kernel checks the resource limits of the calling process, for example whether the number of processes created has exceeded the limit set for the user (see ulimit).
- It resets the process statistics, such as execution times.
- The new process is given a new process ID and starts executing.
A copy-on-write policy applies here. Ideally, the child process and the parent process (the one that called fork()) should have separate data areas. For efficiency, however, Linux does not create a new data area for the child; both processes share the parent's pages until one of them starts writing, at which point the written page is copied. Since this paper is not a kernel commentary, I have intentionally left out some functions that are called internally by the kernel. Please see the bibliography for further reading.
fork() returns twice: once in the parent, where the return value is the PID of the child, and once in the child, where the return value is zero.
The newly created process is uniquely identified by its process ID (PID). It belongs to the same process group as its parent. The group ID is used for job control in shells. There is also another kind of ID called the session ID. All processes in the same group will generally be in the same session, unless a process calls the setsid() system call. The current process ID and its parent process ID can be found using the ps command.
    ps -e

    $ ps -f
    UID        PID  PPID  C STIME TTY          TIME CMD
    santosh   3939 15592  0 Apr17 pts/3    00:00:03 bash
    santosh  25841  3939  0 07:17 pts/3    00:00:00 ps -f
The parent process of every command executed in a shell is the shell itself. So far our process-to-be, the code written above, has not yet entered the big picture.