Unix-based operating systems like Linux offer a distinctive way to join two commands on the terminal: you can take the output of the first command and use it as the input of the second. This is the concept of the pipe, written | . Pipes allow two separate processes to communicate with each other even if they were not written to do so, which opens up a huge range of possibilities.
A basic example is:
ls -l | grep rwxrwxrwx
This command will print the list of all the files in the local directory that have permission rwxrwxrwx (or that have rwxrwxrwx in their name).
The way this works is that when the shell sees the pipe symbol, it asks the kernel to create a pipe: a buffer held in memory, with one end for writing and one end for reading. The pipe has no name or directory entry and takes up no space on the hard disk. Because both the terminal and the pipe are seen as files from the perspective of the operating system, all we are saying is that the commands should use a different file, the pipe, instead of the terminal for standard output and standard input.
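A quick way to see this from the shell is to ask, inside a pipeline, what kind of file standard input is. This is only a small sketch: it assumes a Linux-style /dev/stdin entry, and the variable name kind is made up for the example.

```shell
# On the right-hand side of a pipeline, standard input is a pipe.
# "test -p" (written here as [ -p ... ]) is true when the file is a pipe.
kind=$(echo hello | { if [ -p /dev/stdin ]; then echo pipe; else echo other; fi; })
echo "stdin of the second command is a: $kind"
```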
So in a pipeline the output is refined by each command and then passed on to the next one. This works for two reasons:
1. Most UNIX commands read their input from stdin and write their output to stdout.
2. The UNIX pipe connects the stdout of the first command to the stdin of the second command, and so on if you have multiple pipes.
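The two points above can be tried with a small chain of commands that does not depend on what is in your directory. It is just an illustrative sketch; the word list and the variable name result are invented for the example.

```shell
# Stage 1 prints four words, one per line, on stdout.
# Stage 2 keeps only the lines that start with "a".
# Stage 3 sorts what is left, stage 4 keeps the first line.
result=$(printf '%s\n' cherry apple banana avocado | grep '^a' | sort | head -n 1)
echo "$result"
```

Each command only reads stdin and writes stdout; the shell takes care of connecting them.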
Other examples of pipes are:
ps -ef | grep http | wc -l
With this command you ask for the list of all processes, then you filter for the lines that contain the string “http”, and finally you get the number of lines.
So in short you count the number of http processes running (the grep process itself may show up in the list and be counted too).
diff <(cd dir1 && find | sort) <(cd dir2 && find | sort)
This is useful to compare two directory trees.
It uses Bash’s “process substitution” feature to compare (using diff) the output of two different process pipelines.
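Process substitution is not available in every shell. As a rough, portable sketch of the same comparison, each listing can be written to a temporary file first. The directory and file names below are made up so that the example can run on its own.

```shell
# Build two small directory trees to compare (hypothetical content).
work=$(mktemp -d)
mkdir -p "$work/dir1/sub" "$work/dir2/sub"
touch "$work/dir1/a.txt" "$work/dir1/sub/b.txt"
touch "$work/dir2/a.txt" "$work/dir2/sub/c.txt"

# Same pipelines as in the diff command, but each result goes to a file.
(cd "$work/dir1" && find . | sort) > "$work/list1"
(cd "$work/dir2" && find . | sort) > "$work/list2"

# diff exits with status 1 when the listings differ, which is expected here.
changes=$(diff "$work/list1" "$work/list2" || true)
echo "$changes"

rm -rf "$work"
```

The process substitution form does the same job without the intermediate files.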
tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
What happens here is that we tell tar to create (“-c”) an archive of all files in the current directory “.” (recursively) and to write the data to stdout (“-f -”). Next we pass pv the total size of the files in the current directory with “-s”: the command “du -sb . | awk '{print $1}'” returns the number of bytes in the current directory, and that number is fed to pv as the “-s” parameter. Finally we gzip the whole content and write the result to the file out.tgz. This way pv knows how much data is still left to be processed.
Thanks to commandlinefu
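If pv is not installed, the tar and gzip part of that pipeline can still be tried on its own. The sketch below uses throwaway directories and a made-up file name, assumes GNU du for the “-b” option, and writes the archive outside the directory being archived so that tar does not try to pack its own output.

```shell
src=$(mktemp -d)
out=$(mktemp -d)
printf 'hello\n' > "$src/greeting.txt"

# The number that would be handed to "pv -s" (needs GNU du for -b).
size=$(du -sb "$src" 2>/dev/null | cut -f1)

# Archive to stdout, compress on the fly, store the result.
(cd "$src" && tar -cf - .) | gzip > "$out/out.tgz"

# Reverse the pipeline to list what ended up in the archive.
listing=$(gzip -dc "$out/out.tgz" | tar -tf -)
echo "size reported by du: $size"
echo "$listing"

rm -rf "$src" "$out"
```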
These are all examples of unnamed, or anonymous, pipes.
An unnamed pipe exists only inside the kernel and can be accessed only by the process that created it and its descendants; in these examples the creator is the bash shell.
The other type of pipe is the Named Pipe
Named Pipe
A named pipe (also known as a FIFO for its behavior) is system-persistent: it exists beyond the life of the processes that use it and must be deleted once it is no longer needed. Processes generally attach to the named pipe (which usually appears as a file) to perform inter-process communication (IPC).
A named pipe is a real entry in the filesystem with its own file type, shown as a p in the first column when you list files with ls -l:
ls -l mypipe
prw-r--r-- 1 linuxaria linuxaria 0 2011-09-25 21:21 mypipe
The named pipe acts like an unnamed pipe: you put something in on one side and it comes out on the other. Hence the name FIFO, or First-In-First-Out: the first thing you put in the pipe is the first to leave.
If you start a process that writes to a named pipe, it will block until another process opens the pipe for reading. If you start a process that reads from the pipe, it will wait for something to read before terminating. The size of the pipe shown by ls is always zero: the data is never stored in the file, the pipe just links two processes like the shell | does. However, since this pipe has a name, the two processes do not have to be on the same command line, or even be run by the same user.
To create a named pipe you must use the command mkfifo
syntax:
mkfifo filename
mkfifo mypipe
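To check that mkfifo really created a pipe and not a regular file, you can test the file type. This is a small sketch that works in a temporary directory; the variable names are only for illustration.

```shell
dir=$(mktemp -d)
mkfifo "$dir/mypipe"

# "-p" is true only for named pipes.
[ -p "$dir/mypipe" ] && echo "mypipe is a named pipe"

# The first character of the ls -l line is the file type.
type_char=$(ls -l "$dir/mypipe" | cut -c1)
echo "file type shown by ls: $type_char"

# A named pipe is removed like any other file.
rm "$dir/mypipe"
rmdir "$dir"
```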
Once you have created the named pipe you can use it to share information between two processes, such as:
mkfifo my_pipe
cat file > my_pipe
gzip -9 -c < my_pipe > out.gz
In this example you read a file with cat and tell cat to send its output to the named pipe. If you run this command you will see that the shell hangs, waiting for the named pipe to be read. That is done by gzip, which you can run in another terminal: it reads the data from the named pipe, compresses it, and puts the result in the file out.gz.
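If you want to try this without opening a second terminal, the writer can be put in the background with “&”. The following is a sketch under that assumption; it uses printf instead of cat on a file so that it is self-contained, and the names are invented for the example.

```shell
dir=$(mktemp -d)
mkfifo "$dir/my_pipe"

# Writer: runs in the background and blocks until a reader opens the pipe.
printf 'line one\nline two\n' > "$dir/my_pipe" &

# Reader: compresses whatever arrives through the pipe.
gzip -9 -c < "$dir/my_pipe" > "$dir/out.gz"

# Wait for the background writer to finish, then check the result.
wait
restored=$(gzip -dc "$dir/out.gz")
echo "$restored"

rm -rf "$dir"
```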
Another example of named pipe:
mkfifo my_pipe
script -f my_pipe
cat my_pipe
This is useful if you want to share your terminal session with someone else connected to the Linux server you are working on.
Basically you send all your terminal output to the pipe with script, while the other user can watch what you are doing simply with cat.
Once you are done with it, a named pipe can be deleted like any file with the rm command.
Pipe Capacity
A pipe has a limited capacity. If the pipe is full, then a write will block or fail, depending on whether the pipe was opened in non-blocking mode. Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on x86). Since Linux 2.6.11, the pipe capacity is 65536 bytes.
With recent kernels (>= 2.6.35), you can change the size of a pipe with
fcntl(fd, F_SETPIPE_SZ, size)
where size is an int. The maximum size that an unprivileged process can set is in /proc/sys/fs/pipe-max-size.
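The fcntl call itself has to be made from a program, but the limit can be read from the shell. A minimal sketch, assuming a Linux system where /proc is mounted (the variable names are only for illustration):

```shell
limit_file=/proc/sys/fs/pipe-max-size
if [ -r "$limit_file" ]; then
    # The file contains a single number: the limit in bytes.
    max_size=$(cat "$limit_file")
    echo "largest pipe size an unprivileged process can request: $max_size bytes"
else
    max_size=unknown
    echo "$limit_file is not available on this system"
fi
```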
ps -ef |grep http|wc -l
is a useless use of wc, a better way would be:
ps -ef |grep -c http
which achieves the same thing with one less process call.
similarly, in
diff <(cd dir1 && find | sort) <(cd dir2 && find | sort)
“cd dir1 && find” can be rewritten “find dir1”.
(Note: to compare directory _contents_, you can use diff -r)
I highly recommend http://mywiki.wooledge.org/BashGuide/InputAndOutput#Pipes for more detailed info
Thanks for the link.
In the particular case you might replace
awk '{print $1}'
with
cut -f1
in
tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
Thanks for the tip.
beautiful howto! thanks
Pol