Sep 25 2011

Unix-based operating systems like Linux offer a distinctive way to join two commands on the terminal: you can take the output of the first command and use it as the input of the second. This is the concept of the pipe, written | . Pipes allow two separate processes to communicate with each other even if they were not designed to do so, which opens up an almost endless range of possibilities.

A basic example is:

ls -l | grep rwxrwxrwx

This command prints all the files in the current directory that have the permissions rwxrwxrwx (or that have rwxrwxrwx anywhere else on their line, for example in their name).
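If you only want to match the permission column, one option is to anchor the pattern to the start of the line. The following is a minimal sketch, not part of the original example: the temporary directory and the file names open_file and rwxrwxrwx are made up for the demonstration.

```shell
# Hypothetical demo directory with two files:
#   open_file  has mode 777
#   rwxrwxrwx  has mode 644 but a misleading name
dir=$(mktemp -d)
touch "$dir/open_file" "$dir/rwxrwxrwx"
chmod 777 "$dir/open_file"
chmod 644 "$dir/rwxrwxrwx"

# Unanchored pattern: matches the mode of one file and the name of the other.
loose=$(ls -l "$dir" | grep -c 'rwxrwxrwx')
# Anchored pattern: one file-type character, then the nine permission characters.
strict=$(ls -l "$dir" | grep -c '^.rwxrwxrwx')

echo "loose=$loose strict=$strict"   # prints: loose=2 strict=1
rm -r "$dir"
```

The leading dot in the anchored pattern stands for the file-type character (- for a regular file, d for a directory, and so on).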

The way this works is that when the shell sees the pipe symbol, it asks the kernel to create a pipe: a small buffer held in kernel memory, with one end for writing and one end for reading. The pipe has no name or directory entry and takes up no space on the hard disk. Because both the terminal and the pipe are seen as files from the perspective of the operating system, all we are saying is that the system should use a different file, the pipe, in place of the first command's standard output and the second command's standard input.
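One way to check what the second command actually receives is to test its standard input. This is a small sketch that assumes a system where /dev/stdin exists (Linux and most other Unix systems); the variable name kind is just for illustration.

```shell
# On the right-hand side of a pipe, standard input is a pipe (FIFO),
# not a terminal and not a regular file. The -p test checks for that.
kind=$(echo hello | {
    if [ -p /dev/stdin ]; then
        echo pipe
    else
        echo other
    fi
})
echo "stdin inside the pipeline: $kind"
```

Run from an interactive prompt without the pipe, the same test would normally fail, because standard input is then the terminal.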

So in a pipeline the data is refined by each command and then passed on to the next one. This works for two reasons:

1. Most UNIX commands read their input from stdin and write their output to stdout.
2. The UNIX pipe connects the stdout of the first command to the stdin of the second command, and so on if you have multiple pipes.
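To make the two points above concrete, here is a tiny pipeline with fixed input, so the result does not depend on your system. The word list is invented for the example.

```shell
# Stage 1: printf writes three lines to stdout.
# Stage 2: grep reads them from stdin and keeps the lines starting with "a".
# Stage 3: wc -l counts what is left.
# Stage 4: tr strips the padding some wc implementations add.
count=$(printf 'apple\nbanana\navocado\n' | grep '^a' | wc -l | tr -d ' ')
echo "$count"   # prints: 2
```

Each command only knows about its own stdin and stdout; the shell does the wiring.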

Other examples of pipes are:

ps -ef |grep http|wc -l

With this command you ask for the list of all processes, filter it for the lines that contain the string "http", and finally count the number of lines.
So in short you count the number of http processes running (the count can include the grep process itself, since its own command line contains "http").
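A common workaround for grep counting itself is to write the pattern as '[h]ttp': it still matches "http", but the text "[h]ttp" on grep's own command line does not match it. The sketch below uses two invented snapshots of ps output (the PIDs, users and paths are hypothetical) so that the result is reproducible.

```shell
# Hypothetical `ps -ef` snapshot taken while `grep http` is running.
snapshot_plain='root  101   1 /usr/sbin/httpd
www   102 101 /usr/sbin/httpd
me    200 150 grep http'

# The same snapshot if the filter had been typed as grep '[h]ttp'.
snapshot_bracket='root  101   1 /usr/sbin/httpd
www   102 101 /usr/sbin/httpd
me    200 150 grep [h]ttp'

plain=$(printf '%s\n' "$snapshot_plain" | grep -c 'http')         # also counts the grep line
bracket=$(printf '%s\n' "$snapshot_bracket" | grep -c '[h]ttp')   # skips the grep line

echo "plain=$plain bracket=$bracket"   # prints: plain=3 bracket=2
```

On a live system the equivalent command would be ps -ef | grep -c '[h]ttp'.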

diff <(cd dir1 && find | sort) <(cd dir2 && find | sort)

This is useful to compare two directory trees.
It uses Bash’s “process substitution” feature to compare (using diff) the output of two different process pipelines.
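Here is a self-contained sketch of the same idea that you can run safely. It assumes bash (process substitution is not available in plain sh), and the directory layout and file names are invented for the demonstration.

```shell
# Build two small, hypothetical directory trees that differ in one file.
work=$(mktemp -d)
mkdir -p "$work/dir1/sub" "$work/dir2/sub"
touch "$work/dir1/a.txt" "$work/dir1/sub/b.txt"
touch "$work/dir2/a.txt" "$work/dir2/sub/c.txt"

# Each <( ... ) runs a pipeline and hands diff a file name connected to its output.
# Note there must be no space between < and (.
# diff exits with status 1 when the inputs differ, hence the "|| true".
result=$(diff <(cd "$work/dir1" && find . | sort) \
              <(cd "$work/dir2" && find . | sort) || true)

echo "$result"
rm -rf "$work"
```

The output should show ./sub/b.txt only on the left side and ./sub/c.txt only on the right side.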

tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz

What happens here is that we tell tar to create ("-c") an archive of all files in the current directory "." (recursively) and write the data to stdout ("-f -"). Next we pass pv the total size with "-s": the command "du -sb . | awk '{print $1}'" returns the number of bytes in the current directory, and the result is fed to pv as the "-s" parameter. This way pv knows how much data is still left to be processed and can show progress. Finally we gzip the whole stream and write the result to the file out.tgz.
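If pv is not installed, the core of the pipeline still works without it. The following sketch leaves pv out on purpose and only shows tar writing to stdout and gzip reading from stdin; the temporary directories and note.txt are made up for the example.

```shell
# Hypothetical source directory with one file, and a separate output directory
# so the archive does not end up inside itself.
src=$(mktemp -d)
out=$(mktemp -d)
printf 'hello\n' > "$src/note.txt"

# tar writes the archive to stdout (-f -), gzip compresses what it reads from stdin.
(cd "$src" && tar -cf - . | gzip > "$out/out.tgz")

# Read it back through another pipe: gzip decompresses, tar lists from stdin.
listing=$(gzip -dc "$out/out.tgz" | tar -tf -)
echo "$listing"

rm -rf "$src" "$out"
```

The listing should include note.txt, which confirms that the archive made it through the pipe intact.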

Thanks to commandlinefu

These are all examples of unnamed, or anonymous, pipes.
An anonymous pipe exists only inside the kernel and can be accessed only by the process that created it (in these examples, the bash shell) and by the child processes that inherit it.
The other type of pipe is the named pipe.

Named Pipe

A named pipe (also known as a FIFO for its behavior) is system-persistent: it exists beyond the life of the processes that use it and must be deleted once it is no longer needed. Processes generally attach to the named pipe (which usually appears as a file) to perform inter-process communication (IPC).

A named pipe is a real entry in the filesystem with a special file type, shown as p in the first column when you list files with ls -l:

ls -l mypipe
prw-r--r-- 1 linuxaria linuxaria 0 2011-09-25 21:21 mypipe

A named pipe behaves like an unnamed pipe: you put something in on one side and it comes out on the other. Hence the name FIFO, or First-In-First-Out: the first thing you put in the pipe is the first to leave.

If you start a process that writes to a named pipe, it will block until another process opens the pipe for reading. If you start a process that reads from the pipe, it will wait for something to read before terminating. The size of the pipe in a file listing is always zero: it does not store data on disk, it just links two processes like the shell | . However, since this pipe has a name, the two processes do not have to be on the same command line, or even be run by the same user.

To create a named pipe, use the mkfifo command:

mkfifo filename

For example:

mkfifo mypipe
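The sketch below creates a named pipe in a temporary directory (so nothing is left behind), checks its type, and pushes two lines through it with a background writer. The path and the variable names are chosen only for this example.

```shell
dir=$(mktemp -d)
fifo="$dir/mypipe"
mkfifo "$fifo"

# The first character of the ls -l line is the file type: p means named pipe.
type_char=$(ls -l "$fifo" | cut -c1)
echo "file type: $type_char"

# The writer runs in the background because it blocks until a reader opens the pipe.
printf 'first\nsecond\n' > "$fifo" &

# The reader receives the lines in the order they were written (first in, first out).
received=$(cat "$fifo")
wait
echo "$received"

rm -r "$dir"
```

Without the & the script would stop at the printf line, waiting for a reader that never arrives.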

Once you have created the named pipe, you can use it to share information between two processes, such as:

mkfifo my_pipe
cat file > my_pipe
gzip -9 -c < my_pipe > out.gz

In this example you read a file with cat and tell cat to send its output to the named pipe. When you run this command the shell will appear to hang, waiting for a reader to open the named pipe. The reader here is gzip, which you can run in another terminal: it reads the data from the named pipe, compresses it, and writes the result to the file out.gz.

Another example of a named pipe:
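If you prefer to try this from a single terminal, you can put the writer in the background with &. This is only a sketch of that variation: the temporary directory and the sample text are invented, and the file names mirror the ones used above.

```shell
dir=$(mktemp -d)
printf 'some text to compress\n' > "$dir/file"
mkfifo "$dir/my_pipe"

# Writer: blocks in the background until gzip opens the pipe for reading.
cat "$dir/file" > "$dir/my_pipe" &

# Reader: compresses whatever comes out of the pipe.
gzip -9 -c < "$dir/my_pipe" > "$dir/out.gz"
wait

# Decompress to check that the data survived the trip.
roundtrip=$(gzip -dc "$dir/out.gz")
echo "$roundtrip"   # prints: some text to compress

rm -r "$dir"
```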

mkfifo my_pipe
script -f my_pipe
cat my_pipe

This is useful if you want to share your terminal session with someone else connected to the Linux server you are working on.
Basically you send all your terminal output to the pipe, thanks to script, while the other user can look at what you are doing simply with a cat.

Once you are done with it, a named pipe can be deleted like any file with the rm command.

Pipe Capacity

A pipe has a limited capacity. If the pipe is full, then a write will block or fail, depending on whether the pipe was opened in non-blocking mode. Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.

In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on x86). Since Linux 2.6.11, the pipe capacity is 65536 bytes.

With recent kernels (>= 2.6.35), you can change the size of a pipe with

fcntl(fd, F_SETPIPE_SZ, size)

where size is an int giving the requested capacity in bytes. The maximum size an unprivileged process can set is in /proc/sys/fs/pipe-max-size.
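The fcntl call itself is made from C, but you can inspect the limit from the shell. The sketch below simply reads the file mentioned above and falls back gracefully on systems where it does not exist; the variable names are illustrative.

```shell
max_file=/proc/sys/fs/pipe-max-size

if [ -r "$max_file" ]; then
    # Linux exposes the limit as a plain number of bytes.
    pipe_max=$(cat "$max_file")
    echo "pipe-max-size: $pipe_max bytes"
else
    # Older kernels and non-Linux systems do not provide this file.
    pipe_max=unknown
    echo "pipe-max-size is not available on this system"
fi
```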


Named pipe on Wikipedia


  6 Responses to “Pipes – what are they and Example of Use”

  1. […] has published an excellent lesson about Linux pipes. Unix based operating systems like Linux offer a unique approach to join two commands on the […]

  2. ps -ef |grep http|wc -l

    is a useless use of wc, a better way would be:

    ps -ef |grep -c http

    which achieves the same thing with one less process call.

    similarly, in

    diff <(cd dir1 && find | sort) <(cd dir2 && find | sort)

    “cd dir1 && find” can be rewritten “find dir1”.

    (Note: to compare directory _contents_, you can use diff -r)

    I highly recommend for more detailed info

  3. In this particular case you might replace
    awk '{print $1}'
    with
    cut -f1
    in
    tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz

  4. beautiful howto! thanks

