Capital City Christian Church PluggedIn IT Ministry
So far we have been looking at individual tools and techniques, and there are many more to come. But here we begin to look into combining individual tools to get more precise or specific results. Remember that one of the Unix principles is that every tool should do one thing, and do it well.
That brings us to the tinker-toy principle. Each individual tool or technique can be thought of as a specific tinker-toy. They come in lots of specific and somewhat simple shapes, but their power comes from being able to be combined any number of ways to create solutions of all sorts. You can use a type of tinker-toy over and over, or just once. You can plug them together in different orders. You are limited only by your imagination and innovation.
We have touched on this before, but this topic is worth its own page.
Piping is the process of sending the output of one command into another command as its input. This is done with the pipe symbol (|).
You can create a chain of commands using multiple pipes. At each pipe, the command on the left sends its output to the command on the right, where it becomes that command's input.
$ cmd1 | cmd2 | cmd3 | cmd4
For each of these examples, the pipe symbol (|) is used to pipe the output of the program on the left over to become input for the program on the right. You can chain any number of programs together to build a solution to a problem.
It would help illustrate each example if you ran each on your own system, but ran them progressively. That is, if the example is 'cmd1 | cmd2 | cmd3', then first run 'cmd1' alone and see what the output is. Then run 'cmd1 | cmd2' to see how the output is altered. Then add the next command and run 'cmd1 | cmd2 | cmd3' to see the progression stepping through this chain of piped commands, and how the results are refined by each command.
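As a concrete example of stepping through a pipeline, the following runs the same sample data through one, two, and then three stages. The printf data here is made up purely for illustration:

```shell
# Stage 1: raw sample data
printf 'pear\napple\npear\nfig\n'
# Stage 2: sorted, so duplicates sit next to each other
printf 'pear\napple\npear\nfig\n' | sort
# Stage 3: sorted with duplicates collapsed by uniq
printf 'pear\napple\npear\nfig\n' | sort | uniq
# Stage 4: count the unique lines
printf 'pear\napple\npear\nfig\n' | sort | uniq | wc -l
```

Running each stage in turn shows exactly what each added command contributes to the final result.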
In this case we first create a listing of the files and directories in the current directory. The -l parameter selects the 'long output format'.
That material is then piped, or fed, into the 'wc' (word count) command, which reports the number of lines, words, and characters.
$ ls -l | wc
     53     470    3224
We can be more specific and search only for files with names that end with 'txt' and pipe those results over to the wc utility.
$ ls *txt -l | wc
      9      81     638
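When only the line count matters, wc's -l parameter trims the report down to a single number. The printf data below stands in for a real directory listing:

```shell
# wc -l reports only the number of lines it receives
printf 'a.txt\nb.txt\nnotes.txt\n' | wc -l
```

The same idea applies to -w (words only) and -c (characters only).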
Now let's use the grep command. This is an extremely powerful tool you should become familiar with. It reports lines containing sub-strings based on an exact match or a regex pattern. We will begin to touch on grep and regex here, but will cover those topics in more depth in upcoming sessions.
Here the contents of a file, arp-scan.txt, are fed to the grep program, which will select the lines that contain the string 'Cisco'. This helps identify the switches on the network. Notice that the search term 'Cisco' is capitalized. If you try this with a lowercase 'c' you will get no returns. So, be intentional with the letter cases you provide, or include the '-i' parameter to make the search case-insensitive.
$ cat arp-scan.txt | grep 'Cisco'
or
$ cat arp-scan.txt | grep -i 'cisco'
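You can see the difference -i makes with a bit of made-up data standing in for arp-scan.txt:

```shell
# Without -i, only the exact-case line matches; with -i, both do
printf 'Cisco switch\ncisco phone\nVizio tv\n' | grep 'Cisco'
printf 'Cisco switch\ncisco phone\nVizio tv\n' | grep -i 'cisco'
```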
Here we do the same thing, except we have grep display only the lines that contain 'Vizio', which identifies most of the TVs on the network.
$ cat arp-scan.txt | grep 'Vizio'
The following command will first gather a list of all the files and directories in the current directory, and then we pipe that material to grep, which selects only the lines that begin with the letter 'd', indicating directories. Note that the caret (^) symbol tells us that we are taking advantage of grep's support for 'regular expressions'. Regular expressions form a pseudo-language that is supported by many of the tools and programs we use, including Geany. Regular expressions, or regex, are a shorthand for describing a pattern to search for rather than a specific string/keyword.
In the example below, the caret (^) symbol is an 'anchor' that instructs our search to look for the lowercase letter 'd', but only if it is the first character on the line being searched.
$ ls -al | grep '^d'
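If you only want to know how many directories there are, grep's -c parameter reports the count of matching lines instead of the lines themselves. The printf lines below simulate 'ls -al' output so the example is self-contained:

```shell
# grep -c counts matching lines rather than printing them
printf 'drwxr-xr-x 2 me me 4096 Jan 1 .config\n-rw-r--r-- 1 me me 120 Jan 1 notes.txt\ndrwxr-xr-x 2 me me 4096 Jan 1 bin\n' | grep -c '^d'
```

Against a real directory you would run: ls -al | grep -c '^d'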
Again, we generate a long format list of files and directories in the current directory, and this time we have grep report only the lines where the 4th character is an 'x', meaning the file is executable. Each period represents a single-character placeholder whose value we do not care about. In this way, '^...x' means we only want lines where the 4th character is an 'x', regardless of the three characters before it or anything after it. Note that this will include sub-directories, which are also executable.
$ ls ~/ -l | grep '^...x'
Next we add a second instance of grep to process the output from the previous example and report only the files whose names end in 'sh', meaning they are shell script files. Notice that rather than construct a complex regex to describe this pattern, we simply filter and refilter, omitting or including things until we get what we are after. Also notice that in the same way the caret (^) anchor means the beginning of the line, the dollar sign ($) anchor means the end of the line. So 'sh$' means lines that end with the letters 'sh'.
$ ls ~/ -l | grep '^...x' | grep 'sh$'
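grep's -v parameter inverts the filter, excluding rather than including matches. Combined with the include filters above, it lets you whittle a listing down with a chain of simple patterns. The file names below are hypothetical:

```shell
# Keep names ending in 'sh', then throw out any containing 'test'
printf 'backup.sh\nnotes.txt\ntest-run.sh\ndeploy.sh\n' | grep 'sh$' | grep -v 'test'
```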
As you can see, grep is a VERY powerful program. But in the examples so far, it is grep's use of Regular Expressions that is making all of the difference. Be sure to review the page on Grep and on Regular Expressions for more information. These will be out shortly.
This next example illustrates how you can chain any number of commands together passing the output of the one on the left over to become input for the command on the right of the pipe symbol (|). Here we run the ps (report processes) command to list all current processes. That information is then passed to the grep command to pull out only the lines that contain the word 'systemd'. But that gives us a bit too much information. So for each line that grep selects, we use the awk program to pull out the 2nd and 8th columns only, which are the PID and Command respectively.
$ ps -ef | grep systemd | awk '{ print $2, $8 }'
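One wrinkle with 'ps | grep' is that grep's own command line usually contains the search word, so grep matches itself. A common trick is to wrap one letter of the pattern in brackets: '[s]ystemd' still matches 'systemd', but no longer matches the literal text '[s]ystemd' that appears on grep's own line. The printf data below simulates ps output to illustrate; the PIDs and paths are made up:

```shell
# Simulated 'ps -ef' output; the last line is grep's own process,
# which the bracketed pattern deliberately fails to match
printf 'root 1 0 0 10:00 ? 00:00:01 /sbin/init\nroot 433 1 0 10:00 ? 00:00:00 /lib/systemd/systemd-udevd\nme 9001 812 0 10:05 pts/0 00:00:00 grep [s]ystemd\n' \
  | grep '[s]ystemd' | awk '{ print $2, $8 }'
```

On a live system the same idea reads: ps -ef | grep '[s]ystemd' | awk '{ print $2, $8 }'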
This example is similar to the previous one. We run a utility, but rather than its normal report and layout, we select only the 1st and 6th columns. Then we sort and display the material we have at that point. Something new here: rather than simply list the two columns to print and let them be displayed with the default space separating them, I have inserted "\t" between the two column identifiers ($1 and $6). In quotes, the backslash indicates that the letter 't' represents a tab. This cosmetically spaces the displayed columns wider and less squished together.
$ df | awk '{ print $1 "\t" $6 }' | sort
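To see exactly what the "\t" is doing, here is the same awk expression run against a single made-up line, once with the default comma (space) separator and once with the tab:

```shell
# Default separator (a space) versus an explicit tab between columns
printf 'alpha one two three four five\n' | awk '{ print $1, $6 }'
printf 'alpha one two three four five\n' | awk '{ print $1 "\t" $6 }'
```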
For this next example, a little needs to be said about 'mounts'.
In Windows you have A:, B:, C:, D:, E: drives and so on. Each represents a separate file system, likely on a separate disk medium of some sort: floppy disks, USB drives, hard drives, CDs/DVDs...
In the *nix world it is different. Rather than a letter being assigned to each file system, each file system is 'mounted to', that is, associated with or bound to, a sub-directory. These are usually located in the /media and /mnt directories. Once mounted, you can change to that mount point (sub-directory), see that file system and its files, and use it as you would any file system. When you are done, the mount point is 'unmounted' using the 'umount' command (note the spelling), at which point it goes back to being an ordinary sub-directory.
For this example, plug in a USB drive of any sort, and if it does not automatically mount, then manually mount it.
The mount command will report the file systems that have been mounted, but its report is often very dense and hard to read. Each line reported by mount is some sort of file system that is being used, often by the system itself, for some aspect of operation. Most mount points are not intended for people to bother with.
But since each line is a record, meaning it follows a format comprised of variable-length columns/fields, we can manipulate the raw output into something more readable and useful.
$ mount | sort | awk '{ print $1 "\t" $3 }'
At this point we have a list of file system names and where their mount points are.
So, let's pull just the lines we want. First we want to see the mounts that use a sub-directory in the /dev/ directory. These are the ones that correlate to the 'block devices' and such in the /dev directory.
$ mount | grep '/dev' | sort | awk '{ print $1 "\t" $3 }'
Now let's be more specific and exclude the file systems that are not usually used by people. The ones we want are in the /media/ directory. This will report things like USB drives.
$ mount | grep '/media' | sort | awk '{ print $1 "\t" $3 }'
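grep's -E parameter (extended regex) lets a single alternation pattern cover both cases at once, so /dev and /media mounts come back in one pass. The printf lines below stand in for real mount output:

```shell
# One alternation pattern selects lines mentioning /dev or /media
printf '/dev/sda1 on / type ext4\ntmpfs on /run type tmpfs\n/dev/sdb1 on /media/usb type vfat\n' \
  | grep -E '/dev|/media' | sort | awk '{ print $1 "\t" $3 }'
```

Against a live system: mount | grep -E '/dev|/media' | sort | awk '{ print $1 "\t" $3 }'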
One very powerful use of piping is to take the output of the first program and pipe it to the 'less' program. Very often a command will result in multiple screens of output. To make that output instantly readable and searchable, we pipe it into less and review it there. That is exactly what the 'man' command does when providing documentation, and for that matter, less's own --help output is displayed in less itself.
Here we dump the 'dmesg' report into the less command for review. Within less, use the spacebar and PgUp/PgDn keys to scroll through the material. Type a forward slash followed by a string and less will search for the string you provided. Subsequent forward slashes will repeat that search.
$ sudo dmesg | less
Another similar example is to dump the results of the tree command, which can be dozens of screens long, into less.
$ tree | less
In this example the strings tool first extracts printable material, 12 characters or longer, from each of the files in the /bin/ directory. This produces 1,308,864 lines of output. Too much to read through. So I send that output to the grep program to pull out only the lines that contain the word 'hack', which cuts the number of lines down to 1303. Of those, the following strings of interest were found in these programs.
This might take quite a while to run, so just give it some time. In the example below, there are 1303 lines found that contain the string 'hack'. Play around with the string you are looking for ('hacker', 'hacked', 'secret'...) and notice the number of times each string is found. When you want to actually look at the results, remove the ' | wc' portion and the actual lines will be dumped to the screen. But wait, we just saw something that might be helpful here: 'less'! So replace ' wc ' with ' less ' and page through the results at your leisure.
$ strings -f -n 12 /bin/* | grep -i 'hack' | wc
strings: '/bin/db_sql': No such file
strings: Warning: '/bin/X11' is a directory
   1303   10422 162928029

$ strings -f /bin/* | grep -i 'hack'
/bin/mytop: print "\n" x 90; ## dumb hack for now. Anyone know how to
/bin/rsync: attempt to hack rsync failed.
/bin/rsync: Attempt to hack rsync thwarted!
/bin/screen: Welcome to hacker's treasure zoo - Column %d Line %d(+%d) (%d,%d)
/bin/tasksel: # XXX FIXME ugly hack -- loop until enhances settle to handle
/bin/upx: this file has possibly been modified/hacked; take care!
Redirection is the process of changing the default flow of data streams. The operators used in redirection include '<' (read a file as a command's input), '>' (write a command's output to a file, overwriting it), and '>>' (append a command's output to a file).
In this example the contents of the 'arp-scan.txt' file are pulled into the wc program as input.
$ wc < arp-scan.txt
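A subtle but useful difference: when wc is given a filename as an argument it prints that name in its report, but when the file arrives through '<' redirection, wc only ever sees a stream and prints the numbers alone. A quick demonstration with a throwaway file (the name sample.txt is arbitrary):

```shell
# Build a small sample file, then compare the two invocations
printf 'one two\nthree four five\n' > sample.txt
wc -w sample.txt     # word count plus the filename
wc -w < sample.txt   # word count only
```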
Here again, the contents of 'arp-scan.txt' are pulled in as input for the wc program, but then the results of the wc program are redirected and written to the 'rpt.txt' file. If 'rpt.txt' already exists it will be overwritten. In order to append the new output to the existing rpt.txt file, you would double the > symbol.
$ wc < arp-scan.txt > rpt.txt
or
$ wc < arp-scan.txt >> rpt.txt
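Output and error messages actually travel on separate streams: standard output (stream 1) and standard error (stream 2). A plain '>' only redirects standard output; '2>' redirects standard error. This example sends each to its own file (the file names out.txt and err.txt are arbitrary):

```shell
# ls succeeds for '.' and fails for the missing name, so both
# streams receive content; '|| true' ignores ls's failure status
ls . no-such-file > out.txt 2> err.txt || true
cat out.txt   # the normal listing
cat err.txt   # the error message mentioning 'no-such-file'
```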
There is also a program named 'tee' that can be used to the same effect but is much more flexible. Tee accepts input and displays it on screen, while also writing that same material to a file at the same time. This allows you to build a long chain of commands and take snapshots of the material as it moves through the chain.
Here we produce a directory listing in long format. But then we pipe that output to the 'tee' program, which displays it on screen while also writing it to the 'filelist.txt' file.
$ ls -l | tee filelist.txt
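Like '>>' for redirection, tee has an append mode: the -a parameter adds to the file instead of overwriting it, so repeated runs accumulate. The file name snapshot.txt below is arbitrary:

```shell
# First write replaces the file; -a appends to it
printf 'first run\n'  | tee snapshot.txt    > /dev/null
printf 'second run\n' | tee -a snapshot.txt > /dev/null
wc -l < snapshot.txt   # both lines were captured
```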
Returning to the tree utility, here we use tee to capture the results to a file while also displaying them on-screen.
$ tree | tee treelist.txt
Try applying the techniques discussed throughout this page with these additional commands.