Command Line Kung Fu (2014)

Text Processing and Manipulation

Strip out Comments and Blank Lines

$ grep -E -v "^#|^$" file

To strip all the noise out of a configuration file, get rid of the comments and blank lines. These two regexes (regular expressions) do the trick: "^#" matches all lines that begin with a "#", and "^$" matches all blank lines. The -E option tells grep to use extended regular expressions, which provide the "|" alternation operator, and the -v option inverts the match so only non-matching lines are displayed.

[jason@www conf]$ grep -E -v '^#|^$' httpd.conf | head

ServerTokens OS

ServerRoot "/etc/httpd"

PidFile run/httpd.pid

Timeout 60

KeepAlive Off

MaxKeepAliveRequests 100

KeepAliveTimeout 15

<IfModule prefork.c>

StartServers 8

MinSpareServers 5

[jason@www conf]$
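
If the configuration file also contains indented comments or lines of nothing but whitespace, a slightly broader pattern catches those as well. This variation assumes your grep understands the \s shorthand (GNU grep does); otherwise spell it out as [[:space:]].

$ grep -E -v '^\s*#|^\s*$' httpd.conf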

Use Vim to Edit Files over the Network

$ vim scp://remote-host//path/to/file

$ vim scp://remote-user@remote-host//path/to/file

If you want to edit a file with vim over SSH, you can let vim do the heavy lifting of copying the file back and forth.

$ vim scp://linuxserver//home/jason/notes.txt
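
Note the double slash in front of the path: as far as I understand vim's netrw URL handling, scp://host//path is treated as an absolute path on the remote machine, while a single slash makes the path relative to the remote user's home directory. Assuming you log in as jason and your remote home is /home/jason, this should open the same file as above:

$ vim scp://linuxserver/notes.txt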

Display Output in a Table

$ alias ct='column -t'

$ command | ct

Use the column command to format text into multiple columns. By using the -t option, column will count the number of columns the input contains and create a table with that number of columns. This can really make the output of many commands easier to read. I find myself using this so often that I created an alias for the command.

$ alias ct='column -t'

$ echo -e 'one two\nthree four'

one two

three four

$ echo -e 'one two\nthree four' | ct

one two

three four

$ mount -t ext4

/dev/vda2 on / type ext4 (rw)

/dev/vda1 on /boot type ext4 (rw)

$ mount -t ext4 | ct

/dev/vda2 on / type ext4 (rw)

/dev/vda1 on /boot type ext4 (rw)

$
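
column also has a -s option to split on a character other than whitespace, which pairs nicely with -t for delimited files such as /etc/passwd. A quick sketch:

$ column -s : -t /etc/passwd

One caveat: some versions of column treat a run of delimiters as one, so rows with empty fields may not line up the way you expect.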

Grab the Last Word on a Line of Output

$ awk '{print $NF}' file

$ cat file | awk '{print $NF}'

You can have awk print fields by using $FIELD_NUMBER notation. To print the first field use $1, to print the second use $2, etc. However, if you don't know the number of fields, or don't care to count them, use $NF. The NF variable holds the number of fields on the line, so $NF is always the last field. Awk separates fields on whitespace by default, but you can use the -F argument to change that behavior. Here is how to print all the shells that are in use on the system. Use a colon as the field separator and then print the last field.

$ awk -F: '{print $NF}' /etc/passwd | sort -u

If you want to display the shell for each user on the system you can do this.

$ awk -F: '{print $1,$NF}' /etc/passwd | sort | column -t

adm /sbin/nologin

apache /sbin/nologin

avahi-autoipd /sbin/nologin

bin /sbin/nologin

bobb /bin/bash

...
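
You can also do arithmetic on NF to count back from the end of the line. For example, $(NF-1) is the next-to-last field, which in /etc/passwd is the home directory:

$ awk -F: '{print $(NF-1)}' /etc/passwd | sort -u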

View Colorized Output with Less

$ ls --color=always | less -R

$ grep --color=always file | less -R

Some Linux distributions create aliases for ls and grep with the --color=auto option. This causes colors to be used only when the output is going to a terminal. When you pipe the output from ls or grep, the color codes aren't emitted. You can force ls or grep to always display color with --color=always. To have the less command display the raw control characters that create the colors, use the -R option.

$ grep --color=always -i bob /etc/passwd | less -R

$ ls --color=always -l /etc | less -R

Preserve Color When Piping to Grep

$ ls -l --color=always | grep --color=never string

If you pipe colorized input into grep and grep is an alias with the --color=auto option, grep will discard the color from the input and highlight the string that was grepped for. In order to preserve the colorized input, force grep to not use colors with the --color=never option.

$ ls -l --color=always *mp3 | grep --color=never jazz

-rw-r--r--. 1 jason jason 21267371 Feb 16 11:12 jazz-album-1.mp3

Append Text to a File Using Sudo

$ echo text | sudo tee -a file

If you have ever tried to append text to a file using redirection following a "sudo echo" command, you quickly find this doesn't work. The echo command is executed as root, but the redirection is performed by your shell, which is running as your normal user.

$ sudo echo "PRODUCTION Environment" >> /etc/motd

-bash: /etc/motd: Permission denied

Fortunately, you can use sudo in combination with the tee command to append text to a file.

$ echo "PRODUCTION Environment" | sudo tee -a /etc/motd

PRODUCTION Environment
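
Another option is to run the whole thing under a root shell so the redirection itself happens as root. And if you would rather not have tee echo the text back at you, send its standard output to /dev/null.

$ sudo sh -c 'echo "PRODUCTION Environment" >> /etc/motd'

$ echo "PRODUCTION Environment" | sudo tee -a /etc/motd > /dev/null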

Change the Case of a String

$ tr [:upper:] [:lower:]

$ tr [:lower:] [:upper:]

When you need to change the case of a string, use the tr command. You can supply ranges to tr like "tr a-z A-Z" or use character classes like "tr '[:lower:]' '[:upper:]'".

$ ENVIRONMENT=PRODUCTION

$ DIRECTORY=$(echo $ENVIRONMENT | tr [:upper:] [:lower:])

$ echo $ENVIRONMENT | sudo tee -a /etc/motd

$ tail -1 /etc/motd

PRODUCTION

$ sudo mkdir /var/www/$DIRECTORY

$ sudo tar zxf wwwfiles.tgz -C /var/www/$DIRECTORY
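
If you are running bash version 4 or later, the shell can change case on its own with the ${VAR,,} and ${VAR^^} parameter expansions, which saves a call to tr. A minimal sketch:

$ ENVIRONMENT=PRODUCTION

$ echo ${ENVIRONMENT,,}

production

$ echo ${ENVIRONMENT^^}

PRODUCTION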

Display Your Command Search Path in a Human Readable Format

$ echo $PATH | tr ':' '\n'

Reading a colon-separated list of items isn't as easy for us humans as it is for computers. To substitute newlines for the colons, use the tr command.

$ echo $PATH

/usr/bin:/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin

$ echo $PATH | tr ':' '\n'

/usr/bin

/bin

/usr/local/bin

/bin

/usr/bin

/usr/local/sbin

/usr/sbin

/sbin

$

Create a Text File from the Command Line without Using an Editor

$ cat > file

<ctrl-d>

If you need to make a quick note and don't need a full blown text editor, you can simply use cat and redirect the output to a file. Type your text, then press <ctrl-d> at the beginning of a line when you're finished; the file is written and you're returned to the prompt.

$ cat > shopping.list

eggs

bacon

coffee

<ctrl-d>

$ cat shopping.list

eggs

bacon

coffee

$
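
A here document works too, and it lets you include the text and the terminator in a single command instead of typing <ctrl-d>. Keep in mind that > truncates an existing file, while >> appends to it. A sketch using the same shopping list:

$ cat <<EOF > shopping.list
eggs
bacon
coffee
EOF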

Display a Block of Text between Two Strings

$ awk '/start-pattern/,/stop-pattern/' file.txt

$ command | awk '/start-pattern/,/stop-pattern/'

The grep command is great at extracting a single line of text. But what if you need to capture an entire block of text? Use awk and provide it a start and stop pattern. The pattern can simply be a string or even a regular expression.

$ sudo dmidecode | awk '/Processor/,/Manuf/'

Processor Information

Socket Designation: SOCKET 0

Type: Central Processor

Family: Core i5

Manufacturer: Intel

$ awk '/worker.c/,/^$/' httpd.conf

<IfModule worker.c>

StartServers 4

MaxClients 300

MinSpareThreads 25

MaxSpareThreads 75

ThreadsPerChild 25

MaxRequestsPerChild 0

</IfModule>

$
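
sed can do the same job with its -n option and the p command, which prints only the lines that fall inside the range. Assuming the same httpd.conf, this should produce the same block as the awk version:

$ sed -n '/worker.c/,/^$/p' httpd.conf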

Delete a Block of Text between Two Strings

$ sed '/start-pattern/,/stop-pattern/d' file

$ command | sed '/start-pattern/,/stop-pattern/d'

You can delete a block of text with the sed command by providing it a start and stop pattern and telling it to delete that entire range. The patterns can be strings or regular expressions. This example deletes the first seven lines since "#" matches the first line and "^$" matches the seventh line.

$ cat ports.conf

# If you just change the port or add more ports here, you will likely also

# have to change the VirtualHost statement in

# /etc/apache2/sites-enabled/000-default

# This is also true if you have upgraded from before 2.2.9-3 (i.e. from

# Debian etch). See /usr/share/doc/apache2.2-common/NEWS.Debian.gz and

# README.Debian.gz

NameVirtualHost *:80

Listen 80

<IfModule mod_ssl.c>

# If you add NameVirtualHost *:443 here, you will also have to change

# the VirtualHost statement in /etc/apache2/sites-available/default-ssl

# to <VirtualHost *:443>

# Server Name Indication for SSL named virtual hosts is currently not

# supported by MSIE on Windows XP.

Listen 443

</IfModule>

<IfModule mod_gnutls.c>

Listen 443

</IfModule>

$ sed '/#/,/^$/d' ports.conf

NameVirtualHost *:80

Listen 80

<IfModule mod_ssl.c>

<IfModule mod_gnutls.c>

Listen 443

</IfModule>

$
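
To remove the block from the file itself rather than just from the output, GNU sed's -i option edits the file in place; giving it a suffix keeps a backup copy of the original first. A cautious sketch:

$ sed -i.bak '/#/,/^$/d' ports.conf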

Fix Common Typos with Aliases

$ alias typo='correct spelling'

If you find yourself making the same typing mistake over and over, fix it with an alias.

$ grpe root /etc/passwd

bash: grpe: command not found

$ echo "alias grpe='grep'" >> ~/.bash_profile

$ . ~/.bash_profile

$ grpe root /etc/passwd

root:x:0:0:root:/root:/bin/bash

$

Sort the Body of Output While Leaving the Header on the First Line Intact

Add this function to your personal initialization files such as ~/.bash_profile:

body() {
  # Read and print the first line (the header) untouched.
  IFS= read -r header
  printf '%s\n' "$header"
  # Run the supplied command (sort, grep, etc.) on the rest of the input.
  "$@"
}

$ command | body sort

$ cat file | body sort

I find myself wanting to sort the output of commands that contain headers. After the sort is performed, the header ends up sorted right along with the rest of the content. This function keeps the header line intact and allows sorting of the remaining lines of output. Here are some examples to illustrate the usage of this function.

$ df -h | sort -k 5

/dev/vda2 28G 3.2G 25G 12% /

tmpfs 504M 68K 504M 1% /dev/shm

/dev/vda1 485M 444M 17M 97% /boot

Filesystem Size Used Avail Use% Mounted on

$ df -h | body sort -k 5

Filesystem Size Used Avail Use% Mounted on

/dev/vda2 28G 3.2G 25G 12% /

tmpfs 504M 68K 504M 1% /dev/shm

/dev/vda1 485M 444M 17M 97% /boot

$ ps -eo pid,%cpu,cmd | head -1

PID %CPU CMD

$ ps -eo pid,%cpu,cmd | sort -nrk2 | head

675 12.5 mysqld

PID %CPU CMD

994 0.0 /usr/sbin/acpid

963 0.0 /usr/sbin/modem-manager

958 0.0 NetworkManager

946 0.0 dbus-daemon

934 0.0 /usr/sbin/fcoemon --syslog

931 0.0 [bnx2fc_thread/0]

930 0.0 [bnx2fc_l2_threa]

929 0.0 [bnx2fc]

$ ps -eo pid,%cpu,cmd | body sort -nrk2 | head

PID %CPU CMD

675 12.5 mysqld

994 0.0 /usr/sbin/acpid

963 0.0 /usr/sbin/modem-manager

958 0.0 NetworkManager

946 0.0 dbus-daemon

934 0.0 /usr/sbin/fcoemon --syslog

931 0.0 [bnx2fc_thread/0]

930 0.0 [bnx2fc_l2_threa]

929 0.0 [bnx2fc]

$
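
Because body simply prints the first line and hands the rest of its input to whatever command you name, it works with filters other than sort as well. For example, you could grep the process listing above for mysqld without losing the header:

$ ps -eo pid,%cpu,cmd | body grep mysqld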

Remove a Character or Set of Characters from a String or Line of Output

$ command | tr -d "X"

$ command | tr -d [SET]

$ cat file | tr -d "X"

$ cat file | tr -d [SET]

The tr command is typically used to translate characters, but with the -d option it deletes characters. This example shows how to get rid of quotes.

$ cat cities.csv

1,"Chicago","USA","IL"

2,"Austin","USA","TX"

3,"Santa Cruz","USA","CA"

$ cat cities.csv | cut -d, -f2

"Chicago"

"Austin"

"Santa Cruz"

$ cat cities.csv | cut -d, -f2 | tr -d '"'

Chicago

Austin

Santa Cruz

$

You can also let tr delete a group of characters. This example removes all the vowels from the output.

$ cat cities.csv | cut -d, -f2 | tr -d [aeiou]

"Chcg"

"Astn"

"Snt Crz"

$
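
Deleting characters with tr is also a handy way to clean up files that came from Windows, which end each line with a carriage return plus a newline. Stripping the carriage returns (the file names here are just placeholders) converts the file to Unix line endings:

$ tr -d '\r' < report-dos.txt > report-unix.txt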

Count the Number of Occurrences of a String

$ uniq -c file

$ command | uniq -c

The uniq command omits adjacent duplicate lines from its input. Since uniq doesn't examine an entire file or stream of input for duplicates, only adjacent lines, it is typically preceded by the sort command via a pipe. You can have uniq count the number of occurrences of each line by using the "-c" option. This comes in handy if you are looking through log files for occurrences of the same message, PID, status code, username, etc.

Let's find all of the unique HTTP status codes in an Apache web server log file named access.log. To do this, print out the ninth field of each log entry with the awk command.

$ tail -1 access.log

18.19.20.21 - - [19/Apr/2014:19:51:20 -0400] "GET / HTTP/1.1" 200 7136 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36"

$ tail -1 access.log | awk '{print $9}'

200

$ awk '{print $9}' access.log | sort | uniq

200

301

302

404

$

Let's take it another step forward and count how many of each status code we have.

$ awk '{print $9}' access.log | sort | uniq -c | sort -nr

5641 200

207 301

86 404

18 302

2 304

$

Now let's extract the status code and hour from the access.log file and count the occurrences of each combination, then sort them by number of occurrences. This will show us the hours during which the website was most active. Keep in mind that uniq only collapses adjacent lines, so the same combination can show up more than once if it occurs in separate runs in the log; adding a sort before uniq -c would merge those counts.

$ cat access.log | awk '{print $9, $4}' | cut -c 1-4,18-19 | uniq -c | sort -n | tail

72 200 09

76 200 06

81 200 06

82 200 06

83 200 06

83 200 06

84 200 06

109 200 20

122 200 20

383 200 10

$
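
If you only care about traffic per hour and not the status codes, you can pull the hour straight out of the timestamp field. In this log format the hour sits at characters 14-15 of the fourth field, and sorting before uniq -c combines the counts from every day in the log. A sketch along the same lines:

$ awk '{print $4}' access.log | cut -c 14-15 | sort | uniq -c | sort -n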