Shell Scripting Tutorial – Part 3


3.1. Interactive editing

What is sed?

A Stream Editor (sed) is used to perform basic transformations on text read from a file or a pipe. The result is sent to standard output. The syntax for the sed command has no output file specification, but results can be saved to a file using output redirection. The editor does not modify the original input.

What distinguishes sed from other editors, such as vi and ed, is its ability to filter text that it gets from a pipeline feed. You do not need to interact with the editor while it is running; that is why sed is sometimes called a batch editor. This feature allows use of editing commands in scripts, greatly easing repetitive editing tasks. When facing replacement of text in a large number of files, sed is a great help.
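As a minimal sketch of the two points above (sed works as a filter, and it leaves its input untouched), the commands below use a throwaway directory and made-up file names (input.txt, output.txt) that are only assumptions for illustration:

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/input.txt"

# sed as a filter: text arrives on a pipe, the edited text goes to stdout.
piped=$(printf 'hello world\n' | sed 's/world/sed/')
echo "$piped"

# sed reading a file: redirect standard output to keep the result.
sed 's/alpha/ALPHA/' "$workdir/input.txt" > "$workdir/output.txt"

# The input file is unchanged; only output.txt holds the edit.
cat "$workdir/input.txt"
cat "$workdir/output.txt"
```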

sed commands

The sed program can perform text pattern substitutions and deletions using regular expressions, like the ones used with the grep command.

The editing commands are similar to the ones used in the vi editor:

Sed editing commands
Command Result
a\ Append text below current line.
c\ Change text in the current line with new text.
d Delete text.
i\ Insert text above current line.
p Print text.
r Read a file.
s Search and replace text.
w Write to a file.

Apart from editing commands, you can give options to sed. An overview is in the table below:

Sed options
Option Effect

-e SCRIPT Add the commands in SCRIPT to the set of commands to be run while processing the input.
-f SCRIPT-FILE Add the commands contained in the file SCRIPT-FILE to the set of commands to be run while processing the input.
-n Silent mode: do not print each input line automatically.
--version Print version information and exit.

Printing lines containing a pattern
This is something you can do with grep, of course, but you can’t do a “find and replace” using that command. This is just to get you started.

Activity:
o Create a file with some content in it.
vi file.txt
This is the first line of my example
Here I have spelled wrongly as an exampel
I hope this exampel will make u understand better
So many exampels with mistakes, I don’t like this exampel
This example has got the correct spelling
Let this be the last line of my example
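o Use sed to search for the misspelled pattern exampel. Each matching line is printed twice, because sed also prints every input line by default.
sed '/exampel/p' file.txt
o Suppress the default output with -n so that only the matching lines are printed.
sed -n '/exampel/p' file.txt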
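To see the effect of -n in numbers, the sketch below recreates the six-line sample file in a temporary directory (the location is an assumption for illustration) and counts the output lines of both commands:

```shell
workdir=$(mktemp -d)
cat > "$workdir/file.txt" <<'EOF'
This is the first line of my example
Here I have spelled wrongly as an exampel
I hope this exampel will make u understand better
So many exampels with mistakes, I don't like this exampel
This example has got the correct spelling
Let this be the last line of my example
EOF

# Without -n: all 6 input lines, plus the 3 matching lines a second time.
without_n=$(sed '/exampel/p' "$workdir/file.txt" | wc -l | tr -d ' ')
# With -n: only the 3 matching lines.
with_n=$(sed -n '/exampel/p' "$workdir/file.txt" | wc -l | tr -d ' ')

echo "without -n: $without_n lines, with -n: $with_n lines"
```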

Exclude lines containing a pattern

Activity:
o Use sed to exclude the lines containing the pattern exampel. Do not combine d with -n here, or nothing is printed at all.
sed '/exampel/d' file.txt
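A short sketch of how d interacts with -n, using three made-up input lines: d removes the matching lines from the default output, so adding -n (which turns the default output off) leaves nothing to print.

```shell
sample=$(printf 'good example\nbad exampel\nanother example\n')

# d drops the matching line; the other lines are printed by default.
kept=$(printf '%s\n' "$sample" | sed '/exampel/d')

# With -n there is no default printing, so the output is empty.
nothing=$(printf '%s\n' "$sample" | sed -n '/exampel/d')

echo "$kept"
echo "with -n: [$nothing]"
```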

Range of lines

Activity:
o Remove lines 2 to 4 from the output.
sed '2,4d' file.txt
o Print the lines from the 3rd line until the end of the file.
sed -n '3,$p' file.txt
o Print from a line matching one pattern through the next line matching a second pattern.
sed -n '/I have/,/this/p' file.txt
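The address forms used in this activity can be compared on a small five-line input (the words one to five are just illustrative data). Note that a range combined with d removes those lines, while -n combined with p prints only those lines:

```shell
numbers=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# Delete lines 2 through 4: leaves "one" and "five".
deleted_middle=$(printf '%s\n' "$numbers" | sed '2,4d')

# Print from line 3 to the last line ($): "three", "four", "five".
from_third=$(printf '%s\n' "$numbers" | sed -n '3,$p')

# Delete from line 3 to the last line: leaves "one" and "two".
up_to_second=$(printf '%s\n' "$numbers" | sed '3,$d')

# Print from the first line matching /two/ through the next line matching /four/.
between=$(printf '%s\n' "$numbers" | sed -n '/two/,/four/p')

printf '%s\n---\n' "$deleted_middle" "$from_third" "$up_to_second" "$between"
```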

Find and replace

Activity:
o Find the pattern exampel and replace it with example. Only the first occurrence on each line is replaced.
sed 's/exampel/example/' file.txt
o Replace all occurrences on every line.
sed 's/exampel/example/g' file.txt
o Insert a string at the beginning of every line.
sed 's/^/> /' file.txt
o Append a string to the end of every line.
sed 's/$/EOL/' file.txt
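The four substitutions can be checked on a single made-up line that contains the misspelling twice, which makes the difference between the plain s command and the g flag visible:

```shell
line='exampel and exampel'

first_only=$(printf '%s\n' "$line" | sed 's/exampel/example/')   # first match only
all_matches=$(printf '%s\n' "$line" | sed 's/exampel/example/g') # every match
prefixed=$(printf '%s\n' "$line" | sed 's/^/> /')                # ^ anchors at line start
suffixed=$(printf '%s\n' "$line" | sed 's/$/EOL/')               # $ anchors at line end

printf '%s\n' "$first_only" "$all_matches" "$prefixed" "$suffixed"
```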

3.2. Non-interactive editing

Reading sed commands from a file
Multiple sed commands can be put in a file and executed using the -f option. When creating such a file, make sure that:
➢ No trailing white spaces exist at the end of lines.
➢ No quotes are used.
➢ When entering text to add or replace, all except the last line end in a backslash.
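A minimal sketch of such a command file, assuming a hypothetical script name fix.sed and made-up input. The commands are written without shell quotes, and in the text appended by a\ every line except the last ends in a backslash:

```shell
workdir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from touching $ and \ inside the script.
cat > "$workdir/fix.sed" <<'EOF'
s/exampel/example/g
$a\
END OF FILE\
checked with sed
EOF

printf 'an exampel\nanother exampel\n' > "$workdir/input.txt"

# Run every command in fix.sed against the input.
result=$(sed -f "$workdir/fix.sed" "$workdir/input.txt")
echo "$result"
```

The first command corrects the spelling on every line; the second appends two lines of text after the last input line.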

Activity:
o Redirect the output of the sed command to a different file.
sed 's/exampel/example/' file.txt > corrected.txt
rm file.txt

3.3. AWK programming

What is gawk?

Gawk is the GNU version of the commonly available UNIX awk program, another popular stream editor. Since the awk program is often just a link to gawk, we will refer to it as awk.

The basic function of awk is to search files for lines or other text units containing one or more patterns. When a line matches one of the patterns, special actions are performed on that line.

Programs in awk are different from programs in most other languages, because awk programs are “data-driven”: you describe the data you want to work with and then what to do when you find it. Most other languages are “procedural”: you have to describe, in great detail, every step the program is to take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, awk programs are often refreshingly easy to read and write.

Gawk commands

When you run awk, you specify an awk program that tells awk what to do. The program consists of a series of rules. (It may also contain function definitions, loops, conditions and other programming constructs, advanced features that we will ignore for now.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern.

There are several ways to run awk. If the program is short, it is easiest to run it on the command line:
awk PROGRAM inputfile(s)

If multiple changes have to be made, possibly regularly and on multiple files, it is easier to put the awk commands in a script. Such a script is run like this:

awk -f PROGRAM-FILE inputfile(s)
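Both invocation styles can be tried side by side. The sketch below assumes a made-up data file stock.txt and a hypothetical program file names.awk; the same one-rule program is given once on the command line and once through -f:

```shell
workdir=$(mktemp -d)
printf 'apple 3\nbanana 5\ncherry 7\n' > "$workdir/stock.txt"

# Program on the command line, protected from the shell by single quotes.
inline=$(awk '/an/ { print $1 }' "$workdir/stock.txt")

# The same program stored in a file and loaded with -f (no quotes in the file).
printf '%s\n' '/an/ { print $1 }' > "$workdir/names.awk"
from_file=$(awk -f "$workdir/names.awk" "$workdir/stock.txt")

echo "inline: $inline, from file: $from_file"
```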

3.4. AWK print program

Printing selected fields

The print command in awk outputs selected data from the input file.

When awk reads a line of a file, it divides the line in fields based on the specified input field separator, FS, which is an awk variable. This variable is predefined to be one or more spaces or tabs.

The variables $1, $2, $3, …, $N hold the values of the first, second, third, and so on up to the last field of an input line. The variable $0 (zero) holds the value of the entire line. The output of the df command, for example, has six columns, so $1 through $6 are available:

Activity:
o Understand the column positional numbers
df -h
o Use awk programming to print specified columns.
ls -l | awk '{ print $5 $9 }'
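Because the output of ls and df differs from system to system, field splitting is easier to study on a fixed line. The sketch below uses a made-up line whose fields are separated by several spaces and a tab; NF is the built-in variable holding the number of fields in the current line:

```shell
# Three fields, separated by three spaces and by a tab.
line=$(printf 'alpha   beta\tgamma')

first=$(printf '%s\n' "$line" | awk '{ print $1 }')   # first field
third=$(printf '%s\n' "$line" | awk '{ print $3 }')   # third field
count=$(printf '%s\n' "$line" | awk '{ print NF }')   # number of fields
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')   # the unmodified line

echo "first=$first third=$third count=$count"
```

Runs of spaces and tabs count as a single separator, so the line has exactly three fields.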

Formatting fields
Without formatting, using only the output separator, the output looks rather poor. Inserting a couple of tabs and a string to indicate what output this is will make it look a lot better:

Activity:
o Print the fields with formatted output.
ls -ldh * | grep -v total | awk '{ print "Size is " $5 " bytes for " $9 }'
o Format the output of the df command.
df -h | sort -rnk 5 | head -3 | awk '{ print "Partition " $6 "\t: " $5 " full!" }'

Formatting characters for gawk
Sequence Meaning
\a Bell character
\n Newline character
\t Tab

The print command and regular expressions
A regular expression can be used as a pattern by enclosing it in slashes. The regular expression is then tested against the entire text of each record. The syntax is as follows:

awk ‘EXPRESSION { PROGRAM }’ file(s)

Activity:
o The following example displays only local disk device information; networked file systems are not shown:
df -h | awk '/dev\/hd/ { print $6 "\t: " $5 }'
o In another example, we search the /etc directory for files ending in “.conf” and starting with either “a” or “x”, using extended regular expressions:
ls -l /etc | awk '/\<(a|x).*\.conf$/ { print $9 }'

Special patterns
In order to precede output with comments, use the BEGIN statement.
ls -l /etc | awk 'BEGIN { print "Files found:\n" } /\<(a|x).*\.conf$/ { print $9 }'
The END statement can be added for inserting text after the entire input is processed.
ls -l /etc | awk '/\<(a|x).*\.conf$/ { print $9 } END { print "Can I do anything else for you?" }'
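The interplay of BEGIN, a regular-expression rule and END can be sketched on a fixed list of made-up file names instead of a real directory listing. This variant anchors the pattern with ^ rather than \<, on the assumption that each name stands alone on its line; \< is a GNU extension that not every awk implementation accepts.

```shell
names=$(printf 'apt.conf\nhosts\nxattr.conf\nresolv.conf\na.txt\n')

# BEGIN runs before any input, the pattern rule runs per matching line,
# and END runs after the last line has been read.
listing=$(printf '%s\n' "$names" | awk '
BEGIN { print "Files found:" }
/^(a|x).*\.conf$/ { print $1 }
END { print "End of list" }')

echo "$listing"
```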

Gawk scripts
As commands tend to get a little longer, you might want to put them in a script, so they are reusable. An awk script contains awk statements defining patterns and actions.

As an illustration, we will build a report that displays our most loaded partitions.

Activity:
o Write AWK script.
cat diskrep.awk
BEGIN { print "*** WARNING WARNING WARNING ***" }
/ \<[89][0-9]% / { print "Partition " $6 "\t: " $5 " full!" }
END { print "*** Give money for new disks URGENTLY! ***" }
o Use this script now for df command.
df -h | awk -f diskrep.awk
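Since real df output varies, the report logic can be tried on a fixed, made-up listing. The sketch below is a variation on the script, not the script itself: it tests the fifth field directly with the ~ match operator instead of matching the whole line, which avoids depending on the spacing around the percentage. The file names and sample figures are assumptions for illustration.

```shell
workdir=$(mktemp -d)

# Made-up df-style listing: two partitions are 80% full or more.
cat > "$workdir/df.txt" <<'EOF'
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 45G 5G 90% /
/dev/sda2 100G 20G 80G 20% /home
/dev/sdb1 200G 170G 30G 85% /data
EOF

# Hypothetical script file; $5 ~ /regex/ matches the regex against field 5 only.
cat > "$workdir/report.awk" <<'EOF'
BEGIN { print "*** WARNING ***" }
$5 ~ /^[89][0-9]%$/ { print "Partition " $6 "\t: " $5 " full!" }
END { print "*** end of report ***" }
EOF

report=$(awk -f "$workdir/report.awk" "$workdir/df.txt")
echo "$report"
```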

3.5. AWK variables

As awk is processing the input file, it uses several variables. Some are editable, some are read-only.

The input field separator

The field separator, which is either a single character or a regular expression, controls the way awk splits up an input record into fields. The input record is scanned for character sequences that match the separator definition; the fields themselves are the text between the matches.

The field separator is represented by the built-in variable FS. Note that this is something different from the IFS variable used by POSIX-compliant shells.

The value of the field separator variable can be changed in the awk program with the assignment operator =. Often the right time to do this is at the beginning of execution before any input has been processed, so that the very first record is read with the proper separator. To do this, use the special BEGIN pattern.

Activity:
o In the example below, we build a command that displays all the users on your system with a description:
awk 'BEGIN { FS=":" } { print $1 "\t" $5 }' /etc/passwd
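The same split can be tried without reading the real /etc/passwd. The sketch below uses two made-up account lines in passwd format and also shows the -F command-line option, which sets FS before the program starts and is equivalent to the BEGIN assignment here:

```shell
accounts=$(printf 'root:x:0:0:Super User:/root:/bin/sh\nalice:x:1000:1000:Alice Example:/home/alice:/bin/bash\n')

# Set the separator inside the program, before the first record is read.
with_begin=$(printf '%s\n' "$accounts" | awk 'BEGIN { FS=":" } { print $1 "\t" $5 }')

# Set the separator from the command line instead.
with_flag=$(printf '%s\n' "$accounts" | awk -F: '{ print $1 "\t" $5 }')

echo "$with_begin"
```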

The default input field separator is one or more spaces or tabs.

The output field separator

Fields are normally separated by spaces in the output. This becomes apparent when you use the correct syntax for the print command, where arguments are separated by commas:

Activity:
o Let us create a file with content as below.
cat test
record1 data1
record2 data2
o Print the fields without a comma between the arguments; the values run together.
awk '{ print $1 $2 }' test
o Print the fields with a comma between the arguments; they are separated by the output field separator.
awk '{ print $1, $2 }' test

The output record separator
The output from an entire print statement is called an output record. Each print command results in one output record, and then outputs a string called the output record separator, ORS. The default value for this variable is “\n”, a newline character. Thus, each print statement generates a separate line.

Activity:
o To change the way output fields and records are separated, assign new values to OFS and ORS:
awk 'BEGIN { OFS=";" ; ORS="\n-->\n" } { print $1,$2 }' test
o If the value of ORS does not contain a newline, the program’s output is run together on a single line.
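The effect of the comma, of OFS and of an ORS without a newline can be compared on the two records of the test file, reproduced inline here so the sketch is self-contained:

```shell
records=$(printf 'record1 data1\nrecord2 data2\n')

# No comma: the two fields are concatenated.
glued=$(printf '%s\n' "$records" | awk '{ print $1 $2 }')

# Comma: the fields are joined with OFS, a single space by default.
spaced=$(printf '%s\n' "$records" | awk '{ print $1, $2 }')

# Custom OFS and an ORS without a newline: all output lands on one line.
custom=$(printf '%s\n' "$records" | awk 'BEGIN { OFS=";"; ORS=" | " } { print $1, $2 }')

printf '%s\n' "$glued" "$spaced" "$custom"
```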

The number of records
The built-in NR holds the number of records that are processed. It is incremented after reading a new input line. You can use it at the end to count the total number of records, or in each output record.

Activity:
o Write an awk script as below.
cat processed.awk
BEGIN { OFS="-" ; ORS="\n--> done\n" }
{ print "Record number " NR ":\t" $1,$2 }
END { print "Number of records processed: " NR }

o Let us run this on test file.
awk -f processed.awk test
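A stripped-down sketch of NR on three made-up input lines: inside a rule it holds the number of the current record, and in the END block it holds the total number of records read.

```shell
numbered=$(printf 'a\nb\nc\n' | awk '
{ print NR ": " $0 }          # NR is the current record number
END { print "total: " NR }    # after the last record, NR is the total count
')

echo "$numbered"
```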

User defined variables
Apart from the built-in variables, you can define your own. When awk encounters a reference to a variable which does not exist (which is not predefined), the variable is created and initialized to a null string. For all subsequent references, the value of the variable is whatever value was assigned last. Variables can be a string or a numeric value. Content of input fields can also be assigned to variables.

Values can be assigned directly using the = operator, or you can use the current value of the variable in combination with other operators.

Activity:
o Create a file with content in it as.
cat revenues
20021009 20021013 consultancy BigComp 2500
20021015 20021020 training EduComp 2000
20021112 20021123 appdev SmartComp 10000
20021204 20021215 training EduComp 5000
o Write an awk script now.
cat total.awk
{ total=total + $5 }
{ print "Send bill for " $5 " dollar to " $4 }
END { print "---------------------------------\nTotal revenue: " total }
o Run the awk script now on this file.
awk -f total.awk revenues
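The accumulation can be checked inline with the four revenue records from above. The variable total is never initialized: awk treats it as 0 the first time it is used in arithmetic. The average line is an addition for illustration and simply divides by NR.

```shell
revenues=$(printf '%s\n' \
  '20021009 20021013 consultancy BigComp 2500' \
  '20021015 20021020 training EduComp 2000' \
  '20021112 20021123 appdev SmartComp 10000' \
  '20021204 20021215 training EduComp 5000')

summary=$(printf '%s\n' "$revenues" | awk '
{ total = total + $5 }                 # add the fifth field of every record
END {
  print "Total revenue: " total
  print "Average per bill: " total / NR
}')

echo "$summary"
```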
