When awk reads in a line, the first field can be referred to as `$1', the second as `$2', etc. The whole line is `$0'.
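As a small illustration of field references, the sketch below feeds awk one made-up line (the words are an assumption, not from any CUED file) and prints the second field, the first field, and then the whole line.

```shell
# Made-up input, for illustration only.
fields=$(printf 'alpha beta gamma\n' | awk '{ print $2; print $1; print $0 }')
echo "$fields"
```

By default fields are separated by runs of spaces or tabs, so `$2' here is `beta'.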
A short awk program can be written on the command line, e.g.

    cat file | awk '{print NF,$0}'

which prefixes each line with the Number of Fields (i.e., words) on the line. The quotes are necessary because otherwise the shell would interpret special characters like `$' before awk had a chance to read them.
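To see the effect of the quoting, here is a sketch using made-up input. The first command uses single quotes as above; the second shows that with double quotes the `$' has to be escaped by hand so that it reaches awk untouched.

```shell
# Made-up input, for illustration only.
counted=$(printf 'one two three\nlast line\n' | awk '{ print NF, $0 }')
echo "$counted"

# The same program inside double quotes needs the $ escaped,
# otherwise the shell would substitute its own $0 before awk runs.
escaped=$(printf 'one two three\nlast line\n' | awk "{ print NF, \$0 }")
echo "$escaped"
```

Both commands should give the same two lines, each starting with its field count. Single quotes are usually the less error-prone choice.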
Longer programs are best put into files. Two examples in /export/Examples/Korn_shell (wordcount and awker) should give CUED users a start (the awk manual page has more examples).
Once you have copied over wordcount and text, do

    wordcount text

and you will get a list of the words in text and their frequencies. Here is wordcount:

    awk '
    {for (i = 1; i<=NF ; i++)
        num[$i]++
    }
    END {for (word in num)
        print word, num[word]
    }
    ' $*

The syntax is similar to that of C. awk lines take the form

    <pattern> { <action> }

Each input line is matched against each awk line in turn. If, as here in wordcount, there is no target pattern on the awk line then all input lines will be matched. If there is a match but no action, the default action is to print the whole line.
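The two defaults can be tried out directly. In this sketch (the fruit data is made up) the first program has a pattern but no action, so matching lines are printed whole; the second has an action but no pattern, so it runs on every line.

```shell
# Pattern only: lines whose second field exceeds 4 are printed in full.
matched=$(printf 'apple 3\nbanana 7\ncherry 5\n' | awk '$2 > 4')
echo "$matched"

# Action only: the first field of every line is printed.
firsts=$(printf 'apple 3\nbanana 7\ncherry 5\n' | awk '{ print $1 }')
echo "$firsts"
```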
Thus, the for loop is done for every line in the input. Each word in
the line (NF is the number of words on the line) is used as an index
into an array whose element is incremented
with each instance of that word. The ability to have strings as array
`subscripts' is very useful.
END
is a special pattern, matched by the end of the input file. Here its
matching action is to run a different
sort of for loop that prints out the words and their frequencies. The
variable word takes successively the value of the string
`subscripts' of the array num.
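The following sketch runs the same counting logic on two made-up lines instead of a file. The order in which `for (word in num)' visits the subscripts is not defined, so the output is piped through sort here to make it predictable; that sort step is an addition, not part of wordcount.

```shell
# Made-up input; the awk program mirrors wordcount, with sort added.
counts=$(printf 'the cat saw the dog\nthe dog ran\n' |
  awk '{ for (i = 1; i <= NF; i++) num[$i]++ }
       END { for (word in num) print word, num[word] }' |
  LC_ALL=C sort)
echo "$counts"
```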
Example 2 introduces some more concepts. Copy /export/Examples/Korn_shell/data (shown below)

    NAME   AMOUNT  STATUS
    Tom      1.35  paid
    Dick     3.87  Unpaid
    Harry   56.00  Unpaid
    Tom     36.03  unpaid
    Harry   22.60  unpaid
    Tom      8.15  paid
    Tom     11.44  unpaid

and /export/Examples/Korn_shell/awker if you haven't done so already. Here is the text of awker:

    awk '
    $3 ~ /^[uU]npaid$/ {total[$1] += $2; owing=1}
    END {
        if (owing)
            for (i in total)
                print i, "owes", total[i] > "invoice"
        else
            print "No one owes anything" > "invoice"
    }
    ' $*

Typing

    awker data

will add up how much each person still owes and put the answer in a file called invoice. In awker the 3rd field is matched against a regular expression (to find out more about these, type man 5 regexp). Note that both `Unpaid' and `unpaid' will match, but nothing else. If there is a match then the action is performed. Note that awk copes intelligently with strings that represent numbers; explicit conversion is rarely necessary. The `total' array has indices which are the people's names. If anyone owes, then a variable `owing' is set to 1. At the end of the input, the amount each person owes is printed out.
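For readers without access to the example files, here is a sketch of the same calculation that takes the data inline and prints to the screen rather than to invoice. The sort at the end is an addition to give a fixed order; the loop variable is renamed for readability.

```shell
# Same data as above, supplied inline; output goes to stdout, not "invoice".
owed=$(printf '%s\n' \
    'NAME AMOUNT STATUS' \
    'Tom 1.35 paid' \
    'Dick 3.87 Unpaid' \
    'Harry 56.00 Unpaid' \
    'Tom 36.03 unpaid' \
    'Harry 22.60 unpaid' \
    'Tom 8.15 paid' \
    'Tom 11.44 unpaid' |
  awk '$3 ~ /^[uU]npaid$/ { total[$1] += $2 }
       END { for (name in total) print name, "owes", total[name] }' |
  LC_ALL=C sort)
echo "$owed"
```

The header line is skipped without any special handling because its third field, STATUS, does not match the regular expression. The amounts are read as strings but added as numbers, which is the automatic conversion mentioned above.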
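Other awk facilities are:-

    splitting a string into an array: n = split(field,new_array,separator)
    string functions: substr(string,first_pos,max_chars), index(string,substring)
    comparison and matching operators: ==, !=, >, >=, <, <=, ~ (meaning ``contains''), !~ (meaning ``does not contain'')
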
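A short sketch of these facilities in use, on a made-up input line (the date and file name are assumptions chosen only for illustration):

```shell
# Made-up input: a date and a file name separated by a space.
facilities=$(echo '2024-03-15 report.txt' | awk '{
    n = split($1, parts, "-")        # n is the number of pieces
    print n, parts[1], parts[3]
    print substr($2, 1, 6)           # 6 characters starting at position 1
    print index($2, ".")             # position of the first "."
    if ($2 ~ /txt/)      print "contains txt"
    if ($2 !~ /^[0-9]/)  print "does not start with a digit"
}')
echo "$facilities"
```

Note that string positions and array subscripts produced by split start at 1, not 0 as in C.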
As you see, awk is almost a language in itself, and people used to C syntax can soon create useful scripts with it.