Text Processing with Awk
Objective
Use awk to process structured text by fields, apply patterns, perform calculations, and generate reports from command-line data.
Tools & Technologies
awk, gawk, field processing, patterns, actions
Key Commands
awk '{print $1}' file
awk -F: '{print $1,$3}' /etc/passwd
awk '/ERROR/ {count++} END {print count}' log
awk '{sum+=$2} END {print sum}'
Architecture Overview
flowchart TD
BEGIN[BEGIN block\nInitialize vars] --> READ[Read record]
READ --> MATCH{Pattern\nmatches?}
MATCH -->|yes| ACTION[Execute action]
MATCH -->|no| NEXT
ACTION --> NEXT[Next record]
NEXT -->|more records| READ
NEXT -->|EOF| END[END block\nFinal output]
style BEGIN fill:#1a1a2e,stroke:#ffd700,color:#ffd700
style END fill:#1a1a2e,stroke:#ffd700,color:#ffd700
style ACTION fill:#1a1a2e,stroke:#00d4ff,color:#e0e0e0
Step-by-Step Process
01
Fields and Records
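awk splits each input line (record) into fields on whitespace by default, or on a custom delimiter set with -F. $0 is the whole line, $1 is the first field, NF is the number of fields, and NR is the current record number.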
awk '{print $1}' file # first field
awk '{print $NF}' file # last field
awk '{print NR, $0}' file # line numbers
awk -F: '{print $1}' /etc/passwd # custom delimiter
awk -F, '{print $2}' data.csv # second comma-separated field (quoted fields are not handled)
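To make the field variables concrete, here is a minimal sketch that feeds inline sample data to awk instead of reading a file. The sample records and the shell variable names (`sample`, `fields`, `passwd_line`, `login`) are assumptions made up for illustration.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated)
sample='alice  eng   5200
bob    ops   4100'

# NR = record number, NF = number of fields, $NF = last field
fields=$(printf '%s\n' "$sample" | awk '{print NR, NF, $1, $NF}')
echo "$fields"

# Colon-delimited input in the style of /etc/passwd: login name and UID
passwd_line='root:x:0:0:root:/root:/bin/bash'
login=$(printf '%s\n' "$passwd_line" | awk -F: '{print $1, $3}')
echo "$login"
```

The uneven spacing in the sample is deliberate: with the default separator, runs of spaces count as a single delimiter, so each record still has three fields.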
02
Patterns
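A pattern selects which lines its action applies to. It can be a regular expression, a comparison, or one of the special patterns BEGIN and END. A pattern with no action prints the matching line.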
awk '/ERROR/ {print}' log # lines matching ERROR
awk '$3 > 1000 {print}' data # field comparison
awk 'NR>=5 && NR<=10' file # line range
awk 'BEGIN {print "Header"} {print} END {print "Footer"}' file # header, every line, footer
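The three pattern styles can be compared side by side on one small input. This is a sketch only: the log lines and the variable names (`log`, `errors`, `slow`, `middle`) are invented for the example.

```shell
# Hypothetical log: level, service, response time in ms
log='INFO web 120
ERROR db 1500
INFO web 90
ERROR web 300
WARN db 2000'

# Regex pattern: only lines containing ERROR
errors=$(printf '%s\n' "$log" | awk '/ERROR/ {print $2}')

# Comparison pattern: third field greater than 1000
slow=$(printf '%s\n' "$log" | awk '$3 > 1000 {print $1, $3}')

# Record-number range; no action means the matching line is printed
middle=$(printf '%s\n' "$log" | awk 'NR>=2 && NR<=3')

echo "$errors"; echo "$slow"; echo "$middle"
```

Note that the regex and the comparison select different lines: `/ERROR/` looks at the text of the whole record, while `$3 > 1000` compares a single field numerically.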
03
Calculations & Aggregation
awk variables persist across lines, and an unset variable starts out as zero (or the empty string), which makes running totals and counts straightforward.
# Sum a column
awk '{sum += $2} END {print "Total:", sum}' data
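# Count pattern matches (count+0 prints 0 rather than a blank line when nothing matches)
awk '/FAIL/ {count++} END {print count+0}' log
# Average (guarded against division by zero on empty input)
awk '{sum+=$1; n++} END {if (n) print sum/n}' numbers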
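As a worked example of these aggregations, the sketch below computes a count, total, and average in one pass over inline data. The data and the variable names (`data`, `report`, `misses`) are assumptions for illustration.

```shell
# Hypothetical data: item and amount
data='apples 3
pears 5
plums 4'

report=$(printf '%s\n' "$data" | awk '
    { sum += $2; n++ }                 # sum and n start out as 0
    END {
        if (n > 0)                     # avoid dividing by zero on empty input
            printf "count=%d total=%d avg=%.2f\n", n, sum, sum / n
    }')
echo "$report"

# count is never set when nothing matches; count+0 forces a numeric 0
misses=$(printf '%s\n' "$data" | awk '/FAIL/ {count++} END {print count+0}')
echo "$misses"
```

Here 3 + 5 + 4 = 12 over three records gives an average of 4.00, and the second command prints 0 because no line contains FAIL.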
04
Practical Reports
Combine awk features to generate useful reports from system data.
# Top 10 most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
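# Disk usage per home directory (du -sk reports sizes in kilobytes)
du -sk /home/* | awk '{printf "%s %.1f MB\n", $2, $1/1024}'
# Total resident memory of all processes (RSS is column 6 of ps aux, in KB)
ps aux | awk 'NR>1 {sum+=$6} END {printf "Total RSS: %.1f MB\n", sum/1024}'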
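The frequency count can also be done inside awk with an associative array, leaving only the ordering to sort. The following is a sketch, assuming an access log whose first field is the client IP. The log lines and the names `access`, `hits`, and `top` are hypothetical.

```shell
# Hypothetical access log lines; only the first field (client IP) is used
access='10.0.0.1 GET /index.html
10.0.0.2 GET /about.html
10.0.0.1 GET /contact.html
10.0.0.3 GET /index.html
10.0.0.1 GET /index.html
10.0.0.2 GET /index.html'

# hits[ip] counts requests per IP; sort orders the summary by count, descending
top=$(printf '%s\n' "$access" |
    awk '{hits[$1]++} END {for (ip in hits) print hits[ip], ip}' |
    sort -rn | head -n 2)
echo "$top"
```

Counting in awk means sort only has to order one line per distinct IP rather than every log line, which can matter on large logs.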
Challenges & Solutions
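- The field separator must match the data: the default splits on runs of spaces and tabs, so delimited files such as /etc/passwd or CSV need an explicit -F
- BEGIN runs once before any input is read and END runs once after the last record, no matter how many records there are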
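Both points can be checked with a few one-liners. This is a minimal sketch. The variable names are made up, and `-F'[ ]'` is used here as one way to request a literal single-space separator.

```shell
# Default splitting collapses runs of spaces and tabs into one separator
default_nf=$(printf 'a   b\tc\n' | awk '{print NF}')

# A literal single-space separator (written as a regex) yields empty fields
strict_nf=$(printf 'a   b\n' | awk -F'[ ]' '{print NF}')

# BEGIN and END each run once even when there are no input records
empty_run=$(awk 'BEGIN {print "start"} {print "record"} END {print "end, NR=" NR}' /dev/null)

echo "$default_nf"; echo "$strict_nf"; echo "$empty_run"
```

With the strict separator, the three spaces between a and b produce two empty fields, so NF is 4 instead of 2. With empty input, the main action never runs and NR is still 0 in END.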
Key Takeaways
- awk is a complete programming language with arrays, functions, and loops
- printf in awk gives precise output formatting
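The takeaways above can be sketched in one short program that uses an associative array, a user-defined function, a for-in loop, and printf together. The data, the function name `mean`, and the variable names are assumptions for illustration.

```shell
# Hypothetical data: department and salary
staff='eng 5200
ops 4100
eng 4800'

summary=$(printf '%s\n' "$staff" | awk '
    # User-defined function: mean that tolerates an empty group
    function mean(s, n) { return n ? s / n : 0 }

    { total[$1] += $2; count[$1]++ }   # associative arrays keyed by department

    END {
        for (dept in total)            # loop over array keys (order is unspecified)
            printf "%-5s %8.2f\n", dept, mean(total[dept], count[dept])
    }' | sort)
echo "$summary"
```

The output is piped through sort because awk does not guarantee the order of a for-in loop. The format `%-5s %8.2f` left-aligns the name and right-aligns the average to two decimal places.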