Objective

Use awk to process structured text by fields, apply patterns, perform calculations, and generate reports from command-line data.

Tools & Technologies

  • awk
  • gawk
  • field processing
  • patterns
  • actions

Key Commands

awk '{print $1}' file
awk -F: '{print $1,$3}' /etc/passwd
awk '/ERROR/ {count++} END {print count+0}' log
awk '{sum+=$2} END {print sum+0}' data

Architecture Overview

flowchart TD
    BEGIN[BEGIN block\nInitialize vars] --> READ[Read record]
    READ --> MATCH{Pattern\nmatches?}
    MATCH -->|yes| ACTION[Execute action]
    MATCH -->|no| NEXT
    ACTION --> NEXT[Next record]
    NEXT -->|more records| READ
    NEXT -->|EOF| END[END block\nFinal output]
    style BEGIN fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style END fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style ACTION fill:#1a1a2e,stroke:#00d4ff,color:#e0e0e0
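
The flow in the diagram can be traced with a small run. This is a minimal sketch: the three input lines and the variable name flow_demo are made up for illustration.

```shell
# Hypothetical sample input: three log-like lines fed on stdin.
flow_demo=$(printf 'INFO start\nERROR disk full\nINFO done\n' | awk '
  BEGIN   { print "begin" }               # runs once, before any record is read
  /ERROR/ { hits++; print "match:", $0 }  # runs only for records matching the pattern
  END     { print "end", NR, hits+0 }     # runs once, after the last record
')
printf '%s\n' "$flow_demo"
# begin
# match: ERROR disk full
# end 3 1
```

In END, NR still holds the number of records read, so it doubles as a line count.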

Step-by-Step Process

01
Fields and Records

awk splits each input line into fields on runs of whitespace, or on a custom delimiter set with -F. $0 is the whole line, $1 is the first field, NF is the number of fields, and NR is the current line number.

awk '{print $1}' file          # first field
awk '{print $NF}' file         # last field
awk '{print NR, $0}' file       # line numbers
awk -F: '{print $1}' /etc/passwd  # custom delimiter
awk -F, '{print $2}' data.csv     # second column; does not handle quoted commas
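
To see how NF and $NF change from line to line, the sketch below feeds awk two lines with different field counts. The sample data and variable names are assumptions chosen for illustration.

```shell
# Two hypothetical records: the first has three fields, the second has two.
fields_demo=$(printf 'alice 30 admin\nbob 25\n' |
  awk '{ print NR ": NF=" NF, "first=" $1, "last=" $NF }')
printf '%s\n' "$fields_demo"
# 1: NF=3 first=alice last=admin
# 2: NF=2 first=bob last=25

# Same idea with a colon delimiter, on a passwd-style line.
colon_demo=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, $3 }')
printf '%s\n' "$colon_demo"
# root 0
```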
02
Patterns

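Patterns select the lines an action applies to. A pattern can be a regex, a comparison, or one of the special BEGIN and END patterns. An action with no pattern runs on every line, and a pattern with no action prints each matching line.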

awk '/ERROR/ {print}' log        # lines matching ERROR
awk '$3 > 1000 {print}' data     # field comparison
awk 'NR>=5 && NR<=10' file       # line range
awk 'BEGIN {print "Header"} {print} END {print "Footer"}' file   # wrap the output
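
Patterns can also be combined with && and applied to a single field with the ~ operator. The following is a sketch over made-up "service status bytes" lines, not a real log format.

```shell
# Hypothetical input columns: service name, HTTP status, response size.
patterns_demo=$(printf 'web 200 512\nweb 500 2048\napi 500 100\napi 200 4096\n' | awk '
  $2 == 500 && $3 > 1000 { print "big error:", $1 }   # two comparisons joined with &&
  $1 ~ /^api/            { print "api line:", NR }    # regex tested against field 1 only
')
printf '%s\n' "$patterns_demo"
# big error: web
# api line: 3
# api line: 4
```

Each rule is tested against every line independently, so a single line can trigger more than one action.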
03
Calculations & Aggregation

awk variables persist across lines and start out empty (zero in a numeric context), which makes running totals and counts straightforward.

# Sum a column
awk '{sum += $2} END {print "Total:", sum}' data

# Count pattern matches
awk '/FAIL/ {count++} END {print count+0}' log   # +0 prints 0 instead of a blank line when nothing matches

# Average (guarded against division by zero on empty input)
awk '{sum+=$1; n++} END {if (n) print sum/n}' numbers
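
The same accumulate-then-report idea extends to per-key totals by using an associative array. A minimal sketch, assuming hypothetical "name amount" input; the output is piped through sort because awk does not guarantee any order for `for (k in array)`.

```shell
# Hypothetical input: a name and an amount on each line.
totals_demo=$(printf 'alice 10\nbob 5\nalice 7\nbob 1\n' | awk '
  { total[$1] += $2; count[$1]++ }     # one running sum and one count per name
  END {
    for (k in total)                   # iteration order is unspecified
      printf "%s %d %.1f\n", k, total[k], total[k] / count[k]
  }
' | sort)
printf '%s\n' "$totals_demo"
# alice 17 8.5
# bob 6 3.0
```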
04
Practical Reports

Combine awk features to generate useful reports from system data.

# Top 10 most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head

# Disk usage by user (-k forces 1 KiB units; plain du -s uses 512-byte blocks on some systems)
du -sk /home/* | awk '{print $2, $1/1024 "MB"}'

# Process memory usage
ps aux | awk 'NR>1 {sum+=$6} END {print "Total RSS:", sum/1024 "MB"}'
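
For a report with aligned columns, printf can replace print. The sketch below runs on canned, tab-separated lines in the shape that du -sk produces, so the result does not depend on the local system; the paths and sizes are invented.

```shell
# Hypothetical du -sk style input: size in KiB, a tab, then the path.
report_demo=$(printf '2048\t/home/alice\n512\t/home/bob\n' | awk -F'\t' '
  BEGIN { printf "%-12s %8s\n", "PATH", "MB" }                      # header row
        { printf "%-12s %8.1f\n", $2, $1 / 1024; total += $1 }      # one row per directory
  END   { printf "%-12s %8.1f\n", "TOTAL", total / 1024 }           # summary row
')
printf '%s\n' "$report_demo"
```

Splitting on the tab rather than on whitespace keeps paths that contain spaces in a single field.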

Challenges & Solutions

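  • By default awk treats any run of spaces and tabs as one separator and ignores leading and trailing whitespace; an explicit single-character separator such as -F: or -F, does not collapse repeats, so consecutive delimiters produce empty fields
  • BEGIN runs once before any input is read and END runs once after the last record, even when the input is empty, so a counter printed in END may never have been set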
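
Both points can be checked directly. This is a small sketch; the sample line and the variable names are assumptions.

```shell
line='a  b c'   # two spaces between a and b

# Default separator: the run of two spaces counts as a single separator.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')           # 3

# Bracketed single space: every space separates, leaving one empty field.
single_nf=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')    # 4

# Empty input: BEGIN and END still run, and n+0 prints 0 rather than a blank.
empty_end=$(awk 'BEGIN { print "start" } /x/ { n++ } END { print n+0 }' </dev/null)

printf '%s %s\n%s\n' "$default_nf" "$single_nf" "$empty_end"
```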

Key Takeaways

  • awk is a complete programming language with arrays, functions, and loops
  • printf in awk gives precise output formatting
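
As a compact illustration of these points, the sketch below combines a user-defined function, an array, a for loop, and printf. The function name hyp and the input values are made up.

```shell
# Hypothetical input: two side lengths per line.
lang_demo=$(printf '3 4\n6 8\n' | awk '
  function hyp(a, b) { return sqrt(a * a + b * b) }   # user-defined function
  { h[NR] = hyp($1, $2) }                             # array indexed by line number
  END {
    for (i = 1; i <= NR; i++)                         # C-style loop over stored results
      printf "%d: %.2f\n", i, h[i]                    # two decimal places
  }
')
printf '%s\n' "$lang_demo"
# 1: 5.00
# 2: 10.00
```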