File Filters & Text Processing
Objective
Process and transform text data using the Unix filter toolkit — extract fields, sort, count, and search through files and streams.
Tools & Technologies
head, tail, cut, sort, tr, wc, grep, uniq
Key Commands
head -20 file
tail -f /var/log/syslog
cut -d: -f1 /etc/passwd
sort -u list.txt
wc -l file
Architecture Overview
flowchart LR
FILE[Input File] --> HEAD[head -n\nFirst N lines]
FILE --> TAIL[tail -n\nLast N lines]
FILE --> CUT[cut -d -f\nSelect fields]
FILE --> SORT[sort\nOrder lines]
FILE --> GREP[grep pattern\nFilter lines]
GREP --> UNIQ[uniq -c\nCount duplicates]
SORT --> UNIQ
style FILE fill:#1a1a2e,stroke:#00d4ff,color:#e0e0e0
Step-by-Step Process
01
head & tail
Extract the first or last N lines of a file. tail -f keeps the file open and prints new lines as they are appended, which makes it the standard tool for watching logs.
head -5 /etc/passwd # first 5 lines
tail -20 /var/log/auth.log # last 20 lines
tail -f /var/log/syslog # live follow
tail -f /var/log/nginx/access.log | grep '404'
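Chaining head and tail selects a range from the middle of a file. The sketch below is a minimal illustration that builds its own throwaway sample file, so the variable names and file contents are assumptions, not part of the original listing.

```shell
# Build a small sample file with ten numbered lines (illustrative data).
sample=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > "$sample"

# Lines 4 through 6: keep the first 6 lines, then the last 3 of those.
middle=$(head -n 6 "$sample" | tail -n 3)
printf '%s\n' "$middle"

# Everything from line 9 onward, using the +N form of tail -n.
rest=$(tail -n +9 "$sample")
printf '%s\n' "$rest"

rm -f "$sample"
```

The -n N spelling used here is the POSIX form of the shorter -N shown above; both select the same lines.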
02
cut — Extract Fields
Select parts of each line, either fields separated by a delimiter (-d with -f) or fixed character positions (-c).
cut -d: -f1 /etc/passwd # usernames
cut -d: -f1,3 /etc/passwd # user + UID
cut -c1-10 file.txt # first 10 chars
ps aux | cut -c1-80 # trim wide output
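To see what cut does with fields, it can help to run it on inline data. This is a sketch with invented records laid out like /etc/passwd; the account names are hypothetical. It also shows one common pitfall: cut treats every single delimiter as a field boundary, so runs of spaces need squeezing first.

```shell
# Invented records in /etc/passwd layout (not real accounts).
records='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Fields 1 and 7: login name and shell. The ':' delimiter is kept in the output.
name_shell=$(printf '%s\n' "$records" | cut -d: -f1,7)
printf '%s\n' "$name_shell"

# Runs of spaces would yield empty fields, so squeeze them with tr -s
# before asking cut for the second space-separated field.
spaced='alice    1000   /bin/zsh'
second=$(printf '%s\n' "$spaced" | tr -s ' ' | cut -d' ' -f2)
printf '%s\n' "$second"
```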
03
sort & uniq
Sort lines alphabetically or numerically, then use uniq to collapse or count duplicates. uniq compares only adjacent lines, which is why it is normally fed sorted input.
sort names.txt # alphabetical
sort -n numbers.txt # numerical
sort -rn scores.txt # reverse numeric
sort -u list.txt # unique lines only
sort log.txt | uniq -c | sort -rn # frequency count
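The frequency-count pipeline is easier to follow on a tiny input. The following sketch uses invented data (one HTTP status code per line); the variable names are illustrative only.

```shell
# Invented sample data: one status code per line.
codes='200
404
200
500
200
404'

# sort groups identical lines together, uniq -c prefixes each group with
# its count, and the final sort -rn puts the largest count first.
counts=$(printf '%s\n' "$codes" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# The most frequent value is simply the first line of that result.
top_line=$(printf '%s\n' "$counts" | head -n 1)
printf '%s\n' "$top_line"
```

The amount of leading whitespace that uniq -c puts before each count differs between implementations, so scripts that parse this output should split on whitespace rather than rely on fixed columns.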
04
tr & wc
tr translates, squeezes, or deletes characters; it reads only standard input, hence the < redirections below. wc counts lines, words, and bytes.
tr 'a-z' 'A-Z' < input.txt # to uppercase
tr -d '\r' < dos.txt > unix.txt # remove CR
tr -s ' ' < file # squeeze spaces
wc -l file.txt # line count
wc -w essay.txt # word count
ls | wc -l # count directory entries (hidden files excluded)
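tr and wc combine naturally to split text into words and count them. The sketch below assumes ASCII input and an invented sample sentence; it is one possible approach, not the only way to tokenize text.

```shell
text='The quick fox. The lazy dog! the END'

# tr 'A-Z' 'a-z' folds case so "The" and "the" are treated alike.
# tr -cs 'a-z' '\n' turns every run of non-letters into a single newline,
# leaving one word per line.
words=$(printf '%s\n' "$text" | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n')
printf '%s\n' "$words"

# wc -l counts the lines, which is now the word count. The trailing
# tr -d strips the padding some wc implementations add.
total=$(printf '%s\n' "$words" | wc -l | tr -d ' ')
printf '%s\n' "$total"
```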
Challenges & Solutions
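- tail -f follows the open file, so it goes quiet once a log is rotated (renamed and recreated). Use tail -F to reopen the file by name, or journalctl -f for services that log to the systemd journal
- uniq compares only adjacent lines, so the input must be sorted first (sort file | uniq) for every duplicate to be collapsed or counted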
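The sort-before-uniq point can be checked directly. A minimal sketch with invented four-line input:

```shell
data='b
a
b
a'

# Without sorting, no two equal lines are adjacent, so uniq removes nothing.
unsorted=$(printf '%s\n' "$data" | uniq)

# After sorting, duplicates sit next to each other and collapse.
sorted=$(printf '%s\n' "$data" | sort | uniq)

printf 'uniq alone:\n%s\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```

The first result still has four lines, while the second has only two.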
Key Takeaways
- Combining head/tail/grep/cut/sort/uniq builds powerful text analysis pipelines
- wc -l counts newline characters, so piping any one-item-per-line output into it is the quickest way to count items
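As an illustration of such a pipeline, the sketch below finds the client that produced the most 404 responses. The log format (client, method, path, status, separated by single spaces) and the addresses are invented for the example; real access logs have more fields and would need different field numbers.

```shell
# Invented, simplified access log: client method path status.
log='10.0.0.1 GET /a 200
10.0.0.2 GET /missing 404
10.0.0.1 GET /b 404
10.0.0.2 GET /gone 404
10.0.0.3 GET /c 200'

# grep keeps lines whose last field is 404, cut extracts the client
# address, sort and uniq -c count requests per client, sort -rn ranks
# them, and head keeps the top entry.
report=$(printf '%s\n' "$log" |
  grep ' 404$' |
  cut -d' ' -f1 |
  sort |
  uniq -c |
  sort -rn |
  head -n 1)
printf '%s\n' "$report"
```

Each stage does one small job, so the pipeline can be built and checked incrementally by running it with the later stages removed.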