Objective

Process and transform text data using the Unix filter toolkit — extract fields, sort, count, and search through files and streams.

Tools & Technologies

  • head
  • tail
  • cut
  • sort
  • tr
  • wc
  • grep
  • uniq

Key Commands

head -20 file
tail -f /var/log/syslog
cut -d: -f1 /etc/passwd
sort -u list.txt
wc -l file

Architecture Overview

flowchart LR
    FILE[Input File] --> HEAD[head -n\nFirst N lines]
    FILE --> TAIL[tail -n\nLast N lines]
    FILE --> CUT[cut -d -f\nSelect fields]
    FILE --> SORT[sort\nOrder lines]
    FILE --> GREP[grep pattern\nFilter lines]
    GREP --> UNIQ[uniq -c\nCount duplicates]
    SORT --> UNIQ
    style FILE fill:#1a1a2e,stroke:#00d4ff,color:#e0e0e0

Step-by-Step Process

01
head & tail

Extract the first or last N lines of a file. tail -f follows a file in real time as new lines are appended, which makes it essential for log monitoring.

head -5 /etc/passwd        # first 5 lines
tail -20 /var/log/auth.log  # last 20 lines
tail -f /var/log/syslog     # live follow
tail -f /var/log/nginx/access.log | grep '404'
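Chaining head and tail extracts an arbitrary line range from the middle of a file. A minimal sketch (sample.txt is generated here so the numbers are verifiable; any file works the same way):

```shell
# Generate a 100-line sample file: lines "1" through "100".
seq 1 100 > sample.txt

# head keeps the first 20 lines, tail then keeps the last 5 of those,
# yielding lines 16-20 of the original file.
head -20 sample.txt | tail -5
```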
02
cut — Extract Fields

Cut out selected columns from each line using a delimiter.

cut -d: -f1 /etc/passwd    # usernames
cut -d: -f1,3 /etc/passwd  # user + UID
cut -c1-10 file.txt        # first 10 chars
ps aux | cut -c1-80        # trim wide output
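cut pairs naturally with sort for quick field summaries. A sketch using an inline passwd-style sample (users.txt is fabricated here, since real /etc/passwd contents vary per system):

```shell
# Three passwd-style records: name:pw:UID:GID:gecos:home:shell
printf 'root:x:0:0:root:/root:/bin/bash\n'        >  users.txt
printf 'alice:x:1000:1000::/home/alice:/bin/bash\n' >> users.txt
printf 'bob:x:1001:1001::/home/bob:/bin/zsh\n'      >> users.txt

# Field 7 is the login shell; sort -u deduplicates the list.
cut -d: -f7 users.txt | sort -u
```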
03
sort & uniq

Sort lines alphabetically or numerically, then use uniq to collapse/count duplicates.

sort names.txt             # alphabetical
sort -n numbers.txt        # numerical
sort -rn scores.txt        # reverse numeric
sort -u list.txt           # unique lines only

sort log.txt | uniq -c | sort -rn  # frequency count
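sort can also key on a specific delimited field rather than the whole line, via -t and -k. A sketch with a small fabricated file (uids.txt, invented here for illustration):

```shell
# Records in name:pw:UID form, deliberately out of order.
printf 'bob:x:1001\nroot:x:0\nalice:x:1000\n' > uids.txt

# -t: sets the field delimiter, -k3 selects field 3 as the sort key,
# -n compares it numerically (so 1000 sorts after 0, not after 1001).
sort -t: -k3 -n uids.txt
```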
04
tr & wc

tr translates or deletes characters. wc counts lines, words, and bytes.

tr 'a-z' 'A-Z' < input.txt   # to uppercase
tr -d '\r' < dos.txt > unix.txt # remove CR
tr -s ' ' < file                 # squeeze spaces

wc -l file.txt    # line count
wc -w essay.txt   # word count
ls | wc -l        # count files
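tr is the glue that turns free text into the one-item-per-line shape sort and uniq expect. A word-frequency sketch (words.txt is generated inline for reproducibility):

```shell
printf 'the cat and the dog and the bird\n' > words.txt

# tr -s ' ' '\n' squeezes runs of spaces into single newlines,
# producing one word per line; sort groups duplicates so uniq -c
# can count them; the final sort -rn ranks by frequency.
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -rn
```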

Challenges & Solutions

  • tail -f stops producing output once a log file is rotated, because it follows the (now renamed) file descriptor; use tail -F, which re-opens the file by name, or journalctl -f for systemd services
  • uniq only collapses adjacent duplicate lines, so the input must be piped through sort first for it to work correctly

Key Takeaways

  • Combining head/tail/grep/cut/sort/uniq builds powerful text analysis pipelines
  • wc -l counts lines, so piping any one-item-per-line list into it is the quickest way to count things
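The takeaways above can be demonstrated in one end-to-end pipeline: ranking client IPs by 404 responses in a web access log. The log file and its layout (IP in the first space-separated field, status code surrounded by spaces) are assumptions made for this sketch, so a tiny sample is generated inline:

```shell
# Fabricated access.log in a simplified common-log-style layout.
printf '%s\n' \
  '10.0.0.1 - - [x] "GET /a" 404 0' \
  '10.0.0.2 - - [x] "GET /b" 200 5' \
  '10.0.0.1 - - [x] "GET /c" 404 0' > access.log

# grep keeps 404 lines, cut extracts the IP (field 1),
# sort groups repeats, uniq -c counts them, sort -rn ranks them.
grep ' 404 ' access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn
```

Each stage does one small job; the power comes from composing them, which is the core idea behind the whole filter toolkit.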