Tutorial: Building log-miner

A step-by-step guide to building a forensic log analysis tool. Stop grepping until your eyes bleed. Let's build something professional.

What We're Building

log-miner will be able to:

  • Analyze logs with filters (levels, time, services)
  • Extract patterns (IPs, emails)
  • Watch logs in real-time
  • Generate HTML reports
  • Summarize logs in a stats mode

Part 1: Setting Up

We start with a simple log reader.

>
parseArger generate \
  --help-message "Analyze and filter log files like a boss" \
  --set-version "1.0.0" \
  --pos 'log-file "log file to analyze"' \
  --opt 'output-format "output format" --short o --default-value table --one-of table --one-of json --one-of yaml' \
  --flag 'verbose "verbose output"' \
  --output log-miner

Make it executable:

>
chmod +x log-miner

Now add the logic to the bottom of the file:

# Check if file exists
if [ ! -f "$_arg_log_file" ]; then
  die "Log file not found: $_arg_log_file" 1
fi

log "Reading log file: $_arg_log_file" 4

# Output based on format
case "$_arg_output_format" in
  json)
    echo "["
    first=true
    while IFS= read -r line; do
      if [ "$first" = true ]; then first=false; else echo ","; fi
      printf '  "%s"' "$(printf '%s\n' "$line" | sed 's/\\/\\\\/g; s/"/\\"/g')"
    done < "$_arg_log_file"
    echo ""
    echo "]"
    ;;
  yaml)
    while IFS= read -r line; do
      printf '- "%s"\n' "$(printf '%s\n' "$line" | sed 's/\\/\\\\/g; s/"/\\"/g')"
    done < "$_arg_log_file"
    ;;
  table|*)
    echo "┌─────────────────────────────────────────────────────────────┐"
    echo "│ Log File: $_arg_log_file                                     │"
    echo "├─────────────────────────────────────────────────────────────┤"
    while IFS= read -r line; do
      printf "│ %-60s │\n" "$line"
    done < "$_arg_log_file"
    echo "└─────────────────────────────────────────────────────────────┘"
    ;;
esac
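
The json branch above leans on a one-line sed for escaping. If you would rather keep that in one place, here is a minimal sketch in pure bash. The helper names json_escape and lines_to_json are made up for this example (parseArger does not generate them), and the escaper only covers backslashes, double quotes, and tabs, not every control character JSON cares about.

```bash
#!/bin/bash
# json_escape STRING  (hypothetical helper, not generated by parseArger)
# Escapes backslashes first, then double quotes, then tabs.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}     # \   -> \\
  s=${s//\"/\\\"}     # "   -> \"
  s=${s//$'\t'/\\t}   # TAB -> \t
  printf '%s' "$s"
}

# lines_to_json FILE  (hypothetical helper)
# Prints every line of FILE as one element of a JSON array of strings.
lines_to_json() {
  local file=$1 line sep=""
  printf '[\n'
  # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s  "%s"' "$sep" "$(json_escape "$line")"
    sep=$',\n'
  done < "$file"
  printf '\n]\n'
}
```

With this in place, the json branch could shrink to a single call such as lines_to_json "$_arg_log_file".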

Part 2: Adding Filters

Let's use parseArger parse to inject new options without touching our custom code.

>
parseArger parse log-miner -i \
  --opt 'filter-level "filter by log level" --one-of debug --one-of info --one-of warn --one-of error --one-of critical --repeat' \
  --opt 'filter-service "filter by service name"' \
  --opt 'filter-time "filter by time range (format: start..end)"' \
  --flag 'ignore-case "case insensitive matching"' \
  --flag 'invert-match "show non-matching lines"'
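
The filter-time option takes a single start..end string, so something has to split it and compare timestamps. Here is a rough sketch of one way to do it. It assumes every log line starts with a sortable timestamp such as 2024-01-15T10:30:00 (so plain string comparison matches chronological order), and filter_by_time is a made-up helper name, not something parseArger provides.

```bash
#!/bin/bash
# filter_by_time RANGE FILE  (hypothetical helper)
# RANGE is "start..end"; either side may be empty for an open-ended range.
# Assumption: the first whitespace-separated field of each line is a
# sortable timestamp such as 2024-01-15T10:30:00.
filter_by_time() {
  local range=$1 file=$2
  local start=${range%%..*}   # text before the first ".."
  local end=${range#*..}      # text after the first ".."
  # Print a line when its timestamp is inside the (inclusive) range.
  awk -v s="$start" -v e="$end" '
    (s == "" || $1 >= s) && (e == "" || $1 <= e)
  ' "$file"
}
```

A call would look like filter_by_time "$_arg_filter_time" "$_arg_log_file". Logs with other timestamp layouts would need a different comparison.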

Update your logic to handle filters:

# Build grep pattern from filters
grep_pattern=""
grep_opts=""

if [ "$_arg_ignore_case" = "on" ]; then grep_opts="-i"; fi

if [ ${#_arg_filter_level[@]} -gt 0 ]; then
  level_pattern=$(IFS="|"; echo "${_arg_filter_level[*]}")
  grep_pattern="$level_pattern"
fi

if [ "$_arg_filter_service" != "" ]; then
  if [ "$grep_pattern" != "" ]; then
    # Parentheses keep the level alternation (debug|info|...) grouped
    grep_pattern="($grep_pattern).*$_arg_filter_service|$_arg_filter_service.*($grep_pattern)"
  else
    grep_pattern="$_arg_filter_service"
  fi
fi

if [ "$grep_pattern" != "" ]; then
  if [ "$_arg_invert_match" = "on" ]; then
    grep_cmd="grep -vE $grep_opts '$grep_pattern'"
  else
    grep_cmd="grep -E $grep_opts '$grep_pattern'"
  fi
else
  grep_cmd="cat"
fi

# Execute with eval (carefully!)
eval "$grep_cmd '$_arg_log_file'" | while ...
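
The eval approach breaks as soon as a pattern or file name contains a single quote, and it will happily run whatever ends up in the string. One possible alternative is to build the command as a bash array. This is a sketch, not the tutorial's original method, and run_filter is a hypothetical helper name.

```bash
#!/bin/bash
# run_filter PATTERN IGNORE_CASE INVERT FILE  (hypothetical helper)
# IGNORE_CASE and INVERT use the same "on"/"off" values as the flags above.
run_filter() {
  local pattern=$1 ignore_case=$2 invert=$3 file=$4
  if [ -z "$pattern" ]; then
    # No filter requested: pass the file through untouched.
    cat -- "$file"
    return
  fi
  local cmd=(grep -E)
  [ "$ignore_case" = "on" ] && cmd+=(-i)
  [ "$invert" = "on" ] && cmd+=(-v)
  # Each array element is passed as exactly one argument, so quotes or
  # spaces inside the pattern or file name need no extra escaping.
  "${cmd[@]}" -- "$pattern" "$file"
}
```

The output of run_filter "$grep_pattern" "$_arg_ignore_case" "$_arg_invert_match" "$_arg_log_file" can be piped into the same loop as before.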

Part 3: Multiple Sources & Patterns

We change --pos log-file to --pos log-files --repeat to accept multiple files.

>
parseArger parse log-miner -i \
  --pos 'log-files "log files to analyze (supports glob)" --repeat' \
  --opt 'extract-ip "extract IP addresses"' \
  --opt 'extract-email "extract email addresses"' \
  --opt 'extract-custom "custom regex pattern"' \
  --flag 'unique "show only unique values"' \
  --flag 'count "count occurrences"'

And add pattern extraction logic:

extract_patterns() {
  local input="$1"
  if [ "$_arg_extract_ip" != "" ]; then
    # Dots are escaped so they match a literal "." and not any character
    grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' <<< "$input"
  fi
  # ... other extractions ...
}

for log_file in "${_arg_log_files[@]}"; do
  # Process each file: hand its contents to the extractor
  extract_patterns "$(cat "$log_file")"
done
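
The unique and count flags declared above still need something behind them. Here is a minimal sketch of a post-processing step that reads extracted values on stdin. The helper name summarize and its argument order are assumptions made for this example.

```bash
#!/bin/bash
# summarize UNIQUE COUNT  (hypothetical helper)
# Reads extracted values on stdin, one per line. UNIQUE and COUNT take the
# same "on"/"off" values as the unique and count flags.
summarize() {
  local unique=$1 count=$2
  if [ "$count" = "on" ]; then
    # Tally each distinct value, then list the most frequent first
    # as "count<TAB>value".
    awk '{n[$0]++} END {for (v in n) printf "%d\t%s\n", n[v], v}' |
      sort -k1,1nr -k2
  elif [ "$unique" = "on" ]; then
    sort -u
  else
    cat
  fi
}
```

Inside the loop this could be used as extract_patterns "$(cat "$log_file")" | summarize "$_arg_unique" "$_arg_count".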

Part 4: Project Mode

The script is getting too big. Let's convert it to a project with subcommands: analyze, extract, stats, watch.

>
parseArger project log-miner \
  --description "Forensic log analysis tool" \
  --project-subcommand analyze \
  --project-subcommand extract \
  --project-subcommand stats \
  --project-subcommand watch

This creates a full directory structure. You can now configure each subcommand independently.

Configure Subcommands

>
parseArger parse bin/analyze -i \
  --pos 'files "log files" --repeat' \
  --opt 'filter-level "levels" --repeat' ...

parseArger parse bin/stats -i \
  --pos 'files "log files" --repeat' \
  --opt 'group-by "level, service, hour"' ...

Part 5: HTML Reports

Generate a web interface for your tool. Yes, really.

>
parseArger html-form bin/analyze > analyze-form.html

Part 6: Completion & Docs

Make it professional.

Bash Completion

>
parseArger completely log-miner ./log-miner --subcmd-dir ./bin

Documentation

>
parseArger document \
  --file ./log-miner \
  --directory ./bin \
  --out documentation.md

Part 7: Final Logic

Implement the specific logic for each subcommand in `bin/`.

For stats:

# Count by level
if [ "$_arg_group_by" = "level" ]; then
  for level in debug info warn error critical; do
    # -h drops the "file:" prefix so awk can sum the per-file counts
    count=$(grep -ich -- "$level" "${_arg_files[@]}" 2>/dev/null | awk '{s+=$1} END {print s+0}')
    printf "  %-10s: %d\n" "$level" "${count:-0}"
  done
fi
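
The group-by option also lists hour as a choice. Here is a sketch of how that grouping might look. It assumes every line begins with a timestamp shaped like 2024-01-15T10:30:00 or 2024-01-15 10:30:00, so the first 13 characters identify the date and hour. count_by_hour is a hypothetical helper name.

```bash
#!/bin/bash
# count_by_hour FILE...  (hypothetical helper)
# Assumption: the first 13 characters of every line are "YYYY-MM-DD?HH",
# i.e. the date plus the hour of a leading timestamp.
count_by_hour() {
  # Tally lines per hour bucket across all files, then sort by bucket.
  awk '{n[substr($0, 1, 13)]++}
       END {for (h in n) printf "  %-13s: %d\n", h, n[h]}' "$@" | sort
}
```

It would slot in next to the level branch, for example under a check like [ "$_arg_group_by" = "hour" ] followed by count_by_hour "${_arg_files[@]}".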

For watch:

eval "tail -f '$_arg_file' | $filter"
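
The filter half of that pipeline can also be written without eval. Here is a minimal sketch, with filter_stream as a made-up helper name. It reads from stdin so it can sit behind tail -f, and it treats the filter as an extended regular expression, which is an assumption about what the filter option should accept.

```bash
#!/bin/bash
# filter_stream PATTERN  (hypothetical helper)
# Copies stdin to stdout, keeping only lines that match PATTERN.
# An empty PATTERN passes everything through.
filter_stream() {
  local pattern=$1
  if [ -n "$pattern" ]; then
    # --line-buffered flushes after every line so matches show up at once
    # when the input is a live stream.
    grep --line-buffered -E -- "$pattern"
  else
    cat
  fi
}

# Intended use (runs until interrupted, so it is left commented out):
#   tail -f -- "$_arg_file" | filter_stream "$_arg_filter"
```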

TL;DR: The "I'm Lazy" Script

Too many steps? Just run this. It fast-forwards through Parts 4 to 7 with a trimmed-down version of each subcommand.

#!/bin/bash
# fast-forward.sh - Because life is too short for manual labor.

echo ">> INITIATING DOPAMINE_SAVING_PROTOCOL..."

# 1. Initialize Project
echo ">> [1/5] Scaffolding project structure (making it look professional)..."
parseArger project log-miner \
  --description "Forensic log analysis tool" \
  --project-subcommand analyze \
  --project-subcommand extract \
  --project-subcommand stats \
  --project-subcommand watch

cd log-miner || exit 1

# 2. Configure Subcommands & Inject Logic
echo ">> [2/5] Injecting logic (fixing your future bugs)..."

# --- Analyze ---
echo "   -> Configuring 'analyze'..."
parseArger parse bin/analyze -i \
  --pos 'files "log files" --repeat' \
  --opt 'filter-level "filter by log level" --one-of debug --one-of info --one-of warn --one-of error --one-of critical --repeat' \
  --opt 'filter-service "filter by service name"' \
  --flag 'ignore-case "case insensitive matching"' \
  --flag 'invert-match "show non-matching lines"'

cat >> bin/analyze << 'EOF'
grep_opts=""
[ "$_arg_ignore_case" = "on" ] && grep_opts="-i"
pattern=""

if [ ${#_arg_filter_level[@]} -gt 0 ]; then
  pattern=$(IFS="|"; echo "${_arg_filter_level[*]}")
fi

if [ -n "$_arg_filter_service" ]; then
  if [ -n "$pattern" ]; then
    # Parentheses keep the level alternation grouped
    pattern="($pattern).*$_arg_filter_service|$_arg_filter_service.*($pattern)"
  else
    pattern="$_arg_filter_service"
  fi
fi

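# Build the command as an array so patterns and file names
# with spaces or quotes survive intact (no eval needed)
cmd=(cat)
if [ -n "$pattern" ]; then
  flag="-E"
  [ "$_arg_invert_match" = "on" ] && flag="-vE"
  cmd=(grep "$flag")
  [ -n "$grep_opts" ] && cmd+=("$grep_opts")
  cmd+=(-- "$pattern")
fi

for f in "${_arg_files[@]}"; do
  echo "--- $f ---"
  "${cmd[@]}" "$f"
done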
EOF

# --- Extract ---
echo "   -> Configuring 'extract'..."
parseArger parse bin/extract -i \
  --pos 'files "log files" --repeat' \
  --opt 'pattern "custom regex"' \
  --flag 'ip "extract IPs"' \
  --flag 'email "extract emails"'

cat >> bin/extract << 'EOF'
for f in "${_arg_files[@]}"; do
  if [ "$_arg_ip" = "on" ]; then grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$f"; fi
  if [ "$_arg_email" = "on" ]; then grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$f"; fi
  if [ -n "$_arg_pattern" ]; then grep -oE -- "$_arg_pattern" "$f"; fi
done
EOF

# --- Stats ---
echo "   -> Configuring 'stats'..."
parseArger parse bin/stats -i \
  --pos 'files "log files" --repeat' \
  --opt 'group-by "level"'

cat >> bin/stats << 'EOF'
if [ "$_arg_group_by" = "level" ]; then
  for level in debug info warn error critical; do
    # -h drops the "file:" prefix so awk can sum the per-file counts
    count=$(grep -ich -- "$level" "${_arg_files[@]}" 2>/dev/null | awk '{s+=$1} END {print s+0}')
    printf "  %-10s: %d\n" "$level" "$count"
  done
fi
EOF

# --- Watch ---
echo "   -> Configuring 'watch'..."
parseArger parse bin/watch -i \
  --pos 'file "log file to watch"' \
  --opt 'filter "grep filter"'

cat >> bin/watch << 'EOF'
# Run the pipeline directly: no eval, so quotes in the file name or filter are safe
if [ -n "$_arg_filter" ]; then
  tail -f -- "$_arg_file" | grep --line-buffered -- "$_arg_filter"
else
  tail -f -- "$_arg_file"
fi
EOF

# 3. Generate Bash Completion
echo ">> [3/5] Generating tab-completion (saving your keystrokes)..."
parseArger completely log-miner ./log-miner --subcmd-dir ./bin

# 4. Generate Documentation
echo ">> [4/5] Writing documentation (so you don't have to)..."
parseArger document --file ./log-miner --directory ./bin --out documentation.md

# 5. Generate HTML Form
echo ">> [5/5] Generating HTML forms (for the GUI weaklings)..."
parseArger html-form bin/analyze > analyze-form.html

echo ">> PARSEARGER_PROTOCOL: COMPLETE. SYSTEM_READY."
echo ">> You have saved approximately 4 hours of life. Go touch grass."

Wrapping Up

You now have a tool with:

  • Multiple subcommands
  • Argument validation
  • Bash completion
  • Auto-generated docs
  • HTML forms

And you didn't have to write a single line of argument parsing code. You're welcome.