L20. Incident Response on Linux: First Steps When Something Goes Wrong
Learn to recognize signs of compromise on a Linux system, collect volatile data before it disappears, check for unauthorized users and persistence mechanisms, preserve evidence properly, and decide when to isolate versus investigate.
Recognizing the Signs of Compromise
Before you can respond to an incident, you need to recognize that something is wrong. Compromised Linux systems often show one or more of these warning signs:
- Unexpected processes consuming CPU or memory
- Unfamiliar network connections to external IP addresses
- Modified system files (especially binaries in /usr/bin, /usr/sbin)
- New user accounts or SSH keys you did not create
- Unusual cron jobs or systemd timers
- Log gaps where entries were deleted or logging was stopped
- Files with recent modification timestamps in directories that rarely change
Not every anomaly is a breach, but each one deserves investigation. The cost of checking and finding nothing is far lower than the cost of missing a real compromise.
Initial Triage: What to Do First
When you suspect a compromise, the order of your actions matters. Volatile data (running processes, active connections, logged-in users) disappears the moment you reboot or the attacker notices your investigation. Start by collecting this data before making any changes.
The Golden Rule
Do not reboot the system. Rebooting destroys volatile evidence in memory, clears temporary files, and may trigger attacker-installed persistence mechanisms that alter or erase the attacker's traces.
Step 1: Document the Current Time
# Record the exact time you started the investigation
date -u
uptime
This establishes a timeline anchor for everything you discover afterward.
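To keep that anchor useful, it can help to timestamp every later observation the same way. The following is a minimal sketch, not a prescribed tool: the function name `log_note` and the notes-file path in the usage line are assumptions made for this example.

```shell
# Append a UTC-timestamped note to an investigation log.
# Usage: log_note <notes-file> <text...>
# (log_note is a hypothetical helper, not a standard command.)
log_note() {
    notes_file=$1
    shift
    # ISO 8601 UTC timestamp, two spaces, then the note text
    printf '%s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$notes_file"
}
```

For example, `log_note /mnt/evidence/notes.txt "investigation started, uptime: $(uptime)"` records the first entry. Writing the notes to a separate system or mounted drive avoids changing the disk you are investigating.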
Collecting Volatile Data
Work through these commands systematically. Copy the output to a file on a separate system if possible:
Running Processes
# Full process listing with command arguments
ps auxww
# Process tree showing parent-child relationships
ps auxwwf
# Look for suspicious processes
# Check for: unfamiliar names, processes running as root that should not be, high CPU usage
ps auxww --sort=-%cpu | head -20 # top CPU consumers (sorted numerically by ps itself, header line kept on top)
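One check that can complement the listing above: on Linux, `/proc/<PID>/exe` is a symlink to a process's executable, and the kernel appends ` (deleted)` to the link target once that file has been removed from disk. A running process whose binary is gone is not proof of compromise (package upgrades cause it too), but it deserves a look. This is a minimal sketch; the function name `find_deleted_exes` and its optional directory argument are assumptions made for this example.

```shell
# List processes whose executable no longer exists on disk.
# Prints "<pid><TAB><link target>" for each match.
# $1 is the proc root to scan (defaults to /proc); the parameter exists
# only so the function can be pointed at a copy or a test directory.
find_deleted_exes() {
    proc_root="${1:-/proc}"
    for exe in "$proc_root"/[0-9]*/exe; do
        # readlink fails for processes we may not inspect; skip those
        target=$(readlink "$exe" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${exe%/exe}
                pid=${pid##*/}
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
    return 0
}
```

Run it from a root shell so that every process's `exe` link is readable; as an unprivileged user it silently skips processes owned by other accounts.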
Network Connections
# All connections with process info
sudo ss -tnpa
# Listening ports (what services are exposed?)
sudo ss -tlnp
# Established connections (who is connected?)
sudo ss -tnp state established
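When the connection list is long, grouping it by remote address makes unfamiliar peers easier to spot. The sketch below assumes the column layout that `ss -tn` prints for TCP sockets (state, two queue counters, local address, peer address, optional process info); the function name `count_peers` is hypothetical.

```shell
# Count established TCP connections per remote address.
# Expects `ss -tnH`-style lines on stdin:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
count_peers() {
    awk '$1 == "ESTAB" {
        peer = $5
        sub(/:[^:]*$/, "", peer)   # strip the trailing :port
        seen[peer]++
    }
    END { for (p in seen) print seen[p], p }' | sort -k1,1nr -k2,2
}
```

A typical invocation would be `sudo ss -tnH | count_peers`, where `-H` suppresses the header line. The function can also be run later against the saved connection listing, though any header line in that file is simply ignored because its first field is not `ESTAB`.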
Logged-In Users
# Who is logged in right now?
who
# Recent login history
last -20
# Failed login attempts
sudo lastb -20
# Currently active SSH sessions
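sudo ss -tnp state established '( sport = :22 )' # matches only local port 22, unlike grep ':22', which also hits ports such as 2200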
Open Files and Network Sockets
# Files opened by a suspicious process
sudo lsof -p <PID>
# All network connections with associated processes
sudo lsof -i -nP
Checking for Unauthorized Access
Unauthorized User Accounts
# List all user accounts with shells (potential interactive users)
grep -v '/nologin\|/false' /etc/passwd
# Check for accounts with UID 0 (root-equivalent)
awk -F: '$3 == 0 {print $1}' /etc/passwd
# Recently modified user files
ls -la /etc/passwd /etc/shadow /etc/group
stat /etc/passwd
Unauthorized SSH Keys
# Check every user's authorized_keys file
for dir in /home/*; do
    echo "=== $dir ==="
    # sudo is needed because .ssh directories are normally readable only by their owner
    sudo cat "$dir/.ssh/authorized_keys" 2>/dev/null
done
# Check root's authorized keys
sudo cat /root/.ssh/authorized_keys 2>/dev/null
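The loop above covers only accounts whose home directory sits under /home, plus root. Service accounts and attacker-created users may live elsewhere, so a variant that reads home directories from the passwd file can be more thorough. This is a sketch under stated assumptions: the function name `list_authorized_keys` is hypothetical, and it checks only OpenSSH's default key file names (`authorized_keys` and `authorized_keys2`), so a custom `AuthorizedKeysFile` setting in sshd_config would need to be checked separately.

```shell
# Print every non-empty authorized_keys file found under the home
# directories listed in a passwd-format file (default: /etc/passwd).
list_authorized_keys() {
    passwd_file="${1:-/etc/passwd}"
    # Field 1 is the user name, field 6 the home directory
    cut -d: -f1,6 "$passwd_file" | sort -u | while IFS=: read -r user home; do
        [ -n "$home" ] || continue
        for keyfile in "$home/.ssh/authorized_keys" "$home/.ssh/authorized_keys2"; do
            if [ -s "$keyfile" ]; then
                echo "=== $user: $keyfile ==="
                cat "$keyfile"
            fi
        done
    done
}
```

Run it from a root shell so that every user's `.ssh` directory is readable.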
Suspicious Cron Jobs
Cron jobs are a favorite persistence mechanism for attackers because they survive reboots and run on a schedule:
# List cron jobs for all users
for user in $(cut -f1 -d: /etc/passwd); do
    echo "=== $user ==="
    # Reading another user's crontab requires root
    sudo crontab -l -u "$user" 2>/dev/null
done
# Check system-wide cron directories
ls -la /etc/cron.d/
ls -la /etc/cron.daily/
ls -la /etc/cron.hourly/
# Check systemd timers
systemctl list-timers --all
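Listing the cron directories shows which files exist, not what they run. A quick content scan can surface entries that download or decode payloads or execute from world-writable paths. The sketch below is illustrative only: the function name `flag_cron_lines` is hypothetical, and the pattern list is an assumption rather than a complete signature set, so a clean result does not clear the system.

```shell
# Print non-comment lines from the given cron files that mention
# download tools, encoded payloads, or world-writable staging paths.
# Output format: <file>:<line number>:<line>
# The pattern list is an illustrative assumption, not a full signature set.
flag_cron_lines() {
    grep -HnE 'curl|wget|base64|/tmp/|/dev/shm/' "$@" 2>/dev/null |
        grep -vE '^[^:]+:[0-9]+:[[:space:]]*#'
}
```

A typical invocation would be `sudo sh` followed by `flag_cron_lines /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/*`; the per-user spool path shown is the Debian/Ubuntu location and differs on other distributions.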
Modified System Binaries
# Debian/Ubuntu: verify installed package file integrity
sudo debsums -c 2>/dev/null | head -20
# RHEL/Fedora: verify package files
sudo rpm -Va | head -20
# Check common binaries for unexpected modification times
ls -la /usr/bin/ssh /usr/bin/curl /usr/bin/wget /usr/sbin/sshd
Preserving Evidence
If this is a real incident, preserving evidence properly is critical. Poor evidence handling can make forensic analysis impossible and may affect any legal proceedings.
Saving Volatile Data
# Create an evidence directory (ideally on an external/mounted drive)
# Set the path once so every file lands in the same directory, even if the date rolls over mid-collection
EVIDENCE=/mnt/evidence/$(hostname)_$(date +%Y%m%d)
mkdir -p "$EVIDENCE"
# Dump process listing
ps auxwwf > "$EVIDENCE/processes.txt"
# Dump network connections (sudo is needed to see process info for other users' sockets)
sudo ss -tnpa > "$EVIDENCE/connections.txt"
# Dump logged-in users and login history
who > "$EVIDENCE/who.txt"
last > "$EVIDENCE/last.txt"
# Copy relevant log files, preserving timestamps
# (Debian/Ubuntu paths; RHEL-family systems use /var/log/secure and /var/log/messages)
sudo cp -p /var/log/auth.log /var/log/syslog "$EVIDENCE/"
Hashing Evidence Files
# Create checksums of all evidence files for integrity verification
cd /mnt/evidence/$(hostname)_$(date +%Y%m%d)
sha256sum * > checksums.sha256
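Checksums are only useful if they are verified again later, for example after copying the evidence to another machine. The sketch below wraps both steps; the function names `hash_evidence` and `verify_evidence` are hypothetical, and it assumes GNU coreutils `sha256sum` and a flat evidence directory like the one created above. It also skips the checksum file itself, so rerunning the hash step does not record a hash of a half-written checksums.sha256.

```shell
# Write SHA-256 checksums for every regular file in an evidence directory,
# skipping the checksum file itself so reruns stay consistent.
hash_evidence() {
    (
        cd "$1" &&
        find . -maxdepth 1 -type f ! -name checksums.sha256 -exec sha256sum {} + |
            sort -k2 > checksums.sha256
    )
}

# Re-verify the evidence; returns non-zero if any file changed or is missing.
verify_evidence() {
    ( cd "$1" && sha256sum -c checksums.sha256 )
}
```

After collection, `hash_evidence /mnt/evidence/<directory>` records the hashes, and `verify_evidence` on the same directory (or on a copy of it) confirms that nothing has been altered since.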
Isolate vs Investigate: Making the Decision
One of the hardest decisions during incident response is whether to isolate the system immediately or continue investigating while it runs.
| Approach | When to Use | Trade-off |
|---|---|---|
| Isolate immediately | Active data exfiltration, ransomware spreading, attacker is logged in | Stops the damage but may alert the attacker and lose volatile data |
| Investigate first | Suspected compromise without active damage, need to understand scope | Preserves evidence but the attacker may continue operating |
| Isolate network, keep running | Best middle ground for most incidents | Prevents lateral movement while preserving memory and processes |
Network Isolation Without Shutdown
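# Drop all traffic except your investigation session
sudo iptables -I INPUT 1 -s <your-ip> -j ACCEPT
sudo iptables -I OUTPUT 1 -d <your-ip> -j ACCEPT
# Insert the DROP rules at position 2, directly after the ACCEPT rules above.
# Appending them with -A would leave any existing ACCEPT rules in the chain ahead of the DROP.
sudo iptables -I INPUT 2 -j DROP
sudo iptables -I OUTPUT 2 -j DROP
This keeps the system running (preserving volatile evidence) while preventing the attacker from communicating with command-and-control servers or moving laterally to other systems. These rules cover IPv4 only; if the host has IPv6 connectivity, apply equivalent rules with ip6tables.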
Basic Timeline Reconstruction from Logs
Once you have collected volatile data and secured the system, start building a timeline of what happened:
# Authentication events
sudo grep -E 'Accepted|Failed|session opened|session closed' /var/log/auth.log | tail -50
# Sudo usage
sudo grep 'sudo:' /var/log/auth.log | tail -20
# Recent file modifications in sensitive directories
sudo find /etc -mtime -7 -type f -ls
sudo find /usr/bin -mtime -7 -type f -ls
sudo find /tmp -mtime -3 -type f -ls
# Journal entries around a specific time
sudo journalctl --since "2026-06-19 02:00" --until "2026-06-19 04:00"
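Raw authentication logs are easier to reason about once they are summarized. As one example, counting failed SSH password attempts per source address helps confirm or rule out a brute-force entry point. This is a minimal sketch: the function name `count_failed_by_source` is hypothetical, and it assumes OpenSSH's usual "Failed password for ... from <address> port ..." message wording. It does not depend on the timestamp format at the start of the line.

```shell
# Count failed SSH password attempts per source address.
# Reads sshd log lines on stdin; prints "<count> <address>",
# highest count first.
count_failed_by_source() {
    awk '/Failed password/ {
        # The source address is the field right after the word "from"
        for (i = 1; i < NF; i++)
            if ($i == "from") { hits[$(i + 1)]++; break }
    }
    END { for (ip in hits) print hits[ip], ip }' | sort -k1,1nr -k2,2
}
```

A typical invocation would be `sudo cat /var/log/auth.log | count_failed_by_source`. An address with many failures followed by an "Accepted" entry from the same address is a strong candidate for the start of your timeline.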
Building the Narrative
As you review logs and evidence, build a timeline document that answers:
- When did the suspicious activity start? (earliest log entry)
- How did the attacker gain access? (SSH brute force, exploited service, stolen credentials)
- What did they do? (new accounts, installed tools, accessed data)
- What is still running? (persistence mechanisms, backdoors)
- What else might be affected? (lateral movement to other systems)
This timeline becomes the foundation of your incident report and guides the remediation steps that follow.
- ✓ Never reboot a compromised system: volatile data (processes, connections, memory) is destroyed on restart
- ✓ Collect volatile data first: ps, ss, who, last, and lsof capture evidence that disappears quickly
- ✓ Check for persistence: unauthorized SSH keys, cron jobs, systemd timers, and modified system binaries
- ✓ Preserve evidence with checksums: hash all evidence files with sha256sum for integrity verification
- ✓ Isolate the network without shutting down: this preserves evidence while stopping lateral movement and C2 communication
1. Why should you avoid rebooting a system you suspect has been compromised?
2. Which of the following is the best approach when you discover an active compromise but no data is currently being exfiltrated?
3. An attacker wants their access to survive a system reboot. Which persistence mechanism should you check for during incident response?