
Linux Troubleshooting Commands: Beginner Sysadmin Cheat Sheet

Linux troubleshooting gets much less scary when you stop trying random fixes and start gathering facts in the same order every time.

If you are a help desk tech, Windows admin, or beginner sysadmin, you do not need to memorize every Linux command on earth. You need a reliable first-pass checklist for the classic ticket: “the Linux server is slow,” “the app is down,” “SSH is weird,” or “something changed and now everyone is staring at you.”

Here is a useful beginner flow:

  1. Confirm where you are.
  2. Check disk, memory, and CPU.
  3. Check processes.
  4. Check services.
  5. Read logs.
  6. Check networking.
  7. Document what you found before changing anything.

That last step is not glamorous, but neither is explaining to a senior admin that you restarted three services and forgot what the original error was.

The quick Linux troubleshooting command list

Start with this cheat sheet when a Linux box feels broken:

pwd                         # where am I?
hostname                    # what machine is this?
uptime                      # how long has it been up, and load average?
df -h                       # disk space by filesystem
du -sh /path                # size of a folder
free -h                     # memory usage
ps aux                      # running processes
top                         # live CPU/memory view
systemctl status service    # service health
journalctl -u service       # service logs
ip addr                     # IP addresses
ping -c 4 host              # basic reachability
ss -tulpn                   # listening ports (sudo shows every owning process)
curl -I URL                 # HTTP response headers

Do not run these like a keyboard smash. Run them with a question in mind. “Is the disk full?” “Is the service running?” “Is the port listening?” “Can this server reach the thing it depends on?”

That mindset matters more than memorizing flags.

Step 1: confirm the machine and your location

Before you troubleshoot anything, make sure you are on the right box and in the right directory.

hostname
pwd
whoami

hostname tells you which machine you are on. pwd prints your current directory. whoami tells you which user you are running as.

This sounds painfully basic until you have three SSH tabs open and realize you nearly edited production while meaning to poke around in staging. Linux will not pop up a friendly “are you sure this is the right server?” dialog. It will simply let you be wrong with confidence.

For a little more context:

uname -a
cat /etc/os-release

uname -a shows kernel/system info. /etc/os-release tells you the distro, which helps when package commands, service names, or log locations differ between Ubuntu, Debian, Fedora, Rocky, and friends.
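If you hop between distros a lot, a tiny helper can print just the distro ID and version. This is a minimal sketch: distro_info is a hypothetical name, and it assumes the standard /etc/os-release format, which is plain KEY=value lines meant to be sourced by a shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ID VERSION_ID" from an os-release file.
# Assumes the standard KEY=value format (default path /etc/os-release).
distro_info() {
  local file="${1:-/etc/os-release}"
  # Subshell so ID, VERSION_ID, and friends do not leak into your shell.
  (
    . "$file" || exit 1
    echo "${ID:-unknown} ${VERSION_ID:-unknown}"
  )
}
```

Run distro_info on the box. On Ubuntu 22.04 it prints ubuntu 22.04, which tells you to expect apt, the ssh unit name, and /var/log/syslog.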

Step 2: check disk space before chasing ghosts

A full disk can make a server look haunted. Apps fail to write logs. Databases get grumpy. Deploys break. Users say “the site is down” because apparently they refuse to say “the root filesystem is at 100%.”

Start here:

df -h

df -h shows filesystem usage in human-readable units. Look for anything near 90-100%.
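To spot the full ones without eyeballing every row, you could filter df output with awk. A rough sketch, assuming a 90% threshold and a hypothetical helper named flag_full_filesystems. It reads df -P (POSIX format), which keeps each filesystem on one line so the columns stay predictable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "mountpoint use%" for filesystems at or above
# a threshold (default 90). Reads `df -P` output on stdin.
flag_full_filesystems() {
  local threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # "95%" -> "95"
    if (use + 0 >= t) print $6 " " $5
  }'
}
```

For example, df -P | flag_full_filesystems 90 prints one line per filesystem over the threshold and stays quiet when nothing crosses it.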

If a filesystem is full, find the large directories carefully:

sudo du -sh /* 2>/dev/null
sudo du -sh /var/* 2>/dev/null
sudo du -sh /var/log/* 2>/dev/null

Beginner mistake: deleting files before understanding what they are. Logs, caches, old backups, and temporary files are not all equal. If you are not sure, capture the evidence and ask before removing anything important.

A safer first move is often to identify the biggest offender:

sudo du -ah /var/log 2>/dev/null | sort -h | tail -20

That shows the largest entries under /var/log. If one log file is enormous, you have a clue. You may also have an app screaming the same error every second.
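If you run this kind of search often, a small wrapper keeps the flags consistent. This sketch adds du's -x flag, which stays on one filesystem so a scan of / does not wander into other mounts such as /proc or network shares. The name biggest is hypothetical. For system directories, either paste it into a root shell or stick with the sudo du commands above, because sudo cannot run a shell function defined in your own session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N largest files and directories under a path.
# -x keeps du on one filesystem; sort -h understands K/M/G suffixes.
biggest() {
  local dir="${1:-.}" count="${2:-20}"
  du -xah "$dir" 2>/dev/null | sort -h | tail -n "$count"
}
```

For example, biggest /var/log 20 from a root shell gives the same view as the command above without crossing into other filesystems.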

For a deeper walkthrough, the disk-space guide is here: how to find what is eating disk space on Linux.

Step 3: check memory and load

Next, see whether the server is short on memory or under heavy load.

free -h
uptime

free -h shows memory usage. Do not panic just because Linux uses memory for cache. That is normal. The useful signs are low available memory, swap getting hammered, or a process eating far more than expected.
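If you want the one number that matters without reading the whole table, the kernel exposes it in /proc/meminfo as MemAvailable (kernel 3.14 and newer), which is where free gets its available column on modern kernels. A minimal sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: MemAvailable as a share of MemTotal.
# Reads /proc/meminfo by default (MemAvailable needs kernel 3.14+).
mem_available_pct() {
  local file="${1:-/proc/meminfo}"
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         if (total > 0)
           printf "%d%% available (%d of %d kB)\n", avail * 100 / total, avail, total
       }' "$file"
}
```

A single-digit percentage on a box that is also swapping heavily is worth writing down in the ticket.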

uptime shows load average:

load average: 0.42, 0.78, 1.10

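Those numbers are the average number of tasks running or waiting to run over the last 1, 5, and 15 minutes. On Linux they also count tasks stuck waiting on disk I/O, so high load with mostly idle CPUs can point at storage. Interpreting load depends on CPU count, but as a beginner, watch the pattern: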

  • High 1-minute load only: maybe a brief spike.
  • High 1-, 5-, and 15-minute load: something has been busy for a while.
  • Load climbing while users complain: pay attention.

To see CPU count:

nproc

A load of 4 on a 4-core system keeps every core roughly busy. The same load on a tiny 1-core VM means that, on average, about three tasks are waiting for the one CPU at any moment.
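A quick way to compare the two is to divide the 1-minute load by nproc. This is a sketch with a hypothetical helper, and the result is a rule of thumb, not a hard limit: around 1.0 means roughly one runnable task per core, and values that stay well above 1.0 mean work is queuing. It reads /proc/loadavg, whose first three fields are the same 1-, 5-, and 15-minute averages uptime prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 1-minute load average divided by CPU count.
# Rule of thumb only; sustained values well above 1.0 mean tasks are queuing.
load_per_cpu() {
  local loadavg_file="${1:-/proc/loadavg}"
  local cpus="${2:-$(nproc)}"
  # Field 1 of /proc/loadavg is the 1-minute average.
  awk -v c="$cpus" '{ printf "%.2f\n", $1 / c }' "$loadavg_file"
}
```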

Step 4: inspect running processes

When a box is slow, use top and ps to see what is actually running.

top

Inside top, press Shift+P to sort by CPU or Shift+M to sort by memory, depending on what looks bad. Press q to quit.

For a snapshot:

ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

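These show the processes using the most CPU or memory. Note that ps reports %CPU averaged over each process's whole lifetime, so for what is busy right now, trust top.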

If a process is stuck, do not jump straight to kill -9 because you saw it in a forum comment from 2008. Start with a graceful signal when appropriate. Plain kill sends SIGTERM, which asks the process to shut down cleanly:

kill PID

Only use the forceful version when you understand the consequence:

kill -9 PID

kill -9 sends SIGKILL, which the process cannot catch or ignore, so it gets no chance to clean up. It is the “fine, we are done here” option. Sometimes needed. Not your first reflex.
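If you want the graceful path to be the easy path, a small helper can send the normal signal, wait, and report back. This is a sketch with a hypothetical function name, and it deliberately never escalates to kill -9 on its own: if the process ignores SIGTERM, it tells you so and leaves the decision with you. kill -0 sends no signal at all; it only checks whether the PID still exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send SIGTERM, then wait up to N seconds (default 10).
# It never falls back to kill -9; that call stays with you.
stop_gracefully() {
  local pid="$1" timeout="${2:-10}" waited=0
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "could not signal PID $pid (not running, or not yours?)"
    return 1
  fi
  # kill -0 sends no signal; it only checks that the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid still running after ${timeout}s; investigate before kill -9"
      return 2
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "PID $pid exited"
}
```

For example, stop_gracefully 4321 15 (with a real PID from ps or top) waits up to 15 seconds and returns 2 if the process is still there.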

There is a legacy process-killing guide on the site too: how to kill a process on Linux. Use this cheat sheet for the bigger troubleshooting flow; use that one when the exact job is stopping a runaway process.

Step 5: check the service status

Most beginner sysadmin tickets eventually include a service name: nginx, apache2, ssh, postgresql, docker, or some app-specific service.

Check the status:

systemctl status nginx

Replace nginx with the service you care about.

Look for:

  • active (running) — service is currently running.
  • failed — service failed and usually left a clue.
  • recent log lines — often shown right inside the status output.
  • the exact unit name — service names vary by distro and install method.
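When a ticket touches several services, systemctl is-active gives a one-word state per unit (active, inactive, failed, and so on) that is easy to scan. A minimal sketch; service_states is a hypothetical helper name, and you pass it whatever unit names your distro uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one line per unit with its current state.
# systemctl is-active exits non-zero for anything but "active",
# so "|| true" keeps the loop going even under set -e.
service_states() {
  local unit state
  for unit in "$@"; do
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    printf '%-20s %s\n' "$unit" "${state:-unknown}"
  done
}
```

For example, service_states nginx ssh docker shows at a glance which one to read the status and logs for first.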

If you need logs for that service:

journalctl -u nginx --since "30 minutes ago"

If the service failed to start, add -xe: -x adds explanatory help text where available, and -e jumps to the newest entries.

journalctl -xeu nginx

Do not restart the service just because you can. First read the status and logs. A restart can clear useful state, hide the original symptom, or briefly make the problem worse for users.

If you need a service-management primer, read systemctl explained for beginners.

Step 6: read logs without dumping the whole universe

Logs are where Linux often tells you what is wrong, just not always in a polite voice.

Useful commands:

sudo journalctl -p err --since "1 hour ago"
sudo journalctl -u ssh --since today
tail -50 /var/log/syslog
tail -f /var/log/syslog
grep -i "error" /var/log/syslog

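Use tail when you want recent lines. Use tail -f when you want to watch new lines appear live. Use grep when you are searching for a word or pattern. /var/log/syslog is the Debian/Ubuntu location; RHEL-family systems such as Rocky log to /var/log/messages instead, and some installs keep logs only in the journal. On RHEL-family systems the SSH unit is usually sshd rather than ssh.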
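If you are not sure which traditional log file a box uses, a small check saves guessing. This sketch (find_syslog is a hypothetical name) assumes the usual Debian/Ubuntu and RHEL-family file names; if it finds neither, fall back to journalctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first traditional syslog file that exists.
# Debian/Ubuntu use "syslog"; RHEL-family systems use "messages".
# Returns 1 if neither exists (logs are probably journal-only).
find_syslog() {
  local dir="${1:-/var/log}" f
  for f in "$dir/syslog" "$dir/messages"; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}
```

For example: log=$(find_syslog) && sudo tail -50 "$log"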

Beginner mistake: opening a huge log file in an editor and then wondering why the terminal locked up. Use tail, less, and grep first.

A simple log workflow:

sudo journalctl -u service-name --since "1 hour ago" | tail -50
sudo journalctl -u service-name --since "1 hour ago" | grep -i error

That narrows the time window and searches for obvious failures.

For more detail, see how to check Linux logs and grep explained with real IT examples.

Step 7: check networking like a help desk person

Network troubleshooting gets messy fast, so start with simple questions:

  • Does the machine have an IP address?
  • Can it reach the gateway or internet?
  • Is DNS working?
  • Is the service listening on the expected port?
  • Does HTTP return anything useful?

Commands:

ip addr
ip route
ping -c 4 8.8.8.8
ping -c 4 example.com
ss -tulpn
curl -I https://example.com

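ip addr shows local addresses. ip route shows the routing table, including the default gateway. ping 8.8.8.8 checks basic connectivity without DNS. ping example.com adds DNS into the test. Some networks block ICMP, so a failed ping is a clue, not proof. ss -tulpn shows listening TCP and UDP ports; run it with sudo to see the owning process for every socket. curl -I sends a HEAD request, so you get the response headers without downloading the whole page.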

If IP ping works but domain ping fails, suspect DNS. If the service is running but nothing is listening on the expected port, suspect the service config, a startup error, or the wrong service entirely. If the port is listening only on 127.0.0.1, remote clients cannot reach it, so check the bind address. If it is listening on an external address and remote clients still cannot connect, now you are looking at firewall, routing, security group, or upstream network controls.

That is the real value of a troubleshooting checklist: it turns “network is broken” into smaller facts.
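Once the questions make sense, you could bundle the first few rungs into one read-only check. This is a sketch with a hypothetical helper name and the same example targets used above; swap in your own gateway, internal hostname, or app URL. It uses getent hosts for the DNS rung, which resolves names the way most programs on the box do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: IP reachability, then DNS, then HTTP, one line each.
# Defaults are the example targets from the commands above.
net_ladder() {
  local ip="${1:-8.8.8.8}" name="${2:-example.com}" url="${3:-https://example.com}" code
  if ping -c 2 -W 2 "$ip" >/dev/null 2>&1; then
    echo "ping $ip: ok"
  else
    echo "ping $ip: FAIL (or ICMP is filtered)"
  fi
  if getent hosts "$name" >/dev/null 2>&1; then
    echo "dns $name: ok"
  else
    echo "dns $name: FAIL"
  fi
  # -I sends a HEAD request; -w prints only the status code ("000" = no HTTP reply).
  code=$(curl -sS -o /dev/null -I -w '%{http_code}' --max-time 5 "$url" 2>/dev/null) || true
  echo "http $url: ${code:-000}"
}
```

Reading the output top to bottom gives you the same split as the paragraph above: the first rung that fails is where to dig.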

Step 8: capture evidence before changing things

Before you restart services, delete logs, change configs, or paste commands from a search result, capture what you found.

hostname
uptime
df -h
free -h
systemctl status service-name
journalctl -u service-name --since "30 minutes ago" | tail -100

Paste the relevant output into the ticket or your notes. You do not need a novel. You need enough detail that the next person can see your logic.
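To make that habit cheap, you could write the same evidence to one timestamped file and attach it to the ticket. A minimal sketch: capture_evidence is a hypothetical name and it only reads. The || true keeps a stopped or failed unit (systemctl status exits non-zero for those) from aborting the capture if you run it under set -e.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write first-pass evidence for one unit to a
# timestamped file in the current directory. Read-only.
capture_evidence() {
  local unit="$1" out
  out="evidence-${unit}-$(date +%Y%m%d-%H%M%S).txt"
  {
    echo "== host =="
    hostname
    uptime
    echo "== disk =="
    df -h
    echo "== memory =="
    free -h
    echo "== status: $unit =="
    # systemctl status exits non-zero for stopped/failed units; keep going.
    systemctl status "$unit" --no-pager || true
    echo "== logs: last 30 minutes =="
    journalctl -u "$unit" --since "30 minutes ago" --no-pager | tail -100
  } > "$out" 2>&1
  echo "$out"
}
```

For example, capture_evidence nginx prints the name of the file it wrote, ready to attach.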

A good help desk update sounds like this:

Checked disk, memory, service status, and logs. Disk is fine, memory has 2.1 GB available, service is failed, logs show config error on line 42 after today’s deploy. Escalating with the exact error and timestamp.

That is much better than:

Server broken. Tried stuff.

The first update makes you look calm and useful. The second makes everyone quietly update their expectations.

A safe first-pass troubleshooting script

Once you know the commands, you can put a small read-only checklist in a script. This does not fix anything. It just collects basics.

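#!/usr/bin/env bash
# quick-check.sh: read-only first-pass checks. Collects facts, changes nothing.
set -u   # treat unset variables as an error

echo "== host =="
hostname
date     # timestamp to paste into the ticket
uptime

echo "== disk =="
df -h

echo "== memory =="
free -h

echo "== top cpu =="
ps aux --sort=-%cpu | head

echo "== top memory =="
ps aux --sort=-%mem | head

echo "== listening ports =="
ss -tulpn   # process names for other users' sockets need sudo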

Save it as quick-check.sh, then run it (put sudo in front if you want ss to name the process behind every port):

bash quick-check.sh

Do not add destructive commands to a beginner troubleshooting script. No deletes. No restarts. No “temporary” fixes that become permanent because nobody remembers why they exist.

How to practice this without touching production

Reading a cheat sheet helps, but the commands only stick after you use them on a machine where it is safe to be awkward.

Practice this sequence:

  1. Check where you are with hostname, whoami, and pwd.
  2. Check disk with df -h.
  3. Check memory with free -h.
  4. List processes with ps aux and top.
  5. Read logs with journalctl or tail.
  6. Check ports with ss -tulpn.
  7. Write a short ticket-style summary.

Shell Samurai is built for those reps. You can practice Linux commands, make small mistakes, and build the muscle memory before a real outage has your name on it.

Start with the boring checks. The boring checks are how you find the interesting problem without turning the ticket into performance art.

FAQ

What Linux command should beginners use first when troubleshooting?

Start with hostname, pwd, uptime, df -h, and free -h. Those commands quickly tell you where you are, how long the system has been up and how busy it is, whether disk is full, and whether memory is tight.

Should I restart a Linux service before checking logs?

Usually no. Check systemctl status service-name and journalctl -u service-name first. A restart can hide the original error or change the symptoms before you understand the problem.

Is kill -9 safe?

kill -9 force-stops a process without cleanup, so it should be a last resort. Try a normal kill PID first when possible, and understand what the process does before terminating it.

What is the best way to learn Linux troubleshooting commands?

Use a repeatable checklist on a safe practice system. Run the commands, read the output, and write short ticket-style summaries. That builds real troubleshooting judgment faster than memorizing a massive command list.

Practice This in a Real Terminal

Shell Samurai gives you safe Linux missions so the commands actually stick. Chapter 1 is free; the full practice path is a one-time purchase, not another subscription.