Basic troubleshooting in Linux
This is very basic to me but I get questions like these so often that I had to start writing them down.
Where is foo with reference to bar?
When you don't know everything there is to know about your system, a large part of troubleshooting is simply finding information.
You'll usually have something to work with, like the name of a service the person asking is using or a project name.
A service will have dedicated directories to search for configuration files. A project name might appear inside files as comments or hostnames.
$ find /etc -name "iptables*" -type f
$ grep -ri project /etc
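The two commands above can be rolled into one small helper. This is only a sketch: the name whereis_ref is made up for this page, and it hides permission errors, so run it with sudo if you expect matches in restricted files.

```shell
# Search file names and file contents under a directory for a keyword.
# Usage: whereis_ref KEYWORD [DIR]   (DIR defaults to /etc)
# whereis_ref is a hypothetical helper name, not a standard tool.
whereis_ref() {
    keyword="$1"
    dir="${2:-/etc}"
    echo "== file names matching $keyword =="
    # -iname matches the file name case-insensitively
    find "$dir" -type f -iname "*${keyword}*" 2>/dev/null
    echo "== files mentioning $keyword =="
    # -r recurse, -I skip binary files, -l print only file names, -i ignore case
    grep -rIli -- "$keyword" "$dir" 2>/dev/null
}
```

For example, whereis_ref iptables searches /etc, and whereis_ref project /srv searches another tree.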
Of course this topic can be relatively broad but I'll stop there until I get the next question that needs writing down.
System is under very high load
What is load?
Load in Linux is, simply put, based on how many processes are running, waiting for CPU time, or stuck in uninterruptible sleep, which usually means waiting on I/O.
That means load is about more than the CPU. If network congestion leaves your processes blocked on network storage such as NFS, or makes work pile up while everything waits a long time on DNS lookups or connections to other services, that can increase your load.
The same applies if you run out of RAM and your system is forced to use swap. Swap is normally on disk, so the higher latency of disk compared to RAM will increase your wait times and your load.
With this in mind it's best to first check all your resources. Disk full? Memory full? Network congested?
Could be as simple as running a manual DNS query or using netcat to connect to another service.
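A rough first look at load, memory and disk could be scripted like this. It's a minimal sketch assuming a Linux system with /proc mounted and a kernel new enough to report MemAvailable. The function names are made up for this page, and the file arguments exist only so you can point the helpers at saved copies of the /proc files.

```shell
# 1-minute load average divided by the CPU count.
# A value well above 1.00 suggests processes are queueing for something.
load_per_cpu() {
    loadavg_file="${1:-/proc/loadavg}"
    cpus="${2:-$(nproc)}"
    awk -v cpus="$cpus" '{ printf "%.2f\n", $1 / cpus }' "$loadavg_file"
}

# Percentage of RAM still available, from MemTotal and MemAvailable.
mem_available_pct() {
    awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
         END { printf "%d\n", a * 100 / t }' "${1:-/proc/meminfo}"
}

# Mount points at or above a usage percentage (default 90).
full_disks() {
    df -P | awk -v limit="${1:-90}" 'NR > 1 {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}
```

For the network side, a manual query such as dig +short example.com, or a netcat connection to a host and port you care about, shows quickly whether lookups and connections are slow. Netcat flags differ between variants, so check your man page.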
Shut down services causing load
This goes hand in hand with knowing what is on your system, so I'm assuming you know what the system does. Let's say its primary function is e-mail filtering.
Then first and foremost shut down all filtering services to stop the load from rising.
This is a decision that must be based on what type of service you're running and your required uptime. E-mail is a very forgiving system and can afford to be shut down for a while.
A website is less forgiving, so it's advisable to have some sort of offline notification you can easily move into place to stop new traffic.
We do this to calm the system and give ourselves time to breathe and troubleshoot. So start eliminating sources of load one by one until the load goes down.
- Shut down the e-mail filtering service
- Block new users from your website application server with a static HTML offline notice
- Failing that, simply shut it down
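Eliminating sources of load one by one can be sketched as a loop. This is an illustration, not something to paste onto a production box blindly. It assumes systemd (systemctl stop) and a 60 second settle time between steps. The function name, the STOP_CMD, LOADAVG_FILE and SETTLE_SECONDS variables, and the service names in the usage line are all made up for this page.

```shell
# Stop services in the given order until the 1-minute load drops below a threshold.
# Usage: shed_load THRESHOLD SERVICE...
# STOP_CMD, LOADAVG_FILE and SETTLE_SECONDS can be overridden for a dry run.
shed_load() {
    threshold="$1"; shift
    for svc in "$@"; do
        load=$(cut -d' ' -f1 "${LOADAVG_FILE:-/proc/loadavg}")
        # awk does the floating point comparison; exit status 0 means load < threshold
        if awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l < t) }'; then
            echo "load $load is below $threshold, nothing more to stop"
            return 0
        fi
        echo "load $load, stopping $svc"
        ${STOP_CMD:-systemctl stop} "$svc"
        # The 1-minute average needs time to react before the next check
        sleep "${SETTLE_SECONDS:-60}"
    done
}
```

A dry run with STOP_CMD="echo would stop" shows what would happen, for example shed_load 4 mailfilter webapp. List the services in the order you're willing to lose them.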
Databases are sensitive
When it comes to database engines like MySQL or PostgreSQL, I would rather advise you to shut down the service using the database than the database itself.
If the database server itself is causing a very high load then you need a DBA, which is out of scope for this page.
Logs and other text files
Checking logs is always important but searching through any text file is part of Linux troubleshooting 101.
Finding logs
The obvious thing is checking the config files of the services you're troubleshooting.
For example:
$ sudo grep -ri 'log' /etc/httpd
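To narrow that output down to actual file paths, something like the following sketch can help. It assumes GNU grep and only catches absolute paths ending in .log, so relative names or names like error_log will be missed. The function name log_paths is made up for this page.

```shell
# List unique absolute paths ending in .log found in files under a directory.
# Usage: log_paths DIR
log_paths() {
    # -r recurse, -h drop file name prefixes, -o print only the matching part
    grep -rhoE '/[A-Za-z0-9_./-]+\.log' "$1" | sort -u
}
```

For example, sudo sh -c with the function defined, or simply running the grep line by hand against /etc/httpd, gives you a short list of files to tail.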
grep filtering
If you don't know grep well, just chain simple greps together in a pipeline.
$ grep 'status=' /var/log/maillog | grep -v 'user@domain.tld'
But grep can be very powerful.
$ grep -E 'status=(foo@domain\.tld|bar@domain\.tld)' /var/log/maillog
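Combined with sort and uniq, grep also gives quick summaries. A small sketch, assuming the log lines carry a status= field followed by a lowercase word, as Postfix does with status=sent or status=deferred. The function name status_counts is made up for this page.

```shell
# Count how often each status= value appears in a mail log, most frequent first.
# Usage: status_counts FILE
status_counts() {
    # -o prints only the matched text, so every output line is one status value
    grep -oE 'status=[a-z]+' "$1" | sort | uniq -c | sort -rn
}
```

Running status_counts /var/log/maillog during an incident shows at a glance whether mail is mostly being delivered or mostly piling up.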