Yesterday I ran into a problem while trying to access one of the web pages on my server (at that time, no monitoring service was installed on the server).
- I was trying to access my blog, but I was getting a 503 error.
- Then I tried to connect via SSH, and the connection could not be established.
- Then I logged into my Digital Ocean account and saw the server graphs showing a CPU usage of 100% and a memory usage of 66%. That was strange.
Then I tried to remember my last actions on the server, and I realized that the last thing I had been doing was reviewing my Docker containers.
To fix it as soon as possible, I rebooted the server and increased its RAM (I didn't know the root cause of the problem at that time).
Today I wanted to be sure what the main problem had been, so I tried the following.
First I reviewed my logs:
sudo nano /var/log/syslog
Digital Ocean's graphs showed that the problem lasted from 20:48 to 23:49 (UTC-4), so I converted that window to UTC:
20:48 (UTC-4) = 00:48 UTC
23:49 (UTC-4) = 03:49 UTC
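As a quick check, GNU `date` can do this conversion. This is only a sketch: the calendar date `2018-07-20` is a placeholder (only the times matter here), and the `to_utc` helper name is made up for illustration. Note that the window crosses midnight, so in UTC it falls on the following day.

```shell
# to_utc: print a timestamp that carries a UTC offset as the
# "Mon DD HH:MM" prefix of the same instant in UTC.
# Assumes GNU date (the -d option).
to_utc() {
    LC_ALL=C date -u -d "$1" '+%b %e %H:%M'
}

to_utc '2018-07-20 20:48 -0400'   # prints: Jul 21 00:48
to_utc '2018-07-20 23:49 -0400'   # prints: Jul 21 03:49
```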
I realized that the "syslog" file did not cover that interval.
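A quick way to see which interval a log file covers, without opening it in an editor, is to print the timestamps of its first and last lines. The sketch below assumes the traditional syslog line format, where the first 15 characters are the timestamp (for example `Jul 21 00:48:01`); `log_span` is a hypothetical helper name.

```shell
# log_span FILE: print the timestamp of the first and last line of FILE.
# Assumes each line starts with a 15-character "Mon DD HH:MM:SS" stamp.
log_span() {
    printf '%s: %s -> %s\n' "$1" \
        "$(head -n 1 "$1" | cut -c1-15)" \
        "$(tail -n 1 "$1" | cut -c1-15)"
}

# Check the current log and the most recent rotated one, if readable.
for f in /var/log/syslog /var/log/syslog.1; do
    if [ -r "$f" ]; then
        log_span "$f"
    fi
done
```

If the files are only readable by root, the loop prints nothing; running the script with `sudo` would be needed in that case.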
Then I checked the previous (rotated) log file:
sudo nano /var/log/syslog.1
But I found a huge number of UFW messages (that was fine, since UFW was working and blocking access to several ports on the server).
So, to skip those messages and narrow down the log output:
# Avoid some string:
sudo grep -v "UFW" /var/log/syslog
# Avoid some string and filter
sudo grep -v "UFW" /var/log/syslog | grep "dockerd"
# Avoid: UFW & level=info and level=warning
# Filter dockerd
sudo grep -v "UFW" /var/log/syslog.1 | grep -v "level=info" | grep -v "level=warning" | grep "dockerd"
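The chain of `grep -v` calls can also be collapsed into one extended regular expression, and the result narrowed to the incident window. This is a sketch, not the exact commands I ran: `dockerd_errors` and `in_window` are made-up helper names, and `in_window` assumes the traditional syslog format where the third whitespace-separated field is `HH:MM:SS`, in a log that covers a single day.

```shell
# dockerd_errors [FILE]: drop UFW, level=info and level=warning lines,
# then keep only dockerd lines. Reads stdin when no FILE is given.
dockerd_errors() {
    grep -Ev 'UFW|level=(info|warning)' "$@" | grep 'dockerd'
}

# in_window FROM TO: keep lines whose time field (HH:MM:SS, field 3)
# lies between FROM and TO inclusive. Compared as strings.
in_window() {
    awk -v from="$1" -v to="$2" '$3 >= from && $3 <= to'
}
```

With these defined, something like `sudo cat /var/log/syslog.1 | dockerd_errors | in_window 00:48:00 03:49:59` would show only the dockerd errors from the window seen in the graphs.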
Then I realized that the root cause was an error in my dockerd:
level=error msg="libcontainerd: error restarting containerd: fork/exec /usr/bin/docker-containerd: cannot allocate memory"
Searching the web, I found that this was a bug in my kernel.
My kernel version was 4.4.0-130-generic x86_64.
The bug was fixed in kernel 4.5.
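To check whether a given server is still on a kernel older than 4.5, the running version can be compared with `sort -V`. This is a minimal sketch assuming GNU coreutils; `version_lt` is a hypothetical helper, and the distro suffix (such as `-130-generic`) is stripped before comparing.

```shell
# version_lt A B: succeed if version A is strictly older than version B.
# Relies on version sort (sort -V) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

running="$(uname -r)"     # e.g. 4.4.0-130-generic
base="${running%%-*}"     # strip the distro suffix, e.g. 4.4.0

if version_lt "$base" "4.5"; then
    echo "Kernel $running is older than 4.5"
else
    echo "Kernel $running is 4.5 or newer"
fi
```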