Case Study: Out of Memory
Website randomly goes down a few times a week
Server stops responding
Network and CPU logs show a small spike, but not enough to lock up a server
Stopping and starting the server resolves the problem
This pattern repeated for several weeks until the customer grew tired of rebooting the server. The evidence did not point to a system issue, a network issue, or a security problem such as a denial of service. The application logs were clean as well. It is also worth noting that this server was a Linux EC2 instance in AWS.
Because rebooting the server resolved the problem every time, it was decided to duplicate the EC2 instance from its snapshot image. This was completed quickly, but the issue appeared again that night and several times the following day. Finally, an error appeared in the system logs that pointed directly to a memory issue.
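The exact log entry is not recorded here. As a hedged illustration, one common sign of memory exhaustion on Linux is a kernel OOM-killer message, and a small helper like the one below can count such lines in a log file. The function name `count_oom_events` and the default log path are assumptions made for this sketch; the system log location varies by distribution.

```shell
#!/bin/sh
# Sketch: count kernel OOM-killer lines in a log file.
# The default path is an assumption (for example /var/log/messages on
# Amazon Linux and RHEL, /var/log/syslog on Debian and Ubuntu).
count_oom_events() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -ciE 'out of memory|oom-killer' "$log" || true
}
```

For example, `count_oom_events /var/log/messages` prints the number of matching lines in that file, and `dmesg | count_oom_events -` does the same for the kernel ring buffer. A non-zero count suggests the kernel ran out of memory at some point.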
After this, it was discovered that the server was a t2.micro instance with 1 GB of RAM and no swap space. With only a single small hard disk, the best course of action was to create a 4 GB swap file. Here are the commands that were executed.
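The exact commands are not shown in this write-up. As a rough sketch, a 4 GB swap file on Linux is typically created along the lines below. The path `/swapfile`, the `run` helper, the `create_swap` function, and the `DRY_RUN` switch are assumptions made for this example, not details from the original server. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: a typical way to create a swap file on Linux.
# All names and defaults below are assumptions for illustration.
SWAPFILE="${SWAPFILE:-/swapfile}"   # assumed location of the swap file
SIZE_MB="${SIZE_MB:-4096}"          # 4 GB, the size used in the case study
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands, 0 = execute them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_swap() {
    # Allocate the file by writing zeros, SIZE_MB blocks of 1 MiB each.
    run dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB"
    # Restrict access to root before the file is used as swap.
    run chmod 600 "$SWAPFILE"
    # Write the swap signature, then enable the new swap area.
    run mkswap "$SWAPFILE"
    run swapon "$SWAPFILE"
    # List active swap areas to confirm the result.
    run swapon --show
}

create_swap
```

Running it with `DRY_RUN=0` as root executes the commands for real. To keep the swap file active across reboots, the usual approach is an `/etc/fstab` entry such as `/swapfile none swap sw 0 0`.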
Since the swap file was created, the server has remained online and stable. This is another reminder that system issues, memory in particular, can cause temporary server unavailability. The cause is not always the network or infrastructure.