Case Study: Out of Memory

Symptoms

  • Website randomly goes down a few times a week

  • Server stops responding

  • Network and CPU logs show a small spike, but not enough to lock up a server

    Ec2_memory_spike

  • Stopping and starting the server resolves the problem

Details

This pattern repeated for several weeks until the customer grew tired of rebooting the server. The evidence did not seem to point to a system issue, a network issue, or a security problem such as a denial-of-service attack. The application logs were clean as well. It is also worth noting that this server was a Linux EC2 instance in AWS.

Troubleshooting

Because rebooting the server resolved the problem every time, the EC2 instance was duplicated from its snapshot image. This was completed quickly, but the issue appeared again that night and several times the following day. Finally, an error appeared in the system logs that pointed directly to a memory issue.

Ec2_memory_error
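
The exact log entry is not preserved in this write-up. As a rough sketch, assuming the error was a kernel out-of-memory (OOM killer) message, a search like the one below can surface such lines. The function name `find_oom_events`, the default log path, and the search patterns are assumptions for illustration, not the commands used in this case.

```shell
#!/bin/sh
# Hypothetical helper: print lines that look like kernel OOM-killer events.
# The default path is an assumption and varies by distribution
# (for example /var/log/messages on Amazon Linux, /var/log/syslog on Ubuntu).
find_oom_events() {
    log_file="${1:-/var/log/messages}"
    # Match typical kernel wording, case-insensitively
    grep -i -E 'out of memory|oom-killer|oom_reaper' "$log_file"
}
```

For example, `find_oom_events /var/log/syslog` scans a specific file. On systems where the kernel ring buffer is readable, `dmesg -T | grep -i -E 'out of memory|oom-killer'` is a similar check that does not depend on a log file path.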

Solution

After this, it was discovered that the server was a t2.micro instance with 1 GB of RAM and no swap space. With only a single small hard disk, the best course of action was to create a 4 GB swap file. Here are the commands that were executed.

Swap file commands
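
The original commands were shown as an image and are not reproduced here. The following is a minimal sketch of a common way to create and enable a 4 GB swap file on Linux, not necessarily the exact sequence that was run. The path `/swapfile`, the variable names, the `run` wrapper with its `DRY_RUN` switch, and the `/etc/fstab` entry are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: the path and size below are assumed defaults, not taken from the case study image.
SWAP_FILE="${SWAP_FILE:-/swapfile}"
SWAP_SIZE_MB="${SWAP_SIZE_MB:-4096}"   # 4096 x 1 MiB blocks = 4 GB

# Print each command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_swap_file() {
    # Allocate the file by writing zeros, so it has no holes
    run sudo dd if=/dev/zero of="$SWAP_FILE" bs=1M count="$SWAP_SIZE_MB"
    # Restrict access: swap can contain sensitive memory contents
    run sudo chmod 600 "$SWAP_FILE"
    # Format the file as swap space, then enable it immediately
    run sudo mkswap "$SWAP_FILE"
    run sudo swapon "$SWAP_FILE"
    # Re-enable the swap file automatically after a reboot
    run sudo sh -c "echo '$SWAP_FILE none swap sw 0 0' >> /etc/fstab"
}
```

Setting `DRY_RUN=1` before calling `create_swap_file` prints the commands without changing the system, which is a safe way to review them first. After a real run, `swapon --show` or `free -h` should list the new swap space.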

Since the creation of the swap file, the server has remained online and stable. This is another great reminder that system issues (specifically memory) can cause temporary server unavailability; the cause is not always the network or infrastructure.