We spend a lot of time monitoring our internal networks. Obviously, this is where we have the most tools at our disposal and where our actual responsibility lies. But to provide good service to our customers and end users, we also need to be aware of what is happening at our Internet providers and further upstream. If you run global services, I recommend monitoring the submarine cables as well. For example, this was the latest submarine cable damage to impact regions in Africa.
I love packets and tracing issues at a micro level. However, as I stated in Preparing for the Capture, you need to know where to capture before you can dig into the bits and bytes. To know where to capture, you must understand your service/app/network, and the best way to do that is to diagram your service.
I generally avoid creating posts that are specific to my employer, but this is already public knowledge and it was fun to be involved, even in a small way. So often we “packet junkies” only get to see the results of our work through the lens of smoothly flowing packets. If we’re lucky, we might hear the delight in our customer’s voice over the phone or get a nice email sharing the results.
Have you ever had a nightmare where you are being chased and you just can’t seem to run away fast enough? No? Well, maybe you’ve tried running through knee-deep snow or swimming while wearing jeans. These are all situations where something isn’t quite right: performance could be better if only something were changed or improved. Sometimes the same thing happens to network devices.
To understand application performance across the network, we first have to understand the basic mechanisms. In this case, that foundation is TCP and, more specifically, the built-in TCP Performance Options. There are many things that can be done in an application to improve performance. There are also several options from a network perspective, and more still in the operating system. However, all of these rely on the underlying protocol.
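The paragraph doesn't list the options. A reasonable assumption is that it means window scaling and timestamps (RFC 7323, "TCP Extensions for High Performance") together with selective acknowledgments (RFC 2018). These are negotiated in the SYN and SYN/ACK, so they are the first thing to check in a capture. Below is a minimal sketch that decodes the raw options area of a TCP header, meaning the bytes after the 20-byte fixed header. MSS is included as well. It is an illustration, not a replacement for a dissector like Wireshark's. The function names `parse_tcp_options` and `effective_window` are hypothetical.

```python
import struct

# Option kinds: EOL/NOP/MSS from the core TCP spec, window scale and
# timestamps from RFC 7323, SACK-permitted and SACK from RFC 2018.
EOL, NOP, MSS, WSCALE, SACK_PERM, SACK, TIMESTAMPS = 0, 1, 2, 3, 4, 5, 8


def parse_tcp_options(data: bytes) -> dict:
    """Decode a TCP options area (bytes after the fixed header) into a dict."""
    opts = {}
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == EOL:          # end of option list
            break
        if kind == NOP:          # single-byte padding
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option")
        length = data[i + 1]     # length includes the kind and length bytes
        if length < 2 or i + length > len(data):
            raise ValueError(f"bad length {length} for option kind {kind}")
        body = data[i + 2 : i + length]
        if kind == MSS and length == 4:
            opts["mss"] = struct.unpack("!H", body)[0]
        elif kind == WSCALE and length == 3:
            opts["window_scale"] = body[0]
        elif kind == SACK_PERM and length == 2:
            opts["sack_permitted"] = True
        elif kind == SACK:
            # Each SACK block is a (left edge, right edge) pair of 32-bit sequence numbers.
            opts["sack_blocks"] = [
                struct.unpack("!II", body[j : j + 8])
                for j in range(0, len(body) // 8 * 8, 8)
            ]
        elif kind == TIMESTAMPS and length == 10:
            opts["timestamps"] = struct.unpack("!II", body)  # (TSval, TSecr)
        else:
            opts.setdefault("unknown", []).append(kind)
        i += length
    return opts


def effective_window(raw_window: int, scale: int) -> int:
    """RFC 7323: shift the 16-bit window field left by the negotiated scale (max 14)."""
    return raw_window << min(scale, 14)


# Example: a SYN carrying MSS 1460, SACK-permitted, timestamps, NOP, window scale 7.
syn_options = (
    bytes([2, 4, 0x05, 0xB4, 4, 2, 8, 10])
    + struct.pack("!II", 12345, 0)
    + bytes([1, 3, 3, 7])
)
print(parse_tcp_options(syn_options))
print(effective_window(512, 7))
```

The window scale a host sends in its SYN applies to the window values it advertises later in the connection. A raw window field of 512 therefore means 64 KiB once a scale of 7 is applied. If either side omits the option, scaling is off for the whole connection and the window can never exceed 65,535 bytes.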
*Disclaimer: all captures in this post were anonymized using TraceWrangler. I was recently asked to help with a performance issue. I was told that a transfer was going to take weeks instead of the expected couple of days: it was topping out at 80 Mbps of throughput on a 10 Gbps connection. So I set up captures at both ends and got to work. This post is a quick summary of that work, highlighting the classic tell-tale signs of a performance problem.
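The post doesn't say here what those tell-tale signs turned out to be. A single flow running far below link speed is often explained by a window too small for the round-trip time, so a quick sanity check is the bandwidth-delay product: throughput ≤ window / RTT. The sketch below uses the 10 Gbps and 80 Mbps figures from the post. The 50 ms RTT is a **hypothetical** placeholder, not a value from these captures, so substitute the RTT you measure from the handshake in your own trace.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one TCP flow's throughput when it is limited by its window."""
    return window_bytes * 8 / rtt_s


LINK_BPS = 10e9      # 10 Gbps connection (from the post)
OBSERVED_BPS = 80e6  # ~80 Mbps observed throughput (from the post)
RTT_S = 0.050        # HYPOTHETICAL 50 ms round-trip time; measure yours from the capture

needed = bdp_bytes(LINK_BPS, RTT_S)      # window required to fill the link
implied = bdp_bytes(OBSERVED_BPS, RTT_S)  # window consistent with the observed rate

print(f"window needed to fill the link:  {needed / 2**20:.1f} MiB")
print(f"window implied by observed rate: {implied / 2**10:.0f} KiB")
```

If the windows you see in the capture sit near the "implied" figure rather than the "needed" one, the flow is likely window-limited. That is one of several possible causes, and the captures themselves are what confirm or rule it out.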