*Disclaimer: all captures in this post were anonymized using TraceWrangler.
I was recently asked to help with a performance issue. I was informed a transfer was going to take weeks instead of a couple of days as expected. The transfer was maxing out at 80 Mbps on a 10 Gbps connection. So, I set up captures at both ends and got to work. This is just a quick summary of that work, with the classic tell-tale signs of a performance problem. The first thing I noticed was 30 zero window segments in a matter of seconds in the “Expert Information” window. One or two might be tolerable under normal circumstances, but 30 is something of interest. Small TCP window sizes and zero window packets generally mean there is a problem with one of the end devices.
This grabbed my attention, so I moved back to the packet list. When looking at the Zero Window packets, I noticed delays anywhere from 100 to 300 ms before the corresponding Window Update. (If you don’t already have a TCP Delta/Delay column set up in your instance, I highly recommend it!)
TCP Zero Window followed by a delay
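The same check can be scripted outside of Wireshark. Here is a minimal Python sketch (using synthetic ACK data, not the actual capture) that flags zero window segments and measures how long the receiver took to send the following window update:

```python
# Minimal sketch: detect TCP zero window segments and measure the delay
# until the receiver's next window update. The packet data here is
# synthetic, standing in for fields you might export from a capture
# (e.g. frame.time_epoch and tcp.window_size).

def zero_window_delays(packets):
    """Yield (zero_window_time, delay_seconds) pairs.

    `packets` is a time-ordered list of (timestamp, window_size) tuples
    for one direction of a TCP conversation (the receiver's ACKs).
    """
    pending = None  # timestamp of the most recent zero window segment
    for ts, window in packets:
        if window == 0:
            pending = ts
        elif pending is not None:
            # First non-zero window after a zero window = window update.
            yield pending, ts - pending
            pending = None

# Synthetic ACK stream: the receiver stalls twice, for 150 ms and 300 ms.
acks = [
    (0.000, 65535),
    (0.010, 0),       # zero window
    (0.160, 65535),   # window update, 150 ms later
    (0.200, 8192),
    (0.210, 0),       # zero window again
    (0.510, 65535),   # window update, 300 ms later
]

for ts, delay in zero_window_delays(acks):
    print(f"zero window at {ts:.3f}s, window update after {delay * 1000:.0f} ms")
```

In the real case the equivalent information came straight from the Expert Information window and a TCP delta column, but scripting it this way is handy when you need to sweep a large capture.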
This was also clearly evident in the Time Sequence Delay graph. This is a classic example of the “stair step” graph; the line should be a steady diagonal up and to the right. Instead, the receiving end cannot keep up with the data flow and is slowing the traffic.
Reviewing the Window Size graph revealed an even more disturbing picture: the server couldn’t keep up with the incoming data at all. The window sizes dropped rapidly, and each drop was followed by a delay before the acknowledgement and window update.
I decided to glance at the TCP options in the packets of the TCP handshake. The calculated window sizes and maximum segment size looked good, but the window scaling left something to be desired.
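For context, the “calculated window size” Wireshark shows is the raw 16-bit window field shifted left by the scale factor negotiated in the SYN/SYN-ACK (RFC 7323). A quick sketch of that arithmetic, with illustrative values rather than numbers from this capture:

```python
def calculated_window(raw_window, scale_factor):
    """Effective receive window: the raw 16-bit window field shifted
    left by the window scale option negotiated in the handshake
    (RFC 7323)."""
    return raw_window << scale_factor

# With no scaling, the window tops out at 64 KB minus one byte...
print(calculated_window(65535, 0))   # 65535
# ...while even a modest scale factor multiplies it substantially.
print(calculated_window(65535, 2))   # 262140
# A weak scale factor can itself cap throughput on a 10 Gbps path,
# which is why it's worth checking in the handshake.
```

This is why a small advertised window or a low scale factor in the handshake is an early warning sign, independent of the zero window events later in the stream.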
These were all classic symptoms of a performance issue on the receiving server. In this case, the server admin performed a few network tweaks and adjusted settings in the security software, and the shrinking window sizes and zero window packets (the visible network symptoms of the root cause) disappeared. Consequently, the delays shortened and performance increased. Unfortunately, the day took a quick turn and I was unable to capture new data for a “before and after” snapshot or ask what “tweaks” he made specifically, but the results speak for themselves. Another performance issue resolved, another happy customer, and a cookie-cutter example of performance indicators in Wireshark.
For 11 years networking was my profession with a specialized focus on proactive and reactive performance analysis. More recently I have embraced the AWS platform. This blog reflects my experience both past and present.
Since I recently took the plunge to a static site hosted on AWS S3, I thought I would create a post outlining the high-level process for future reference. There are quite a few blogs on the interwebs outlining this process, but if this helps someone else too then it’s a win-win. If you are curious as to WHY I migrated, you can find a short bit about that in this post.
This is old news, but I recently found out that Jeff Bezos considered “Relentless” as a possible name for his company instead of Amazon. If this is news to you as well, you can read a short article about it from Business Insider here.
Webster’s Dictionary defines “relentless” this way:
showing or promising no abatement of severity, intensity, strength, or pace
For some reason this word has stuck with me the past couple of weeks.
CloudShark released a new packet capture challenge for this Christmas season! Unfortunately, I don’t have the time to participate right now, but I wanted to reshare this for those of you that do. I also wanted this to serve as a reminder for me to come back to it later. Good luck!
Moving to S3

I’m currently in the process of moving my blog to Amazon S3. I’m not the first to do this and I won’t be the last, and my reasons are similar to everyone else’s. WordPress is an excellent blogging and site platform. I have really enjoyed working with it and getting to know its innards. As I continuously evaluate the purpose of my site, though, I have to keep the technology behind it in sync.
Well, Tom and the team at CloudShark have put together an excellent packet capture challenge on their blog once again. It has actually been a while since I’ve dug into a capture due to my recent shift in focus to Amazon Web Services, so this was a lot of fun for me. I feel like once you’re a “packet junkie” you are always one!
<span style="color: #ff0000;">*SPOILER ALERT*</span> The rest of this post describes the challenge and the process I followed for solving the challenge.
I no longer have a need for the Cisco Meraki MX64. It was only used for testing. It is in working condition. It has been reset to defaults and is unclaimed. It comes in the original box with the power and network cable. See the listing here.
We spend a lot of time monitoring our internal networks. Obviously, this is where we have the most tools at our disposal and where our actual responsibility lies. But to provide good service to our customers and/or end users, we also need to be aware of what is happening at our Internet providers and above. If you have global services, then I recommend you monitor the submarine cables as well. For example, this was the latest submarine cable damage that impacted regions in Africa.
Performance and security are always a balancing act, but in the case of DNSSEC it’s a no-brainer. In short, DNSSEC allows a client to verify that DNS responses actually came from the domain owner. It’s another step in defending your domain (and subsequently your content and network) from the bad guys. An added benefit is that there is no noticeable impact to performance!
CloudFlare just released a great blog post on their DNSSEC offerings and how they are expanding.
So, all credit goes to Colm MacCárthaigh for this one. I think his recent post on Shuffle Sharding is so good it deserves a share and a place on my blog to serve as a reminder for me from time to time. This is one way AWS achieves the level of reliability and stability it has for its customers. Some of the methodology can easily be applied to traditional and on-prem infrastructure as well.
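To make the idea concrete, here is a tiny Python sketch of shuffle sharding (my own illustration, not AWS’s implementation): each customer is deterministically assigned a small random subset of workers, so a poison request from one customer can only affect the few workers in that customer’s shard, and other customers most likely still have healthy workers in theirs:

```python
import random

def shuffle_shard(customer_id, workers, shard_size):
    """Deterministically pick `shard_size` workers for a customer.

    Seeding the RNG with the customer ID makes the assignment stable
    across calls while still looking random across customers.
    """
    rng = random.Random(customer_id)
    return rng.sample(workers, shard_size)

workers = [f"worker-{i}" for i in range(8)]

shard_a = shuffle_shard("customer-a", workers, 2)
shard_b = shuffle_shard("customer-b", workers, 2)

# customer-a only ever touches its own 2 workers, so a bad request from
# it can impact at most 2 of the 8 workers; with many customers, full
# shard overlap between any two of them is rare.
print(shard_a, shard_b)
```

The payoff is combinatorial: with 8 workers and shards of 2 there are 28 possible shards, so the odds that two customers land on exactly the same pair (and thus share their entire fate) are small, and they shrink rapidly as the fleet grows.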