Case of an FQDN Issue

csereno included in Network Performance Case Study

2018-06-02

The complaint “I can’t access my shared drive” was intermittent but becoming common at a remote location connected via an MPLS circuit. Without hesitation, fingers were pointed at the network and my phone rang. People connect to shared drives every day, but it is one of those things they take for granted. Behind the scenes, many layers of technology, protocols, and devices work together to make those connections happen, and I can’t count the number of ways to hinder the performance of a network share or prevent it from working altogether.

*SPOILER* If you’re like me and you like to know the big picture first, the problem in this case was DNS. Read on for the details, or just know that a great test is to place a ‘.’ at the end of the DNS name you are connecting to.

To kick things off, I asked for a screenshot of the error. This is what I received:

Obviously, this was not very helpful (are error messages ever?). After a quick check of client connectivity and circuit performance (remembering that it is good practice to start at the bottom of the OSI model), I determined that the network itself was not the problem. There were no indications of packet loss, high latency, or other errors. Experience told me this was not one of the typical issues I have seen, so I decided to jump right to a packet capture.

The capture was done without any capture filters, so we could see everything that was happening and filter down as needed. This turned out to be a great idea, because a couple of packets that were not between the PC and the NAS caught my eye. Here is something similar to what I saw.

The DNS request packet was showing the domain portion of the fully qualified name twice. If you are still reading this post, my guess is you know what DNS is, but if not, at the most basic level it is the system that translates friendly names we can read and remember into IP addresses. I think we can all agree that “chris.theserenos.com” looks like a typical name, but “chris.theserenos.com.theserenos.com” does not. The doubled name is incorrect, and it explains the client PC’s problem: if the PC asks for the wrong name, the request never reaches the shared drive, so the drive cannot respond.
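When a capture contains many DNS queries, a doubled suffix like the one above can be easy to miss. The following is a minimal sketch of a heuristic that flags it, assuming the queried names have already been exported from the capture as plain strings. The function name `has_doubled_suffix` and the `min_labels` parameter are hypothetical, chosen for illustration, and the check will not catch every form of malformed name.

```python
def has_doubled_suffix(qname, min_labels=2):
    """Heuristic: True if the name ends with the same run of labels twice.

    min_labels is an assumed threshold so that a single repeated label
    (for example "com.com") is not flagged on its own.
    """
    # Drop any trailing root dot and compare case-insensitively.
    labels = qname.rstrip(".").lower().split(".")
    for k in range(min_labels, len(labels) // 2 + 1):
        # Compare the last k labels with the k labels just before them.
        if labels[-k:] == labels[-2 * k:-k]:
            return True
    return False


# Names taken from the example in the post.
queried = ["chris.theserenos.com", "chris.theserenos.com.theserenos.com"]
for name in queried:
    print(name, has_doubled_suffix(name))
```

Only the second name is flagged, because its last two labels repeat the two labels before them.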

If you have an inquisitive mind, you are now asking yourself the same question I was: “Why would the PC append the domain name twice?” Unfortunately, I did not have time to dig into the root cause, but I did know a thing or two to try in order to restore service. The first thing I suggested was to add a period to the end of the fully qualified domain name (FQDN), i.e., chris.theserenos.com. with the trailing dot. Essentially, this tells the client that the name is already fully qualified, so there is no need to append the domain again. Service was immediately restored and the end users were happy. This was a mapped drive that automatically reconnected at login, so they didn’t mind the “extra” period.
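To illustrate why the trailing period helps, here is a deliberately simplified model of how a client might build the list of names it queries. It is a sketch, not the actual logic of any particular operating system: the function `candidate_names`, the suffix list, and the order in which names are tried are all assumptions, and real resolvers apply additional rules that differ between platforms.

```python
def candidate_names(name, search_suffixes):
    """Return the names a simplified resolver model would try, in order."""
    if name.endswith("."):
        # Trailing dot: the name is treated as absolute, nothing is appended.
        return [name]
    # Assumed order: the name as typed first, then each suffix appended.
    candidates = [name + "."]
    for suffix in search_suffixes:
        candidates.append(f"{name}.{suffix.strip('.')}.")
    return candidates


suffixes = ["theserenos.com"]  # hypothetical suffix search list

print(candidate_names("chris.theserenos.com", suffixes))
print(candidate_names("chris.theserenos.com.", suffixes))
```

In this model the name without the trailing period can produce the doubled “chris.theserenos.com.theserenos.com.” query, while the name with the trailing period yields only itself.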

I found a decent discussion on this over at serverfault if you’d like a few more opinions on the matter.

This is more proof that it pays to know your protocols and their expected behavior, and to take good baseline captures. Here are a few good resources for capture repositories, including proper DNS requests and responses.