
DNS Lookups in Kubernetes Workloads

A dive into the importance of properly constructing domain names in workloads running on Kubernetes.


TL;DR: It is most efficient to use absolute domain names (ending in a dot) when accessing resources from a container of a Kubernetes pod; this is particularly important for off-cluster resources. If this is not possible, we can use the pod specification field dnsConfig to modify the behavior of the container’s resolver so that using relative domain names (not ending in a dot) for off-cluster resources is equally efficient.


If this is new to you (it was for me), then let us walk through why it is true.


Let us start with something familiar: what happens under the hood with DNS when we use an Internet browser on our workstation to visit github.com?


Please note: Here we are only considering DNS from the perspective of the local operating system, not the complex system of servers serving up the results.


We can use nslookup to simulate how our workstation’s operating system, or more precisely its resolver, resolves DNS names.


Please note: nslookup only simulates how the resolver works; it does not actually use the resolver itself. As such there are some subtle differences that we will see later.

$ nslookup -debug github.com
Server:  2603:8001:cb01:8fd9:c641:1eff:fee7:d45f
Address: 2603:8001:cb01:8fd9:c641:1eff:fee7:d45f#53
------------
    QUESTIONS:
 github.com, type = A, class = IN
    ANSWERS:
    ->  github.com
 internet address = 192.30.255.112
 ttl = 33
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: github.com
Address: 192.30.255.112

Things to observe:

  • The resolver asks its configured nameserver for the address for github.com (the question) and the nameserver responds with an IP address (the answer)

Now let us repeat this same test when logged into an Ubuntu container of a typical pod running on a Kubernetes cluster.


Please note: First, we had to install the dnsutils package with apt in order to use nslookup.

# nslookup -debug github.com
Server:  10.8.0.10
Address: 10.8.0.10#53
[OMITTED]
    QUESTIONS:
 github.com.default.svc.cluster.local, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.default.svc.cluster.local: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com.svc.cluster.local, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.svc.cluster.local: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com.cluster.local, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.cluster.local: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com.us-central1-c.c.red-forklift-301112.internal, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.us-central1-c.c.red-forklift-301112.internal: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com.c.red-forklift-301112.internal, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.c.red-forklift-301112.internal: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com.google.internal, type = A, class = IN
    ANSWERS:
[OMITTED]
** server can't find github.com.google.internal: NXDOMAIN
[OMITTED]
    QUESTIONS:
 github.com, type = A, class = IN
    ANSWERS:
    ->  github.com
 internet address = 140.82.114.3
 ttl = 42
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: github.com
Address: 140.82.114.3
------------
    QUESTIONS:
 github.com, type = AAAA, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  github.com
 origin = dns1.p08.nsone.net
 mail addr = hostmaster.nsone.net
 serial = 1611586688
 refresh = 43200
 retry = 7200
 expire = 1209600
 minimum = 3600
 ttl = 1799
    ADDITIONAL RECORDS:
------------

Things to observe:

  • Notice here that the resolver asks its configured nameserver six questions, each answered with an NXDOMAIN message, before it asks for github.com, which is answered with an IP address

  • NXDOMAIN is the DNS response the resolver receives when the requested domain name does not exist and therefore cannot be resolved to an IP address

Before digging into what happened here, let us perform a different test from the Ubuntu container.

# nslookup -debug github.com.
Server:  10.8.0.10
Address: 10.8.0.10#53
------------
    QUESTIONS:
 github.com, type = A, class = IN
    ANSWERS:
    ->  github.com
 internet address = 140.82.112.3
 ttl = 59
    AUTHORITY RECORDS:
    ADDITIONAL RECORDS:
------------
Non-authoritative answer:
Name: github.com
Address: 140.82.112.3
------------
    QUESTIONS:
 github.com, type = AAAA, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  github.com
 origin = dns1.p08.nsone.net
 mail addr = hostmaster.nsone.net
 serial = 1611586688
 refresh = 43200
 retry = 7200
 expire = 1209600
 minimum = 3600
 ttl = 1799
    ADDITIONAL RECORDS:
------------

Things to observe:

  • Notice the extra dot at the end of the domain name

  • As in the test from our workstation, the resolver asks its configured nameserver for the address for github.com (the question) and the nameserver responds with an IP address (the answer)

To better understand what happened, we first explore the difference between an absolute and a relative domain name.

When a user needs to type a domain name, the length of each label is omitted and the labels are separated by dots (“.”). Since a complete domain name ends with the root label, this leads to a printed form which ends in a dot. We use this property to distinguish between:

  • a character string which represents a complete domain name (often called “absolute”). For example, “poneria.ISI.EDU.”

  • a character string that represents the starting labels of a domain name which is incomplete, and should be completed by local software using knowledge of the local domain (often called “relative”). For example, “poneria” used in the ISI.EDU domain.

— IETF — Domain Names — Concepts and Facilities

Let us also look at our workstation’s resolver configuration.

$ cat /etc/resolv.conf
#
# macOS Notice
#
# This file is not consulted for DNS hostname resolution, address
# resolution, or the DNS query routing mechanism used by most
# processes on this system.
#
# To view the DNS configuration used by this system, use:
#   scutil --dns
#
# SEE ALSO
#   dns-sd(1), scutil(8)
#
# This file is automatically generated.
#
search natmtn.rr.com
nameserver 2603:8001:cb01:8fd9:c641:1eff:fee7:d45f
nameserver 192.168.1.1

And the Ubuntu container’s resolver configuration.

# cat /etc/resolv.conf 
nameserver 10.8.0.10
search default.svc.cluster.local svc.cluster.local cluster.local us-central1-c.c.red-forklift-301112.internal c.red-forklift-301112.internal google.internal
options ndots:5

Things to observe:

  • The relevant difference between the configurations is the ndots option

The ndots option:

Sets a threshold for the number of dots which must appear in a name given to res_query(3) (see resolver(3)) before an initial absolute query will be made. The default for n is 1, meaning that if there are any dots in a name, the name will be tried first as an absolute name before any search list elements are appended to it.

— Linux manual page — resolv.conf(5)


Thinking about the workstation’s and Ubuntu container’s ndots option values, we can explain the three different test results:

  • Workstation with github.com: The domain name is a relative name with one dot. With an ndots of one (one dot is greater than or equal to one), the domain name is sent directly to the nameserver to be resolved as an absolute domain name

  • Ubuntu container with github.com: As before, the domain name is a relative name with one dot. With an ndots of five (one dot is less than five), each of the six search list elements is appended to the domain name in turn and the result is sent to the nameserver to be resolved (here, all returning NXDOMAIN). Only then is the domain name sent to be resolved as an absolute domain name

  • Ubuntu container with github.com.: The domain name is an absolute name, so it is sent directly to the nameserver to be resolved

The ndots of five is a Kubernetes feature that is designed to optimize for relative name lookups of cluster resources; see a response to a Kubernetes issue for a discussion of the rationale.

So what is the big deal? To see, let us set up a number of workloads to explore the impact of having an ndots of five. The workloads are:

  • nginx: A deployment with an nginx container exposed by a service with the absolute domain name nginx.default.svc.cluster.local. (a sketch of these manifests appears below)

  • app-1: A deployment and service with a Node.js web application that first accesses the nginx service using the relative domain name nginx.default.svc.cluster.local and then returns a response.

  • app-2: The same Node.js web application; just using the absolute domain name nginx.default.svc.cluster.local. (note the trailing dot)

  • load-1: A job that puts a load on app-1; 10 queries per second (qps) for 30 seconds

  • load-2: A job that puts a load on app-2; 10 queries per second (qps) for 30 seconds

Please note: While nginx.default.svc.cluster.local is a domain name for a cluster resource, it will still resolve when sent as an absolute domain name. This simulates the behavior of using a domain name for an off-cluster resource, such as one accessed via the domain name github.com.
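For reference, a minimal sketch of the nginx deployment and service could look like the following; the names, labels, and image tag here are illustrative assumptions, not the exact manifests used:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19        # illustrative image tag
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx                    # in the default namespace, so it resolves as nginx.default.svc.cluster.local
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80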

Let us examine the performance of the two jobs. First the job putting a load on app-1; the one using a relative domain name.

$ kubectl logs load-app-1-92gxz
[OMITTED]
Code 200 : 300 (100.0 %)
Response Header Sizes : count 300 avg 227 +/- 0 min 227 max 227 sum 68100
Response Body/Total Sizes : count 300 avg 239 +/- 0 min 239 max 239 sum 71700
All done 300 calls (plus 4 warmup) 13.309 ms avg, 10.0 qps

Then the job putting a load on app-2; the one using an absolute domain name.

$ kubectl logs load-app-2-vzf8f
[OMITTED]
Code 200 : 300 (100.0 %)
Response Header Sizes : count 300 avg 227 +/- 0 min 227 max 227 sum 68100
Response Body/Total Sizes : count 300 avg 239 +/- 0 min 239 max 239 sum 71700
All done 300 calls (plus 4 warmup) 6.843 ms avg, 10.0 qps

Things to observe:

  • app-1, using the relative domain name, has about a 6 ms performance penalty as compared to app-2, using the absolute domain name

  • The penalty increases linearly, about 6 ms each, with the number of relative domain names (with fewer than five dots) for off-cluster resources accessed by the application

Thus, it is best to use absolute domain names when accessing off-cluster resources from a container of a Kubernetes pod. While it is also most efficient to use absolute domain names for on-cluster resources, the impact of using relative names depends on how deep the resolver needs to iterate through the search list elements to get a resolution.
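One way to keep this under our control in our own applications is to pass the absolute domain name in through configuration rather than hard-coding it. Here is a minimal sketch of a container spec that does this; the environment variable and image name are hypothetical:

containers:
- name: app-2
  image: node-app:latest                         # illustrative image name
  env:
  - name: NGINX_HOST                             # hypothetical variable read by the application
    value: "nginx.default.svc.cluster.local."    # trailing dot makes the name absolute

The application then builds its request URL from NGINX_HOST, so switching between relative and absolute names requires no code change.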


This approach works fine when we can control the domain names accessed by a workload, but there are circumstances when that is not possible; for example, when we are using pre-built container images or third-party libraries in our containers that access off-cluster resources. For situations like this, we can provide the pod specification field dnsConfig to control the value of ndots in the container, e.g.,

dnsConfig:
  options:
  - name: ndots
    value: "1"

In this case, domain name resolution in the container will behave like on our workstation: any domain name containing a dot is first resolved as an absolute domain name.
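For context, here is a minimal sketch of where dnsConfig sits in a full deployment’s pod template; the names and image are illustrative assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-3
  template:
    metadata:
      labels:
        app: app-3
    spec:
      dnsConfig:                 # options here are merged into the container's /etc/resolv.conf
        options:
        - name: ndots
          value: "1"
      containers:
      - name: app
        image: node-app:latest   # illustrative image name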


Please note: When the resolver first sends a relative domain name as an absolute domain name (because its number of dots is greater than or equal to ndots) and receives an NXDOMAIN response, it will then continue by sending the name suffixed with each of the search list elements to the nameserver for resolution. This is an example of where nslookup behaves differently than the resolver; it definitely threw me off for some time.


With this dnsConfig, we would want to ensure that we do not use relative domain names with one or more dots in them when accessing cluster resources; otherwise we would pay the penalty of an NXDOMAIN response for every such access. In this case, it is best to use absolute domain names for accessing cluster resources.


To illustrate this situation, we set up two additional workloads:

  • app-3: A pod with the dnsConfig shown above, running the same Node.js web application; using the relative domain name nginx.default.svc.cluster.local

  • load-3: A job that puts a load on app-3; 10 queries per second (qps) for 30 seconds

Please note: As before, nginx.default.svc.cluster.local simulates the behavior of using a domain name for an off-cluster resource.


Let us examine the performance of this job.

$ kubectl logs load-app-3-lgfvw
[OMITTED]
Code 200 : 300 (100.0 %)
Response Header Sizes : count 300 avg 227 +/- 0 min 227 max 227 sum 68100
Response Body/Total Sizes : count 300 avg 239 +/- 0 min 239 max 239 sum 71700
All done 300 calls (plus 4 warmup) 7.121 ms avg, 10.0 qps

Things to observe:

  • Even though app-3 uses a relative domain name, it has essentially the same performance as app-2, the one using an absolute domain name


Wrap Up

Having used Kubernetes for some time, I was surprised it took me this long to learn about this topic. Hope you found it useful.


Source: Medium

