Who is resolving my names?

Image generated by Bing Image Creator. I felt like getting in on the fad.

Let's say I have a Linux network namespace, with just a Wireguard interface moved from the main netns, as described in the Wireguard documentation; all traffic must go through Wireguard, as it's the only way out. This solution is fairly neat, since it lets you isolate some processes and force them to run through a Wireguard VPN, fully transparently to the application.

Now, onto name resolution. I use NetworkManager, and I don't use systemd-resolved yet, so in theory glibc should send name resolution requests to whatever nameserver is listed in /etc/resolv.conf.

I want to make sure DNS goes to the correct nameserver, the one on the other side of the VPN, when I enter this namespace. OK, easy enough: ip netns exec already has functionality for this. If you create /etc/netns/[namespace]/resolv.conf, ip netns exec will bind mount it over /etc/resolv.conf inside a new mount namespace. So all it takes is running the program with ip netns exec, or performing the same steps manually.
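For reference, here is a minimal sketch of roughly the steps ip netns exec performs before running the command, as I understand them: join the named network namespace, create a private mount namespace, then bind mount the per-namespace resolv.conf. The helper name enter_netns_with_resolv_conf is hypothetical, and the real tool does more than this (for example, it handles every file under /etc/netns/[namespace], not just resolv.conf), so treat it as an illustration rather than a reimplementation.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper approximating `ip netns exec NAME ...` before exec.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enter_netns_with_resolv_conf(const char *name)
{
    char path[PATH_MAX];

    /* 1. Join the network namespace created by `ip netns add NAME`. */
    snprintf(path, sizeof path, "/run/netns/%s", name);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int rc = setns(fd, CLONE_NEWNET);
    close(fd);
    if (rc < 0)
        return -1;

    /* 2. Get our own mount namespace and stop mount events from
     *    propagating back to the host. */
    if (unshare(CLONE_NEWNS) < 0)
        return -1;
    if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;

    /* 3. If a per-namespace resolv.conf exists, overlay it on /etc/resolv.conf. */
    snprintf(path, sizeof path, "/etc/netns/%s/resolv.conf", name);
    if (access(path, F_OK) == 0 &&
        mount(path, "/etc/resolv.conf", NULL, MS_BIND, NULL) < 0)
        return -1;
    return 0;
}
```

The caller would then execvp() the target program. All of this needs root (or CAP_SYS_ADMIN), and, as the rest of this post shows, it only controls which resolv.conf glibc reads, not who actually performs the lookup.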

Right?

Wrong

Using online DNS leak detection tools, I could tell that a browser running within the namespace was somehow still querying the other, original nameserver, the one discovered via DHCP. At this point it occurred to me that maybe it wouldn't be a bad idea to just force all unencrypted DNS traffic within the netns to go to the desired nameserver, for good measure. So I tried to patch this leak using iptables, like you would.

ip netns exec "${NAMESPACE}" iptables -t nat -A OUTPUT -p udp --dport 53 -j DNAT --to "${NAMESERVER}"
ip netns exec "${NAMESPACE}" iptables -t nat -A OUTPUT -p tcp --dport 53 -j DNAT --to "${NAMESERVER}"

It's a pretty heavy-handed solution, but it's essentially guaranteed to work. Applications can of course use DoT or DoH or any other mechanism they choose over TCP/UDP to resolve names, if they want, but at the very least, our glibc resolver should definitely, 100%, absolutely for sure use the one that we want it to.

...

End of the blog post, right? ...Right?

Still wrong

I thought I knew how my machine's DNS was configured. After all, it's simple: glibc is configured via /etc/nsswitch.conf. I have some crap in there (like mdns, etc.), but mainly, the hosts: line is just dns. So in theory, an application using glibc is going to read /etc/resolv.conf, then call out to the nameserver over UDP or TCP, and everything is good.
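To see which services the hosts: line actually lists, a simplified reader is enough. This is a minimal sketch: the function name nss_hosts_services is hypothetical, and it ignores parts of the real nsswitch.conf grammar (it simply treats everything after a '#' as a comment and takes the first hosts: line).

```c
#include <stdio.h>
#include <string.h>

/* Copy the service list from the first "hosts:" line of an nsswitch.conf-style
 * stream into out, e.g. "files mdns_minimal [NOTFOUND=return] dns".
 * Returns 0 if a hosts line was found, -1 otherwise. */
int nss_hosts_services(FILE *f, char *out, size_t outlen)
{
    char line[512];

    while (fgets(line, sizeof line, f) != NULL) {
        char *p = line + strspn(line, " \t");   /* skip leading blanks */
        if (strncmp(p, "hosts:", 6) != 0)
            continue;
        p += 6;
        p += strspn(p, " \t");                  /* skip blanks after the colon */
        p[strcspn(p, "#\n")] = '\0';            /* cut comment and newline */
        size_t n = strlen(p);
        while (n > 0 && (p[n - 1] == ' ' || p[n - 1] == '\t'))
            p[--n] = '\0';                      /* trim trailing blanks */
        snprintf(out, outlen, "%s", p);
        return 0;
    }
    return -1;
}
```

Calling it on fopen("/etc/nsswitch.conf", "r") shows the configured lookup order. Note that nothing in this file mentions nscd: whether glibc consults nscd first is decided elsewhere, which is exactly what bit me below.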

Except it's not. When I run sudo ip netns exec [namespace] curl example.com, tcpdump port 53 clearly shows that curl, despite running in a namespace whose only way out is Wireguard, is somehow making a DNS request via the primary network interface in the default network namespace:

19:31:11.573718 IP xxxxx.xxxxx > xxxxx.local.domain: xxxxx+ [1au] A? example.com. (40)

Oh dear. This is not just a matter of using the wrong DNS server. This is straight-up leaking through the network namespace!
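The leak can be reproduced without curl: any program that resolves a name through glibc's getaddrinfo() goes through the same NSS path. Below is a minimal sketch; the name resolve_and_print is hypothetical, and it only prints the addresses it gets back.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host via getaddrinfo() (i.e. through NSS, and through nscd if its
 * socket is reachable) and print each address. Returns the number of
 * addresses printed, or -1 if resolution failed. */
int resolve_and_print(const char *host)
{
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address, not per socket type */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    int count = 0;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void *addr = ai->ai_family == AF_INET
            ? (const void *)&((struct sockaddr_in *)ai->ai_addr)->sin_addr
            : (const void *)&((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
        if (inet_ntop(ai->ai_family, addr, buf, sizeof buf) != NULL) {
            printf("%s\n", buf);
            count++;
        }
    }
    freeaddrinfo(res);
    return count;
}
```

Wrapped in a trivial main() that passes argv[1], it can be run as sudo ip netns exec [namespace] strace -f -e trace=connect ./resolve example.com, which shows which socket glibc connects to for the lookup.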

Finding out what's going on

At this point it feels like something impossible is happening. Clearly, the process doing the name resolution is not the one requesting the name resolution. However, with applications that I know for sure use glibc to resolve names, I can see the lookup magically teleport outside of the namespace and go out over the default interface. What exactly is going on here?

I decided the next plan of attack was to go in with strace and see exactly what it's calling at the syscall level. I am, at this point, paranoid that this rabbit hole will go on for weeks until someone simply tells me the solution to the riddle.

However, much to my surprise, and frankly, relief, this immediately led me to the answer.

connect(3, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 

...It was nscd.

What's Nscd?

I've been using Linux for a disturbing percentage of my life, and nscd is a familiar name from long ago. The thing is, though, I thought it was generally considered obsolete. Why was it running on my machine? I don't believe I configured it explicitly. The description of nscd is pretty much what you'd guess from the letters in its name:

Nscd is a daemon that provides a cache for the most common name service requests. The default configuration file, /etc/nscd.conf, determines the behavior of the cache daemon. See nscd.conf(5).

But I don't think my requests are cached. In fact, they pretty obviously aren't, at least for DNS requests. So what's going on here?

I use NixOS, so most likely I can figure out what's going on by grepping for nscd in Nixpkgs. And almost immediately, I got my answer:

Whether to enable the Name Service Cache Daemon. Disabling this is strongly discouraged, as this effectively disables NSS Lookups from all non-glibc NSS modules, including the ones provided by systemd.

Of course. Dynamically linked NSS modules constitute global state, so NixOS doesn't want to provide NSS modules that way. So instead, they rely on nscd: nscd can be provided all of the NSS modules, and everything else can just connect to it over a socket. At least I think that's the idea here.
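This also explains the "teleporting" lookup: as I understand it, nscd is a separate daemon started in the default network namespace, so any DNS query it makes on a client's behalf leaves through that namespace's interfaces, no matter where the client lives. A minimal sketch for checking this, assuming /proc is mounted: compare the namespace handles of two processes, which namespaces(7) says refer to the same namespace when their device and inode numbers match. The function same_netns and its "pid 0 means the caller" convention are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build the /proc path of a process's network namespace handle.
 * In this sketch, pid 0 means "the calling process". */
static void netns_path(char *buf, size_t len, pid_t pid)
{
    if (pid == 0)
        snprintf(buf, len, "/proc/self/ns/net");
    else
        snprintf(buf, len, "/proc/%ld/ns/net", (long)pid);
}

/* Returns 1 if both processes share a network namespace, 0 if not, and -1
 * if either handle cannot be stat()ed (no such process, or not permitted). */
int same_netns(pid_t a, pid_t b)
{
    char path[64];
    struct stat sa, sb;

    netns_path(path, sizeof path, a);
    if (stat(path, &sa) < 0)
        return -1;
    netns_path(path, sizeof path, b);
    if (stat(path, &sb) < 0)
        return -1;
    /* Same namespace iff the handles have the same device and inode. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Run as root inside the jailed namespace, same_netns(0, nscd_pid), with nscd_pid taken from pidof nscd, should return 0: the client and the daemon doing its lookups are in different network namespaces.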

This was surprisingly easy to figure out once I realized what was going on, but how do we fix it?

A workaround

For the most part, I don't really need a fancy solution to the NSS problem. I just want to be able to ensure that my namespaces stay isolated in their Wireguard VPN world, so that everything works as expected.

In my case, I'm not actually using ip netns exec, so I can control exactly what happens when we enter the namespace. So here's my solution: after unsharing the mount namespace, I bind mount /var/empty over /var/run/nscd, which hides the nscd socket. It looks roughly like this in C (don't worry: the actual version of this does error checking and everything like that):

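#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <sys/mount.h>  /* mount(), MS_* flags */
unshare(CLONE_NEWNS);                                      /* new mount namespace */
mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL);       /* keep our mounts private */
mount("...", "/etc/resolv.conf", NULL, MS_BIND, NULL);     /* namespace's resolv.conf */
mount("/var/empty", "/var/run/nscd", NULL, MS_BIND, NULL); /* hide the nscd socket */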

Now, when I enter this namespace, glibc can no longer reach nscd, so it falls back to resolving names itself, and the DNS requests go to the intended nameserver.
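One way to sanity-check the workaround from inside the namespace is to attempt the same connect() that showed up in the strace output. A minimal sketch; the helper name unix_socket_reachable is hypothetical:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if something accepts connections on the AF_UNIX stream socket
 * at path, 0 otherwise (missing file, nobody listening, path too long). */
int unix_socket_reachable(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path)
        return 0;

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}
```

On a NixOS host, unix_socket_reachable("/var/run/nscd/socket") should return 1 in the default namespace and 0 once /var/empty is mounted over /var/run/nscd.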

Conclusion

It goes without saying that you should most likely not be duct-taping together Linux namespaces if you're in a life-or-death scenario. If you absolutely need security or privacy, you should consider an operating system like Qubes that has a layered approach, or at least use software set up to have a good configuration out of the box, like Tor Browser or Mullvad Browser. Just because I patched this hole does not mean this setup is secure or that it doesn't leak. If you forget to disable WebRTC PeerConnection, you have another blatant hole potentially leaking information, and there are plenty more examples.

However, I still find the ability to "jail" processes such that they can only communicate through a VPN to be very useful; there are many practical applications. It's more ergonomic than wrangling virtual machines, more flexible and performant than messing with SOCKS proxies, and certainly less obtrusive than routing your entire machine through a VPN just so you can do some work using one. It can be a useful way to use a VPN, as long as you're not doing anything security-critical with it. If you're doing some web scraping or bypassing region detection, it can be a fine solution.

With all of that having been said, I am a bit bothered by the way this played out. Even if you're aware that the system has nscd installed and configured, the consequences may not be obvious. Someone once opened an issue about the use of nscd in NixOS, but it was mostly concluded that there was no particular problem with the approach; as far as I could see, nobody raised this particular issue. I do agree that nscd is a more elegant solution to the NSS module problem than many of the alternatives, but I wonder if what I just ran into presents a decent case against this approach and for the less elegant approach with global state. Is it possible nobody realized this consequence?

With that said, I'm happy enough with my workaround, so I'm considering this case closed for me personally.