TL;DR: iptables rules are dropping the packets!

The setup

We’re preparing to go through an ISO 27001 security audit at work.
One requirement is to ensure that all operating systems are locked down.

We chose to follow the Center for Internet Security (CIS) Level 1 benchmark for Ubuntu. Conveniently, Microsoft Azure has pre-configured CIS images, but that convenience comes at a cost: using the images incurs an additional $0.02/hr expense.

We could have manually configured the OS, but that takes time, and we’re under the gun to hit our audit date. We’ll eat the 2 cents per hour to hit our date, and then go back and do things manually if we deem it worth the $175/year (per server).
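For the curious, the $175 figure is just the hourly surcharge run out over a full year. A quick sketch of the arithmetic, assuming the VM runs 24 hours a day, 365 days a year (the variable name is made up for illustration):

```shell
# Assumptions: $0.02/hr surcharge, VM never shut down, non-leap year.
yearly_cost=$(awk 'BEGIN { printf "%.2f", 0.02 * 24 * 365 }')
echo "CIS image surcharge: \$${yearly_cost} per server per year"
```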

The “Before” picture

We had a vanilla Ubuntu 18.04 VM running the latest version of Docker. Our web server made an API call to Container#1, and Container#1 in turn made an API call to Container#2. Container#2 then did some unit of work. There’s not much to it.

The “After”…

A picture would be redundant; just imagine the link is severed between Container#1 and Container#2.

We stood up the CIS Level 1 benchmark Ubuntu 18.04 VM, also running the latest version of Docker. On this configuration, Container#1 would time out when trying to hit Container#2.

Finding the issue…

From our web server, I called https://10.36.20.11:6200 and got a successful response back from Container#1. Okay, that was expected, but you have to start somewhere. Unplug it, and plug it back in, right?

From our web server, I called https://10.36.20.11:6201 and got a successful response back from Container#2. Okay, that was sort of unexpected. I really thought I would get a timeout. Still, at least I knew Container#2 was running and responsive.

Next, I tried the exact same tests from within the Ubuntu VM itself.

I ssh’d in and ran ~$ curl https://10.36.20.11:6200. Still worked.
I ran ~$ curl https://10.36.20.11:6201 and that worked as well. Okay, so the containers themselves are all good. This is 99.9% a networking issue, leaving 0.1% for gremlins and other uncertainties. Is the firewall on? I checked Ubuntu’s Uncomplicated Firewall (UFW) and that was disabled. Hmm…
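The checks above are easy to repeat from different vantage points with a small wrapper. This is only a sketch: the `probe` name and the 5-second limit are assumptions, chosen so that a silently dropped packet shows up as a quick failure instead of a long hang (curl exits with code 28 when `--max-time` is exceeded).

```shell
# probe URL: report whether an endpoint answers within 5 seconds.
# -s hides the progress meter, -o /dev/null discards the response body,
# --max-time caps the whole request so a dropped packet fails fast.
probe() {
    if curl -s -o /dev/null --max-time 5 "$1"; then
        echo "OK      $1"
    else
        # $? here is still curl's exit status (28 means the request timed out).
        echo "FAILED  $1 (curl exit $?)"
    fi
}
```

Running `probe https://10.36.20.11:6200` and `probe https://10.36.20.11:6201` from the web server and from the VM covers the tests above. Running the same curl from inside Container#1 (for example through `docker exec`, assuming the image ships curl) is the hop that actually reproduces the timeout.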

Peeling the onion

Okay, so what does a CIS Level 1 benchmark actually do? Where do we start? For me, Google. I came across this fantastic resource which detailed (in section 3) all of the network changes that are made.

Disclaimer: I’m not a Network Admin.

How do Docker containers talk to each other? Is there some low-level inter-process communication going on? Looking at the CIS benchmark, I saw that there were changes to packet redirect sending, source routed packets, and reverse path filtering. I had no idea what any of these were, but they sounded promising. I undid those changes & rebooted. No luck.
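For reference, those three benchmark items usually correspond to the kernel parameters in the sketch below, which only reads their current values. The exact keys are an assumption based on the published CIS Ubuntu benchmark, and the function name is made up for illustration.

```shell
# Print the current value of the kernel parameters behind packet redirect
# sending, source routed packets, and reverse path filtering.
show_net_sysctls() {
    for key in \
        net.ipv4.conf.all.send_redirects \
        net.ipv4.conf.all.accept_source_route \
        net.ipv4.conf.all.rp_filter
    do
        # Each sysctl key maps to a file under /proc/sys (dots become slashes).
        path="/proc/sys/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            printf '%s = (not readable here)\n' "$key"
        fi
    done
}
```

Calling `show_net_sysctls` before and after a change is a cheap way to confirm the setting actually moved.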

Continuing on, there were changes to hosts.allow & hosts.deny. Simple enough to change…no luck. Oh, what about Stream Control Transmission Protocol or Reliable Datagram Sockets? Nope. None of those settings did anything either.

Scanning down the next section…

omg I’m an idiot.

It was the firewall. It just wasn’t the UFW flavor - it was iptables. CIS makes a bunch of changes to DROP packets by default instead of ACCEPT. Some of these packets, no surprise, were my Docker packets :(

These are the default rules put in place by the CIS Level 1 benchmark.
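On a live host, `sudo iptables -S` prints the ruleset, with each chain’s default policy on a line such as `-P INPUT DROP`. A minimal sketch for picking out the chains that drop by default (the helper name `drop_policy_chains` is made up for this example):

```shell
# Reads `iptables -S` output on stdin and prints every built-in chain whose
# default policy is DROP. Policy lines have the form: -P <CHAIN> <TARGET>
drop_policy_chains() {
    awk '$1 == "-P" && $3 == "DROP" { print $2 }'
}
```

Piping `sudo iptables -S` into `drop_policy_chains` on the hardened VM should list the chains whose policy was flipped from ACCEPT to DROP.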

We can see that iptables DROPs all traffic on both the INPUT and OUTPUT chains by default, and then adds in some exceptions. Looking at (i.e. Googling) the Docker network stack, we find that containers on Docker’s default bridge network run on the 172.17.0.0/16 subnet. Therefore, we can add one more exception to our iptables ruleset:

-A INPUT -s 172.17.0.0/16 -j ACCEPT

Boom! Everything works! And with that, we’re in business!
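For anyone retracing this, here is a rough sketch of confirming the subnet and building the rule rather than hard-coding it. It assumes the containers sit on Docker’s default `bridge` network; both helper names are invented for this example.

```shell
# Ask Docker for the default bridge's subnet (172.17.0.0/16 unless reconfigured).
bridge_subnet() {
    docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
}

# Build the ACCEPT rule for a given source subnet, in iptables rule syntax.
accept_rule_for() {
    printf '%s\n' "-A INPUT -s $1 -j ACCEPT"
}
```

Something like `sudo iptables $(accept_rule_for "$(bridge_subnet)")` would then add the rule to the running ruleset, and `sudo iptables -C INPUT -s 172.17.0.0/16 -j ACCEPT` exits 0 once it is in place. A rule added this way lives only in memory, so it still has to go into whatever file the image loads its rules from at boot.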

These are a bunch of commands I used when troubleshooting - including some Docker stuff, some iptables stuff, tcpdump, and a little bit of scp: