I was talking to a friend (let’s call him Jack) about Docker and what it could be used for, how the management platforms were cropping up, etc. We were nattering on happily until I said:
If you are installing a complete operating system in your container, different from the host’s, you’re probably doing it wrong. –me
That’s where we started spinning our wheels.
For me, there are two sides to this coin:
If you are installing Ubuntu on RHEL in order to run an application, you’re going to have a bad time. When it comes to building a shippable application container, I’m fairly certain there’s a whole lot of dependency resolution you could do without installing a new Linux distro. Easily shippable, composed layers don’t mean we get to be lazy.
You also now have two layers of OS that need to be managed, maintained, patched, and secured. You’re just doing it at the image-layer level instead of at the VM level. You also take on a new set of entanglements in ensuring that the patched layer gets deployed, updated in registries, and actually used, plus a new workflow, different from all the other workflows, for something that is already standardized at other layers. Overall it’s messier, and it can be avoided by appropriately defining the “application” that goes in the container.
The flip side was Jack’s argument: Docker presents a standard set of calls that a developer can code against, and as long as code is written to Docker’s exposed calls, it will ultimately be portable.
Now that’s a big departure from my understanding of what Docker presents to containers. My stance was that Docker controls access to system calls via capabilities, but the syscalls themselves are always implemented by the host kernel.
We got stuck there and never came to a good answer over breakfast (not that the conversation wasn’t good!), so I decided to do more research.
I’m not the first one to have this question about kernel namespaces and Docker, and the answers come from Jérôme Petazzoni, whose Docker security talk I saw at OSCON. I’m not so sure about his comment that the kernel ABI is stable across multiple versions, but the answer seems to support my point: Docker isn’t writing calls that mask the host kernel; it just manages access to it.
So the next question is: how does it manage that access? Through the kernel. Since version 2.2, the Linux kernel has had a mechanism called capabilities, which splits the privileges traditionally held by root into distinct units that can be granted to or withheld from a process. By default, Docker grants containers only a limited set of capabilities, restricting the privileged operations available to the processes inside. So Docker may deny a process access to sched_setaffinity(), but how sched_setaffinity() is called, and what it does, is determined by the host kernel. There isn’t a docker_sched_setaffinity() against which your apps need to be written.
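A quick way to see this for yourself is to ask the kernel who it is from inside a container. Below is a minimal Python sketch, assuming a Linux host and an image that ships Python and /etc/os-release; the helper names are made up for illustration. On, say, an Ubuntu image running on a RHEL host, I’d expect it to print the RHEL kernel release next to the Ubuntu userland name, because uname(2) is answered by the host kernel, not by anything in the image.

```python
import os
from typing import Iterable


def kernel_release() -> str:
    """Release string of the running kernel, as reported by uname(2).

    The kernel answers this call itself, so inside a container it reports the
    host's kernel, whatever distro the image was built from.
    """
    return os.uname().release


def parse_pretty_name(lines: Iterable[str]) -> str:
    """Pull PRETTY_NAME out of os-release style lines; 'unknown' if absent."""
    for line in lines:
        if line.startswith("PRETTY_NAME="):
            return line.split("=", 1)[1].strip().strip('"')
    return "unknown"


def image_distro(path: str = "/etc/os-release") -> str:
    """Name of the distro whose files were shipped in the image (userland only)."""
    try:
        with open(path) as f:
            return parse_pretty_name(f)
    except FileNotFoundError:
        return "unknown"


if __name__ == "__main__":
    print("kernel (from the host):   ", kernel_release())
    print("userland (from the image):", image_distro())
```

The two lines come from different places: one from a syscall, one from a file baked into the image. Only the second changes when you swap base images.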
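To make the capabilities side concrete, here is a minimal Python sketch, assuming Linux with /proc mounted; the function names are hypothetical. It reads the process’s effective capability mask (the CapEff line the kernel exposes in /proc/self/status), checks a couple of bits, and then calls sched_setaffinity() through Python’s os module, which is a thin wrapper over the ordinary syscall. It re-applies the current CPU mask so it changes nothing. Note that per the sched_setaffinity(2) man page, setting your own affinity normally needs no capability; CAP_SYS_NICE comes into play when targeting other users’ processes.

```python
import os
from typing import Iterable

# Capability bit numbers from linux/capability.h (only the ones used below).
CAP_SYS_ADMIN = 21
CAP_SYS_NICE = 23


def parse_cap_eff(lines: Iterable[str]) -> int:
    """Return the effective capability mask from /proc/<pid>/status lines."""
    for line in lines:
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def effective_caps() -> int:
    """Effective capabilities of this process, as the host kernel reports them."""
    with open("/proc/self/status") as f:
        return parse_cap_eff(f)


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in `mask`."""
    return bool(mask & (1 << cap))


def reapply_affinity() -> bool:
    """Re-set this process's CPU affinity to its current mask.

    os.sched_setaffinity calls the ordinary sched_setaffinity(2) syscall; there
    is no Docker-specific variant. If the call is refused, the error (EPERM)
    comes straight back from the host kernel.
    """
    try:
        os.sched_setaffinity(0, os.sched_getaffinity(0))
        return True
    except PermissionError:
        return False


if __name__ == "__main__":
    caps = effective_caps()
    print(f"CapEff = {caps:#018x}")
    print("CAP_SYS_NICE: ", has_cap(caps, CAP_SYS_NICE))
    print("CAP_SYS_ADMIN:", has_cap(caps, CAP_SYS_ADMIN))
    print("sched_setaffinity on self:", "ok" if reapply_affinity() else "denied")
```

Running it under a plain `docker run` and again with `--cap-add SYS_NICE` should show the CAP_SYS_NICE bit change, while the syscall being made stays exactly the same one the host kernel provides.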
To sum up:
- If you drag a whole distro along in your containers, you’re going to have a bad time
- Mind the host’s kernel ABI (kABI) if you are writing code that makes kernel syscalls
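In practice, minding the kABI means asking what the host kernel supports rather than what your image’s distro shipped. Here is a minimal Python sketch, assuming Linux; the helper names are hypothetical, and getrandom(2), which was added in Linux 3.17, just stands in for any newer syscall you might depend on. A brand-new distro in the image doesn’t help if the host kernel is older: the call fails with ENOSYS.

```python
import errno
import os
import re
from typing import Optional, Tuple


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Parse the leading major.minor[.patch] from a uname release string,
    e.g. '3.10.0-123.el7.x86_64' -> (3, 10, 0)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def host_kernel_at_least(minimum: Tuple[int, int, int],
                         release: Optional[str] = None) -> bool:
    """Compare the host kernel (what uname reports, even in a container)
    against a minimum version."""
    if release is None:
        release = os.uname().release
    return kernel_version(release) >= minimum


def random_bytes(n: int) -> bytes:
    """Prefer getrandom(2) and fall back to os.urandom if the host kernel
    lacks the syscall (ENOSYS), regardless of the image's distro."""
    getrandom = getattr(os, "getrandom", None)  # Python 3.6+, Linux only
    if getrandom is not None:
        try:
            buf = b""
            while len(buf) < n:  # getrandom may return short reads for large n
                buf += getrandom(n - len(buf))
            return buf
        except OSError as e:
            if e.errno != errno.ENOSYS:
                raise
    return os.urandom(n)
```

`host_kernel_at_least((3, 17, 0))` answers the question up front, while the ENOSYS fallback answers it at call time. The second is usually more robust, since distro kernels (RHEL in particular) backport features without bumping the version number.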
Am I off base? Did I miss something or misstate something?
UPDATE 8/29: I emailed someone I know through work who lives and breathes Docker and SELinux right now and asked him to settle the question. I’m pleased to report that I’m right, but I’ll be nice and not rub it in Jack’s face. Still, feel free to disagree with my opinion on how to manage containers.