Docker service fails to start when using devicemapper

This has tripped me up too many times now, I need to make a note for future reference.

With Ubuntu 16.04 and devicemapper as the docker storage driver, occasionally after a system boot the docker service won’t start up. The logs (systemctl status docker.service) show something like:

Dec 05 15:18:23 xxx systemd[1]: Failed to start Docker Application Container Engine.
Dec 05 15:18:23 xxx systemd[1]: docker.service: Unit entered failed state.
Dec 05 15:18:23 xxx systemd[1]: docker.service: Failed with result 'exit-code'.
Dec 05 15:18:23 xxx systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Dec 05 15:18:23 xxx systemd[1]: Stopped Docker Application Container Engine.
Dec 05 15:18:23 xxx systemd[1]: docker.service: Start request repeated too quickly.
Dec 05 15:18:23 xxx systemd[1]: Failed to start Docker Application Container Engine.
Dec 05 15:18:39 xxx systemd[1]: Stopped Docker Application Container Engine.

You can react to this in one of two ways: spend ages poking around, trying to start the service and hunting for an explanation, or just run lvscan and discover that the volume being used by devicemapper is inactive and docker therefore can’t start:

  ACTIVE            '/dev/ubuntu-vg/root' [51.76 GiB] inherit
  ACTIVE            '/dev/ubuntu-vg/swap_1' [8.00 GiB] inherit
  inactive          '/dev/docker/thinpool' [57.00 GiB] inherit

The solution is to run vgchange -ay <name_of_volume_group> (or just vgchange -ay if you don’t need to be specific), which activates the volume group’s logical volumes and will then allow you to start the docker service. <name_of_volume_group> can be retrieved via vgs.
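
For the record, the full sequence looks something like this (the volume group is called docker here, matching the lvscan output above – substitute whatever name vgs gives you):

sudo vgs                      # list volume groups to find the right name
sudo vgchange -ay docker      # activate the logical volumes in the docker volume group
sudo lvscan                   # confirm the thinpool now shows as ACTIVE
sudo systemctl start docker   # the service should now come up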

Worth noting that the fix here is nothing magical, it’s just the standard way you activate a volume. The crux of the issue is why the volume was deactivated in the first place – it’s seemingly a race condition during the boot process (I’ve seen suggestions of various culprits), but I’ve yet to pin down the exact context.

Thoughts on the journey towards containers

I spent a couple of days this week at ContainerSched 2016 where I took part in a panel discussion, sharing insights from our journey into containerisation. All the sessions are available to watch online.

The event wrapped up with a cracking keynote by Adrian Colyer entitled “Making sense of it all” in which he presented his perspective on where the world of containerisation is today, and where it’s headed, supported by evidence gleaned through direct discussions with many movers and shakers in the industry. You should definitely go check out the recording!

Having spent more time talking about containers in two days than the last two months, I thought I’d share some thoughts based on takeaways from the conference and my own experience, in case it helps others undertaking a similar journey. There’s a TL;DR at the end if you need it 🙂

Containers are in production

A common concern about using containers is that they’re simply not ready for mainstream production use yet – they’re too new, too immature, too insecure etc. That’s simply not the case. An O’Reilly survey carried out in May last year found 40% of respondents were using containers in production, with more than half planning to enter production within 6-12 months.

A similar survey by ClusterHQ around the same time found similar results, with 38% of respondents currently in production. Their 2016 survey (due for release shortly) shows this figure to have climbed to 76%, supporting last year’s predictions (see Adrian’s talk for more details).

So containers are no longer confined to dev/QA environments – they’re very much out there in the wild.

Beware the concept count

For many (most?) people, their first encounter with containers will be via Docker. When you read/watch the various introductory Docker tutorials out there, it can seem pretty simple to spin one up. However, there is a huge difference between running a container on your laptop and deploying a containerised application stack to production.

Once you start down the containerisation path, the concepts come thick and fast, with an ever-increasing number of concerns you have to investigate, learn, make judgments and decisions about, and build confidence in (both within your team and the wider organisation). You need solutions to networking, security, service discovery, storage, logging, monitoring and so on.

Don’t underestimate the learning curve. If you, your team or your organisation shies away from learning new things, or has a tight deadline, chances are it’s not the right time for you to explore containers. If you don’t have the problems that containers solve, then it’s definitely not the right time.

Be pragmatic

As alluded to above, the Docker/container ecosystem is huge, with new tools and approaches emerging every day. It can be tempting to feel like you have to learn everything.

The reality is you don’t need everything. As exciting and shiny as a solution may seem, do you actually have the problem it’s trying to solve? A funky auto-scaling setup may sound perfect, but if your fluctuations are mild and predictable enough, then maybe a manual approach would suffice for now, leaving you with one less concept to care about?

Just because there are lots of people choosing Kubernetes, or extolling the virtues of Weave, or advocating Project Calico doesn’t mean that each of those is suitable or essential for your circumstances, especially if your environment is relatively simple rather than involving 1000s of containers across dozens of hosts.

Speaking for myself, after surveying the various options, I settled on Rancher for our particular orchestration needs – it’s pretty simple to set up, with a lot of stuff available out of the box (overlay network, service discovery, load balancing, clean UI). Sure, there are plenty of capabilities it lacks that you can find elsewhere, some of which a very strong case could be made for. However, we just don’t need anything more right now, so the added complexity has no justification.

Obviously Rancher might not be right for you – ultimately there is no single “best solution”. Context is king – be realistic about what your needs really are and take care not to over-engineer.

Miniliths vs microservices

No, that isn’t a typo – it’s a word I made up… One of the main drivers for adopting containers is the move from a monolithic architecture to microservices. However, just like containers, microservices bring with them a whole world of painful overheads. For every microservice you introduce, there’s a cost to pay, and you need to be careful any benefits aren’t swallowed by the downsides.

Where appropriate, consider a “minilith” approach i.e. keep the number of microservices to a minimum, grouping related functionality as pragmatically as possible whilst leaving doors open to allow them to be split out further later if required (which may possibly be never).

Yes, this won’t be pure microservices, and it may hurt a bit inside, but it’ll mean reduced complexity with fewer processes to monitor, log, deploy, recover, scale and so on.

Stand on the shoulders of giants

With the ecosystem moving so fast, rolling your own solutions should be a last resort. It might be tempting to build a custom orchestration framework, but unless you keep it actively developed, it won’t take long before it falls behind third-party solutions and you’ll miss out on capabilities you would otherwise get for free. Not to mention the harder job of getting others up to speed on how it works.

Once the pace of change slows down and the ecosystem matures, then sure, give it a go (if you’re sure it’s absolutely essential). But until then, don’t be tempted into striking out alone unless you can clearly quantify the benefits.

Don’t forget the humans

All the talk of automation makes it easy to overlook the human element of developing and maintaining systems. However slick a solution you may have in mind, without the right culture and mindset amongst your teams, you’re going to have problems.

Remember that DevOps is about communication and collaboration as much as automation, and can require a cultural shift to facilitate this. Having people on your team who don’t work well with others is a much more important problem to solve than a bit of automation. And if you can’t change behaviours, perhaps it’s time to manage people out.

Networking (the human kind)

Watching session recordings, reading blog posts etc. is all well and good, but you really can’t beat networking with others and sharing thoughts and experiences. This might sound obvious – you could say the same about any topic. However, given how early on we are in the process of adopting containers, it feels especially applicable at the moment.

There’s a lot of collective wisdom that is yet to be distilled down into blog posts and best practices that you can only really access by talking to people who have been there and done it (and quite possibly got burnt by it!). Put aside your British reserve (if you are British, obviously) and get out there and interact!

Storage is a persistent problem

When asked in the panel discussion which area we felt Docker is most lacking in, all three of us picked storage. This was a common theme amongst the many discussions I had too – it’s a pain that is clearly widely felt. Even something as simple as inspecting the contents of a Docker volume from the command line is harder than it needs to be.
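
As a trivial example of the friction I mean: docker volume inspect will tell you where a volume lives on disk, but not what’s in it, so to peek inside you typically end up spinning up a throwaway container just to run ls (myvolume here is a stand-in for whatever volume you’re interested in):

docker run --rm -v myvolume:/volume alpine ls -la /volume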

Yes, there are various storage solutions out there (e.g. Gluster, ClusterHQ, Convoy), and new ones entering the market all the time. But this is a tough problem to solve – if you want distributed storage you have to bear in mind the laws of physics in terms of latency and architect your application to handle your use cases and available IO accordingly.

Basically, for anything stateful, your life is going to be much harder than if it were stateless. Where possible, farm data off to a non-containerised external service.

Finding good people is hard

I suspect I’m preaching to the choir here (literally everybody I speak to has agreed with this), but one trend that just won’t go away is that finding good people with relevant skills is hard. Even harder than building distributed storage systems. Bearing in mind how new the ecosystem is, most knowledge and experience lives within developers and engineers who are not yet ready to re-enter the job market.

Consequently, if you need extra pairs of hands, you’re most likely going to have to turn to the contractor or consultant markets. That may not be the end of the world (though possibly the end of your budget). But don’t rely on filling a permanent vacancy in a hurry, especially if you’re outside London.

This is another reason to keep your concept count as low as possible and lean on third-party offerings – if somebody on your team leaves, how long will it take somebody new to get fully up to speed?

TL;DR

I appreciate much of this is fairly conventional wisdom that could be applied to many more areas than containers. But given the penchant developers have for shiny things, it’s easy for it to fall by the wayside sometimes.

The world of containers is huge and can feel daunting, possibly a mountain you daren’t tackle. It’s definitely not a silver bullet, but once you cut through the hype and take the knife to the concept count to eliminate anything you really don’t need, it becomes far more approachable. Don’t be afraid to give it a go (if appropriate), but keep your eyes open and talk to others. And make sure you get that culture right.

I’d love to hear about your experiences – feel free to comment or contact me via whichever channel you prefer.

Show hidden files in Atom sidebar

By default, the Atom editor hides files in its sidebar that match your VCS ignore file. This can be frustrating, as an ignored file isn’t necessarily one you don’t care about – for example, the node_modules folder of a Node.js application will typically be ignored, but you may still wish to jump in and examine a module, do some debugging etc.

I wanted to show hidden files, so naturally I went to Preferences to look for an appropriate setting and found one that looked just the ticket: Settings > Core Settings > Exclude VCS Ignored Paths.

[Screenshot: the global “Exclude VCS Ignored Paths” setting]

I figured that unticking this setting would do the job, but it made no difference – confusingly, it turns out that this is a global setting used by functionality such as fuzzy searching, and there is another setting that overrides this for the sidebar (a.k.a. tree view). Because Atom is modular, most of its components exist as independent packages. The setting you need is found in Settings > Packages > Core Packages > Tree View:

[Screenshot: the Tree View package settings]

Here you’ll find “Hide Ignored Names” and “Hide VCS Ignored Files” settings – the latter is the one to untick if you want to stop the sidebar from hiding files that your VCS ignores.

The first setting relates to another setting back on the main Settings panel (see first screenshot above) – Preferences > Settings > Core Settings > Ignored Names – which allows you to globally ignore file/folder patterns regardless of VCS ignore files. You probably want to leave Hide Ignored Names ticked unless you really want to be able to dig around in your `.git` folder within the editor.
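
If you prefer editing configuration directly, the same toggles can be set in ~/.atom/config.cson – something along these lines should do it, though I’ve only verified these keys against my own install:

"*":
  "tree-view":
    hideIgnoredNames: true
    hideVcsIgnoredFiles: false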

Hope this saves somebody a bit of time and frustration, as it took me ages to figure out! And I’m clearly not the only one, so hopefully GitHub will sort this out soon.

Make Docker play nicely with UFW

I’ve been spending a lot of time working with Docker over the last year, primarily in Ubuntu environments. So long in fact, that I seem to have forgotten this blog exists 🙂

Something it took me a while to figure out was how to stop Docker from bypassing UFW and exposing mapped ports to the world (due to it specifying its own iptables chain). More often than not, I want containers to be restricted to private network access only. One option is to specify an IP address when mapping ports, but that’s a bit clunky and doesn’t work so well when you want to be able to access the ports via multiple private IP addresses.
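
For reference, that option looks something like this, binding a mapping to a single address (10.0.0.5 and nginx are purely illustrative):

docker run -d -p 10.0.0.5:8080:80 nginx

Port 8080 is then only reachable via 10.0.0.5 – fine for one address, but the -p flags soon pile up if you need the same port reachable on several.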

The challenge was making sure not to block outbound or inter-container connectivity in the process.

Having had to set up a number of servers and keep finding myself forgetting one of the steps, I figured it was about time I put this blog to good use and list the necessary commands here:

# Allow traffic arriving on the docker0 bridge (host <-> container and container <-> container)
sudo ufw allow in on docker0
# Let UFW forward packets, which routed container traffic relies on
sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
sudo ufw enable
# Masquerade outbound traffic from the default Docker bridge subnet
sudo iptables -t nat -A POSTROUTING ! -o docker0 -s 172.17.0.0/16 -j MASQUERADE
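
A couple of caveats. This approach generally goes hand in hand with disabling Docker’s own iptables manipulation (--iptables=false in the daemon options) – if Docker is still managing iptables, it will insert rules of its own (including an equivalent MASQUERADE rule) that bypass UFW. Also, the iptables rule above won’t survive a reboot by itself; one way to persist it – which works for me, but treat it as a starting point for your own setup – is to declare it in a *nat section near the top of /etc/ufw/before.rules so UFW restores it on reload:

*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING ! -o docker0 -s 172.17.0.0/16 -j MASQUERADE
COMMIT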

C# compiler error CS0182 when building via TeamCity

When running a build via TeamCity, I started to get the following exception:

CSC error CS0182: An attribute argument must be a constant expression, typeof expression or array creation expression of an attribute parameter type

The same code built fine in Visual Studio 2012, and nothing had changed on the build server so clearly something in the most recent commits was triggering a difference in behaviour between the two environments.

It turns out that there’s a compiler bug in C# 4.0 (which our TeamCity server was using) involving null default parameters for attributes (an instance of which had been introduced in a recent commit) e.g.:

public SomeAttribute(SomeType someParameter = null)

When an attribute contains a parameter in its constructor that defaults to null, the compiler incorrectly treats this as being a typeless null literal rather than a constant expression and throws the exception in question.

The proper fix for this is to upgrade the compiler on the build server. However, as that isn’t always immediately practical, a workaround is to simply cast the null default to the type in question, e.g.:

public SomeAttribute(SomeType someParameter = (SomeType)null)
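
To make that concrete, here’s a minimal sketch (names are illustrative, and I’ve used string for the parameter type since attribute constructors only accept constant-friendly types):

using System;

[AttributeUsage(AttributeTargets.Class)]
public class SomeAttribute : Attribute
{
    // On the C# 4.0 compiler, applying [Some] with this constructor
    // triggers CS0182 – the null default is seen as a typeless null:
    // public SomeAttribute(string someParameter = null) { }

    // Workaround: cast the default so it becomes a typed constant
    public SomeAttribute(string someParameter = (string)null) { }
}

[Some] // compiles once the cast is in place
public class Target { }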

Eric Lippert provides more information about this particular bug in this StackOverflow answer.
