DevOps isn’t just about pager duty

A recent DevOps conversation I had with a good friend who is a development manager went a little something like this:

H: Can you imagine me having to get Developer A or Developer B to really understand the subtleties of distributed tablespaces and shards?  That’s why we needed Ops DBA C, to fix all that when it got to Staging…

That’s when I realized this is the fundamental issue I’ve had, and continue to have, with a lot of DevOps conversations, presentations, and so on.  Everything focuses on Developer Responsibility: getting developers on the pager rotation, Infrastructure as Code.

All of those are things I have believed, and still believe, are required in a successful software delivery organization.  How can you possibly care that your software woke ME up at 3AM every night this week if the only consequence is a ticket?

But there’s also the notion, direct or implied, that the Developer is King.  Need a server?  There’s an API for that.  Need a network?  There’s an API for that.  Need a datacenter?  There’s an API for that.  Move aside and let the rollouts flow.  The problem is, you don’t know what you’re doing.  I don’t mean that in a nasty way.  But you don’t.

But you wouldn’t expect to.  If I threw you in a “legacy” datacenter and asked you to design a working network with appropriate subnets, ACLs, firewall rules, routing, and gateways, you’d tell me to get bent; that’s not your job.  Heck, if I asked a Java developer to go rewrite this EJB in Python, I’d expect the same answer.  Cloud, virtualization, and management tooling don’t magically supplant the need for infrastructure operations skills.  They mean that ops folks (like me) need to learn a new set of skills and tools.

That’s why horses can’t become unicorns.  And why unicorns can’t tell a horse how to grow a horn.  The unicorn has an innate understanding that infrastructure needs to be part of the design process.  As early as possible (but not too early), the design process needs to start understanding how infrastructure and technology choices impact the overall design.

Why not too early?  Premature optimization.  You may know that you plan to use an IaaS, so you need to understand that, say, persistent state is a potential issue.  But you won’t need to solve for state options in the IaaS until you really know what state your application will actually need.  Maybe you’ll need some sort of keystore; maybe you can push all of the state to the client side.  The infrastructure components need to be treated like any other technology choice (language, framework, data store, client platforms) and adapted to as needed.

Software goes through several adaptations to its environment: scalability, performance, and availability are all achieved through real-world adaptation.  You can design for them, you can test for them, but the only true way to scale successfully is to see what actual users are doing with your software and respond appropriately.  The truth is that software is never “done”; it only stops getting used.  Code complete really means we’ve got the features on the list in the milestone working.  From there, you now need to survive deployment to the environment and contact with the user.  Those require adapting to conditions and evolving the software.  And that’s a constant thing.

This is what the “unicorns” do but don’t talk about directly.  It’s the theme under Amazon’s app lifecycle ownership.  It’s the theme under Netflix’s Simian Army.  Push Ops left and be prepared to continually adapt your software until you retire it.
