August 27, 2024 13 min read

Shipping Fast Without Breaking Things

George MacRorie

My goal with this post is to paint a picture of what building a developer-centred product has meant to us at Flipt: to illustrate some scenarios drawn from past experiences, which have culminated in our latest endeavours to make feature flags more powerful, performant, adaptable and, frankly, just delightful. All while being a little bit sarcastic and borderline waxing poetic.

Delivering Continuously

When Mark created Flipt back in 2019, his goal was to put an end to the proliferation of home-rolled feature flag solutions he had come into contact with throughout his career. He was tired of having to learn the ins and outs of something new every time. How did you integrate with this thing? Can it target specific user attributes? Can it do proportional rollouts? He wanted something simple to integrate with, that was easy to deploy and operate. Something that would drop nicely into a modern tech stack.

I was working with Mark at the time on a CI/CD SaaS product called Codeship. Truth be told, like a lot of self-proclaimed Continuous Integration (CI) and Continuous Delivery (CD) companies at the time, we really only had a CI product that could run your deploy script. CD was the hot new thing and everyone was racing to define what it actually meant. Gene Kim’s The Phoenix Project was on the reading list and the Accelerate book had just dropped.

Build and release processes hadn’t really caught up with the Agile Manifesto(s) that had been rearranging our org charts for the last decade. Developers spent most of their time spinning in their desk chairs waiting for CI to go green, only to then have to gradually deploy that change through a series of environments designed to mimic the last one (or they were handing it off to someone else to do it for them). The goal was to foster confidence that the change wouldn’t implode when it landed in the end user’s lap.

Accelerate taught us that our slow release pipelines were… slowing us down. The companies that were succeeding were the ones that were evolving the quickest. How does a tech company evolve quickly, you ask? It ships changes more frequently. It (in)validates its assumptions more often. Then, if things go wrong, it can also recover more quickly.

If you skip all the hard parts, the trick to CD is to embrace trunk-based development and just ship everything in your main branch to production. Congratulations, your developers are moving fast and breaking things, just like in that Jesse Eisenberg movie. The problem that soon becomes apparent is that you’re breaking things. Now the customer experience is in free fall. You might be inclined to introduce some more automated checks, maybe some kind of intermediary space you could deploy into and validate those changes in first?

Of course, that would put you right back where you started. This is where feature flags often enter the mix. Instead of just yeeting any change into production, first you give your feature a name and guard the new code with a humble if statement. You pull that code routing decision out into an external system, which is queried continuously. Then gradually you enable your feature flag for different cohorts in your application. Maybe you start by enabling it for yourself, then your internal team, then onto a small proportion of real users and so on. Eventually, because you’re a good developer, when the feature is enabled for everyone you remove the guarding flag altogether (hahaha).
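To make that concrete, here’s a minimal sketch of the guarding if statement. The `FlagClient` interface, the `staticFlags` stand-in and the `new-checkout-flow` flag key are all illustrative, not any particular SDK’s API:

```go
package main

import (
	"context"
	"fmt"
)

// FlagClient is a stand-in for whichever feature flag SDK you use; the
// method name and signature here are illustrative, not a specific API.
type FlagClient interface {
	BooleanFlag(ctx context.Context, key string, evalCtx map[string]string) (bool, error)
}

// staticFlags is a toy in-memory implementation, handy for local
// development and tests.
type staticFlags map[string]bool

func (s staticFlags) BooleanFlag(_ context.Context, key string, _ map[string]string) (bool, error) {
	return s[key], nil
}

// checkout guards the new code path behind the hypothetical
// "new-checkout-flow" flag.
func checkout(ctx context.Context, flags FlagClient, userID string) string {
	enabled, err := flags.BooleanFlag(ctx, "new-checkout-flow", map[string]string{
		"user_id": userID,
	})
	if err != nil {
		// Fail safe: fall back to the existing behaviour if the flag
		// backend is unreachable.
		enabled = false
	}

	// The humble if statement guarding the new code.
	if enabled {
		return "new checkout flow"
	}
	return "legacy checkout flow"
}

func main() {
	flags := staticFlags{"new-checkout-flow": true}
	fmt.Println(checkout(context.Background(), flags, "user-123"))
}
```

In practice the client would be your feature flag SDK of choice, and the evaluation context would carry whatever attributes your targeting rules need.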

Check out Flipt on GitHub

Like what you're reading? Please consider giving us a star on GitHub.

How do I unlock continuous delivery? How do I integrate this into my codebase? The initial core features of Flipt were designed with these developer problems in mind. The result was a single binary with pluggable storage options, and pre-built, code-generated clients for various programming languages.

Who, What, Where, When and Why?

Imagine you have embraced feature flags and now you have that nice warm fuzzy feeling. You’re shipping changes more frequently and doing so without jeopardising your end users’ experience.

But then again, how do you know that? Specifically, how do you know that the feature is enabled, who it’s enabled for and how your application is behaving when enabled vs disabled?

Is this thing on?

But you “flicked the switch” already. The flag says enabled for my personal user right there in the dashboard. However, the fact of the matter is, you’re clicking around in your application and you can’t see the new behaviour working as expected.

Have you also got one of those vestigial light switches in your house that doesn’t seem to go anywhere? This feeling is irritatingly familiar.

You start to ask yourself: does this feature just not work? Does the feature flag tool not work? Is it returning true for my requests?

[Meme: Anakin Skywalker asking, “Is this feature flag on?”]

The only way to rule out the feature flag tool is if you're measuring the results of flag evaluations within the context of your applications (where they're referenced).

Assuming our feature flagging tool integrates nicely with our observability stack (perhaps it publishes telemetry via some broadly adopted open standard), we now drill into the state of evaluations for our feature flag name, only to discover that the flag is always returning false. How is this possible?

It turns out we made a mistake when evaluating our feature flag. We never passed the caller’s associated organization into the context. This is what our particular flag has been configured to target, and so it never evaluates to true.
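As a hedged sketch of that mistake, reusing the illustrative client shape from earlier: the flag targets an `organization` attribute, so an evaluation context that only carries a user ID can never match.

```go
package main

import (
	"context"
	"fmt"
)

// orgTargetedFlags mimics a flag configured to target a single
// organization; any other context evaluates to false.
type orgTargetedFlags struct{ targetOrg string }

func (o orgTargetedFlags) BooleanFlag(_ context.Context, _ string, evalCtx map[string]string) (bool, error) {
	return evalCtx["organization"] == o.targetOrg, nil
}

func main() {
	ctx := context.Background()
	flags := orgTargetedFlags{targetOrg: "acme"}

	// What we shipped: only the user ID goes into the context, so the
	// organization rule never matches and the flag is always false.
	enabled, _ := flags.BooleanFlag(ctx, "new-billing-page", map[string]string{
		"user_id": "user-123",
	})
	fmt.Println("without organization:", enabled) // false

	// What we meant: include the caller's organization in the context.
	enabled, _ = flags.BooleanFlag(ctx, "new-billing-page", map[string]string{
		"user_id":      "user-123",
		"organization": "acme",
	})
	fmt.Println("with organization:", enabled) // true
}
```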

How easy was it to miss this during local development? Often our local setup doesn’t have access to a representative feature flagging setup, particularly if the tool is in some external SaaS platform. Perhaps instead we use some simple override configuration in our local clients to pretend the flag is either on or off. However, this isn’t going to help us identify whether we’re passing our context correctly in the first place.

Our feature flagging solution needs to integrate with our observability stack or come baked with an observability solution out of the box. At a minimum, we need to understand the distribution of evaluation results on a per-flag level. Ideally, we annotate the result of a flag’s evaluation directly into the traces and events for the associated transaction. This way we can understand both what states the flags are exhibiting, and correlate at the individual transaction level how the system behaves when the flag is enabled vs disabled.
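As one sketch of what “annotate directly into traces” can look like: if your application already emits OpenTelemetry traces, a small helper can record each evaluation as a span event. The attribute names below follow OpenTelemetry’s feature-flag semantic conventions; the helper itself is illustrative rather than a specific SDK’s built-in behaviour.

```go
package flags

import (
	"context"
	"strconv"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// recordEvaluation attaches the result of a flag evaluation to the
// current span as an event, so individual transactions can be
// correlated with the flag state they saw.
func recordEvaluation(ctx context.Context, flagKey string, enabled bool) {
	span := trace.SpanFromContext(ctx)
	span.AddEvent("feature_flag", trace.WithAttributes(
		attribute.String("feature_flag.key", flagKey),
		attribute.String("feature_flag.provider_name", "flipt"),
		attribute.String("feature_flag.variant", strconv.FormatBool(enabled)),
	))
}
```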

Ideally, we can run our feature flagging tool locally, or automatically in our continuous integration environments, and ensure that behaviour changes in the expected way as we change our feature flag configuration.
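Here’s one hedged shape that can take, reusing the illustrative `staticFlags` and `checkout` from the first sketch: a table-driven test asserting the behaviour with the flag off and on, runnable locally or in CI without any external flag service.

```go
package main

import (
	"context"
	"testing"
)

// TestCheckoutFlagStates asserts the behaviour of checkout under both
// flag states, using the in-memory staticFlags client from the earlier
// sketch in place of a real flag backend.
func TestCheckoutFlagStates(t *testing.T) {
	for _, tc := range []struct {
		name    string
		enabled bool
		want    string
	}{
		{"flag off", false, "legacy checkout flow"},
		{"flag on", true, "new checkout flow"},
	} {
		t.Run(tc.name, func(t *testing.T) {
			flags := staticFlags{"new-checkout-flow": tc.enabled}
			got := checkout(context.Background(), flags, "user-123")
			if got != tc.want {
				t.Fatalf("got %q, want %q", got, tc.want)
			}
		})
	}
}
```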

You have one triggered incident

You wake up at 3am. Who is calling at this hour? I don’t know anyone in San Francisco. Oh that’s right, my colleague forgot about that holiday they had booked 6 months ago and I agreed (5 hours ago) to cover their on-call. An SLO is failing and latency has increased above an acceptable threshold. You acknowledge the page and reach for the light switch. No, that one doesn’t work, the other one.

You crack open your observability tool(s) to identify where and when latency started to increase. Service X started to get slower a couple of hours ago. If you’re lucky, you can see deployments and configuration change events directly in your monitoring tool. Otherwise, it is off to source control or your CI/CD system to learn what’s what.

You happen to commit configuration to version control, so you can see what was deployed around that time and how it was configured. However, no one merged anything around that time. What am I missing? Nothing changed!

Oh that’s right, feature flags. They’re another kind of configuration that (more often than not) lives by its own set of rules. Turning to the feature flag dashboard, or perhaps a channel in your workplace communication tool of choice, you try to correlate what changed, when, and who made the change. Sure enough, you find out that a colleague in a different timezone enabled a feature flag at the end of their day. The change appeared to behave as expected, so they wrapped up and went home.

You review any documentation around the flag to understand the impact of simply disabling it. At this point, disabling the flag outright might be enough. However, the tool might offer a way to revert to the state it was in before the incident occurred. Maybe the flag was on and working fine for a different cohort. You revert the flag and leave a note for the developer to know why their flag state was altered.

At the very least, we need to understand the present and past states of our flags. Sometimes it can be useful to know in advance what someone else intends to change them to (peer review and approvals). There should be space to document and describe why changes are being made, when making them, as well as the ability to roll back time if the need arises. Ideally, we can also publish all this information into the rest of our observability stack for ease of discovery.

Reproducing the scene of the crime

The fire was put out by disabling the feature flag, but at some point we need to turn it back on again. We’re not just going to give up on the feature after all. However, you notice that simply turning the flag on for yourself does not reproduce the problem. Attempting to recreate the problem locally produces similar results.

You remember that another team is actively working on a change in a service you’re depending on. Could there be another feature flag in the mix? Enabling both flags finally reproduces the latency observed during the incident.

We often don’t develop features in isolation. Sometimes it’s the intersection of two enabled flags that causes the alerts to start firing. You need to be able to go back and look at the broader set of feature flag configurations at the time of the incident, in order to recreate what was and wasn’t enabled, and for which targets. Ideally, we can take snapshots of all configuration (flags and broader) at that time. Then we can experiment with them somewhere in isolation, whether that is locally, in preview environments or in production itself.

Developer first, not Developer first and last

When we think of developers at Flipt, we try to imagine the engineer that builds with a product mindset, that interfaces directly and empathises with the customer, that operates their own code (or others’) in production, and that paves a golden path for their peers to ship code safely. We also acknowledge these tools are not just for developers alone, and that what we build should empower folks to safely hand over control to other functions in the business where appropriate (with configurable levels of access). When developers choose the tools they need to get their job done, they have a problem domain in mind and stakeholders they’re going to serve. We strive to account for everyone.

End-user experience

Developers are building products for all sorts of people at the end of the day. In a lot of ways, feature flags acknowledge this fact by definition. They exist to protect the quality of the user’s experience while the product evolves beneath them. That was the core goal during Flipt’s inception, and it remains an important foundation.

However, we need to acknowledge the fact that adding feature flags sometimes involves integrating external configuration into critical, load-bearing or sensitive parts of our systems. This means we need to cater for situations where we cannot leak sensitive information, introduce transient failures or add (significant) latency.

For Flipt, that means supporting both server and client-side evaluation. Client-side means evaluation takes place within your own applications, over snapshots of complete feature flag configuration delivered asynchronously to your code. Server-side means introducing an API request (HTTP or gRPC) into your code, but it keeps configuration isolated within Flipt itself. Furthermore, being able to take our open-source, self-hosted solution and deploy it into your own infrastructure is key for maintaining predictable stability, scale and cost.
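As a rough sketch of the trade-off between the two modes (the `Evaluator` interface and both implementations are illustrative, not Flipt’s SDKs): server-side evaluation pays a network hop per call, while client-side evaluation reads from an in-memory snapshot that a background process keeps fresh.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Evaluator is the shape both modes share from the application's point
// of view; the names here are illustrative, not a specific SDK.
type Evaluator interface {
	Boolean(ctx context.Context, flagKey string, evalCtx map[string]string) (bool, error)
}

// serverSideEvaluator issues a request (HTTP or gRPC) to the flag
// service for every evaluation; configuration never leaves the server.
type serverSideEvaluator struct {
	call func(ctx context.Context, flagKey string, evalCtx map[string]string) (bool, error)
}

func (s serverSideEvaluator) Boolean(ctx context.Context, flagKey string, evalCtx map[string]string) (bool, error) {
	return s.call(ctx, flagKey, evalCtx) // one network hop per evaluation
}

// clientSideEvaluator evaluates against an in-memory snapshot of the
// full flag configuration, refreshed asynchronously, so the hot path
// never blocks on the network.
type clientSideEvaluator struct {
	mu       sync.RWMutex
	snapshot map[string]bool // simplified: flag key -> enabled
}

func (c *clientSideEvaluator) Boolean(_ context.Context, flagKey string, _ map[string]string) (bool, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.snapshot[flagKey], nil
}

// refresh would be driven by a background poller or stream.
func (c *clientSideEvaluator) refresh(snapshot map[string]bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snapshot = snapshot
}

func main() {
	local := &clientSideEvaluator{}
	local.refresh(map[string]bool{"new-checkout-flow": true})

	enabled, _ := local.Boolean(context.Background(), "new-checkout-flow", nil)
	fmt.Println("client-side evaluation:", enabled)
}
```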

In it together

Developers work alongside a whole host of other disciplines, including operators, managers, designers, product and customer success to name a few. This is a team sport.

At its core we want Flipt to be self-explanatory. Our environments need to be observable, reproducible, evolvable and revertible. As an operator, you should be able to easily understand what the desired state of the system is at any given moment (past, present and future). It should be easy to correlate external systems with changes in flag state and, in case of emergency, revert back to a stable state.

We have interpreted this to mean feature flags should be declared as code and committed to source control. However, not everyone is an engineer, and we don’t expect everyone to learn to code and navigate source control. So we also need graphical controls designed for broader access, and these need to contribute directly to source control. Operators and developers should be able to attenuate access to these controls, meaning roles and policies can be defined and enforced through the APIs we expose.

Why are you telling me all this?

My goal here was to give you some context into how we think and what we value at Flipt. To tell some of our story so far, and hint at where we’re going.

So, it probably comes as no surprise that I had another motive to bring you here. Our new cloud product is waiting for you. You can request access today and we will help get you started with how we like to do flags. All of what I described here has gone, and continues to go, into what we’re building.

We really want your feedback. We’ve got lots of ideas on where to go next, but your needs are what ultimately matter to us most. So do connect with us and let us know what you’re missing, or how we can do better.

Stay tuned, as we’re also going to write more about the specifics in the coming weeks: how we integrate directly with Git and GitHub, how peer review and history will be surfaced in the UI, and where we’re going with out-of-the-box analytics.

Come chat live with us in our Discord, start a discussion or issue on our GitHub, or ask questions in our Community Forum.

Peace!
