This blog post is a continuation of our blog post on how we landed at Elixir. You may want to read or at least skim that one before diving in here.
Choosing Elixir as a programming language is not enough – like a lot of powerful systems, it comes as a collection of parts and tools rather than as a completely pre-assembled solution. While choosing the bits and pieces and fitting them together is neither hard nor a lot of work, decisions still need to be made.
For example, we needed to answer the questions: “Monolith or microservices? API+frontend or integrated UI? How and where shall we store our data?” Our answers to those questions, and the resulting combination of systems, formed what some call “CELP” (which stands for Commanded, Elixir, PostgreSQL, and LiveView).
Let’s talk about how we came to this CELP “secret sauce” below.
C: CQRS/ES and Commanded
To start with data storage, we experimented with CQRS/ES using a fully “serverless” AWS architecture. Events were stored in DynamoDB, and we used SQS and Lambda for event notification and processing, all glued together with bits of custom C# code.
The setup was very simple and we saw it as an experiment, but it showed us two things:
- “Serverless” comes with very high latencies. It took a long time before things finished their round trip through the system, requiring extra work for interactive flows. Not a showstopper, but it definitely made things a bit more complicated than we wanted.
- The principles of CQRS/ES shaped our thinking and design. We really liked this way of thinking, so we decided to stick with it for the time being (every decision is, of course, only good until invalidated :-)).
Luckily, the Elixir ecosystem sports an excellent package called Commanded which makes it extremely easy to get going with CQRS/ES on the platform. It is in production at various places and a cursory check showed that, at the very least, it supported everything we thought we’d need at the time – Commanded gave us the “C” of CELP.
If you’d like to read up on how we chose Elixir, you can see the blog dedicated to our process here. And now back to our regularly scheduled programming.
L: LiveView
Over the years, browser-based UIs have progressed in the same way as their “classical” siblings. Where those evolved from forms on dumb terminals to rich client-server graphical user interfaces, browser-based UIs (web apps, in short) started as forms on dumb browsers and are now often rich client-server applications. In these, most of the user interface runs in the browser, and instead of whole pre-rendered pages, just data is transported back and forth through what are essentially remote procedure calls.
specialization. Either you form two teams and now have a coordination headache, or you keep one team and have developers switching between two entirely different modes of thinking – the situation is hardly ideal.
Elixir was born partially out of frustration with limitations of the Ruby programming language, and therefore it should not be a surprise that from the very beginning it came with an excellent “classical” (server-side templates) web application framework called Phoenix to ensure that Ruby people, used to its flagship web framework Rails, would not look in vain for a similar experience when switching to Elixir.
It is not in the scope of this post to delve into all the details, but hopefully this YouTube video by one of its creators, Chris McCord, will help in convincing you of its powers. For us, it was a no-brainer to go with LiveView, as it had already seen a good dozen releases since its introduction and was stable enough to get going. We have the “L” in “CELP” now, almost done.
P: PostgreSQL
Commanded supports a couple of storage backends, but the most widely used one stores events in PostgreSQL. PostgreSQL is an excellent open source relational database with a pedigree now spanning almost five decades (if you account for the jump start it got through borrowing from its predecessor, Ingres).
Given that our hosting/cloud provider, AWS, has excellent first-class support for PostgreSQL with its RDS product, this was a box that was easily ticked, and with “P” the acronym is complete.
The Power of Having Everything Asynchronous
In the CELP stack, asynchronous communication is key. Where a transaction in a typical “CRUD” app is often a mix of reads and writes in which everything waits for everything else, CELP wants you to trust the system and just fire and forget.
You may have multiple steps – in CRUD, a form submit triggers an avalanche of actions, like validation, database update, read of new data, and display of the new data, which are often highly coupled. CELP allows you to have highly decoupled and very simple individual steps that are usually asynchronously coupled.
You don’t need to wait for something to happen and then read the new state and re-display things. Instead, you fire off a change, the machinery starts ticking, and out comes a notification that pushes an update to your UI. This is normally fast enough that the user never notices there was a delay. And instead of a tangle of spaghetti code, you have a handful of simple, easy-to-test, and highly reusable steps that the stack chains together.
Decoupling is the holy grail of scaling software complexity, and CELP actively encourages it where other systems make it “doable, but hard”.
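To give a feel for the fire-and-forget style, here is a minimal sketch using nothing but plain Elixir processes. It is not our production code and it does not use Commanded or LiveView; the `AsyncSketch` module and the message shapes are made up for illustration.

```elixir
defmodule AsyncSketch do
  # Hypothetical worker process: applies a change, then notifies whoever asked.
  def start_worker do
    spawn(fn -> loop(%{}) end)
  end

  defp loop(state) do
    receive do
      {:change, key, value, notify_pid} ->
        new_state = Map.put(state, key, value)
        # Push a notification instead of making the caller poll for new state.
        send(notify_pid, {:updated, key, value})
        loop(new_state)
    end
  end

  # Fire-and-forget: send/2 returns immediately, without waiting for the worker.
  def change(worker, key, value) do
    send(worker, {:change, key, value, self()})
    :ok
  end
end

worker = AsyncSketch.start_worker()
:ok = AsyncSketch.change(worker, :title, "New title")

# The "UI" side simply reacts whenever the notification arrives.
result =
  receive do
    {:updated, key, value} -> {key, value}
  after
    1_000 -> :timeout
  end
```

The caller never asks the worker for its state; it only reacts to the notification, which is roughly the role a LiveView plays in the real stack.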
Our Experience So Far
In spring 2021 we decided to take the plunge and start redoing our main backend (which back then was simple enough to make a rewrite feasible). Co-workers with no prior experience did some work like going through Dave Thomas’ excellent course Elixir for Programmers while the project got set up by those with prior experience. Learning Elixir if you already know other languages isn’t too hard, and pretty quickly the whole team was going at it.
It took us around six weeks to get most functionality working and, indeed, probably another six to tweak the last bits :-). That’s software development in a nutshell, of course. Cut-over was pretty uneventful, and thanks to our already existing event-based architecture we could just let both event streams converge before making the switch (essentially, we replayed all the events stored in DynamoDB in the new system to make sure we had the same data).
One promise that CELP made true, in comparison with the old system, was speed. Changes are initiated by commands, which when applied to the domain model create events, which in turn cause our read-only projection database to be updated.
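The command, event, and projection flow can be sketched in a few lines of plain Elixir. This deliberately avoids Commanded and Ecto, whose real APIs look different; the `Account` and `BalanceProjection` modules, the tuple-shaped commands and events, and the map standing in for the projection database are all hypothetical.

```elixir
defmodule Account do
  # Hypothetical aggregate; a plain struct, not Commanded's API.
  defstruct id: nil, balance: 0

  def new, do: %__MODULE__{}

  # Commands decide which events happen; they never change state themselves.
  def execute(%__MODULE__{id: nil}, {:open_account, id}), do: [{:account_opened, id}]

  def execute(%__MODULE__{id: id}, {:deposit, amount}) when id != nil and amount > 0,
    do: [{:deposited, id, amount}]

  def execute(%__MODULE__{}, _command), do: {:error, :rejected}

  # Events are folded into the aggregate state one at a time.
  def apply_event(%__MODULE__{} = account, {:account_opened, id}),
    do: %{account | id: id}

  def apply_event(%__MODULE__{} = account, {:deposited, _id, amount}),
    do: %{account | balance: account.balance + amount}
end

defmodule BalanceProjection do
  # Read model: a map of account id => balance, standing in for a projection table.
  def project(read_model, {:account_opened, id}), do: Map.put(read_model, id, 0)

  def project(read_model, {:deposited, id, amount}),
    do: Map.update(read_model, id, amount, &(&1 + amount))
end

commands = [{:open_account, "acc-1"}, {:deposit, 50}, {:deposit, 25}]

# Run each command against the current state and append the resulting events.
{account, event_log} =
  Enum.reduce(commands, {Account.new(), []}, fn command, {state, log} ->
    case Account.execute(state, command) do
      {:error, _reason} -> {state, log}
      events -> {Enum.reduce(events, state, &Account.apply_event(&2, &1)), log ++ events}
    end
  end)

# The read model is derived purely from the event log.
read_model = Enum.reduce(event_log, %{}, &BalanceProjection.project(&2, &1))

# Replaying the same events from scratch rebuilds the same aggregate state.
replayed = Enum.reduce(event_log, Account.new(), &Account.apply_event(&2, &1))
```

Because state and read model are both just folds over the event log, replaying stored events into a fresh system (as we did during cut-over) yields the same data.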
The update also sends Phoenix PubSub signals so that active LiveViews can refresh their data. This all happens in the blink of an eye, so when a user changes something in the UI we can often just let this asynchronous mechanism handle things: the view is updated, and the new data is already there, before the user starts wondering what is going on.
An added benefit here is that all other users automatically also see the data change; the sort of liveliness that is often hard to implement and get right on other platforms is now the default for us.
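The fan-out to every connected user can be illustrated without Phoenix at all. The sketch below swaps in `Registry` from Elixir's standard library as a local pub/sub, in place of Phoenix PubSub, and uses two bare processes in place of LiveViews; the `SketchPubSub` name, the `"projects"` topic, and the message shapes are assumptions made for the example.

```elixir
# Registry with duplicate keys lets many processes register under one topic.
{:ok, _} = Registry.start_link(keys: :duplicate, name: SketchPubSub)

parent = self()

# Two hypothetical "views", each subscribing to the same topic.
for name <- [:view_a, :view_b] do
  spawn(fn ->
    {:ok, _} = Registry.register(SketchPubSub, "projects", nil)
    send(parent, {:subscribed, name})

    receive do
      {:projection_updated, payload} -> send(parent, {:refreshed, name, payload})
    end
  end)
end

# Wait until both subscribers are registered before broadcasting.
for _ <- 1..2 do
  receive do
    {:subscribed, _name} -> :ok
  end
end

# Broadcast: every process registered under the topic gets the message.
Registry.dispatch(SketchPubSub, "projects", fn entries ->
  for {pid, _value} <- entries do
    send(pid, {:projection_updated, %{id: 1, name: "Renamed"}})
  end
end)

# Collect what each "view" reported after refreshing.
refreshed =
  for _ <- 1..2 do
    receive do
      {:refreshed, name, payload} -> {name, payload}
    after
      1_000 -> :timeout
    end
  end
  |> Enum.sort()
```

One broadcast reaches every subscriber, which is why a change made by one user shows up for everyone else without extra code in each view.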
Elixir is built on an absolutely rock-solid foundation, so the system has given us no surprises in production. It just works, it is frugal with hardware resources, and if it fails it is usually with a loud bang and a very big pointer at the cause (invariably of our own making). Further, its immutability and process model seem to make a whole class of subtle production issues impossible.
We just passed a billion events stored, and while we know that we want to do some work on the platform to make a couple of things nicer (especially around expiring old events), there’s been no reason for us to doubt our decision to go with CELP.
In the next installment, we’ll discuss how Elixir and running things as a monolith allowed us to fire up the DeLorean, check the flux capacitor, floor the pedal, and hit 88 mph in reverse to move us back to a simpler era of infrastructure design.