Note: Everything in this blog post is purely theoretical, treat it as a thought experiment. I haven’t tried this yet.

I’ve been thinking about use cases for Event Sourcing (ES). It’s most often associated with backend applications where you need strong audit logs, but I’m starting to wonder if it might be a good fit for some JavaScript single-page applications (SPAs) as well. There are folks doing ES in Node but I haven’t seen anyone try this in the browser yet, so I’ll try to outline a few reasons it might be worthwhile.

This article assumes you’re already familiar with ES. Still, just to be clear, I’m suggesting a move away from JS persistence like this:

// JS
var product = new Product({
    'name': 'Foo',
    'price': 90
});
product.set('price', 60);
product.add('tag', 'awesome');
product.save();
// HTTP
POST /product
Host: example.com
Content-Type: application/json

{
    "name": "Foo",
    "price": 60,
    "tags": [ "awesome" ]
}

to something more like:

// JS
var product = Product.import('Foo', 90);
product.discountPrice(60);
product.tagAs('awesome');

eventStream.append(product.id(), product.getPendingEvents());
eventStream.flush();
// HTTP
POST /events
Host: example.com
Content-Type: application/json

[
    { "event": "ProductImported", "name": "Foo", "price": 90, "id": "some-generated-uuid" },
    { "event": "ProductDiscounted", "price": 60, "id": "some-generated-uuid" },
    { "event": "TagAdded", "tag": "awesome", "id": "some-generated-uuid" }
]

In the first example, we’re using a pretty standard getter/setter, ActiveRecord model. In the latter, events are generated inside the entities, loaded into an event stream and then flushed to the server in one go.
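
To make that concrete, here’s a rough sketch of how the entity side might record those events. There’s no real library behind this; the names are my own invention and the events are just plain objects:

// JS (hypothetical sketch)
function Product() {
    this.pendingEvents = [];
}

Product.import = function (name, price) {
    var product = new Product();
    product.record({
        event: 'ProductImported',
        id: 'some-generated-uuid', // a real version would generate a UUID
        name: name,
        price: price
    });
    return product;
};

Product.prototype.record = function (event) {
    this.pendingEvents.push(event);
    this.apply(event); // keep the in-memory state in sync
};

Product.prototype.apply = function (event) {
    if (event.event === 'ProductImported') {
        this.productId = event.id;
        this.name = event.name;
        this.price = event.price;
    } else if (event.event === 'ProductDiscounted') {
        this.price = event.price;
    } else if (event.event === 'TagAdded') {
        (this.tags = this.tags || []).push(event.tag);
    }
};

Product.prototype.discountPrice = function (price) {
    this.record({ event: 'ProductDiscounted', id: this.productId, price: price });
};

Product.prototype.tagAs = function (tag) {
    this.record({ event: 'TagAdded', id: this.productId, tag: tag });
};

Product.prototype.id = function () {
    return this.productId;
};

Product.prototype.getPendingEvents = function () {
    return this.pendingEvents;
};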

Okay, so that’s the example. Why do this?

To start, when you build an MVC-ish JS app, you often end up duplicating some code on both the server and client, particularly in the model layer.

We’ve all done the dance: You need a Product model in the JS code but to save it in the database, you also need a Product model in your PHP/Ruby/Java/etc. Then when you need to add a new field, you have to update the Javascript, the PHP, the database, etc. The smell of lasagna permeates the room.

On the other hand, if we used ES, the server wouldn’t receive full-blown entities. It would only receive serialized events. If the SPA is the only interested party, the server can just pass them to the persistence layer and the entire process stops there. The JS model would be authoritative and the server would be much simpler, since most event stores are just serializing the events.

That does bring us to a downside: we won’t need the entities but we will need Event classes, and there are going to be a lot more of those than there were entities.

That said, the events are dead simple and actually useful. This is the code you want to write. You might even be able to reduce some of this with a clever implementation, especially on the JavaScript side, where anonymous objects would work fine for events.

In any case, the extra work is offset by doing away with a big bunch of useless code: the REST API. Don’t get me wrong, I love REST. I love hypermedia. Many of the issues this article describes would be best solved with a really well designed RMM Level 3 API. Unfortunately, most JS libraries encourage CRUD-style APIs, which can be a poor fit and a huge maintenance burden. If you don’t need or want to design a good API for multiple consumers, then I’d argue you shouldn’t even try: a single RPC endpoint is easier to refactor than a pile of near identical controllers and anemic models.

There are several other benefits on the server side:

  • Security inspections become much simpler. If you’re dealing with a CRUD API, you need to derive the user’s intended changes from the data structure and approve them. With domain events, the user behavior is explicit, so security checks could be as simple as matching the Event class name to an allowed list per role (or voter or whatever you prefer); there’s a sketch of this after the list.
  • The fine-grained behavior also makes triggering other server-only side effects a cinch. It’s already an event!
  • Debug logs are a classic ES benefit and doubly so when chasing errors through a complex GUI.
  • You can have one major transaction for several operations. If you were writing to several API resources, it would be nigh impossible to roll back all changes.
  • Good Domain Events are probably reusable.
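
On the security point, the server-side check really can be that small. A sketch in PHP, with a made-up role-to-events map:

// PHP (hypothetical sketch)
$allowedEvents = [
    'ROLE_ADMIN'     => ['ProductImported', 'ProductDiscounted', 'TagAdded'],
    'ROLE_MARKETING' => ['ProductDiscounted', 'TagAdded'],
];

function assertEventsAllowed(array $events, $role, array $allowedEvents)
{
    $allowed = isset($allowedEvents[$role]) ? $allowedEvents[$role] : [];

    foreach ($events as $event) {
        if (!in_array($event['event'], $allowed, true)) {
            throw new RuntimeException('Event not allowed: ' . $event['event']);
        }
    }
}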

I think there are benefits for the JS as well:

  • ES often brings better surrounding architecture as well, like command dispatching and strong domain models. This can only be good for your JS, which is frequently neglected at design time.
  • Many JS UIs are already evented, listening for changes on individual fields in the model. Unfortunately, listening for changes on a single field might not be high-level enough to express what the update should be, turning the UI management code into a mess. Instead, we could publish the domain events not just to the server but to our own UI, leading to more concise updates; there’s a sketch of this after the list.
  • Events open the door to some cool JS features which might normally be hard to implement:
    • Saving everything in one request, both for performance and to avoid sequence issues.
    • Saving events in local storage in case the user loses connection.
    • Maybe even an undo or rewind feature. Event Streams should be immutable but you could potentially do this with only your unsaved actions, provided your models support forward and reverse mutators.
    • Replicating changes to other users like in a game or collaborative app.
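
To illustrate publishing the domain events to our own UI, here’s a tiny hypothetical event stream that doubles as a local publisher (the listener API is my own invention, and the stream id is ignored for brevity):

// JS (hypothetical sketch)
var eventStream = {
    listeners: {},
    pending: [],
    on: function (name, fn) {
        (this.listeners[name] = this.listeners[name] || []).push(fn);
    },
    append: function (streamId, events) {
        this.pending = this.pending.concat(events);
    },
    flush: function () {
        var self = this;
        var events = this.pending;
        this.pending = [];
        events.forEach(function (event) {
            (self.listeners[event.event] || []).forEach(function (fn) {
                fn(event);
            });
        });
        // ...then POST the same events to /events (omitted here)
    }
};

// The UI reacts to the same domain events the server will receive:
eventStream.on('ProductDiscounted', function (event) {
    document.querySelector('.price').textContent = event.price;
});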

This might sound great but as they say, “no plan survives contact with the enemy.” For example, there’s still some duplication between the client and server. The JS will certainly be more lines of code than a CRUD/ActiveRecord style. The ES rehydration process (reapplying the entire event stream to the individual models) may take more CPU on load. And how would you resolve two event streams that differ significantly, say from two users?

To counter these points though: the duplicated Event code is straightforward and easier to manage than multiple CRUD controllers. It’s more lines of code but it’s simpler code. ES rehydration is often offset by snapshotting, which may work especially well if you need several items at once for loading: you can maintain CQRS ViewModels to grab all of your interface’s data in tight blobs. As for merging differing streams, I’m not sure how this differs greatly from a standard ES scenario so the usual solutions apply, say optimistic locking with a version number.
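
For instance, the version-number approach might look like this on the wire (the field names and the 409 behavior are assumptions, not a spec):

// JS (hypothetical sketch)
var payload = {
    streamId: product.id(),
    expectedVersion: 12, // the version of the last event we loaded
    events: product.getPendingEvents()
};
// The server compares expectedVersion to the stored stream's version and
// answers 409 Conflict on a mismatch, so the client can reload and retry.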

That said, there’s little defense against the additional complexity argument. To make this worthwhile, you’d definitely need a reasonably large Javascript app. However, as Javascript and user expectations continue to evolve, this might not be as rare as you’d think. I’ve already worked on at least one or two projects in my career where I’d consider this a valid or improved approach.

The largest JS project I’ve worked on used Commands, rather than Events, dispatched to a single endpoint. This was a marked improvement and had many of the same advantages (transactions, some debug logging, batching) but it also came with a lot of duplicated code and sometimes made you wonder which was the authoritative model: JS or PHP? You could put all of the logic server-side but you may get a laggy interface for your trouble.

Still, this is theoretical for me, so if you know anyone who’s tried this approach, please let me know. I wouldn’t recommend it for most projects but if I could do some of them over again, there’s a good chance I’d give this a shot.

Many thanks to Warnar Boekkooi and Shawn McCool for proofreading this article.

Ansible has excellent documentation but one thing I was confused about was the best way to store the configuration for multistage projects: say, different passwords for dev, staging, production. This isn’t really covered in the ansible-examples repo because it’s specific to your project and while the documentation has recommendations, it doesn’t spell it out completely (which I need since I’m an idiot).

In the old days, the easiest place to keep the configuration was in your inventory file. Since these store the server IPs you run your playbooks against, they’re inherently stage-specific. However, storing everything in the inventory can be limiting and is officially discouraged.

Instead, the best practices guide suggests a different structure, one that’s based on keeping your config in the group_vars and host_vars directories. At first glance, the linked example confused me because it seemed to mix a lot of things together in one file: IP addresses, role assignments, location, datacenter, etc. However, after some trial & error, talking to some smart folks and a lot of googling, I’ve hit on a structure that’s worked well for my last couple of projects, so I’d like to write about it here.

So, let’s take the example above and pare it down to something simpler:

We’ll create an inventories directory and place a “production_servers” inventory file in there.

; inventories/production_servers
[web]
4.2.2.1
4.2.2.2

[database]
8.8.8.8

This file does one thing and does it well: it sorts our server IPs into different groups. Now, “Group” is the magic word here. Whenever we run a playbook with this inventory, Ansible isn’t just loading the inventory. It’s also looking at the group names we set (the header sections of the INI) and then trying to match those to files with the same names in the group_vars directory. This isn’t explicitly configured, it’s just something Ansible does by default.

So, since we mentioned a “web” group and a “database” group, Ansible will try to load the files “group_vars/web” and “group_vars/database”. These are expected to be YAML key/value files and we can use them to define all the Ansible variables you likely have sprinkled throughout your roles. For example, the database and web vars files might look like this:

# group_vars/database
---
db_port: 3306
db_user: app_user
db_password: SuperSecureSecretPassword1

# group_vars/web
---
domain_name: myapp.com
csrf_secret: foobarbaz

Here we’ve defined a few variables you’d use in a role like {{ db_password }} or {{ domain_name }}.
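
For instance, a task in a hypothetical database role might consume them like this (mysql_user is a real Ansible module; the task itself is just for illustration):

# roles/database/tasks/main.yml (hypothetical)
---
- name: Create the application database user
  mysql_user:
    name: "{{ db_user }}"
    password: "{{ db_password }}"
    login_port: "{{ db_port }}"
    priv: "*.*:ALL"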

So far, so good. By now, our ansible directory probably looks something like the example below. Keep in mind the group_vars file names are based entirely on the header names inside the inventory file, NOT the names of the inventory files themselves.

    .
    ├── group_vars
    │   ├── database
    │   └── web
    │
    ├── inventories
    │   └── production_servers
    │
    ├── roles
    │   └── ...
    │
    └── my_playbook.yml

Now comes the multistage part. We don’t want to use the same db_password for dev, staging and production, that’d be terrible security. And we probably want to change the domain name. And the SSL certificates. And all sorts of other things, which we’d prefer to maintain in just one place. How can we group the configuration together per stage?

Remember, Ansible will try to load a group_vars file for any group it encounters in your inventory. All it takes to define a group is adding a section for it in the inventory’s INI file. So, why don’t we create a “production” group?

; inventories/production_servers
[web]
4.2.2.1
4.2.2.2

[database]
8.8.8.8

[production]
4.2.2.1
4.2.2.2
8.8.8.8

We’ve now created a production group and assigned all the production servers to it, so they all get the exact same configuration. I haven’t tested it completely but this is really important if your configuration overlaps between roles, such as using the db_password on the web servers.

However, duplicating all of the IP addresses is a real pain and it would be super easy to add another web server and forget to update the list at the bottom of the file. Luckily, Ansible has an inheritance syntax to make this easier.

; inventories/production_servers
[web]
4.2.2.1
4.2.2.2

[database]
8.8.8.8

[production:children]
web
database

This example does the exact same thing as the previous version: it creates a group called “production” but now it’s defined as a group of groups. Any IP address added to the “web” or “database” groups is automatically part of the “production” group (at least, when running a playbook with this inventory).

That means we can now create a group_vars/production file where we can group the parts that are specific to this stage:

# group_vars/production
---
domain_name: myapp.com
db_password: SuperSecureSecretPassword1
csrf_secret: foobarbaz

These are the things we’re interested in changing per stage. Other stuff that’s the same 99% of the time, like port numbers or standard users, we can leave in group_vars/database.

Now, if we wanted to add a staging setup, we only need to add two files: a new inventory…

; inventories/staging_servers
[web]
8.8.4.4

[database]
4.2.2.3

[staging:children]
web
database

and a group_vars/staging config.

# group_vars/staging
---
domain_name: staging.myapp.com
db_password: LessSecretButStillSecurePassword
csrf_secret: St4gingCSRFToken

Notice that the basic format is the same, and we can use this to add any number of stages we like:

    .
    ├── group_vars
    │   ├── all
    │   ├── database
    │   ├── dev
    │   ├── production
    │   ├── staging
    │   └── web
    │
    ├── inventories
    │   ├── dev_servers
    │   ├── production_servers
    │   └── staging_servers
    │
    ├── roles
    │   └── ...
    │
    └── my_playbook.yml

In the above example, we’ve now added a dev stage which probably lists our Vagrant IP as both the web server and the db server. You might also notice a group_vars/all file. This is a special file that Ansible always loads, no matter which groups you use, making it an excellent place to stash your default config.

So, using this setup we have a working and reasonably well centralized multistage setup in Ansible. We’ve also got the config split out nicely so we can use ansible-vault to encrypt our staging and production settings. This works really well and I’ve used it successfully in a couple projects now.
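
The vault part is the stock ansible-vault workflow, something like:

# encrypt the stage-specific settings
ansible-vault encrypt group_vars/production group_vars/staging

# then provide the vault password when running the playbook
ansible-playbook -i inventories/production_servers my_playbook.yml --ask-vault-pass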

However, there are a couple gotchas to notice. The big one is inheritance order. If you define a config value in multiple groups (say, “db_port” in both “database” and “all”), then Ansible follows particular rules to determine which one wins. For this setup, the priority from highest to lowest is:

  • type (web, database)
  • stage (dev, staging)
  • the “all” file

This is kind of bad, because we probably want the stage files to take precedence but the type files are overriding. It turns out, this is because we used the “:children” style to define the stage in our inventory. This marks the “web” and “database” servers as children of the “production” group and as the Ansible documentation says “Child groups override parent groups”. We could try to work around it by making more specific groups and controlling the hierarchy more tightly:

[production-web]
4.2.2.1
4.2.2.2

[production-database]
8.8.8.8

[production:children]
web
database

[web:children]
production-web

[database:children]
production-database

But this hasn’t worked in my testing because when the groups are otherwise equal, Ansible resolves precedence alphabetically, so “web” still overrides “production”. Also, while more specific, this is quite a bit more boilerplate in every inventory file.

In practice, I haven’t used the type-specific group_vars files much, instead relying on role defaults or the “all” file. The end result has been much simpler and I don’t have to worry as much about where something is defined.
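
In other words, the stable defaults can move into the roles themselves, where they sit at the lowest precedence and any group_vars file can override them:

# roles/database/defaults/main.yml
---
db_port: 3306
db_user: app_user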

This brings us to the second gotcha: this is reasonably simple at a small scale but it can grow more complex. Super nice guy Ramon de la Fuente told me he’s been running this setup (or one very similar) for a while and has found it a bit awkward to manage as it grows. I haven’t tried it on a very large installation yet but I’m inclined to believe him. You should check out his latest Ansible article for more tips.

Still, for a small to mid-size project, this is a straightforward, practical setup. If you do need very fine-grained control and you’re not running several servers per stage, consider looking into the host_vars directory, which is the same thing as group_vars but per server instead of per group. And finally, remember, Ansible’s group system is essentially a free-form tagging system: it’s a simple, powerful way to build any setup you want.

Like, you know, Ansible itself.

Update: Erika Heidi wrote a great add-on post to this that talks about integrating this setup with Vagrant and how to configure your remote servers.

The setup presented here is essentially that from the (excellent) Ansible Docs and Mailing List; I’m just explaining it a bit more. Many thanks to Ramon de la Fuente and Rafael Dohms for Ansible feedback, as well as Erika Heidi and Rick Kuipers for proofreading this article.

When writing Ansible playbooks, it’s useful to test them locally against your Vagrant box. The easiest way is just running “vagrant provision” after each change and then validating the results on the vagrant image. That said, this runs the entire playbook in full and while I enjoy a good cup of tea, there’s only so much I can have per day.

Often, we just want to check if the last couple of changes run correctly. While you can’t do this with “vagrant provision”, you can do it with the ansible-playbook command’s --start-at-task option. Unfortunately, ansible-playbook often doesn’t want to connect directly to your vagrant box with just the IP. Luckily, the answer is in the Ansible/Vagrant docs (admittedly at the bottom of the page).

So, if we put these two things together, we have this command, which’ll start running our playbook at whatever task we choose, directly against our vagrant box. All you need to do is swap in the name of the task you want to start at, right at the very end of the command:

ansible-playbook -i inventories/dev --private-key=~/.vagrant.d/insecure_private_key \
    -uvagrant my_playbook.yml --start-at-task "Do Foobar"

You can then let it run and stop it with a quick Ctrl+C. Or, if you want to check each step by hand, you can add the --step switch. Ansible will then ask you to confirm or cancel before it goes on to each of the following steps in the playbook.
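
Using the same command as above, that looks like:

ansible-playbook -i inventories/dev --private-key=~/.vagrant.d/insecure_private_key \
    -uvagrant my_playbook.yml --start-at-task "Do Foobar" --step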

If you’re looking to squeeze out that extra bit of speed, you might try disabling fact gathering, but I haven’t tried it and wouldn’t recommend it in general.
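
For completeness, it’s a single playbook-level switch (the role name here is just a placeholder):

# my_playbook.yml
---
- hosts: all
  gather_facts: no
  roles:
    - my_role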

Remember though, just because your last changes work in isolation doesn’t mean your entire playbook can’t break. Always test your playbooks in full after any major changes.

When not enjoying my funemployment, I’ve been talking to a lot of folks lately about domain events. One of the questions that’s come up multiple times is how to properly raise events when creating something. Say, a batch of noodles.

I’m a simple man, so I reckon this is fine:

class Noodles
{
    use EventGenerator;

    public function __construct()
    {
        $this->raise(new NoodlesCreatedEvent($this));
    }
}

(EventGenerator trait code as a gist or video explanation)
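
I won’t reproduce the gist here, but a minimal version of such a trait might look something like this (the method names are my own, not necessarily those from the gist):

trait EventGenerator
{
    protected $pendingEvents = [];

    // Only queues the event internally; nothing is dispatched yet.
    protected function raise($event)
    {
        $this->pendingEvents[] = $event;
    }

    // Hands the queued events off to whoever persists or dispatches them.
    public function releaseEvents()
    {
        $events = $this->pendingEvents;
        $this->pendingEvents = [];
        return $events;
    }
}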

At first glance, this might seem like it violates “Don’t Do Work In The Constructor”, especially the part about creating new dependencies. That said, if your raise method is only queueing the event internally and not tossing it off to an external dispatcher or static method, are you really doing any work? In my mind, these simple state-setting operations aren’t much different from initializing a scalar value in the property declaration. I’ve never seen anyone argue against:

class Noodles
{
    protected $knots = 0;
}

Also, while we are creating the Events here, I would argue these Events are the business of this entity and it should have all the necessary dependencies to create them at hand. Even Misko’s advice makes allowance for creating value objects (which these Events are akin to). Events do not have behavior, which is often what you need to substitute.

In short, the rule is nuanced and blindly applying it here will likely result in a more complex solution. The constructor example above is delightfully simple, even dumb. When using a rule like this, try to understand intent: One of the main goals in “Don’t Do Work In the Constructor” is making substitution and thus testing easier. However, the events are often the things we test for!

(Please note I am not arguing against this rule in general, I’m a firm believer in it. There is a special hell for 3rd-party libraries that connect to databases on creation).

As a side note, some DDD practitioners like Mathias Verraes recommend using static constructors named after the actual operations in your domain. In other words, noodles are never just “created”, they’re stretched from dough, for example. In these cases, you could create the event(s) in the static method and pass it through the constructor, which then raises it.

class Noodles
{
    use EventGenerator;

    public static function stretchFromDough()
    {
        $noodles = new Noodles(new NoodlesStretchedEvent());
        return $noodles;
    }

    protected function __construct($event) {
        $event->setNoodles($this);
        $this->raise($event);
    }
}

This works well, especially if you have multiple ways of creating noodles, such as stretching or rolling. A downside is that if you want to pass the entity into the event itself (rather than the ID which might be better) then you need to do that in two stages, either by setting the Noodles on the Event or adding the event to the Noodles.

However, the ever-clever Alexander Mols pointed out this can be simplified because PHP visibility is class scoped, not instance scoped (see example 3). In that case, you can just invoke the raise() method on the new instance from inside the factory method.

class Noodles
{
    use EventGenerator;

    protected function __construct() {}

    public static function stretchFromDough()
    {
        $noodles = new Noodles();
        $noodles->raise(new NoodlesStretchedEvent($noodles));
        return $noodles;
    }
}

More so than any of my other talks, I’ve received a huge number of requests for a video of “Models & Service Layers; Hemoglobin & Hobgoblins”. At last, the original version from #phpnw13 is now online, many thanks to the awesome crew:

This version is a bit older and doesn’t have some new content found in the latest slide deck but the main points are there. Two remarks: The audio is a bit muffled so you may need to crank it up, also the in-video slides don’t have the “appear” animations (my fault for not making a screen recording) so the crowd reaction may not sync with the slides. ;)

Also, I’d like to clarify something based on some feedback from the last few months. I’m not advocating CQRS as the end goal architecture here, nor would I recommend it for general purpose projects. Like any pattern, it has a sweet spot for when and where to use it. This talk is an overview; I simply wanted to demonstrate a number of different techniques you can use when modeling. You can essentially get off the train at any stop during this talk and that’s okay. If you’re hellbent on trying something new from this talk, I would suggest Domain Events.

That said, CQRS and Event Sourcing are really cool and if you do have a project that suits it (see Fowler and SO), by all means, use it! And then blog about it.