About six years ago I met with a smart young entrepreneur from Bristol, a graduate of Oxford University and someone who was passionate about the world of storage. Luke Marsden was working on the early stages of an idea and people on both sides of the Atlantic (as well as a random guy in the South Pacific) were interested in what he was up to. Long story short, I came on board as an adviser and invested in HybridCluster.

Fast forward three or four years and HybridCluster was now ClusterHQ, focused on solving the storage problems of stateful containerized applications. The company had also picked up a sweet funding round from some high-profile venture capitalists and taken on a seemingly heavy-hitting Silicon Valley VC. That was, alas, the beginning of the end, for in short order Marsden left the company, which closed down only a few months later.

You win some, you lose some.

But Marsden’s reasons for originally starting that company were pure: he had an itch to scratch and had identified a very real problem. You can’t keep people like that down, however bruised and battered they may be. After an interlude at container networking startup Weaveworks, Marsden is now back as founder of Subtree, a cloud-native data management startup which is launching today under the dotmesh URL. The company is also announcing a $10 million seed investment made by DDN storage.

Alongside the investment round and launch, Subtree is rolling out two initial products, dotmesh and dothub, which allow the state of entire applications to be captured, organized and shared. According to Marsden, dotmesh and dothub give you the same power over data that source control gives you over code. Think of it as a “Git for data” and you’re along the right lines.

Looking back at history

I sat down (virtually, since I was on the other side of the world from him) with Marsden and asked him for some on-record thoughts about what happened at ClusterHQ. Despite the temptation to throw people and situations under the bus, Marsden was philosophical about the experience, and spoke about some of the risks of building a business adjacent to a massive open source project:

ClusterHQ was a fantastic learning experience. I’m proud of what we achieved and the many strong relationships that were built in the team. Ultimately the reason that ClusterHQ failed, I think, was that we believed we had product-market fit before we really did, and we started scaling too soon. When we started, it wasn’t possible to connect storage to containers at all, and so we had to put a lot of work into making that possible. And by the time we’d got Flocker working reliably across AWS, GCE, OpenStack & a dozen or so storage vendors, we’d been commoditized by Kubernetes. Our premature scaling then made it harder to adapt as fast as we needed to. Many lessons learned the hard way!

So onto dotmesh, what is it?

In explaining the thinking behind dotmesh, Marsden pointed out that data management issues cause significant pain throughout the software development lifecycle, and that, in his observation, the increased adoption of cloud-native computing practices is exacerbating the problem as applications change faster and employ multiple databases behind their microservices. As he explains it:

In this world of increasing complexity, Subtree’s mission is to bring data into the ‘circle of control’. We have years of well-defined methodologies and tooling for dealing with code and infrastructure, but dealing with data in cloud native environments is still an ad-hoc collection of disparate best practices and home grown tooling. dotmesh and dothub are our first steps towards addressing this problem.

What that means beyond the sound bites is that dotmesh is an open source utility that lets organizations and developers capture, manage and share their entire application state in units called datadots. A single datadot captures and versions the state of all the databases, files and other data in a containerized application. The datadot is like a git repo that works for data: it can be committed, branched, pushed and pulled just like source-controlled code. Datadots are kind of like recipes for, or snapshots of, the state of a containerized application.

And once one has a recipe, one needs a place to keep it. Enter dothub. dothub is an online centralized repository to which users can push datadots, and from which they can pull them, for safekeeping and collaboration. Use cases could include developer collaboration or capturing accurate metrics from production.
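For readers who think better in code, here is a toy sketch of the “Git for data” idea described above. To be clear, this is not dotmesh's or dothub's actual API: the `ToyDataRepo` class, its method names and the in-memory dictionaries are all hypothetical, invented purely to illustrate what committing, branching, pushing and pulling application state might look like in principle.

```python
import copy
import hashlib
import json


class ToyDataRepo:
    """Hypothetical in-memory model of a versioned unit of application state.

    An illustration of the concept only; it is not how dotmesh is implemented.
    """

    def __init__(self):
        self.commits = {}                  # commit id -> {parent, message, state}
        self.branches = {"master": None}   # branch name -> head commit id
        self.current = "master"

    def commit(self, state, message):
        """Record a snapshot of `state` on the current branch."""
        parent = self.branches[self.current]
        payload = json.dumps(
            {"parent": parent, "message": message, "state": state},
            sort_keys=True,
        )
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = {
            "parent": parent,
            "message": message,
            "state": copy.deepcopy(state),
        }
        self.branches[self.current] = commit_id
        return commit_id

    def branch(self, name):
        """Create a branch at the current head and switch to it."""
        self.branches[name] = self.branches[self.current]
        self.current = name

    def checkout(self, name):
        """Switch branches and return a copy of the state at that branch's head."""
        self.current = name
        head = self.branches[name]
        return copy.deepcopy(self.commits[head]["state"]) if head else {}

    def push(self, remote, branch=None):
        """Copy a branch and any commits the remote lacks to another repo."""
        branch = branch or self.current
        commit_id = self.branches[branch]
        while commit_id is not None and commit_id not in remote.commits:
            remote.commits[commit_id] = copy.deepcopy(self.commits[commit_id])
            commit_id = self.commits[commit_id]["parent"]
        remote.branches[branch] = self.branches[branch]

    def pull(self, remote, branch=None):
        """Fetch a branch from another repo (the mirror image of push)."""
        remote.push(self, branch or self.current)
```

In this toy model, a developer would commit the state of, say, a users database, branch off to try something risky, and push the result to a second `ToyDataRepo` standing in for a central hub, from which a colleague could pull the same state.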

MyPOV

On a personal level, I’m excited to see Marsden bounce back from his ClusterHQ experience and have another go. In my dealings with him, Marsden proved himself to be, as the English say, “a top bloke” and a smart one at that.

Putting aside my views about the history, it is worth reflecting on what is going on here with Subtree. I’m reminded of an earlier time, when cloud was still a nascent approach, and a number of vendors were working in the cloud visibility or blueprinting space. Like Subtree, these companies all promised that they would deliver a clear picture of the architecture and structure of applications. Back then applications, while more complex than those in the pre-cloud era, were still relatively simple, and no one really managed to crack the code on this notion.

The container world is, obviously, different. The level of complexity that modern applications have is exponentially greater than in the past, and hence perhaps it is time for this application snapshotting idea to break out. The historical analogs of what Chef and Puppet did for server configuration, and the parallel uptake of developer repositories such as Git, indicate that these notions are attractive.

It’s going to be interesting to watch Subtree’s progress and see where this journey takes Marsden.

Ben Kepes

Ben Kepes is a technology evangelist, an investor, a commentator and a business adviser. Ben covers the convergence of technology, mobile, ubiquity and agility, all enabled by the Cloud. His areas of interest extend to enterprise software, software integration, financial/accounting software, platforms and infrastructure as well as articulating technology simply for everyday users.
