
Thoughts on MongoDB

jducoeur edited this page Oct 22, 2012 · 1 revision

In conversations with Aaron, I've come to the conclusion that we probably want a document-centric database for the main data store. (User data will go in MySQL, but that's a tiny fraction of the total.) One of the possibilities is MongoDB, especially since it appears that CloudBees has a partner that provides it on-demand.

Mongo stores documents natively in a JSON-like format (BSON), which suits our requirements well -- I'd been planning on a JSON-oriented store anyway. Moreover, it allows very free-form documents, and lets you index on fields inside them.

This suggests a very different architecture than I'd been planning -- actually, closer to my original plans, but with some nice tweaks.

Each Thing would be a Mongo "document". It would contain only its non-inherited, non-default values, but those Property values would live directly inside the document -- since Mongo allows non-homogeneous documents, that becomes straightforward.
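As a rough sketch of what one document per Thing might look like, here is a small Python illustration. The field names (`_id`, `space`, `model`, `props`) and the sample Things are hypothetical, chosen only for the example; nothing here is a settled schema.

```python
# Hypothetical layout: "_id", "space", "model" and "props" are illustrative
# field names, not a decided schema.

def to_document(thing_id, space_id, model_id, local_props):
    """Build the document stored for one Thing.

    Only values set directly on the Thing go into "props"; anything
    inherited from its Model or left at the default is simply absent.
    """
    return {
        "_id": thing_id,
        "space": space_id,
        "model": model_id,  # None for a Thing with no Model
        "props": dict(local_props),
    }


# Two Things in the same collection, carrying different sets of properties.
recipe_model = to_document(
    "recipe", "space1", None,
    {"displayName": "Recipe", "servings": 4},
)
pancakes = to_document(
    "pancakes", "space1", "recipe",
    {"displayName": "Pancakes", "ingredients": ["flour", "eggs", "milk"]},
)
```

Note that `pancakes` stores no `servings` value of its own; under this layout it would pick that up from `recipe` at read time.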

This implies that I do not necessarily have to sweep the entire Space into memory in order to begin working. Instead, when we reference a Thing, we load it and all of its ancestors. (Possibly, as an optimization, we have a Model flag, and we sweep all Models at first load.) When we reference a Property, we sweep in all of the values set for that Property.
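The load-on-reference idea could be sketched roughly as follows. This assumes each document records its parent in a hypothetical `model` field and its local values in `props`, and it uses a plain dict as a stand-in for the Mongo collection, so it shows only the shape of the logic, not real database access.

```python
def load_with_ancestors(store, thing_id, cache):
    """Load a Thing and every ancestor on its Model chain into cache.

    store is any mapping from id to document (a stand-in for a collection
    lookup); cache holds the documents already in memory. Returns the
    chain of documents from the Thing up to its root.
    """
    chain = []
    seen = set()
    current = thing_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in model chain at %r" % (current,))
        seen.add(current)
        if current not in cache:
            cache[current] = store[current]  # the only "database" hit
        doc = cache[current]
        chain.append(doc)
        current = doc.get("model")
    return chain


def resolve_property(chain, name, default=None):
    """Return the nearest value for name, walking from the Thing to the root."""
    for doc in chain:
        if name in doc["props"]:
            return doc["props"][name]
    return default


# Hypothetical sample data standing in for one Space.
store = {
    "thing": {"_id": "thing", "model": None, "props": {"servings": 1}},
    "recipe": {"_id": "recipe", "model": "thing", "props": {"servings": 4}},
    "pancakes": {"_id": "pancakes", "model": "recipe",
                 "props": {"displayName": "Pancakes"}},
    "waffles": {"_id": "waffles", "model": "recipe", "props": {"servings": 2}},
}
cache = {}
chain = load_with_ancestors(store, "pancakes", cache)
```

After this call the cache holds `pancakes`, `recipe` and `thing`, while `waffles` has never been touched, which is the point of not sweeping the whole Space.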

Getting that right might be tricky, so I suspect that we will do the full sweep-into-memory to begin with. But it's good to know that we may be able to refactor that out later, to optimize the system.

Mongo does not appear to have the sort of diff-oriented history mechanism I would like, which is a little unfortunate. But we can almost certainly build that by hand, especially if we have a document per Thing -- computing at least a crude diff becomes pretty easy.
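A crude diff of that kind might look something like the sketch below. It compares only the top level of two versions of a Thing's property map, and the function name and result shape are assumptions made for illustration, not a designed history format.

```python
def crude_diff(old_props, new_props):
    """Compare two versions of a Thing's property map, top level only."""
    added = {k: new_props[k] for k in new_props if k not in old_props}
    removed = {k: old_props[k] for k in old_props if k not in new_props}
    changed = {
        k: (old_props[k], new_props[k])  # (old value, new value)
        for k in old_props
        if k in new_props and old_props[k] != new_props[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical before/after versions of one Thing's properties.
before = {"displayName": "Pancakes", "servings": 4, "notes": "draft"}
after = {"displayName": "Pancakes", "servings": 6, "tags": ["breakfast"]}
diff = crude_diff(before, after)
```

Storing one such record per save would give a simple history, at the cost of treating any change inside a nested value as a replacement of the whole value.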

Mongo is designed for sharding. You need to choose the shard key early, but that's easy: we would shard on the Space ID. We probably won't start with the system sharded, mostly due to cost -- the "Replica Set Small" model costs $150/month to start with -- but we should assume that we will move to a sharded environment before the system opens to the public.

This article takes Mongo heavily to task for buying its performance at the possible cost of reliability. That may be a serious matter to consider.
