Book review: Seven Databases in Seven Weeks

This is a really nice book. Of the seven databases it covers, I was familiar with PostgreSQL and, only briefly, with Neo4j. So the book gave me the chance to explore some more databases and find out about their strengths and weaknesses. In the following paragraphs I'll explain what I liked and what I didn't like about each of them. Before I start: if you are planning to buy this book, be warned that some features it describes are deprecated or have even been removed, because some of the database systems have evolved since the book was written (2012). For example, most of the Neo4j chapter is useless today, because it doesn't use the Cypher language.

PostgreSQL rocks. It's a very powerful RDBMS, and I can vouch for that since I have used it professionally. Postgres is mature, fast, and rock-solid. For those reasons I would choose it for all problems that play nicely with relational DBs. And yes, an RDBMS is not the answer to all problems. For example, distributed computations do not fit well into this model. Scaling is limited to making your single DB server/cluster more powerful by upgrading or extending its hardware. And not all problems require full ACID compliance and strict schema enforcement.

Riak is flexible. Being able to interact with a DB using a REST interface and a tool like curl should not be underestimated. What I like about Riak is that you can store whatever resource you like (be it a document, an image, etc.) on the fly and map it to the URL of your choice. It just works! I see Riak as a Web filesystem that supports distributed computations through MapReduce. But Riak also supports connecting resources and traversing those connections (link walking). On the downside, configuring Riak and understanding some of its concepts (for example conflict resolution and adding indexes) is currently a pain. And you can only find prebuilt binaries for your operating system (Windows is not supported at all) on basho.com.

HBase is unusual. It takes some time to understand the way a column-oriented DB works. What I found great is that versioning is built in. If you care about data history, that's a big deal. Another plus: compression and fast lookups using Bloom filters are also built in. These are great features that can save a lot of development time. The negatives: no REST interface, complex configuration, and no prebuilt binaries. You need to compile HBase on your own, so forget Windows unless you like pain.
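For readers who haven't met Bloom filters before, here is a minimal Python sketch of the idea. It is not HBase's implementation; the class name, the sizes and the hashing scheme are all my own assumptions for illustration. The point is that the filter can answer "definitely not present" very cheaply, which lets a store skip expensive lookups.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not HBase code).

    It may report false positives, but never false negatives.
    """

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        # Set every bit that the key maps to.
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # True only if all of the key's bits are set.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A key that was added always comes back as "might be there"; a key that was never added usually comes back as "definitely not", and only then can the lookup be skipped.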

MongoDB is all about JavaScript. Having the full support of a powerful language like JavaScript while using a DB is very valuable. Being able to save JSON documents adds a lot of flexibility, since they can be nested arbitrarily. But this flexibility comes at a cost: updating a document means replacing it without a warning, deleting specific elements of a document is not supported, and debugging JavaScript code is a pain. On the plus side, the MapReduce support of Mongo is nice, and it also supports indexing documents. Configuring replicas and sharding is also quite easy. And operating system support is very good.

CouchDB is cute. The Futon Web interface makes CouchDB very user-friendly. Its REST interface and the ability to use curl make it developer-friendly. Moreover, CouchDB has an interesting approach to replication, since all servers are treated equally (no master-slave model). The same is true for conflict resolution: one of the conflicting updates is automatically considered the winner, and this choice is consistent across all nodes. But that's not necessarily the "correct" update... One last thing: CouchDB is easy to install on all popular platforms.
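The conflict resolution idea can be sketched in a few lines of Python. The rule I use here (the longest revision history wins, ties are broken by the higher revision ID) and the revision IDs themselves are assumptions for illustration, not necessarily CouchDB's exact algorithm. What matters is that the rule is deterministic, so every node picks the same winner without talking to the others.

```python
def pick_winner(revisions):
    """Pick one revision deterministically from a list of conflicting ones.

    Each revision is a (history_length, revision_id) tuple. Tuples compare
    element by element, so the longer history wins and the revision ID
    only breaks ties.
    """
    return max(revisions)


# Two nodes see the same conflicting revisions, but in a different order.
node_a_view = [(2, "2-a1f"), (2, "2-c9e")]
node_b_view = [(2, "2-c9e"), (2, "2-a1f")]

winner_on_a = pick_winner(node_a_view)
winner_on_b = pick_winner(node_b_view)
```

Both nodes end up with the same winner, but nothing in the rule looks at the content of the updates. That's why the winner is consistent without necessarily being the update you wanted.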

Neo4j is the graph database. There are simply no competitors when it comes to modelling relationships (think of social networks, movies, food, drinks) using graphs. Neo4j has its own query language (Cypher) and a very nice browser that makes experimenting easy. The documentation is also extensive and interactive. Building a cluster is easy. The negatives: the learning curve (new concepts and a new language), and the fact that the enterprise edition is not free (gratis).

Redis is generic. It's not a DB as such, but more an in-memory data structure storage toolkit. Redis is simple to use, fast, and supports transactions. Its commands have strange names though, probably the result of an effort to avoid verbosity. Because it is very generic, Redis can be used as a fast in-memory cache for applications that require high performance.

Final comments: Some people have proposed a better definition of the name NoSQL: Not only SQL. I like this definition. Similar to programming paradigms and languages, different database systems have both strengths and weaknesses. Why not use more than one to achieve our goals? That's the main idea behind the polyglot persistence concept, as suggested by the authors. Polyglot persistence means using more than one database to target different application layers. For example, Redis for caching, Neo4j for modelling relationships, and PostgreSQL for persistence.
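To give a rough feel for the caching layer in such a setup, here is a minimal Python sketch of the cache-aside pattern. Everything in it is a stand-in that I made up for illustration: `TtlCache` plays the role of a cache like Redis, and `fetch_from_store` is a hypothetical function that would query the persistent database. No real Redis or PostgreSQL client is involved.

```python
import time


class TtlCache:
    """In-memory stand-in for a cache such as Redis (hypothetical)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl_seconds)


def load_user(user_id, cache, fetch_from_store):
    """Cache-aside: try the cache first, fall back to the slower store."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = fetch_from_store(user_id)  # e.g. a query against the RDBMS
    cache.set(user_id, value)
    return value
```

The first call for a given user hits the persistent store and fills the cache; calls that follow within the TTL are served from memory.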
