NoSQL standouts: The best document databases

Which document-oriented database is right for your app? Follow this guide to the most developer-friendly NoSQL databases

“The right tool for the right job.” If such wisdom holds true anywhere, it certainly holds true for the database a developer picks for a given application. Document databases, one of the families of data products collectively referred to as “NoSQL,” are for developers who want to focus on their application rather than the database technology.

With a document database, data is not stored in tables with distinct column types. Instead, it is stored in freeform “documents” with any number of fields and any number of nested structures. Such documents are typically represented as JSON and updated either by way of APIs or by sending JSON to a REST endpoint. Almost every modern programming language supports JSON and REST, so working with a document database feels more like working natively with those data structures than working with a traditional database.

This schemaless design, as it is called, has its limitations. A developer must do more work to ensure that inserted data is consistent, because the database itself doesn’t always guarantee that consistency. SQL, the standard-issue and widely understood language for database work, isn’t supported by most document databases, so those with existing database expertise must start from scratch. But the convenience, speed, scalability, and versatility of a document database are hard to beat when you’re writing an application that needs a protean, free-form data structure.

Here we’ve profiled eight of the best known and most widely used document databases. Four of the eight (CouchDB, Couchbase Server, MongoDB, and RethinkDB) are open source projects with few or no practical barriers to getting started; Couchbase and MongoDB are also available in supported enterprise editions under commercial licenses. The other four (Amazon DynamoDB, Google Firebase, IBM Cloudant, and Microsoft Azure Cosmos DB) are hosted services from major cloud vendors, where close integration with other services in those clouds is a big draw.

See the table below to compare features, and read on for brief discussions of each database.

Amazon DynamoDB

Amazon’s DynamoDB document store debuted in 2012, building on lessons from Amazon’s earlier SimpleDB service and from Dynamo, the key-value store Amazon described in a 2007 paper. One of that paper’s co-authors had already drawn on many of the same ideas to help create Apache Cassandra.

DynamoDB features

Like most of Amazon’s other cloud offerings, DynamoDB is a pay-as-you-go managed service. Developers pay for the storage their documents or key-value pairs occupy, and provision read and write throughput for the database at a flat hourly rate. There’s no need to provision servers or configure replication: Amazon handles all of that under the covers, and recently added autoscaling to the mix.
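Provisioned throughput in DynamoDB is sized in read and write capacity units. The sketch below estimates them, assuming the sizing rules AWS has published: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are hypothetical, and the sketch ignores other read and write modes, so check AWS’s current documentation before budgeting from it.

```python
import math

# Sizing rules for provisioned capacity, as published by AWS:
# 1 RCU = one strongly consistent read/sec of an item up to 4 KB
#         (or two eventually consistent reads/sec)
# 1 WCU = one write/sec of an item up to 1 KB
READ_UNIT_BYTES = 4 * 1024
WRITE_UNIT_BYTES = 1024


def read_capacity_units(item_bytes: int, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate RCUs for a steady read rate of same-sized items."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much; round up to whole units.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_bytes: int, writes_per_sec: int) -> int:
    """Estimate WCUs for a steady write rate of same-sized items."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_sec
```

For example, 100 strongly consistent reads per second of 6 KB items round up to 2 units each, or 200 RCUs; switching to eventually consistent reads halves that to 100.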

Naturally, DynamoDB offers developers useful integrations with other services in the Amazon cloud. Triggers, for instance, can be set up by way of AWS Lambda functions. Amazon’s BI and analysis tools are also nearby. The proximity to these services is convenient, but it also means Amazon can upsell functionality in any number of ways. Caching and acceleration à la Redis, for instance, are available by way of DynamoDB Accelerator (DAX), a cost-plus add-on.
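In practice, a DynamoDB trigger is a Lambda function subscribed to the table’s DynamoDB Streams feed. The following sketch assumes that wiring is already in place and shows what such a function might look like in Python: each stream record carries an `eventName` (`INSERT`, `MODIFY`, or `REMOVE`) and the item’s keys in DynamoDB’s typed JSON format. The function name `handler` and what it does with the keys are hypothetical.

```python
def handler(event, context):
    """Hypothetical Lambda trigger: collect the keys of newly inserted items."""
    inserted = []
    for record in event.get("Records", []):
        # Stream records are tagged INSERT, MODIFY, or REMOVE.
        if record.get("eventName") != "INSERT":
            continue
        keys = record["dynamodb"]["Keys"]
        # Each attribute is typed JSON, e.g. {"S": "order-1"} or {"N": "42"};
        # unwrap to the raw value (numbers stay strings, as DynamoDB sends them).
        inserted.append({name: next(iter(typed.values())) for name, typed in keys.items()})
    return {"inserted": inserted}


# A trimmed-down stream event for local testing (real records carry more fields).
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "order-1"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "order-0"}}}},
    ]
}
print(handler(sample_event, None))  # {'inserted': [{'id': 'order-1'}]}
```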

DynamoDB Local

You won’t find DynamoDB in an open source incarnation. It’s available exclusively as a hosted offering on the Amazon cloud.

That said, unlike many other cloud-native databases, DynamoDB is also available in a version that can be downloaded and run locally. DynamoDB Local is not intended for production use, but rather as a way to stage an application in a test environment without requiring connectivity or running up an Amazon bill.

Microsoft Azure Cosmos DB

Cosmos DB is an ambitious project, a database system that encompasses multiple models for storing and retrieving data. Cosmos DB can serve as a document database, a columnar database, a graph database, or a key-value store, allowing the user to pick the paradigm that suits them and draw on various APIs for working with those paradigms. 

Cosmos DB features

Rather than invent an entirely new API for a document database system, Cosmos DB provides an API compatible with the popular MongoDB (discussed below). One benefit is that existing code that uses MongoDB client libraries or MongoDB’s binary wire protocol can work with little or no change, so in effect Cosmos DB can provide MongoDB as a service. Likewise, Cosmos DB supports the API of Cassandra, the popular column-family database.

Microsoft touts several advantages to Cosmos DB that aren’t exclusive to its document database functionality but are intended to appeal to those building document database applications. One such offering is tunable consistency. Cosmos DB defines five consistency levels (strong, bounded staleness, session, consistent prefix, and eventual); you choose a default for the account, and individual requests can relax it when some classes of document operations don’t need strong consistency across Azure regions.
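The override rule can be modeled in a few lines: the five levels form an ordering, and a request may run at the account default or anything weaker, but not stronger. The `Consistency` enum and `effective_consistency` function below are hypothetical and model only that rule; they are not the Cosmos DB SDK, which exposes its own settings for this.

```python
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    """Cosmos DB's five consistency levels, ordered weakest to strongest."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def effective_consistency(account_default: Consistency,
                          requested: Optional[Consistency] = None) -> Consistency:
    """Return the level a request runs at under a relax-only override rule."""
    if requested is None:
        return account_default  # no override: use the account default
    if requested > account_default:
        # This sketch rejects attempts to strengthen beyond the default.
        raise ValueError(f"cannot raise {account_default.name} to {requested.name}")
    return requested
```

An account defaulting to `SESSION` could serve a read-heavy dashboard with `EVENTUAL` reads, while a request asking for `STRONG` would be refused by this model.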

Other features are more specific to document databases. For instance, MongoDB users have to set up indexes on document collections to optimize searches. Cosmos DB users working with the MongoDB APIs don’t have to set up indexing for documents, as every property in an inserted document is automatically indexed.
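The idea of indexing every property can be illustrated with a toy in-memory collection that, on each insert, walks the document (including nested objects and arrays) and records every property path and value in an inverted index, so any field can be queried without declaring an index first. This is a hypothetical model of the concept, not how Cosmos DB actually builds or stores its indexes; the slash-separated path notation is an assumption chosen for readability.

```python
from collections import defaultdict


def property_paths(value, path=""):
    """Yield (path, leaf value) pairs, descending into nested objects and arrays."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from property_paths(child, f"{path}/{key}")
    elif isinstance(value, list):
        for item in value:
            yield from property_paths(item, f"{path}/[]")
    else:
        yield path, value


class AutoIndexedCollection:
    """Toy collection that indexes every property of every inserted document.

    Assumes each id is inserted once (updates and deletes are not handled).
    """

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, value) -> set of document ids

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for path, value in property_paths(doc):
            self.index[(path, value)].add(doc_id)

    def find(self, path, value):
        """Look up documents by any property path, with no index setup needed."""
        ids = self.index.get((path, value), set())
        return [self.docs[i] for i in sorted(ids)]
```

A query such as `find("/address/city", "London")` works on the first insert, with no index declared ahead of time.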

Using Cosmos DB on Microsoft Azure

There’s no locally hosted version of Cosmos DB; it’s available only as a service in the Microsoft Azure cloud. That said, development APIs for Cosmos DB are available for almost every popular enterprise language, including Java, Node.js, .NET, and Python.

Couchbase Server

Couchbase is less a sibling of CouchDB than a descendant. Couchbase was built on work done in CouchDB and Membase, but it is a separate product, developed independently of both projects. It’s a document database and distributed key-value store rolled into one, with advanced features such as automated failover and cross-datacenter replication, intended for enterprise use cases.

Couchbase features

One feature that sets Couchbase apart, not just from other NoSQL competition but from its predecessor CouchDB, is its SQL-like query language called N1QL (pronounced “nickel”). N1QL doesn’t offer the full range of commands you would expect from an ANSI SQL implementation, but it provides enough useful functions, such as JOIN operations, for someone with SQL experience to get workable results.

The Couchbase query system is not just for developers, but for the DBAs and business analysts who normally deal with conventional databases. Features like the EXPLAIN keyword seem to have been put in specifically to appeal to that crowd.

As a combination document database and key-value store, Couchbase stores documents by using their unique identifiers as keys. Documents can also be assigned time-to-live values, so Couchbase can function like a key-value cache. A dedicated caching system like Redis will generally be faster for basic key-value storage, but Couchbase is more flexible, and the two can be combined effectively to speed things up. On that note, Couchbase has native support for the Memcached protocol, so existing applications that use Memcached can plug into Couchbase as a substitute.
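Time-to-live behavior can be pictured with a small model: each document is stored under its key along with an optional expiry time, and reads treat expired documents as gone. The `TTLDocumentStore` class below is hypothetical and is not the Couchbase SDK, which sets expiration through its own API; the injectable clock exists only so the expiry logic can be tested deterministically.

```python
import time


class TTLDocumentStore:
    """Toy key-value document store where each document may carry a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (document, expiry time or None)

    def upsert(self, key, document, ttl_seconds=None):
        """Store a document; with a TTL it expires ttl_seconds from now."""
        expiry = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (document, expiry)

    def get(self, key):
        """Return the document, or None if it is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        document, expiry = entry
        if expiry is not None and self._clock() >= expiry:
            # Expired: evict lazily on read, as a cache would.
            del self._items[key]
            return None
        return document
```

Keys such as `session::42` can carry a TTL and behave like cache entries, while documents stored without one persist like ordinary records in the same store.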

Couchbase Community vs. Enterprise

Couchbase Server comes in a full-blown for-pay enterprise edition, a free-to-use community edition, and an open source edition, which is the foundation for the others. Binary downloads for the enterprise and community editions are available from Couchbase’s site, and the source code is available from Couchbase’s developer site. (There is no single GitHub repository for the Couchbase open source project, as it is an aggregation of several projects.)

The community edition can be deployed in production, but it lacks the more advanced features of the enterprise edition, as well as support, so non-buyer beware. Some capabilities Couchbase has long offered, such as horizontal scaling, have since arrived in CouchDB, though by way of Cloudant rather than Couchbase, and such convergence is more the exception than the rule.

Couchbase Lite

Another edition of Couchbase worthy of note for app developers is Couchbase Lite, an embeddable version of Couchbase that can synchronize with instances of the full-blown edition. Couchbase Lite is the key component in Couchbase Mobile, an application stack for mobile apps that need a data store that synchronizes automatically with a back end. Couchbase Mobile is available for iOS, Android, Java, .NET, macOS, and tvOS.

CouchDB

The CouchDB project was begun in 2005 by a former IBM developer and moved to the Apache Software Foundation in 2008. It is sometimes assumed that CouchDB is the basis for Couchbase, but CouchDB and Couchbase are parallel projects with different aims.

CouchDB vs. Couchbase

Whereas Couchbase is both a document database and a key-value store, CouchDB is strictly a document database. And while Couchbase has long focused on enterprise features such as fault tolerance and a SQL-like query language, such niceties are only beginning to arrive in CouchDB.

CouchDB features

CouchDB emphasizes simplicity of deployment and ease of use. Retrieving data from the database is as simple as sending JSON-formatted queries to an HTTP REST endpoint, with the results returned in JSON. Almost every modern programming language can do this, and the views behind CouchDB queries and reports are built from map and reduce functions, written in JavaScript by default and executed by the database itself. There is no need for an ODBC driver or a data connector.
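To make the map/reduce idea concrete, here is a Python emulation of how a CouchDB view is evaluated: a map function emits key/value rows for each document, the rows are sorted by key, and an optional reduce function combines the values for each key (roughly what querying a view with `group=true` returns). Real CouchDB views are JavaScript functions stored in design documents and run by the server, so `run_view` and `by_type` are hypothetical stand-ins meant only to show the shape of the computation.

```python
from collections import defaultdict


def run_view(docs, map_fn, reduce_fn=None):
    """Emulate a CouchDB-style view over an in-memory list of documents."""
    rows = []
    for doc in docs:
        # The map function may emit zero or more (key, value) pairs per document.
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    # Rows are ordered by key, ties by document id; Python's ordering stands
    # in for CouchDB's own collation rules here.
    rows.sort(key=lambda row: (row[0], row[1]))
    if reduce_fn is None:
        return [{"key": k, "id": doc_id, "value": v} for k, doc_id, v in rows]
    # Group values by key and reduce each group, like a grouped view query.
    grouped = defaultdict(list)
    for key, _, value in rows:
        grouped[key].append(value)
    return [{"key": k, "value": reduce_fn(values)} for k, values in grouped.items()]


def by_type(doc):
    """Map function: emit (type, 1) for every document that has a 'type' field."""
    if "type" in doc:
        yield doc["type"], 1


docs = [
    {"_id": "a", "type": "post"},
    {"_id": "b", "type": "comment"},
    {"_id": "c", "type": "post"},
    {"_id": "d"},
]
print(run_view(docs, by_type, sum))
# [{'key': 'comment', 'value': 1}, {'key': 'post', 'value': 2}]
```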

One of CouchDB’s signature features is its data reconciliation technology. Changes made to one CouchDB peer are automatically reconciled with others, in a manner akin to a version control system. When document versions conflict, CouchDB deterministically picks a winning version and retains the others as conflicting revisions of that document, which the application can inspect and resolve.
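CouchDB picks the conflict winner deterministically, so every replica arrives at the same answer without coordinating. As CouchDB documents the rule, a non-deleted revision beats a deleted one, then the revision with the longer edit history (the higher number before the dash in `_rev`) wins, and remaining ties go to the revision ID that sorts highest; the losing leaves are kept rather than discarded. The sketch below applies that rule to a list of leaf revisions; `pick_winner` is a hypothetical helper, and the short revision hashes are made up for readability.

```python
def parse_rev(rev):
    """Split a CouchDB-style revision like '3-f9e8' into (3, 'f9e8')."""
    generation, _, rev_hash = rev.partition("-")
    return int(generation), rev_hash


def pick_winner(leaves):
    """Pick the winning leaf revision and list the non-deleted losers as conflicts."""

    def rank(doc):
        generation, rev_hash = parse_rev(doc["_rev"])
        # Tuples compare left to right: live before deleted, then longer
        # history, then the higher revision hash as a final tie-breaker.
        return (not doc.get("_deleted", False), generation, rev_hash)

    winner = max(leaves, key=rank)
    conflicts = [
        doc["_rev"]
        for doc in leaves
        if doc is not winner and not doc.get("_deleted", False)
    ]
    return winner, conflicts
```

Because the ranking depends only on the revisions themselves, two peers that replicate the same set of leaves will always choose the same winner.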

This eventually consistent model is useful for databases that aren’t always or consistently connected (such as for intermittently connected mobile applications), or in cases where you don’t need the latest-and-greatest version of data in a particular node. But eventual consistency is also one of CouchDB’s biggest caveats. If you do need immediate consistency, CouchDB is not the place to find it.

Scalability has long been a weak spot for CouchDB, but it has recently been addressed: version 2.0 stirred in a new clustering technology, courtesy of code open sourced by Cloudant/IBM and merged into the project. Finally, for those who are familiar with MongoDB and want a similar declarative query syntax, CouchDB 2.0 also includes the Mango query interface, another Cloudant/IBM contribution.
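A Mango query is itself just JSON POSTed to a database’s `_find` endpoint. The sketch below only builds such a request with Python’s standard library and does not send it; the `movies` database, its fields, and the localhost URL on CouchDB’s default port 5984 are assumptions, and a real server will usually also require credentials. Passing the request to `urllib.request.urlopen` would execute the query.

```python
import json
import urllib.request


def build_find_request(base_url, db, selector, fields=None, limit=25):
    """Build (but do not send) a Mango query request for CouchDB's _find endpoint."""
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = fields  # return only these fields from each match
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# MongoDB-style operators such as $gt go inside the selector.
req = build_find_request(
    "http://localhost:5984/", "movies", {"year": {"$gt": 2010}}, fields=["title", "year"]
)
print(req.get_method(), req.full_url)  # POST http://localhost:5984/movies/_find
```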

CouchDB download

CouchDB binaries for all major platforms, and source code, can be downloaded from the official CouchDB site. Source for the project is available on GitHub as well.

Google Firebase Realtime Database

You might think of Google Firebase as Google’s answer to DynamoDB—a way to provide fast-syncing data storage between a cloud back-end and local apps on multiple platforms.

The Firebase Realtime Database is just one component in the Firebase stack, intended for building apps heavy on audience engagement and insight. The whole stack includes functions like authentication, performance monitoring, user analytics, and many others, but here we focus on the Realtime Database itself.

Google Firebase features

Google acquired Firebase in 2014. In the years since, it has wired up Firebase to take advantage of many Google Cloud features. Cloud Functions for Firebase, for instance, allows you to trigger JavaScript functions in the cloud in response to Firebase events. Google Analytics for Firebase lets you export mobile app data to BigQuery for deeper analysis.

As gaming is one of Firebase’s target applications, the SDKs provided for Firebase include one for the Unity cross-platform game development framework. Developers working on more conventional enterprise-focused or consumer-facing projects have plenty of other choices: native iOS and Android, C++, generic web/JavaScript, and any other language that supports REST (Java, Python, you name it).
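For languages without a dedicated SDK, the Realtime Database exposes its data tree over REST: appending `.json` to the database URL plus a path addresses that node, and standard HTTP methods read (GET), overwrite (PUT), push (POST), update (PATCH), or delete it. The sketch below builds such requests without sending them; the project URL and the `scores/player1` path are hypothetical, and authenticated access, which most databases require, is left out.

```python
import json
import urllib.request


def firebase_rest_request(database_url, path, method="GET", payload=None):
    """Build (but do not send) a Realtime Database REST request for one node."""
    # Every node in the JSON tree is addressable as <database URL>/<path>.json
    url = f"{database_url.rstrip('/')}/{path.strip('/')}.json"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, method=method)


# Hypothetical project URL; PUT overwrites the node with the JSON payload.
put_req = firebase_rest_request(
    "https://example-project.firebaseio.com", "/scores/player1", "PUT", {"score": 42}
)
print(put_req.get_method(), put_req.full_url)
# PUT https://example-project.firebaseio.com/scores/player1.json
```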

Firebase is designed to work in scenarios where connectivity isn’t guaranteed. Like CouchDB, it caches changes locally when offline and automatically synchronizes with the back end when connectivity returns. Note that Firebase isn’t designed to be used as a standalone, entirely offline solution; on Android, for instance, the local cache is limited to 10 MB by default.

Firebase on Google Cloud and GitHub

Firebase isn’t available as a standalone product; it’s offered only as part of Google’s cloud offerings. The Firebase GitHub repository has source code for the SDKs and for various platform-specific tools.

IBM Cloudant

Cloudant is essentially IBM’s hosted edition of CouchDB. Originally, Cloudant was an independent company, offering an edition of CouchDB called “BigCouch” that was hosted on IBM’s SoftLayer cloud. In 2014, IBM acquired Cloudant outright as part of its overall push toward analytics and big data.

Cloudant vs. CouchDB

Cloudant is meant to be more than a hosted version of CouchDB. Cloudant provides features not readily available in CouchDB itself, such as natively integrated full-text search; full-text search in CouchDB typically requires integration with external projects. Data can be replicated in both directions between Cloudant and an instance of CouchDB, so it’s relatively easy to move between the two as needed.
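CouchDB-style replication runs one direction at a time, so keeping a local database and a Cloudant copy in step means two replications, one each way. The sketch below builds the two requests for CouchDB’s `_replicate` endpoint with `continuous` set so each keeps running; the URLs and the `inventory` database are hypothetical, credentials are omitted, and longer-lived setups often use documents in the `_replicator` database instead.

```python
import json
import urllib.request


def replication_requests(couch_url, local_db_url, remote_db_url, continuous=True):
    """Build (but do not send) two _replicate requests, one per direction."""

    def one_way(source, target):
        body = {"source": source, "target": target, "continuous": continuous}
        return urllib.request.Request(
            f"{couch_url.rstrip('/')}/_replicate",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    # Push local changes out, then pull remote changes back in.
    return [one_way(local_db_url, remote_db_url), one_way(remote_db_url, local_db_url)]


reqs = replication_requests(
    "http://localhost:5984",
    "http://localhost:5984/inventory",
    "https://example.cloudant.com/inventory",  # hypothetical Cloudant account
)
for r in reqs:
    print(json.loads(r.data))
```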

Some of Cloudant’s improvements to CouchDB have found their way back into the underlying CouchDB project, including CouchDB 2.0’s horizontal scaling functionality and the Mango query language interface. But don’t take that as proof that Cloudant features will automatically trickle down to CouchDB.

Cloudant on IBM Cloud

Cloudant is primarily a cloud offering on IBM Cloud, where it can be used in conjunction with other IBM Cloud data products such as dashDB, DataWorks, and Watson Analytics.

Cloudant Local

A behind-the-firewall edition of Cloudant, called Cloudant Local, offers all of the same functionality as the cloud-hosted offering. Cloudant Local is available on the Ubuntu and Red Hat flavors of x86 Linux, as well as IBM’s own System z running Red Hat or SUSE. Developers can download a free, test-and-dev-only version in a Docker image.
