The data structures used by NoSQL databases (key-value, wide column, graph, or document) differ from those used by relational databases. NoSQL databases can be scaled across thousands of servers, though sometimes with a loss of data consistency. But what makes NoSQL databases especially relevant today is that they are particularly well suited for working with large sets of distributed data, which makes them a good choice for big data and analytics projects.
How to choose a NoSQL database: Key factors
With more than two dozen open source and commercial NoSQL databases in the market, how do you choose the right product or cloud service?
One vital factor is to know the purpose to which you want to put the data, says Carl Olofson, an IDC research vice president.
NoSQL databases vary in architecture and function, so you need to pick the type that is best for the desired task:
- In general, key-value stores are best for the persistent sharing of data by multiple processes or microservices in an application.
- If you plan to do deep relationship analysis for proximity calculation, fraud detection, or evaluation of associative structure, a graph database might be the better choice.
- If you need to collect data very rapidly and at high volumes for analytics, look at a wide column store. Such NoSQL databases also tend to offer document and graph support.
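To make the differences between these database types concrete, the following minimal sketch models each one with plain Python structures. It is an illustration only: every record, key, and function name is hypothetical, and no real database product or client library is involved.

```python
# Toy illustration only: plain Python structures standing in for the four
# NoSQL data models. All names and records here are hypothetical.

# Key-value: an opaque value looked up by a single key (e.g. session state).
key_value = {"session:42": b'{"user": "ana", "cart": ["sku-1"]}'}

# Document: nested, self-describing records that can be queried by field.
documents = [
    {"_id": 1, "user": "ana", "orders": [{"sku": "sku-1", "qty": 2}]},
    {"_id": 2, "user": "ben", "orders": []},
]

# Wide column: rows found by a row key; each row holds its own set of columns.
wide_column = {
    "sensor-7": {"2024-01-01T00:00": 21.5, "2024-01-01T00:01": 21.7},
    "sensor-9": {"2024-01-01T00:00": 19.0},
}

# Graph: nodes plus edges; queries walk the relationships between nodes.
edges = {"ana": {"ben"}, "ben": {"cy"}, "cy": set()}


def users_with_orders(docs):
    """Document-style query: filter records on a nested field."""
    return [d["user"] for d in docs if d["orders"]]


def reachable(graph, start):
    """Graph-style query: every node connected to `start`, at any depth."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen
```

The key-value entry can only be fetched whole by its key, while the document and graph structures support queries on their contents, which is roughly the trade-off the list above describes.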
Don’t assume your initial project is the only usage model that you will apply to the database. You might start out just doing state or session data management, then look to do transaction processing, and still later do some analytics.
For the near term, the focus should be on performance, scale, security, support for various workloads (including transactional, operational, and analytics), integration with existing ecosystems, administration effort, cloud support, and the types of use cases supported, says Noel Yuhanna, a principal analyst at Forrester Research. Of these, security is critical. NoSQL databases that have security certifications should be given higher consideration. Look for features such as encryption of both data at rest and data in motion to protect sensitive information.
Also, not all NoSQL databases can scale well, Yuhanna says, so don’t take for granted that just because a product is in the NoSQL category it will scale and perform better than relational databases.
NoSQL offers different consistency levels in the scale-out model, so look at solutions that meet your specific requirements. For example, if you want to support highly critical banking-like transactions, relational databases are still the best solution.
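One common way scale-out stores expose different consistency levels is through quorum settings, although the article does not name a specific mechanism and products differ. Assuming that model, with N replicas a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The function name and parameters below are hypothetical.

```python
def quorums_overlap(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read quorum shares a replica with every write quorum.

    Assumes a simple quorum-replication model; this is a sketch, not the
    behavior of any particular NoSQL product.
    """
    if not (1 <= write_acks <= n_replicas and 1 <= read_acks <= n_replicas):
        raise ValueError("acks must be between 1 and the replica count")
    return read_acks + write_acks > n_replicas
```

For example, `quorums_overlap(3, 2, 2)` is true, while `quorums_overlap(3, 1, 1)` is false: the second setting trades the overlap guarantee for lower latency, which is the kind of relaxed consistency that makes such configurations a poor fit for banking-like transactions.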
The NoSQL databases you should consider
Here are the NoSQL databases you should consider.
MongoDB
MongoDB is the most popular NoSQL database. A free and open source, cross-platform, document-oriented database, MongoDB uses JSON-like documents with optional schemas. The platform is maintained by MongoDB Inc. and is published under a combination of the GNU Affero General Public License and the Apache License.
MongoDB Atlas incorporates operational best practices the company has learned from optimizing thousands of deployments at organizations of all sizes. The cloud-based offering handles database management, setup and configuration, software patching, monitoring, and backups, and it operates as a distributed database cluster.
Key features and capabilities include fully managed backup, continuous backup, point-in-time recovery, queryable snapshots, automatically generated charts, a real-time performance panel, and customizable alerting. Users can import live data to MongoDB Atlas with minimal impact to applications, using the built-in Live Migration Service.
The database is optimal for natively storing, processing, and accessing documents and other types of data sets, and it is popular among developers because it's easy to use, scales to meet demanding applications, and offers a comprehensive ecosystem of tools and partners, Yuhanna says. Common use cases for MongoDB include personalization, real-time analytics, internet of things (IoT), big data, product/asset catalogs, security and fraud detection, mobile applications, data hubs, content management, and social and collaboration applications.
Amazon DynamoDB
Amazon DynamoDB is another popular cloud-based NoSQL database. The fully managed platform uses solid-state drives (SSDs) to store, process, and access data in support of high-performance, scale-driven applications.
It automatically shards data across servers based on the workload's throughput and storage requirements, and handles larger high-performance use cases.
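The article does not describe how DynamoDB shards data internally, so the following is only a generic sketch of hash-based sharding, not Amazon's implementation. It assumes a fixed shard count and a stable hash of each key; `shard_for` and `distribute` are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard index they hash to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Because the hash is deterministic, the same key always lands on the same shard. The simple modulo step means that changing the shard count moves most keys, which is one reason production systems tend to use more elaborate schemes such as consistent hashing.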
Users can scale, monitor, and manage their tables via both application programming interfaces (APIs) and the Amazon Web Services Management Console. DynamoDB is tightly integrated with Amazon EMR (a managed framework for Apache Hadoop, Apache Spark, and HBase), which offers the ability to run queries that span multiple data sources.
The platform supports both key-value and document models and also has a library for geospatial indexing. Organizations use DynamoDB to support a variety of use cases, including advertising campaigns, social media applications, tracking gaming information, collecting and analyzing sensor and log data, and e-commerce.
DataStax and DataStax Enterprise Platform
DataStax leverages Apache Cassandra for distribution across data centers. A strong plus for DataStax NoSQL has been its globally distributed architecture, says Forrester’s Yuhanna. DataStax distributes, contributes to, and supports the commercial enterprise version of Apache Cassandra, an open source project. Cassandra is a distributed wide-column store whose data model is based on Google Bigtable.
Among its key features are fault tolerance, scale-out architecture, low-latency data access, and simplified administration. DataStax provides additional features such as analytics, search, monitoring, in-memory, and security to support critical applications.
DataStax Enterprise supports various types of business applications, including transactional, analytical, predictive analytics, and mixed workloads. It offers broader multi-model capabilities with support for graph and JSON data. The top use cases include fraud detection, product catalogs, consumer personalization, recommendation engines, and IoT.
Couchbase
Couchbase is a JSON document database platform distributed by Couchbase Inc. The open source NoSQL DBMS supports a broad range of use cases.
Couchbase Server, an open source NoSQL key-value and document database with a built-in cache, appeals to enterprises that need a database that can deliver performance, multi-model support, scale, and automation, Yuhanna says.
Organizations use Couchbase to support social and mobile applications, content and metadata stores, e-commerce transactions, and online gaming applications. Couchbase provides full support for documents, a flexible data model, indexing, full-text search, and MapReduce for real-time analytics.
The platform is used by large enterprises to support various critical workloads, including operational and analytical processes.
Redis Enterprise
Sponsored by Redis Labs, open source platform Redis Enterprise is one of the most common key-value NoSQL databases, says IDC’s Olofson.
Redis offers a high-performing, in-memory database that supports both relaxed and strong consistency, a flexible schemaless model, high availability, and ease of deployment, says Forrester’s Yuhanna.
Redis Labs developed additional features and technology that encapsulate the open source software and provide an enhanced deployment architecture for Redis, while supporting the open source API.
The data model supports key-value; a variety of data structures such as lists, sets, bitmaps, and hashes; and a range of models through pluggable modules such as search, graph, JSON, and XML. Redis supports a variety of use cases, including real-time analytics, transactions, data ingestion, social media, job management, message queuing, and caching.
Other NoSQL options
Other open source and commercial NoSQL database offerings include:
- Blazegraph, from Systap
- Google BigQuery, from Google
- Helium, from Levyx
- MarkLogic NoSQL Database, from MarkLogic
- Microsoft Azure Cosmos DB, from Microsoft
- Neo4j, from Neo4j
- Oracle NoSQL Database, from Oracle
- Riak KV, distributed by Basho
- ThingSpan, from Objectivity
- Titan, from Aurelius (which was acquired by DataStax)