- Introduction to NoSQL by Martin Fowler
- NoSQL Distilled@martinfowler.com
- NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Looking around the NoSQL resources, and watched/read the above ones. It has great explanation for the advantage/disadvantages of NoSQL approach over traditional relational databases. The followings are some notes.
Application development productivity
A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.
For application developers, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures.
NoSQL’s is often driven by the scaling capability through clusters, but also development productivity is one major factor. Mapping database with in-memory data structure has been the pain in the neck during the application development. Though ORM (Hibernate, ActiveRecord, etc.) alleviates some of them, it still requires certain care and efforts to gain both productivity and effective performance.
Aggregate orientation takes a different approach. It recognizes that often, you want to operate on data in units that have a more complex structure than a set of tuples.
As we’ll see, key-value, document, and column-family databases all make use of this more complex record.
Aggregate is a term that comes from Domain-Driven Design. In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit.
In relational databases, data is normalized and split into multiple tables. Instead, NoSQL is storing the primary data with related objects into single item. This approach focuses on maintaining data integrity for each item, rather than a transaction that handles multiple independent tables and rows, which RDB is taking. This aggregate-oriented approach makes it easier to distribute the data over multiple cluster nodes.
A common statement about NoSQL databases is that since they have no schema, there is no difficulty in changing the structure of data during the life of an application. We disagree; a schemaless database still has an implicit schema that needs change discipline when you implement it
The claim that NoSQL databases are entirely schemaless is misleading; while they store the data without regard to the schema the data adheres to, that schema has to be defined by the application, because the data stream has to be parsed by the application when reading the data from the database.
NoSQL’s flexible scheme allows concentrating on the domain design, but schemaless database still has an implicit schema that needs change discipline when you implement it. Also, as the NoSQL’s aggregated data is not normalized, analyzing the data from different perspective from the primary-key, and it requires to create index (materialized views) for them. These factors need to be taken cared.
Consistency and Availability
The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency.
NoSQL advocates the capability of data distribution. However, there’s a trade-off between consistency and availability. It often involves business decision about which is important for the provided services.