Document DB Data Modeling: 101

Amit Raj
Published in Dev Genius, Jul 3, 2022


This blog is part of a series in which we discuss 101 concepts from ground zero for readers with limited prior knowledge. This article belongs to the Intermediate-Level Series, since it involves understanding the fundamentals of data modeling in document databases. Document DBs can offer developers more flexibility and scalability than traditional SQL databases, and they have become a popular choice for distributed applications.

Some of the earlier blogs in the 101 Series are as follows:

  • DataBase Replication 101
  • DataBase Sharding 101
  • Caching Strategy 101
  • Circuit Breaker 101
  • Async Communication 101
  • Database Design 101

What is a Document Database?

Document DBs are a subset of NoSQL databases that store data as JSON-like documents rather than in a predefined schema of tabular rows and columns. Unlike in SQL databases, different documents in the same collection can contain different fields. This makes it easy to accommodate new data requirements as business needs change.

Each document is referenced by a unique key, usually a system-generated value. Application code can also be modeled around the fields in the documents instead of relying on complex SQL queries, which helps improve overall developer productivity.
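The two ideas above (documents with varying fields, each addressed by a system-generated key) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular database works internally:

  • a plain dict stands in for a collection;
  • the `insert` helper is hypothetical;
  • a random UUID stands in for the database's generated id (MongoDB, for example, generates an ObjectId for `_id`).

```python
import uuid

# Hypothetical in-memory stand-in for a collection: document id -> JSON-like dict.
collection: dict[str, dict] = {}


def insert(doc: dict) -> str:
    """Store a copy of `doc` under a system-generated unique key and return that key."""
    doc_id = uuid.uuid4().hex  # stands in for a database-generated id such as _id
    collection[doc_id] = {"_id": doc_id, **doc}
    return doc_id


# Two documents in the same collection with different sets of fields.
alice_id = insert({"name": "Alice", "email": "alice@example.com"})
bob_id = insert({"name": "Bob", "phone": "555-0100", "skills": ["go", "sql"]})

# Application code reads fields directly instead of joining tables.
print(collection[bob_id]["skills"])  # ['go', 'sql']
```

Neither document had to declare its fields up front. Adding `phone` to Bob did not require changing Alice's document or any schema.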

What is Data Modeling ?

Data modeling decisions are driven mainly by the application's access patterns: fields that are accessed together should be stored in the same document.

Unlike in relational databases, a fixed data model isn't mandatory before storing documents in a collection. However, it is advisable to validate that each document type contains a minimum set of required fields. Fields that are accessed infrequently can be moved into separate documents and linked using references.
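A minimal sketch of such a minimum-field check follows. It is written in plain Python rather than with any database's built-in validation feature. The document types, field names, and the `validate` helper are hypothetical. The employee document also keeps an infrequently accessed detail in a separate document, linked by its `_id`.

```python
# Hypothetical minimum set of required fields per document type.
REQUIRED_FIELDS = {
    "employee": {"_id", "name", "department"},
    "id_card": {"_id", "card_number"},
}


def validate(doc_type: str, doc: dict) -> list[str]:
    """Return the required fields missing from `doc`, sorted; an empty list means valid."""
    return sorted(REQUIRED_FIELDS[doc_type] - set(doc))


employee = {
    "_id": "e1",
    "name": "Alice",
    "department": "Payments",
    "nickname": "Al",  # extra fields beyond the minimum are still allowed
    # Rarely read data is kept in a separate document and linked by its _id.
    "performance_history_id": "ph1",
}
performance_history = {"_id": "ph1", "reviews": [{"year": 2021, "rating": 4}]}

print(validate("employee", employee))                      # []
print(validate("employee", {"_id": "e2", "name": "Bob"}))  # ['department']
```

Only the minimum is enforced. Documents may still carry extra fields, which keeps the flexibility described above.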

Logical Data Modeling

This step involves defining the logical elements of the documents, i.e. entity keys, entity attributes, and entity relationships.

  • The entity key is the unique key (or set of keys) used to reference a document. Ex: _id is the default primary key in MongoDB.
  • Entity attributes can combine data types such as strings, booleans, arrays, etc.
  • Entity relationships can be One to One, One to Many, or Many to Many, depending on how the documents relate to each other.
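Below is one way to write these three logical elements down before choosing any storage layout, sketched in Python. The `Entity`, `Relationship`, and `Cardinality` types are hypothetical names chosen for illustration, not part of any database API. The `conforms` method simply tests a document against the entity key and the declared attribute types.

```python
from dataclasses import dataclass, field
from enum import Enum


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "N:M"


@dataclass
class Relationship:
    target: str  # name of the related entity
    cardinality: Cardinality


@dataclass
class Entity:
    name: str
    key: tuple[str, ...]  # entity key: a single field or a set of fields
    attributes: dict[str, type]  # attribute name -> expected data type
    relationships: list[Relationship] = field(default_factory=list)

    def conforms(self, doc: dict) -> bool:
        """True if `doc` has every key field and each attribute it contains has the expected type."""
        if any(k not in doc for k in self.key):
            return False
        return all(isinstance(doc[a], t) for a, t in self.attributes.items() if a in doc)


employee_entity = Entity(
    name="Employee",
    key=("_id",),
    attributes={"name": str, "active": bool, "skills": list},
    relationships=[
        Relationship("IDCard", Cardinality.ONE_TO_ONE),
        Relationship("Project", Cardinality.ONE_TO_MANY),
    ],
)

print(employee_entity.conforms({"_id": "e1", "name": "Alice", "active": True, "skills": ["go"]}))  # True
print(employee_entity.conforms({"name": "Bob"}))  # False: the entity key is missing
```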

Physical Data Modeling

This step involves defining the physical containers (e.g. collections) that store the entities and documents defined during logical modeling.

Document Design — Entity Relationships

The JSON structure of documents allows relationships to be defined in one of two ways. The first is embedding, which nests related data inside a document. The second is referencing, which links related documents through their primary keys.

  • A fully normalized data model stores each entity as a separate document and defines entity relationships using references. This resembles a relational database schema and gives up much of the document model's inherent flexibility.
  • At the other, fully denormalized extreme, all related entities are stored in a single large document. However, this results in repeated entities as well as increased storage needs.
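The two extremes can be contrasted with plain Python dicts standing in for documents. The employee and department entities, all field names, and the `load_employee` helper are hypothetical. The normalized form needs an extra lookup on every read to resolve the reference. The denormalized form repeats the department in each employee and must update every copy when it changes.

```python
import copy

payments = {"_id": "d1", "name": "Payments", "location": "Berlin"}

# Fully normalized: one document per entity, linked by references.
departments = {"d1": payments}
employees_norm = [
    {"_id": "e1", "name": "Alice", "department_id": "d1"},
    {"_id": "e2", "name": "Bob", "department_id": "d1"},
]


def load_employee(emp: dict) -> dict:
    """Resolve the department reference at read time (an extra lookup per read)."""
    return {**emp, "department": departments[emp["department_id"]]}


# Fully denormalized: the department is copied into every employee document.
employees_denorm = [
    {"_id": "e1", "name": "Alice", "department": copy.deepcopy(payments)},
    {"_id": "e2", "name": "Bob", "department": copy.deepcopy(payments)},
]

# Moving the department touches one document in the normalized model...
departments["d1"]["location"] = "Munich"
# ...but every embedded copy in the denormalized model.
for emp in employees_denorm:
    emp["department"]["location"] = "Munich"

print(load_employee(employees_norm[0])["department"]["location"])  # Munich
```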

One to One Relationship

A One to One relationship is a single association between two distinct entities. For this pattern, embedding both entities in the same document is preferred.

Ex: the relationship between an Employee and their ID Card.
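Here is a minimal Python sketch of the embedded pattern for this example. The dict-based collection, the field names, and the `get_employee_with_card` helper are hypothetical. Because the ID card lives inside the employee document, one lookup by key returns both.

```python
# Hypothetical collection of employee documents keyed by _id.
employees = {
    "e1": {
        "_id": "e1",
        "name": "Alice",
        # One to One: the ID card is embedded, not stored as its own document.
        "id_card": {"card_number": "C-1001", "valid_until": "2025-12-31"},
    },
}


def get_employee_with_card(emp_id: str) -> tuple[str, str]:
    """A single lookup by key returns both the employee and their ID card."""
    emp = employees[emp_id]
    return emp["name"], emp["id_card"]["card_number"]


print(get_employee_with_card("e1"))  # ('Alice', 'C-1001')
```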

One to Many Relationship

A One to Many relationship exists when an entity of one type is related to multiple entities of another type. To decide between embedding and referencing:

  • If the cardinality of the relationship is low and the data needs to be accessed together, embedding the entities in a single document is preferred. Ex: Employee and Work Projects.
  • However, if the related data is accessed infrequently or its cardinality varies over time, referencing the related entities is preferred.
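Below is a minimal sketch of both choices in Python, using the Employee and Work Projects example for embedding. For the referencing case, the timesheet documents are a hypothetical example of a relationship whose cardinality keeps growing. All field names and the `timesheets_for` helper are illustrative assumptions.

```python
# Low, bounded cardinality and read together: embed the projects.
employee = {
    "_id": "e1",
    "name": "Alice",
    "projects": [
        {"code": "P1", "role": "lead"},
        {"code": "P2", "role": "reviewer"},
    ],
}

# Growing cardinality or infrequent access: each related entity is its own
# document that references the employee (hypothetical timesheet entries).
timesheets = [
    {"_id": "t1", "employee_id": "e1", "week": "2022-W26", "hours": 40},
    {"_id": "t2", "employee_id": "e1", "week": "2022-W27", "hours": 38},
    {"_id": "t3", "employee_id": "e2", "week": "2022-W27", "hours": 40},
]


def timesheets_for(employee_id: str) -> list[dict]:
    """Fetched only when needed, by matching the reference field."""
    return [t for t in timesheets if t["employee_id"] == employee_id]


print([p["code"] for p in employee["projects"]])     # ['P1', 'P2']
print(sum(t["hours"] for t in timesheets_for("e1")))  # 78
```

The employee document stays small no matter how many weeks of timesheets accumulate. The price is a separate query when they are needed.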

Many to Many Relationship

In a Many to Many relationship, the key decision-making guideline is the upper bound on the cardinality of the related entities:

  • Child references are preferred when the cardinality of the referenced entities is static. The parent document itself stores references to all of its related children.
  • Parent references are suggested when the cardinality of the referenced entities keeps growing. Each new child document stores a reference to the parent document's primary key.
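Both variants can be sketched with plain Python dicts and lists. The teams/employees and articles/tags examples, field names such as `member_ids` and `tag_ids`, and the helper functions are all hypothetical. A real database would typically use indexed queries rather than the linear scans shown here.

```python
# Child references: each team has a small, stable set of members, so the parent
# (team) document lists its children's ids. An employee can belong to several
# teams, which is what makes the relationship Many to Many.
teams = {
    "t1": {"_id": "t1", "name": "Checkout", "member_ids": ["e1", "e2"]},
    "t2": {"_id": "t2", "name": "Search", "member_ids": ["e2", "e3"]},
}

# Parent references: the number of articles per tag keeps growing, so each new
# child (article) stores the primary keys of its parents (tags) instead of each
# tag holding an ever-growing list of article ids.
tags = {"g1": {"_id": "g1", "label": "howto"}, "g2": {"_id": "g2", "label": "databases"}}
articles: list[dict] = []


def add_article(title: str, tag_ids: list[str]) -> None:
    """Adding a child writes one new document; no parent document is rewritten."""
    articles.append({"_id": f"a{len(articles) + 1}", "title": title, "tag_ids": tag_ids})


def members_of(team_id: str) -> list[str]:
    """Child references: the parent document already lists its children."""
    return teams[team_id]["member_ids"]


def teams_of(emp_id: str) -> list[str]:
    """Reverse direction: find every parent that lists this child."""
    return [t["name"] for t in teams.values() if emp_id in t["member_ids"]]


def articles_tagged(tag_id: str) -> list[str]:
    """Parent references: find the children that point at this parent."""
    return [a["title"] for a in articles if tag_id in a["tag_ids"]]


add_article("Sharding 101", ["g2"])
add_article("Data Modeling 101", ["g1", "g2"])

print(members_of("t2"))       # ['e2', 'e3']
print(teams_of("e2"))         # ['Checkout', 'Search']
print(articles_tagged("g2"))  # ['Sharding 101', 'Data Modeling 101']
```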

Summary

Data modeling decisions in NoSQL databases generally evolve based on the data-access patterns observed in customer traffic. The flexibility of the JSON data structure, coupled with document versioning, can serve as a good progression from schema design in relational databases. However, document databases have traditionally come with a major drawback: limited support for ACID transactions across multiple documents. Some of them (for example, MongoDB 4.0 and later) now support multi-document transactions.

Most cloud providers, such as AWS and Azure, offer document databases as a PaaS offering. Amazon DocumentDB and Azure Cosmos DB are popular choices among engineering teams.

For feedback, please drop a message to amit[dot]894[at]gmail[dot]com or reach out to any of the links at https://about.me/amit_raj.
