MongoDB is a cross-platform, document-oriented database. Although it is often categorized as a schema-less database, MongoDB stores data in a JSON-like document structure, so a data model still exists. Data modelling, in general, contains numerous components that necessitate active participation from a variety of stakeholders; developers must be able to answer the following four questions:
- What information should be saved?
- Which documents are most likely to be accessed together?
- How frequently will this document be accessed?
- How quickly will the data grow?
The answers to these questions will guide the team in developing competent, agile data models that meet the needs of modern enterprises built on seamless information exchange.
This blog article describes how data modelling in MongoDB works: the components of the data modelling process and the schemas MongoDB uses to develop the necessary structures. Let's get started.
Table of Contents
- What is Data Modeling?
- Embedded & Normalized MongoDB Data Modeling
- Embedded Data Model
- Normalized Data Model
- Defining Relationships in MongoDB Data Modeling
- One-to-one Relationship
- One-to-many Relationship
- Many-to-many Relationship
- MongoDB Data Modeling Schema
- What is a flexible schema?
- What is Rigid Schema?
- What is Schema Validation?
- Schema Validation Levels in MongoDB Data Modeling
- Schema Validation Actions
- MongoDB Data Modeling Schema Design Patterns
What is Data Modeling?
Data modelling is the blueprint for creating a full-fledged database system. A data model's principal function is to provide visual information about the relationship between two or more data items. The layout/design would thus be critical in managing petabyte-scale data repositories to store data from multiple company functions and teams, ranging from sales to marketing and beyond.
The process of ideating a data model, whether adding new models or reaffirming definitions on an existing one, is continuous and evolving, necessitating various feedback loops and direct contact with stakeholders.
Formalized schemas and procedures are used to construct competent data models, ensuring a standard, consistent, and predictable way to perform business processes and strategize data resources in an organization. Data models are typically developed at three levels of abstraction:
Conceptual Data Models: Conceptual data models are rough, big-picture sketches that answer where data from various business processes will be housed in the database system and which relationships will connect it. A conceptual data model typically covers entity types, attributes, constraints, relationships, and security and data-integrity requirements.
Logical Data Models: Logical data models provide more in-depth information on the relationships between data sets; at this level, the data types and relations in use can be clearly identified. Logical data models are often skipped in agile business environments, although they are useful in data-driven projects that require extensive procedural design.
Physical Data Models: A physical data model is a schema/layout for storing data in a database. A physical data model provides a finished proposal that can be implemented in a relational database.
Embedded & Normalized MongoDB Data Modeling
When data professionals begin constructing data models in MongoDB, they are faced with the decision of embedding the information or storing it independently in a collection of documents. As a result, there are two concepts for efficient MongoDB Data Modeling:
- The Embedded Data Model
- The Normalized Data Model
Embedded Data Model
An embedded data model (a denormalized data model) is used when two data sets have a relationship. An embedded data model establishes links between data pieces while maintaining them in a single document structure. Depending on the situation, you can store the related data in an embedded document or in an array.
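As a minimal sketch (the blog-post/comments fields here are hypothetical, not from a real application), an embedded model keeps related data inside the parent document itself:

```javascript
// Embedded (denormalized) model: the comments live inside the post
// document, so one read returns the post and all related comments.
const post = {
  _id: "B1",
  title: "Data Modeling in MongoDB",
  comments: [                                   // embedded array of subdocuments
    { author: "alice", text: "Great post!" },
    { author: "bob", text: "Thanks for sharing." }
  ]
};

// A single read returns the post and every comment; no join is needed.
console.log(post.comments.length); // 2
```

This is why embedding shines when documents are read together: the related data arrives in the same query, at the cost of larger documents.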
Normalized Data Model
A normalized data model uses object references to model relationships between data elements/documents. Because this architecture avoids data duplication, many-to-many relationships can be described without duplicating content. Normalized data models are ideal for modelling large hierarchical datasets with cross-references.
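As a minimal sketch (the ids and names are hypothetical), a normalized model stores only ObjectId-style references, so resolving them takes a second lookup (in MongoDB, a second query or a `$lookup` aggregation):

```javascript
// Normalized model: the task references its owners by _id instead of
// embedding copies of the user documents.
const users = [
  { _id: "AAF1", name: "codesolutionstuff" },
  { _id: "BB3G", name: "teammate" }
];
const task = {
  _id: "ADF9",
  description: "Write blog post",
  owners: ["AAF1", "BB3G"]        // references, not embedded copies
};

// Resolve the references, as an application-side join would:
const ownerNames = task.owners.map(id => users.find(u => u._id === id).name);
console.log(ownerNames); // [ 'codesolutionstuff', 'teammate' ]
```

Each user exists exactly once, so renaming a user requires updating a single document, which is the core benefit of normalization.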
Defining Relationships in MongoDB Data Modeling
The most crucial consideration in your MongoDB data modelling project is defining relationships for your schema. These relationships describe how your system will use data. MongoDB Data Modeling defines three types of relationships: one-to-one, one-to-many, and many-to-many.
One-to-one Relationship
Your name is a good example of this relationship, because each user can have only one name. One-to-one data can be represented in your database as key-value pairs. Consider the following example:
{
"_id": "ObjectId('AAA')",
"name": "codesolutionstuff",
"company": "MongoDB",
"twitter": "@codesolutionstuff",
"twitch": "codesolutionstuff",
"tiktok": "codesolutionstuff",
"website": "codesolutionstuff.com"
}
One-to-many Relationship
Consider the following scenario: you are creating a page for an e-commerce site with a schema that displays product information, so the system stores information about the many parts that make up a single product. The schema can hold thousands of subparts and relationships. The same pattern appears in the simpler example below, where a single user document embeds many addresses:
{
"_id": "ObjectId('AAA')",
"name": "codesolutionstuff",
"company": "MongoDB",
"twitter": "@codesolutionstuff",
"twitch": "codesolutionstuff",
"tiktok": "codesolutionstuff",
"website": "codesolutionstuff.com",
"addresses": [
{ "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
{ "street": "123 Avenue Q", "city": "New York", "cc": "USA" }
]
}
Many-to-many Relationship
To grasp many-to-many relationships, imagine a to-do app. A user may have many tasks, and a task may be assigned to many users. To retain these user-task links, references exist between one user and many tasks, as well as between one task and many users. Consider the following example:
Users:
{
"_id": ObjectID("AAF1"),
"name": "codesolutionstuff",
"tasks": [ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73")]
}
Tasks:
{
"_id": ObjectID("ADF9"),
"description": "Write blog post about MongoDB schema design",
"due_date": ISODate("2014-04-01"),
"owners": [ObjectID("AAF1"), ObjectID("BB3G")]
}
MongoDB Data Modeling Schema
By default, MongoDB Data Modeling uses a flexible schema that is not the same for all documents. This is a paradigm shift from the SQL view of data in tables, where all rows and columns are defined with fixed data types.
What is a flexible schema?
In a flexible schema model, it is unnecessary to define a data type for a field, because a field can differ across documents. A flexible schema comes in handy when adding, removing, or modifying fields in an existing collection, or when migrating documents to a new structure.
Let's look at an example in which two documents sit in the same collection:
{ "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
"StudentName" : "ABC",
"ParentPhone" : 75646344,
"age" : 10
}
{ "_id" : ObjectId("5b98bfe7e8b9ab98757e8b9a"),
"StudentName" : "XYZ",
"ParentPhone" : false,
}
The field 'age' is present in the first document but absent in the second. Furthermore, the data type of 'ParentPhone' is numeric in the first document, whereas it is set to 'false', a boolean, in the second.
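A flexible schema pushes type checks into the application or the query layer. The sketch below mirrors the two student documents above and shows, in plain JavaScript, roughly what the corresponding MongoDB queries would match:

```javascript
// The same field holds different types across documents, so the
// application must be prepared for both.
const students = [
  { StudentName: "ABC", ParentPhone: 75646344, age: 10 },
  { StudentName: "XYZ", ParentPhone: false }
];

// Roughly what db.students.find({ ParentPhone: { $type: "bool" } }) matches:
const badPhones = students.filter(s => typeof s.ParentPhone === "boolean");
console.log(badPhones.length); // 1

// Roughly what db.students.find({ age: { $exists: false } }) matches:
const noAge = students.filter(s => !("age" in s));
console.log(noAge[0].StudentName); // XYZ
```

In a real deployment you would run the `$type` and `$exists` queries shown in the comments to audit such divergent documents.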
What is Rigid Schema?
In a rigid (strict) schema, all documents in a collection share a comparable structure, giving you a better opportunity to set up document validation rules that improve data integrity during insert and update operations. String, Number, Boolean, Date, Buffer, ObjectId, Array, Mixed, Decimal128, and Map are some rigid schema data types (as defined in Mongoose).
The following is an example of a sample schema:
var userSchema = new mongoose.Schema({
  userId: Number,
  Email: String,
  Birthday: Date,
  Adult: Boolean,
  Binary: Buffer,
  height: mongoose.Schema.Types.Decimal128,
  units: []
});
var user = mongoose.model("Users", userSchema);
var newUser = new user();
newUser.userId = 1;
newUser.Email = "example@gmail.com";
newUser.Birthday = new Date();
newUser.Adult = false;
newUser.Binary = Buffer.alloc(0);
newUser.height = 12.45;
newUser.units = ["Circuit Network Theory", "Algebra", "Calculus"];
newUser.save(function (err) {
  if (err) console.error(err);
});
What is Schema Validation?
Schema validation is critical when validating data on the server side. Validation rules are applied during insert and update operations. Using the 'collMod' command, rules can also be added to an existing collection; the changes are not applied to existing documents until they are updated.
The validator option can be supplied when creating a new collection with the 'db.createCollection()' command. MongoDB supports JSON Schema starting with version 3.6, via the '$jsonSchema' operator.
db.createCollection("students", {
validator: {$jsonSchema: {
bsonType: "object",
required: [ "name", "year", "major", "gpa" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
gender: {
bsonType: "string",
description: "must be a string and is not required"
},
year: {
bsonType: "int",
minimum: 2017,
maximum: 3017,
exclusiveMaximum: false,
description: "must be an integer in [ 2017, 3017 ] and is required"
},
major: {
enum: [ "Math", "English", "Computer Science", "History", null ],
description: "can only be one of the enum values and is required"
},
gpa: {
bsonType: [ "double" ],
minimum: 0,
description: "must be a double and is required"
}
}
}}})
Now try to insert a new document into the collection:
db.students.insert({
name: "ABC",
year: NumberInt(2016),
major: "History",
gpa: NumberInt(3)
})
Because the supplied year is not within the set limits, the insert fails validation and MongoDB returns an error:
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
Query expressions can also be used in the validator option, with the exception of the $where, $text, $near, and $nearSphere operators. For example:
db.createCollection( "contacts",
{ validator: { $or:
[
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
}
} )
Schema Validation Levels in MongoDB Data Modeling
Validation rules are typically applied to write operations. They can, however, also be applied to existing documents. There are three validation levels:
- Strict: All inserts and updates are subject to validation criteria.
- Moderate: Validation rules are applied only to existing documents that meet the validation criteria during inserts and changes.
- Off: Validation is turned off, so no validation rules are applied to any documents.
For example, let's insert the data below into a 'clients' collection:
db.clients.insert([
{
"_id" : 1,
"name" : "Brillian",
"phone" : "+1 778 574 666",
"city" : "Beijing",
"status" : "Married"
},
{
"_id" : 2,
"name" : "James",
"city" : "Peninsula"
}
])
Now apply the moderate validation level to the 'clients' collection:
db.runCommand( {
collMod: "clients",
validator: { $jsonSchema: {
bsonType: "object",
required: [ "phone", "name" ],
properties: {
phone: {
bsonType: "string",
description: "must be a string and is required"
},
name: {
bsonType: "string",
description: "must be a string and is required"
}
}
} },
validationLevel: "moderate"
} )
As a result, the validation rules will be applied only to updates of the document with '_id' 1, because it already meets the criteria. The second document does not meet the criteria, so it will not be validated on subsequent updates.
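The decision rule can be sketched in plain JavaScript (this is an illustrative model of the behaviour, not the real server logic):

```javascript
// Under the "moderate" level, validation rules apply to an update only
// when the existing document already satisfies them.
function satisfiesRules(doc) {
  return typeof doc.phone === "string" && typeof doc.name === "string";
}
function validatedOnUpdate(existingDoc) {
  return satisfiesRules(existingDoc); // moderate: skip non-conforming docs
}

const client1 = { _id: 1, name: "Brillian", phone: "+1 778 574 666" };
const client2 = { _id: 2, name: "James" }; // 'phone' field missing

console.log(validatedOnUpdate(client1)); // true  -> its updates are validated
console.log(validatedOnUpdate(client2)); // false -> its updates are not validated
```

This is what makes the moderate level useful for legacy data: old non-conforming documents keep working while new and conforming documents are held to the rules.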
Schema Validation Actions
Schema validation actions determine what happens to documents that violate the validation rules, so an action must be specified. MongoDB provides two actions: Error and Warn.
- Error: This is the default action; it rejects the insert or update if the validation conditions are not met.
- Warn: The Warn action logs every violation in the MongoDB log and allows the insert or update to complete. For example:
db.createCollection("students", {
validator: {$jsonSchema: {
bsonType: "object",
required: [ "name", "gpa" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
gpa: {
bsonType: [ "double" ],
minimum: 0,
description: "must be a double and is required"
}
}
}
},
validationAction: "warn"
})
If we insert a document that looks like this:
db.students.insert( { name: "Amanda", status: "Updated" } );
The gpa field is missing, but because validationAction is set to 'warn', the document is saved and the violation is recorded in the MongoDB log.
MongoDB Data Modeling Schema Design Patterns
The MongoDB Data Modeling Schema Design includes 12 patterns. Let us go over them quickly.
- Approximation: Only approximate values are saved, reducing writes and calculations.
- Attribute: When indexing and querying huge documents, only index and query on a subset of fields.
- Bucket: Bucketing limits the number of documents when streaming data or building IoT apps; pre-aggregation also improves data availability.
- Computed: MongoDB avoids repeating computations by computing at write time or at regular intervals.
- Document Versioning: Document versioning provides for the coexistence of many versions of a document.
- Extended Reference: By embedding only the frequently accessed fields of a referenced document, we eliminate several joins.
- Outlier: Data models and queries are built for common use cases and are unaffected by outliers.
- Pre-Allocation: When the structure of a document is known ahead of time, pre-allocation lowers memory reallocation and improves efficiency.
- Polymorphic: When related documents do not have the same structure, polymorphic is useful.
- Schema Versioning: Schema versioning is useful when the schema evolves during the life of the application, avoiding downtime and technical debt.
- Subset: A subset is beneficial when the application uses only a portion of the data, because a smaller working set fits into RAM more efficiently.
- Tree: The tree pattern is appropriate for hierarchical data; the application must keep track of updates to the graph.
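As a closing illustration, the tree pattern can be sketched with parent references, where each document stores its parent's _id and ancestors are found by walking upward (the category names here are hypothetical):

```javascript
// Tree pattern via parent references: each category points at its parent.
const categories = [
  { _id: "books", parent: null },
  { _id: "databases", parent: "books" },
  { _id: "mongodb", parent: "databases" }
];
const byId = Object.fromEntries(categories.map(c => [c._id, c]));

// Walk up the parent chain to collect all ancestors of a node.
function ancestors(id) {
  const path = [];
  for (let node = byId[id]; node && node.parent; node = byId[node.parent]) {
    path.push(node.parent);
  }
  return path;
}

console.log(ancestors("mongodb")); // [ 'databases', 'books' ]
```

In MongoDB, each upward step would be a query on the parent field (or a `$graphLookup` aggregation), which is why the pattern trades cheap inserts for more work at read time.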