This post summarises my research and analysis on the performance gains made possible by using projection in MongoDB. By the end of this tutorial, we will be able to determine whether projection improves MongoDB query performance.

Let's get started without further ado.

What is MongoDB Projection?

With a MongoDB projection, we can specify which fields a query should return. We do this by listing field names in the projection document with a value of 1 or 0: a field set to 1 is included in the results, and a field set to 0 is excluded.
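
The include/exclude rules above can be sketched in plain JavaScript. This is only a simplified model for flat documents, not how the MongoDB server implements projection; the function name applyProjection and the sample document are made up for illustration.

```javascript
// Simplified model of projection on a flat document (illustration only;
// applyProjection is a hypothetical helper, not a MongoDB API).
function applyProjection(doc, projection) {
  const fields = Object.keys(projection).filter((k) => k !== '_id');
  // A projection is in "inclusion mode" if any non-_id field is set to 1
  // (or if it lists only _id: 1).
  const isInclusion = fields.length === 0
    ? projection._id === 1
    : fields.some((k) => projection[k] === 1);
  const result = {};

  if (isInclusion) {
    // Keep only the listed fields, plus _id unless it is explicitly set to 0.
    if (projection._id !== 0 && '_id' in doc) result._id = doc._id;
    for (const k of fields) {
      if (projection[k] === 1 && k in doc) result[k] = doc[k];
    }
  } else {
    // Exclusion mode: keep everything except the fields set to 0.
    for (const k of Object.keys(doc)) {
      if (projection[k] !== 0) result[k] = doc[k];
    }
  }
  return result;
}

const sample = { _id: 1, a: 'abc', b: 'zzz' };
console.log(applyProjection(sample, { a: 1 }));          // { _id: 1, a: 'abc' }
console.log(applyProjection(sample, { a: 1, _id: 0 }));  // { a: 'abc' }
console.log(applyProjection(sample, { b: 0 }));          // { _id: 1, a: 'abc' }
```

Note that _id is kept in inclusion mode unless it is explicitly set to 0, which matters for the covered query discussion later in this post.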

By default, queries return all fields from matched documents. If you need all the fields, returning entire documents is more efficient than having the server manipulate the result set with projection criteria.

However, using projection to limit the fields returned in query results can improve performance by:

  • eliminating unnecessary fields from search results (saving on network bandwidth)
  • reducing the number of response fields to satisfy a covered query (returning indexed query results without fetching full documents)

When projection is used to remove unneeded fields, the MongoDB server still has to fetch each whole document into memory (if it isn't already there) and filter the results to return. Depending on your data model and the projected fields, this use of projection can significantly reduce network traffic for query results, but it does not reduce memory use or the working set on the MongoDB server.

Covered queries are the exception: when every field requested in the query result is contained in the index that was used, the server does not have to fetch the full document at all. Covered queries can improve performance (by avoiding document fetches) and reduce memory usage (provided other queries don't need to fetch the same documents).
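
As a rough sketch of the covered query condition, the helper below checks whether every field a query filters on and returns is present in a given index. It is a simplification that assumes flat field names at the top level of the filter and ignores MongoDB's other restrictions on covered queries; it also does not predict whether the query planner will actually choose that index. The name canBeCovered is hypothetical.

```javascript
// Simplified check: could this query be answered from the index alone?
// canBeCovered is a hypothetical helper; it assumes flat field names and
// does not model the query planner or other covered query restrictions.
function canBeCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);

  // An empty or exclusion-only projection returns unlisted fields,
  // so the full document is needed.
  if (returned.length === 0) return false;

  // _id is returned by default unless the projection sets it to 0.
  if (projection._id !== 0 && !returned.includes('_id')) returned.push('_id');

  const needed = [...Object.keys(filter), ...returned];
  return needed.every((f) => indexed.has(f));
}

console.log(canBeCovered({ a: 1 }, { a: 'abc' }, { a: 1 }));          // false
console.log(canBeCovered({ a: 1 }, { a: 'abc' }, { a: 1, _id: 0 }));  // true
```

The first call returns false only because _id is returned by default and is not part of the {a:1} index.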

Examples

As an example, suppose you insert the following document using the mongo shell:

db.data.insert({
    a: 'abc',
    b: new Array(10*1024*1024).join('z')
})

The field b might represent a selection of values (or in this case a very long string).

Next, create an index on {a:1}, a field that is commonly queried in your use case:

db.data.createIndex({a:1})

A simple findOne() with no projection criteria returns a query result of about 10MB:

> bsonsize(db.data.findOne({}))
10485805

If you add the projection a:1, the result will only include the field a and the document _id (which is included by default). The query result is now only 33 bytes, but the MongoDB server still has to manipulate a 10MB document to select two fields:

> bsonsize(db.data.findOne({}, {a:1}))
33

This query is not covered because the full document has to be fetched to determine the _id value. As a document's unique identifier, the _id field is included in query results by default, but _id will not be part of a secondary index unless explicitly added.

The results from explain() show the number of documents and index keys examined via the totalDocsExamined and totalKeysExamined metrics:

 > db.data.find(
     {a:'abc'}, 
     {a:1}
 ).explain('executionStats').executionStats.totalDocsExamined
 1

Using projection to exclude the _id field turns this into a covered query that uses only the {a:1} index. The covered query is efficient in terms of both network and memory usage because it does not need to fetch the roughly 10MB document into memory:

 > db.data.find(
     {a:'abc'},
     {a:1, _id:0}
 ).explain('executionStats').executionStats.totalDocsExamined
 0

 > bsonsize(db.data.findOne( {a:'abc'},{a:1, _id:0}))
 21

My MongoDB queries are slow. Does returning a subset of fields affect my slow query, which uses a compound index on the field?

This cannot be answered without the context of a specific query, an example document, and the full explain output. You could run some benchmarks in your own environment to compare the same query with and without projection. If projection adds significantly to the overall query execution time (including processing and transferring results), that may be a strong hint that your data model needs to be improved.

If it is unclear why a query is slow, it is best to post a new question with specific details to investigate.
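
One way to make such a comparison easier to read is to put the key executionStats fields of two runs side by side. The sketch below is a plain JavaScript helper; summariseStats is a hypothetical name, and the two sample objects are illustrative stand-ins modelled on the explain() results shown earlier, not measurements. The "covered" flag is a heuristic: results were returned without examining any documents.

```javascript
// Hypothetical helper: pick out the executionStats fields that matter when
// comparing two variants of the same query.
function summariseStats(stats) {
  return {
    // Heuristic: results came back without any document being examined.
    covered: stats.nReturned > 0 && stats.totalDocsExamined === 0,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
  };
}

// Illustrative stand-ins for explain('executionStats').executionStats output;
// these are assumed values, not measured results.
const withDefaultId = { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 };
const withoutId = { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 0 };

console.log(summariseStats(withDefaultId)); // { covered: false, keysExamined: 1, docsExamined: 1 }
console.log(summariseStats(withoutId));     // { covered: true, keysExamined: 1, docsExamined: 0 }
```

In the shell, the input objects would come from db.data.find(...).explain('executionStats').executionStats for each variant of the query. The same output also reports executionTimeMillis, which is worth comparing across repeated runs.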

