MongoDB Query Recipes

Dhruv Saksena
6 min read · Nov 21, 2021


MongoDB ships with a lot of features and flexibility, but a poorly designed schema and inefficient queries can cause serious performance issues. In this article, we are going to look at how to optimise queries and design schemas.

Query Plan

For any query, MongoDB first checks its cache for an existing query plan. If one is there, it simply executes that plan; if not, it generates candidate plans for the query, chooses the most efficient one, executes it, and caches it as well for future executions.

To check the plan of a query

db.transaction.find({'deptId' : '7'}).explain()

To get plan details alongwith execution statistics

db.transaction.find({'deptId' : '7'}).explain('executionStats')

Here, totalDocsExamined tells you the number of documents that were examined to produce the result, and nReturned the number of documents the query returned. If totalDocsExamined is much higher than nReturned, the query is bound to be slow.
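One rough health check is the ratio of documents examined to documents returned. The sketch below works on the shape of an explain('executionStats') result; the helper name scanRatio and the idea of boiling it down to a single number are my own simplification, not a MongoDB API.

```javascript
// Rough plan health check: documents examined per document returned.
// A ratio near 1 means the index is selective; a large ratio means the
// server is scanning far more documents than it hands back.
function scanRatio(executionStats) {
  const { totalDocsExamined, nReturned } = executionStats;
  if (nReturned === 0) return Infinity; // examined documents, returned none
  return totalDocsExamined / nReturned;
}

const stats = { nReturned: 10, totalDocsExamined: 5000 };
console.log(scanRatio(stats)); // 500 documents examined per document returned
```

You would feed it `db.transaction.find({...}).explain('executionStats').executionStats` and alert when the ratio crosses whatever threshold suits your workload.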

Fast inserts

MongoDB works best when we insert data in bulk; it is much faster than inserting documents one by one. bulkWrite() is an excellent method to achieve that.

All you need to do is organise your objects into batches and bulk-write them into the database:

const batchToBeInserted = [];

batchToBeInserted.push({
  replaceOne: {
    filter: { _id: id },
    replacement: <<object-to-be-replaced>>
  }
});

db.getCollection("<<collection-name>>").bulkWrite(batchToBeInserted);
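For a large data set you will want to split the documents into fixed-size batches rather than one giant bulkWrite. A minimal sketch; the helper name toReplaceOneBatches, the batch size of 1000, and the upsert choice are my own, not MongoDB conventions:

```javascript
// Split documents into fixed-size batches of replaceOne operations,
// each batch ready to be handed to bulkWrite().
function toReplaceOneBatches(docs, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const ops = docs.slice(i, i + batchSize).map(doc => ({
      replaceOne: {
        filter: { _id: doc._id },
        replacement: doc,
        upsert: true // insert if the _id does not exist yet
      }
    }));
    batches.push(ops);
  }
  return batches;
}
```

Each batch can then be written with `db.getCollection("<<collection-name>>").bulkWrite(ops)`.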

Data organization

MongoDB is basically a key-value database, which lets it accept any key/value pair into the DB. On one hand this gives a lot of flexibility, but on the other hand it becomes difficult to bind the data to a fixed schema. For use cases where a document can carry any kind of key/value pair, we can leverage EAV (Entity Attribute Value) modeling:

Ex: Take a person object. It can have any number of attributes depending on the use case, e.g. age, gender, school, marks, degree, contact number, spouse. We can keep the most frequently queried fields at the top level and the less-queried fields inside an embedded array:

Ex:

{
  "_id": ObjectId("6143c6b9b421dc001eedd96e"),
  "name": "Dhruv",
  "city": "Jaipur",
  "gender": "Male",
  "phoneNo": "00000000",
  "attributes": [
    {
      "code": "qualification",
      "value": "B.Tech"
    },
    {
      "code": "homeTown",
      "value": "Jodhpur"
    }
  ]
}

In the above example our collection now has a fixed schema, and it can accommodate any number of attributes within the attributes array. You can still query on fields inside the attributes array, but those queries will be slower, so based on the queries your application fires, choose which fields go at the top level and which become part of the attributes array.
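Reading an EAV attribute back out of such a document is a one-liner. A small sketch; getAttribute is a hypothetical helper, not a MongoDB API:

```javascript
// Look up one EAV attribute in a document shaped like the example above.
function getAttribute(doc, code) {
  const attr = (doc.attributes || []).find(a => a.code === code);
  return attr ? attr.value : undefined;
}

const person = {
  name: "Dhruv",
  attributes: [
    { code: "qualification", value: "B.Tech" },
    { code: "homeTown", value: "Jodhpur" }
  ]
};
console.log(getAttribute(person, "homeTown")); // "Jodhpur"
```

On the server side, matching both code and value of the same array element is done with $elemMatch, e.g. `db.person.find({ attributes: { $elemMatch: { code: "homeTown", value: "Jodhpur" } } })`.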

Design Patterns

Beyond the example above, MongoDB has a rich set of design patterns that you can leverage to organize your data. Refer to this article for more details.

High CPU utilisation by MongoDB

There can be many reasons for high CPU usage in MongoDB. I am highlighting a few of them to ease the troubleshooting process:

  1. There could be queries running continuously on fields that are not indexed. If the collection is huge, each such query scans the entire collection, thereby causing a spike in CPU.

Use the below command to check the currently running operations in MongoDB:

db.currentOp();
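To spot the offenders, filter the result for operations that have been running for a while. The inprog array and secs_running field match currentOp's output shape; the helper name and the 5-second threshold are my own choices:

```javascript
// Pick the long-running operations out of a currentOp() result.
function longRunningOps(currentOpResult, thresholdSecs = 5) {
  return currentOpResult.inprog.filter(
    op => op.active && op.secs_running > thresholdSecs
  );
}

const result = {
  inprog: [
    { opid: 1, active: true, secs_running: 12, ns: "test.transaction" },
    { opid: 2, active: true, secs_running: 1, ns: "test.users" }
  ]
};
console.log(longRunningOps(result).map(op => op.opid)); // [ 1 ]
```

The shell can also do this filtering server-side, e.g. `db.currentOp({ active: true, secs_running: { $gt: 5 } })`.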

Log Rotation

MongoDB writes its logs to a log file; by default everything gets appended to mongodb.log. If the log file grows very large, writes to it become increasingly expensive, so it is better to rotate it:

db.adminCommand( { logRotate : 1 } )

Using compound indexes

Many developers create multiple single-field indexes to improve query performance, but you really need to check the queries your application is firing. If your application fires queries with multiple AND clauses, then a compound index will be much more helpful than multiple single-field indexes.

Also, let's say you have created a compound index on two fields A and B. If you query on just field A, the index will still be used, because A is the first field (the prefix) of the compound index.
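The prefix rule can be sketched as a small check: a query's fields can use a compound index when they form a leading prefix of the index keys. This is a heavy simplification of what MongoDB's planner actually does (it also weighs sort order, range predicates and selectivity), and the helper is hypothetical:

```javascript
// Do the query's equality fields form a leading prefix of the index keys?
function canUsePrefix(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let matched = 0;
  for (const key of indexKeys) {
    if (fields.has(key)) matched++;
    else break; // stop at the first index key the query does not constrain
  }
  return matched === fields.size && matched > 0;
}

const index = ["A", "B"];
console.log(canUsePrefix(index, ["A"]));      // true  - leading prefix
console.log(canUsePrefix(index, ["A", "B"])); // true  - full key
console.log(canUsePrefix(index, ["B"]));      // false - not a prefix
```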

In a nutshell, the way indexes are designed plays a vital role in tuning MongoDB's performance.

Do away with heavy documents

Sometimes we put too many things in a single document and transfer the entire document to our application; this causes heavy data transfer and slows down MongoDB operations. We should either use projections in our queries or make our documents thinner.
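In the shell, an inclusion projection looks like `db.transaction.find({ deptId: "7" }, { amount: 1, date: 1, _id: 0 })`, which ships only two fields per document. A client-side sketch of the same idea, so the effect is visible without a database (pickFields and the sample document are illustrative; on the server the projection document does this before data leaves the database):

```javascript
// Mimic an inclusion projection: keep only the listed fields of a document.
function pickFields(doc, fields) {
  const out = {};
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

const heavyDoc = { _id: 1, deptId: "7", amount: 250, date: "2021-11-21", auditTrail: ["..."] };
console.log(pickFields(heavyDoc, ["amount", "date"])); // { amount: 250, date: '2021-11-21' }
```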

Reducing space of data on disk

If for some reason you want to reduce the size of data at rest, MongoDB gives you the option to compress the data:

db.createCollection( "test", {storageEngine:{wiredTiger:{configString:'block_compressor=zlib'}}} );

Making sure indexes are used in queries

Use .explain() on your queries. If a query is satisfied by an index present in your DB, you will see an IXSCAN stage in the winning plan, which indicates the index being used for that specific query.
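Explain output nests stages, so checking for IXSCAN means walking the plan tree. A sketch over the common single-child case (stage/inputStage mirrors explain's queryPlanner.winningPlan shape; plans with multiple inputStages, e.g. for $or, would need a slightly richer walk, and the helper itself is my own):

```javascript
// Collect the stage names of a winning plan, outermost first.
function planStages(stage, acc = []) {
  if (!stage) return acc;
  acc.push(stage.stage);
  if (stage.inputStage) planStages(stage.inputStage, acc);
  return acc;
}

const winningPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "deptId_1" }
};
console.log(planStages(winningPlan));                    // [ 'FETCH', 'IXSCAN' ]
console.log(planStages(winningPlan).includes("IXSCAN")); // true
```

A plan that reports COLLSCAN instead is scanning the whole collection and is a candidate for a new index.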

MongoDB App level compression

With application-level compression, the CPU spikes on queries can be drastically reduced. MongoDB's built-in compression only compresses data at rest and decompresses it in cache, so its main use is to reduce disk space, not to optimize query performance.

Sharding of database cluster

Sharding is also a great technique to horizontally scale your database. It partitions the data across multiple shards.

Following are the steps-

  1. Initiate the configuration servers

sh-3.2# mongod --configsvr --replSet ConfigReplSet --dbpath /var/lib/cfgsvr1 --port 27018 --fork --logpath /var/lib/cfgsvr1/log --logappend
sh-3.2# mongod --configsvr --replSet ConfigReplSet --dbpath /var/lib/cfgsvr2 --port 27019 --fork --logpath /var/lib/cfgsvr2/log --logappend
sh-3.2# mongod --configsvr --replSet ConfigReplSet --dbpath /var/lib/cfgsvr3 --port 27020 --fork --logpath /var/lib/cfgsvr3/log --logappend
sh-3.2# mongo --port 27018
> rs.initiate();
> rs.add("localhost:27019");
> rs.add("localhost:27020");

2. Initiate the shard servers-

ReplicaSet-1

sh-3.2# mongod --shardsvr --replSet ReplicaSet1 --dbpath /var/lib/replica11 --logpath /var/lib/replica11/log --port 27011 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 14217
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet1 --dbpath /var/lib/replica12 --logpath /var/lib/replica12/log --port 27012 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 14594
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet1 --dbpath /var/lib/replica13 --logpath /var/lib/replica13/log --port 27013 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 14610
child process started successfully, parent exiting
sh-3.2# mongo --port 27011
> rs.initiate();
{
"info2" : "no configuration specified. Using a default configuration for the set",
"me" : "localhost:27011",
"ok" : 1
}
ReplicaSet1:SECONDARY> rs.add("localhost:27012");
{
"ok" : 1,
"$clusterTime" :
{
"clusterTime" : Timestamp(1645611766, 1),
"signature" :
{
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1645611766, 1)
}
ReplicaSet1:PRIMARY> rs.add("localhost:27013");
{
"ok" : 1,
"$clusterTime" :
{
"clusterTime" : Timestamp(1645611790, 1),
"signature" :
{
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1645611790, 1)
}
ReplicaSet1:PRIMARY> rs.status();

ReplicaSet-2

sh-3.2# mongod --shardsvr --replSet ReplicaSet2 --dbpath /var/lib/replica21 --logpath /var/lib/replica21/log --port 27021 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16853
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet2 --dbpath /var/lib/replica22 --logpath /var/lib/replica22/log --port 27022 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16883
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet2 --dbpath /var/lib/replica23 --logpath /var/lib/replica23/log --port 27023 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16900
child process started successfully, parent exiting

ReplicaSet-3

sh-3.2# mongod --shardsvr --replSet ReplicaSet3 --dbpath /var/lib/replica31 --logpath /var/lib/replica31/log --port 27031 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16926
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet3 --dbpath /var/lib/replica32 --logpath /var/lib/replica32/log --port 27032 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16948
child process started successfully, parent exiting
sh-3.2# mongod --shardsvr --replSet ReplicaSet3 --dbpath /var/lib/replica33 --logpath /var/lib/replica33/log --port 27033 --logappend --oplogSize 50 --fork
about to fork child process, waiting until server is ready for connections.
forked process: 16960
child process started successfully, parent exiting

3. Add the shards at mongos

sh-3.2# mongo --port 47017
mongos> sh.addShard("ReplicaSet1/localhost:27011");
{
"shardAdded" : "ReplicaSet1",
"ok" : 1,
"operationTime" : Timestamp(1647957319, 4),
"$clusterTime" :
{
"clusterTime" : Timestamp(1647957319, 4),
"signature" :
{
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
mongos> sh.addShard("ReplicaSet2/localhost:27021");
{
"shardAdded" : "ReplicaSet2",
"ok" : 1,
"operationTime" : Timestamp(1647957350, 3),
"$clusterTime" :
{
"clusterTime" : Timestamp(1647957350, 3),
"signature" :
{
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
mongos> sh.addShard("ReplicaSet3/localhost:27031");
{
"shardAdded" : "ReplicaSet3",
"ok" : 1,
"operationTime" : Timestamp(1647957386, 5),
"$clusterTime" :
{
"clusterTime" : Timestamp(1647957386, 6),
"signature" :
{
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
mongos> sh.status();
--- Sharding Status ---
sharding version:
{
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("6239d1be4d2efb1817aa14c8")
}
shards:
{ "_id" : "ReplicaSet1", "host" : "ReplicaSet1/localhost:27011,localhost:27012,localhost:27013", "state" : 1 }
{ "_id" : "ReplicaSet2", "host" : "ReplicaSet2/localhost:27021,localhost:27022,localhost:27023", "state" : 1 }
{ "_id" : "ReplicaSet3", "host" : "ReplicaSet3/localhost:27031,localhost:27032,localhost:27033", "state" : 1 }
active mongoses:
"4.4.6" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
mongos>
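Note that the steps above only add the shards; no data is distributed until sharding is enabled on a database and a collection is sharded on a key. A hedged fragment of the remaining step, run in a mongo shell connected to mongos (the database name testDB, collection transaction and shard key deptId are illustrative, not from the setup above):

```javascript
// Run in a mongo shell connected to mongos.
// testDB, transaction and deptId are illustrative names.
sh.enableSharding("testDB");

// A hashed shard key spreads writes evenly across the shards.
sh.shardCollection("testDB.transaction", { deptId: "hashed" });

// Verify chunk distribution.
sh.status();
```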
