MongoDB Aggregation Pipelines

3 min readMay 29, 2022

MongoDB provides an excellent framework to run aggregations. It’s a set of tools through which you can run any kind of report or do data analysis on one or more MongoDB collections-

In majority of cases, we use MongoDB just to insert and query the data, but with the aggregation pipelines we can generate analytics from our data, by using aggregation operators and also derive new values.

The pipeline in simple words is a flow data through different stages. The output of one stage becomes the input for another stage.

Following are the common stage operators in MongoDB-

$group — Groups documents by a field value/expression.
$project- Selects a set of fields from the resulting output
$match — Filters out the results.
$limit / $skip— Paginate the documents
$unwind — Expand arrays in documents
$sort- Sorts the data

Now, let’s first prepare a small dataset so that we can query on the same. We are making a collection called orders and it has following 3 objects-

/* 1 */
{
    "_id" : ObjectId("628a40d0265ae89785733873"),
    "categoryId" : 2,
    "productId" : 73,
    "amount" : 300,
    "orderDate" : ISODate("2022-05-20T08:00:00.000Z")
}/* 2 */
{
    "_id" : ObjectId("628a40ff265ae89785733892"),
    "categoryId" : 2,
    "productId" : 63,
    "amount" : 456,
    "orderDate" : ISODate("2022-05-21T08:00:00.000Z")
}/* 3 */
{
    "_id" : ObjectId("628a44db265ae897857338eb"),
    "categoryId" : 3,
    "productId" : 163,
    "amount" : 496,
    "orderDate" : ISODate("2022-05-15T08:00:00.000Z")
}

$match / $sort / $project example

Now, let’s execute a simple aggregation pipeline which filters a set of document , sorts them and finally project a selected set of fields from it-

db.order.aggregate([
  {
      $match: {"categoryId" : 2}
  },
  {
      $sort: { "orderDate" : -1}
  },
  {
      $project : { "_id" : 0 , "productId" : 1 }
  }]);

The output of the above program will be-

/* 1 */
{
    "productId" : 63
}/* 2 */
{
    "productId" : 73
}

Note, we have intentionally put _id as 0, because if we don’t MongoDB will by default add _id to the resultset.

$group example

db.order.aggregate([
  {
      $match: {}
  },
  {
      $group: 
         {
             "_id" : "$categoryId",
             "products": { $push : "$productId"}
          
         }
  }
  
]);

Result-

/* 1 */
{
    "_id" : 3,
    "products" : [ 
        163
    ]
}/* 2 */
{
    "_id" : 2,
    "products" : [ 
        73, 
        63
    ]
}

It’s not essential to have $match as the first stage of pipeline. It could be any of the stage operators, however this will scan through the entire collection.

To improve readability, it’s always advisable to store the pipeline stage in a variable-

var groupStage = {
      $group: 
         {
             "_id" : "$categoryId",
             "products": { $push : "$productId"}
          
         }
  };db.order.aggregate([
  {
      $match: {}
     
  },
   groupStage
  
]);

$unwind example

$unwind basically deconstructs an array field and make new documents with the fields of that array. It duplicates each element in the pipeline and add the additional field of the array into the original document.

Let’ say we have a product collection, with below data-

/* 1 */
{
    "_id" : ObjectId("629372e3265ae897857b6eff"),
    "categoryId" : 23,
    "brand" : "Samsung",
    "model" : "DXV-LSI",
    "tags" : [ 
        "tv", 
        "samsung", 
        "led"
    ]
}

Now, let’s run unwind over it-

db.getCollection('product').aggregate([ { $unwind : "$tags"}]).pretty()

Output-

/* 1 */
{
    "_id" : ObjectId("629372e3265ae897857b6eff"),
    "categoryId" : 23,
    "brand" : "Samsung",
    "model" : "DXV-LSI",
    "tags" : "tv"
}/* 2 */
{
    "_id" : ObjectId("629372e3265ae897857b6eff"),
    "categoryId" : 23,
    "brand" : "Samsung",
    "model" : "DXV-LSI",
    "tags" : "samsung"
}/* 3 */
{
    "_id" : ObjectId("629372e3265ae897857b6eff"),
    "categoryId" : 23,
    "brand" : "Samsung",
    "model" : "DXV-LSI",
    "tags" : "led"
}

As you notices, all of them have the same _id.

Also, starting from MongoDb 5.0 onwards, map-reduce function has been deprecated and aggregation pipelines is the goto feature if you want to generate analytics. Aggregation pipelines are much more efficient then map-reduce function.

MongoDB Aggregation Pipelines

Written by Dhruv Saksena

No responses yet