MongoDB Aggregation: Powerful Data Processing Techniques
Learn about MongoDB's aggregation framework, a versatile tool for processing and analyzing data. Explore the different aggregation methods, including the aggregation pipeline and single-purpose methods, and discover how to use them to perform various data transformations and calculations.
MongoDB Aggregation
Aggregation in MongoDB processes multiple documents and returns computed results. You can use it to group values from multiple documents or perform operations on grouped data.
Aggregation Methods
Aggregation can be performed in two ways:
- Aggregation Pipeline: Uses an array of stages to process documents.
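- Single Purpose Aggregation Methods: Methods like db.collection.estimatedDocumentCount(), db.collection.count(), and db.collection.distinct(). Note that db.collection.count() is deprecated in current MongoDB shells and drivers; db.collection.countDocuments() is the recommended replacement.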
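To make the single-purpose methods more concrete, here is a minimal plain JavaScript sketch of the kind of result a document count and a distinct() call return. It runs against an in-memory array rather than a MongoDB collection, and the `people` array and variable names are assumptions made up for this illustration.

```javascript
// Illustrative in-memory documents (not a real collection).
const people = [
  { firstName: "John", gender: "male" },
  { firstName: "Rosy", gender: "female" },
  { firstName: "Kapil", gender: "male" },
];

// Rough analogue of counting the documents in a collection.
const totalDocs = people.length;

// Rough analogue of distinct("gender"): each value is reported once.
const distinctGenders = [...new Set(people.map((doc) => doc.gender))];

console.log(totalDocs, distinctGenders);
```

The real methods run on the server and accept query filters; this sketch only shows the shape of the results.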
Aggregation Pipeline
The aggregation pipeline consists of one or more stages passed to the db.aggregate() or db.collection.aggregate() method.
Syntax
db.collection.aggregate([ {stage1}, {stage2}, {stage3}...])
Each stage receives the output of the previous stage, processes it, and passes the result to the next stage. The pipeline executes on the server and can use indexes, typically when stages such as $match or $sort appear at the start of the pipeline.
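The stage-to-stage flow described above can be modeled in plain JavaScript. This is only a conceptual sketch, not how the server implements pipelines: `runPipeline`, `onlyHighSalary`, and `namesOnly` are hypothetical names, and each "stage" is just a function from an array of documents to a new array.

```javascript
// Hypothetical in-memory model of a pipeline: every stage is a function
// that takes an array of documents and returns a new array.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, left to right.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two illustrative stages (names invented for this sketch).
const onlyHighSalary = (docs) => docs.filter((d) => d.salary >= 7000);
const namesOnly = (docs) => docs.map((d) => ({ firstName: d.firstName }));

const sample = [
  { firstName: "John", salary: 5000 },
  { firstName: "Sachin", salary: 8000 },
  { firstName: "James", salary: 7500 },
];

const pipelineResult = runPipeline(sample, [onlyHighSalary, namesOnly]);
console.log(pipelineResult);
```

Because the second stage only sees what the first stage returned, filtering early keeps later stages working on fewer documents.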
Sample Data
Insert the following documents into the employees collection:
Insert Sample Data
db.employees.insertMany([
{ _id: 1, firstName: "John", lastName: "King", gender: "male", email: "john.king@abc.com", salary: 5000, department: { name: "HR" }},
{ _id: 2, firstName: "Sachin", lastName: "T", gender: "male", email: "sachin.t@abc.com", salary: 8000, department: { name: "Finance" }},
{ _id: 3, firstName: "James", lastName: "Bond", gender: "male", email: "jamesb@abc.com", salary: 7500, department: { name: "Marketing" }},
{ _id: 4, firstName: "Rosy", lastName: "Brown", gender: "female", email: "rosyb@abc.com", salary: 5000, department: { name: "HR" }},
{ _id: 5, firstName: "Kapil", lastName: "D", gender: "male", email: "kapil.d@abc.com", salary: 4500, department: { name: "Finance" }},
{ _id: 6, firstName: "Amitabh", lastName: "B", gender: "male", email: "amitabh.b@abc.com", salary: 7000, department: { name: "Marketing" }}
])
$match Stage
The $match stage filters documents to include only those matching the specified criteria, similar to the find() method.
Example: $match Stage
db.employees.aggregate([{ $match: { gender: 'female' } }])
Output:
[ { _id: 4, firstName: 'Rosy', lastName: 'Brown', gender: 'female', email: 'rosyb@abc.com', salary: 5000, department: { name: 'HR' } } ]
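As a rough mental model, an equality $match behaves like an array filter. The sketch below is a plain JavaScript analogue over in-memory data; `matchEquals` is a hypothetical helper that handles only top-level equality on simple values, with no query operators or dotted field paths.

```javascript
// Trimmed-down copies of a few sample documents.
const employeesForMatch = [
  { _id: 1, firstName: "John", gender: "male" },
  { _id: 4, firstName: "Rosy", gender: "female" },
  { _id: 5, firstName: "Kapil", gender: "male" },
];

// Hypothetical helper: keep documents whose fields strictly equal
// every key/value pair in `criteria`.
function matchEquals(docs, criteria) {
  return docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );
}

const matched = matchEquals(employeesForMatch, { gender: "female" });
console.log(matched);
```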
$group Stage
The $group stage groups input documents by the specified _id expression and accumulates values for each group. The order of the output groups is not guaranteed unless a $sort stage follows.
Example: $group Stage
db.employees.aggregate([{ $group: { _id: '$department.name' } }])
Output:
[ { _id: 'Marketing' }, { _id: 'HR' }, { _id: 'Finance' } ]
To count the employees in each department, add an accumulator field. The expression { $sum: 1 } adds 1 for every document in the group:
Example: Get Accumulated Values
db.employees.aggregate([
{ $group: { _id: '$department.name', totalEmployees: { $sum: 1 } } }
])
Output:
[ { _id: 'Marketing', totalEmployees: 2 }, { _id: 'HR', totalEmployees: 2 }, { _id: 'Finance', totalEmployees: 2 } ]
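The counting behavior of $group with { $sum: 1 } can be approximated in plain JavaScript with a Map keyed by the grouping value. This is an illustrative in-memory sketch; `countBy` and the trimmed-down documents are assumptions for the example, not part of MongoDB.

```javascript
// Only the fields needed for grouping are kept here.
const employeesForGroup = [
  { _id: 1, department: { name: "HR" } },
  { _id: 2, department: { name: "Finance" } },
  { _id: 3, department: { name: "Marketing" } },
  { _id: 4, department: { name: "HR" } },
  { _id: 5, department: { name: "Finance" } },
  { _id: 6, department: { name: "Marketing" } },
];

// Hypothetical helper: count documents per key, like $group with { $sum: 1 }.
function countBy(docs, keyFn) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    // Start at 0 for a new key, then add 1 for this document.
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Shape each entry like a $group output document.
  return [...counts].map(([key, total]) => ({ _id: key, totalEmployees: total }));
}

const grouped = countBy(employeesForGroup, (d) => d.department.name);
console.log(grouped);
```

A Map preserves insertion order, so this sketch lists HR first, unlike the shell output above. That difference is harmless, since the group order should not be relied on without an explicit sort.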
$sort Stage
The $sort stage sorts documents by one or more specified fields; use 1 for ascending and -1 for descending order.
Example: Sort Documents
db.employees.aggregate([
{ $match: { gender: 'male' } },
{ $sort: { firstName: 1 } }
])
Output:
[
  { _id: 6, firstName: 'Amitabh', lastName: 'B', gender: 'male', email: 'amitabh.b@abc.com', salary: 7000, department: { name: 'Marketing' } },
  { _id: 3, firstName: 'James', lastName: 'Bond', gender: 'male', email: 'jamesb@abc.com', salary: 7500, department: { name: 'Marketing' } },
  { _id: 1, firstName: 'John', lastName: 'King', gender: 'male', email: 'john.king@abc.com', salary: 5000, department: { name: 'HR' } },
  { _id: 5, firstName: 'Kapil', lastName: 'D', gender: 'male', email: 'kapil.d@abc.com', salary: 4500, department: { name: 'Finance' } },
  { _id: 2, firstName: 'Sachin', lastName: 'T', gender: 'male', email: 'sachin.t@abc.com', salary: 8000, department: { name: 'Finance' } }
]
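To illustrate how a sort specification such as { firstName: 1 } maps to an ordering, here is a plain JavaScript sketch. `sortBySpec` is a hypothetical helper that supports only top-level fields with directly comparable values; it is an analogue of the idea, not MongoDB's sort implementation.

```javascript
// Male employees from the sample data, reduced to two fields.
const maleEmployees = [
  { firstName: "John", salary: 5000 },
  { firstName: "Sachin", salary: 8000 },
  { firstName: "James", salary: 7500 },
  { firstName: "Kapil", salary: 4500 },
  { firstName: "Amitabh", salary: 7000 },
];

// Hypothetical helper: sort by each field in `spec`,
// where 1 means ascending and -1 means descending.
function sortBySpec(docs, spec) {
  const fields = Object.entries(spec);
  // Copy first so the input array is left unchanged.
  return [...docs].sort((a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every listed field
  });
}

const byNameAsc = sortBySpec(maleEmployees, { firstName: 1 });
const bySalaryDesc = sortBySpec(maleEmployees, { salary: -1 });
console.log(byNameAsc.map((d) => d.firstName));
console.log(bySalaryDesc.map((d) => d.firstName));
```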
Sort grouped data by department name:
Example: Sort Grouped Data
db.employees.aggregate([
{ $match: { gender: 'male' } },
{ $group: { _id: { deptName: '$department.name' }, totalEmployees: { $sum: 1 } } },
{ $sort: { '_id.deptName': 1 } }
])
Output:
[ { _id: { deptName: 'Finance' }, totalEmployees: 2 }, { _id: { deptName: 'HR' }, totalEmployees: 1 }, { _id: { deptName: 'Marketing' }, totalEmployees: 2 } ]
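The three-stage pipeline above (filter, group, then sort) can be traced step by step in plain JavaScript. This is a minimal in-memory sketch under the assumption that only the fields used by the pipeline matter; the variable names are invented for the illustration.

```javascript
// Sample documents reduced to the fields the pipeline uses.
const staff = [
  { firstName: "John", gender: "male", department: { name: "HR" } },
  { firstName: "Sachin", gender: "male", department: { name: "Finance" } },
  { firstName: "James", gender: "male", department: { name: "Marketing" } },
  { firstName: "Rosy", gender: "female", department: { name: "HR" } },
  { firstName: "Kapil", gender: "male", department: { name: "Finance" } },
  { firstName: "Amitabh", gender: "male", department: { name: "Marketing" } },
];

// Step 1 (like $match): keep only male employees.
const males = staff.filter((d) => d.gender === "male");

// Step 2 (like $group with $sum: 1): count employees per department.
const totals = new Map();
for (const d of males) {
  const name = d.department.name;
  totals.set(name, (totals.get(name) ?? 0) + 1);
}
const groupedDocs = [...totals].map(([deptName, totalEmployees]) => ({
  _id: { deptName },
  totalEmployees,
}));

// Step 3 (like $sort on '_id.deptName': 1): order groups by department name.
const sortedGroups = [...groupedDocs].sort((a, b) =>
  a._id.deptName < b._id.deptName ? -1 : a._id.deptName > b._id.deptName ? 1 : 0
);

console.log(JSON.stringify(sortedGroups));
```

The sorted result contains the same three groups and counts as the shell output: Finance with 2, HR with 1, and Marketing with 2.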
Use aggregation pipelines to efficiently query and process data from MongoDB collections.