Filtered Subset

Introduction

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app. This aggregation performs the following operations:

Matches a subset of documents by a field value
Formats result documents

Aggregation Task Summary

This tutorial demonstrates how to query a collection for a specific subset of its documents. The results contain documents that describe the three youngest people who are engineers.

This example uses one collection, persons, which contains documents describing people. Each document includes a person's name, date of birth, vocation, and other details.

Before You Get Started

Before you begin following an aggregation tutorial, you must set up a new Python app. You can use this app to connect to a MongoDB deployment, insert sample data into MongoDB, and run the aggregation pipeline in each tutorial.

Tip

To learn how to install the driver and connect to MongoDB, see Get Started with PyMongo.

Once you install the driver, create a file called agg_tutorial.py. Paste the following code in this file to create an app template for the aggregation tutorials:

from datetime import datetime
from pymongo import MongoClient
# Replace the placeholder with your connection string.
uri = "<connection string>"
client = MongoClient(uri)
try:
    agg_db = client["agg_tutorials_db"]
    # Get a reference to relevant collections.
    # ... some_coll =
    # ... another_coll =
    # Delete any existing documents in collections.
    # ... some_coll.delete_many({})
    # Insert sample data into the collection or collections.
    # ... some_data = [...]
    # ... some_coll.insert_many(some_data)
    # Create an empty pipeline array.
    pipeline = []
    # Add code to create pipeline stages.
    # ... pipeline.append({...})
    # Run the aggregation.
    # ... aggregation_result = ...
    # Print the aggregation results.
    for document in aggregation_result:
        print(document)
finally:
    client.close()

Important

In the preceding code, read the code comments to find the sections of the code that you must modify for the tutorial you are following.

If you attempt to run the code without making any changes, you will encounter a connection error.

For every tutorial, you must replace the connection string placeholder with your deployment's connection string. To learn how to locate your deployment's connection string, see Create a Connection String.

For example, if your connection string is "mongodb://mongodb-example:27017", your connection string assignment resembles the following:

uri = "mongodb://mongodb-example:27017"
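
Because an unmodified template fails only when it tries to connect, it can help to reject the placeholder earlier with a clearer message. The following is a minimal sketch and not part of the tutorial app: the helper name resolve_uri and the MONGODB_URI environment variable are hypothetical names chosen for illustration.

```python
import os

PLACEHOLDER = "<connection string>"


def resolve_uri(default=PLACEHOLDER, env=None):
    """Return a connection string, preferring the MONGODB_URI variable.

    MONGODB_URI is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    uri = env.get("MONGODB_URI", default)
    # Reject the untouched placeholder and anything without a MongoDB scheme.
    if uri == PLACEHOLDER or not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(
            "Replace the placeholder with your deployment's connection string."
        )
    return uri
```

With a helper like this, the template's assignment could become uri = resolve_uri(), so the connection string can live outside the source file.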

To run the completed file after you modify the template for a tutorial, run the following command in your shell:

python3 agg_tutorial.py

After you set up the app, access the persons collection by adding the following code to the application:

person_coll = agg_db["persons"]

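Delete any existing data in the collection and insert sample data into the persons collection as shown in the following code. The sample data uses datetime values, so make sure that the line from datetime import datetime appears at the top of your file: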

person_coll.delete_many({})
person_data = [
    {
        "person_id": "6392529400",
        "firstname": "Elise",
        "lastname": "Smith",
        "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
        "vocation": "ENGINEER",
        "address": {
            "number": 5625,
            "street": "Tipa Circle",
            "city": "Wojzinmoj",
        }
    },
    {
        "person_id": "1723338115",
        "firstname": "Olive",
        "lastname": "Ranieri",
        "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
        "gender": "FEMALE",
        "vocation": "ENGINEER",
        "address": {
            "number": 9303,
            "street": "Mele Circle",
            "city": "Tobihbo",
        }
    },
    {
        "person_id": "8732762874",
        "firstname": "Toni",
        "lastname": "Jones",
        "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
        "vocation": "POLITICIAN",
        "address": {
            "number": 1,
            "street": "High Street",
            "city": "Upper Abbeywoodington",
        }
    },
    {
        "person_id": "7363629563",
        "firstname": "Bert",
        "lastname": "Gooding",
        "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
        "vocation": "FLORIST",
        "address": {
            "number": 13,
            "street": "Upper Bold Road",
            "city": "Redringtonville",
        }
    },
    {
        "person_id": "1029648329",
        "firstname": "Sophie",
        "lastname": "Celements",
        "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
        "vocation": "ENGINEER",
        "address": {
            "number": 5,
            "street": "Innings Close",
            "city": "Basilbridge",
        }
    },
    {
        "person_id": "7363626383",
        "firstname": "Carl",
        "lastname": "Simmons",
        "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
        "vocation": "ENGINEER",
        "address": {
            "number": 187,
            "street": "Hillside Road",
            "city": "Kenningford",
        }
    }
]
person_coll.insert_many(person_data)

Tutorial

Add a match stage for people who are engineers

First, add a $match stage that finds documents in which the value of the vocation field is "ENGINEER":

pipeline.append({
    "$match": {
        "vocation": "ENGINEER"
    }
})
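
To make the effect of this stage concrete, the sketch below reproduces the equality match in plain Python on a few in-memory dictionaries. It only illustrates the semantics for simple string values, since the server performs the real filtering. The helper name match_equals is hypothetical.

```python
def match_equals(documents, field, value):
    """Keep only documents whose field equals value, like a simple $match."""
    return [doc for doc in documents if doc.get(field) == value]


people = [
    {"firstname": "Elise", "vocation": "ENGINEER"},
    {"firstname": "Toni", "vocation": "POLITICIAN"},
    {"firstname": "Carl", "vocation": "ENGINEER"},
]

engineers = match_equals(people, "vocation", "ENGINEER")
print([doc["firstname"] for doc in engineers])  # ['Elise', 'Carl']
```

The comparison is exact, so a lowercase "engineer" would match nothing in this sketch. The same holds for the $match stage under the default collation.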

Add a sort stage to sort from youngest to oldest

Next, add a $sort stage that sorts the documents in descending order by the dateofbirth field to list the youngest people first:

pipeline.append({
    "$sort": {
        "dateofbirth": -1
    }
})
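
A later date of birth means a younger person, which is why a descending sort (-1) lists the youngest people first. The following plain-Python sketch shows the same ordering on three of the sample birth dates. It is an illustration only and does not contact MongoDB.

```python
from datetime import datetime

people = [
    {"firstname": "Elise", "dateofbirth": datetime(1972, 1, 13, 9, 32, 7)},
    {"firstname": "Carl", "dateofbirth": datetime(1998, 12, 26, 13, 13, 55)},
    {"firstname": "Olive", "dateofbirth": datetime(1985, 5, 12, 23, 14, 30)},
]

# reverse=True plays the role of the -1 sort direction.
youngest_first = sorted(people, key=lambda doc: doc["dateofbirth"], reverse=True)
print([doc["firstname"] for doc in youngest_first])  # ['Carl', 'Olive', 'Elise']
```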

Add a limit stage to see only three results

Next, add a $limit stage to the pipeline to output only the first three documents in the results:

pipeline.append({
    "$limit": 3
})
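
A sort followed by a limit amounts to a top-k selection, because only the k documents with the largest sort keys survive. The sketch below illustrates that idea in plain Python with heapq.nlargest from the standard library. It is an analogy for the combined effect of the two stages, not a description of how the server executes them.

```python
import heapq
from datetime import datetime

engineers = [
    {"firstname": "Elise", "dateofbirth": datetime(1972, 1, 13, 9, 32, 7)},
    {"firstname": "Olive", "dateofbirth": datetime(1985, 5, 12, 23, 14, 30)},
    {"firstname": "Sophie", "dateofbirth": datetime(1959, 7, 6, 17, 35, 45)},
    {"firstname": "Carl", "dateofbirth": datetime(1998, 12, 26, 13, 13, 55)},
]

# Keep the three latest birth dates, returned from latest to earliest.
three_youngest = heapq.nlargest(3, engineers, key=lambda doc: doc["dateofbirth"])
print([doc["firstname"] for doc in three_youngest])  # ['Carl', 'Olive', 'Elise']
```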

Add an unset stage to remove unneeded fields

Finally, add an $unset stage. The $unset stage removes unnecessary fields from the result documents:

pipeline.append({
    "$unset": [
        "_id",
        "address"
    ]
})

Tip

Use the $unset stage instead of $project so that you don't need to modify the aggregation pipeline if documents with different fields are added to the collection.
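
The difference is easiest to see on a document that carries an extra field, such as gender in the sample data. The following plain-Python sketch contrasts removing named fields with keeping named fields. The helper names unset_fields and project_fields are hypothetical, and project_fields models only a simple inclusion projection with _id excluded.

```python
def unset_fields(doc, fields):
    """Drop the listed top-level fields and keep everything else."""
    return {key: value for key, value in doc.items() if key not in fields}


def project_fields(doc, fields):
    """Keep only the listed top-level fields and drop everything else."""
    return {key: value for key, value in doc.items() if key in fields}


olive = {
    "_id": 2,
    "firstname": "Olive",
    "gender": "FEMALE",
    "vocation": "ENGINEER",
    "address": {"city": "Tobihbo"},
}

# The exclusion list does not mention gender, yet gender survives.
print(unset_fields(olive, ["_id", "address"]))
# {'firstname': 'Olive', 'gender': 'FEMALE', 'vocation': 'ENGINEER'}

# The inclusion list must name every field to keep, so gender is lost.
print(project_fields(olive, ["firstname", "vocation"]))
# {'firstname': 'Olive', 'vocation': 'ENGINEER'}
```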

Run the aggregation pipeline

Add the following code to your application, in place of the aggregation_result placeholder comment and before the loop that prints the results, to perform the aggregation on the persons collection:

aggregation_result = person_coll.aggregate(pipeline)
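
At this point the pipeline is an ordinary Python list of four dictionaries. As a sketch of an alternative style, the same pipeline can be written as a single list literal instead of a series of append calls. The variable names appended and literal are illustrative only.

```python
# Built step by step, as in the tutorial.
appended = []
appended.append({"$match": {"vocation": "ENGINEER"}})
appended.append({"$sort": {"dateofbirth": -1}})
appended.append({"$limit": 3})
appended.append({"$unset": ["_id", "address"]})

# Written as one literal; stage order is the list order.
literal = [
    {"$match": {"vocation": "ENGINEER"}},
    {"$sort": {"dateofbirth": -1}},
    {"$limit": 3},
    {"$unset": ["_id", "address"]},
]

print(appended == literal)  # True
```

Either list can be passed to person_coll.aggregate(). The order of the stages matters: placing $limit before $sort would keep three arbitrary engineers and only then sort them.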

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py

Interpret results

The aggregated result contains three documents. The documents represent the three youngest people with the vocation of "ENGINEER", ordered from youngest to oldest. The results omit the _id and address fields.

{
  'person_id': '7363626383',
  'firstname': 'Carl',
  'lastname': 'Simmons',
  'dateofbirth': datetime.datetime(1998, 12, 26, 13, 13, 55),
  'vocation': 'ENGINEER'
}
{
  'person_id': '1723338115',
  'firstname': 'Olive',
  'lastname': 'Ranieri',
  'dateofbirth': datetime.datetime(1985, 5, 12, 23, 14, 30),
  'gender': 'FEMALE',
  'vocation': 'ENGINEER'
}
{
  'person_id': '6392529400',
  'firstname': 'Elise',
  'lastname': 'Smith',
  'dateofbirth': datetime.datetime(1972, 1, 13, 9, 32, 7),
  'vocation': 'ENGINEER'
}
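
To check these results without a running deployment, the following sketch interprets the tutorial's pipeline against an abbreviated in-memory copy of the sample data. The function run_pipeline is a hypothetical helper that supports only the four stage forms used here (equality $match, $sort, $limit, and $unset on top-level fields). It is not a substitute for the server's aggregation engine. The _id values are placeholders.

```python
from datetime import datetime


def run_pipeline(documents, pipeline):
    """Apply a tiny subset of aggregation stages to a list of dictionaries."""
    docs = [dict(doc) for doc in documents]
    for stage in pipeline:
        name, spec = next(iter(stage.items()))
        if name == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif name == "$sort":
            # Sort by the last key first so that earlier keys take priority.
            for field, direction in reversed(list(spec.items())):
                docs.sort(key=lambda d: d[field], reverse=(direction == -1))
        elif name == "$limit":
            docs = docs[:spec]
        elif name == "$unset":
            fields = [spec] if isinstance(spec, str) else spec
            docs = [{k: v for k, v in d.items() if k not in fields} for d in docs]
        else:
            raise ValueError(f"Unsupported stage: {name}")
    return docs


people = [
    {"_id": 1, "firstname": "Elise", "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
     "vocation": "ENGINEER", "address": {"city": "Wojzinmoj"}},
    {"_id": 2, "firstname": "Olive", "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
     "gender": "FEMALE", "vocation": "ENGINEER", "address": {"city": "Tobihbo"}},
    {"_id": 3, "firstname": "Toni", "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
     "vocation": "POLITICIAN", "address": {"city": "Upper Abbeywoodington"}},
    {"_id": 4, "firstname": "Bert", "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
     "vocation": "FLORIST", "address": {"city": "Redringtonville"}},
    {"_id": 5, "firstname": "Sophie", "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
     "vocation": "ENGINEER", "address": {"city": "Basilbridge"}},
    {"_id": 6, "firstname": "Carl", "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
     "vocation": "ENGINEER", "address": {"city": "Kenningford"}},
]

pipeline = [
    {"$match": {"vocation": "ENGINEER"}},
    {"$sort": {"dateofbirth": -1}},
    {"$limit": 3},
    {"$unset": ["_id", "address"]},
]

result = run_pipeline(people, pipeline)
print([doc["firstname"] for doc in result])  # ['Carl', 'Olive', 'Elise']
```

Sophie is an engineer but has the earliest birth date of the four, so the $limit stage leaves her out.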

To view the complete code for this tutorial, see the Completed Filtered Subset App on GitHub.
