One-to-One Join
On this page
Introduction
In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app.
This aggregation performs a one-to-one join. A one-to-one join occurs when a document in one collection has a field value that matches a single document in another collection that has the same field value. The aggregation matches these documents on the field value and combines information from both sources into one result.
Tip
A one-to-one join does not require the documents to have a one-to-one relationship. To learn more about this data relationship, see the Wikipedia entry about One-to-one (data model).
Aggregation Task Summary
This tutorial demonstrates how to combine data from a collection that describes product information with another collection that describes customer orders. The results show a list of all orders placed in 2020 that includes the product details associated with each order.
This example uses two collections:
orders
: contains documents describing individual orders for products in a shopproducts
: contains documents describing the products that a shop sells
An order can only contain one product, so the aggregation uses a
one-to-one join to match an order document to the document for the
product. The collections are joined by a field called product_id
that exists in documents in both collections.
Before You Get Started
Before you begin following an aggregation tutorial, you must set up a new Python app. You can use this app to connect to a MongoDB deployment, insert sample data into MongoDB, and run the aggregation pipeline in each tutorial.
Tip
To learn how to install the driver and connect to MongoDB, see Get Started with PyMongo
Once you install the driver, create a file called
agg_tutorial.py
. Paste the following code in this file to create an
app template for the aggregation tutorials:
from pymongo import MongoClient # Replace the placeholder with your connection string. uri = "<connection string>" client = MongoClient(uri) try: agg_db = client["agg_tutorials_db"] # Get a reference to relevant collections. # ... some_coll = # ... another_coll = # Delete any existing documents in collections. # ... some_coll.delete_many({}) # Insert sample data into the collection or collections. # ... some_data = [...] # ... some_coll.insert_many(some_data) # Create an empty pipeline array. pipeline = [] # Add code to create pipeline stages. # ... pipeline.append({...}) # Run the aggregation. # ... aggregation_result = ... # Print the aggregation results. for document in aggregation_result: print(document) finally: client.close()
Important
In the preceding code, read the code comments to find the sections of the code that you must modify for the tutorial you are following.
If you attempt to run the code without making any changes, you will encounter a connection error.
For every tutorial, you must replace the connection string placeholder with your deployment's connection string. To learn how to locate your deployment's connection string, see Create a Connection String.
For example, if your connection string is
"mongodb+srv://mongodb-example:27017"
, your connection string assignment resembles
the following:
uri = "mongodb+srv://mongodb-example:27017";
To run the completed file after you modify the template for a tutorial, run the following command in your shell:
python3 agg_tutorial.py
After you set up the app, access the orders
and products
collections by adding the following code to the application:
orders_coll = agg_db["orders"] products_coll = agg_db["products"]
Delete any existing data and insert sample data into
the orders
collection as shown in the following code:
orders_coll.delete_many({}) order_data = [ { "customer_id": "elise_smith@myemail.com", "orderdate": datetime(2020, 5, 30, 8, 35, 52), "product_id": "a1b2c3d4", "value": 431.43 }, { "customer_id": "tj@wheresmyemail.com", "orderdate": datetime(2019, 5, 28, 19, 13, 32), "product_id": "z9y8x7w6", "value": 5.01 }, { "customer_id": "oranieri@warmmail.com", "orderdate": datetime(2020, 1, 1, 8, 25, 37), "product_id": "ff11gg22hh33", "value": 63.13 }, { "customer_id": "jjones@tepidmail.com", "orderdate": datetime(2020, 12, 26, 8, 55, 46), "product_id": "a1b2c3d4", "value": 429.65 } ] orders_coll.insert_many(order_data)
Delete any existing data and insert sample data into
the products
collection as shown in the following code:
products_coll.delete_many({}) product_data = [ { "id": "a1b2c3d4", "name": "Asus Laptop", "category": "ELECTRONICS", "description": "Good value laptop for students" }, { "id": "z9y8x7w6", "name": "The Day Of The Triffids", "category": "BOOKS", "description": "Classic post-apocalyptic novel" }, { "id": "ff11gg22hh33", "name": "Morphy Richardds Food Mixer", "category": "KITCHENWARE", "description": "Luxury mixer turning good cakes into great" }, { "id": "pqr678st", "name": "Karcher Hose Set", "category": "GARDEN", "description": "Hose + nosels + winder for tidy storage" } ] products_coll.insert_many(product_data)
Tutorial
Add a match stage for orders in 2020
Add a $match stage that matches orders placed in 2020:
pipeline.append({ "$match": { "orderdate": { "$gte": datetime(2020, 1, 1, 0, 0, 0), "$lt": datetime(2021, 1, 1, 0, 0, 0) } } })
Add a lookup stage to link the collections
Next, add a $lookup stage. The
$lookup
stage joins the product_id
field in the orders
collection to the id
field in the products
collection:
pipeline.append({ "$lookup": { "from": "products", "localField": "product_id", "foreignField": "id", "as": "product_mapping" } })
Add set stages to create new document fields
Next, add two $set stages to the pipeline.
The first $set
stage sets the product_mapping
field
to the first element in the product_mapping
object
created in the previous $lookup
stage.
The second $set
stage creates two new fields, product_name
and product_category
, from the values in the
product_mapping
object field:
pipeline.extend([ { "$set": { "product_mapping": {"$first": "$product_mapping"} } }, { "$set": { "product_name": "$product_mapping.name", "product_category": "$product_mapping.category" } } ])
Tip
Because this is a one-to-one join, the $lookup
stage
adds only one array element to the input document. The pipeline
uses the $first
operator to retrieve the data from this element.
Add an unset stage to remove unneeded fields
Finally, add an $unset stage. The
$unset
stage removes unnecessary fields from the document:
pipeline.append({"$unset": ["_id", "product_id", "product_mapping"]})
Interpret results
The aggregated result contains three documents. The documents
represent customer orders that occurred in 2020, with the
product_name
and product_category
of the ordered product:
{ 'customer_id': 'elise_smith@myemail.com', 'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52), 'value': 431.43, 'product_name': 'Asus Laptop', 'product_category': 'ELECTRONICS' } { 'customer_id': 'oranieri@warmmail.com', 'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37), 'value': 63.13, 'product_name': 'Morphy Richardds Food Mixer', 'product_category': 'KITCHENWARE' } { 'customer_id': 'jjones@tepidmail.com', 'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46), 'value': 429.65, 'product_name': 'Asus Laptop', 'product_category': 'ELECTRONICS' }
The result consists of documents that contain fields from
documents in the orders
collection and the products
collection, joined by matching the product_id
field present in
each original document.
To view the complete code for this tutorial, see the Completed One-to-one Join App on GitHub.