This repository contains the example DAG described in the Orchestrate MongoDB operations with Apache Airflow tutorial. It ingests descriptions of video games from a local text file, creates vector embeddings using OpenAI, and then loads the embedded data into a MongoDB collection. Additionally, the DAG sets up the collection and search index if they do not already exist. The DAG also contains a task to query the data for specific concepts.
This repository contains:
- `query_game_vectors`: A DAG showing how to ingest text data as vector embeddings into MongoDB and query the data.
- `test_conn`: A DAG to test the connection to MongoDB.
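The embed-then-query idea behind `query_game_vectors` can be illustrated with a minimal, self-contained sketch. This is not the code of the DAG: the sample games are invented, a word-count vector stands in for the OpenAI embeddings, and an in-memory list stands in for the MongoDB collection and its search index. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; the real DAG reads descriptions from a local text file.
GAMES = {
    "Star Harvest": "farm crops on a distant planet and trade with space merchants",
    "Dungeon Ledger": "explore a dark dungeon and fight monsters with sword and magic",
    "Turbo Circuit": "race fast cars on neon city tracks",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocabulary):
    """Toy stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_collection(games):
    """Embed every description and keep the documents in a list (stand-in for MongoDB)."""
    vocabulary = sorted({w for text in games.values() for w in tokenize(text)})
    documents = [
        {"title": title, "description": text, "vector": embed(text, vocabulary)}
        for title, text in games.items()
    ]
    return vocabulary, documents


def query(concept, vocabulary, documents):
    """Return the title of the document whose vector is most similar to the concept."""
    concept_vector = embed(concept, vocabulary)
    best = max(documents, key=lambda doc: cosine_similarity(concept_vector, doc["vector"]))
    return best["title"]


vocabulary, collection = build_collection(GAMES)
print(query("dungeon monsters", vocabulary, collection))
```

A real embedding model captures meaning rather than exact word overlap, so the sketch only shows the shape of the flow: embed the documents once, embed the query concept the same way, and rank by vector similarity.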
- Fork and clone this repository.
- Make sure you have the Astro CLI installed and that Docker is running.
- Copy the `.env.example` file to a new file called `.env` and replace the `<>` placeholders with your own values and credentials. You will need a running MongoDB cluster as well as an OpenAI API key of at least tier 1.
- Run `astro dev start` to start the Airflow instance. The webserver with the Airflow UI will be available at `localhost:8080`. Log in with the credentials `admin:admin`.
- Run the `test_conn` DAG to test your connection to MongoDB.
- Run the `query_game_vectors` DAG to ingest text data as vector embeddings into MongoDB and query the data. When running the DAG, you can provide concepts to query the data for.
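Before starting Airflow, it can help to confirm that no `<>` placeholder was left unfilled in `.env`. The following is a minimal sketch of such a check, not part of this repository; the variable names in the sample are purely illustrative and do not reflect the actual contents of `.env.example`.

```python
import re

# Matches any value that still contains an unfilled <...> placeholder.
PLACEHOLDER = re.compile(r"<[^<>]*>")


def unfilled_keys(env_text):
    """Return the names of variables whose value still contains a <...> placeholder."""
    missing = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if PLACEHOLDER.search(value):
            missing.append(key.strip())
    return missing


# Hypothetical file contents; the variable names are illustrative only.
sample = """
# example
EXAMPLE_API_KEY=<your-api-key>
EXAMPLE_DB_URI=mongodb+srv://user:<password>@cluster.example.net
EXAMPLE_FILLED=abc123
"""
print(unfilled_keys(sample))
```

To check a real file, pass its contents instead of the sample, for example `unfilled_keys(open(".env").read())`. An empty list means no placeholder remains.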