Ask Astro is an open-source reference implementation of Andreessen Horowitz's LLM Application Architecture built by Astronomer. It provides an end-to-end example of a Q&A LLM application used to answer questions about Apache Airflow and Astronomer, including:
- Airflow DAGs for data ingestion from various sources to a vector database
- An API containing the business logic of taking a user's prompt and ultimately generating a response
- A Slack bot that exposes Ask Astro in Slack workspaces
- A UI that exposes Ask Astro to the public
- Airflow DAGs that process user feedback and generate AI feedback on questions and answers
These are generally divided into three categories: data retrieval & embedding, prompt orchestration, and feedback loops. The rest of this README contains more in-depth detail on each of the categories in advance of a series of blog posts that will be written about the topics.
If you have any questions, feedback, or want to share similar use cases, reach out to [email protected].
In order to make the responses as factual and accurate as possible, it's generally best practice to use Retrieval Augmented Generation (RAG). However, in order for RAG to be effective, a vector database needs to be populated with the most up-to-date and relevant information.
Ask Astro uses a set of Airflow DAGs that: ingest data from a source via an API or Python library, preprocesses and splits the data into smaller chunks, embeds those chunks, and writes the embeddings to Weaviate. As of today, Ask Astro retrieves data from the following sources:
- Apache Airflow docs
- Astronomer docs
- Astronomer blog
- Astronomer Registry
- Apache Airflow GitHub issues and pull requests
- Apache Airflow Slack's #troubleshooting channel
- StackOverflow's Stack Exchange Data Dump
Generally, each of these sources has a DAG that handles the ingestion flow. We use LangChain's built-in text splitters for processing Markdown, RST, and Python code into smaller chunks to ensure each document is small enough to give accurate results when doing embeddings. We then use a Weaviate provider that we've built (and plan to publish) to both embed and store each document as a vector in Weaviate using OpenAI's embedding model.
In addition to the individual DAGs per source, we have one DAG to do full-database refreshes based on an offline archive of all data. This offline archive allows us to experiment with new vector databases, embedding models, chunking strategies, etc.
Ask Astro uses LangChain's ConversationalRetrievalChain
to generate a response. This chain does the following:
- Use an LLM to generate 3 variations of the original user prompt with different wording. This is to ensure we retrieve as much helpful context as possible from our vector database
- Embed each of the prompts with OpenAI's embeddings model
- Retrieve documents from Weaviate using the embedded vectors from each prompt
- Combine the original user prompt with relevant sources found from the vector database, and make an LLM call to generate an answer
This generally works well. For prompt rewording, we use gpt-3.5-turbo
, which runs very quickly and inexpensively. For the actual user-facing answer generation, we use gpt-4
to ensure high quality answers.
Airflow is critical in improving model performance over time. Feedback on answers come from two places:
- End users can explicitly supply feedback by rating an answer as correct or incorrect
- LLMs rate each question and answer according to it's "helpfulness", "publicness", and "on-topicness"
If a user provides feedback that the answer is correct, and the LLM rates the answer as helpful, public, and on-topic, Ask Astro (1) marks the answer as a good example to be displayed on the Ask Astro homepage for users to derive inspiration from, and (2) writes the question and answer back to Weaviate as a potential source to be used in future prompts. This way, there's a continuous feedback loop to constantly improve the model over time.
A project like Ask Astro is never "complete", and there are always more methods and use cases to explore. Further exploration in Ask Astro (and more generally, in using Airflow for LLM operations) may come from the following areas:
- Data Privacy & Private Context: can the model be supplemented with customer-specific data to provide more accurate, contextual answers?
- Fine Tuning: are there any performance or accuracy improvements that come from fine tuning a model?
- Semantic Caching: if there are two similar questions asked to Ask Astro, can we avoid expensive gpt-4 calls?
- Feedback Loops: what else can we do with information about the accuracy of answers? For example, can we infer source usefulness based on how frequently (or infrequently) a source is used in accurate answers?
- Experimentation: how do we effectively experiment and measure accuracy or performance improvements as a result of experiments?
- LLM Operations and Observability: how do we ensure we have good visibility into what the model's doing, particularly given that it's making multiple API & LLM calls?