MultiPDF is a Streamlit-based application that allows users to upload and interact with multiple PDF documents. The application leverages LangChain, OpenAI, and FAISS to create a conversational agent capable of answering questions based on the content of the uploaded PDFs.
- PDF Upload: Upload multiple PDF documents for processing.
- Text Chunking: Split the text content of the PDFs into manageable chunks.
- Vector Store Creation: Create a vector store using embeddings to facilitate efficient document retrieval.
- Conversational Interface: Engage in a conversation with the chatbot to ask questions about the uploaded documents.
- Document Retrieval: Retrieve and display relevant document content based on the user's questions.
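The text chunking step listed above can be pictured with a small, dependency-free sketch. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for illustration; the application itself may rely on a LangChain text splitter with different settings.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, which tends to help retrieval at the cost of some duplicated text.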
- Python 3.7+
- An OpenAI API key
- A .env file containing your OpenAI API key
- Clone the repository:
    git clone https://github.com/yourusername/multipdf-chatbot.git
    cd multipdf-chatbot
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required dependencies:
    pip install -r requirements.txt
- Create a `.env` file in the root directory and add your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key
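Before launching the app, it can help to confirm that the key is actually readable. The sketch below is a minimal stand-in that parses simple `KEY=VALUE` lines using only the standard library. The helper name `read_env_file` is hypothetical, and the application itself more likely loads the file with a library such as python-dotenv.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # no file: report nothing rather than fail
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


env = read_env_file()
if env.get("OPENAI_API_KEY"):
    print("OPENAI_API_KEY found")
else:
    print("OPENAI_API_KEY is missing from .env")
```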
- Run the Streamlit application:
    streamlit run app.py
- Open the application in your web browser. You should see the MultiPDF interface.
- Use the sidebar to upload your PDF documents.
- After uploading, click on the "Process" button to process the PDFs.
- Enter your questions in the input field to interact with the chatbot and retrieve information from the uploaded PDFs.
- `get_pdfs_as_documents(pdf_docs)`: Loads and splits the content of the uploaded PDFs into pages.
- `get_text_chunks(text)`: Splits the text into chunks for processing.
- `get_vector_store(pdf_docs)`: Creates a vector store from the PDF documents using embeddings.
- `get_conversation_chain(vector_store)`: Sets up the conversational chain using a language model and a vector store retriever.
- `handle_userinput(user_question)`: Handles user input, retrieves relevant documents, and displays the conversation history.
- `main()`: The main function that initializes the Streamlit app, handles file uploads, and sets up the conversation chain.
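The flow these functions describe (chunks go into a vector store, and each question retrieves the closest chunks) can be sketched without any external services. Everything below is an illustrative assumption: `embed` is a toy bag-of-words vectorizer standing in for OpenAI embeddings, and `ToyVectorStore` is a hypothetical in-memory substitute for FAISS, not the project's actual code.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (chunk, vector) pairs and ranks by similarity."""

    def __init__(self, chunks: List[str]) -> None:
        self._entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, question: str, k: int = 2) -> List[Tuple[str, float]]:
        """Return the k chunks most similar to the question, best first."""
        query = embed(question)
        scored = [(chunk, cosine(query, vec)) for chunk, vec in self._entries]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore([
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labor for two years.",
    "Support is available by email on weekdays.",
])
for chunk, score in store.search("How long is the warranty?", k=1):
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline the retrieved chunks would be passed to the language model together with the question and the conversation history. Word counts only match exact terms, whereas learned embeddings can also match paraphrases.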
The application uses HTML templates for styling the chat interface, including user messages, bot responses, and metadata display.
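As a rough illustration of template-based styling, the sketch below fills a message into an HTML snippet and escapes it first so that user text cannot inject markup. The template strings, CSS class names, and the `render_message` helper are all hypothetical; the project's actual templates may look different.

```python
import html
from string import Template

# Hypothetical templates; the real ones may use different markup and class names.
USER_TEMPLATE = Template(
    '<div class="chat-message user"><div class="message">$msg</div></div>'
)
BOT_TEMPLATE = Template(
    '<div class="chat-message bot"><div class="message">$msg</div></div>'
)


def render_message(text: str, is_user: bool) -> str:
    """Return an HTML snippet for one chat message, with the text escaped."""
    template = USER_TEMPLATE if is_user else BOT_TEMPLATE
    return template.substitute(msg=html.escape(text))


print(render_message("What does <section 2> say?", is_user=True))
```

In Streamlit, a snippet like this is usually displayed with `st.markdown(snippet, unsafe_allow_html=True)`, which is why escaping the message text beforehand matters.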
Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.
This project is licensed under the MIT License.