
Adding chat history to RAG app and refactor to better utilize LangChain #648

Open · wants to merge 61 commits into main
Conversation

alpha-amundson (Collaborator)

See commit log for full description. tl;dr: added chat history to rag-frontend app.

Also introduced a basic session history mechanism in the browser to keep track of and retrieve chat history from Cloud SQL.

main.py - removed the old LangChain code and context-retrieval logic and replaced it with the new chain from rag_chain.py. Introduced a browser session with a 30-minute TTL; the session ID is stored in the session cookie and then used to retrieve chat history. Chat history is cleared when the timeout is reached.
cloud_sql.py - now includes a method to create a PostgresEngine for storing and retrieving history, plus a CustomVectorStore to perform the query embedding and vector search. Old code paths that were no longer needed have been removed.
rag_chain.py - contains a helper method, create_chain, to create, update and delete the end-to-end RAG chain with history (see the sketch below).
various tf files - increased the max input and total token limits on HF TGI for Mistral; threaded through some parameters needed to instantiate the PostgresEngine.
requirements.txt - added some dependencies needed for LangChain.
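
For orientation, here is a minimal sketch of how a history-aware chain along these lines can be wired together with LangChain. This is not the PR's actual rag_chain.py: it uses an in-memory history and a placeholder LLM where the PR uses a Cloud SQL (PostgresEngine)-backed history and the HF TGI model, and all names are illustrative.

```python
# Minimal sketch (not the PR's code): a RAG-style chain with per-session chat history.
# Assumes a recent langchain-core; the PR stores history in Cloud SQL instead of memory.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using this context:\n{context}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])

_sessions: dict[str, InMemoryChatMessageHistory] = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    # The session ID would come from the browser session cookie described above.
    if session_id not in _sessions:
        _sessions[session_id] = InMemoryChatMessageHistory()
    return _sessions[session_id]

def create_chain(llm):
    # `llm` would be the TGI-backed chat model; any LangChain chat model works here.
    return RunnableWithMessageHistory(
        prompt | llm,
        get_history,
        input_messages_key="question",
        history_messages_key="history",
    )

# Usage (illustrative):
# chain = create_chain(llm)
# chain.invoke(
#     {"question": "What is Kubernetes?", "context": retrieved_context},
#     config={"configurable": {"session_id": session_id_from_cookie}},
# )
```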
alpha-amundson requested a review from imreddy13 on May 3, 2024 23:15
alpha-amundson changed the title from "Also introduced a basic session history mechanism in the browser to k…" to "Adding chat history to RAG app and refactor to better utilize LangChain" on May 3, 2024
@imreddy13 (Collaborator)

/gcbrun

@alpha-amundson (Collaborator, Author)

/gcbrun

1 similar comment
@alpha-amundson (Collaborator, Author)

/gcbrun

* Working on improvements for the rag application:
    - Working on missing TODOs
    - Fixing issue with credentials
    - Refactoring vector_storages so different vector storages can be added
      TODO: Vector Storage factory
    - Unit tests will be added in a future PR

* Updating changes with db

* Refactoring the app so it can be executed using gunicorn

* Refactoring the code as a Flask application package

* Fixing bugs
- Reviewing the issue with IPTypes; the current fix is to check whether this is a development environment so that a public Cloud SQL instance can be used (sketched below)
- Fixing issue with the Flask App Factory
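
For reference, a rough sketch of the application-factory layout and the development/public-IP check mentioned in the commits above, assuming the Cloud SQL Python Connector; the environment-variable name, instance string, and credentials are placeholders, not the PR's actual code.

```python
# Sketch only: Flask app factory that gunicorn can load, with the dev/public-IP check.
import os

from flask import Flask
from google.cloud.sql.connector import Connector, IPTypes


def create_app() -> Flask:
    app = Flask(__name__)

    # Hypothetical env var: in development, connect over the public IP of the
    # Cloud SQL instance; otherwise use the private IP.
    ip_type = IPTypes.PUBLIC if os.getenv("ENVIRONMENT") == "development" else IPTypes.PRIVATE
    connector = Connector()

    def connect_db():
        # Instance connection name, driver, and credentials are placeholders.
        return connector.connect(
            "project:region:instance",
            "pg8000",
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            db=os.environ["DB_NAME"],
            ip_type=ip_type,
        )

    app.config["DB_CONNECT"] = connect_db
    # ... register blueprints / routes here ...
    return app

# With a factory like this, gunicorn can serve the app as:
#   gunicorn "frontend:create_app()"
```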
@german-grandas (Collaborator)

/gcbrun

* Working on Custom HuggingFace interface
     - Adding a custom chat model to send requests to the HuggingFace TGI API (see the sketch below)
     - Applying formatting to the code.
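
For context, calling a HuggingFace TGI endpoint from a custom chat wrapper usually boils down to a POST against its /generate route. A minimal sketch with a placeholder service URL and parameters (not the PR's implementation):

```python
# Sketch only: send a prompt to a HuggingFace TGI /generate endpoint.
import requests

TGI_URL = "http://tgi-service:8080"  # placeholder in-cluster service address

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    resp = requests.post(
        f"{TGI_URL}/generate",
        json={
            "inputs": prompt,
            "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.7},
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]
```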
* Improving the CloudSQL vector_storage (see the sketch below)
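
A rough sketch of the embed-then-search query such a Cloud SQL (pgvector) vector storage typically runs; the table, column names, and embedding function are illustrative, not the PR's CustomVectorStore:

```python
# Sketch only: embed the query and run a pgvector cosine-distance search.
# Assumes a table like documents(id, text, embedding vector(768)) with the
# pgvector extension installed; all names here are illustrative.
def similarity_search(conn, embed_fn, query: str, k: int = 4) -> list[str]:
    query_embedding = embed_fn(query)  # e.g. a sentence-transformers model
    with conn.cursor() as cur:
        cur.execute(
            "SELECT text FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```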
@german-grandas (Collaborator)

/gcbrun

@german-grandas (Collaborator)

/gcbrun

@german-grandas (Collaborator)

/gcbrun

@german-grandas (Collaborator)

Some prompt/answer examples using meta-llama/Llama-2-7b-hf:

[screenshot: prompt_with_meta_llama_7b]

Some prompt/answer examples using meta-llama/Llama-2-7b-chat-hf:

[screenshot: prompt_with_meta_llama_7b_chat]

@german-grandas (Collaborator)

/gcbrun

@german-grandas (Collaborator)

Examples of answers.

Previous RAG without Chat History:

[screenshots: t1_main_img, t2_main_img]

RAG with Chat History:

[screenshots: t3_rag_img, t3_rag_img_2]

@gongmax (Collaborator) commented Sep 19, 2024

@german-grandas, in the snapshot of "Previous RAG without Chat History", I saw some errors thrown. I tried with the latest code on the main branch (i.e. the previous RAG without chat history) and didn't see any errors. Do you know what was going wrong? Below is my snapshot:

[screenshot]

@german-grandas (Collaborator)

@gongmax In the example I'm deploying with the image "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:ec0e7b1ce6d0f9570957dd7fb3dcf0a16259cba915570846b356a17d6e377c59", the same image used on main in applications/rag/frontend/main.tf.

Which image did you use in the test you mentioned?

@gongmax (Collaborator) commented Sep 20, 2024

@gongmax In the example I'm deploying with the image "us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:ec0e7b1ce6d0f9570957dd7fb3dcf0a16259cba915570846b356a17d6e377c59", the same image used on main in applications/rag/frontend/main.tf.

Which image did you use in the test you mentioned?

I didn't make any changes; I just pulled the main branch and deployed the whole application. The image is the same as the one you mentioned above: us-central1-docker.pkg.dev/ai-on-gke/rag-on-gke/frontend@sha256:ec0e7b1ce6d0f9570957dd7fb3dcf0a16259cba915570846b356a17d6e377c59

@german-grandas (Collaborator)

It's odd; reviewing the logs, I see a warning from the database, and that's what the frontend is showing as part of the prompt response.

Not sure how to track this down. Any ideas?

@german-grandas (Collaborator)

/gcbrun

@gongmax (Collaborator) commented Oct 1, 2024

/gcbrun

1 similar comment
@gongmax (Collaborator) commented Oct 1, 2024

/gcbrun

@german-grandas (Collaborator)

/gcbrun

1 similar comment
@gongmax (Collaborator) commented Oct 3, 2024

/gcbrun

@german-grandas (Collaborator)

/gcbrun

@german-grandas (Collaborator)

Comments about improving the quality of the answer generation:

Currently, the model Mistral-7B-Instruct-v0.1 is showing a lack of quality when answering a given question. A first-pass analysis suggests the model is having trouble handling the long prompt that results from including a longer context plus the chat history. The following is an example of the current prompt for the question "what's kubernetes?":

System: 
### [INST]
Instruction: Always assist with care, respect, and truth. Respond with utmost utility yet securely.
Avoid harmful, unethical, prejudiced, or negative content.
Ensure replies promote fairness and positivity.
Here is context to help:


Kubernetes offers features to help you run highly available applications even when you
introduce frequent voluntary disruptions.
As an application owner, you can create a PodDisruptionBudget (PDB) for each application. A
PDB limits the number of Pods of a replicated application that are down simultaneously from
voluntary disruptions. For example, a quorum-based application would like to ensure that the
number of replicas running is never brought below the number needed for a quorum. A web
front end might want to ensure that the number of replicas serving load never falls below a
certain percentage of the total.
Cluster managers and hosting providers should use tools which respect PodDisruptionBudgets
by calling the Eviction API  instead of directly deleting pods or deployments.
For example, the kubectl drain  subcommand lets you mark a node as going out of service.
When you run kubectl drain , the tool tries to evict all of the Pods on the Node you're taking out

topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:

kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller#what-is-a-replicationcontroller
availableReplicas (int32)
The number of available replicas (ready for at least minReadySeconds) for this replication controller.
[remaining stray bullet characters from the PDF-extracted chunk omitted]


[/INST]
Here's the previous messages so you can have context about what the user have ask you:

Human: What's kubernetes?

This is the answer for the previous prompt:

[screenshot of the model's answer]

In a further exercise, a more robust model, gemini-1.0-pro-002 from Vertex AI, was used, showing a large improvement in handling the given context and in answering the question:

[Screenshot 2024-10-21 113420]

It might be worth exploring a migration to Vertex AI instead of continuing to use open-source models from Hugging Face like Mistral.

@gongmax (Collaborator) commented Oct 21, 2024

Can you adjust the length of the chat history included in the context and see how it can impact the response?
Besides, any insight why it always includes 'AI' at the beginning of the response? Anything to do with how you construct the prompt and instruction?
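
For reference, limiting how much history goes into the prompt can be as simple as slicing the stored messages before they are formatted; a tiny sketch (illustrative, not the PR's code):

```python
# Sketch only: keep just the last N messages of the chat history so the
# retrieved context, rather than the history, dominates the prompt.
def trim_history(messages: list, max_messages: int = 6) -> list:
    return messages[-max_messages:]
```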

@german-grandas (Collaborator)

Can you adjust the length of the chat history included in the context and see how it can impact the response? Besides, any insight why it always includes 'AI' at the beginning of the response? Anything to do with how you construct the prompt and instruction?

There isn't any improvement with the changes you suggested; I maintain the hypothesis that the LLM is not able to handle the context or respond to the given prompt.

Regarding the LLM including 'AI' at the beginning of the response, it looks like that is just how the LLM generates the answer. I included instructions in the prompt not to do that, and the LLM keeps generating content as if it were a conversational agent rather than a plain text-generation LLM.

@german-grandas (Collaborator)

/gcbrun

@gongmax (Collaborator) commented Oct 29, 2024

Quote from chat with @german-grandas to keep track: "I made an update into the inference service so the LLM can support the generation of a longer answer. that fixed the issue with the short generation of the rag system when you submitted a question."
