Let document retrieval be more flexible #206

lgabs · 2024-06-07T20:25:05Z

Currently, the LCEL retriever in dialog-lib forces each document's page content to be the question and the content joined together:

https://github.com/talkdai/dialog-lib/blob/4e8de796be1a21c877eb393066a78235e6a193ac/dialog_lib/embeddings/retrievers.py#L31-L39

However, the user already defines which fields should be embedded in `load_csv.py`, so this retriever should respect that choice with a simple return like

        return [
            Document(
                page_content=content.content,
                metadata={
                    "title": content.question,
                    "category": content.category,
                    "subcategory": content.subcategory,
                    "dataset": content.dataset,
                    "link": content.link,
                },
            )
            for content in relevant_contents
        ]
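To make the proposed return easier to try in isolation, here is a minimal sketch. It assumes a small `Document` dataclass as a stand-in for langchain's `Document`, and a hypothetical `Row` type whose fields mirror the attributes used above; neither is the actual dialog-lib code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain's Document (assumption, so the sketch has no dependencies)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Row:
    """Hypothetical retrieved row; field names follow the snippet above."""
    question: str
    content: str
    category: str = ""
    subcategory: str = ""
    dataset: str = ""
    link: str = ""


def to_documents(relevant_contents):
    # page_content carries only the stored content; the question moves to metadata.
    return [
        Document(
            page_content=c.content,
            metadata={
                "title": c.question,
                "category": c.category,
                "subcategory": c.subcategory,
                "dataset": c.dataset,
                "link": c.link,
            },
        )
        for c in relevant_contents
    ]


rows = [
    Row(
        question="How do I reset my password?",
        content="Go to Settings > Security.",
        category="account",
    )
]
docs = to_documents(rows)
```

With this shape, the question is still available to any chain through `metadata["title"]`, but it is no longer forced into the text that reaches the prompt.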

Moreover, langchain's CSVLoader by default already embeds each field name prefixed to its field value, e.g. `category: cat1\nsubcategory: subcat1\ncontent: content1` (see this test), so it achieves the same idea as the current implementation, but in a generic way.
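As a rough illustration of that `field: value` format, the sketch below imitates it with the standard library only. It is not CSVLoader's own code; the helper name `rows_as_page_content` and its `content_columns` argument are assumptions made for this example.

```python
import csv
import io


def rows_as_page_content(csv_text, content_columns=()):
    """Render each CSV row as 'field: value' lines, one string per row.

    content_columns is a hypothetical filter: when given, only those
    columns are included in the rendered text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pages = []
    for row in reader:
        lines = [
            f"{key.strip()}: {value.strip()}"
            for key, value in row.items()
            if not content_columns or key in content_columns
        ]
        pages.append("\n".join(lines))
    return pages


sample = "category,subcategory,content\ncat1,subcat1,content1\n"
pages = rows_as_page_content(sample)
only_content = rows_as_page_content(sample, content_columns=("content",))
```

Here `pages[0]` is the three-line string from the example above, while `only_content[0]` keeps just the `content` line, which is the kind of choice the user makes when loading the CSV.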

This proposal works as-is with the default project chains, while giving flexibility to users who want to implement their own prompt design. For example, the project's default RAG chain has this format_docs:

dialog/src/dialog/llm/agents/lcel.py, lines 60 to 61 in fbb13af:

    def format_docs(docs):
        return "\n\n".join([d.page_content for d in docs])

and users can customize this as they wish to achieve their ideas. Later, when we implement metadata saving to the vectorstore, we could even return other metadata dynamically as well.
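For instance, a user who still wants the question in the prompt could rebuild it from metadata. The following is only a sketch: `format_docs_with_title` is a hypothetical name, and `Document` is again a dataclass stand-in for langchain's class rather than the real import.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for langchain's Document (assumption, so the sketch has no dependencies)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_title(docs):
    # Prefix each document with its title when one is present in metadata;
    # otherwise fall back to the plain page content.
    blocks = []
    for d in docs:
        title = d.metadata.get("title")
        blocks.append(f"{title}\n{d.page_content}" if title else d.page_content)
    return "\n\n".join(blocks)


docs = [
    Document("Go to Settings > Security.", {"title": "How do I reset my password?"}),
    Document("Plain content."),
]
formatted = format_docs_with_title(docs)
```

The default `format_docs` keeps working unchanged, since it only reads `page_content`.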


vmesel · 2024-06-13T20:54:28Z

@lgabs want to handle this change?

lgabs · 2024-06-13T20:55:39Z

Sure, I can do it

avelino added enhancement New feature or request embedding labels Jun 13, 2024

vmesel assigned lgabs Jun 14, 2024

lgabs mentioned this issue Jun 20, 2024

[WIP] make retriever page content just the content talkdai/dialog-lib#7

Closed
