initial implementation for vectordb #3879

lspinheiro · 2024-10-22T13:30:54Z

Why are these changes needed?

Related issue number

Checks

I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

lspinheiro · 2024-10-22T13:32:13Z

@jackgerrits @ekzhu, I'm wondering whether storage should be an abstraction in the core package, with only the implementation going into autogen-ext. Any thoughts? This is still an early draft; I'm just looking for feedback on the design. I think libraries like LangChain and LlamaIndex have similar abstractions, which are useful for RAG but could have other uses for agents that do few-shot prompting.

thinkall

Thanks @lspinheiro for the PR! I have only minor suggestions; everything else looks good to me.

python/packages/autogen-ext/pyproject.toml

python/uv.lock

ekzhu

Can we rebase the PR branch onto main? Right now there are too many merge conflicts.

lspinheiro · 2024-10-25T22:22:26Z

@ekzhu , small nudge. This should be ready for review now.

thinkall

Thank you @lspinheiro , LGTM!

colombod · 2024-10-29T00:40:06Z

python/packages/autogen-ext/src/autogen_ext/storage/_base.py

+    """Define Document according to autogen 0.4 specifications."""
+
+    id: ItemID
+    content: Optional[str] = None


Why assume a string?

lspinheiro

This is the same representation used in 0.2, and I believe it assumes the content has already been preprocessed into a string. More complex data types generally lead to more complex operations. Would you suggest something different?

colombod

The vector store can have a payload object, which is usually JSON. Unless you know the schema mapping, returning a string is a problem. Maybe we can use a dictionary or some other model?

lspinheiro

I don't think this is meant to capture the full payload, only the main text content of the associated document; the transformation between an autogen document and a vector DB document is handled in each implementation. The pydantic model is defined with arbitrary_types_allowed, which should allow implementations to unpack additional attributes as needed at the document level instead of inside the content. Would that work for the scenario you have in mind?
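To make the exchange above concrete, here is a minimal sketch of how an implementation might map a store-specific JSON payload onto a document whose content is a plain string. It is not code from the PR: a stdlib dataclass stands in for the pydantic model, and the `ItemID` alias, the `metadata` field, the `from_payload` helper, and the `text_key` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union

# Assumption: the PR's ItemID alias is not shown in the excerpt.
ItemID = Union[str, int]


@dataclass
class Document:
    """Simplified stand-in for the PR's pydantic Document model."""

    id: ItemID
    content: Optional[str] = None
    # Hypothetical field: holds everything in the payload that is not the main text.
    metadata: Dict[str, Any] = field(default_factory=dict)


def from_payload(record_id: ItemID, payload: Dict[str, Any], text_key: str = "text") -> Document:
    """Hypothetical adapter a store implementation could use for its own payload schema."""
    text = payload.get(text_key)
    rest = {key: value for key, value in payload.items() if key != text_key}
    return Document(
        id=record_id,
        content=None if text is None else str(text),
        metadata=rest,
    )
```

Under these assumptions the string content stays simple, while the schema knowledge the reviewer asks about lives in the per-store adapter.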

colombod · 2024-10-29T00:41:28Z

python/packages/autogen-ext/src/autogen_ext/storage/_base.py

+    type: str = ""
+    embedding_function: Optional[Callable[..., Any]] = None  # embeddings = embedding_function(sentences)
+
+    async def create_collection(self, collection_name: str, overwrite: bool = False, get_or_create: bool = True) -> Any:


Some vector stores require a schema and other parameters to create a collection; using variable arguments here may help generalise.

lspinheiro

That is a great callout. I checked the old code, and those additional arguments were being passed in through the class constructor, which is not the best design. I'll push a change as soon as the checks complete.

Should be updated now
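A rough sketch of the shape this change could take is shown below. The `InMemoryVectorStore` class and its dictionary-based collections are hypothetical, and the way `overwrite` and `get_or_create` interact is an assumption, not behaviour confirmed by the PR. Only the idea of forwarding store-specific options through `**kwargs` comes from the thread.

```python
import asyncio
from typing import Any, Dict


class InMemoryVectorStore:
    """Hypothetical store used only to illustrate the signature."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, Any]] = {}

    async def create_collection(
        self,
        collection_name: str,
        overwrite: bool = False,
        get_or_create: bool = True,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        exists = collection_name in self.collections
        if exists and not overwrite:
            if get_or_create:
                # Assumed semantics: return the existing collection untouched.
                return self.collections[collection_name]
            raise ValueError(f"Collection {collection_name!r} already exists")
        # Store-specific options (schema, distance metric, ...) arrive via kwargs
        # instead of being fixed in the class constructor.
        self.collections[collection_name] = {
            "name": collection_name,
            "options": dict(kwargs),
        }
        return self.collections[collection_name]
```

A caller could then write `asyncio.run(store.create_collection("docs", schema={"title": "str"}))`, where `schema` is an illustrative option name that a real backend may or may not accept.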

colombod · 2024-10-29T00:42:34Z

python/packages/autogen-ext/src/autogen_ext/storage/_base.py

+        """
+        ...
+
+    def update_docs(self, docs: List[Document], collection_name: Optional[str] = None, **kwargs: Any) -> None:


The most used method is upsert.

lspinheiro

For the protocol, my understanding is that the idea is to keep the design closer to database theory and enforce CRUD-like operations. Implementations can add upsert, but they probably shouldn't be forced to.
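One way to read this position is that upsert can be layered on top of a CRUD-only protocol rather than being part of it. The sketch below assumes that reading. Apart from `update_docs`, every name here (`insert_docs`, `get_docs_by_ids`, `delete_docs`, `DictStore`, `upsert_docs`) is hypothetical, the signatures are simplified (no `collection_name` or `**kwargs`), and a dataclass again stands in for the pydantic model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Document:
    id: str
    content: Optional[str] = None


class VectorDB(Protocol):
    """CRUD-only surface; method names other than update_docs are assumptions."""

    def insert_docs(self, docs: List[Document]) -> None: ...
    def get_docs_by_ids(self, ids: List[str]) -> List[Document]: ...
    def update_docs(self, docs: List[Document]) -> None: ...
    def delete_docs(self, ids: List[str]) -> None: ...


class DictStore:
    """Toy in-memory implementation that satisfies the protocol structurally."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def insert_docs(self, docs: List[Document]) -> None:
        for doc in docs:
            if doc.id in self._docs:
                raise KeyError(f"Document {doc.id!r} already exists")
            self._docs[doc.id] = doc

    def get_docs_by_ids(self, ids: List[str]) -> List[Document]:
        return [self._docs[i] for i in ids if i in self._docs]

    def update_docs(self, docs: List[Document]) -> None:
        for doc in docs:
            if doc.id not in self._docs:
                raise KeyError(f"Document {doc.id!r} does not exist")
            self._docs[doc.id] = doc

    def delete_docs(self, ids: List[str]) -> None:
        for i in ids:
            self._docs.pop(i, None)


def upsert_docs(db: VectorDB, docs: List[Document]) -> None:
    """Optional helper: split the batch into updates and inserts using only CRUD calls."""
    existing = {doc.id for doc in db.get_docs_by_ids([doc.id for doc in docs])}
    db.update_docs([doc for doc in docs if doc.id in existing])
    db.insert_docs([doc for doc in docs if doc.id not in existing])
```

The trade-off in this sketch is an extra read before each write; a backend with a native upsert could expose its own method instead of relying on the helper.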

lspinheiro requested a review from thinkall October 22, 2024 22:02

thinkall reviewed Oct 23, 2024

python/packages/autogen-ext/pyproject.toml (outdated)

python/uv.lock (outdated)

ekzhu requested changes Oct 23, 2024

initial vectordb storage

7da80e9

lspinheiro force-pushed the lpinheiro/feat/add-vectordb-chroma-store branch from 22cb336 to 7da80e9 October 23, 2024 21:19

thinkall and others added 3 commits October 24, 2024 12:20

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

f4f3c3f

fix autogen-core dep

347fb60

typing fixes

4b886b0

lspinheiro force-pushed the lpinheiro/feat/add-vectordb-chroma-store branch 2 times, most recently from 22cb336 to 4b886b0 October 24, 2024 04:32

lpinheiroms added 2 commits October 24, 2024 14:33

fix pyproject version

b9b72d6

update shared functions

ae4d8ae

lspinheiro marked this pull request as ready for review October 24, 2024 06:48

lspinheiro and others added 3 commits October 24, 2024 16:48

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

9fdc30c

update lock file

1d59e51

mypy fixes

299a2eb

lspinheiro requested review from thinkall and ekzhu October 24, 2024 07:20

lpinheiroms and others added 5 commits October 25, 2024 12:41

update tests

2354a49

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

1a147cb

fix tests

ab7fd07

add parallel test setup

5745849

test fix

4391961

merge main

b3f0672

lspinheiro force-pushed the lpinheiro/feat/add-vectordb-chroma-store branch from 8bb13f0 to b3f0672 October 27, 2024 01:49

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

0eba50a

thinkall approved these changes Oct 28, 2024

colombod reviewed Oct 29, 2024

lspinheiro and others added 4 commits October 29, 2024 10:50

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

d427e0b

add create collection kwargs

5116a52

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

fc56024

Merge branch 'main' into lpinheiro/feat/add-vectordb-chroma-store

0b952d4

MohMaz added the awaiting-op-response label (issue or PR has been triaged or responded to and is now awaiting a reply from the original poster) Nov 26, 2024

MohMaz assigned lspinheiro Nov 26, 2024
