[DOCS] Add BYO vectors ingestion tutorial #115112

Draft
wants to merge 3 commits into
base: main

leemthompo
Contributor

@leemthompo leemthompo commented Oct 18, 2024

  • Adds a new bite-sized tutorial to Search your data > Semantic search
  • This is a toy example for learning the syntax for ingesting a set of existing vectors. It adds enough links to relevant material for follow-up without causing too much cognitive overload.
  • Avoids overloading readers with information about the kNN search side of things, while still making sure users can get where they need to go next if they want to drill down.

Screenshot (while URL preview loads)

BYO-vectors

@leemthompo leemthompo added the >docs General docs changes label Oct 18, 2024
@leemthompo leemthompo self-assigned this Oct 18, 2024
Member

@kderusso kderusso left a comment

Nice work!

<titleabbrev>Bring your own vector embeddings</titleabbrev>
++++

This tutorial demonstrates how to index documents that already have dense vector embeddings into {es}.
Member

Is it worth adding an example for sparse_vector embeddings here as well?
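
For context on the question above, here is a minimal sketch of how a sparse counterpart could differ from the dense example. It only builds the request bodies as Python dicts and sends nothing. The field name `review_sparse_vector` and the token weights are hypothetical and not part of this PR; the body shapes assume the `sparse_vector` field type, which stores token-to-weight maps rather than fixed-length arrays.

```python
from typing import Dict

# Hypothetical field name, chosen for illustration only.
SPARSE_FIELD = "review_sparse_vector"


def sparse_mapping() -> dict:
    """Index-creation body with a single sparse_vector field (no dims needed)."""
    return {"mappings": {"properties": {SPARSE_FIELD: {"type": "sparse_vector"}}}}


def sparse_document(text: str, weights: Dict[str, float]) -> dict:
    """Document body: a sparse embedding is a map of token -> weight."""
    return {"review_text": text, SPARSE_FIELD: dict(weights)}


# Made-up weights; a real sparse model would produce these.
example_doc = sparse_document(
    "This product is lifechanging! I'm telling all my friends about it.",
    {"lifechanging": 1.4, "friends": 0.6},
)
```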

"properties": {
"review_vector": {
"type": "dense_vector",
"dims": 8, <1>
Member

We could technically omit some of these, as dims can be dynamically calculated.
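
A small sketch of the two mapping variants this comment contrasts, written as a Python helper that builds the index-creation body. The helper name is hypothetical. It assumes that when `dims` is omitted, {es} sets it from the first vector indexed into the field; other mapping parameters from the tutorial are not shown in this excerpt and are left out here.

```python
from typing import Optional


def dense_vector_mapping(field: str = "review_vector",
                         dims: Optional[int] = None) -> dict:
    """Build an index-creation body with one dense_vector field.

    dims=None leaves the parameter out so it can be inferred at index time
    (assumption: inferred from the first indexed vector).
    """
    field_mapping = {"type": "dense_vector"}
    if dims is not None:
        # Explicit dims: vectors of any other length should be rejected.
        field_mapping["dims"] = dims
    return {"mappings": {"properties": {field: field_mapping}}}


explicit = dense_vector_mapping(dims=8)   # as in the tutorial excerpt
inferred = dense_vector_mapping()         # dims omitted
```

Keeping `dims` explicit in a tutorial has the advantage that the constraint on the document vectors is visible in the mapping itself.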

PUT /amazon-reviews/_doc/1
{
"review_text": "This product is lifechanging! I'm telling all my friends about it.",
"review_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
Member

Maybe add a note here emphasizing that the size of the review_vector array is 8, matching the dims count?
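
One way to make that constraint concrete is a client-side check before indexing. This is a sketch, not part of the PR: `check_vector_dims` is a hypothetical helper, and the document reuses the values from the excerpt above.

```python
from typing import Sequence

DIMS = 8  # must match "dims" in the index mapping


def check_vector_dims(vector: Sequence[float], dims: int) -> None:
    """Raise early if the embedding length does not match the mapped dims."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")


doc = {
    "review_text": "This product is lifechanging! I'm telling all my friends about it.",
    "review_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}

# Passes silently: the array has 8 elements, matching DIMS.
check_vector_dims(doc["review_vector"], DIMS)
```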

[[bring-your-own-vectors-search-documents]]
=== Step 3: Search documents with embeddings

Now you can query these document vectors using a <<knn-retriever,`knn` retriever>>.
Member

Nice to see retriever examples! 🎉
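
The request body itself is truncated in this excerpt, so the following is only a sketch of what a `knn` retriever search body looks like, built as a Python dict. The `k` and `num_candidates` values are assumptions, not taken from the PR, and the helper name is hypothetical.

```python
from typing import Sequence


def knn_retriever_search(query_vector: Sequence[float],
                         field: str = "review_vector",
                         k: int = 2,
                         num_candidates: int = 5) -> dict:
    """Build a _search body that uses a knn retriever with a raw query vector."""
    if k > num_candidates:
        # num_candidates is the per-shard candidate pool and must cover k.
        raise ValueError("num_candidates must be greater than or equal to k")
    return {
        "retriever": {
            "knn": {
                "field": field,
                "query_vector": list(query_vector),
                "k": k,
                "num_candidates": num_candidates,
            }
        }
    }


body = knn_retriever_search([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
```

The query vector needs the same number of dimensions as the indexed vectors, here 8.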

}
}
----
// TEST[skip:flakeyknnerror]
Member

Why is this test flakey?

}
----
// TEST[skip:flakeyknnerror]
<1> In this toy example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.
Member

Suggested change
- <1> In this toy example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.
+ <1> In this simple example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.


This was a simple example to help you understand the syntax for indexing a set of existing embeddings into {es}.

In this toy example, we're sending a raw vector for the query text.
Member

Suggested change
- In this toy example, we're sending a raw vector for the query text.
+ In this simple example, we're sending a raw vector for the query text.

In a real-world scenario you won't know the query text ahead of time.
You'll need to generate vectors for queries, on the fly, using an embedding model.

For this you'll need to deploy a text embedding model in {es} and use the <<knn-query-top-level-parameters,`query_vector_builder` parameter>>.
Member

It's also legitimate to do this client side and just send the vectors in with the request.
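
The two options discussed here can be sketched side by side. Everything below is illustrative: `toy_embed` is a stand-in that hashes the text into 8 numbers and carries no semantic meaning, the model ID is a placeholder, and the `k` and `num_candidates` values are assumptions. In practice the query embedding should come from the same model that produced the indexed vectors.

```python
import hashlib
from typing import List


def toy_embed(text: str, dims: int = 8) -> List[float]:
    """Stand-in for a real embedding model: deterministic, not semantic."""
    if not 0 < dims <= 32:
        raise ValueError("toy_embed supports 1 to 32 dimensions")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [round(b / 255.0, 4) for b in digest[:dims]]


def knn_client_side(text: str, field: str = "review_vector") -> dict:
    """Option A: embed in the client, send the raw vector with the request."""
    return {"field": field, "query_vector": toy_embed(text),
            "k": 2, "num_candidates": 5}


def knn_server_side(text: str, model_id: str,
                    field: str = "review_vector") -> dict:
    """Option B: let a deployed text embedding model build the vector."""
    return {
        "field": field,
        "query_vector_builder": {
            "text_embedding": {"model_id": model_id, "model_text": text}
        },
        "k": 2,
        "num_candidates": 5,
    }


client = knn_client_side("is this product any good?")
server = knn_server_side("is this product any good?", "my-text-embedding-model")
```

Option A keeps model hosting outside the cluster; option B keeps the client thin but requires a deployed model.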

Labels
>docs General docs changes v9.0.0
3 participants