[DOCS] Add BYO vectors ingestion tutorial #115112

leemthompo · 2024-10-18T14:26:50Z

Adds a new bite-sized tutorial to Search your data > Semantic search.
This is a toy example for learning the syntax for ingesting a set of existing vectors. It tries to include enough links to relevant material for follow-up without causing cognitive overload.
I don't want to overload readers with information about the kNN search side of things, while still making sure they can get where they need to go next if they want to drill down.

Screenshot (while URL preview loads)

github-actions · 2024-10-18T14:27:04Z

Documentation preview:

✨ Changed pages

kderusso

Nice work!

kderusso · 2024-10-18T17:50:27Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+<titleabbrev>Bring your own vector embeddings</titleabbrev>
++++
+
+This tutorial demonstrates how to index documents that already have dense vector embeddings into {es}.


Is it worth adding an example for sparse_vector embeddings here as well?
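
For reference, a minimal sketch of what a `sparse_vector` variant might look like, written as Python dicts that mirror the request bodies. The field name `review_sparse_vector` and the token weights are made up for illustration and are not part of this PR. The sketch assumes the `sparse_vector` field type accepts a map of token to weight.

```python
import json

# Hypothetical mapping: "review_sparse_vector" is an illustrative field name.
sparse_mapping = {
    "mappings": {
        "properties": {
            "review_text": {"type": "text"},
            "review_sparse_vector": {"type": "sparse_vector"},
        }
    }
}

# A sparse embedding is a token -> weight map rather than a fixed-length array.
# These weights are invented; a real model would produce them.
sparse_doc = {
    "review_text": "This product is lifechanging! I'm telling all my friends about it.",
    "review_sparse_vector": {"lifechanging": 1.8, "product": 0.9, "friends": 0.4},
}

print(json.dumps(sparse_mapping, indent=2))
print(json.dumps(sparse_doc, indent=2))
```

Unlike the dense case, there is no `dims` value to keep in sync with the document.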

kderusso · 2024-10-18T17:52:51Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+    "properties": {
+      "review_vector": {
+        "type": "dense_vector",
+        "dims": 8, <1>


We could technically omit some of these, as dims can be dynamically calculated.
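
A rough sketch of the difference, using a hypothetical Python helper `dense_vector_mapping` that builds the mapping body. It assumes that when `dims` is left out, {es} infers it from the first vector indexed into the field.

```python
import json

def dense_vector_mapping(field, dims=None):
    """Build an index mapping body with a dense_vector field.

    When dims is None the key is left out entirely, on the assumption
    that the server infers it from the first indexed vector.
    """
    field_def = {"type": "dense_vector"}
    if dims is not None:
        field_def["dims"] = dims
    return {"mappings": {"properties": {field: field_def}}}

explicit = dense_vector_mapping("review_vector", dims=8)  # as in the tutorial
implicit = dense_vector_mapping("review_vector")          # dims omitted

print(json.dumps(explicit))
print(json.dumps(implicit))
```

Keeping `dims` explicit in a tutorial may still be worthwhile, since it makes the expected vector length visible to the reader.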

kderusso · 2024-10-18T17:53:41Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+PUT /amazon-reviews/_doc/1
+{
+  "review_text": "This product is lifechanging! I'm telling all my friends about it.",
+  "review_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]


Maybe add a note here emphasizing that the size of the review_vector array is 8 matching the dims count?
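
To make the point concrete, here is a small client-side check one could run before indexing. The helper `check_dims` is hypothetical and only illustrates the constraint that the array length must equal the mapped `dims`.

```python
DIMS = 8  # must match "dims" in the index mapping

doc = {
    "review_text": "This product is lifechanging! I'm telling all my friends about it.",
    "review_vector": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
}

def check_dims(vector, dims):
    """Raise ValueError if the vector length differs from the mapped dims."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    return True

check_dims(doc["review_vector"], DIMS)
```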

kderusso · 2024-10-18T17:54:22Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+[[bring-your-own-vectors-search-documents]]
+=== Step 3: Search documents with embeddings
+
+Now you can query these document vectors using a <<knn-retriever,`knn` retriever>>.


Nice to see retriever examples! 🎉
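
For readers of this thread who have not seen the retriever syntax, a sketch of the request body as a Python dict. The `k` and `num_candidates` values are assumptions chosen for illustration, not the values used in the tutorial.

```python
import json

def knn_retriever_request(field, query_vector, k, num_candidates):
    """Build a _search request body that uses the knn retriever."""
    return {
        "retriever": {
            "knn": {
                "field": field,
                "query_vector": query_vector,
                "k": k,
                "num_candidates": num_candidates,
            }
        }
    }

body = knn_retriever_request(
    field="review_vector",
    query_vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    k=2,               # assumed value
    num_candidates=5,  # assumed value
)
print(json.dumps(body, indent=2))
```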

kderusso · 2024-10-18T17:55:02Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+  }
+}
+----
+// TEST[skip:flakeyknnerror]


Why is this test flakey?

kderusso · 2024-10-18T17:55:41Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+}
+----
+// TEST[skip:flakeyknnerror]
+<1> In this toy example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.


Suggested change

-<1> In this toy example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.
+<1> In this simple example, we're sending a raw vector as the query text. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.

kderusso · 2024-10-18T17:56:05Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+
+This was a simple example to help you understand the syntax for indexing a set of existing embeddings into {es}.
+
+In this toy example, we're sending a raw vector for the query text.


Suggested change

-In this toy example, we're sending a raw vector for the query text.
+In this simple example, we're sending a raw vector for the query text.

kderusso · 2024-10-18T17:56:52Z

docs/reference/search/search-your-data/ingest-vectors.asciidoc

+In a real-world scenario you won't know the query text ahead of time.
+You'll need to generate vectors for queries, on the fly, using an embedding model.
+
+For this you'll need to deploy a text embedding model in {es} and use the <<knn-query-top-level-parameters,`query_vector_builder` parameter>>.


It's also legitimate to do this client side and just send the vectors in with the request.
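
A sketch contrasting the two options, again as Python dicts. `embed_client_side` is a placeholder standing in for a real embedding model call, and `my-text-embedding-model` is a hypothetical model ID. The `k` and `num_candidates` values are assumptions.

```python
def embed_client_side(text):
    """Placeholder for a real embedding model; returns a fixed 8-dim vector."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

def knn_request(field, k, num_candidates, *, query_vector=None, builder=None):
    """Build a knn retriever body from either a raw vector or a builder."""
    if (query_vector is None) == (builder is None):
        raise ValueError("provide exactly one of query_vector or builder")
    knn = {"field": field, "k": k, "num_candidates": num_candidates}
    if query_vector is not None:
        knn["query_vector"] = query_vector          # embedded by the client
    else:
        knn["query_vector_builder"] = builder       # embedded by a deployed model
    return {"retriever": {"knn": knn}}

# Option 1: embed on the client and send the vector with the request.
client_side = knn_request(
    "review_vector", 2, 5,
    query_vector=embed_client_side("great product"),
)

# Option 2: let a deployed text embedding model build the vector.
server_side = knn_request(
    "review_vector", 2, 5,
    builder={
        "text_embedding": {
            "model_id": "my-text-embedding-model",  # hypothetical model ID
            "model_text": "great product",
        }
    },
)
```

Either way, the query vector has to come from the same model that produced the indexed embeddings, otherwise the similarity scores are not meaningful.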

leemthompo added 2 commits October 18, 2024 16:23

[DOCS] Add BYO vectors ingestion tutorial

887e3c3

Del whitespace

a1ec5a2

leemthompo added the >docs General docs changes label Oct 18, 2024

leemthompo self-assigned this Oct 18, 2024

elasticsearchmachine added the v9.0.0 label Oct 18, 2024

Comment out flakey test, update ids

6490c55

kderusso reviewed Oct 18, 2024
