[Bug]: Mismatch in Length of Values and Index in generate_text_embeddings #1382

Open
2 of 3 tasks
zz44l43 opened this issue Nov 8, 2024 · 4 comments
Labels: bug (Something isn't working), triage (Default label assignment, indicates new issue needs reviewed by a maintainer)

zz44l43 commented Nov 8, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

I encountered an error when running the generate_text_embeddings step of the indexing pipeline. The log shows a mismatch between the length of the values (338) and the length of the index (366) when a column is assigned to a DataFrame.

Error details: the traceback below shows the error being raised inside the generate_text_embeddings function in generate_text_embeddings.py:
ValueError: Length of values (338) does not match length of index (366)
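
For readers unfamiliar with this message: it is raised by pandas itself whenever a list-like value assigned to a column has a different length than the DataFrame. The sketch below reproduces it with the same counts as in this report. It is only an illustration; the column names and placeholder vectors are assumptions and this is not GraphRAG's code.

```python
import pandas as pd

# Same counts as in the report: 366 rows in the frame, 338 values to assign.
data = pd.DataFrame({"text": [f"chunk {i}" for i in range(366)]})

# Placeholder vectors standing in for embeddings (assumption, not real output).
embeddings = [[0.0, 0.0] for _ in range(338)]

try:
    # Column assignment requires one value per row, so this raises.
    data["embedding"] = embeddings
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

So the failure means the embedding step handed back 28 fewer values than there were rows to embed.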

The pipeline consists of the following steps:

create_base_text_units
create_base_entity_graph
create_final_entities
create_final_nodes
create_final_communities
create_final_relationships
create_final_text_units
create_final_community_reports
create_final_documents
generate_text_embeddings

The pipeline fails at the generate_text_embeddings step with the error message above.

LOG:
{
"type": "error",
"data": "Error executing verb "generate_text_embeddings" in generate_text_embeddings: Length of values (338) does not match length of index (366)",
"stack":
Traceback (most recent call last):
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings
    await generate_text_embeddings_flow(
  File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings
    await _run_and_snapshot_embeddings(
  File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings
    data["embedding"] = await embed_text(
    ~~~~^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in __setitem__
    self._set_item(key, value)
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item
    value, refs = self._sanitize_column(value)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column
    com.require_length_match(value, self.index)
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match
    raise ValueError(
ValueError: Length of values (338) does not match length of index (366)
"source": "Length of values (338) does not match length of index (366)",
"details": null
}
{
"type": "error",
"data": "Error running pipeline!",
"stack":
Traceback (most recent call last):
  File "D:\anaconda\Lib\site-packages\graphrag\index\run\run.py", line 269, in run_pipeline
    result = await _process_workflow(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda\Lib\site-packages\graphrag\index\run\workflow.py", line 105, in _process_workflow
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings
    await generate_text_embeddings_flow(
  File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings
    await _run_and_snapshot_embeddings(
  File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings
    data["embedding"] = await embed_text(
    ~~~~^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in __setitem__
    self._set_item(key, value)
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item
    value, refs = self._sanitize_column(value)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column
    com.require_length_match(value, self.index)
  File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match
    raise ValueError(
ValueError: Length of values (338) does not match length of index (366)
"source": "Length of values (338) does not match length of index (366)",
"details": null
}

I am using Azure for both the LLM and the embedding model (text-embedding-ada-002).
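
The traceback shows the failure at the assignment `data["embedding"] = await embed_text(`, so the result of the embedding call has fewer entries than the frame has rows. A possible way to make that visible before pandas complains is a count check like the sketch below. The helper name `require_one_embedding_per_text` and the placeholder data are hypothetical; nothing here is part of GraphRAG.

```python
def require_one_embedding_per_text(texts, embeddings):
    """Raise a descriptive error unless there is exactly one embedding per text."""
    if len(embeddings) != len(texts):
        difference = len(texts) - len(embeddings)
        raise ValueError(
            f"expected {len(texts)} embeddings, got {len(embeddings)} "
            f"(difference: {difference})"
        )
    return embeddings


# Placeholder inputs with the counts from this report (assumption, not real data).
texts = [f"chunk {i}" for i in range(366)]
embeddings = [[0.0, 0.0] for _ in range(338)]

try:
    require_one_embedding_per_text(texts, embeddings)
    report = "counts match"
except ValueError as err:
    report = str(err)

print(report)
```

A check of this kind does not fix anything; it only states how many inputs came back without a vector, which may help when comparing runs with different settings.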

Steps to reproduce

Run graphrag index with GraphRAG version 0.4.0.

Expected Behavior

No response

GraphRAG Config Used

# Paste your config here

Logs and screenshots

No response

Additional Information

  • GraphRAG Version:
  • Operating System:
  • Python Version:
  • Related Issues:
@zz44l43 zz44l43 added bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer labels Nov 8, 2024
@davidloiret
davidloiret commented Nov 11, 2024

Hello,

I'm encountering the same error when following the "Getting Started" guide. Specifically, the issue arises when using the input text downloaded from:

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt

Any guidance on how to resolve this would be greatly appreciated. Thank you!

@Match-Yang
Same!!! Need help

@skyqqcloud
Same problem with GraphRAG v0.5.

@dipakmeher
dipakmeher commented Nov 22, 2024

This issue was resolved for me when I changed the chunk size from 1200 to 300, using an OpenAI embedding model and Llama 3 for text generation.
