I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
Encountered an error when running the generate_text_embeddings function in the pipeline. The error log indicates a mismatch between the length of values (338) and the index (366) when attempting to set values in a DataFrame.
Error Details: The error traceback provided below highlights the issue occurring within the generate_text_embeddings function in generate_text_embeddings.py:
ValueError: Length of values (338) does not match length of index (366)
The pipeline fails at the generate_text_embeddings step with the error message above.
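For reference, the pandas error can be reproduced in isolation. The sketch below is not GraphRAG code: the column name and the two sizes are taken from the traceback, while the DataFrame contents and the placeholder vectors are made up for illustration.

```python
import pandas as pd

# 366 rows stand in for the rows handed to the embedding step.
data = pd.DataFrame({"text": [f"chunk {i}" for i in range(366)]})

# Only 338 vectors come back (placeholder values, not real embeddings).
embeddings = [[0.0, 0.0, 0.0] for _ in range(338)]

try:
    # pandas checks that a list-like value has one entry per row
    # before it creates the new column.
    data["embedding"] = embeddings
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The assignment only succeeds when the list has exactly one entry per row, so any rows that are skipped or dropped before the assignment would produce this failure.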
LOG:
{
"type": "error",
"data": "Error executing verb "generate_text_embeddings" in generate_text_embeddings: Length of values (338) does not match length of index (366)",
"stack": "Traceback (most recent call last):\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb\n result = await result\n ^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings\n await generate_text_embeddings_flow(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings\n await _run_and_snapshot_embeddings(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings\n data["embedding"] = await embed_text(\n ~~~~^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in setitem\n self._set_item(key, value)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item\n value, refs = self._sanitize_column(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column\n com.require_length_match(value, self.index)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match\n raise ValueError(\nValueError: Length of values (338) does not match length of index (366)\n",
"source": "Length of values (338) does not match length of index (366)",
"details": null
}
{
"type": "error",
"data": "Error running pipeline!",
"stack": "Traceback (most recent call last):\n File "D:\anaconda\Lib\site-packages\graphrag\index\run\run.py", line 269, in run_pipeline\n result = await _process_workflow(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\run\workflow.py", line 105, in _process_workflow\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb\n result = await result\n ^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings\n await generate_text_embeddings_flow(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings\n await _run_and_snapshot_embeddings(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings\n data["embedding"] = await embed_text(\n ~~~~^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in setitem\n self._set_item(key, value)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item\n value, refs = self._sanitize_column(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column\n com.require_length_match(value, self.index)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match\n raise ValueError(\nValueError: Length of values (338) does not match length of index (366)\n",
"source": "Length of values (338) does not match length of index (366)",
"details": null
}
Using Azure for both the LLM and the embeddings, with text-embedding-ada-002 as the embedding model.
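The log does not say why 28 values (366 minus 338) are missing. One possible but unconfirmed cause is that some rows carry empty or missing text and are skipped before embedding. The sketch below is a diagnostic under that assumption; `find_unembeddable` is a hypothetical helper, not part of GraphRAG, and the sample texts are invented.

```python
from typing import Any, Sequence


def find_unembeddable(texts: Sequence[Any]) -> list[int]:
    """Return positions whose value is not a string or is whitespace-only."""
    return [
        i
        for i, text in enumerate(texts)
        if not isinstance(text, str) or not text.strip()
    ]


# Toy input: three of the five entries could not be embedded meaningfully.
texts = ["alpha", "", None, "   ", "beta"]
suspect_positions = find_unembeddable(texts)
print(suspect_positions)

# The gap reported in the log: index length minus number of returned values.
index_length, values_length = 366, 338
missing = index_length - values_length
print(missing)
```

Running the helper over the text column that feeds the embedding step and comparing the number of suspect rows with the gap of 28 would either support or rule out this explanation.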
Steps to reproduce
Run graphrag index with GraphRAG version 0.4.0.
Expected Behavior
No response
GraphRAG Config Used
# Paste your config here
Logs and screenshots
No response
Additional Information
GraphRAG Version:
Operating System:
Python Version:
Related Issues:
zz44l43 added the bug and triage labels on Nov 8, 2024.
The pipeline consists of the following steps:
create_base_text_units
create_base_entity_graph
create_final_entities
create_final_nodes
create_final_communities
create_final_relationships
create_final_text_units
create_final_community_reports
create_final_documents
generate_text_embeddings