Indexer Error #464

kinfey · 2024-07-09T17:03:01Z

Describe the bug

When I run the Indexer, it always gives me this error:


{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/series.py\", line 4924, in apply\n    ).apply()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1427, in apply\n    return self.apply_standard()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n    mapped = obj._map_values(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/base.py\", line 921, in _map_values\n    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n    return lib.map_infer(values, mapper, convert=convert)\n  File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in <lambda>\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n    clusters = run_leiden(graph, strategy)\n  File 
\"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n    node_id_to_community_map = _compute_leiden_communities(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n    community_mapping = hierarchical_leiden(\n  File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x330776e60>\", line 304, in hierarchical_leiden\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n    hierarchical_clusters_native = gn.hierarchical_leiden(\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: phi-3-mini
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 13000
  request_timeout: 2800.0
  api_base: http://localhost:5146/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: jinaai
    request_timeout: 2800.0
    api_base: http://localhost:5146/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
  model_supports_json: false 

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

No response

Additional Information

GraphRAG Version: 0.1.1
Operating System: macOS
Python Version: 3.10.12
Related Issues:


sriharshaguthikonda · 2024-07-09T20:54:43Z

I got a similar error. Let me know if you find a solution. Thanks in advance!

AlonsoGuevara · 2024-07-09T21:54:38Z

Hi

My general rule of thumb when facing this issue is:

Check the outputs of the entity extraction; this will show whether the graph is empty.
If the graph is empty, the cause is either faulty (unparseable) LLM responses or failed LLM calls.

Can you please check and share any of your llm responses from the cache folder?
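
As a rough illustration of the "is the graph empty" check, here is a minimal sketch that counts nodes and edges in a GraphML document using only the Python standard library. It assumes you have the entity graph as a GraphML string (for example from a file written when `snapshots.graphml` is enabled; that this is where the graph ends up is an assumption, not something confirmed in this thread). The names `count_graphml` and `is_empty_graph` are hypothetical helpers, not part of GraphRAG.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(graphml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


def is_empty_graph(graphml_text: str) -> bool:
    """A graph with no nodes is what would trigger an empty-network error."""
    node_count, _ = count_graphml(graphml_text)
    return node_count == 0


# Hand-written sample documents, for illustration only.
EMPTY = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"/>'
    "</graphml>"
)
SMALL = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="A"/><node id="B"/>'
    '<edge source="A" target="B"/>'
    "</graph></graphml>"
)

print(count_graphml(EMPTY), is_empty_graph(EMPTY))
print(count_graphml(SMALL), is_empty_graph(SMALL))
```

If the node count is zero, entity extraction produced nothing usable, which points back at the LLM responses rather than at the clustering step itself.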

kinfey · 2024-07-09T23:32:47Z

cache.zip @AlonsoGuevara this is my cache folder

cove9988 · 2024-07-10T01:37:08Z

Check your log report at /outputs/latestdate-time/reports. Most likely the LLM was not working.
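
To make that log check quicker, here is a small sketch that tallies error records in a log, assuming each record is a single-line JSON object shaped like the one in the original report (`type`, `data`, `source` fields). The helper name `summarize_errors` is hypothetical, and whether your report file is one JSON object per line is an assumption you should verify first.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count records with type == "error", grouped by their "source" field.

    Lines that are blank or not valid JSON are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text log line, ignore it
        if isinstance(record, dict) and record.get("type") == "error":
            # Fall back to the message if no source is recorded.
            key = record.get("source") or record.get("data") or "unknown"
            counts[key] += 1
    return counts


# Hand-written sample lines, for illustration only.
sample = [
    '{"type": "error", "data": "Error executing verb", "source": "EmptyNetworkError", "details": null}',
    "this line is not JSON",
    '{"type": "info", "data": "starting"}',
]
print(summarize_errors(sample))
```

Any iterable of strings works as input, so an open file handle for the report file can be passed directly in place of `sample`.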

kinfey · 2024-07-11T01:34:09Z

But the backend does generate something.

Nuclear6 · 2024-07-11T03:45:54Z

Can you post the detailed log file?

natoverse · 2024-07-22T23:22:33Z

Consolidating alternate model issues here: #657

kinfey added the bug and triage labels on Jul 9, 2024

natoverse closed this as not planned on Jul 22, 2024

natoverse added the community_support label and removed the bug and triage labels on Jul 22, 2024
