[Bug]: 'sub_community' column does not exist #1410

Open
2 of 3 tasks
adni03 opened this issue Nov 15, 2024 · 1 comment
Labels
bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments

adni03 commented Nov 15, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When following the Global Search notebook (link), I am getting a KeyError raised from the read_indexer_communities method.

KeyError: "Column(s) ['sub_community'] do not exist"

I also noticed that the read_indexer_communities method was not present when I installed graphrag via pip. To follow the notebook, I copied the method manually into the file, deleted the cache and output dirs, and re-ran indexing.
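
One way to narrow this down is to check which columns the indexed communities table actually contains before calling read_indexer_communities. The following is a minimal sketch, not part of graphrag: the helper name `find_missing_columns` and the example column list are hypothetical, and the only required column taken from the error above is 'sub_community'.

```python
from typing import Iterable, List


def find_missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical column list standing in for `list(df.columns)` of the
# communities table produced by indexing; the names are illustrative only.
example_columns = ["id", "title", "level"]

# 'sub_community' is the column named in the KeyError.
missing = find_missing_columns(example_columns, ["sub_community"])
if missing:
    print(f"Missing columns: {missing}")
```

With a real DataFrame, passing `df.columns` as the first argument would show whether the indexing output lacks the column, which may point to a mismatch between the installed graphrag version and the version the notebook was written for.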

Image
At line 227

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4096
  # request_timeout: 180.0
  api_base: <REDACTED>
  api_version: 2024-02-15-preview
  deployment_name: <REDACTED>
  temperature: 0 # temperature for sampling
  top_p: 0.999 # top-p sampling
  n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default # A prefix for the vector store to create embedding containers. Default: 'default'.
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding
    api_base: <REDACTED>
    api_version: 2024-02-15-preview
    deployment_name: <REDACTED>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"

reporting:
  type: file # or console, blob
  base_dir: "logs"

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 0.4.1
  • Operating System: MacOS
  • Python Version: 3.12.5
  • Related Issues:
escano0 commented Nov 25, 2024

same issue
