I have checked #657 to validate if my issue is covered by community support
Describe the issue
When I create the graph data with GraphRAG, no matter where I add a general response role description (I tried the entity_extraction and summarize_descriptions prompts) and then regenerate the whole graph, the role description still has no effect. Is there a place or a way to add a general response role description to the prompts so that it constrains the query results? Thanks in advance.
Steps to reproduce
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat # or azure_openai_chat
  model: gpt-4o
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://api.nlp.dev.uptimize.merckgroup.com
  api_version: 2023-09-01-preview
  # organization: <organization_id>
  deployment_name: gpt-4o
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding # or azure_openai_embedding
    model: text-embedding-3-large
    api_base: https://api.nlp.dev.uptimize.merckgroup.com
    api_version: 2023-09-01-preview
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "custom_prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "custom_prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "custom_prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
Logs and screenshots
No response
Additional Information
GraphRAG Version:
Operating System:
Python Version:
Related Issues:
wolfhawkld added the triage label (Default label assignment, indicates new issue needs reviewed by a maintainer) on Aug 14, 2024
As I understand it, the issue is that GraphRAG is not picking up your custom prompts, right?
Could you please share the prompt_tune command you're using?
Thanks!
natoverse added the awaiting_response label (Maintainers or community have suggested solutions or requested info, awaiting filer response) and removed the triage label (Default label assignment, indicates new issue needs reviewed by a maintainer) on Aug 16, 2024
Thank you for replying. This is my command:
python -m graphrag.prompt_tune --root .\playground\langchain\knowledge_base\graph_base --domain 'a List of Prohibited Ingredients for Cosmetics' --method random --limit 2 --chunk-size 1200 --output custom_prompts
I've also found that after tuning new custom prompts, I need to regenerate the graph data for the prompts to take effect. Is that right?
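One quick way to narrow down whether the custom prompts are being picked up is to confirm that every `prompt:` path in the config actually resolves to a file. The sketch below is only an illustration, not part of GraphRAG: the helper names `find_prompt_paths` and `check_prompts` are hypothetical, and it assumes that the config lives in a `settings.yaml` under the `--root` directory and that prompt paths are resolved relative to that root. It scans the config text with a regular expression instead of a YAML parser, so it only handles simple single-line `prompt:` entries like the ones above.

```python
import re
from pathlib import Path

# Matches uncommented lines such as:  prompt: "custom_prompts/entity_extraction.txt"
PROMPT_LINE = re.compile(r'^[ \t]*prompt:[ \t]*["\']?([^"\'#\s]+)', re.MULTILINE)


def find_prompt_paths(settings_text: str) -> list[str]:
    """Return the path given on every uncommented `prompt:` line."""
    return PROMPT_LINE.findall(settings_text)


def check_prompts(root: Path, settings_text: str) -> dict[str, bool]:
    """Map each configured prompt path to whether that file exists under root."""
    return {p: (root / p).is_file() for p in find_prompt_paths(settings_text)}


if __name__ == "__main__":
    # Assumption: the project root from the prompt_tune command above,
    # with the config stored as settings.yaml inside it.
    root = Path("playground/langchain/knowledge_base/graph_base")
    settings = root / "settings.yaml"
    if settings.is_file():
        results = check_prompts(root, settings.read_text(encoding="utf-8"))
        for path, exists in results.items():
            print(f"{'found' if exists else 'MISSING'}: {path}")
```

If a path is reported as missing, it may be worth checking where `--output custom_prompts` wrote the tuned prompts, since the command passes a relative output directory alongside a separate `--root`.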