[Ollama][Other] GraphRAG OSS LLM community support #339

Closed
samalanubhab opened this issue Jul 3, 2024 · 68 comments
Labels
community_support Issue handled by community members

Comments

@samalanubhab

What I tried:
I ran this on my local GPU and tried pointing the api_base at a model served by Ollama in the settings.yaml file.
model: llama3:latest
api_base: http://localhost:11434/v1 #https://.openai.azure.com

Error:
graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={'input': '\n-Goal-\nGiven a text document that is pot....}

Commands:
#initialize
python -m graphrag.index --init --root .

#index
python -m graphrag.index --root .

#query
python -m graphrag.query --root . --method global "query"

#query
python -m graphrag.query --root . --method local "query"

Does graphrag support other LLM serving frameworks?

@aaronrsiphone

Calm down no need to yell.

Looking at the logs it looks like they are removing the port from the api_base.

settings.yaml -> api_base: "http://127.0.0.1:5000/v1"
errors in logs: http://127.0.0.1/v1/chat/completions
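For anyone debugging this, one quick way to confirm that diagnosis could be to hit both URLs by hand: the configured endpoint (with the port) should answer, while the port-stripped URL from the error log should refuse the connection. A rough sketch, assuming Ollama's OpenAI-compatible server on its default port (adjust the model name and port to your own setup):

# configured endpoint, port included -- should return a completion
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "ping"}]}'

# URL as it appears in the error log, port stripped -- typically fails to connect
curl -s http://127.0.0.1/v1/chat/completions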

@dx111ge

dx111ge commented Jul 3, 2024

Calm down no need to yell.

Looking at the logs it looks like they are removing the port from the api_base.

settings.yaml -> api_base: "http://127.0.0.1:5000/v1" errors in logs: http://127.0.0.1/v1/chat/completions

Sorry to question this, but I have the same error and I'm not sure what you're trying to say. At least I did try
api_base: http://127.0.0.1:5000/v1 and api_base: http://localhost:11434/v1, but in both cases I get the same error (btw: LLM -> llama3 and embedding -> nomic-embed-text).
Thanks for the clarification.

@bmaltais

bmaltais commented Jul 3, 2024

This is annoying... I just tried switching to ollama because... my first attempt at running the solution against ChatGPT cost me $45 and did not work in the end... so I don't want to waste money testing things like that. I would rather take it slow and steady locally until I get the hang of it, and switch to a paid model if needed...

How can we force the port to stay? I installed using pip install graphrag... I wish I knew which file to hack to keep the port intact.

@dx111ge

dx111ge commented Jul 3, 2024

This is annoying... I just tried switching to ollama because... my first attempt at running the solution against ChatGPT cost me $45 and did not work in the end... so I don't want to waste money testing things like that. I would rather take it slow and steady locally until I get the hang of it, and switch to a paid model if needed...

How can we force the port to stay? I installed using pip install graphrag... I wish I knew which file to hack to keep the port intact.

OLLAMA_HOST=127.0.0.1:11435 ollama serve ... now we just need to know which port graphrag is looking for

@bmaltais

bmaltais commented Jul 3, 2024

Good news. I got it started. The key was to use the right config and set concurrent requests to 1:

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

@SeppeVanSteenbergen

I also managed to get the entity extraction working with Ollama. However, the embeddings seem to be trickier, since Ollama does not expose an OpenAI-compatible API for embeddings. Has anyone found a workaround for this already?

@bmaltais

bmaltais commented Jul 3, 2024

I also managed to get the entity extraction working with Ollama. However, the embeddings seem to be trickier, since Ollama does not expose an OpenAI-compatible API for embeddings. Has anyone found a workaround for this already?

Is that the cause of this error after processing the entities:

🚀 Reading settings from settings.yaml
H:\llm_stuff\graphrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be    
removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                  id                                              chunk  ...                        document_ids n_tokens   
0   c6b76a5684badf7d2437c09ab8b5b099  DE TAI L E D DE S I G N SP E CI F I CAT I O N ...  ...        300
1   86c6c7ef7455630118790c9325ccac7d   Unclassified Status: DRFT Subject: SSC NAMING...  ...        300
2   55644895acb440fe7ce68b445aca9340  c – first draft SSC Cloud R&D\nv0.6 2019-09-16...  ...        300
3   437b3490966fcf3d1d8f0f89e3b0209a  .2 2019-12-14 TBS Feedback Updated TBS Governa...  ...        300
4   a60f3dd9757d19a311161ec3ff5d5cd3  12-29 SSC Cloud teams\nReplace field value tab...  ...        300
..                               ...                                                ...  ...                                 ...      ...   
86  11feb9a0b5f521cf93e7a9a06925e3ad   dependencies for maintenance (i.e. windows, p...  ...        300
87  f73a3685cdb77d0345371e90ace81ef3   uses Enterprise Control Desk (ECD) resolver g...  ...        300
88  850737c0488f6fdad4780cb0f4e7e98e  s Canadian Nuclear Safety Commission 12 CSA Sa...  ...        300
89  f1f210b06012245d9a9ec4d6672f1536  : DRFT Subject: SSC NAMING AND TAGGING STANDAR...  ...        212
90  4c9136411292f396d8545750d87c4ed2   Health Canada (Department of)\nTable 15: Depa...  ...         12

[91 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_base_entity_graph
   level                                    clustered_graph
0      0  <graphml xmlns="http://graphml.graphdrawing.or...
1      1  <graphml xmlns="http://graphml.graphdrawing.or...
H:\llm_stuff\graphrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be    
removed in a future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
❌ create_final_entities
None
⠦ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
└── create_final_entities
❌ Errors occurred during the pipeline run, see logs for more details.

@bmaltais

bmaltais commented Jul 3, 2024

I configured mine as:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

@bmaltais

bmaltais commented Jul 3, 2024

The crash log states:

08:57:11,537 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 404 page not found
Traceback (most recent call last):
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\index\verbs\text\embed\text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\index\verbs\text\embed\text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\index\verbs\text\embed\strategies\openai.py", line 61, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\index\verbs\text\embed\strategies\openai.py", line 105, in _execute
    results = await asyncio.gather(*futures)
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\asyncio\tasks.py", line 304, in __wakeup
    future.result()
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\asyncio\tasks.py", line 232, in __step
    result = coro.send(None)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\index\verbs\text\embed\strategies\openai.py", line 99, in embed
    chunk_embeddings = await llm(chunk)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\caching_llm.py", line 104, in __call__
    result = await self._delegate(input, **kwargs)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
    result, start = await execute_with_retry()
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
    async for attempt in retryer:
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
    do = await self.iter(retry_state=self._retry_state)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
    result = await action(retry_state)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\tenacity\_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\berna\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
    return await do_attempt(), start
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
    return await self._delegate(input, **kwargs)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\base_llm.py", line 49, in __call__
    return await self._invoke(input, **kwargs)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\base\base_llm.py", line 53, in _invoke
    output = await self._execute_llm(input, **kwargs)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py", line 36, in _execute_llm
    embedding = await self.client.embeddings.create(
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\openai\resources\embeddings.py", line 215, in create
    return await self._post(
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\openai\_base_client.py", line 1816, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\openai\_base_client.py", line 1514, in request
    return await self._request(
  File "H:\llm_stuff\graphrag\venv\lib\site-packages\openai\_base_client.py", line 1610, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found

and the ollama log shows:

[GIN] 2024/07/03 - 08:57:11 | 404 | 0s | 127.0.0.1 | POST "/v1/embeddings"

@dx111ge

dx111ge commented Jul 3, 2024

I configured mine as:

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Use /api instead of /v1 👍
14:55:29,949 graphrag.index.verbs.text.embed.strategies.openai INFO embedding 9 inputs via 9 snippets using 1 batches. max_batch_size=16, max_tokens=8191
14:55:31,373 httpx INFO HTTP Request: POST http://127.0.0.1:11434/api/embeddings "HTTP/1.1 200 OK"
14:55:31,375 graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={'input': ['"THE TEAM":"The team is portrayed as a group of individuals who have transitioned from passive observers to active participants in a mission, showing a dynamic change in their role."', '"WASHINGTON":', '"OPERATION: DULCE":', '"ALEX":"Alex is the leader of a team attempting first contact with an unknown intelligence, acknowledging the significance of their task."', '"CONTROL":"Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules."', '"INTELLIGENCE":"Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate."', '"FIRST CONTACT":"First Contact is the potential initial communication between humanity and an unknown intelligence."', '"SAM RIVERA":', '"HUMANITY' RESPONSE":']}
14:55:31,375 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable

At least I get a 200 OK for the embedding, but the response format seems wrong.

@SeppeVanSteenbergen

Yes, there is no embeddings endpoint under /v1 of the OpenAI-compatible server within Ollama. They are actively working on this: ollama/ollama#5285
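Until that lands, one workaround people use (not an official graphrag or Ollama feature) is a tiny local shim that accepts OpenAI-style /v1/embeddings requests and forwards them to Ollama's native /api/embeddings endpoint, re-wrapping the response in the shape the openai client expects. A minimal sketch, assuming Flask and requests are installed; the file name, port 11435, and fallback model are made up for illustration:

# embeddings_proxy.py -- hypothetical shim, not part of graphrag or Ollama
from flask import Flask, request, jsonify
import requests

OLLAMA_EMBED_URL = "http://localhost:11434/api/embeddings"  # Ollama's native endpoint
app = Flask(__name__)

@app.route("/v1/embeddings", methods=["POST"])
def embeddings():
    body = request.get_json()
    model = body.get("model", "nomic-embed-text")
    inputs = body["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    data = []
    for i, text in enumerate(inputs):
        # the native endpoint takes one prompt at a time and returns {"embedding": [...]}
        r = requests.post(OLLAMA_EMBED_URL, json={"model": model, "prompt": text})
        r.raise_for_status()
        data.append({"object": "embedding", "index": i, "embedding": r.json()["embedding"]})
    # minimal OpenAI-style envelope so an openai_embedding client can parse it
    return jsonify({"object": "list", "model": model, "data": data,
                    "usage": {"prompt_tokens": 0, "total_tokens": 0}})

if __name__ == "__main__":
    app.run(port=11435)

Pointing embeddings.llm.api_base at http://localhost:11435/v1 should then let the rest of the config stay unchanged.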

@SeppeVanSteenbergen

Indeed, so I also tried the plain /api endpoint like @dx111ge and am having the same problem with the embedding output.

@bmaltais

bmaltais commented Jul 3, 2024

I also figured out the v1 <-> api switch and am now stuck on the same final error...

@dx111ge

dx111ge commented Jul 3, 2024

I tried all 3 Ollama embedding models (mxbai-embed-large, nomic-embed-text, and all-minilm); same error.

@bmaltais

bmaltais commented Jul 3, 2024

The weird thing... I reverted the embeddings to openai... but it tries to connect to ollama instead... like it is getting the api_base from the llm config for the entities... I wonder what the right api_base might be for openai embeddings... maybe we need to set it explicitly if we use a custom one for the llm?

@bmaltais

bmaltais commented Jul 3, 2024

OK, I have been able to specify the openai embeddings API (https://api.openai.com/v1) and moved past that point... but now... it is failing at

└── create_final_community_reports
    └── Verb create_community_reports ━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━  55% 0:00:29 0:00:46

Running out of memory on my 3090... I tried reducing the max_input_length to no avail:

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 4000

@dx111ge

dx111ge commented Jul 3, 2024

OK, I have been able to specify the openai embeddings API (https://api.openai.com/v1) and moved past that point... but now... it is failing at

└── create_final_community_reports
    └── Verb create_community_reports ━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━  55% 0:00:29 0:00:46

Running out of memory on my 3090... I tried reducing the max_input_length to no avail:

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 4000

Can you please explain what you did to fix the embeddings stuff?

@bmaltais

bmaltais commented Jul 3, 2024

Here is my final config. Somehow, after VSCode crashed, the summary reports started working when I started it again.

Here is my final full config that works so far:


encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://api.openai.com/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    batch_size: 1 # the number of documents to send in a single request
    batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 7000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Essentially I use llama3 locally via ollama for the entities and use openai embeddings (much cheaper) until we have a solution to use ollama.
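One detail worth spelling out for this hybrid setup (my assumption, based on how the --init scaffold works): since the embeddings block points at https://api.openai.com/v1, the ${GRAPHRAG_API_KEY} value substituted into both llm.api_key and embeddings.llm.api_key needs to be a valid OpenAI key, while Ollama simply ignores whatever key it receives. Something like:

# .env in the project root (created by: python -m graphrag.index --init --root .)
# Ollama ignores this value, but the OpenAI embeddings endpoint requires a real key
GRAPHRAG_API_KEY=sk-...your-openai-key...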

@bmaltais

bmaltais commented Jul 3, 2024

I am sure the config could be optimised... but this is working at the moment... now I need to test the query part ;-)

@bmaltais

bmaltais commented Jul 3, 2024

Well... looks like I can't query the results. I keep getting VRAM errors on my 3090... so all this, only to not be able to query ;-(

@antiblahblah

antiblahblah commented Jul 3, 2024

Essentially I use llama3 localy via ollama for the entities and use openai embeddings (much cheaper) until we have a solution to use ollama.

OpenAI's embeddings are quite expensive too...

@bmaltais

bmaltais commented Jul 3, 2024

I figured out the issue with the query... somehow the YouTube video I was following was using the "wrong" syntax?

Did not work: python -m graphrag.query --root . --method global "What are the highlights of the naming convention"

Worked: python -m graphrag.query --data .\output\20240703-084750\artifacts\ --method global "What are the highlights of the naming convention"

@KarimJedda

@bmaltais thanks a lot. That works.

I think vllm has embeddings now, I will try that tonight for a fully local setup 👍

@jgbradley1 changed the title from "graphrag supporting other api bases!" to "graphrag supporting other api bases" on Jul 3, 2024
@bmaltais

bmaltais commented Jul 3, 2024

Quick update... Some of the issues I was having were related to the fact that my first attempt at running graphrag was leveraging chatgpt-4o. It ended up creating a lot of files in the cache folder that then got mixed with llama3-generated files. Overall this caused significant issues.

After deleting the cache folder and re-indexing everything I was able to properly query the graph with:

python -m graphrag.query --method local --root . "What does Identity Lifecycle concist of?"

I still have not found an easy solution to generating embeddings locally.
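For reference, the cleanup described above amounts to something like the following (shell syntax shown; on Windows just delete the folders, and the paths assume the default layout under the project root):

# clear cached LLM responses from the earlier chatgpt-4o run (old outputs optional)
rm -rf ./cache ./output

# re-index from scratch, then query the fresh artifacts
python -m graphrag.index --root .
python -m graphrag.query --root . --method local "your question"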

@XinyuShe

XinyuShe commented Jul 5, 2024

[quotes @bmaltais's "Here is my final config..." comment in full — the same settings.yaml posted above, with llama3 served by Ollama for chat and text-embedding-3-small at api.openai.com for embeddings]

I used your settings and the default text, and did not change anything else, but I still get:

❌ create_final_entities
None
⠹ GraphRAG Indexer 
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
└── create_final_entities
❌ Errors occurred during the pipeline run, see logs for more details.

@AntoninLeroy

Any success using a vLLM inference endpoint for local LLMs?

@vamshi-rvk

[quotes @bmaltais's full config and @XinyuShe's "❌ create_final_entities" error from above]

You will need to serve ollama first:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama serve
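If the same "Error Invoking LLM" persists after that, it may be worth double-checking that the server is actually reachable and the model is present before re-running the indexer, for example:

# confirm the Ollama server is up and llama3 has been pulled
curl -s http://localhost:11434/api/tags
ollama list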

@dengkeshun

Has anyone hit the case where global query works but local query doesn't?

@Ravikumaryadav22

[quotes @bmaltais's full config, @XinyuShe's "❌ create_final_entities" error, and @vamshi-rvk's "you will need to serve ollama first" reply from above]

I followed your steps and am still getting the same error; it shows "Error Invoking LLM".

@yurochang

yurochang commented Jul 19, 2024

"Error Invoking LLM"-- fixed by using LM studio in embedding part.

successfully build the graph, BUT can global search , CAN NOT local search:

ZeroDivisionError: Weights sum to zero, can't be normalized

@VamshikrishnaAluwala

[4 rows x 10 columns]
🚀 join_text_units_to_relationship_ids
id relationship_ids
0 63575d4c37be57321538f1938c2fece6 [dde131ab575d44dbb55289a6972be18f, de9e343f2e3...
❌ create_final_community_reports
None
⠏ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
Error during GraphRAG setup: Command 'python -m graphrag.index --root ./ragtest' returned non-zero exit status 1.

@xxll88

xxll88 commented Jul 22, 2024

❌ create_final_community_reports
None
⠙ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) 100%
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.

C:\Users\shrnema\Downloads\graphrag-main\GRAPHRAG>

File "C:\Users\shrnema\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'community'
11:17:11,578 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

Please help me to fix this

same issue

@natoverse
Collaborator

Consolidating Ollama-related issues: #657

@natoverse closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 22, 2024
@natoverse unpinned this issue on Jul 22, 2024
@yunchonggeng

[quotes, in Chinese translation, @bmaltais's full config, @XinyuShe's "❌ create_final_entities" error, @vamshi-rvk's "you will need to serve ollama first" reply, and @Ravikumaryadav22's "still getting the same error" follow-up from above]

remove cache

@minxiansheng

[quotes, in Chinese translation, @dx111ge's embeddings config and "use /api instead of /v1" comment, including its text_embed "'NoneType' object is not iterable" error, from above]

How do I solve this text_embed problem? I have the same problem. The full error from the log is as follows:

datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [4, 513]. Tensor sizes: [1, 512])', 'code': 50001}

@yurochang

Has anyone hit the case where global query works but local query doesn't?

Did you solve it?

@zhangyanli-pro

zhangyanli-pro commented Jul 24, 2024

Here is my final config. Somehow after VSCode crashed the summary reports started working when I started it again.

Here is my final full config that work so far:


encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    api_base: https://api.openai.com/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    batch_size: 1 # the number of documents to send in a single request
    batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 7000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Essentially I use llama3 locally via ollama for the entities and OpenAI embeddings (much cheaper) until we have a solution to use ollama for both.

@bmaltais hi, I don't understand what value the api_key should be set to in your example. Can you tell me more about it? Thanks.

@rushizirpe
Copy link

If you want to use open-source models, I've put together a repository for serving models from Hugging Face on local endpoints that are compatible with the OpenAI API format. Here's the link to the repo: https://github.com/rushizirpe/open-llm-server

Also, I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

@vivisol
Copy link

vivisol commented Jul 29, 2024

I use ollama as the local LLM API provider and set the chat model and embedding model api_base both to http://localhost:11434/v1.
Global search works OK, but local search fails with this message:

 (graphRAG) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest09 --method local "谁是叶文洁"


INFO: Reading settings from newTest09\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error embedding chunk {'OpenAIEmbedding': "Error code: 400 - {'error': {'message': 'invalid input type', 'type': 'api_error', 'param': None, 'code': None}}"}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\__main__.py", line 75, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\numpy\lib\function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

I suspect the embedding process isn't working correctly, because it reports a code 400 error from the OpenAIEmbedding API. But the indexing process seems to work fine:

⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
├── create_final_community_reports
├── create_final_text_units
├── create_base_documents
└── create_final_documents
🚀 All workflows completed successfully.

As I understand it, the indexing process also needs to do embedding, so why does it fail during local search?
Does anyone have the same issue with GraphRAG?

@galen1980guo
Copy link

I had the same problem and I was wondering if anyone had completely solved it?

$ ollama -v
ollama version is 0.1.34

and my settings.yaml:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama # ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3:70b
  model_supports_json: false # recommended if this is available for your model.
  api_base: http://10.110.0.25:11434/v1
  ...

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text:latest
    api_base: http://10.110.0.25:11434/api
  ...

then run the command:
poetry run poe index --root .

...

❌ create_final_entities
None
⠦ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
└── create_final_entities
❌ Errors occurred during the pipeline run, see logs for more details.

checked the error log:
httpx INFO HTTP Request: POST http://10.110.0.25:11434/api/embeddings "HTTP/1.1 200 OK"
graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={....}
17:19:55,244 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable

@rushizirpe
Copy link

I had the same problem and I was wondering if anyone had completely solved it?

$ ollama -v ollama version is 0.1.34

and my settings.yaml `encoding_model: cl100k_base skip_workflows: [] llm: api_key: ollama # ${GRAPHRAG_API_KEY} type: openai_chat # or azure_openai_chat model: llama3:70b model_supports_json: false # recommended if this is available for your model. api_base: http://10.110.0.25:11434/v1 ...

embeddings:

parallelization: override the global parallelization settings for embeddings

async_mode: threaded # or asyncio llm: api_key: ollama type: openai_embedding # or azure_openai_embedding model: nomic-embed-text:latest api_base: http://10.110.0.25:11434/api ...`

then run the command: `poetry run poe index --root .

...

❌ create_final_entities None ⠦ GraphRAG Indexer ├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 ├── create_base_text_units ├── create_base_extracted_entities ├── create_summarized_entities ├── create_base_entity_graph └── create_final_entities ❌ Errors occurred during the pipeline run, see logs for more details.`

checked the error log: httpx INFO HTTP Request: POST http://10.110.0.25:11434/api/embeddings "HTTP/1.1 200 OK" graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={....} 17:19:55,244 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable

#339 (comment)

@rushizirpe
Copy link

ZeroDivisionError: Weights sum to zero, can't be normalized

The embedding model running locally in Ollama returns the embedding vectors in a format GraphRAG doesn't expect: the OpenAI API internally uses base64-encoded floats, whereas most other models return the floats as plain numbers.

This is working: https://github.com/rushizirpe/open-llm-server
I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

@galen1980guo
Copy link

I had the same problem and I was wondering if anyone had completely solved it?
$ ollama -v ollama version is 0.1.34
and my settings.yaml `encoding_model: cl100k_base skip_workflows: [] llm: api_key: ollama # ${GRAPHRAG_API_KEY} type: openai_chat # or azure_openai_chat model: llama3:70b model_supports_json: false # recommended if this is available for your model. api_base: http://10.110.0.25:11434/v1 ...
embeddings:

parallelization: override the global parallelization settings for embeddings

async_mode: threaded # or asyncio llm: api_key: ollama type: openai_embedding # or azure_openai_embedding model: nomic-embed-text:latest api_base: http://10.110.0.25:11434/api ...then run the command:poetry run poe index --root .
...
❌ create_final_entities None ⠦ GraphRAG Indexer ├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 ├── create_base_text_units ├── create_base_extracted_entities ├── create_summarized_entities ├── create_base_entity_graph └── create_final_entities ❌ Errors occurred during the pipeline run, see logs for more details.checked the error log:httpx INFO HTTP Request: POST http://10.110.0.25:11434/api/embeddings "HTTP/1.1 200 OK" graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={....} 17:19:55,244 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable`

#339 (comment)

I have read the comment above, where the embedding uses OpenAI, but I hope it can be fully local. :-)

@rushizirpe
Copy link

rushizirpe commented Aug 1, 2024

I had the same problem and I was wondering if anyone had completely solved it?
$ ollama -v ollama version is 0.1.34
and my settings.yaml `encoding_model: cl100k_base skip_workflows: [] llm: api_key: ollama # ${GRAPHRAG_API_KEY} type: openai_chat # or azure_openai_chat model: llama3:70b model_supports_json: false # recommended if this is available for your model. api_base: http://10.110.0.25:11434/v1 ...
embeddings:

parallelization: override the global parallelization settings for embeddings

async_mode: threaded # or asyncio llm: api_key: ollama type: openai_embedding # or azure_openai_embedding model: nomic-embed-text:latest api_base: http://10.110.0.25:11434/api ...then run the command:poetry run poe index --root .
...
❌ create_final_entities None ⠦ GraphRAG Indexer ├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00 ├── create_base_text_units ├── create_base_extracted_entities ├── create_summarized_entities ├── create_base_entity_graph └── create_final_entities ❌ Errors occurred during the pipeline run, see logs for more details.checked the error log:httpx INFO HTTP Request: POST http://10.110.0.25:11434/api/embeddings "HTTP/1.1 200 OK" graphrag.index.reporting.file_workflow_callbacks INFO Error Invoking LLM details={....} 17:19:55,244 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: 'NoneType' object is not iterable`

#339 (comment)

I have read the comment above, where the embedding uses openai, but I hope it will be local. : -)

If you take a look at the notebook, you'll find it uses nomic-ai/nomic-embed-text-v1.5 as mentioned in the .yaml config (you can use any valid model that can be loaded from Hugging Face). You only need a GROQ API key for chat completion if you don't have a high-end GPU; if you do have one, you just need to replace the API endpoint with http://localhost:1234/v1 and the model name (from HF) you want to use.
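For reference, the embeddings block in settings.yaml would then look roughly like this (a sketch; the endpoint and model name are just the examples from above):

embeddings:
  llm:
    api_key: anything-non-empty
    type: openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5
    api_base: http://localhost:1234/v1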

@galen1980guo
Copy link

ZeroDivisionError: Weights sum to zero, can't be normalized

The locally running embedding model in OLLAMA returns the weights in an incorrect format. OpenAI internally uses base64 encoded floats, whereas most other models return floats as numbers.

This is working: https://github.com/rushizirpe/open-llm-server I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

Oh... are you actually replacing Ollama with open-llm-server in this notebook?

@rushizirpe
Copy link

ZeroDivisionError: Weights sum to zero, can't be normalized

The locally running embedding model in OLLAMA returns the weights in an incorrect format. OpenAI internally uses base64 encoded floats, whereas most other models return floats as numbers.
This is working: https://github.com/rushizirpe/open-llm-server I have created a Colab notebook (working for global as well as local search) for Graphrag: https://colab.research.google.com/drive/1uhFDnih1WKrSRQHisU-L6xw6coapgR51?usp=sharing

oh...are you actually replacing ollama with open llm server in this notebook?

Yes, hope it helps!

@ZhengRui
Copy link

ZhengRui commented Aug 20, 2024

I use ollama as the local LLM API provider and set the chat model and embedding model api_base both to http://localhost:11434/v1. Global search works OK, but local search fails with this message:

 (graphRAG) D:\Learn\GraphRAG>python -m graphrag.query --root ./newTest09 --method local "谁是叶文洁"


INFO: Reading settings from newTest09\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_embedding", 'model': 'nomic-embed-text', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
Error embedding chunk {'OpenAIEmbedding': "Error code: 400 - {'error': {'message': 'invalid input type', 'type': 'api_error', 'param': None, 'code': None}}"}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\__main__.py", line 75, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\cli.py", line 154, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = text_embedding_vectorstore.similarity_search_by_text(
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\vector_stores\lancedb.py", line 118, in similarity_search_by_text
    query_embedding = text_embedder(text)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 57, in <lambda>
    text_embedder=lambda t: text_embedder.embed(t),
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\graphrag\query\llm\oai\embedding.py", line 96, in embed
    chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
  File "D:\ProgramData\miniconda3\envs\graphRAG\lib\site-packages\numpy\lib\function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

I suspect the embedding process isn't working correctly, because it reports a code 400 error from the OpenAIEmbedding API. But the indexing process seems to work fine:

⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
├── create_final_community_reports
├── create_final_text_units
├── create_base_documents
└── create_final_documents
🚀 All workflows completed successfully.

As I understand it, the indexing process also needs to do embedding, so why does it fail during local search? Does anyone have the same issue with GraphRAG?

I am experiencing the same issue here. From my understanding, Ollama now has an OpenAI-compatible endpoint (v1/embeddings) for embeddings, so the indexing step works fine when using http://localhost:11434/v1 as the api_base for both llm and embeddings. Global query also works. Local query may have its own separate embedding logic that causes the issue (it involves the generated lancedb files); digging into it now.

Note:

concurrent_requests, chunks.size and the specific ollama served model's num_ctx have to be set properly to pass the indexing step. In my case:

In the beginning I used the llama3.1 model. Its modelfile (you can check with ollama show --modelfile llama3.1 or ollama show --parameters llama3.1) does not specify num_ctx, so by default Ollama serves llama3.1 with num_ctx = 2048; you can see this in Ollama's console log when it loads the model (screenshot omitted).
The n_ctx shown there is 4 x num_ctx. GraphRAG's default chunks.size is 1200, and the default prompts (inside the prompts folder after initializing the project) are lengthy, so the final prompt sent to Ollama can easily exceed num_ctx (you will see "input truncated" in Ollama's console log for /v1/chat/completions calls). The result is that the LLM does not follow the output format specified in the prompt; you can check the generated files inside the cache folder to see whether they follow the format defined in the prompts folder. In my case no entities were output at all, which caused this issue: #443 (comment)

The way to solve this is:

  • Either make chunks.size smaller, e.g. 300. (However, I still noticed "input truncated" from Ollama at the final indexing step, create_community_report, because prompts/community_report.txt is too long.)

  • Or (I would prefer this way) increase num_ctx on the Ollama side. However, we can only set/change num_ctx

    • either through the options field of Ollama's native /api/chat endpoint (not the OpenAI-compatible /v1/chat/completions):
    curl http://localhost:11434/api/chat -d '{
      "model": "llama3.1",
      "messages": [
        {
          "role": "user",
          "content": "Tell me a random story:"
        }
      ], "stream": false, "options": {"num_ctx": 4096}
    }' | jq .
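    • or (what I ended up doing) by baking a larger num_ctx into a new modelfile and serving that derived model instead. A rough sketch (file and model names here are just examples):

      ollama show --modelfile llama3.1 > Modelfile.bigctx    # dump the current modelfile
      echo "PARAMETER num_ctx 4096" >> Modelfile.bigctx      # append the larger context size
      ollama create llama3.1-bigctx -f Modelfile.bigctx      # register the derived model
      # then point model: in settings.yaml at the derived model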

I specified 4096 as num_ctx in the new modelfile (instead of 8192); now I can see n_ctx is 16384 in the Ollama console log when loading the model, and I kept chunks.size at 1200.

As for the concurrent_requests setting, the default value of 25 causes request timed out errors during indexing; the Ollama console log looks fine, but you can find the request timed out errors in indexing-engine.log and logs.json (see how Ollama handles concurrent requests). I set llm.concurrent_requests: 1 and embeddings.llm.concurrent_requests: 4.

With these settings, indexing and global query work smoothly. Local query still has an issue.
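In settings.yaml terms, the combination that worked for me looks roughly like this (a sketch; the model name is just an example for an Ollama model served with a larger num_ctx):

llm:
  model: llama3.1-bigctx   # derived model with num_ctx >= 4096
  concurrent_requests: 1
embeddings:
  llm:
    concurrent_requests: 4
chunks:
  size: 1200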

@ZhengRui
Copy link

ZhengRui commented Aug 20, 2024

The local query issue is solved by #451 (comment) and https://github.com/microsoft/graphrag/pull/568/files, but the query results don't seem very good.

Update:

  1. It turns out the local search results were poor because a 4k context window is too small for the local search prompt (it easily reaches 40k characters); after I changed num_ctx from 4096 to 10240, the local search results started looking relevant and include references.
  2. Using a smaller chunks.size makes the prompts longer; in my case the global search prompt exceeded num_ctx and the LLM response did not return the JSON-structured output required by the prompt. In that case, increase chunks.size or num_ctx.

@st-rope
Copy link

st-rope commented Aug 22, 2024

Regarding generating embeddings with ollama:
I already had this issue in another project. The embeddings() function of ollama -> _client.py -> Client is implemented as:

    return self._request(
      'POST',
      '/api/embeddings',
      json={
        'model': model,
        'prompt': prompt,
        'options': options or {},
        'keep_alive': keep_alive,
      },
    ).json()

At this point '/api/embeddings' is not replaced with the specified api_base.
Replacing it manually with Ollama's 'http://localhost:11434/api/embeddings' at least makes it possible to generate embeddings with Ollama.
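If you want to test that path directly, here is a minimal sketch using the ollama Python client (it assumes the ollama package is installed and nomic-embed-text has been pulled; adjust the host and model name to your setup):

    import ollama

    # point the client at the api_base you actually want instead of the default
    client = ollama.Client(host="http://localhost:11434")
    vec = client.embeddings(model="nomic-embed-text", prompt="hello world")["embedding"]
    print(len(vec))  # dimensionality of the returned embedding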

@ArwaALyahyai
Copy link

I also managed to get the entity extraction working with Ollama. However, the embeddings seem to be trickier because there is no OpenAI-compatible API for embeddings from Ollama. Has anyone found a workaround for this already?

Did you find a solution?

Labels
community_support Issue handled by community members