
Is it possible to use Ollama or any other local LLM for indexing instead of OpenAI? #432

Closed
hemangjoshi37a opened this issue Jul 8, 2024 · 7 comments

Comments

@hemangjoshi37a

Is it possible to use Ollama or any other local LLM for indexing instead of OpenAI?

@adieyal

adieyal commented Jul 8, 2024

There are a few threads discussing this: https://github.com/microsoft/graphrag/issues?q=ollama

@beginor

beginor commented Jul 9, 2024

Yes, I think you can use any OpenAI-compatible API with GraphRAG!

I use llama-server from llama.cpp and create two llama-server instances, one for completion and one for embeddings.

  1. Listen on port 8080 for completion:
llama.cpp/llama-server --host 0.0.0.0 --port 8080 \
  --threads 8 \
  --parallel 1 \
  --gpu-layers 999 \
  --ctx-size 0 \
  --n-predict -1 \
  --defrag-thold 1 \
  --model ./models/qwen2-7b-instruct-fp16.gguf
  2. Listen on port 8081 for embeddings:
llama.cpp/llama-server --host 0.0.0.0 --port 8081 \
  --threads 8 \
  --parallel 1 \
  --gpu-layers 999 \
  --ctx-size 0 \
  --n-predict -1 \
  --defrag-thold 1 \
  --embeddings \
  --pooling mean \
  --batch-size 8192 \
  --ubatch-size 4096 \
  --model ./models/qwen2-7b-instruct-fp16.gguf
  3. Change llm and embeddings in settings.yaml:
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: qwen2-7b-instruct
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 512
  # request_timeout: 180.0
  api_base: http://localhost:8080
  api_version: v1

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2-7b-instruct
    api_base: http://localhost:8081
    api_version: v1
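
A quick way to sanity-check the setup before running indexing is to hit both servers with OpenAI-style requests; this is just a suggested check, assuming llama-server's default OpenAI-compatible routes:

# completion server on 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-7b-instruct", "messages": [{"role": "user", "content": "ping"}]}'

# embedding server on 8081 (requires the --embeddings flag above)
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-7b-instruct", "input": "ping"}'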

@Kingatlas115

I got global search working, but not local search.

@AlonsoGuevara
Contributor

Hi! We are centralizing other LLM discussions in these threads:
Other LLM/Api bases: #339
Ollama: #345
Local embeddings: #370

I'll resolve this issue so we can keep the focus on those threads.

@hemangjoshi37a
Author

Actually, my end application is long-context, multi-file coding development using this repo. Here we want to index every edit made by the developer; then, when the next AI editing inference is run, it should take input from GraphRAG, use it as context, and make the code edits. If anyone has done anything similar, please let me know.
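
A rough sketch of the loop I have in mind, assuming the CLI entry points from the GraphRAG getting-started docs (python -m graphrag.index / python -m graphrag.query); the paths and the re-index trigger are placeholders, not a tested setup:

# one-time workspace setup
python -m graphrag.index --init --root ./ragtest

# after each developer edit: copy the changed sources into the input folder
# (the default config indexes .txt files, so input settings may need adjusting)
cp ./src/changed_file.txt ./ragtest/input/
python -m graphrag.index --root ./ragtest

# before the next AI editing inference: pull context from GraphRAG
python -m graphrag.query --root ./ragtest --method local \
  "What does the code touched by the last edit interact with?"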

@wanglufei1

(Quotes beginor's llama-server and settings.yaml setup from above.)

That's a good plan. I ran the offline model with this solution and it solved my problem.

@mostafaghadimi

Yes, you can. For example, to run llama3.1, try the following command:

ollama run llama3.1

Then change the LLM configuration in your settings.yaml file to the following:

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3.1
  model_supports_json: true # recommended if this is available for your model.
  api_base: http://127.0.0.1:11434/v1
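
For indexing you also need an embeddings endpoint. A minimal sketch of the embeddings section, assuming your Ollama version exposes the OpenAI-compatible /v1/embeddings route and you have pulled an embedding model first (e.g. ollama pull nomic-embed-text; the model choice here is just an example):

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-embed-text
    api_base: http://127.0.0.1:11434/v1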
