
Is it possible to use Ollama or any other local LLM for indexing instead of OpenAI? #432

Closed
hemangjoshi37a opened this issue Jul 8, 2024 · 7 comments

Comments

@hemangjoshi37a

Is it possible to use Ollama or any other local LLM for indexing instead of OpenAI?

@adieyal

adieyal commented Jul 8, 2024

There are a few threads discussing this: https://github.com/microsoft/graphrag/issues?q=ollama

@beginor

beginor commented Jul 9, 2024

Yes, I think you can use any OpenAI-compatible API with GraphRAG!

I use llama-server from llama.cpp and create two llama-server instances, one for completion and one for embeddings.

  1. Listen on port 8080 for completion:
llama.cpp/llama-server --host 0.0.0.0 --port 8080 \
  --threads 8 \
  --parallel 1 \
  --gpu-layers 999 \
  --ctx-size 0 \
  --n-predict -1 \
  --defrag-thold 1 \
  --model ./models/qwen2-7b-instruct-fp16.gguf
  2. Listen on port 8081 for embeddings:
llama.cpp/llama-server --host 0.0.0.0 --port 8081 \
  --threads 8 \
  --parallel 1 \
  --gpu-layers 999 \
  --ctx-size 0 \
  --n-predict -1 \
  --defrag-thold 1 \
  --embeddings \
  --pooling mean \
  --batch-size 8192 \
  --ubatch-size 4096 \
  --model ./models/qwen2-7b-instruct-fp16.gguf
  3. Change llm and embeddings in settings.yaml:
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: qwen2-7b-instruct
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 512
  # request_timeout: 180.0
  api_base: http://localhost:8080
  api_version: v1

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2-7b-instruct
    api_base: http://localhost:8081
    api_version: v1
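
A quick way to sanity-check the setup before running indexing is to hit both servers with OpenAI-style requests; this is just a suggested check, assuming llama-server's default OpenAI-compatible routes:

# completion server on 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-7b-instruct", "messages": [{"role": "user", "content": "ping"}]}'

# embedding server on 8081 (requires the --embeddings flag above)
curl http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-7b-instruct", "input": "ping"}'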

@Kingatlas115

I got global search working, but not local search.

@AlonsoGuevara
Contributor

Hi! We are centralizing other LLM discussions in these threads:
Other LLM/Api bases: #339
Ollama: #345
Local embeddings: #370

I'll resolve this issue so we can keep the focus on those threads.

@hemangjoshi37a
Author

Actually, my end application is long-context, multi-file coding development using this repo. Here we want to index every edit made by the developer; then, when the next AI editing inference is run, it should take input from GraphRAG, use it as context, and make the code edits. If anyone has done anything similar, please let me know.
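
A rough sketch of the loop I have in mind, assuming the CLI entry points from the GraphRAG getting-started docs (python -m graphrag.index / python -m graphrag.query); the paths and the re-index trigger are placeholders, not a tested setup:

# one-time workspace setup
python -m graphrag.index --init --root ./ragtest

# after each developer edit: copy the changed sources into the input folder
# (the default config indexes .txt files, so input settings may need adjusting)
cp ./src/changed_file.txt ./ragtest/input/
python -m graphrag.index --root ./ragtest

# before the next AI editing inference: pull context from GraphRAG
python -m graphrag.query --root ./ragtest --method local \
  "What does the code touched by the last edit interact with?"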

@wanglufei1

(Quotes beginor's llama-server and settings.yaml setup from above.)

That's a good plan. I ran the offline model with this solution and it solved my problem.

@mostafaghadimi

Yes, you can. For example, to run llama3.1, try the following command:

ollama run llama3.1

Then change the LLM configuration in your settings.yaml file to the following:

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3.1
  model_supports_json: true # recommended if this is available for your model.
  api_base: http://127.0.0.1:11434/v1
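
For indexing you also need an embeddings endpoint. A minimal sketch of the embeddings section, assuming your Ollama version exposes the OpenAI-compatible /v1/embeddings route and you have pulled an embedding model first (e.g. ollama pull nomic-embed-text; the model choice here is just an example):

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-embed-text
    api_base: http://127.0.0.1:11434/v1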
