Users/amrsing/changes for browser auth fallback #1201

Draft
wants to merge 135 commits into
base: main
135 commits
91370aa
Index, Query by context + SA integration + Placeholder + Context Swit…
Aug 19, 2024
649edcd
Minor changes + Context Swtich flow not working
dabbcomputers Aug 20, 2024
4875079
Writing edges and vertices into graphdb
Aug 20, 2024
a0f9a54
Fixed context switching issue
dabbcomputers Aug 20, 2024
5f80f66
addressed comment
dabbcomputers Aug 20, 2024
b231058
Merge pull request #2 from prateejain-linked/users/amrsing/POCChanges2
amritpalms Aug 21, 2024
d38a026
Adding azure-kusto-data as a dependency
logomachic Aug 7, 2024
a4efadb
Initial add of Kusto related file changes.
logomachic Aug 7, 2024
da4ca97
Drop & Remove db query split due to syntax error
logomachic Aug 7, 2024
c408799
Adding kusto documentation
logomachic Aug 7, 2024
303e9c4
I cleaned up the code I was working on and added TODOs to blocks of f…
logomachic Aug 9, 2024
027988a
Merge pull request #3 from prateejain-linked/KUSTO-1
sirus-ms Aug 21, 2024
16295e5
Add Reading from graphdb
Aug 21, 2024
01fb617
Some modifications to kusto flow.
sirus-ms Aug 21, 2024
c63718f
Merge pull request #4 from prateejain-linked/sirusbr
sirus-ms Aug 21, 2024
f78bd40
Add a skleton for launch.json
sirus-ms Aug 21, 2024
3fd7ada
Merge pull request #5 from prateejain-linked/sirusbr
sirus-ms Aug 21, 2024
be00a93
Fix a typo
sirus-ms Aug 21, 2024
73306be
Merge pull request #6 from prateejain-linked/sirusbr
sirus-ms Aug 21, 2024
0918186
Creating entities table in Kusto.
logomachic Aug 21, 2024
f40f101
Merge pull request #7 from prateejain-linked/KUSTO-2
sirus-ms Aug 21, 2024
a6b0689
Minor fixes
dabbcomputers Aug 21, 2024
6837a01
Merge pull request #8 from prateejain-linked/users/amrsing/MinorFixes…
sirus-ms Aug 21, 2024
116f524
minor updates: lancedb style entities
sirus-ms Aug 22, 2024
8dc4181
Merge pull request #9 from prateejain-linked/sirusbr
sirus-ms Aug 22, 2024
83798a4
syntax
sirus-ms Aug 22, 2024
2f40f69
Merge pull request #10 from prateejain-linked/sirusbr
sirus-ms Aug 22, 2024
02b3a95
Writing edges and vertices into graphdb
Aug 20, 2024
d79fa7b
Add Reading from graphdb
Aug 21, 2024
3d2be1c
Integrating with latest PR
Aug 21, 2024
16c4759
Correct file path in text files
Aug 22, 2024
d6e2645
Resolve merge conflicts
Aug 22, 2024
2c9edd7
Resolve final merge conflict
Aug 22, 2024
3fc8f8d
Merge pull request #11 from prateejain-linked/users/gbarrnsnchez/grap…
sirus-ms Aug 22, 2024
b9fbb6a
merge final_entities and final_nodes
sirus-ms Aug 22, 2024
95f8a20
Merge pull request #12 from prateejain-linked/sirusbr
sirus-ms Aug 22, 2024
621ea11
Add config parameters for graphdb
Aug 22, 2024
cae28b3
Add default values in graphrag/index/init_content.py
Aug 23, 2024
a151172
Merge pull request #13 from prateejain-linked/users/gbarrnsnchez/add_…
gbarroutlook Aug 23, 2024
d5a8e7c
Moving Pipeline Storage to Common + Export query artifacts
dabbcomputers Aug 23, 2024
8e20637
Add missing config file
Aug 23, 2024
b4d3817
Merge pull request #15 from prateejain-linked/users/gbarrnsnchez/add_…
gbarroutlook Aug 23, 2024
1f95114
Merge branch 'main' of https://github.com/prateejain-linked/graphrag …
dabbcomputers Aug 23, 2024
c46d71c
Small fixes to incline with GraphDBClient
dabbcomputers Aug 23, 2024
b9dff03
saving changes
dabbcomputers Aug 24, 2024
1088da6
Fixing minor issue in main branch
dabbcomputers Aug 24, 2024
50fa5a9
Merge pull request #16 from prateejain-linked/users/amrsing/BringingM…
amritpalms Aug 24, 2024
079c034
Only using Kusto store for all entities.
logomachic Aug 26, 2024
5d9f27a
Merge pull request #17 from prateejain-linked/KUSTO-3
sirus-ms Aug 26, 2024
f6a8f63
Merge branch 'main' of https://github.com/prateejain-linked/graphrag …
dabbcomputers Aug 26, 2024
c2f14b6
Merge branch 'users/amrsing/ExportQueryOutputArtifacts' of https://gi…
dabbcomputers Aug 26, 2024
28594c6
Include context into read and write calls for graphdb
Aug 26, 2024
aefa924
Using create_final_entities table rather than entity_description_embe…
logomachic Aug 26, 2024
8a1d17c
Optimized Search
dabbcomputers Aug 27, 2024
3e484e0
Merge pull request #18 from prateejain-linked/KUSTO-4
sirus-ms Aug 27, 2024
1c21376
Include context into read and write calls for graphdb
Aug 26, 2024
ca0dcc9
Changing similarity search (query to entity embedding search) to use …
logomachic Aug 27, 2024
a87071d
Working changes
dabbcomputers Aug 27, 2024
e51f325
Merge pull request #14 from prateejain-linked/users/amrsing/ExportQue…
amritpalms Aug 27, 2024
c0d628c
Merge pull request #19 from prateejain-linked/COSINE-VECTOR16
sirus-ms Aug 28, 2024
8bc2c71
Kusto minor edits
sirus-ms Aug 28, 2024
13eb167
Merge pull request #20 from prateejain-linked/sirusbr
logomachic Aug 28, 2024
1927e99
Merging Kusto local search into local search
logomachic Aug 28, 2024
38449c3
Merge pull request #21 from prateejain-linked/MERGE-KUSTO
sirus-ms Aug 28, 2024
0039b0d
Fixing lancedb from Sirus suggestion
logomachic Aug 28, 2024
cadd7e3
Merge pull request #23 from prateejain-linked/MERGE-KUSTO
sirus-ms Aug 28, 2024
16a5766
Add functionality for context graph creation
Aug 29, 2024
fc79c32
Solve merge conflicts
Aug 29, 2024
b1d4f22
Kusto context-switch
sirus-ms Aug 30, 2024
05e721c
Include context into read and write calls for graphdb
Aug 26, 2024
689b983
Add functionality for context graph creation
Aug 29, 2024
628a4c2
Address comments
Aug 30, 2024
b7749b1
Fix conflicts
Aug 30, 2024
0e5a4b9
Merge pull request #24 from prateejain-linked/users/gbarrnsnchez/add_…
gbarroutlook Aug 30, 2024
fd76db5
Adding community reports to Kusto
logomachic Aug 30, 2024
aa6095b
Adding configurations use_kusto_community_reports, updating - to _ co…
logomachic Aug 30, 2024
98ff552
Merge pull request #26 from prateejain-linked/COMMUNITY_REPORTS
sirus-ms Sep 3, 2024
3efb530
Seperate out setup so that context switcher can call setup & load sep…
logomachic Sep 3, 2024
4eb4ede
Merge pull request #27 from prateejain-linked/MULTI_QUERY
sirus-ms Sep 3, 2024
c80510a
Fixed bug where defaults for vector store weren't being set for index…
logomachic Sep 3, 2024
67e1033
Setup for vector store happens once per activation instead of for eve…
logomachic Sep 3, 2024
82a2034
Merge pull request #29 from prateejain-linked/MULTI_QUERY
sirus-ms Sep 3, 2024
c353d68
Arg mismatch & not calling load kusto in query anymore.
logomachic Sep 3, 2024
3ecccdf
Merge pull request #30 from prateejain-linked/MULTI_QUERY
sirus-ms Sep 3, 2024
dae1bee
Adding report_name var to query, load_ doesn't overwrite automaticall…
logomachic Sep 4, 2024
eb6cbe4
Get rid of concat in context_switcher so each file gets uploaded sepe…
logomachic Sep 4, 2024
560bbe1
Configuring in_memory embedding storage even with vector_store config…
logomachic Sep 4, 2024
ed14b6a
Merge pull request #32 from prateejain-linked/MULTI_QUERY
sirus-ms Sep 4, 2024
fa42510
Change entity ID generation
sirus-ms Sep 4, 2024
71263a4
Add graphdb calls directly where relationships are filtered (#31)
gbarroutlook Sep 4, 2024
376578e
Merge pull request #33 from prateejain-linked/fix_ids
sirus-ms Sep 5, 2024
c333009
Adding graphdb into for-loop per data_path of context b/c it should
logomachic Sep 5, 2024
1b66bd3
logs on file & stdout + unbuffered logs
Sep 5, 2024
5521da9
Merge pull request #35 from prateejain-linked/users/amrsing/LogsOnFil…
amritpalms Sep 5, 2024
3604ca2
Merge pull request #34 from prateejain-linked/MINOR_CHANGE
sirus-ms Sep 6, 2024
44374d9
Fix cli when graphdb is not enabled. (#36)
sirus-ms Sep 6, 2024
72a866b
Add graphdb parameters for local emulator support
Sep 6, 2024
2baa18d
incline to run in azure
amritpalms Sep 8, 2024
0e1df62
commenting managed identity code
amritpalms Sep 8, 2024
a92787e
minor fix
amritpalms Sep 8, 2024
4bc605c
Merge pull request #39 from prateejain-linked/users/amrsing/RagFixesF…
sirus-ms Sep 9, 2024
beb0285
Merge pull request #37 from prateejain-linked/users/gbarrnsnchez/add_…
sirus-ms Sep 9, 2024
6a13d55
fix kusto cli
sirus-ms Sep 9, 2024
dc4a1fb
Merge pull request #40 from prateejain-linked/query_cli_kusto2
gbarroutlook Sep 9, 2024
25aad8c
fix legacy
sirus-ms Sep 10, 2024
51c7720
Implement deactivation switch
sirus-ms Sep 10, 2024
8b79c00
Merge pull request #41 from prateejain-linked/query_cli_kusto3
sirus-ms Sep 10, 2024
4d35524
Merge pull request #42 from prateejain-linked/ctx_switch_deact
sirus-ms Sep 10, 2024
544f4b7
Add deactivation switch for graphdb
sirus-ms Sep 10, 2024
52ec890
Merge pull request #43 from prateejain-linked/ctx_switch_deact_gdb
sirus-ms Sep 10, 2024
a50c65c
Allowing multiple files to be indexed.
logomachic Sep 12, 2024
7232a68
Initial code for the different query paths.
logomachic Sep 16, 2024
5afc137
Merge pull request #44 from prateejain-linked/MULTI_FILE
sirus-ms Sep 16, 2024
fd8ac3f
Merge pull request #45 from prateejain-linked/PATHS
sirus-ms Sep 16, 2024
38254ef
Add missing args
sirus-ms Sep 16, 2024
b5cbe43
Update __main__.py paths type
logomachic Sep 17, 2024
62a3495
Add text units to kusto
sirus-ms Sep 17, 2024
9a03893
Text units 2
sirus-ms Sep 17, 2024
0f8c7c4
Merge pull request #47 from prateejain-linked/txt_units
sirus-ms Sep 17, 2024
951e847
minor fix
sirus-ms Sep 17, 2024
175b075
Minor fix
sirus-ms Sep 18, 2024
58d30de
Graphrag using Azure OpenAI uses Managed Identity when no API_KEY pre…
logomachic Sep 19, 2024
1f6a49c
Query & Embedding Manged Identities changes.
logomachic Sep 19, 2024
43806a2
Added the func app compatible code
Sep 23, 2024
7a87843
Added one req for windows local debug
Sep 23, 2024
3c1781a
Removing the redundant settings.yaml
Sep 23, 2024
05ee3b9
Added the func app compatible code (#50)
prateejain-linked Sep 23, 2024
2f32844
added the local settings file forcefully
Sep 23, 2024
1db67f9
Merge branch 'main' of https://github.com/prateejain-linked/graphrag …
Sep 23, 2024
cdad208
Update local.settings.json
logomachic Sep 23, 2024
d8cb19f
Merge pull request #51 from prateejain-linked/user/prateejain/graphra…
sirus-ms Sep 23, 2024
08d80b8
config for debugger
amritpalms Sep 24, 2024
569a5a2
Merge pull request #53 from prateejain-linked/users/amrsing/WorkingDe…
amritpalms Sep 24, 2024
984fa0d
DefaultAuthCredes for llm
amritpalms Sep 24, 2024
81f0bca
fix for browser not opening
amritpalms Sep 24, 2024
3 changes: 2 additions & 1 deletion .gitignore
@@ -65,4 +65,5 @@ __blobstorage__/
 ragtest/
 .ragtest/
 .pipelines
-.pipeline
+.pipeline
+ragtest*/
25 changes: 14 additions & 11 deletions .vscode/launch.json
@@ -1,12 +1,15 @@
 {
-  "version": "0.2.0",
-  "configurations": [
-    {
-      "name": "Attach to Node Functions",
-      "type": "node",
-      "request": "attach",
-      "port": 9229,
-      "preLaunchTask": "func: host start"
-    }
-  ]
-}
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "<define a name here>",
+            "type": "debugpy",
+            "python": "<Path to a python interpreter of your choice>",
+            "request": "launch",
+            "cwd": "${workspaceFolder}",
+            "module": "poetry",
+            "args": ["poe", "<command here: index or query?>", "other args"],
+            "stopOnEntry": false
+        }
+    ]
+}
4 changes: 4 additions & 0 deletions README.md
@@ -69,3 +69,7 @@ Any use of third-party trademarks or logos are subject to those third-party's po
## Privacy

[Microsoft Privacy Statement](https://privacy.microsoft.com/en-us/privacystatement)


## Updates
- Added a new setting: query_context -> files, a list of files, e.g. [file1, file2, file3]
158 changes: 158 additions & 0 deletions common/graph_db_client.py
@@ -0,0 +1,158 @@
import ast
import json
import os  # only needed by the commented-out managed identity path below
import time

import numpy as np
import pandas as pd
from azure.identity import ManagedIdentityCredential  # only needed by the commented-out path below
from gremlin_python.driver import client, serializer

from graphrag.config.models.graphdb_config import GraphDBConfig

# Azure Cosmos DB Gremlin Endpoint and other constants
COSMOS_DB_SCOPE = "https://cosmos.azure.com/.default"  # The scope for Cosmos DB

# Properties that are stored as stringified lists and parsed back when reading.
LIST_PROPERTIES = ("description_embedding", "text_unit_ids", "graph_embedding")


class GraphDBClient:
    def __init__(self, graph_db_params: GraphDBConfig | None, context_id: str | None):
        # Both arguments are dereferenced below, so fail early with a clear message.
        if graph_db_params is None or context_id is None:
            raise ValueError("graph_db_params and context_id must be provided")
        self.username_prefix = graph_db_params.username
        token = f"{graph_db_params.account_key}"
        # if(os.environ.get("ENVIRONMENT") == "AZURE"):
        #     credential = ManagedIdentityCredential(client_id="295ce65c-28c6-4763-be6f-a5eb36c3ceb3")
        #     token = credential.get_token(COSMOS_DB_SCOPE)
        self._client = client.Client(
            url=f"{graph_db_params.gremlin_url}",
            traversal_source="g",
            username=self.username_prefix + "-contextid-" + context_id,
            password=token,
            message_serializer=serializer.GraphSONSerializersV2d0(),
        )

    def result_to_df(self, result) -> pd.DataFrame:
        """Flatten Gremlin results into a DataFrame with one row per element."""
        json_data = []
        # Iterating a gremlin_python result set yields batches (lists) of
        # elements, so walk every element instead of only the first per batch.
        for batch in result:
            for json_row in batch:
                # Elements without properties carry no 'properties' key.
                properties_dict = json_row.pop("properties", {})
                formatted_properties = {}
                for k, v in properties_dict.items():
                    new_val = v
                    # Vertex properties arrive wrapped as [{'id': ..., 'value': ...}].
                    if isinstance(v, list) and v and isinstance(v[0], dict):
                        new_val = v[0]["value"]
                    # List-valued properties were written as strings; parse them back.
                    if k in LIST_PROPERTIES and isinstance(new_val, str):
                        new_val = ast.literal_eval(new_val)
                    if isinstance(new_val, list):
                        new_val = np.array(new_val)
                    formatted_properties[k] = new_val
                json_row.update(formatted_properties)
                json_data.append(json_row)
        return pd.DataFrame(json_data)

    def remove_graph(self):
        self._client.submit(message="g.V().drop()")

    def query_vertices(self, context_id: str) -> pd.DataFrame:
        # context_id is currently unused; the context is selected via the username in __init__.
        result = self._client.submit(message="g.V()")
        return self.result_to_df(result)

    def query_edges(self, context_id: str) -> pd.DataFrame:
        # context_id is currently unused; the context is selected via the username in __init__.
        result = self._client.submit(message="g.E()")
        return self.result_to_df(result)

    def element_exists(self, element_type: str, element_id: int, conditions: str = "") -> bool:
        result = self._client.submit(
            message=(element_type + ".has('id',prop_id)" + conditions + ".count()"),
            bindings={"prop_id": element_id},
        )
        element_count = 0
        for counts in result:
            element_count = counts[0]
        return element_count > 0

    def write_vertices(self, data: pd.DataFrame) -> None:
        for row in data.itertuples():
            # Skip vertices that already exist so re-runs do not create duplicates.
            if self.element_exists("g.V()", row.id):
                continue
            self._client.submit(
                message=(
                    "g.addV('entity')"
                    ".property('id', prop_id)"
                    ".property('name', prop_name)"
                    ".property('type', prop_type)"
                    # Fixed: this was the quoted literal 'prop_description', which
                    # stored that string instead of the bound description.
                    ".property('description', prop_description)"
                    ".property('human_readable_id', prop_human_readable_id)"
                    ".property('category', prop_partition_key)"
                    ".property(list,'description_embedding',prop_description_embedding)"
                    ".property(list,'graph_embedding',prop_graph_embedding)"
                    ".property(list,'text_unit_ids',prop_text_unit_ids)"
                ),
                bindings={
                    "prop_id": row.id,
                    "prop_name": row.name,
                    "prop_type": row.type,
                    "prop_description": row.description,
                    "prop_human_readable_id": row.human_readable_id,
                    "prop_partition_key": "entities",
                    "prop_description_embedding": json.dumps(row.description_embedding.tolist() if row.description_embedding is not None else []),
                    "prop_graph_embedding": json.dumps(row.graph_embedding.tolist() if row.graph_embedding is not None else []),
                    "prop_text_unit_ids": json.dumps(row.text_unit_ids.tolist() if row.text_unit_ids is not None else []),
                },
            )
            # Pause after each write (kept from the original change).
            time.sleep(5)

    def write_edges(self, data: pd.DataFrame) -> None:
        for row in data.itertuples():
            # Skip edges that already exist so re-runs do not create duplicates.
            if self.element_exists("g.E()", row.id):
                continue
            self._client.submit(
                message=(
                    "g.V().has('name',prop_source_id)"
                    ".addE('connects')"
                    ".to(g.V().has('name',prop_target_id))"
                    ".property('weight',prop_weight)"
                    ".property(list,'text_unit_ids',prop_text_unit_ids)"
                    ".property('description',prop_description)"
                    ".property('id',prop_id)"
                    ".property('human_readable_id',prop_human_readable_id)"
                    ".property('source_degree',prop_source_degree)"
                    ".property('target_degree',prop_target_degree)"
                    ".property('rank',prop_rank)"
                    ".property('source',prop_source)"
                    ".property('target',prop_target)"
                ),
                bindings={
                    # Note: prop_partition_key is bound but not referenced in the query above.
                    "prop_partition_key": "entities",
                    "prop_source_id": row.source,
                    "prop_target_id": row.target,
                    "prop_weight": row.weight,
                    "prop_text_unit_ids": json.dumps(row.text_unit_ids.tolist() if row.text_unit_ids is not None else []),
                    "prop_description": row.description,
                    "prop_id": row.id,
                    "prop_human_readable_id": row.human_readable_id,
                    "prop_source_degree": row.source_degree,
                    "prop_target_degree": row.target_degree,
                    "prop_rank": row.rank,
                    "prop_source": row.source,
                    "prop_target": row.target,
                },
            )
            # Pause after each write (kept from the original change).
            time.sleep(5)
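Note on the list-valued properties in common/graph_db_client.py: the write path stores `description_embedding`, `graph_embedding` and `text_unit_ids` as strings produced by `json.dumps`, and the read path parses them back with `ast.literal_eval`. The sketch below illustrates that round trip using only the standard library. The helper names `encode_list_property` and `decode_list_property` are hypothetical and not part of this PR; the sample values are made up. It also shows one case where the two parsers may differ: JSON `null` is not a Python literal, so `json.loads` is arguably the safer inverse of `json.dumps`.

```python
import ast
import json


def encode_list_property(values):
    """Serialize a list-valued property to a JSON string (hypothetical helper)."""
    return json.dumps(list(values) if values is not None else [])


def decode_list_property(text):
    """Parse a JSON string back into a Python list (hypothetical helper)."""
    return json.loads(text)


# Round trip for a plain list of floats: both parsers agree.
embedding = [0.25, -1.5, 3.0]
stored = encode_list_property(embedding)
restored_json = decode_list_property(stored)
restored_literal = ast.literal_eval(stored)

# A list containing None serializes to JSON null, which is not a Python literal.
stored_with_null = encode_list_property([1.0, None])
try:
    ast.literal_eval(stored_with_null)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False

print(restored_json == restored_literal, literal_eval_ok)
```

If the stored values can ever contain missing entries, switching the read path to `json.loads` would be worth considering; for plain numeric lists the current behavior is equivalent.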
49 changes: 49 additions & 0 deletions func-app/.gitignore
@@ -0,0 +1,49 @@
bin
obj
csx
.vs
edge
Publish

*.user
*.suo
*.cscfg
*.Cache
project.lock.json

/packages
/TestResults

/tools/NuGet.exe
/App_Data
/secrets
/data
.secrets
appsettings.json
local.settings.json

node_modules
dist

# Local python packages
.python_packages/

# Python Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Azurite artifacts
__blobstorage__
__queuestorage__
__azurite_db*__.json

13 changes: 13 additions & 0 deletions func-app/.vscode/launch.json
@@ -0,0 +1,13 @@
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to Python Functions",
      "type": "python",
      "request": "attach",
      "port": 9091,
      "preLaunchTask": "func: host start",
      "justMyCode": true
    }
  ]
}
8 changes: 8 additions & 0 deletions func-app/.vscode/settings.json
@@ -0,0 +1,8 @@
{
  "azureFunctions.deploySubpath": ".",
  "azureFunctions.scmDoBuildDuringDeployment": true,
  "azureFunctions.pythonVenv": ".venv",
  "azureFunctions.projectLanguage": "Python",
  "azureFunctions.projectRuntime": "~4",
  "debug.internalConsoleOptions": "neverOpen"
}
26 changes: 26 additions & 0 deletions func-app/.vscode/tasks.json
@@ -0,0 +1,26 @@
{
  "version": "2.0.0",
  "tasks": [
    {
      "type": "func",
      "command": "host start",
      "problemMatcher": "$func-python-watch",
      "isBackground": true,
      "dependsOn": "pipInstall"
    },
    {
      "label": "pipInstall",
      "type": "shell",
      "osx": {
        "command": "${config:azureFunctions.pythonVenv}/bin/python -m pip install -r requirements.txt"
      },
      "windows": {
        "command": "${config:azureFunctions.pythonVenv}\\Scripts\\python -m pip install -r requirements.txt"
      },
      "linux": {
        "command": "${config:azureFunctions.pythonVenv}/bin/python -m pip install -r requirements.txt"
      },
      "problemMatcher": []
    }
  ]
}