Algolia search pagination #2243

Open
3 tasks done
Do1e opened this issue Nov 26, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

Do1e commented Nov 26, 2024

Clear and concise description of the problem

The free Algolia plan limits each record to 10 KB, but posts occasionally exceed that size.

Suggested solution

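One workable solution is to split a post into chunks and upload each chunk as its own record. Algolia indexes records by objectID, so I changed the objectID and split the content into chunks.

Python chunking example: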

import requests

# Assumed byte budget for the text field; MAXSIZE is not defined in the original snippet.
MAXSIZE = 8000

url = "http://127.0.0.1:2333/api/v2/search/algolia/import-json"
headers = {
    "Authorization": "xxxxxxxxx",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0",
}

# Fetch the records that mx-space would otherwise push to Algolia.
ret = requests.get(url, headers=headers)
records = ret.json()

to_push = []

for item in records:
    encoded = item["text"].encode("utf-8")
    if len(encoded) > MAXSIZE:
        print("Too large, splitting")
        # cut_content is the author's helper (not shown in the issue); it is
        # expected to return a list of text chunks that each fit the limit.
        for i, chunk in enumerate(cut_content(encoded)):
            t = item.copy()
            t["text"] = chunk
            # Give each chunk its own objectID so Algolia stores them separately.
            t["objectID"] = f'{item["objectID"]}_{i}'
            to_push.append(t)
    else:
        to_push.append(item)
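
The snippet above calls cut_content without showing it. A minimal sketch of what such a helper might look like follows; the name comes from the issue, but the body, the max_size parameter, and its default of 8000 bytes are assumptions made here for illustration. It splits the UTF-8 bytes at character boundaries so that no multi-byte character (common in Chinese text) is cut in half.

```python
from typing import List


def cut_content(encoded: bytes, max_size: int = 8000) -> List[str]:
    """Split UTF-8 bytes into decoded chunks of at most max_size bytes each.

    Hypothetical helper: max_size is an assumed budget that leaves headroom
    for the other fields of the record.
    """
    if max_size < 4:
        # A single UTF-8 character can occupy up to 4 bytes.
        raise ValueError("max_size must be at least 4 bytes")
    chunks = []
    start = 0
    while start < len(encoded):
        end = min(start + max_size, len(encoded))
        # Back up while `end` points at a continuation byte (0b10xxxxxx),
        # so the chunk never ends in the middle of a character.
        while end < len(encoded) and (encoded[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(encoded[start:end].decode("utf-8"))
        start = end
    return chunks
```

Joining the returned chunks reproduces the original text, so nothing is lost by splitting.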

However, uploading records with a modified objectID causes mx-space to fail:

 ERROR   [Catch]  Cast to ObjectId failed for value "xxxxxxxxx-1" (type string) at path "_id" for model "posts"

  at SchemaObjectId.cast (/app/entrypoints.js:1073:883)
  at SchemaType.applySetters (/app/entrypoints.js:1187:226)
  at SchemaType.castForQuery (/app/entrypoints.js:1199:338)
  at cast (/app/entrypoints.js:159:5360)
  at Query.cast (/app/entrypoints.js:799:583)
  at Query._castConditions (/app/entrypoints.js:765:9879)
  at pn.Query._findOne (/app/entrypoints.js:768:4304)
  at pn.Query.exec (/app/entrypoints.js:784:5145)
  at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
  at async Promise.all (index 0) 

So the relevant mx-space code would probably need to support this feature.
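
The stack trace suggests that the objectID of each search hit is used directly as the MongoDB _id when looking up the post, so a suffixed ID fails to cast. One way to support chunked records would be to strip the suffix before the lookup and collapse several chunks of the same post into one result. The sketch below shows the idea in Python to match the example above; mx-space itself is not written in Python, and split_object_id and unique_post_ids are hypothetical names, not existing mx-space functions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# A MongoDB ObjectId is 24 hex characters in string form. An optional
# "_<n>" or "-<n>" suffix marks the chunk index (assumed convention).
_CHUNKED_ID = re.compile(r"^([0-9a-fA-F]{24})(?:[_-](\d+))?$")


def split_object_id(object_id: str) -> Tuple[str, Optional[int]]:
    """Return (base ObjectId string, chunk index or None)."""
    match = _CHUNKED_ID.match(object_id)
    if match is None:
        raise ValueError(f"unrecognized objectID: {object_id!r}")
    base, index = match.groups()
    return base, (int(index) if index is not None else None)


def unique_post_ids(object_ids: Iterable[str]) -> List[str]:
    """Collapse chunk IDs to their posts, keeping first-seen order."""
    seen = set()
    result = []
    for object_id in object_ids:
        base, _ = split_object_id(object_id)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result
```

With this in place, the database query would receive only valid 24-character IDs, and a post that matched in several chunks would appear once.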

Alternative

Use a different key, such as 'id', to identify the result.
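
A rough sketch of this alternative: each chunk keeps the original post ID in a separate id field, while objectID only needs to be unique per chunk. Here build_chunk_records is a hypothetical helper, and whether mx-space records already carry an id field is an assumption. Algolia's attributeForDistinct and distinct index settings could then be used to return one hit per post, and the server would read id instead of objectID when loading the post.

```python
from typing import Any, Dict, List


def build_chunk_records(item: Dict[str, Any], chunks: List[str]) -> List[Dict[str, Any]]:
    """Create one record per chunk, keeping the post ID in 'id' (assumed field)."""
    records = []
    for i, chunk in enumerate(chunks):
        record = dict(item)
        record["id"] = item["objectID"]  # original post ID, shared by all chunks
        record["objectID"] = f'{item["objectID"]}_{i}'  # unique per chunk
        record["text"] = chunk
        records.append(record)
    return records


# Index settings that ask Algolia to deduplicate hits sharing the same 'id'.
DISTINCT_SETTINGS = {"attributeForDistinct": "id", "distinct": True}
```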

Additional context

No response

Validations

Do1e added the enhancement (New feature or request) label Nov 26, 2024

linear bot commented Nov 26, 2024

YKMdB3p commented Nov 27, 2024

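The code actually does truncate records that exceed 10 KB, but the truncation logic is flawed, so the result still ends up over the size limit. Either fixing the code in search.service.ts or simply dropping the text field would solve the problem. I don't use the text field as a search criterion, and I don't recommend doing so.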
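
One way such truncation can overshoot (not necessarily the actual bug in search.service.ts) is by measuring the text in characters rather than bytes, or by ignoring the other fields of the record. The sketch below truncates the text field until the whole JSON-serialized record fits a byte limit. The function names and the 10,000-byte default are assumptions, and the serialized JSON size is only an approximation of how Algolia measures a record.

```python
import json
from typing import Any, Dict


def record_size(record: Dict[str, Any]) -> int:
    """Size in bytes of the record when serialized as UTF-8 JSON."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def truncate_text_to_fit(record: Dict[str, Any], limit: int = 10_000) -> Dict[str, Any]:
    """Return a copy whose 'text' is the longest prefix that fits the limit.

    If the record is over the limit even with empty text, the text is
    emptied and the caller still has to handle the oversized record.
    """
    result = dict(record)
    text = result.get("text", "")
    # Binary search on the prefix length in characters; the serialized
    # size grows monotonically with the prefix length.
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        result["text"] = text[:mid]
        if record_size(result) <= limit:
            lo = mid
        else:
            hi = mid - 1
    result["text"] = text[:lo]
    return result
```

Because the check is done on the encoded bytes of the full record, Chinese text (three bytes per character in UTF-8) is handled the same way as ASCII.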


2 participants