
[BUG] Unable to download CLIP model for search #4117

Closed
1 of 3 tasks
dankasak opened this issue Sep 17, 2023 · 54 comments · Fixed by #4700

Comments

@dankasak

dankasak commented Sep 17, 2023

Important

🟢 See this comment for a temporary solution 🟢


The bug

When I search for anything in Immich, I get generic errors in the UI. In the docker logs, I can see that something is trying to download the CLIP model (Downloading clip model 'ViT-B-32::openai' ... "This may take a while"), but it fails within about 3 seconds. I've downloaded the file on the host using curl. Can I persist it somewhere for whatever needs it, and if so, where? And why is it failing so quickly?

This seems to be triggered from:
https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/model/clip_onnx.py

c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/nodes/0.c95dfcd6.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/menu-option.36f2860d.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/image-thumbnail.ef5e539c.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/download-action.de99beb0.js
c69e23fa6733 [hooks.server.ts]:handleError Not found: /_app/immutable/chunks/thumbnail.5d0111e5.js
1058d5367490 I20230914 00:44:31.263882 353 raft_server.cpp:546] Term: 8, last_index index: 53064, committed_index: 53064, known_applied_index: 53064, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 848797
1058d5367490 I20230914 00:44:31.263913 442 raft_server.h:60] Peer refresh succeeded!
279bec116ed3 [09/14/23 00:44:33] INFO Downloading clip model 'ViT-B-32::openai'.This may
279bec116ed3 take a while.
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 0th attempt
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 1th attempt
c69e23fa6733 {
279bec116ed3 Failed to download
f9ab2bb52a73 [Nest] 2 - 09/14/2023, 12:44:39 AM ERROR [ExceptionsHandler] Request for clip failed with status 500: Internal Server Error
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
c69e23fa6733 status: 500,
f9ab2bb52a73 Error: Request for clip failed with status 500: Internal Server Error
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
c69e23fa6733 url: 'GET /search?q=tree&clip=true',
279bec116ed3 Satisfiable'> at the 2th attempt
c69e23fa6733 response: { statusCode: 500, message: 'Internal server error' }
f9ab2bb52a73 at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:29:19)
279bec116ed3 textual.onnx 0.0% • 0.0/254.1 MB • ? • 0:00:00
c69e23fa6733 }
f9ab2bb52a73 at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
279bec116ed3
c69e23fa6733 [hooks.server.ts]:handleError Internal server error
f9ab2bb52a73 at async SearchService.search (/usr/src/app/dist/domain/search/search.service.js:114:35)
279bec116ed3 Exception in ASGI application
f9ab2bb52a73 at async /usr/src/app/node_modules/@nestjs/core/router/router-execution-context.js:46:28
279bec116ed3 Traceback (most recent call last):
f9ab2bb52a73 at async /usr/src/app/node_modules/@nestjs/core/router/router-proxy.js:9:17
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
279bec116ed3 result = await app( # type: ignore[func-returns-value]
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
279bec116ed3 return await self.app(scope, receive, send)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
279bec116ed3 await super().__call__(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
279bec116ed3 await self.middleware_stack(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
279bec116ed3 raise exc
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
279bec116ed3 await self.app(scope, receive, _send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
279bec116ed3 raise exc
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
279bec116ed3 await self.app(scope, receive, sender)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
279bec116ed3 raise e
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
279bec116ed3 await self.app(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
279bec116ed3 await route.handle(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
279bec116ed3 await self.app(scope, receive, send)
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
279bec116ed3 response = await func(request)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
279bec116ed3 raw_response = await run_endpoint_function(
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
279bec116ed3 return await dependant.call(**values)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 75, in predict
279bec116ed3 model = await load(await app.state.model_cache.get(model_name, model_type, **kwargs))
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 101, in load
279bec116ed3 await loop.run_in_executor(app.state.thread_pool, _load)
279bec116ed3 File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
279bec116ed3 result = self.fn(*self.args, **self.kwargs)
279bec116ed3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279bec116ed3 File "/usr/src/app/main.py", line 94, in _load
279bec116ed3 model.load()
279bec116ed3 File "/usr/src/app/models/base.py", line 63, in load
279bec116ed3 self.download()
279bec116ed3 File "/usr/src/app/models/base.py", line 58, in download
279bec116ed3 self._download()
279bec116ed3 File "/usr/src/app/models/clip.py", line 51, in _download
279bec116ed3 self._download_model(*models[0])
279bec116ed3 File "/usr/src/app/models/clip.py", line 123, in _download_model
279bec116ed3 download_model(
279bec116ed3 File "/opt/venv/lib/python3.11/site-packages/clip_server/model/pretrained_models.py", line 239, in download_model
279bec116ed3 raise RuntimeError(
279bec116ed3 RuntimeError: Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx within retry limit 3
279bec116ed3 [09/14/23 00:44:39] INFO Downloading clip model 'ViT-B-32::openai'.This may
279bec116ed3 take a while.
1058d5367490 I20230914 00:44:41.235440 354 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
1058d5367490 I20230914 00:44:41.264710 353 raft_server.cpp:546] Term: 8, last_index index: 53064, committed_index: 53064, known_applied_index: 53064, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 848797
1058d5367490 I20230914 00:44:41.264742 442 raft_server.h:60] Peer refresh succeeded!
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 0th attempt
279bec116ed3 Failed to download
279bec116ed3 https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
279bec116ed3 76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 416: 'Requested Range Not
279bec116ed3 Satisfiable'> at the 1th attempt

The OS that Immich Server is running on

Docker

Version of Immich Server

v1.78.0

Version of Immich Mobile App

v1.78.0

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - ${PHOTOPRISM_LOCATION}:/photoprism:ro
    env_file:
      - .env
    depends_on:
      - redis
#      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.yml
      service: hwaccel
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - ${PHOTOPRISM_LOCATION}:/photoprism:ro
    env_file:
      - .env
    depends_on:
      - redis
#      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - ${MODEL_CACHE_LOCATION}:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    volumes:
      - ${TYPESENSE_LOCATION}:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always

#  database:
#    container_name: immich_postgres
#    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
#    env_file:
#      - .env
#    environment:
#      POSTGRES_PASSWORD: ${DB_PASSWORD}
#      POSTGRES_USER: ${DB_USERNAME}
#      POSTGRES_DB: ${DB_DATABASE_NAME}
#    volumes:
#      - ${PG_LOCATION}:/var/lib/postgresql/data
#    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/mnt/array0/immich/uploads
PG_LOCATION=/mnt/array0/immich/postgres
MODEL_CACHE_LOCATION=/mnt/array0/immich/model-cache
TYPESENSE_LOCATION=/mnt/array0/immich/typesense

PHOTOPRISM_LOCATION=/mnt/array0/photoprism/originals

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secrets for postgres and typesense. You should change these to random passwords
TYPESENSE_API_KEY=blah-bliggedy-blah
DB_PASSWORD=********

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=192.168.1.128
DB_USERNAME=immich
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis

Reproduction steps

1. Search for anything in Immich.

Additional information

A search in the UI triggers a model download via https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/model/clip_onnx.py, which fails almost immediately.

@alextran1502
Contributor

Hello, is this a clean instance, or had this instance been using the search mechanism successfully before?

Please try to remove the volume for model-cache and try again
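On a compose setup like the one in this issue, that is roughly the following (service and path names are taken from the compose/.env above; a default install uses a named volume instead, shown commented out):

docker compose stop immich-machine-learning
rm -rf /mnt/array0/immich/model-cache/*        # the bind-mounted MODEL_CACHE_LOCATION
# docker volume rm immich_model-cache          # for a default install with a named volume
docker compose up -d immich-machine-learning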

@dankasak
Author

Hi! Thanks for responding.

Hello, is this a clean instance

Yes - all I've done so far is a batch import via the CLI and a quick play around.

or this instance had been successfully using the search mechanism before?

No, it's never worked.

Please try to remove the volume for model-cache and try again

I backed up the model-cache directory, then created a fresh one and restarted everything. It still fails; however, I see the HTTP error code has changed from:

<HTTPError 416: 'Requested Range Not Satisfiable'>
... to:
<HTTPError 403: 'Forbidden'>

I've tried with curl, and I can see that curl now concurs - I'm now not able to download this file manually. I still have the one I previously downloaded.

I also tried commenting out the model-cache volume stuff - same error as above.
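For reference, a probe like this shows the bucket's response immediately (URL copied from the log above; -I requests headers only, -f makes curl fail on HTTP errors):

curl -fI https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx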

@shdwlkr

shdwlkr commented Sep 18, 2023

hey there!
same problem here with 403

RuntimeError: Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx within retry limit 3
Failed to download
https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 403: 'Forbidden'> at the 0th
attempt
Failed to download
https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d536572
76696365/onnx/ViT-B-32/textual.onnx with <HTTPError 403: 'Forbidden'> at the 1th

@GJCav

GJCav commented Sep 18, 2023

Sadly, same problem for me.

Opening the link https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx in Edge, it responds with:

[screenshot of the error response]

I tested from mainland China. The problem may be related to the GFW, or it may simply be that AWS blocks all Chinese IPs.

@Core447

Core447 commented Sep 18, 2023

I have exactly the same problem on a new install

@shdwlkr

shdwlkr commented Sep 18, 2023

IMHO this is not an Immich issue; it looks like there is a problem with clip-as-service:
jina-ai/clip-as-service#931

@ghunkins

Having this issue as well; it does not appear to be geographic.

@mouie

mouie commented Sep 19, 2023

Unfortunately, same issue for me with smart search. Metadata search (i.e., prefixing with 'm:') still works as expected.

Relevant immich-machine-learning logs (at the risk of sounding like a broken record):

[09/19/23 11:41:40] INFO Downloading clip model 'ViT-B-32::openai'. This may take a while.

Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx with <HTTPError 403: 'Forbidden'> at the 0th attempt

Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx with <HTTPError 403: 'Forbidden'> at the 1th attempt

Failed to download https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx with <HTTPError 403: 'Forbidden'> at the 2th attempt

textual.onnx 0.0% • 0/100 bytes • ? • -:--:--

@Teonyr

Teonyr commented Sep 20, 2023

Since this issue still exists and seems to affect more and more people (new installs can't download the CLIP model at all), some sort of fallback should be implemented to cover such situations in the future.

Maybe a fallback model could be distributed that would spring into action if no new (or updated) model can be downloaded?

I don't think it makes sense for a local and especially vital function like search to fail just because some external service is unavailable.

@alextran1502
Contributor

@mertalev do you have any thoughts on this issue?

@wechsler42

wechsler42 commented Sep 20, 2023

Hi there,
with a new install of Immich v1.78.1 I ran into the same issue with smart search. Metadata search (i.e., prefixing with 'm:') still works as expected. The <HTTPError 403: 'Forbidden'> is present in the logs.

@mertalev
Contributor

Hmm, it might be better to use the models provided by Marqo instead, since they're hosted on HF. This would give faster download speeds as well.
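As a very rough sketch of what pulling a model file from the HF hub could look like (the repo id and filename below are placeholders, not the actual Marqo repos):

from huggingface_hub import hf_hub_download

# placeholder repo/filename; the real Marqo ONNX repos would go here
path = hf_hub_download(
    repo_id="Marqo/placeholder-clip-vit-b-32-onnx",
    filename="textual.onnx",
    cache_dir="/cache/clip",  # the model cache mount used by the ML container
)
print(path)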

@alextran1502
Contributor

@mertalev can we change that and make this model the default?

@mertalev
Contributor

I can look at it later today. Marqo uses a different naming scheme, so I'd need to map it to the same cache folder names we use to avoid duplicating models, and also migrate the model name in the system config. It shouldn't be too much work outside of that.
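Purely to illustrate the mapping idea (the names below are hypothetical, not the final scheme):

# hypothetical sketch: map a Marqo/open_clip-style model id onto the cache
# folder name Immich already uses, so existing downloads are not duplicated
MARQO_TO_CACHE = {
    "open_clip/ViT-B-32/openai": "ViT-B-32__openai",  # folder name seen elsewhere in this thread
}

def cache_folder(model_id: str) -> str:
    return MARQO_TO_CACHE.get(model_id, model_id.replace("/", "__"))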

@PhilippWoelfel

Is there any workaround for those who don't have the file in cache?

@PhilippWoelfel

Thank you for the quick response. I tried using the Marqo models, but loading the model causes an error in the ML component. Not sure if it's a different bug?

Sep 20 19:25:33 nixi systemd[1]: Started docker-immich-machine-learning.service.
Sep 20 19:25:34 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:34] INFO     Starting gunicorn 21.2.0
Sep 20 19:25:34 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:34] INFO     Listening at: http://0.0.0.0:3003 (9)
Sep 20 19:25:34 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:34] INFO     Using worker: uvicorn.workers.UvicornWorker
Sep 20 19:25:34 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:34] INFO     Booting worker with pid: 10
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:43] INFO     Created in-memory cache with unloading disabled.
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:43] INFO     Initialized request thread pool with 8 threads.
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: [09/20/23 19:25:43] INFO     Loading clip model 'ViT-B-32::openai'
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: Exception in ASGI application
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: Traceback (most recent call last):
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     result = await app(  # type: ignore[func-returns-value]
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     return await self.app(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await super().__call__(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await self.middleware_stack(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     raise exc
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await self.app(scope, receive, _send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     raise exc
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await self.app(scope, receive, sender)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     raise e
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await self.app(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await route.handle(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     await self.app(scope, receive, send)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     response = await func(request)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:                ^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     raw_response = await run_endpoint_function(
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     return await dependant.call(**values)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/usr/src/app/main.py", line 77, in predict
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     outputs = await run(model, inputs)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:               ^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/usr/src/app/main.py", line 85, in run
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     return await asyncio.get_running_loop().run_in_executor(app.state.thread_pool, model.predict, inputs)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     result = self.fn(*self.args, **self.kwargs)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/usr/src/app/models/base.py", line 72, in predict
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     return self._predict(inputs)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:            ^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/usr/src/app/models/clip.py", line 101, in _predict
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     outputs = self.text_model.run(self.text_outputs, inputs)
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 213, in run
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     self._validate_input(list(input_feed.keys()))
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:   File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 195, in _validate_input
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]:     raise ValueError(
Sep 20 19:25:43 nixi docker-immich-machine-learning-start[2567551]: ValueError: Required inputs (['input']) are missing from input feed (['input_ids', 'attention_mask']).

@mertalev
Contributor

Oh, that's interesting. Looking at the text model's graph, I think it expects the attention mask to be pre-applied so there's only one input. Looks like I'll need to change the preprocessing.

Jina: [model graph screenshot]

Marqo: [model graph screenshot]
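In onnxruntime terms the difference comes down to the input feed; a rough sketch (input names are taken from the traceback above, shapes and dtypes are assumptions):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("textual.onnx")
token_ids = np.zeros((1, 77), dtype=np.int64)   # tokenized text, padded to CLIP's 77-token context
attention_mask = np.ones_like(token_ids)

# Jina-style export (what the current preprocessing produces): two named inputs
# outputs = session.run(None, {"input_ids": token_ids, "attention_mask": attention_mask})

# Marqo-style export from the error above: a single "input" tensor, so the
# masking has to be handled in preprocessing before this call
outputs = session.run(None, {"input": token_ids})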

@nebulade
Contributor

For the use case of Immich app packages (in our case for Cloudron), would it make sense to fetch and include those files during package/image building? If so, is there a pre-fetch command available somewhere to do this, or a common place to check the URLs and versions of the models that should be fetched?
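One way to do that today would be to bake the files into a derived image at build time; a minimal sketch, assuming curl is available in the base image and that visual.onnx sits next to the textual.onnx URL from the logs above:

FROM ghcr.io/immich-app/immich-machine-learning:release
RUN mkdir -p /cache/clip/ViT-B-32__openai \
 && curl -fL -o /cache/clip/ViT-B-32__openai/textual.onnx \
      https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx \
 && curl -fL -o /cache/clip/ViT-B-32__openai/visual.onnx \
      https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/visual.onnx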

@mouie

mouie commented Sep 21, 2023

PSA - the Jina AI textual.onnx bucket is working again, as is smart search on Immich.

Suspect there will still be good enhancements that come out of this - thank you community!

@Schluggi

Sorry, what is the final fix for this issue?

I tried copying those files, but now I get this error.

@mertalev
Contributor

Since the S3 bucket works again, we haven't needed to make any changes for the time being.

I'm assuming you came to this issue because you're having problems downloading the CLIP model. Can you delete your model-cache docker volume, restart the ML container, and start a CLIP job in the Jobs panel? If this doesn't work, could you share the error logs you get?
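With a stock compose install that is roughly the following (the volume and container names are the usual defaults, adjust to your setup):

docker compose stop immich-machine-learning
docker volume rm immich_model-cache
docker compose up -d immich-machine-learning
# then start the Encode CLIP job from the Jobs panel ("Missing" is enough)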

@Schluggi

Thank you @mertalev, after deleting the volume the search now works.

I guess I have to run the "ENCODE CLIP" job again?

@mertalev
Contributor

You can run a "missing" job for Encode CLIP, but no need to run it on all images.

@Schluggi

Thank you @mertalev. It works :)

@traktuner
Contributor

Hello!
Is the bucket down again?
I'm getting the "Failed to download" error for:

https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-B-32/textual.onnx


@dylangovender

Firstly, just want to say, I installed Immich for the first time this weekend and, wow. The software is amazing and the community is even better! So much support and documentation.

I was also having the issue of not being able to download the models.

What solved it for me was:

sudo su
cd /var/lib/docker/volumes/immich_model-cache/_data/clip/ViT-B-32__openai
wget http://95.216.206.130/clip/ViT-B-32__openai/textual.onnx
wget http://95.216.206.130/clip/ViT-B-32__openai/visual.onnx
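If your model cache is a named docker volume rather than that exact path, you can locate it first, and it's worth checking the file sizes afterwards (textual.onnx should be roughly 254 MB according to the log earlier in this thread):

docker volume inspect immich_model-cache --format '{{ .Mountpoint }}'
ls -lh /var/lib/docker/volumes/immich_model-cache/_data/clip/ViT-B-32__openai/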

@Gecko-with-a-hat

The manual fix also worked for me! Thank you very much.

I would just like to point out that this functionality is still broken on the Immich demo page.

@FelixBuehler

FelixBuehler commented Oct 27, 2023

@dylangovender

Thanks for that! Am I right in assuming that I need to re-run "ENCODE CLIP" and "TAG OBJECTS" on the jobs page?

@dylangovender

dylangovender commented Oct 27, 2023

Hi @FelixBuehler,

Yes, I re-ran "ENCODE CLIP" and search started working for me again.

In my case, it was a new instance of Immich, so that was the first time ENCODE CLIP actually ran at all.

@alextran1502 alextran1502 pinned this issue Oct 27, 2023
@alextran1502 alextran1502 changed the title from "[BUG] Search for anything ==> Immich fails to download textual.onnx" to "[BUG] Unable to download CLIP model for search" Oct 27, 2023
@Mansour-J
Contributor

@KjeldsenDK I am on Windows + Docker Desktop as well.

The way I added it to my model-cache docker volume:

  1. Open Explorer
  2. In the Explorer address bar, type \\wsl$
  3. Based on my Docker Desktop version 4.22.0 (117440), my model cache was located at \\wsl.localhost\docker-desktop-data\version-pack-data\community\docker\volumes\docker_model-cache\_data

@aviv926
Contributor

aviv926 commented Oct 31, 2023

All the links have been fixed and are working again.

@yyyyyyyysssss

@NiklasRosenstein Hello, does this include the microsoft/resnet-50 model?

@yyyyyyyysssss

@alextran1502 Hi, do you have resources for the microsoft/resnet-50 model?

@aviv926
Contributor

aviv926 commented Nov 9, 2023

@alextran1502 Hi, do you have resources for the microsoft/resnet-50 model?

Are you unable to download this model?

@uniform641

I've uploaded my local default models for clip, facial-recognition and image-classification to Google Drive, you can download it from here

After extracting the zip file, you will need to copy these files to the location of your model-cache volume, can typically be found in /var/lib/docker/volumes/<volume-name>/_data

Or you can find that information with

docker volume inspect <model-cache-volume-name>


It seems that the download link is down. Due to network issues I have to download every model manually, but I don't know the file structure of the model-cache folder or the naming rules for the models inside it. Would anyone share the file structure of the model-cache folder? I would appreciate it very much.
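From the wget paths earlier in this thread, I'm guessing the CLIP part looks roughly like the following, but I'm not sure about the rest (the last two entries are just my assumption):

/cache (or the _data directory of the model-cache volume)
  clip/
    ViT-B-32__openai/
      textual.onnx
      visual.onnx
  facial-recognition/
    buffalo_l/            (assumed layout)
  image-classification/   (assumed layout)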

@aviv926
Contributor

aviv926 commented Nov 17, 2023

I've uploaded my local default models for clip, facial-recognition and image-classification to Google Drive, you can download it from here

After extracting the zip file, you will need to copy these files to the location of your model-cache volume, can typically be found in /var/lib/docker/volumes/<volume-name>/_data

Or you can find that information with

docker volume inspect <model-cache-volume-name>


It seems that the download link is down. Due to network issues I have to download every model manually, but I don't know the file structure of the model-cache folder or the naming rules for the models inside it. Would anyone share the file structure of the model-cache folder? I would appreciate it very much.

If you are having network issues while downloading, I would recommend using a free VPN like Proton to bypass the restriction temporarily.

If that doesn't work for you, I can send you the file structure later...

@acios

acios commented Nov 18, 2023

I've uploaded my local default models for clip, facial-recognition and image-classification to Google Drive, you can download it from here

After extracting the zip file, you will need to copy these files to the location of your model-cache volume, can typically be found in /var/lib/docker/volumes/<volume-name>/_data

Or you can find that information with

docker volume inspect <model-cache-volume-name>


I've searched multiple issues and comments and this seems to be a feasible fix for my problem, but the link is down.
Could someone upload another version or provide the file structure so we can download these files manually? Thanks.

@uniform641

I've uploaded my local default models for clip, facial-recognition and image-classification to Google Drive, you can download it from here
After extracting the zip file, you will need to copy these files to the location of your model-cache volume, can typically be found in /var/lib/docker/volumes/<volume-name>/_data
Or you can find that information with

docker volume inspect <model-cache-volume-name>


It seems that the download link is down. Due to network issues I have to download every model manually, but I don't know the file structure of the model-cache folder or the naming rules for the models inside it. Would anyone share the file structure of the model-cache folder? I would appreciate it very much.

If you are having network issues while downloading, I would recommend using a free VPN like Proton to bypass the restriction temporarily.

If that doesn't work for you, I can send you the file structure later...

Thanks for your advice. To solve the problem permanently, I managed to set up a transparent proxy (tproxy) on the server.

@acios

acios commented Nov 18, 2023

I checked my logs and I'm having trouble downloading all of the models needed by machine learning:

[11/18/23 11:03:16] INFO Initialized request thread pool with 8 threads.
[11/18/23 11:03:16] INFO Downloading facial recognition model
'buffalo_l'.This may take a while.
[11/18/23 11:05:27] INFO Downloading facial recognition model
'buffalo_l'.This may take a while.
[11/18/23 11:05:27] WARNING Failed to load facial-recognition model
'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:05:27] WARNING Attempted to clear cache for model 'buffalo_l' but
cache directory does not exist.
[11/18/23 11:07:38] INFO Downloading clip model 'ViT-B-32__openai'.This may
take a while.
[11/18/23 11:07:38] WARNING Failed to load facial-recognition model
'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:07:38] WARNING Attempted to clear cache for model 'buffalo_l' but
cache directory does not exist.
[11/18/23 11:09:49] INFO Downloading image classification model
'microsoft/resnet-50'.This may take a while.
[11/18/23 11:09:49] WARNING Failed to load clip model
'ViT-B-32__openai'.Clearing cache and retrying.
[11/18/23 11:09:49] INFO Cleared cache directory for model
'ViT-B-32__openai'.

I don't know if it is a connection issue or what; it seems like the program failed to even create the folders to save those files, not only for buffalo_l but also for the models mentioned in the comments above. I thought manually putting those files in the cache folder might help, but it did not work, probably because I put them in the wrong place?

@aviv926
Contributor

aviv926 commented Nov 18, 2023

I checked my logs and I'm having trouble downloading all of the models needed by machine learning:

[11/18/23 11:03:16] INFO Initialized request thread pool with 8 threads.
[11/18/23 11:03:16] INFO Downloading facial recognition model 'buffalo_l'.This may take a while.
[11/18/23 11:05:27] INFO Downloading facial recognition model 'buffalo_l'.This may take a while.
[11/18/23 11:05:27] WARNING Failed to load facial-recognition model 'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:05:27] WARNING Attempted to clear cache for model 'buffalo_l' but cache directory does not exist.
[11/18/23 11:07:38] INFO Downloading clip model 'ViT-B-32__openai'.This may take a while.
[11/18/23 11:07:38] WARNING Failed to load facial-recognition model 'buffalo_l'.Clearing cache and retrying.
[11/18/23 11:07:38] WARNING Attempted to clear cache for model 'buffalo_l' but cache directory does not exist.
[11/18/23 11:09:49] INFO Downloading image classification model 'microsoft/resnet-50'.This may take a while.
[11/18/23 11:09:49] WARNING Failed to load clip model 'ViT-B-32__openai'.Clearing cache and retrying.
[11/18/23 11:09:49] INFO Cleared cache directory for model 'ViT-B-32__openai'.

I don't know if it is a connection issue or what; it seems like the program failed to even create the folders to save those files, not only for buffalo_l but also for the models mentioned in the comments above. I thought manually putting those files in the cache folder might help, but it did not work, probably because I put them in the wrong place?

What about permissions? Do you have permission to access the folder?
Please give more details about your system and your YML file.
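A few commands that can help check that (the volume name below is an example; use docker volume ls to find yours):

docker volume inspect immich_model-cache --format '{{ .Mountpoint }}'   # where the volume lives on the host
ls -ld /var/lib/docker/volumes/immich_model-cache/_data                 # ownership/permissions of that directory
docker exec immich_machine_learning ls -la /cache                       # what the ML container sees at the cache mount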

@acios

acios commented Nov 19, 2023


I don't know how to check permissions under Docker. I opened a new issue with the details of my YML files, please check:

#5134

Thanks for the help, I'm new to Linux and still learning.
