[BUG] Encode Clips fails #4623

Closed

gvillo opened this issue Oct 24, 2023 · 10 comments


gvillo commented Oct 24, 2023

The bug

The Encode Clips job is failing. I manually copied the ONNX files as mentioned in #4117, but it's still not working; I am getting this error:

Exception in ASGI application
Traceback (most recent call last):
  File "/lsiopy/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/lsiopy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/lsiopy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/lsiopy/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/lsiopy/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/lsiopy/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/immich/machine-learning/app/main.py", line 77, in predict
    outputs = await run(model, inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/immich/machine-learning/app/main.py", line 85, in run
    return await asyncio.get_running_loop().run_in_executor(app.state.thread_pool, model.predict, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/immich/machine-learning/app/models/base.py", line 72, in predict
    return self._predict(inputs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/immich/machine-learning/app/models/clip.py", line 84, in _predict
    outputs = self.vision_model.run(self.vision_outputs, {"pixel_values": pixel_values})
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 213, in run
    self._validate_input(list(input_feed.keys()))
  File "/lsiopy/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 195, in _validate_input
    raise ValueError(
ValueError: Required inputs (['input']) are missing from input feed (['pixel_values']).
[Nest] 745  - 10/24/2023, 3:33:42 PM   ERROR [JobService] Unable to run job handler (clipEncoding/clip-encode): Error: Request for clip failed with status 500: Internal Server Error
[Nest] 745  - 10/24/2023, 3:33:42 PM   ERROR [JobService] Error: Request for clip failed with status 500: Internal Server Error
    at MachineLearningRepository.post (/app/immich/server/dist/infra/repositories/machine-learning.repository.js:29:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async SmartInfoService.handleEncodeClip (/app/immich/server/dist/domain/smart-info/smart-info.service.js:118:31)
    at async /app/immich/server/dist/domain/job/job.service.js:108:37
    at async Worker.processJob (/app/immich/server/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/app/immich/server/node_modules/bullmq/dist/cjs/classes/worker.js:535:24)
[Nest] 745  - 10/24/2023, 3:33:42 PM   ERROR [JobService] Object:
{
  "id": "84472909-77aa-449a-970e-b55cd4f86c77",
  "source": "upload"
}
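
For context on what this error means: onnxruntime raises that ValueError when the input names fed at run time don't match the names declared inside the ONNX file itself — here Immich's clip.py feeds "pixel_values", but the model on disk declares an input named "input". Below is a minimal sketch for checking what a downloaded file actually declares; the path and file name are just examples matching the cache folder from this issue (not something Immich ships), and it assumes onnxruntime is importable wherever you run it:

```python
# Minimal sketch: print the input/output names an ONNX file declares.
# The path below is an example based on the cache folder described in this issue.
import onnxruntime as ort

session = ort.InferenceSession(
    "/config/machine-learning/clip/ViT-B-32__openai/visual.onnx",  # example path
    providers=["CPUExecutionProvider"],
)

print("inputs: ", [i.name for i in session.get_inputs()])   # clip.py feeds "pixel_values"
print("outputs:", [o.name for o in session.get_outputs()])
# If this prints inputs: ['input'], the file is not the one the code expects,
# which matches the ValueError above.
```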

The OS that Immich Server is running on

Docker

Version of Immich Server

v1.82.1

Version of Immich Mobile App

NA

Platform with the issue

- [x] Server
- [ ] Web
- [ ] Mobile

Your docker-compose.yml content

I am using https://cosmos-cloud.io/. I couldn't find it quickly; I'll try to get this ASAP.

Your .env content

DB_DATABASE_NAME=xxxxx
DB_HOSTNAME=Immich-postgres
DB_PASSWORD=xxxxxxxxxxxxx
DB_USERNAME=xxxxxxxx
HOME=/root
IMMICH_MACHINE_LEARNING_URL=http://127.0.0.1:3003
IMMICH_MEDIA_LOCATION=/photos
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
MACHINE_LEARNING_CACHE_FOLDER=/config/machine-learning
NODE_ENV=production
PATH=/lsiopy/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PUBLIC_IMMICH_SERVER_URL=http://127.0.0.1:3001
REDIS_HOSTNAME=Immich-redis
REDIS_PASSWORD=xxxxxxxxxxxxxxxxxxx
S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0
S6_STAGE2_HOOK=/docker-mods
S6_VERBOSITY=1
TERM=xterm
TRANSFORMERS_CACHE=/config/machine-learning
TYPESENSE_API_KEY=xyz
TYPESENSE_DATA_DIR=/config/typesense
TYPESENSE_HOST=127.0.0.1
TYPESENSE_VERSION=0.24.1
VIRTUAL_ENV=/lsiopy


Reproduction steps

1. Start encode clips job
2. Check logs



Additional information

_No response_

alextran1502 commented Oct 24, 2023

Can you describe the exact folder that you copied the model into?


gvillo commented Oct 24, 2023

I copied the model files into /config/machine-learning/clip/ViT-B-32__openai; /config/machine-learning is the cache folder inside the container. There were no files under the clip/ViT-B-32__openai folder before. I downloaded those files from here:

[screenshot]


gvillo commented Oct 24, 2023

As we speak I am downloading the files from the latest links in #4117 (it's an IP address, and it's very slow) to test whether those files are the issue (I don't think so).

alextran1502 commented Oct 24, 2023

I think the files aren't placed in the correct folder where the machine learning service is supposed to look. You are not using a standard setup, so I don't have much knowledge here.


gvillo commented Oct 24, 2023

Yeah, I know. This is the setup that cosmos-cloud creates: the machine learning cache folder is defined in the .env file inside the same container instead of in an external volume. That's the only difference.


alextran1502 commented Oct 24, 2023

Hmm, I don't think the error comes from missing model files. Does it fail on one asset or on all assets that get uploaded?


gvillo commented Oct 24, 2023

I think on all assets, or at least I can confirm many; I am seeing a lot of errors in the log :(. I have more than 10k assets in my library now.

alextran1502 commented Oct 24, 2023

Let me ask the ML expert. @mertalev, do you have any thoughts on this?


mertalev commented Oct 24, 2023

The error looks like the one you get if you follow my earlier comment about downloading the models from Hugging Face. That actually doesn't work, since those models turned out to be very slightly different. I had left that comment up with a disclaimer for posterity, but I just deleted it to avoid confusion.

If that's the case, the bottom of the thread is where you'll find the right models.
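
If it helps, here's a rough sanity check (not an official tool; it assumes onnxruntime is importable and that the cache path from the .env above is correct): after swapping in the files from that thread, print the input names each .onnx file in the cache folder declares, and confirm the visual model now lists "pixel_values" before re-running the Encode Clips job.

```python
# Rough sanity check: list the declared input names of every .onnx file
# under the CLIP cache folder. Path is taken from this issue's .env; adjust as needed.
from pathlib import Path
import onnxruntime as ort

cache = Path("/config/machine-learning/clip/ViT-B-32__openai")
for model_path in sorted(cache.rglob("*.onnx")):
    session = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
    print(model_path.name, "->", [i.name for i in session.get_inputs()])
```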


gvillo commented Oct 24, 2023

Thank you all for your replies! After downloading the files for two hours (the download was very slow today from that server), I can confirm they are different and it works OK now!

gvillo closed this as completed Oct 24, 2023