Thanks for your work. I have some questions related to training. I tried to train the model with a small portion of the data, both by streaming a shard from the online dataset (for example https://huggingface.co/datasets/imageomics/TreeOfLife-10M/blob/main/dataset/EOL/image_set_01.tar.gz) and by downloading the dataset locally. This is the command I used:
python -m src.training.main \
  --train-data 'https://huggingface.co/datasets/imageomics/TreeOfLife-10M/resolve/main/dataset/EOL/image_set_01.tar.gz' \
  --val-data 'https://huggingface.co/datasets/imageomics/TreeOfLife-10M/resolve/main/dataset/EOL/image_set_01.tar.gz' \
  --dataset-type 'webdataset' \
  --pretrained 'openai' \
  --text_type 'random' \
  --warmup 100 \
  --batch-size 1 \
  --accum-freq 1 \
  --epochs 10 \
  --workers 1 \
  --model ViT-B-16 \
  --lr 1e-4 \
  --log-every-n-steps 1 \
  --dataset-resampled \
  --local-loss \
  --gather-with-grad \
  --grad-checkpointing \
  --logs '../storage/log/' \
  --train-num-samples 98000 \
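One detail worth ruling out: the link quoted in the text uses `/blob/`, while the command uses `/resolve/`. On the Hugging Face Hub, `/blob/` URLs open the HTML file viewer and `/resolve/` URLs serve the raw file, so only the `/resolve/` form is usable as a data source. A minimal sketch of the conversion (the helper name `to_resolve_url` is hypothetical, not part of this repository):

```python
def to_resolve_url(url: str) -> str:
    """Turn a Hub file-viewer URL (/blob/) into a raw-download URL (/resolve/).

    Hypothetical helper for illustration; URLs without /blob/ are returned unchanged.
    """
    return url.replace("/blob/", "/resolve/", 1)


viewer_url = (
    "https://huggingface.co/datasets/imageomics/TreeOfLife-10M/"
    "blob/main/dataset/EOL/image_set_01.tar.gz"
)
print(to_resolve_url(viewer_url))
```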
With this command, training always hangs at the following point:
2024-12-11,23:16:02 | INFO | wandb_notes:
2024-12-11,23:16:02 | INFO | wandb_project_name: open-clip
2024-12-11,23:16:02 | INFO | warmup: 100
2024-12-11,23:16:02 | INFO | wd: 0.2
2024-12-11,23:16:02 | INFO | workers: 1
2024-12-11,23:16:02 | INFO | world_size: 1
2024-12-11,23:16:02 | INFO | zeroshot_frequency: 2
2024-12-11,23:16:02 | INFO | Finish counting shard total size: 98000.
2024-12-11,23:16:02 | INFO | Finish counting shard total size: 0.
2024-12-11,23:16:02 | INFO | Start epoch 0
<webdataset.compat.WebLoader object at 0x719706e3a170>
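To rule out the shard itself as the cause, the locally downloaded copy can be inspected with Python's standard `tarfile` module. The sketch below groups files into samples the way the WebDataset convention does (files that share a basename up to the first dot belong to one sample). The function name `summarize_shard` and the local path in the usage comment are assumptions for illustration; this is not code from the repository.

```python
import os
import tarfile
from collections import defaultdict


def summarize_shard(path):
    """Map each sample key in a tar shard to the set of file extensions it contains.

    Follows the WebDataset naming convention: the key is the file path up to the
    first dot in the basename, and everything after that dot is the extension.
    """
    samples = defaultdict(set)
    # "r:*" lets tarfile detect the compression (plain tar, gzip, ...) on its own.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            dirname, basename = os.path.split(member.name)
            key, _, ext = basename.partition(".")
            samples[os.path.join(dirname, key)].add(ext)
    return dict(samples)


# Usage (assumed local path, adjust to where the shard was downloaded):
# summary = summarize_shard("image_set_01.tar.gz")
# print(len(summary), "samples")
# print(sorted({ext for exts in summary.values() for ext in exts}))
```

If the sample count is zero, or the extensions do not match what the data loader expects, the stall is more likely a data-format problem than a training problem.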
In addition, I found that the file "data/resolved.jsonl" is missing when creating the data with:
python scripts/evobio10m/make_metadata.py --db /fs/ess/PAS2136/open_clip/data/evobio10m-v3.3/mapping.sqlite
Also, the ToL-EDA HF repo mentioned in the README has disappeared.
Could you help me solve these problems, or point me to where I can find details about training?
Thank you very much.