How to use fine-tuned model? #157

Open · aldialimucaj opened this issue Apr 18, 2024 · 3 comments

@aldialimucaj commented Apr 18, 2024
I ran a small fine-tuning job and the process finished correctly, but the output model is too small and the full weights are missing. These are the files it produced:

```
config.json
generation_config.json
model.safetensors (around 250 MiB)
runs/
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
```

I'm using the same command that you suggest:
```
deepspeed finetune_deepseekcoder.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --warmup_steps 10 \
    --logging_steps 1 \
    --lr_scheduler_type "cosine" \
    --gradient_checkpointing True \
    --report_to "tensorboard" \
    --deepspeed configs/ds_config_zero3.json \
    --bf16 True
```

Could you also give an example of how to use the output model?
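
For reference, a minimal sketch of loading the fine-tuned output with transformers once the weights are complete; the path is a placeholder for your `--output_dir`, and `device_map="auto"` assumes accelerate is installed:

```python
# Minimal sketch: load a fine-tuned checkpoint and generate a completion.
# "output" is a placeholder for the --output_dir used during training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "output"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,   # matches the --bf16 True training setting
    device_map="auto",            # requires accelerate
)

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```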

@neuyuri commented May 7, 2024

If you fine-tune the model with DeepSpeed ZeRO stage 3, the weights you get are incomplete because they stay partitioned across ranks. My solution is to set `stage3_gather_16bit_weights_on_model_save` to false in the DeepSpeed config and consolidate the partitioned checkpoint afterwards.
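
Here is a minimal sketch of that consolidation step using DeepSpeed's `zero_to_fp32` helper; the checkpoint path is a placeholder and assumes a `checkpoint-100` directory produced by `--save_steps 100` that still contains its `global_step*` shards:

```python
# Minimal sketch: rebuild full fp32 weights from a partitioned ZeRO-3 checkpoint.
# "output/checkpoint-100" is a placeholder path based on the --save_steps setting.
from transformers import AutoConfig, AutoModelForCausalLM
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

checkpoint_dir = "output/checkpoint-100"
config = AutoConfig.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_config(config)             # architecture only
model = load_state_dict_from_zero_checkpoint(model, checkpoint_dir)
model.save_pretrained("output/consolidated")                 # full-weight copy
```

DeepSpeed also writes a standalone `zero_to_fp32.py` script into each checkpoint directory that performs the same conversion from the command line.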

@TobiMoelti commented

I ran into what is probably the same problem. Is there any solution?

@ipratik8001 commented

Try merging the new checkpoints onto the base model you used.
