Hello, thank you so much for your work!
I am trying to finetune the Mantis model for multi-image question answering. For the time being I just want to check whether my script works. Using mixed precision causes this error, so to fit the model in memory I instead removed all of the encoder and decoder layers except one of each. However, this still gives CUDA out of memory during validation, despite the reduced model size (trainable params: 1,322,308,128). I am running this on an A40 GPU. I am puzzled as to why training works fine but validation causes an out-of-memory issue.
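For context, "using mixed precision" here just means the standard Trainer precision flag, along these lines (the bf16-vs-fp16 choice shown is illustrative; that run's arguments are not included below):
training_args = TrainingArguments(
    output_dir=args.output_dir,
    bf16=True,  # or fp16=True; either makes the Trainer run forward/backward under autocast
)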
Approaches tried:
Using LoRA: training works fine, but CUDA runs out of memory after a certain number of validation steps (a sketch of the LoRA setup follows this list).
Using accelerate: the script hangs, i.e. not a single training iteration is performed even after a considerable time.
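For completeness, the LoRA attempt wraps the model roughly like this (a minimal sketch; the rank and target modules shown are illustrative, not the exact config used):
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=16,                                 # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections in the language model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # wraps the base model with LoRA adapters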
I have also added torch.cuda.empty_cache() before every training_step and prediction_step call.
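The clearing is wired in by overriding both steps in the trainer subclass, roughly as below (a minimal sketch; *args/**kwargs are used because the exact signatures differ across transformers versions):
import torch
from transformers import Trainer

class CustomTrainer(Trainer):
    def training_step(self, *args, **kwargs):
        torch.cuda.empty_cache()  # release cached blocks before each training step
        return super().training_step(*args, **kwargs)

    def prediction_step(self, *args, **kwargs):
        torch.cuda.empty_cache()  # release cached blocks before each evaluation step
        return super().prediction_step(*args, **kwargs)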
Here are the relevant portions of the code:
Imports that set environment:
import sys
sys.path.append("") # path to some library
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
Model and processor initialization:
model_name = "TIGER-Lab/Mantis-8B-siglip-llama3"
processor = MLlavaProcessor.from_pretrained(model_name)
model = CustomLlavaForConditionalGeneration.from_pretrained(model_name).cuda()  # Custom model inherits LlavaForConditionalGeneration and performs torch.cuda.empty_cache() before every prediction and training step
model.vision_tower.vision_model.encoder.layers = torch.nn.ModuleList([model.vision_tower.vision_model.encoder.layers[0]])
model.language_model.model.layers = torch.nn.ModuleList([model.language_model.model.layers[0]])
torch.cuda.empty_cache()
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
model.language_model.resize_token_embeddings(len(processor.tokenizer))
model.config.text_config.vocab_size = len(processor.tokenizer)
Trainer and training arguments:
training_args = TrainingArguments(
    output_dir=args.output_dir,
    per_device_train_batch_size=args.batch_size,  # train batch size is 2
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=args.num_epochs,
    warmup_ratio=args.warmup_ratio,
    eval_strategy="steps",
    eval_steps=100,  # set to this value only to check eval
    logging_dir=os.path.join(args.output_dir, "logs"),
    logging_steps=10,
    remove_unused_columns=False,
    learning_rate=5e-6,
    lr_scheduler_type="linear",
    weight_decay=0.01,
    max_grad_norm=1.0,
    dataloader_pin_memory=True,
    report_to=None,
)
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    tokenizer=processor.tokenizer,
    compute_metrics=compute_metrics,
)
torch.cuda.empty_cache()
trainer.train()
torch.cuda.empty_cache()
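To see where evaluation memory actually climbs, I can also attach a small memory-logging callback (a diagnostic sketch; CudaMemoryLogger is just a name I made up):
import torch
from transformers import TrainerCallback

class CudaMemoryLogger(TrainerCallback):
    def on_prediction_step(self, args, state, control, **kwargs):
        allocated = torch.cuda.memory_allocated() / 1024**3  # GiB held by live tensors
        reserved = torch.cuda.memory_reserved() / 1024**3    # GiB reserved by the caching allocator
        print(f"[eval step] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

trainer.add_callback(CudaMemoryLogger())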
Error message:
CUDA out of memory. Tried to allocate 13.89 GiB. GPU 0 has a total capacity of 44.35 GiB of which 13.54 GiB is free. Including non-PyTorch memory, this process has 30.79 GiB memory in use. Of the allocated memory 28.70 GiB is allocated by PyTorch, and 1.78 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables).
This occurs after 16 validation iterations have completed.
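One thing the error message itself suggests is setting the allocator hint before CUDA is initialized, e.g. at the very top of the script, although that only addresses fragmentation and may not be the root cause:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # allocator hint from the error message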
Could you please help me figure out what could be causing this issue?