Add finetuned METL-Local and METL-Global target models #8
I wrote a short bash script to convert all the checkpoints for Zenodo using the code and environment at commit 45237bb of the METL repo.

```bash
#!/bin/bash
for f in finetuned_model_checkpoints/*/checkpoints/epoch*.ckpt; do
    python code/convert_ckpt.py --ckpt_path "$f" --output_dir models/
done
```

Script output
Sam selected those checkpoints from the epoch with the lowest validation loss. We'll use the attached index.csv to update the readme.

```
mv ./models/PeT2D92j ./models/METL_G_20M_1D_avgfp
```
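As a rough sketch of turning index.csv into readme rows (the column names `uuid`, `plot_name`, and `ds_name` are taken from the scripts in this thread; the table layout and sample row are hypothetical, and the standard `csv` module stands in for polars here):

```python
import csv
import io

def readme_rows(csv_text: str) -> list[str]:
    """Format each index.csv row as a markdown table row for the readme."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"| {row['plot_name']} | {row['uuid']} |" for row in reader]

# Hypothetical one-row index.csv for illustration
sample = "uuid,plot_name,ds_name\nPeT2D92j,metl_global_20m_1d_avgfp,avgfp\n"
print("\n".join(readme_rows(sample)))
```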
No promises that it's correct. I don't know when to use the syntax here vs. here, but for most of the models it looks correct.

Script:

```python
import polars as pl

index = pl.read_csv('./index.csv', separator=',')
for row in index.rows(named=True):
    uuid: str = row['uuid']
    plt_name: str = row['plot_name']
    ds_name: str = row['ds_name']
    new_name = plt_name.upper()
    new_name = new_name.replace('GLOBAL', 'G')
    new_name = new_name.replace('LOCAL', 'L')
    tail_idx = new_name.index('D_')
    new_name = new_name[:tail_idx + 1]
    new_name = new_name + f'_{ds_name}'
    print(f'mv ./models/{uuid} ./models/{new_name}')
```
Actually, I'm not sure when to capitalize the endings for these either. It's inconsistent in the tables in metl-pretrained, so it's just whatever the ds_name is for the ones I printed. I think GB1 needs all caps but Pab1 doesn't, and it's confusing.
This looks great. We may need to modify the final line in the Python script to `print(f'mv ./models/{uuid}.pt ./models/{new_name}-{uuid}.pt')`.

That mostly comes from how the proteins and domains are referred to in the literature, so it isn't entirely consistent.

For model filenames, I suggest prefixing the name with FT to signify finetuned, followed by the identifier of the base METL model, followed by the UUID. For instance:
```python
for row in index.rows(named=True):
    uuid: str = row['uuid']
    plt_name: str = row['plot_name']
    ds_name: str = row['ds_name']
    new_name = plt_name.upper()
    new_name = new_name.replace('GLOBAL', 'G')
    new_name = new_name.replace('LOCAL', 'L')
    tail_idx = new_name.index('D_')
    new_name = new_name[:tail_idx + 1]
    new_name = new_name + f'_{ds_name}'
    new_name = new_name.replace('_', '-')
    # restore the conventional casing for each dataset name
    new_name = new_name.replace('gb1', 'GB1')
    new_name = new_name.replace('avgfp', 'avGFP')
    new_name = new_name.replace('grb2', 'Grb2')
    new_name = new_name.replace('tem', 'TEM')
    new_name = new_name.replace('ube4b', 'UBE4B')
    print(f'mv ./models/{uuid} ./models/FT-{new_name}-{uuid}.pt')
```
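Pulling the renaming logic above into a self-contained function makes the output names easy to sanity-check (the `plot_name` value below is hypothetical, since index.csv itself isn't shown in this thread; the UUID is the one from the example earlier):

```python
def build_name(plot_name: str, ds_name: str, uuid: str) -> str:
    """Reproduce the renaming logic from the loop above for one index.csv row."""
    name = plot_name.upper()
    name = name.replace('GLOBAL', 'G').replace('LOCAL', 'L')
    tail_idx = name.index('D_')  # cut everything after the "...D" suffix
    name = name[:tail_idx + 1] + f'_{ds_name}'
    name = name.replace('_', '-')
    # restore the conventional casing for each dataset name
    for old, new in [('gb1', 'GB1'), ('avgfp', 'avGFP'), ('grb2', 'Grb2'),
                     ('tem', 'TEM'), ('ube4b', 'UBE4B')]:
        name = name.replace(old, new)
    return f'FT-{name}-{uuid}.pt'

print(build_name('metl_global_20m_1d_avgfp', 'avgfp', 'PeT2D92j'))
# FT-METL-G-20M-1D-avGFP-PeT2D92j.pt
```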
Thanks for working this out. The script above was missing the `.pt` extension on the source path.
Once I started uploading models, I noticed some of the names didn't match (avGFP vs. GFP). I used the local source model identifiers as the reference dataset names and adjusted those with a second script.
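The second script isn't shown here; as a hedged sketch of what such a fixup could look like (the mapping below is hypothetical, since only the avGFP/GFP mismatch is mentioned above), it can just print corrective mv commands:

```python
# Hypothetical mapping from the names the first script produced to the
# reference dataset names; only avGFP vs. GFP came up in this thread.
FIXES = {'avGFP': 'GFP'}

def fix_filename(filename: str) -> str:
    """Apply the dataset-name corrections to one model filename."""
    for old, new in FIXES.items():
        filename = filename.replace(old, new)
    return filename

name = 'FT-METL-G-20M-1D-avGFP-PeT2D92j.pt'
fixed = fix_filename(name)
if fixed != name:
    print(f'mv ./models/{name} ./models/{fixed}')
```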
Zenodo has a limit of 100 files per repository and then requires depositors to archive individual files. We're only at 58 files, but that is worth considering if we add many more DMS datasets and target models. Here is a preview of the Zenodo dataset with the new files. If this looks good, I'll release it.
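Before a release it's easy to check the deposit against that limit; a minimal sketch, assuming the converted models all sit in one local directory (the directory name and limit constant are illustrative):

```python
from pathlib import Path

ZENODO_FILE_LIMIT = 100  # per-deposit file limit mentioned above

def files_remaining(model_dir: str, limit: int = ZENODO_FILE_LIMIT) -> int:
    """Return how many more files fit in the deposit before hitting the limit."""
    n_files = sum(1 for p in Path(model_dir).iterdir() if p.is_file())
    return limit - n_files
```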
Looks good to me. You can go ahead and release it!
I published version 2.0: https://doi.org/10.5281/zenodo.13377502. I'm leaving this open until we update the readme describing all the new models and the models in metl-pretrained. Our naming convention isn't very clear about distinguishing the METL-Local GFP models that were trained on most of the data from the low-N models, so let's be sure that is clear in the readme. The final step will be updating the model list in the Colab notebook.
We will add these additional models to Zenodo.