@lucidrains @amorehead Hi, thank you very much for your efforts in the reproduction of AlphaFold3. I have downloaded the preprocessed mmCIF files and chain/interface clustering files as described in the README and would like to use the clustered test set to evaluate AF3.
Based on my understanding, the json, csv, and fasta files should contain information on the chain IDs, cluster mapping, and sequences. However, I noticed inconsistencies between them and the RCSB PDB. For example, in filtered_all_chain_sequences.json:
8a14-assembly1: The file only records 2 chains, whereas RCSB shows that it has 6 chains.
8sza-assembly1: The file does not seem to include ligand information.
The sequences in both cases appear to be cropped compared to the original sequences in RCSB.
Other entries have similar inconsistencies as well. Am I missing something here? How should I use the chain/interface clustering files to evaluate AF3?
Thank you in advance for your help!
My first thought is that these differences result from the PDB dataset's preprocessing scripts, which follow the filtering described in the AF3 paper. These scripts will (in several cases) drop residues or chains that do not meet AF3's strict filtering criteria. For more details, I recommend reviewing the preprocessing scripts in scripts/, and let me know if you have any other questions.
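To see which chains were dropped for a given entry, something like the following rough sketch may help. It assumes the JSON parses to a mapping of the form `{entry_id: {chain_id: sequence}}`, which may not match the actual file layout, so adjust the loader if needed. The function names and the toy entry are hypothetical, and the reference chain IDs are expected to come from wherever you obtain them (for example, the original mmCIF).

```python
import json
from pathlib import Path


def load_chain_sequences(path):
    """Load the chain-sequence JSON.

    Assumed (unverified) layout: {entry_id: {chain_id: sequence}}.
    """
    return json.loads(Path(path).read_text())


def find_dropped_chains(filtered, reference_chain_ids):
    """Return {entry_id: chain IDs in the reference but not in `filtered`}."""
    dropped = {}
    for entry_id, ref_ids in reference_chain_ids.items():
        kept = set(filtered.get(entry_id, {}))
        missing = sorted(set(ref_ids) - kept)
        if missing:
            dropped[entry_id] = missing
    return dropped


# Toy data only; not taken from the real files.
filtered = {"toy-assembly1": {"A": "MKT", "B": "GSH"}}
reference = {"toy-assembly1": ["A", "B", "C", "D"]}
print(find_dropped_chains(filtered, reference))
```

Entries that are absent from the filtered file altogether are reported with all of their reference chains listed as missing.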
@amorehead Thank you for the quick response! I still have some doubts regarding the evaluation process. Should I use the filtered and cropped sequences from filtered_all_chain_sequences.json for inference? I couldn’t find any description in the AF3 paper or its Supplementary Information about cropping the sequences for the evaluation (only the training process was mentioned). Did I miss something?
Hi, @zqcai19. The AF3 paper seems to implicitly suggest this filtering for the train, val, and test structures (particularly for the test structures). I interpreted the paper this way so that all three dataset splits are processed consistently.