Questions about the chain/interface clustering files #307

zqcai19 · 2024-10-11T11:37:10Z

@lucidrains @amorehead Hi, thank you very much for your efforts in the reproduction of AlphaFold3. I have downloaded the preprocessed mmCIF files and chain/interface clustering files as described in the README and would like to use the clustered test set to evaluate AF3.

Based on my understanding, the JSON, CSV, and FASTA files should contain the chain IDs, cluster mapping, and sequences. However, I noticed inconsistencies between them and the RCSB PDB. For example, in filtered_all_chain_sequences.json:

8a14-assembly1: The file only records 2 chains, whereas RCSB shows that it has 6 chains.
8sza-assembly1: The file does not seem to include ligand information.
The sequences in both cases appear to be cropped compared to the original sequences in RCSB.

Other entries have similar inconsistencies as well. Am I missing something here? How should I use the chain/interface clustering files to evaluate AF3?

Thank you in advance for your help!


amorehead · 2024-10-11T11:42:13Z

Hi, @zqcai19.

My first thought is that these differences are the result of the PDB dataset's preprocessing scripts, which follow the filtering described in the AF3 paper. These scripts will (in several cases) drop residues or chains that do not meet AF3's strict filtering criteria. For more details, I recommend reviewing the preprocessing scripts in scripts/, and let me know if you have any other questions.
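To see which chains survived filtering for a given entry, a minimal sketch along these lines may help. It is not taken from the repository: the function names are hypothetical, and it assumes filtered_all_chain_sequences.json is either a mapping of entry ID to {chain ID: sequence} or a list of such single-entry mappings. Please check the actual file layout before relying on it.

```python
import json
from pathlib import Path


def load_chain_sequences(path):
    """Load the JSON file and return a flat {entry_id: {chain_id: sequence}} dict.

    ASSUMPTION: the file is either such a dict already, or a list of
    dicts that each map entry IDs to {chain_id: sequence}.
    """
    with Path(path).open() as handle:
        data = json.load(handle)
    return normalize(data)


def normalize(data):
    """Merge a list of per-entry dicts into one dict; pass a dict through unchanged."""
    if isinstance(data, dict):
        return data
    merged = {}
    for item in data:
        merged.update(item)
    return merged


def summarize_chains(all_chain_sequences, entry_id):
    """Return {chain_id: sequence_length} for one entry, or {} if the entry is absent."""
    chains = all_chain_sequences.get(entry_id, {})
    return {chain_id: len(sequence) for chain_id, sequence in chains.items()}


# Example usage (the path is whatever location you downloaded the file to):
# sequences = load_chain_sequences("filtered_all_chain_sequences.json")
# print(summarize_chains(sequences, "8a14-assembly1"))
```

Comparing the reported chain count and lengths against the RCSB entry shows which chains or residues were dropped, which can then be traced back to the corresponding filter in the preprocessing scripts.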

zqcai19 · 2024-10-11T12:37:26Z

@amorehead Thank you for the quick response! I still have some doubts regarding the evaluation process. Should I use the filtered and cropped sequences from filtered_all_chain_sequences.json for inference? I couldn’t find any description in the AF3 paper or its Supplementary Information about cropping the sequences for the evaluation (only the training process was mentioned). Did I miss something?

amorehead · 2024-10-11T14:03:58Z

Hi, @zqcai19. This filtering of the train, val, and test structures (particularly the test structures) seems to be implicitly suggested by the AF3 paper. I interpreted the paper this way so that all three dataset splits are processed consistently.
