A successful test of data downloading. #6

Kaihui-Cheng · 2024-09-20T11:02:06Z

Make sure you have Git LFS installed:

sudo apt-get install git-lfs 
# Initialize Git LFS
git lfs install

Navigate to your DATA_ROOT and clone the source:

GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/datasets/fudan-generative-vision/dynamicPDB.git dynamicPDB_raw

Download the data for a specific protein by replacing {protein_id} below with its ID, for example 1a62_A:

cd dynamicPDB_raw
git lfs pull --include="{protein_id}/*"

Merge the split archive parts into a single file, then extract the .tar.gz archive:

cat {protein_id}/{protein_id}.tar.gz.part* > {protein_id}/{protein_id}.tar.gz
cd ..  # back to DATA_ROOT, the directory that contains dynamicPDB_raw
mkdir -p dynamicPDB  # -p: no error if the directory already exists
tar -xvzf dynamicPDB_raw/{protein_id}/{protein_id}.tar.gz -C dynamicPDB
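
The merge and extract commands above can be wrapped in a small helper so that a corrupt or incomplete merge is caught before extraction. This is only a sketch: `merge_and_extract` is a hypothetical name, not part of the dataset tooling, and it assumes the part files share the `.tar.gz.part` prefix shown above and sort correctly by name.

```shell
# Sketch: merge the split archive parts for one protein and extract them.
# merge_and_extract is a hypothetical helper, not part of dynamicPDB.
# The body runs in a subshell, so its variables do not leak to the caller.
merge_and_extract() (
    protein_id="$1"   # e.g. 1a62_A
    raw_dir="$2"      # the Git LFS clone, e.g. dynamicPDB_raw
    out_dir="$3"      # extraction target, e.g. dynamicPDB
    archive="${raw_dir}/${protein_id}/${protein_id}.tar.gz"

    # The shell expands part* in sorted name order (assumed to be the
    # order in which the parts were originally split).
    cat "${archive}".part* > "${archive}" || exit 1

    # Stop early if the merged file is not a valid gzip stream, for
    # example because a part is still an un-pulled LFS pointer file.
    gzip -t "${archive}" || exit 1

    mkdir -p "${out_dir}" || exit 1
    tar -xzf "${archive}" -C "${out_dir}"
)
```

From DATA_ROOT, a call would look like `merge_and_extract 1a62_A dynamicPDB_raw dynamicPDB`.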

OK! Now we have the simulation data for {protein_id}.
Note: Sufficient storage space is required for the data. For 1a62_A, 33 GB is needed for the extracted files and 24 GB for the compressed files.
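
Given these sizes, it may be worth checking free space before pulling. The sketch below is an assumption-laden example rather than part of the official instructions: `has_free_space` is a hypothetical helper, and the 57 GB figure in the usage comment is simply the sum of the two numbers quoted above for 1a62_A.

```shell
# has_free_space DIR REQUIRED_GB
# Succeeds if the filesystem holding DIR has at least REQUIRED_GB GiB free.
# Hypothetical helper for illustration only.
has_free_space() (
    dir="$1"
    required_gb="$2"

    # df -P prints one POSIX-format line per filesystem;
    # -k reports sizes in 1024-byte blocks. Column 4 is the available space.
    avail_kb=$(df -Pk "${dir}" | awk 'NR == 2 { print $4 }')
    required_kb=$((${required_gb} * 1024 * 1024))

    [ "${avail_kb}" -ge "${required_kb}" ]
)

# Usage (33 GB extracted + 24 GB compressed, as quoted for 1a62_A):
#   has_free_space . 57 || echo "Not enough free space for 1a62_A"
```

Other proteins may need more or less space, so the threshold should be adjusted per download.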


meatball1982 · 2024-09-24T01:57:02Z

Dear Kaihui-Cheng,
01:
There are 10 PDB IDs, 1a62_A, ..., 1bq8_A.
Would you be so kind as to provide a list of all the PDB IDs (the 12.6k filtered proteins) in your dataset (IDs only)? Then we (most readers of your paper) could choose specific PDB entries to download.
02:
The README says
"we have decided to provide the 100ns simulation data for all proteins for online download". Still, I see no instructions for downloading the 100ns data for all proteins. Could you help me with that?
Thank you so much. I am looking forward to your reply.
Best,
M

zqcai19 · 2024-09-29T04:53:29Z

@meatball1982 Hi! Thank you for your valuable suggestions.

We are still uploading the complete dataset, which is very large. In the meantime, we can provide a list on ModelScope of the protein data that is currently available, which should make it easier for users to choose the specific PDBs they want to download.
The instructions given above by @Kaihui-Cheng are for downloading the 100ns simulation data, which we are actively uploading. To download all currently available protein data at once, run git lfs pull without --include="{protein_id}/*" in step 3.
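
Between one protein and everything, several proteins can be pulled in one go, because `git lfs pull --include` accepts a comma-separated list of patterns. The sketch below uses two hypothetical helpers: `list_local_protein_ids` assumes each protein has its own top-level directory in the clone (as the `{protein_id}/*` pattern suggests), and `build_include_pattern` only builds the string passed to `--include`.

```shell
# list_local_protein_ids DIR
# Prints the name of every top-level directory in DIR, one per line.
# Assumes one directory per protein ID; hidden directories such as .git
# are not matched by the */ glob.
list_local_protein_ids() (
    cd "$1" || exit 1
    for d in */; do
        if [ -d "${d}" ]; then
            printf '%s\n' "${d%/}"
        fi
    done
)

# build_include_pattern ID...
# Prints a comma-separated include list, e.g. "1a62_A/*,1bq8_A/*".
build_include_pattern() (
    pattern=""
    for id in "$@"; do
        # Prepend a comma only when the pattern is already non-empty.
        pattern="${pattern:+${pattern},}${id}/*"
    done
    printf '%s\n' "${pattern}"
)

# Usage, from inside dynamicPDB_raw:
#   list_local_protein_ids .
#   git lfs pull --include="$(build_include_pattern 1a62_A 1bq8_A)"
```

Because the clone was made with GIT_LFS_SKIP_SMUDGE=1, the directories are present even before their data is pulled, so the listing reflects what is currently in the repository rather than what has been downloaded.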

Please let us know if you have any other questions or suggestions.

meatball1982 · 2024-10-10T01:09:27Z

@zqcai19

Thank you very much for your reply, and I truly appreciate your willingness to provide the original dynamic trajectories, as I know this can be very time-consuming.
I am currently working on a project related to RMSF, and if possible, could you please share all of your PDB IDs (not just the ones you have uploaded)? I would be even more grateful if you could also provide the initial conformations and the RMSF values corresponding to all the PDBs you calculated. This data should not be very large, and uploading and downloading it shouldn't take too much time.

Thank you again for your help!
