BEAM-Labs/pi-PrimeNovo

$\pi$-PrimeNovo

This is the official repo for the paper: π-PrimeNovo: An Accurate and Efficient Non-Autoregressive Deep Learning Model for De Novo Peptide Sequencing

We will release future model updates (user interface, new model weights, optimized modules, etc.) here. Please star and watch this repository if you want to be notified of updates.

Environment Setup

Note: We developed our algorithm on CentOS Linux 7. Users of other operating systems need to check compatibility themselves.

Create a new conda environment first:

conda create --name PrimeNovo python=3.10

This will create an anaconda environment

Activate this environment by running:

conda activate PrimeNovo

Then install the dependencies:

pip install -r ./requirements.txt

Install gcc and g++:

conda install -c conda-forge gcc
conda install -c conda-forge cxx-compiler

Then install ctcdecode, the package used for CTC beam search decoding:

git clone --recursive https://github.com/WayenVan/ctcdecode.git
cd ctcdecode
pip install .
cd ..  # required: ctcdecode cannot be imported from inside its own source directory
rm -rf ctcdecode

(If there are no errors, skip the next command and proceed to the CuPy installation.)

If you encounter C++ (gxx and gcc) version errors in this step, install gcc with a pinned version:

conda install -c conda-forge gcc_linux-64=9.3.0

Lastly, install CuPy to use our CUDA-accelerated precise mass-control decoding.

Install the CuPy package in an environment where a GPU is available. If you are using a Slurm server, this means you have to enter an interactive session with sbatch before installing CuPy. If your machine already has a GPU (check with nvidia-smi), you can install it directly.

Check your CUDA version with the nvidia-smi command; the CUDA version is shown in the top-right corner of its output. Then pick the matching command:

| CUDA version | Command |
| --- | --- |
| v10.2 (x86_64 / aarch64) | pip install cupy-cuda102 |
| v11.0 (x86_64) | pip install cupy-cuda110 |
| v11.1 (x86_64) | pip install cupy-cuda111 |
| v11.2 ~ 11.8 (x86_64 / aarch64) | pip install cupy-cuda11x |
| v12.x (x86_64 / aarch64) | pip install cupy-cuda12x |
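
The version-to-package mapping above can be written as a small lookup. This is only a convenience sketch: the function name `cupy_package_for` is hypothetical and not part of this repository, and it covers only the CUDA versions listed above.

```python
def cupy_package_for(cuda_version: str) -> str:
    """Return the CuPy package name for a CUDA version string such as '11.8'.

    Hypothetical helper; it only encodes the mapping listed in this README.
    """
    parts = cuda_version.strip().lstrip("v").split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0

    if (major, minor) == (10, 2):
        return "cupy-cuda102"
    if (major, minor) == (11, 0):
        return "cupy-cuda110"
    if (major, minor) == (11, 1):
        return "cupy-cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"No CuPy package listed for CUDA {cuda_version}")


# Example: the version shown by nvidia-smi is passed in as a string.
print("pip install " + cupy_package_for("11.8"))
```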

Model Settings

n_beam: number of CTC-paths (beams) considered during inference. We recommend a value of 40.

mass_control_tol: Only used when PMC_enable is True. This is the allowed difference between the PMC-decoded mass and the mass measured by MS when the mass control algorithm (PMC) is used. For example, if this is set to 0.1, only peptides whose mass falls within [measured_mass - 0.1, measured_mass + 0.1] are returned. The measured mass is calculated as (pepMass - 1.007276) * charge - 18.01, where pepMass and charge are taken from the input spectrum file (MGF).
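
As a worked illustration of the formula, the sketch below computes the measured mass and the accepted mass window. The function and constant names are hypothetical and are not taken from the PrimeNovo code. The values 1.007276 and 18.01 are the ones given above; they correspond to the proton mass and the approximate mass of water.

```python
PROTON_MASS = 1.007276  # value used in the formula above
WATER_MASS = 18.01      # value used in the formula above


def measured_mass(pep_mass: float, charge: int) -> float:
    """Measured mass from the MGF pepMass and charge, as defined in this README."""
    return (pep_mass - PROTON_MASS) * charge - WATER_MASS


def mass_window(pep_mass: float, charge: int, mass_control_tol: float = 0.1):
    """Return the (low, high) mass range accepted for a given tolerance."""
    target = measured_mass(pep_mass, charge)
    return target - mass_control_tol, target + mass_control_tol


def within_tolerance(candidate_mass: float, pep_mass: float, charge: int,
                     mass_control_tol: float = 0.1) -> bool:
    """True if a candidate peptide mass lies inside the accepted window."""
    low, high = mass_window(pep_mass, charge, mass_control_tol)
    return low <= candidate_mass <= high


# Example with made-up input values: pepMass = 500.0, charge = 2.
print(measured_mass(500.0, 2))
print(mass_window(500.0, 2, mass_control_tol=0.1))
```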

PMC_enable: Whether to use the PMC decoding unit, either True or False.

n_peaks: Number of the most intense peaks to retain; any remaining peaks are discarded. We recommend a value of 800.

min_mz: Minimum peak m/z allowed; peaks with a smaller m/z are discarded. We recommend a value of 1.

max_mz: Maximum peak m/z allowed; peaks with a larger m/z are discarded. We recommend a value of 6500.

min_intensity: Minimum peak intensity allowed; less intense peaks are discarded. We recommend a value of 0.0.
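
To make the effect of the four peak settings concrete, here is a minimal sketch of such a filter in plain Python. It is not the preprocessing code used by PrimeNovo. The function name `filter_peaks` is hypothetical, and two details are assumptions: range and intensity filtering happen before the top-n selection, and min_intensity is treated as an absolute threshold.

```python
def filter_peaks(mz, intensity, n_peaks=800, min_mz=1.0, max_mz=6500.0,
                 min_intensity=0.0):
    """Return the retained (m/z, intensity) pairs, sorted by m/z.

    Illustrative only: the order of operations and the absolute intensity
    threshold are assumptions, not the repository's implementation.
    """
    # Drop peaks outside the allowed m/z range or below the intensity threshold.
    kept = [
        (m, i) for m, i in zip(mz, intensity)
        if min_mz <= m <= max_mz and i >= min_intensity
    ]
    # Keep only the n_peaks most intense peaks.
    kept.sort(key=lambda peak: peak[1], reverse=True)
    kept = kept[:n_peaks]
    # Restore m/z order.
    kept.sort(key=lambda peak: peak[0])
    return kept


# Example with made-up peaks and n_peaks=2.
peaks = filter_peaks(
    mz=[0.5, 100.0, 200.0, 7000.0, 300.0],
    intensity=[10.0, 5.0, 50.0, 99.0, 1.0],
    n_peaks=2,
)
print(peaks)
```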

Run Instructions

Note: All the following steps should be performed from the main directory, pi-PrimeNovo. Do not cd into PrimeNovo.

Step 1: Download Required Files

To evaluate the provided test MGF file (you can replace this MGF file with your own), download the following files:

  1. Model Checkpoint: model_massive.ckpt
  2. Test MGF File: Bacillus.10k.mgf

Note: If you are using a remote server, you can use the gdown package to easily download the content from Google Drive to your server disk.

Step 2: Choose the Mode

The --mode argument can be set to either:

  • eval: Use this mode when evaluating data with a labeled dataset.
  • denovo: Use this mode for de novo analysis on unlabeled data.

Important: Select eval only if your data is labeled.

Step 3: Run the Commands

Execute the following command in the terminal:

python -m PrimeNovo.PrimeNovo --mode=eval --peak_path=./bacillus.10k.mgf --model=./model_massive.ckpt

This automatically uses all GPUs available on the current machine.

Step 4: Analyze the Output

We include a sample run output in ./output.txt. The evaluation performance is reported at the end of the output file.

If you are using denovo mode, you will get a denovo.tsv file in the current directory. The file has the following structure:

| label | prediction | charge | score |
| --- | --- | --- | --- |
| Title in MGF document | Sequence in ProForma notation | Charge, as a number | Confidence score, a number between 0 and 1 in scientific notation |

The example below contains two peptides predicted from their spectra:

label	prediction	charge	score
MS_19321_2024_02_DDA	ATTALP	2	0.99
MS_19326_2024_02_DDA	TAM[+15.995]TR	2	0.87
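
A file with this layout can be read with Python's standard csv module. The sketch below is one possible way to load the predictions and keep only those above a score cutoff. The function name `read_denovo_tsv` and the `min_score` parameter are hypothetical and not part of this repository.

```python
import csv
import io


def read_denovo_tsv(handle, min_score=0.0):
    """Parse tab-separated predictions and keep rows with score >= min_score."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        score = float(row["score"])  # also accepts scientific notation
        if score >= min_score:
            rows.append({
                "label": row["label"],
                "prediction": row["prediction"],
                "charge": int(row["charge"]),
                "score": score,
            })
    return rows


# The two example rows from above, held in memory for demonstration.
sample = (
    "label\tprediction\tcharge\tscore\n"
    "MS_19321_2024_02_DDA\tATTALP\t2\t0.99\n"
    "MS_19326_2024_02_DDA\tTAM[+15.995]TR\t2\t0.87\n"
)
confident = read_denovo_tsv(io.StringIO(sample), min_score=0.9)
print(confident)
```

To read the real output, pass an open file instead, for example `open("denovo.tsv", newline="")`.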

Citation

@article{zhang2024pi,
  title={$\pi$-PrimeNovo: An Accurate and Efficient Non-Autoregressive Deep Learning Model for De Novo Peptide Sequencing},
  author={Zhang, Xiang and Ling, Tianze and Jin, Zhi and Xu, Sheng and Gao, Zhiqiang and Sun, Boyan and Qiu, Zijie and Dong, Nanqing and Wang, Guangshuai and Wang, Guibin and others},
  journal={bioRxiv},
  pages={2024--05},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
