gene_peak matrix in RP_model #93

Open
wxy-omg opened this issue Dec 21, 2020 · 3 comments


@wxy-omg

wxy-omg commented Dec 21, 2020

The values of the gene_peak matrix in RP_model should be between 0 and 1. Why does "genes_peaks_score_csr" in scATAC_Genescore.py contain values greater than 1, with a maximum above 1600?

@crazyhottommy
Collaborator

Hi, how did you get the gene activity score? Did you use the enhanced model or the simple model?
If you got the activity score by running the MAESTRO snakemake pipeline, the count matrix is binarized. The gene activity score is calculated by summing the peaks near a gene (within 120kb), weighted by an exponential decay model. Details are here: https://github.com/liulab-dfci/MAESTRO/blob/master/example/Gene_activity_modelling/Gene_activity_modelling.md

If a peak is at the promoter, it has a weight of 1. Adding the regulatory potential of other, more distal peaks on top of that means the gene's total score can be greater than 1.
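To make the arithmetic concrete, here is a minimal sketch of a per-peak decay weight and the resulting per-gene sum. The function name `decay_weight`, the base-2 form of the decay, and the 10 kb half-decay distance are assumptions for illustration only, not taken from the MAESTRO source; the linked document has the actual model.

```python
# Hypothetical illustration: weight halves every `half_decay_bp` base pairs.
# The functional form and the 10 kb default are assumptions, not MAESTRO's code.
def decay_weight(distance_bp, half_decay_bp=10_000):
    """Weight of a peak at a given distance from the gene's TSS (1.0 at distance 0)."""
    return 2.0 ** (-abs(distance_bp) / half_decay_bp)

# Three open (binarized) peaks near one gene: one at the promoter, two distal.
peak_distances_bp = [0, 5_000, 20_000]
weights = [decay_weight(d) for d in peak_distances_bp]

# Each individual weight is at most 1, but their sum (the gene score) is not.
gene_score = sum(weights)
print(weights, gene_score)
```

In this sketch no single weight exceeds 1, yet the summed gene score does, which is the behavior described above.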

If you used the MAESTRO scatac-genescore command line tool and fed it the raw count matrix, it could produce large gene activity scores, because a single peak can now have a large count. You can binarize that count matrix and rerun the same command.
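A rough sketch of the difference, assuming the counts are held in a SciPy sparse matrix with peaks as rows and cells as columns. The variable names and the toy numbers are made up for illustration, and how you load and save the matrix depends on your input format.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy gene-by-peak weights (1 gene, 2 peaks); values are illustrative only.
weights = csr_matrix(np.array([[1.0, 0.5]]))

# Toy peak-by-cell raw counts (2 peaks, 2 cells).
counts = csr_matrix(np.array([[3, 0],
                              [40, 1]]))

# Binarize: any nonzero count becomes 1.
binarized = (counts > 0).astype(np.float64)

# Gene scores are weights multiplied by the peak-by-cell matrix.
raw_scores = (weights @ counts).toarray()     # large counts inflate the score
bin_scores = (weights @ binarized).toarray()  # bounded by the sum of weights

print(raw_scores)
print(bin_scores)
```

With raw counts the score scales with read depth in each peak; after binarizing, it is bounded by the sum of the weights of the open peaks.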

I am benchmarking these different ways of calculating the gene activity score and will post an update here.

Thanks!

@wxy-omg
Author

wxy-omg commented Jan 23, 2021

Thanks, but I mean the gene_peak matrix before multiplication by the count matrix. Each element of this matrix represents the potential of a peak to regulate a gene. Shouldn't max(gene_peak) be equal to 1? I got some values greater than 1.

@crazyhottommy
Collaborator

How did you get the gene_peak matrix? You are right: for every peak, the max weight should be 1.
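When reporting back, it may help to list exactly which gene/peak entries exceed 1. Below is a small diagnostic sketch, assuming the gene_peak matrix is a SciPy sparse matrix with genes as rows and peaks as columns. The helper name `weights_above_one` and the toy matrix are hypothetical; this only locates the entries and does not explain why they occur.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def weights_above_one(gene_peak, tol=1e-9):
    """Return (gene_index, peak_index, value) for every stored entry above 1."""
    coo = coo_matrix(gene_peak)
    mask = coo.data > 1.0 + tol
    return list(zip(coo.row[mask].tolist(),
                    coo.col[mask].tolist(),
                    coo.data[mask].tolist()))

# Toy stand-in for the real matrix; one entry is deliberately above 1.
gene_peak = csr_matrix(np.array([[1.0, 0.5, 0.0],
                                 [0.0, 1.7, 0.25]]))

offenders = weights_above_one(gene_peak)
print(gene_peak.max(), offenders)
```

Mapping the returned indices back to gene names and peak coordinates should show whether the large values cluster at particular genes or peaks.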
