Add warning about percentile filter with repeated values #150

Open
s-andrews opened this issue Jan 21, 2019 · 0 comments

@s-andrews
Owner

In the value percentile filter, if a lot of probes have exactly the same value then the split point for the specified range is somewhat arbitrary. We sort by value, but among tied values the ordering follows genomic position.

What this means in practice is that if you filter for 0-10% and 10-20% in a dataset where around 20% of values are exactly zero then you'll get probes from chr1 - chr9 ish in the first bin and the later chromosomes (including X and Y) in the second, which can lead to false conclusions when doing downstream analysis.
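
To make the effect concrete, here is a minimal Python sketch. It is not the project's actual code: the probe tuples, the toy dataset (10 chromosomes, 10 probes each, 2 zeros per chromosome) and the `percentile_slice` helper are all assumptions made for illustration. It relies only on the fact that a stable sort by value keeps tied probes in their original genomic order.

```python
# Hypothetical toy data: 10 chromosomes x 10 probes, listed in genomic order.
# Two probes per chromosome have value exactly 0.0, so 20% of all values are tied.
chromosomes = [f"chr{i}" for i in range(1, 11)]
probes = []
for chrom_index, chrom in enumerate(chromosomes):
    for probe_index in range(10):
        value = 0.0 if probe_index < 2 else 1.0 + chrom_index * 10 + probe_index
        probes.append((chrom, probe_index, value))

# sorted() is stable, so probes with identical values stay in genomic order.
by_value = sorted(probes, key=lambda probe: probe[2])


def percentile_slice(sorted_probes, lower, upper):
    """Return the probes between two percentiles of an already sorted list."""
    n = len(sorted_probes)
    return sorted_probes[n * lower // 100 : n * upper // 100]


first_bin = percentile_slice(by_value, 0, 10)
second_bin = percentile_slice(by_value, 10, 20)

# Every probe in both bins has the same value, yet the bins split by chromosome.
first_chroms = sorted({p[0] for p in first_bin}, key=chromosomes.index)
second_chroms = sorted({p[0] for p in second_bin}, key=chromosomes.index)
print("0-10%: ", first_chroms)
print("10-20%:", second_chroms)
```

In this toy case the two bins contain nothing but zeros, so the only thing separating them is where each probe sits in the genome.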

There isn't an obvious fix for this. We can add a note to the documentation on the off chance that anyone reads it, and maybe we can do a quick count to see how many probes are potentially mis-classified (comparing identical values at each end of the range) and put up a warning if this represents a significant fraction of the probes returned?
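
One possible shape for that count, again as a hedged Python sketch rather than a proposed implementation: `ambiguous_fraction` and `WARN_THRESHOLD` are hypothetical names, and the 10% threshold is an arbitrary placeholder. The idea is to check whether the value just outside each end of the range equals the value just inside it, and if so count every returned probe that shares that tied value.

```python
WARN_THRESHOLD = 0.1  # hypothetical cut-off for "a significant fraction"


def ambiguous_fraction(sorted_values, start, end):
    """Fraction of sorted_values[start:end] whose value is tied across a range boundary."""
    selected = sorted_values[start:end]
    if not selected:
        return 0.0

    tied_values = set()
    # Lower end: the probe just before the range has the same value as the first one in it.
    if start > 0 and sorted_values[start - 1] == selected[0]:
        tied_values.add(selected[0])
    # Upper end: the probe just after the range has the same value as the last one in it.
    if end < len(sorted_values) and sorted_values[end] == selected[-1]:
        tied_values.add(selected[-1])

    ambiguous = sum(1 for value in selected if value in tied_values)
    return ambiguous / len(selected)


# Toy input: 20 zeros followed by 80 distinct values, already sorted.
values = [0.0] * 20 + [float(i) for i in range(1, 81)]

for lower, upper in [(0, 10), (10, 20), (15, 25), (20, 30)]:
    fraction = ambiguous_fraction(values, lower, upper)
    if fraction > WARN_THRESHOLD:
        print(f"Warning: {fraction:.0%} of probes in {lower}-{upper}% are tied at a boundary")
```

Using a set for the tied values avoids double counting when the whole range consists of a single repeated value that spills over both ends.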
