In the value percentile filter, if you have a lot of probes with exactly the same value then the split point for the range being specified is somewhat arbitrary. We sort by value, but beyond that, tied values stay in an order which relates to genomic position.
What this means in practice is that if you filter for 0-10% and 10-20% in a dataset where around 20% of values are exactly zero, then you'll get probes from chr1 - chr9 ish in the first bin and the later chromosomes (including X and Y) in the second. The split between the bins then reflects genomic position rather than value, which can lead to false conclusions when doing downstream analysis.
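To make the effect concrete, here is a minimal sketch with toy data. The probe list, the `percentile_slice` helper and its index arithmetic are all made up for illustration and are not the filter's actual code; the only thing borrowed from the description above is that the sort is by value and ties stay in genomic order.

```python
# Hypothetical toy data: 20 probes listed in genomic order, one per
# "chromosome". The first four (20% of the set) have a value of exactly zero.
probes = [(f"chr{c}", 0.0) for c in range(1, 5)]
probes += [(f"chr{c}", float(c)) for c in range(5, 21)]


def percentile_slice(sorted_probes, low_pct, high_pct):
    """Return the probes between two percentiles of an already sorted list.

    The index arithmetic is an assumption made for this sketch.
    """
    n = len(sorted_probes)
    return sorted_probes[n * low_pct // 100 : n * high_pct // 100]


# sorted() is stable, so probes with tied values keep their genomic order.
ranked = sorted(probes, key=lambda probe: probe[1])

first_bin = percentile_slice(ranked, 0, 10)
second_bin = percentile_slice(ranked, 10, 20)

print(first_bin)   # zero-valued probes from the earliest chromosomes
print(second_bin)  # zero-valued probes from the next chromosomes
```

Both bins contain nothing but zeros, so which bin a probe ends up in is decided purely by where it sits in the genome.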
There isn't an obvious fix for this. We can add a note to the documentation on the off chance that anyone reads it, and maybe we can do a quick count to see how many probes are potentially mis-classified (comparing identical values at each end of the range) and put up a warning if this represents a significant fraction of the probes returned?
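One way the quick count could work is sketched below. It treats a returned probe as potentially mis-classified if its value is identical to the value sitting just outside either end of the range. The function names and the slice-index interface are assumptions for this sketch, not existing code, and what counts as a "significant fraction" is left to the caller.

```python
def count_ambiguous(sorted_values, start, end):
    """Count probes in sorted_values[start:end] that tie with an excluded probe.

    sorted_values must be in ascending order. Because of that ordering, any
    value shared with a probe outside the slice must equal the value just
    below the start or just above the end of the slice.
    """
    boundary_values = set()
    if start > 0:
        boundary_values.add(sorted_values[start - 1])
    if end < len(sorted_values):
        boundary_values.add(sorted_values[end])
    return sum(1 for v in sorted_values[start:end] if v in boundary_values)


def ambiguous_fraction(sorted_values, start, end):
    """Fraction of the returned probes that are potentially mis-classified."""
    returned = end - start
    if returned <= 0:
        return 0.0
    return count_ambiguous(sorted_values, start, end) / returned


# Hypothetical data: four zeros followed by sixteen distinct values.
values = [0.0] * 4 + [float(i) for i in range(1, 17)]

print(ambiguous_fraction(values, 0, 2))  # slice made up entirely of tied zeros
print(ambiguous_fraction(values, 4, 6))  # slice with no ties across its ends
```

The filter could then put up the warning whenever this fraction goes above whatever threshold we settle on.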