plot_Runs() returns nothing. And empty file when asked to save the plot #38

Open
Somatogenomics opened this issue Apr 25, 2023 · 7 comments

@Somatogenomics

Somatogenomics commented Apr 25, 2023

I used the following call, which returned nothing:

plot_Runs(runs = slidingRuns)

I then used the following to save the plot to my current working directory, but the file was empty:

plot_Runs(runs = slidingRuns, suppressInds = FALSE, savePlots = TRUE, outputName = "ROHom")

In addition, the following call only printed the output shown below:

plot_StackedRuns(slidingRuns, savePlots = FALSE, separatePlots = FALSE,
  outputName = NULL)

[1] "Current population: mydataname"

Somatogenomics changed the title from "plot_RU´" to "plot_Runs() returns nothing. And empty file when asked to save the plot" on Apr 25, 2023
bunop added the question label on Apr 26, 2023
@bunop
Contributor

bunop commented Apr 26, 2023

Dear @Somatogenomics ,

Thank you for your interest in detectRUNS. We need more information to determine where the problem is. Which version of detectRUNS are you using? Are you able to call plot_Runs and plot_StackedRuns with the sample data provided with this package? If so, the problem could be in your data files. Could you give us more details?

@Somatogenomics
Author

Somatogenomics commented Apr 26, 2023

Thank you for the response. Let me provide more context. I am using detectRUNS 0.9.6.9000.

Yes, I can plot the sample data without any problem (the plots are shown).

My data has non-conventional chromosome names (a mix of characters and numbers, something like CA0000123), and I followed one of the instructions here to convert the chromosome names to numeric. Also, the summaryRuns() function doesn't run either, throwing this error (Error in names(x) <- value :
'names' attribute [2] must be the same length as the vector [1]).

My question now is: do I need to convert the chromosome names to numeric, given that I am using the GitHub version (0.9.6.9000) of detectRUNS? Also, if I want the plot and the stacked plot to show all the runs across ALL the chromosomes, how would I do that?

Thank you for the anticipated feedback.

#######Update##########

I have now read in the ped and map files directly with the non-conventional chromosome names, and no problem was reported. However, I am unable to plot anything.

For the Manhattan plot, the following error was encountered:

|===================================================================================| 100%
[1] "Calculation % SNP in ROH finish"
[1] "Manhattan plot: START"
[1] "Processing Groups: mysample"
[1] "Creating Manhattan plot for mysample"
Error in check_breaks_labels():
! breaks and labels must have the same length
Backtrace:

  1. detectRUNS::plot_manhattanRuns(...)
  2. ggplot2::scale_x_continuous(labels = as.character(chroms), breaks = bpMidVec)
  3. ggplot2::continuous_scale(...)
  4. ggplot2:::check_breaks_labels(breaks, labels)
    Error in check_breaks_labels(breaks, labels) :

For the plot and stacked plot:

Nothing is shown for plot_Runs().

[1] "Current population: mysample" (this was shown for the stacked plot)

For the summaryRuns():

Error in names(x) <- value :
'names' attribute [2] must be the same length as the vector [1]

I got an output table with the tableRuns() function.

Overall, I got no plot(s) from the analysis.

@bunop
Contributor

bunop commented Apr 26, 2023

Dear @Somatogenomics ,

Thank you for your feedback. Regarding your problem, you are using a development version of this library, which is not published on CRAN. If I remember correctly, I fixed something to deal with X chromosomes, but unfortunately this library doesn't support custom chromosome names yet, as described in #24, neither when calculating runs nor when drawing plots. My suggestion is to use numeric chromosome names by replacing them in the map file and then redo everything (run calculation and graphs). Don't try to modify the files after the runs have been calculated. If you can provide a minimal sample of your data (a subset of the .map and .ped files, with all the instructions needed to reproduce this error), I can take a closer look.
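
As an illustration of the map-file renaming suggested above, here is a minimal R sketch. It is not part of detectRUNS: the function name renumber_map_chromosomes and the file names are hypothetical, and it assumes a standard four-column PLINK .map file without a header. Chromosomes are numbered in order of first appearance, and a key table is written so the original names can be recovered later.

```r
# Hypothetical helper (not a detectRUNS function): replace chromosome or
# scaffold names in a PLINK .map file with consecutive integers.
renumber_map_chromosomes <- function(map_in, map_out, key_out) {
  # A .map file has four whitespace-separated columns and no header:
  # chromosome, SNP id, genetic distance, base-pair position.
  map <- read.table(map_in, header = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character",
                    col.names = c("chrom", "snp", "cm", "bp"))

  # Number the chromosomes in order of first appearance.
  original <- unique(map$chrom)
  key <- data.frame(original = original,
                    numeric = seq_along(original),
                    stringsAsFactors = FALSE)
  map$chrom <- key$numeric[match(map$chrom, key$original)]

  # Write the renumbered map (no header) and the name key (with header).
  write.table(map, map_out, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
  write.table(key, key_out, quote = FALSE, sep = "\t", row.names = FALSE)
  invisible(key)
}
```

For example, renumber_map_chromosomes("subset.map", "subset.fix.map", "chrom_key.tsv") would write the renumbered map next to the original. The .ped file does not store chromosome names, so only the .map file needs to change, and the run detection should then be repeated from scratch with the new map.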

@Somatogenomics
Author

Dear @bunop, thank you for your suggestions. I have followed the first suggestion of converting the chromosome names to numeric, but I am still facing the same problem.

I have attached a subset of the files here:

https://drive.google.com/drive/folders/1hXAARirGYzocYUirrO4BexbqRRlRZUAz?usp=share_link

Following are the instructions to reproduce the errors:

library(detectRUNS)

genotypeFilePath <- "./subset.ped"

# map with chromosomes converted to numeric;
# I also tried the map without the conversion (subset.map)
mapFilePath <- "./subset.fix.map"

# sliding-window-based run detection
slidingRuns <- slidingRUNS.run(
  genotypeFile = genotypeFilePath,
  mapFile = mapFilePath,
  windowSize = 15,
  threshold = 0.1,
  minSNP = 15,
  ROHet = FALSE,
  maxOppWindow = 1,
  maxMissWindow = 1,
  maxGap = 10^6,
  minLengthBps = 100000,
  minDensity = 1/10^3, # SNP/kbps
  maxOppRun = NULL,
  maxMissRun = NULL
)

# consecutive SNP-based run detection
consecutiveRuns <- consecutiveRUNS.run(
  genotypeFile = genotypeFilePath,
  mapFile = mapFilePath,
  minSNP = 15,
  ROHet = FALSE,
  maxGap = 10^6,
  minLengthBps = 100000,
  maxOppRun = 1,
  maxMissRun = 1
)

# summary stats
summaryList <- summaryRuns(
  runs = slidingRuns, mapFile = mapFilePath, genotypeFile = genotypeFilePath,
  Class = 2, snpInRuns = TRUE)

# violin plot
plot_ViolinRuns(slidingRuns, method = c("sum", "mean"), outputName = NULL,
  plotTitle = NULL, savePlots = FALSE)

# Manhattan plot of runs
plot_manhattanRuns(slidingRuns, genotypeFilePath, mapFilePath, savePlots = FALSE,
  outputName = NULL, plotTitle = NULL)

# plotting results
plot_Runs(runs = slidingRuns)

plot_Runs(runs = slidingRuns, suppressInds = FALSE, savePlots = FALSE, outputName = "ROHom")
plot_StackedRuns(slidingRuns, savePlots = FALSE, separatePlots = FALSE,
  outputName = NULL)

# plot SNPs in runs
plot_SnpsInRuns(
  runs = slidingRuns[slidingRuns$chrom == 2, ], genotypeFile = genotypeFilePath,
  mapFile = mapFilePath)

plot_SnpsInRuns(
  runs = slidingRuns[slidingRuns$chrom == 4, ], genotypeFile = genotypeFilePath,
  mapFile = mapFilePath)

# table of runs
tableRuns(runs = slidingRuns, genotypeFile = genotypeFilePath, mapFile = mapFilePath,
  threshold = 0.5)

@bunop
Contributor

bunop commented Apr 26, 2023

Thank you. I will take a look and hope to get back to you soon.

@bunop
Contributor

bunop commented Apr 26, 2023

Dear @Somatogenomics ,

Unfortunately this sample is too small to act as a test case. My suggestion is to take only one or two chromosomes for two or a few samples. You can do this with plink, using the --chr and --keep options respectively. For example:

plink --allow-extra-chr --chr CAJHJT010000001.1 --keep samples_list.txt --file <plink prefix> --out subset

where

--allow-extra-chr: is required to deal with non-numerical chromosome names
--chr: extracts the chromosomes of interest
--keep: specifies a tab-separated file with one column for FID and one column for IID (the first two columns of the plink text file)
--file: the plink file prefix (without the .ped or .map extension)
--out: the prefix for the binary output files

This is necessary to get files with the same number of SNPs for a subset of samples. The data you sent me have 30 SNPs in the map file, but the .ped file contains all the SNPs for one sample: I had to truncate all the SNPs after the 30th, and I couldn't find any runs even after relaxing the parameters.
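
The mismatch described above (SNPs in the .ped file that have no row in the .map file) can be checked before running detectRUNS. The following is a minimal R sketch under stated assumptions: check_ped_map_consistency is a hypothetical helper, not a detectRUNS function, and it assumes the standard PLINK text layout of six leading columns in the .ped file followed by two allele columns per SNP.

```r
# Hypothetical helper (not a detectRUNS function): compare the SNP count in a
# .map file with the number of genotype columns on each line of the .ped file.
check_ped_map_consistency <- function(ped_file, map_file) {
  # Count whitespace-separated fields per line; quoting and comment handling
  # are disabled so that every character is treated literally.
  map_fields <- count.fields(map_file, quote = "", comment.char = "")
  ped_fields <- count.fields(ped_file, quote = "", comment.char = "")

  # One .map row per SNP.
  n_snps_map <- length(map_fields)

  # Six leading columns (FID, IID, father, mother, sex, phenotype),
  # then two allele columns per SNP.
  n_snps_ped <- (ped_fields - 6) / 2

  data.frame(sample = seq_along(ped_fields),
             snps_in_ped = n_snps_ped,
             snps_in_map = n_snps_map,
             consistent = n_snps_ped == n_snps_map)
}
```

Calling check_ped_map_consistency("subset.ped", "subset.fix.map") returns one row per sample; any row with consistent equal to FALSE indicates that the two files were not subset together.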

There's another problem I see in some of your SNPs, for example this one:

CAJHJT010000001.1       .       0       3038    0       *

This SNP is */*, which isn't handled properly by detectRUNS: it is treated as a homozygous SNP. I think this could be a missing genotype. Could you check your genotypes to understand why you find a * instead of a base call?
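
One way to see how often such symbols occur is to tabulate every allele code in the .ped file. The sketch below is only an illustration: tabulate_ped_alleles is a hypothetical helper, not a detectRUNS function, and it assumes the standard PLINK text layout with six leading columns. In PLINK text files the default missing-genotype code is 0, so any other non-base symbol, such as *, is worth investigating.

```r
# Hypothetical helper (not a detectRUNS function): tabulate the allele codes
# in a .ped file so that unexpected symbols such as "*" stand out.
tabulate_ped_alleles <- function(ped_file) {
  lines <- readLines(ped_file)
  lines <- lines[nzchar(trimws(lines))]  # drop blank lines

  # Split each line on whitespace and drop the six leading columns
  # (FID, IID, father, mother, sex, phenotype); the rest are alleles.
  alleles <- unlist(lapply(strsplit(trimws(lines), "[ \t]+"),
                           function(fields) fields[-(1:6)]))
  table(alleles)
}
```

For instance, tabulate_ped_alleles("subset.ped") returns a count per allele code; anything other than A, C, G, T and the missing code suggests the genotypes need recoding or filtering before run detection.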

@Somatogenomics
Author

Dear @bunop ,

Thank you so much for your time. I am actually using ONE sample for now. Everything you have seen came from one sample.

The * is the missing genotype call. I will filter out the missing genotype calls (--geno), subset the sample by chromosome, and get back to you afterwards.

Thank you once again.
