How to use rcOutDir option to process multiple samples and avoid out of memory errors?
#344
Replies: 2 comments 3 replies
-
Hi @bernardo-heberle, memory usage usually spikes while the bam files are being preprocessed, so for very large samples we would recommend to 1) process them individually with discovery mode only …
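A minimal sketch of what such a per-sample, discovery-only pass could look like; the quant = FALSE flag and the file paths below are illustrative assumptions, not part of the original reply:

```r
library(bambu)

# Hypothetical inputs: one BAM per sample, plus reference annotation and genome.
bam_files <- c("sample1.bam", "sample2.bam", "sample3.bam")
annotations <- prepareAnnotations("reference_annotations.gtf")
fa_file <- "genome.fa"

# 1) Run each sample on its own with discovery only (quant = FALSE),
#    so only one BAM is preprocessed at a time and peak memory stays low.
discovery_results <- lapply(bam_files, function(b) {
  bambu(reads = b,
        annotations = annotations,
        genome = fa_file,
        quant = FALSE,                         # transcript discovery only
        rcOutDir = "./bambu_processed_files/",  # cache read class files for reuse
        ncore = 1)
})
```

Each call should return the extended annotations for that sample, which can then feed a later quantification run.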
-
Thank you for the quick and effective response @cying111! The solutions you suggested worked well, and I am able to run all of the samples with no issues now.
-
Hello,
I am running 9 cDNA samples sequenced on the PromethION through Bambu. Each ".bam" file is ~30 GB after read filtering, so the total amount of data processed by Bambu is ~270 GB. I noticed that the memory requirements get quite high: the job fails with 500 GB of RAM but completes if I increase the RAM to 1000 GB. Here is the command I am running:
se_novel <- bambu(reads = bam, annotations = bambuAnnotations, rcOutDir = "./bambu_processed_files/", genome = fa_file, lowMemory=TRUE, ncore=8, opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
The bam variable is a vector with the paths to the 9 ".bam" files. I was wondering whether I am using the rcOutDir option correctly. It is my understanding that this option is supposed to help with runs that use multiple samples, but I am not sure if there is an intermediate step I am missing. I ask because, in the long run, I intend to use Bambu to process several dozen, if not hundreds, of cDNA samples generated on the PromethION. With the increasing RAM requirements, however, memory will probably become a limiting factor for processing a larger number of samples.
Any help with this and/or tips on how to avoid out of memory errors with a large number of large samples will be much appreciated!
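Concretely, here is how I am assuming the files cached by rcOutDir are meant to be reused on later runs, with the same variable names as the command above; the .rds reuse and the file pattern are my guesses rather than anything I have confirmed against the documentation:

```r
# First pass (as above): preprocess the BAMs once and cache the read class
# files to rcOutDir. Memory still peaks here, since BAM preprocessing is the
# expensive step.
se_novel <- bambu(reads = bam,
                  annotations = bambuAnnotations,
                  genome = fa_file,
                  rcOutDir = "./bambu_processed_files/",
                  lowMemory = TRUE,
                  ncore = 8,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))

# Later passes: point `reads` at the cached .rds read class files instead of
# the BAMs, so the preprocessing (and its memory spike) is skipped.
rc_files <- list.files("./bambu_processed_files/",
                       pattern = "\\.rds$", full.names = TRUE)
se_rerun <- bambu(reads = rc_files,
                  annotations = bambuAnnotations,
                  genome = fa_file,
                  opt.discovery = list(min.sampleNumber = 5, min.readCount = 5))
```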