Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cellranger-multi => incorrect feature_type assigned to samples #373

Open
nick-youngblut opened this issue Oct 3, 2024 · 4 comments
Open
Labels
bug Something isn't working

Comments

@nick-youngblut
Copy link
Contributor

nick-youngblut commented Oct 3, 2024

Description of the bug

sample.csv:

sample,fastq_1,fastq_2,feature_type
NPC_Astro_Diff_aBeta,NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R1_001.fastq.gz,NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R2_001.fastq.gz,gex

sample_barcodes.csv:

sample,multiplexed_sample_id,probe_barcode_ids,cmo_ids,description
NPC_Astro_Diff_aBeta,Control_uBeads_1,BC001,,
NPC_Astro_Diff_aBeta,Control_uBeads_2,BC002,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_1,BC003,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_2,BC004,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_3,BC005,,
NPC_Astro_Diff_aBeta,No_Beads_1,BC006,,
NPC_Astro_Diff_aBeta,No_Beads_2,BC007,,
NPC_Astro_Diff_aBeta,Primary_Astro,BC008,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_1,BC009,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_2,BC010,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_3,BC011,,
NPC_Astro_Diff_aBeta,Day_67_CD44_plus_DL4_1,BC012,,
NPC_Astro_Diff_aBeta,NPC,BC013,,
NPC_Astro_Diff_aBeta,20240909_D21_NPC,BC014,,

The error:

Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).
Please refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.

Running the pipeline with:

workflow CELLRANGER_MULTI_ALIGN {
    take:
        ch_fasta
        ch_gtf
        ch_fastq
        cellranger_gex_index
        cellranger_vdj_index
        ch_multi_samplesheet

    main:
        ch_versions    = Channel.empty()

       ch_fastq
          .flatten()
          .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
          }
          .view()

Shows:

[[id:NPC_Astro_Diff_aBeta, expected_cells:null, seq_center:null, sample_type:null, feature_type:gex, single_end:false, options:[data-available:true, create-bam:true]], [/large_storage/konermann/public/scRNAfiles/scRNAfiles/NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R1_001.fastq.gz, /large_storage/konermann/public/scRNAfiles/scRNAfiles/NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R2_001.fastq.gz]]
[[id:NPC_Astro_Diff_aBeta, feature_type:vdj, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:ab, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:beam, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:crispr, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:cmo, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]

...which then results in the ab error in the branch operation:

        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    if (params.fb_reference){
                        return [ meta, fastq ]
                    } else {
                        error ("Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).\nPlease refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.")
                    }
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }

Where are the extra feature types (e.g., vdj and ab) coming from?

I believe that I have my input csv files set up as in cellrangermulti_samplesheet.csv and cellranger_barcodes_samplesheet.csv.

Command used and terminal output

nextflow run main.nf \
  --aligner cellrangermulti \
  --skip_cellrangermulti_vdjref \
  --gex_frna_probe_set Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv \
  --cellranger_multi_barcodes sample_barcodes.csv \
  --fasta refdata-gex-GRCh38-2024-A/fasta/genome.fa \
  --gtf refdata-gex-GRCh38-2024-A/genes/genes.gtf \
  --input samples.csv \
  --outdir scrnaseq_output

Relevant files

No response

System information

  • nextflow version 24.04.4.5917
  • Ubuntu 22.04.3
  • nf-core/scrnaseq v2.7.1
@nick-youngblut nick-youngblut added the bug Something isn't working label Oct 3, 2024
@nick-youngblut
Copy link
Contributor Author

I'm guessing that the issue comes from the code block:

        if (!map_collection_clone.any{ it.feature_type == 'gex' })    { map_collection_clone.add( [id: sample_id, feature_type: 'gex'   , gex:    empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'vdj' })    { map_collection_clone.add( [id: sample_id, feature_type: 'vdj'   , vdj:    empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'ab' })     { map_collection_clone.add( [id: sample_id, feature_type: 'ab'    , ab:     empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'beam' })   { map_collection_clone.add( [id: sample_id, feature_type: 'beam'  , beam:   empty_file, options:[:] ] ) } // currently not implemented, the input samplesheet checking will not allow it.
            if (!map_collection_clone.any{ it.feature_type == 'crispr' }) { map_collection_clone.add( [id: sample_id, feature_type: 'crispr', crispr: empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'cmo' })    { map_collection_clone.add( [id: sample_id, feature_type: 'cmo'   , cmo:    empty_file, options:[:] ] ) }

@nick-youngblut
Copy link
Contributor Author

nick-youngblut commented Oct 3, 2024

The following modification seems to fix the issue:

        ch_fastq
        .flatten()
        .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
        }
        .filter { meta, fastq ->
            // Exclude entries where fastq is 'EMPTY'
            if (fastq instanceof List) {
                return true  // Keep entries where fastq is a list of files
            } else {
                return !fastq.toString().endsWith('/EMPTY')
            }
        }
        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    if (params.fb_reference){
                        return [ meta, fastq ]
                    } else {
                        error ("Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).\nPlease refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.")
                    }
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }
        .set { ch_grouped_fastq }

I'm guessing that this issue was missed, since the test data cellrangermulti_samplesheet.csv contains the ab feature type.

@nick-youngblut
Copy link
Contributor Author

A problem with my updated code is that CELLRANGER_MULTI never runs. The run trace:

task_id	hash	native_id	name	status	exit	submit	duration	realtime	%cpu	peak_rss	peak_vmem	rchar	wchar
2	a4/e5b1d1	35330	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:PARSE_CELLRANGERMULTI_SAMPLESHEET	COMPLETED	0	2024-10-03 11:15:25.868	4.5s	0ms	41.1%	9.1 MB	14.9 MB	932.9 KB	633 B
1	b8/327626	35328	NFCORE_SCRNASEQ:SCRNASEQ:GTF_GENE_FILTER (genome.fa)	COMPLETED	0	2024-10-03 11:15:25.807	14.5s	9s	82.0%	1.1 GB	1.1 GB	3.5 GB	927.5 MB
4	61/37c05e	35331	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:CELLRANGER_MKGTF (genome_genes.gtf)	COMPLETED	0	2024-10-03 11:15:40.396	59.9s	55.3s	99.0%	61.1 MB	247.3 MB	935.7 MB	927.5 MB
5	b1/0a59d6	35332	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:CELLRANGER_MKREF (genome.fa)	COMPLETED	0	2024-10-03 11:16:40.459	11m 40s	11m 34s	452.3%	17.2 GB	24.4 GB	39.6 GB	28.2 GB
3	b8/33167b	35329	NFCORE_SCRNASEQ:SCRNASEQ:FASTQC_CHECK:FASTQC (NPC_Astro_Diff_aBeta)	COMPLETED	0	2024-10-03 11:15:25.844	2h 35m 35s	2h 35m 32s	140.1%	5.2 GB	63.7 GB	118.2 GB	4 MB
6	67/f16c82	35381	NFCORE_SCRNASEQ:SCRNASEQ:MULTIQC	COMPLETED	0	2024-10-03 13:51:01.396	19.9s	13.2s	57.6%	646.2 MB	9.8 GB	91.2 MB	24.3 MB

@nick-youngblut
Copy link
Contributor Author

I ended up using the following:

   main:
        ch_versions    = Channel.empty()

        //
        // TODO: Include checkers for cellranger multi parameter combinations. For example, when VDJ data is given, require VDJ ref. If FFPE, require frna probe sets, etc.
        //

        // since we merged all data as a meta, now we have a channel per sample, which
        // every item is a meta map for each data-type
        // now we can split it back for passing as input to the module
        ch_fastq
        .flatten()
        .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
        }
        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    return [ meta, fastq ] 
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }
        .set { ch_grouped_fastq }

...with the following nextflow command:

nextflow run main.nf \
  -ansi-log false \
  -profile singularity \
  -process.executor slurm \
  -process.queue cpu_batch \
  -work-dir /scratch/$(id -gn)/$(whoami)/nextflow-work/scrnaseq \
  --aligner cellrangermulti \
  --skip_cellrangermulti_vdjref \
  --skip_emptydrops \
  --gex_frna_probe_set Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv \
  --cellranger_multi_barcodes sample_barcodes.csv \
  --cellranger_index refdata-gex-GRCh38-2024-A/ \
  --input samples.csv \
  --outdir scrnaseq_output

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant