cellranger-multi => incorrect feature_type assigned to samples #373

nick-youngblut · 2024-10-03T17:44:03Z

Description of the bug

sample.csv:

sample,fastq_1,fastq_2,feature_type
NPC_Astro_Diff_aBeta,NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R1_001.fastq.gz,NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R2_001.fastq.gz,gex

sample_barcodes.csv:

sample,multiplexed_sample_id,probe_barcode_ids,cmo_ids,description
NPC_Astro_Diff_aBeta,Control_uBeads_1,BC001,,
NPC_Astro_Diff_aBeta,Control_uBeads_2,BC002,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_1,BC003,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_2,BC004,,
NPC_Astro_Diff_aBeta,D30_DL4_uBeads_3,BC005,,
NPC_Astro_Diff_aBeta,No_Beads_1,BC006,,
NPC_Astro_Diff_aBeta,No_Beads_2,BC007,,
NPC_Astro_Diff_aBeta,Primary_Astro,BC008,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_1,BC009,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_2,BC010,,
NPC_Astro_Diff_aBeta,D30_DL4_plus_aBeta_3,BC011,,
NPC_Astro_Diff_aBeta,Day_67_CD44_plus_DL4_1,BC012,,
NPC_Astro_Diff_aBeta,NPC,BC013,,
NPC_Astro_Diff_aBeta,20240909_D21_NPC,BC014,,

The error:

Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).
Please refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.

Running the pipeline with:

workflow CELLRANGER_MULTI_ALIGN {
    take:
        ch_fasta
        ch_gtf
        ch_fastq
        cellranger_gex_index
        cellranger_vdj_index
        ch_multi_samplesheet

    main:
        ch_versions    = Channel.empty()

       ch_fastq
          .flatten()
          .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
          }
          .view()

Shows:

[[id:NPC_Astro_Diff_aBeta, expected_cells:null, seq_center:null, sample_type:null, feature_type:gex, single_end:false, options:[data-available:true, create-bam:true]], [/large_storage/konermann/public/scRNAfiles/scRNAfiles/NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R1_001.fastq.gz, /large_storage/konermann/public/scRNAfiles/scRNAfiles/NovaSeqX_20240927_LH00181_0041_B22TF5VLT3_bcl-convert_KL_NPC_Astro_Diff_20240916_scRNA_10X_Flex_NPC_Astro_Diff_aBeta_R2_001.fastq.gz]]
[[id:NPC_Astro_Diff_aBeta, feature_type:vdj, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:ab, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:beam, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:crispr, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]
[[id:NPC_Astro_Diff_aBeta, feature_type:cmo, options:[:]], /home/nickyoungblut/dev/nextflow/scrnaseq/assets/EMPTY]

...which then results in the ab error in the branch operation:

        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    if (params.fb_reference){
                        return [ meta, fastq ]
                    } else {
                        error ("Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).\nPlease refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.")
                    }
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }

Where are the extra feature types (e.g., vdj and ab) coming from?

I believe that I have my input csv files set up as in cellrangermulti_samplesheet.csv and cellranger_barcodes_samplesheet.csv.

Command used and terminal output

nextflow run main.nf \
  --aligner cellrangermulti \
  --skip_cellrangermulti_vdjref \
  --gex_frna_probe_set Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv \
  --cellranger_multi_barcodes sample_barcodes.csv \
  --fasta refdata-gex-GRCh38-2024-A/fasta/genome.fa \
  --gtf refdata-gex-GRCh38-2024-A/genes/genes.gtf \
  --input samples.csv \
  --outdir scrnaseq_output

Relevant files

No response

System information

nextflow version 24.04.4.5917
Ubuntu 22.04.3
nf-core/scrnaseq v2.7.1

The text was updated successfully, but these errors were encountered:

nick-youngblut · 2024-10-03T17:46:01Z

I'm guessing that the issue comes from the code block:

        if (!map_collection_clone.any{ it.feature_type == 'gex' })    { map_collection_clone.add( [id: sample_id, feature_type: 'gex'   , gex:    empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'vdj' })    { map_collection_clone.add( [id: sample_id, feature_type: 'vdj'   , vdj:    empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'ab' })     { map_collection_clone.add( [id: sample_id, feature_type: 'ab'    , ab:     empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'beam' })   { map_collection_clone.add( [id: sample_id, feature_type: 'beam'  , beam:   empty_file, options:[:] ] ) } // currently not implemented, the input samplesheet checking will not allow it.
            if (!map_collection_clone.any{ it.feature_type == 'crispr' }) { map_collection_clone.add( [id: sample_id, feature_type: 'crispr', crispr: empty_file, options:[:] ] ) }
            if (!map_collection_clone.any{ it.feature_type == 'cmo' })    { map_collection_clone.add( [id: sample_id, feature_type: 'cmo'   , cmo:    empty_file, options:[:] ] ) }

nick-youngblut · 2024-10-03T18:00:47Z

The following modification seems to fix the issue:

        ch_fastq
        .flatten()
        .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
        }
        .filter { meta, fastq ->
            // Exclude entries where fastq is 'EMPTY'
            if (fastq instanceof List) {
                return true  // Keep entries where fastq is a list of files
            } else {
                return !fastq.toString().endsWith('/EMPTY')
            }
        }
        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    if (params.fb_reference){
                        return [ meta, fastq ]
                    } else {
                        error ("Antibody reference was not specified. Please provide a reference file for feature barcoding (e.g. antibody measurements).\nPlease refer to https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-feature-ref-csv for more details.")
                    }
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }
        .set { ch_grouped_fastq }

I'm guessing that this issue was missed, since the test data cellrangermulti_samplesheet.csv contains the ab feature type.

nick-youngblut · 2024-10-03T23:46:51Z

A problem with my updated code is that CELLRANGER_MULTI never runs. The run trace:

task_id	hash	native_id	name	status	exit	submit	duration	realtime	%cpu	peak_rss	peak_vmem	rchar	wchar
2	a4/e5b1d1	35330	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:PARSE_CELLRANGERMULTI_SAMPLESHEET	COMPLETED	0	2024-10-03 11:15:25.868	4.5s	0ms	41.1%	9.1 MB	14.9 MB	932.9 KB	633 B
1	b8/327626	35328	NFCORE_SCRNASEQ:SCRNASEQ:GTF_GENE_FILTER (genome.fa)	COMPLETED	0	2024-10-03 11:15:25.807	14.5s	9s	82.0%	1.1 GB	1.1 GB	3.5 GB	927.5 MB
4	61/37c05e	35331	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:CELLRANGER_MKGTF (genome_genes.gtf)	COMPLETED	0	2024-10-03 11:15:40.396	59.9s	55.3s	99.0%	61.1 MB	247.3 MB	935.7 MB	927.5 MB
5	b1/0a59d6	35332	NFCORE_SCRNASEQ:SCRNASEQ:CELLRANGER_MULTI_ALIGN:CELLRANGER_MKREF (genome.fa)	COMPLETED	0	2024-10-03 11:16:40.459	11m 40s	11m 34s	452.3%	17.2 GB	24.4 GB	39.6 GB	28.2 GB
3	b8/33167b	35329	NFCORE_SCRNASEQ:SCRNASEQ:FASTQC_CHECK:FASTQC (NPC_Astro_Diff_aBeta)	COMPLETED	0	2024-10-03 11:15:25.844	2h 35m 35s	2h 35m 32s	140.1%	5.2 GB	63.7 GB	118.2 GB	4 MB
6	67/f16c82	35381	NFCORE_SCRNASEQ:SCRNASEQ:MULTIQC	COMPLETED	0	2024-10-03 13:51:01.396	19.9s	13.2s	57.6%	646.2 MB	9.8 GB	91.2 MB	24.3 MB

nick-youngblut · 2024-10-16T15:45:29Z

I ended up using the following:

   main:
        ch_versions    = Channel.empty()

        //
        // TODO: Include checkers for cellranger multi parameter combinations. For example, when VDJ data is given, require VDJ ref. If FFPE, require frna probe sets, etc.
        //

        // since we merged all data as a meta, now we have a channel per sample, which
        // every item is a meta map for each data-type
        // now we can split it back for passing as input to the module
        ch_fastq
        .flatten()
        .map{ meta ->
            def meta_clone = meta.clone()
            def data_dict  = meta_clone.find{ it.key == "${meta_clone.feature_type}" }
            fastqs = data_dict?.value
            meta_clone.remove( data_dict?.key )
            [ meta_clone, fastqs ]
        }
        .branch {
            meta, fastq ->
                gex: meta.feature_type == "gex"
                    return [ meta, fastq ]
                vdj: meta.feature_type == "vdj"
                    return [ meta, fastq ]
                ab: meta.feature_type == "ab"
                    return [ meta, fastq ] 
                beam: meta.feature_type == "beam"
                    return [ meta, fastq ]
                crispr: meta.feature_type == "crispr"
                    return [ meta, fastq ]
                cmo: meta.feature_type == "cmo"
                    return [ meta, fastq ]
        }
        .set { ch_grouped_fastq }

...with the following nextflow command:

nextflow run main.nf \
  -ansi-log false \
  -profile singularity \
  -process.executor slurm \
  -process.queue cpu_batch \
  -work-dir /scratch/$(id -gn)/$(whoami)/nextflow-work/scrnaseq \
  --aligner cellrangermulti \
  --skip_cellrangermulti_vdjref \
  --skip_emptydrops \
  --gex_frna_probe_set Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv \
  --cellranger_multi_barcodes sample_barcodes.csv \
  --cellranger_index refdata-gex-GRCh38-2024-A/ \
  --input samples.csv \
  --outdir scrnaseq_output

nick-youngblut added the bug Something isn't working label Oct 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cellranger-multi => incorrect feature_type assigned to samples #373

cellranger-multi => incorrect feature_type assigned to samples #373

nick-youngblut commented Oct 3, 2024 •

edited

Loading

nick-youngblut commented Oct 3, 2024

nick-youngblut commented Oct 3, 2024 •

edited

Loading

nick-youngblut commented Oct 3, 2024

nick-youngblut commented Oct 16, 2024

cellranger-multi => incorrect feature_type assigned to samples #373

cellranger-multi => incorrect feature_type assigned to samples #373

Comments

nick-youngblut commented Oct 3, 2024 • edited Loading

Description of the bug

Command used and terminal output

Relevant files

System information

nick-youngblut commented Oct 3, 2024

nick-youngblut commented Oct 3, 2024 • edited Loading

nick-youngblut commented Oct 3, 2024

nick-youngblut commented Oct 16, 2024

nick-youngblut commented Oct 3, 2024 •

edited

Loading

nick-youngblut commented Oct 3, 2024 •

edited

Loading