--- name: bio-workflows-longread-sv-pipeline description: End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads. tool_type: cli primary_tool: Sniffles workflow: true depends_on: - long-read-sequencing/long-read-alignment - long-read-sequencing/long-read-qc - long-read-sequencing/structural-variants qc_checkpoints: - after_qc: "Read N50 >10kb, quality score >Q10" - after_alignment: "Mapping rate >90%, coverage sufficient" - after_calling: "SV count reasonable, genotypes concordant" --- # Long-Read SV Pipeline Complete workflow for detecting structural variants from ONT or PacBio long-read data. ## Workflow Overview ``` Long reads (ONT/PacBio) | v [1. QC] ----------------> NanoPlot | v [2. Alignment] ---------> minimap2 | v [3. SV Calling] --------> Sniffles / cuteSV | v [4. Filtering] ---------> bcftools | v [5. Annotation] --------> AnnotSV (optional) | v Filtered SV VCF ``` ## Primary Path: minimap2 + Sniffles ### Step 1: Quality Control ```bash # ONT reads QC NanoPlot --fastq reads.fastq.gz \ --outdir nanoplot_output \ --threads 8 # Check key metrics # - Read N50 should be >10kb # - Mean quality >Q10 # - Total bases sufficient for coverage ``` ### Step 2: Alignment with minimap2 ```bash # ONT reads minimap2 -ax map-ont \ -t 16 \ --MD \ -Y \ reference.fa \ reads.fastq.gz | \ samtools sort -@ 4 -o aligned.bam samtools index aligned.bam # PacBio HiFi minimap2 -ax map-hifi \ -t 16 \ --MD \ -Y \ reference.fa \ reads.fastq.gz | \ samtools sort -@ 4 -o aligned.bam # PacBio CLR minimap2 -ax map-pb \ -t 16 \ --MD \ -Y \ reference.fa \ reads.fastq.gz | \ samtools sort -@ 4 -o aligned.bam ``` **QC Checkpoint:** Check alignment stats ```bash samtools flagstat aligned.bam samtools depth -a aligned.bam | awk '{sum+=$3} END {print "Average coverage:",sum/NR}' ``` - Mapping rate >90% - Average coverage >10x for SV calling (>20x preferred) ### Step 3: SV Calling with Sniffles ```bash # Sniffles2 (recommended) sniffles \ --input aligned.bam \ --vcf svs.vcf.gz \ --reference reference.fa \ --threads 8 \ --minsvlen 50 # With tandem repeat annotations (recommended) sniffles \ --input aligned.bam \ --vcf svs.vcf.gz \ --reference reference.fa \ --tandem-repeats tandem_repeats.bed \ --threads 8 ``` ### Alternative: cuteSV ```bash # cuteSV (faster, good for ONT) cuteSV \ aligned.bam \ reference.fa \ svs.vcf \ work_dir/ \ --threads 8 \ --min_size 50 \ --genotype bgzip svs.vcf tabix svs.vcf.gz ``` ### Step 4: Filtering ```bash # Filter by quality and size bcftools view -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf.gz -Oz -o svs.filtered.vcf.gz # Filter by SV type bcftools view -i 'SVTYPE="DEL" || SVTYPE="INS"' svs.filtered.vcf.gz -Oz -o del_ins.vcf.gz # Filter by genotype bcftools view -i 'GT="1/1" || GT="0/1"' svs.filtered.vcf.gz -Oz -o genotyped.vcf.gz # Stats bcftools stats svs.filtered.vcf.gz > sv_stats.txt ``` ### Step 5: Annotation (Optional) ```bash # AnnotSV for gene/clinical annotations AnnotSV -SVinputFile svs.filtered.vcf.gz \ -outputFile annotated_svs \ -genomeBuild GRCh38 ``` ## Multi-Sample SV Calling ```bash # Call SVs per sample for sample in sample1 sample2 sample3; do sniffles --input ${sample}.bam \ --snf ${sample}.snf \ --reference reference.fa done # Merge and joint genotype sniffles --input sample1.snf sample2.snf sample3.snf \ --vcf merged_svs.vcf.gz \ --reference reference.fa ``` ## Parameter Recommendations | Tool | Parameter | ONT | PacBio HiFi | |------|-----------|-----|-------------| | minimap2 | -ax | map-ont | map-hifi | | Sniffles | --minsvlen | 50 | 50 | | Sniffles | --minsupport | auto | auto | | cuteSV | --min_size | 50 | 50 | | cuteSV | --min_support | 3 | 3 | ## SV Types Detected | Type | Abbreviation | Description | |------|--------------|-------------| | Deletion | DEL | Sequence removed | | Insertion | INS | Sequence added | | Duplication | DUP | Sequence copied | | Inversion | INV | Sequence reversed | | Translocation | BND | Breakend (interchromosomal) | ## Troubleshooting | Issue | Likely Cause | Solution | |-------|--------------|----------| | Few SVs | Low coverage | Increase sequencing depth | | Many false positives | Low quality reads | Filter by QUAL, increase min support | | Missing known SV | Repeat region | Use tandem repeat annotations | | High breakend count | Mapping artifacts | Check alignment quality | ## Complete Pipeline Script ```bash #!/bin/bash set -e THREADS=16 READS="reads.fastq.gz" REF="reference.fa" SAMPLE="sample1" OUTDIR="sv_results" mkdir -p ${OUTDIR}/{qc,aligned,sv} # Step 1: QC echo "=== QC ===" NanoPlot --fastq ${READS} --outdir ${OUTDIR}/qc -t ${THREADS} # Step 2: Alignment echo "=== Alignment ===" minimap2 -ax map-ont -t ${THREADS} --MD -Y ${REF} ${READS} | \ samtools sort -@ 4 -o ${OUTDIR}/aligned/${SAMPLE}.bam samtools index ${OUTDIR}/aligned/${SAMPLE}.bam echo "Alignment stats:" samtools flagstat ${OUTDIR}/aligned/${SAMPLE}.bam # Step 3: SV calling echo "=== SV Calling ===" sniffles --input ${OUTDIR}/aligned/${SAMPLE}.bam \ --vcf ${OUTDIR}/sv/${SAMPLE}.vcf.gz \ --reference ${REF} \ --threads ${THREADS} # Step 4: Filter echo "=== Filtering ===" bcftools view -i 'QUAL>=20' ${OUTDIR}/sv/${SAMPLE}.vcf.gz \ -Oz -o ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz bcftools index ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz # Stats bcftools stats ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz > ${OUTDIR}/sv/stats.txt echo "=== Complete ===" echo "SVs: $(bcftools view -H ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz | wc -l)" ``` ## Related Skills - long-read-sequencing/long-read-alignment - minimap2 details - long-read-sequencing/structural-variants - Sniffles, cuteSV options - long-read-sequencing/long-read-qc - NanoPlot metrics - variant-calling/structural-variant-calling - Short-read SV methods