Smart-seq2

Smart-seq2 is a full-length single-cell RNA sequencing method whose data require specific analytical handling. The key features that distinguish Smart-seq2 from other scRNA-seq methods are:

  • Demultiplexed format: each cell is stored in its own FASTQ file(s).

  • Higher sequencing depth.

  • Coverage of the entire transcript length.

  • No support for UMIs.

In this example, the Smart-seq2 datasets are downloaded from BioProject PRJNA213629.

[18]:
import precellar
precellar.__version__
import scanpy as sc
import hdf5plugin  # registers extra HDF5 compression filters used when handling AnnData (.h5ad) files

Because Smart-seq2 data are demultiplexed, with each cell represented by its own FASTQ file(s), we consolidate all Smart-seq2 datasets into unified FASTQ files (one file for single-end sequencing, two for paired-end sequencing). To stay compatible with standard barcode-based processing pipelines, the seqspec template includes a pseudo-barcode. These barcodes do not physically exist in the original sequences; they are added so that the merged reads can be processed by the same workflow as barcoded data.
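The idea behind the merge can be sketched in plain Python. This is only an illustration of the concept, not precellar's implementation: the function names (`index_to_barcode`, `read_fastq`, `merge_with_pseudo_barcodes`), the base-4 barcode encoding, the constant quality string, and the use of uncompressed FASTQ files are all assumptions made for the example.

```python
from pathlib import Path

BASES = "ACGT"


def index_to_barcode(index, length=8):
    """Encode a cell index as a fixed-length DNA string (base-4 digits).

    Hypothetical encoding chosen for this sketch.
    """
    if index < 0 or index >= 4 ** length:
        raise ValueError(f"index {index} does not fit in {length} bases")
    digits = []
    for _ in range(length):
        index, remainder = divmod(index, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def read_fastq(path):
    """Yield (name, sequence, quality) from an uncompressed 4-line FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality


def merge_with_pseudo_barcodes(cell_files, merged_path, barcode_path, length=8):
    """Concatenate per-cell FASTQ files and write one pseudo-barcode record per read.

    Returns a dict mapping each input file name to its pseudo-barcode.
    """
    mapping = {}
    with open(merged_path, "w") as merged, open(barcode_path, "w") as barcodes:
        for index, cell_file in enumerate(sorted(cell_files)):
            barcode = index_to_barcode(index, length)
            mapping[Path(cell_file).name] = barcode
            for name, sequence, quality in read_fastq(cell_file):
                merged.write(f"@{name}\n{sequence}\n+\n{quality}\n")
                # The barcode read reuses the read name so both files stay in sync.
                barcodes.write(f"@{name}\n{barcode}\n+\n{'I' * length}\n")
    return mapping
```

Every read from the same input file receives the same pseudo-barcode, so a downstream barcode-aware pipeline can treat each original file as one cell. The returned mapping is what lets you translate barcodes back to sample accessions afterwards.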

[22]:
assay = precellar.Assay("seqspec_template/smartseq2.yaml")
print(assay)
rna
└── RNA(139-236)
    ├── illumina_p5(29)
    ├── i5(10)
    ├── s5(14)
    ├── ME1(19) [↓R1(1-250)✗]
    ├── cDNA(1-98)
    ├── ME2(19) [↑R2(1-250)✗]
    ├── s7(15)
    ├── i7(8)
    ├── umi(4-10)
    ├── pseudo_barcode(4)
    └── illumina_p7(24) [↑I1(1-10)✗]

[20]:
!ls /data2/litian/precellar_temp/*.fastq.zst | head
/data2/litian/precellar_temp/SRR944282.fastq.zst
/data2/litian/precellar_temp/SRR944283.fastq.zst
/data2/litian/precellar_temp/SRR944284.fastq.zst
/data2/litian/precellar_temp/SRR944285.fastq.zst
/data2/litian/precellar_temp/SRR944286.fastq.zst
/data2/litian/precellar_temp/SRR944287.fastq.zst
/data2/litian/precellar_temp/SRR944288.fastq.zst
/data2/litian/precellar_temp/SRR944289.fastq.zst
/data2/litian/precellar_temp/SRR944290.fastq.zst
/data2/litian/precellar_temp/SRR944291.fastq.zst
[ ]:
%%time
# Merge the per-cell FASTQ files into one reads file plus a matching pseudo-barcode file.
precellar.utils.merge_fastq_files(
    input1_files="/data2/litian/precellar_temp/*.fastq.zst",
    output1_file="/data2/litian/precellar_temp/merged_fastq.fastq.zst",
    barcode_file="/data2/litian/precellar_temp/barcode.fastq.zst",
    num_threads=32,
)
[23]:

assay.update_read('R1', fastq='/data2/litian/precellar_temp/merged_fastq.fastq.zst')
assay.update_read('I1', fastq='/data2/litian/precellar_temp/barcode.fastq.zst')
[24]:
assay
[24]:
rna
└── RNA(139-236)
    ├── illumina_p5(29)
    ├── i5(10)
    ├── s5(14)
    ├── ME1(19) [↓R1(43)✓]
    ├── cDNA(1-98)
    ├── ME2(19) [↑R2(1-250)✗]
    ├── s7(15)
    ├── i7(8)
    ├── umi(4-10)
    ├── pseudo_barcode(4)
    └── illumina_p7(24) [↑I1(8)✓]
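Before aligning, it can be worth confirming that the reads file attached to R1 and the barcode file attached to I1 describe the same reads in the same order. The sketch below assumes that records are paired by position, as is conventional for multi-file FASTQ data, and that the files have been decompressed to plain text. The helper names `fastq_names` and `count_paired_records` are hypothetical and not part of precellar.

```python
from itertools import zip_longest


def fastq_names(path):
    """Yield the read name (first whitespace-delimited token) of each FASTQ record."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # Header lines are every fourth line, starting at line 0.
            if line_number % 4 == 0 and line.strip():
                yield line[1:].split()[0]


def count_paired_records(reads_path, barcode_path):
    """Return the record count if both files list the same names in the same order.

    Raises ValueError at the first mismatch, including a length mismatch.
    """
    count = 0
    pairs = zip_longest(fastq_names(reads_path), fastq_names(barcode_path))
    for read_name, barcode_name in pairs:
        if read_name != barcode_name:
            raise ValueError(
                f"record {count}: reads file has {read_name!r}, "
                f"barcode file has {barcode_name!r}"
            )
        count += 1
    return count
```

Because `zip_longest` pads the shorter file with `None`, a truncated barcode file is reported as a mismatch rather than silently ignored.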
[25]:
star = precellar.aligners.STAR("/data/Public/STAR_reference/refdata-gex-GRCh38-2024-A/star/")
[4]:
precellar.align(
    assay,
    modality='rna',
    aligner=star,
    output='20250218_gene_matrix_smartseq.h5ad',
    output_type='gene_quantification',
    num_threads=32,
)
[2025-02-19T06:10:48Z INFO] Starting alignment process
[2025-02-19T06:10:48Z INFO] Using provided Assay object
[2025-02-19T06:10:48Z INFO] Using modality: RNA
[2025-02-19T06:10:48Z INFO] Initialized aligner: STAR
[2025-02-19T06:10:49Z INFO] Initializing FastqProcessor with 32 threads and chunk size 10000000
[2025-02-19T06:10:49Z INFO] Adding mitochondrial DNA references: ["chrM", "M"]
[2025-02-19T06:10:49Z INFO] Generating alignments
[2025-02-19T06:10:49Z INFO] Using STAR aligner
[2025-02-19T06:10:49Z INFO] Counting barcodes...
[2025-02-19T06:15:09Z INFO] Aligning 866471826 reads...
[2025-02-19T06:15:09Z INFO] Processing output type: gene_quantification
[2025-02-19T06:15:09Z INFO] Starting gene quantification
 70%|███████   | 602352960/866471826 [44:13<19:23, 226971.17it/s]
[4]:
{'frac_q30_bases_barcode': 1.0,
 'frac_duplicates': 0.0,
 'sequenced_read_pairs': 0.0,
 'frac_unmapped': 0.15283863482480964,
 'sequenced_reads': 866471826.0,
 'frac_q30_bases_read1': 0.9663986458638546,
 'frac_confidently_mapped': 0.643611707000846,
 'frac_mitochondrial': 0.10567590354961628,
 'frac_transcriptome': 0.32021040000901313,
 'frac_valid_barcode': 1.0}
100%|██████████| 866471826/866471826 [01:03:07<00:00, 228788.02it/s]
[2025-02-19T07:19:30Z WARN] Encountered alignment(s) without UMI information. Using generated UMIs.
[2025-02-19T07:23:38Z INFO] Completed gene quantification, writing to: "20250218_gene_matrix_smartseq.h5ad"
[2025-02-19T07:23:38Z INFO] Alignment process completed in 4370.28s
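
The fractions in the returned metrics are easier to judge as read counts. The following is a minimal sketch that copies the read-level values from the output above; it assumes that each `frac_*` value is relative to `sequenced_reads`, which the output itself does not confirm. The base-level Q30 entries are left out because they describe bases rather than reads, and the function name `fractions_to_read_counts` is hypothetical.

```python
# Read-level values copied from the metrics dictionary returned by precellar.align.
metrics = {
    "sequenced_reads": 866471826.0,
    "frac_duplicates": 0.0,
    "frac_unmapped": 0.15283863482480964,
    "frac_confidently_mapped": 0.643611707000846,
    "frac_mitochondrial": 0.10567590354961628,
    "frac_transcriptome": 0.32021040000901313,
    "frac_valid_barcode": 1.0,
}


def fractions_to_read_counts(metrics):
    """Convert every 'frac_*' entry into an approximate read count.

    Assumption: each fraction is relative to 'sequenced_reads'.
    """
    total = metrics["sequenced_reads"]
    return {
        key[len("frac_"):]: round(value * total)
        for key, value in metrics.items()
        if key.startswith("frac_")
    }


for name, count in fractions_to_read_counts(metrics).items():
    print(f"{name:>20}: {count:,}")
```

A `frac_valid_barcode` of 1.0 and a `frac_duplicates` of 0.0 are consistent with the setup described earlier: every read carries a pseudo-barcode, and the data contain no UMIs.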