precellar.align

precellar.align(assay, aligner, *, output, modality=None, output_type='alignment', mito_dna=['chrM', 'M'], shift_left=4, shift_right=-5, compute_snv=False, compression=None, compression_level=None, temp_dir=None, num_threads=8, chunk_size=10000000)

Align FASTQ reads to the reference genome and generate unique fragments.

Parameters:

assay (Assay | Path | list[Assay | Path]) – An Assay object or a file path to the YAML sequencing specification file (see pachterlab/seqspec). A list of Assay objects or file paths is also accepted; in that case, the results are concatenated into a single output file.
aligner (STAR | BWAMEM2) – The aligner to use. Available aligners are in the precellar.aligners submodule.
output (Path) – File path to the output file. The type of the output file is determined by the output_type parameter (see below).
modality (str | None) – The modality of the sequencing data, e.g., “rna” or “atac”.
output_type (Literal["alignment", "fragment", "gene_quantification"]) – The type of the output file. If “alignment”, the output is a BAM file containing the alignments. If “fragment”, the output is a fragment file containing the unique fragments. If “gene_quantification”, the output is an h5ad file containing the gene quantification.
mito_dna (list[str]) – Names used for mitochondrial DNA in the reference genome, e.g., “chrM” or “M”.
shift_left (int) – The number of bases to shift the left end of the fragment. For example, in ATAC-seq, this is usually set to 4 to account for the Tn5 transposase insertion bias. Available only when output_type='fragment'.
shift_right (int) – The number of bases to shift the right end of the fragment. For example, in ATAC-seq, this is usually set to -5 to account for the Tn5 transposase insertion bias. Available only when output_type='fragment'.
compute_snv (bool) – Whether to compute single nucleotide variants (SNVs) from the alignments. If True, the SNVs will be computed and added to the fragment file.
compression (str | None) – The compression algorithm to use for the output fragment file. If None, the compression algorithm will be inferred from the file extension.
compression_level (int | None) – The compression level to use for the output fragment file.
temp_dir (Path | None) – The temporary directory to use.
num_threads (int) – The number of threads to use.
chunk_size (int) – The number of bases processed per thread in each chunk. The total number of bases in each chunk is chunk_size * num_threads.
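
The numeric parameters above can be illustrated with a little arithmetic. The sketch below is not part of precellar: `shift_fragment` and `bases_per_chunk` are hypothetical helper names, and the assumption that the shifts are simply added to the fragment's start and end coordinates is ours, not something this page states.

```python
def shift_fragment(start: int, end: int,
                   shift_left: int = 4, shift_right: int = -5) -> tuple[int, int]:
    """Add the offsets to a fragment's (start, end) coordinates.

    Assumption: each shift is added to the corresponding end of the
    fragment. The defaults mirror the ATAC-seq values in the signature.
    """
    return start + shift_left, end + shift_right


def bases_per_chunk(chunk_size: int = 10_000_000, num_threads: int = 8) -> int:
    """Total bases in one chunk, as documented: chunk_size * num_threads."""
    return chunk_size * num_threads


print(shift_fragment(1000, 1200))  # (1004, 1195)
print(bases_per_chunk())           # 80000000
```

With the default settings, each chunk therefore covers 80 million bases, which may be worth lowering on memory-constrained machines.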

Returns:

A dictionary containing the QC metrics of the alignment and fragment generation.

Return type:

dict
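
Example:

A minimal usage sketch based only on the signature above. The wrapper name `align_to_fragments` is hypothetical, and the aligner object is taken as an argument because this page does not document how STAR or BWAMEM2 instances are constructed; see the precellar.aligners submodule for that.

```python
from pathlib import Path


def align_to_fragments(spec: Path, aligner, output: Path) -> dict:
    """Align ATAC-seq reads and write a fragment file.

    `spec` is the seqspec YAML file, `aligner` is an already constructed
    STAR or BWAMEM2 object, and `output` is the fragment file path.
    Returns the QC metrics dictionary produced by precellar.align.
    """
    # Imported lazily so this sketch can be defined without precellar installed.
    import precellar

    return precellar.align(
        spec,
        aligner,
        output=output,
        modality="atac",
        output_type="fragment",  # shift_left/shift_right apply only to this type
        shift_left=4,
        shift_right=-5,
        num_threads=8,
    )
```

Since compression is left as None here, the compression algorithm would be inferred from the extension of `output`, as described for the compression parameter.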