precellar.align

precellar.align(assay, genome_index, *, modality, output, output_type='alignment', mito_dna=['chrM', 'M'], shift_left=4, shift_right=-5, aligner=None, compression=None, compression_level=None, temp_dir=None, num_threads=8, chunk_size=10000000)

Align fastq reads to the reference genome and generate unique fragments.

Parameters:
  • assay (Assay | Path) – An Assay object or a file path to the YAML sequencing specification file, see pachterlab/seqspec.

  • genome_index (Path) – File path to the genome index. The genome index can be created by the make_genome_index function.

  • modality (str) – The modality of the sequencing data, e.g., “rna” or “atac”.

  • output (Path) – File path to the output file. The type of the output file is determined by the output_type parameter (see below).

  • output_type (Literal["alignment", "fragment", "gene_quantification"]) – The type of the output file. If “alignment”, the output will be a BAM file containing the alignments. If “fragment”, the output will be a fragment file containing the unique fragments. If “gene_quantification”, the output will be an h5ad file containing the gene quantification.

  • mito_dna (list[str]) – List of mitochondrial DNA sequence names, e.g., “chrM” or “M”.

  • shift_left (int) – The number of bases to shift the left end of the fragment. Available only when output_type='fragment'.

  • shift_right (int) – The number of bases to shift the right end of the fragment. Available only when output_type='fragment'.

  • aligner (str | None) – The aligner to use for the alignment. If None, the aligner will be inferred from the modality.

  • compression (str | None) – The compression algorithm to use for the output fragment file. If None, the compression algorithm will be inferred from the file extension.

  • compression_level (int | None) – The compression level to use for the output fragment file.

  • temp_dir (Path | None) – The temporary directory to use.

  • num_threads (int) – The number of threads to use.

  • chunk_size (int) – The number of bases processed in each chunk per thread. The total number of bases in each chunk is chunk_size * num_threads.
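
The arithmetic behind shift_left, shift_right, and chunk_size can be illustrated with a small sketch. It assumes that each shift is simply added to the corresponding fragment coordinate, which this page does not spell out. The helper names shift_fragment and bases_per_chunk are hypothetical and are not part of precellar.

```python
from __future__ import annotations


def shift_fragment(
    start: int, end: int, shift_left: int = 4, shift_right: int = -5
) -> tuple[int, int]:
    """Assumed semantics: add each shift to the matching fragment end."""
    return start + shift_left, end + shift_right


def bases_per_chunk(chunk_size: int = 10_000_000, num_threads: int = 8) -> int:
    """Total bases in one chunk, following chunk_size * num_threads."""
    return chunk_size * num_threads


# With the default shifts, a fragment at (1000, 1200) becomes (1004, 1195).
print(shift_fragment(1000, 1200))
# With the default chunk_size and num_threads, one chunk holds 80000000 bases.
print(bases_per_chunk())
```

Under these assumptions, lowering either chunk_size or num_threads reduces the number of bases held in one chunk.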

Returns:

A dictionary containing the QC metrics of the alignment and fragment generation.

Return type:

dict
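
A call that writes a fragment file might look like the following minimal sketch. The helpers fragment_kwargs and run_alignment are hypothetical wrappers, not part of precellar, and any file paths passed to them are placeholders. Only the keyword names and default values come from the signature above.

```python
from pathlib import Path


def fragment_kwargs(output: Path, num_threads: int = 8) -> dict:
    """Collect keyword arguments for precellar.align when writing fragments."""
    return {
        "modality": "atac",
        "output": output,
        "output_type": "fragment",  # shifts apply only to this output type
        "shift_left": 4,            # default from the signature
        "shift_right": -5,          # default from the signature
        "num_threads": num_threads,
    }


def run_alignment(spec: Path, genome_index: Path, output: Path) -> dict:
    """Call precellar.align and return its dictionary of QC metrics."""
    # Imported lazily so fragment_kwargs can be used without precellar installed.
    import precellar

    return precellar.align(spec, genome_index, **fragment_kwargs(output))
```

Calling run_alignment with your own specification file, genome index, and output path returns the QC dictionary described above. Its keys are not listed on this page, so iterate over the result with .items() rather than relying on specific names.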