gdata.GenomeDataLoaderMap
class gdata.GenomeDataLoaderMap(self, /, loaders, *, batch_size=None, trim_target=None, window_size=None, seq_as_string=False)
A dictionary-like class for loading multiple genomic datasets simultaneously.
Each dataset is identified by a unique tag, and the map lets you load and iterate over all of them together. All datasets must share the same genomic segments, so that every record refers to the same genomic region in each dataset.
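The sketch below illustrates the shared-segment requirement in plain Python. `ToyLoader` and `iterate_map` are hypothetical stand-ins written for illustration, not part of `gdata`, and the segment strings and value rows are made-up toy data. The sketch checks that every loader covers the same segments, then yields one dictionary per batch, keyed by tag.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class ToyLoader:
    """Hypothetical stand-in for one GenomeDataLoader (not the gdata API)."""
    segments: List[str]        # genomic segments, e.g. "chr1:0-1000"
    values: List[List[float]]  # one row of track values per segment


def iterate_map(loaders: Dict[str, ToyLoader],
                batch_size: int) -> Iterator[Dict[str, List[List[float]]]]:
    """Yield one dict per batch, mapping each tag to that loader's rows."""
    reference = next(iter(loaders.values())).segments
    # Every loader must cover exactly the same segments, in the same order.
    for tag, loader in loaders.items():
        if loader.segments != reference:
            raise ValueError(f"loader {tag!r} has different segments")
    for start in range(0, len(reference), batch_size):
        # Rows at the same position refer to the same segment in every loader,
        # but row widths may differ (e.g. a different resolution per loader).
        yield {tag: loader.values[start:start + batch_size]
               for tag, loader in loaders.items()}


segments = ["chr1:0-1000", "chr1:1000-2000", "chr1:2000-3000"]
loaders = {
    "coverage": ToyLoader(segments, [[0.0] * 10, [1.0] * 10, [2.0] * 10]),  # 100 bp bins
    "peaks": ToyLoader(segments, [[0.0] * 2, [1.0] * 2, [0.0] * 2]),        # 500 bp bins
}
batches = list(iterate_map(loaders, batch_size=2))
```

Rows at the same index describe the same segment under every tag, while row widths differ between the two toy loaders, much like two tracks binned at different resolutions.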
Note
The tracks in different loaders may have different resolutions, but the segments must be the same.
GenomeDataLoaderMap ensures that records correspond to the same genomic segments across all datasets, but there is no guarantee that the data values returned by different loaders will be aligned, i.e., they may differ in resolution or trim_target.
Parameters:
loaders (dict[str, GenomeDataLoader]) – A dictionary mapping tags to GenomeDataLoader instances.
batch_size (Optional[int]) – Batch size used when loading genomic sequences. If not provided, it defaults to the smallest batch size among all loaders.
trim_target (Optional[int]) – Trim target applied to all loaders. If not provided, each loader keeps its own trim target.
window_size (Optional[int]) – Window size applied to all loaders.
seq_as_string (bool) – If True, sequences are returned as strings instead of NumPy integer arrays. This is useful when you want to work with sequences as text, for example for visualization or text-based analysis.
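One plausible reading of how these options combine, sketched in plain Python. The helpers `resolve_batch_size`, `resolve_trim_targets` and `decode_sequence` are hypothetical, and the `ACGTN` integer encoding is an assumption made for illustration; the real `gdata` behavior and encoding may differ.

```python
from typing import Dict, List, Optional

# Hypothetical integer-to-base alphabet; gdata's actual encoding is not documented here.
ALPHABET = "ACGTN"


def resolve_batch_size(loader_batch_sizes: Dict[str, int],
                       batch_size: Optional[int] = None) -> int:
    """An explicit batch_size wins; otherwise use the smallest loader batch size."""
    if batch_size is not None:
        return batch_size
    return min(loader_batch_sizes.values())


def resolve_trim_targets(loader_trim_targets: Dict[str, Optional[int]],
                         trim_target: Optional[int] = None) -> Dict[str, Optional[int]]:
    """A shared trim_target overrides every loader; otherwise each keeps its own."""
    if trim_target is None:
        return dict(loader_trim_targets)
    return {tag: trim_target for tag in loader_trim_targets}


def decode_sequence(codes: List[int]) -> str:
    """Turn an integer-encoded sequence into text, as seq_as_string=True might."""
    return "".join(ALPHABET[code] for code in codes)


print(resolve_batch_size({"coverage": 64, "peaks": 32}))           # 32
print(resolve_trim_targets({"coverage": 100, "peaks": None}, 50))  # {'coverage': 50, 'peaks': 50}
print(decode_sequence([0, 1, 2, 3, 4]))                            # ACGTN
```

Taking the minimum batch size keeps every loader within the batch size it was configured for, which is one reason such a default is a natural choice.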
Attributes
Batch size of the dataloader.
Returns a dictionary mapping dataset tags to the number of tracks.
Returns the segments of the genome as a vector of strings.
Methods
difference(regions) – Creates a new genomic data loader whose regions exclude the specified ones.
intersection(regions) – Creates a new genomic data loader restricted to the specified regions.
keys() – Returns the keys (dataset tags) of the dataloader.
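A rough sketch of the set semantics suggested by `intersection` and `difference`, applied to segment strings. It assumes segments are written as `chrom:start-end` half-open intervals and that a segment is kept when it overlaps a region; the real methods may match regions differently. All function names here are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def parse_region(region: str) -> Interval:
    """Parse 'chrom:start-end' (an assumed format) into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals [start, end) overlap only on the same chromosome.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_segments(segments: List[str], regions: List[str]) -> List[str]:
    """Keep segments that overlap at least one region."""
    wanted = [parse_region(r) for r in regions]
    return [s for s in segments if any(overlaps(parse_region(s), r) for r in wanted)]


def difference_segments(segments: List[str], regions: List[str]) -> List[str]:
    """Keep segments that overlap none of the regions."""
    kept = set(intersect_segments(segments, regions))
    return [s for s in segments if s not in kept]


segments = ["chr1:0-1000", "chr1:1000-2000", "chr2:0-1000"]
regions = ["chr1:500-1200"]
print(intersect_segments(segments, regions))   # ['chr1:0-1000', 'chr1:1000-2000']
print(difference_segments(segments, regions))  # ['chr2:0-1000']
```

With these toy definitions, `difference_segments` returns exactly the segments that `intersect_segments` drops, so the two results partition the original segment list.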