QuantNado — Multi-Modal Dataset Exploration¶

In [1]:

Copied!

import warnings

warnings.filterwarnings("ignore")
import warnings

warnings.filterwarnings("ignore")

This notebook walks through a complete analysis workflow using QuantNado: a toolkit for efficient Zarr-backed storage and analysis of multi-modal genomic signal.

Workflow covered:

Setup & Data — Install dependencies and download test data
Create Dataset — Build unified multi-modal Zarr store from BAM/bedGraph/VCF files
Load & Inspect — Open the store and view available modalities
Quality Control — PCA, sample correlations, and data completeness overview
Coverage Analysis — Reduce signal over regions, feature counts, visualization
Methylation Analysis — CpG-level methylation, distribution, metaplots, tornado plots
Variant Analysis — Genotypes, allele frequencies, region-level filtering
Multi-Modal Integration — Locus browser and cross-modality comparisons

Setup¶

Dependencies
Install the optional example dependencies before running this notebook:
pip install "quantnado[example]"
This adds ipykernel and matplotlib on top of the core install.

In [2]:

Copied!





import re
import tarfile
import urllib.request
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

import quantnado as qn
import re
import tarfile
import urllib.request
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

import quantnado as qn

Data¶

The test dataset (BAM files, chr21 only) needs to be downloaded before running this notebook. Run the cell below to fetch and extract it automatically, or manually:

# From the example/ directory
curl -L https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/seqnado_output.tar.gz \
  | tar -xz -C multiomics_run/

The reference files (reference/chr21.gtf, reference/promoters_1024bp.bed) are already included in the repository.

In [3]:

Copied!





DATA_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/seqnado_output.tar.gz"
REF_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/reference.tar.gz"
RUN_DIR = Path("multiomics_run")
RUN_DIR.mkdir(parents=True, exist_ok=True)

for url in [DATA_URL, REF_URL]:
    filename = url.split("/")[-1]
    basename = filename.split(".tar.gz")[0]
    archive_path = RUN_DIR / filename
    outdir = RUN_DIR / basename
    if not outdir.exists():
        print(f"Downloading {filename}...")
        urllib.request.urlretrieve(url, archive_path)
        print(f"Extracting {filename}...")
        with tarfile.open(archive_path) as tf:
            tf.extractall(RUN_DIR)
        archive_path.unlink()
        print(f"Done with {filename}.")
    else:
        print(f"{filename} already present — skipping download.")
DATA_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/seqnado_output.tar.gz"
REF_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/reference.tar.gz"
RUN_DIR = Path("multiomics_run")
RUN_DIR.mkdir(parents=True, exist_ok=True)

for url in [DATA_URL, REF_URL]:
    filename = url.split("/")[-1]
    basename = filename.split(".tar.gz")[0]
    archive_path = RUN_DIR / filename
    outdir = RUN_DIR / basename
    if not outdir.exists():
        print(f"Downloading {filename}...")
        urllib.request.urlretrieve(url, archive_path)
        print(f"Extracting {filename}...")
        with tarfile.open(archive_path) as tf:
            tf.extractall(RUN_DIR)
        archive_path.unlink()
        print(f"Done with {filename}.")
    else:
        print(f"{filename} already present — skipping download.")

Downloading seqnado_output.tar.gz...
Extracting seqnado_output.tar.gz...
Done with seqnado_output.tar.gz.
Downloading reference.tar.gz...
Extracting reference.tar.gz...
Done with reference.tar.gz.

In [4]:

Copied!





# Paths relative to this notebook (example/)
OUT_DIR = RUN_DIR / "seqnado_output"
REF_DIR = RUN_DIR / "reference"

MS_PATH = RUN_DIR / "dataset"  # MultiomicsStore directory
BED_FILE = str(REF_DIR / "promoters_1024bp.bed")
GTF_FILE = str(REF_DIR / "chr21.gtf")
FIG_DIR = RUN_DIR / "figures"

FIG_DIR.mkdir(parents=True, exist_ok=True)
# Paths relative to this notebook (example/)
OUT_DIR = RUN_DIR / "seqnado_output"
REF_DIR = RUN_DIR / "reference"

MS_PATH = RUN_DIR / "dataset"  # MultiomicsStore directory
BED_FILE = str(REF_DIR / "promoters_1024bp.bed")
GTF_FILE = str(REF_DIR / "chr21.gtf")
FIG_DIR = RUN_DIR / "figures"

FIG_DIR.mkdir(parents=True, exist_ok=True)

MultiomicsStore.from_files() creates one directory (dataset/) containing a sub-Zarr for each supplied modality:

Sub-store	Input	What is stored
`coverage.zarr`	BAM files	per-base read depth (dense, sample × position)
`methylation.zarr`	MethylDackel bedGraph	CpG methylation % and counts (sparse)
`variants.zarr`	VCF.gz files	genotype, allele depths, quality (sparse)

Run this cell once; it is skipped automatically if the store already exists.

In [5]:

Copied!





bam_files = sorted(RUN_DIR.rglob("*.bam"))
vcf_files = sorted(RUN_DIR.rglob("*.vcf.gz"))
vcf_sample_names = [re.split("_|\\.", f.stem)[0] for f in vcf_files]
meth_files = sorted(RUN_DIR.rglob("*.bedGraph"))
meth_sample_names = [re.split("_|\\.", f.stem)[0] for f in meth_files]
rna_samples  = [f for f in bam_files if "rna" in f.stem]
rna_sample_names = [re.split("_|\\.", f.stem)[0] for f in rna_samples]

if MS_PATH.exists():
    print(f"Skipping dataset construction since {MS_PATH} already exists.")
else:
    qn.create_dataset(
        store_dir=MS_PATH,
        bam_files=bam_files,
        methylation_files=meth_files,
        vcf_files=vcf_files,
        methylation_sample_names=meth_sample_names,
        vcf_sample_names=vcf_sample_names,
        filter_chromosomes=True,
        overwrite=True,
        resume=False,
        chunk_len=1048576,
        construction_compression="default",
        local_staging=False,
        staging_dir=OUT_DIR / "staging",
        log_file="dataset.log",
        max_workers=8,
        test=True,
        # dUTP / TruSeq Stranded mRNA — use "F" for ligation / ISF libraries
        stranded={s: "R" for s in rna_sample_names},
    )
bam_files = sorted(RUN_DIR.rglob("*.bam"))
vcf_files = sorted(RUN_DIR.rglob("*.vcf.gz"))
vcf_sample_names = [re.split("_|\\.", f.stem)[0] for f in vcf_files]
meth_files = sorted(RUN_DIR.rglob("*.bedGraph"))
meth_sample_names = [re.split("_|\\.", f.stem)[0] for f in meth_files]
rna_samples  = [f for f in bam_files if "rna" in f.stem]
rna_sample_names = [re.split("_|\\.", f.stem)[0] for f in rna_samples]

if MS_PATH.exists():
    print(f"Skipping dataset construction since {MS_PATH} already exists.")
else:
    qn.create_dataset(
        store_dir=MS_PATH,
        bam_files=bam_files,
        methylation_files=meth_files,
        vcf_files=vcf_files,
        methylation_sample_names=meth_sample_names,
        vcf_sample_names=vcf_sample_names,
        filter_chromosomes=True,
        overwrite=True,
        resume=False,
        chunk_len=1048576,
        construction_compression="default",
        local_staging=False,
        staging_dir=OUT_DIR / "staging",
        log_file="dataset.log",
        max_workers=8,
        test=True,
        # dUTP / TruSeq Stranded mRNA — use "F" for ligation / ISF libraries
        stranded={s: "R" for s in rna_sample_names},
    )

2026-03-09 17:02:50.539 | INFO     | quantnado.api:create_dataset:449 - Creating QuantNado dataset at 'multiomics_run/dataset'
  atac                      [coverage]
  chip-rx_MLL               [coverage]
  meth-rep1                 [coverage, methylation]
  meth-rep2                 [coverage, methylation]
  rna-spikein-control-rep1  [stranded_coverage]
  rna-spikein-control-rep2  [stranded_coverage]
  rna-spikein-treated-rep1  [stranded_coverage]
  rna-spikein-treated-rep2  [stranded_coverage]
  snp                       [coverage, variants]
2026-03-09 17:02:50.540 | INFO     | quantnado.dataset.store_multiomics:from_files:191 - Building coverage store from 9 BAM file(s)...
2026-03-09 17:02:50 [INFO] Extracting chromsizes from multiomics_run/seqnado_output/atac/aligned/atac.bam
2026-03-09 17:02:50 [INFO] Building dataset under staging path multiomics_run/seqnado_output/staging/.coverage.staging-e885ecc398a74b3ca4a3b79f5ee8df5f.zarr before publishing to multiomics_run/dataset/coverage.zarr
2026-03-09 17:02:50 [INFO] Initialized Zarr store at multiomics_run/seqnado_output/staging/.coverage.staging-e885ecc398a74b3ca4a3b79f5ee8df5f.zarr
2026-03-09 17:02:50 [INFO] Processing sample 1/9: atac
2026-03-09 17:02:50 [INFO] Processing sample 2/9: chip-rx_MLL
2026-03-09 17:02:50 [INFO] Processing sample 4/9: meth-rep2
2026-03-09 17:02:50 [INFO] Processing sample 5/9: rna-spikein-control-rep1
2026-03-09 17:02:50 [INFO] Processing sample 3/9: meth-rep1
2026-03-09 17:02:50 [INFO] Processing sample 7/9: rna-spikein-treated-rep1
2026-03-09 17:02:50 [INFO] Processing sample 6/9: rna-spikein-control-rep2
2026-03-09 17:02:50 [INFO] Processing sample 8/9: rna-spikein-treated-rep2
2026-03-09 17:02:52 [INFO] Processing sample 9/9: snp
2026-03-09 17:03:04 [INFO] Published staged dataset to multiomics_run/dataset/coverage.zarr
2026-03-09 17:03:04 [INFO] Resuming existing store at multiomics_run/dataset/coverage.zarr
2026-03-09 17:03:04 [INFO] Building methylation store from 2 bedGraph file(s)...
2026-03-09 17:03:04 [INFO] Initialized MethylStore at multiomics_run/dataset/methylation.zarr
2026-03-09 17:03:04 [INFO] Reading bedGraph files...
2026-03-09 17:03:04 [INFO]   meth-rep1_hg38_CpG_inverted.bedGraph
2026-03-09 17:03:04 [INFO]   meth-rep2_hg38_CpG_inverted.bedGraph
2026-03-09 17:03:04 [INFO] Building union positions and writing store (1 chroms)...
2026-03-09 17:03:04 [INFO]   chr21: 19751 CpG sites
2026-03-09 17:03:04 [INFO] Building variants store from 1 VCF file(s)...
2026-03-09 17:03:04 [INFO] Initialized VariantStore at multiomics_run/dataset/variants.zarr
2026-03-09 17:03:04 [INFO] Reading VCF files...
2026-03-09 17:03:04 [INFO]   snp.vcf.gz
2026-03-09 17:03:04 [INFO] Building union positions and writing store (1 chroms)...
2026-03-09 17:03:04 [INFO]   chr21: 148 variants
2026-03-09 17:03:04 [INFO] Resuming existing store at multiomics_run/dataset/coverage.zarr
2026-03-09 17:03:04 [INFO] Resuming existing MethylStore at multiomics_run/dataset/methylation.zarr
2026-03-09 17:03:04 [INFO] Resuming existing VariantStore at multiomics_run/dataset/variants.zarr

Load and Inspect the Dataset¶

In [6]:

Copied!

ds = qn.open(MS_PATH)
print(ds)
ds = qn.open(MS_PATH)
print(ds)

2026-03-09 17:03:04 [INFO] Resuming existing store at multiomics_run/dataset/coverage.zarr
2026-03-09 17:03:04 [INFO] Resuming existing MethylStore at multiomics_run/dataset/methylation.zarr
2026-03-09 17:03:04 [INFO] Resuming existing VariantStore at multiomics_run/dataset/variants.zarr

MultiomicsStore at 'multiomics_run/dataset'
  modalities : ['coverage', 'methylation', 'variants']
  coverage    : 9 samples, 1 chrom(s)
  methylation : 2 samples, 1 chrom(s)
  variants    : 1 samples, 1 chrom(s)

In [7]:

Copied!





print("Available modalities:", ds.modalities)
print("Chromosomes        :", ds.chromosomes)
print()

metadata = ds.get_metadata()
metadata
print("Available modalities:", ds.modalities)
print("Chromosomes        :", ds.chromosomes)
print()

metadata = ds.get_metadata()
metadata

Available modalities: ['coverage', 'methylation', 'variants']
Chromosomes        : ['chr21']

Out[7]:

	sample_hash	completed	sparsity	modalities
sample_id
atac	96ae824c5c35c004df1b9db3db07283e	True	99.904594	coverage
chip-rx_MLL	ea6b78070b8d28b7a0239eca45893bd4	True	95.253860	coverage
meth-rep1	f5ba0583afa25559ed30c9e8e2709f07	True	94.216675	coverage, methylation
meth-rep2	bcc7e3bf38ae81f6947662683190e221	True	93.745697	coverage, methylation
rna-spikein-control-rep1	2414b9280438e98f0c7cf50d5324bc2f	True	97.987144	stranded_coverage
rna-spikein-control-rep2	9b73f71b7438cdc6081be20663ee4929	True	98.167877	stranded_coverage
rna-spikein-treated-rep1	53da554c5e301439abe48b13b8838b67	True	96.408737	stranded_coverage
rna-spikein-treated-rep2	c2ac62c58d5c053b6f9400e46e08441f	True	96.751144	stranded_coverage
snp	e9eae880d15aaf4929da8db87d4dcb83	True	99.945015	coverage, variants

Quality Control & Overview¶

Before diving into modality-specific analyses, let's assess data quality and get an overview of sample relationships.

PCA on Promoter Signal¶

Principal component analysis reveals whether samples cluster by expected biological groupings (e.g. assay type, treatment) and flags outliers. This is a standard QC step.

In [8]:

Copied!

promoter_signal = ds.reduce(intervals_path=BED_FILE, reduction="mean")
promoter_signal
promoter_signal = ds.reduce(intervals_path=BED_FILE, reduction="mean")
promoter_signal

2026-03-09 17:03:04 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in BED/GTF file and 1 in dataset

Out[8]:

<xarray.Dataset> Size: 356kB
Dimensions:       (ranges: 1456, sample: 9)
Coordinates:
  * ranges        (ranges) int64 12kB 0 1 2 3 4 5 ... 1451 1452 1453 1454 1455
    range_index   (ranges) int64 12kB 0 1 2 3 4 5 ... 1451 1452 1453 1454 1455
    start         (ranges) int64 12kB 5011852 5011852 ... 46635162 46635162
    end           (ranges) int64 12kB 5012876 5012876 ... 46636186 46636186
    range_length  (ranges) int64 12kB 1024 1024 1024 1024 ... 1024 1024 1024
    contig        (ranges) object 12kB 'chr21' 'chr21' ... 'chr21' 'chr21'
    strand        (ranges) object 12kB '+' '+' '+' '+' '+' ... '+' '+' '+' '+'
    name          (ranges) object 12kB 'LOC124905051:XM_047441065.1' ... 'PRM...
  * sample        (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
Data variables:
    sum           (ranges, sample) float32 52kB dask.array<chunksize=(1456, 1), meta=np.ndarray>
    count         (ranges, sample) int64 105kB dask.array<chunksize=(1456, 1), meta=np.ndarray>
    mean          (ranges, sample) float64 105kB dask.array<chunksize=(1456, 1), meta=np.ndarray>

In [9]:

Copied!





pca_obj, pca_result = qn.pca(
    promoter_signal["mean"],
    n_components=8,
    nan_handling_strategy="drop",
    chromosome="chr21",
)
pca_obj, pca_result = qn.pca(
    promoter_signal["mean"],
    n_components=8,
    nan_handling_strategy="drop",
    chromosome="chr21",
)

INFO:dask.sizeof:Selected chromosome chr21 for PCA with shape (9, 1456)
INFO:dask.sizeof:Dropped NaN-containing features

In [10]:

Copied!

scree = qn.plot_pca_scree(pca_obj, filepath=FIG_DIR / "pca_scree.png")
scree = qn.plot_pca_scree(pca_obj, filepath=FIG_DIR / "pca_scree.png")

No description has been provided for this image

In [11]:

Copied!





metadata["assay"] = metadata.index.str.split("-").str[0]

scatter = qn.plot_pca_scatter(
    pca_obj,
    pca_result,
    xaxis_pc=1,
    yaxis_pc=2,
    metadata_df=metadata,
    colour_by="assay",
    shape_by="assay",
    height=5,
    filepath=FIG_DIR / "pca_scatter_pc1_pc2.png",
)
metadata["assay"] = metadata.index.str.split("-").str[0]

scatter = qn.plot_pca_scatter(
    pca_obj,
    pca_result,
    xaxis_pc=1,
    yaxis_pc=2,
    metadata_df=metadata,
    colour_by="assay",
    shape_by="assay",
    height=5,
    filepath=FIG_DIR / "pca_scatter_pc1_pc2.png",
)

Coverage Analysis¶

Reduce Signal over Genomic Regions¶

Aggregate per-base signal across genomic intervals (e.g. promoters, peaks) to produce a compact (regions × samples) matrix. Useful for downstream analysis such as PCA and clustering.

Reduction methods: mean, sum, max, min, median.

Clustered Heatmap of Promoter Signal¶

qn.heatmap() takes the output of reduce() and plots a seaborn clustermap with hierarchical clustering on both rows (features) and columns (samples).

In [12]:

Copied!





g = qn.heatmap(
    promoter_signal,
    variable="mean",
    log_transform=True,
    cmap="mako",
    figsize=(5, 6),
    title="Promoter signal — clustered heatmap",
    filepath=FIG_DIR / "promoter_clustermap.png",
)
plt.show()
g = qn.heatmap(
    promoter_signal,
    variable="mean",
    log_transform=True,
    cmap="mako",
    figsize=(5, 6),
    title="Promoter signal — clustered heatmap",
    filepath=FIG_DIR / "promoter_clustermap.png",
)
plt.show()

Sample–Sample Correlation¶

qn.correlate() computes Pearson (or Spearman) correlations between all sample pairs and plots the result as a clustered heatmap. Returns the correlation pd.DataFrame alongside the figure.

In [13]:

Copied!





corr_df, g = qn.correlate(
    promoter_signal,
    variable="mean",
    log_transform=True,
    annotate=True,
    cmap="RdBu_r",
    figsize=(6, 6),
    title="Sample–sample correlation (promoter signal)",
    filepath=FIG_DIR / "sample_correlation.png",
)
plt.show()
corr_df, g = qn.correlate(
    promoter_signal,
    variable="mean",
    log_transform=True,
    annotate=True,
    cmap="RdBu_r",
    figsize=(6, 6),
    title="Sample–sample correlation (promoter signal)",
    filepath=FIG_DIR / "sample_correlation.png",
)
plt.show()

Feature Count Matrix (DESeq2-compatible)¶

Sum signal across gene bodies to produce an integer count matrix ready for differential expression analysis in DESeq2 or edgeR.

In [14]:

Copied!

ds.coverage
ds.coverage

Out[14]:

<quantnado.dataset.store_bam.BamStore at 0x1720b3a80>

In [15]:

Copied!





counts, features = ds.count_features(
    samples=rna_sample_names,
    gtf_file=GTF_FILE,
    feature_type="transcript",
    feature_id_col=["gene_id", "transcript_id"],
    integerize=True,
    strand=2,
)

print(f"Count matrix: {counts.shape[0]} genes × {counts.shape[1]} samples")
print(f"Feature metadata: {features.shape[0]} features × {features.shape[1]} columns")
counts, features = ds.count_features(
    samples=rna_sample_names,
    gtf_file=GTF_FILE,
    feature_type="transcript",
    feature_id_col=["gene_id", "transcript_id"],
    integerize=True,
    strand=2,
)

print(f"Count matrix: {counts.shape[0]} genes × {counts.shape[1]} samples")
print(f"Feature metadata: {features.shape[0]} features × {features.shape[1]} columns")

2026-03-09 17:03:15 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in input ranges and 1 in dataset

Count matrix: 986 genes × 4 samples
Feature metadata: 986 features × 7 columns

In [16]:

Copied!

counts.loc["RUNX1"]
counts.loc["RUNX1"]

Out[16]:

	rna-spikein-control-rep1	rna-spikein-control-rep2	rna-spikein-treated-rep1	rna-spikein-treated-rep2
transcript_id
NM_001122607.2	98	147	408	256
NM_001001890.3	177	189	614	513
NM_001754.5	1460	507	1992	3734

In [17]:

Copied!

features.loc["RUNX1"]
features.loc["RUNX1"]

Out[17]:

	gene_id	transcript_id	contig	strand	start	end	range_length
transcript_id
NM_001122607.2	RUNX1	NM_001122607.2	chr21	-	36193575	36260996	67421
NM_001001890.3	RUNX1	NM_001001890.3	chr21	-	36160097	36260996	100899
NM_001754.5	RUNX1	NM_001754.5	chr21	-	36160097	36421599	261502

In [18]:

Copied!

counts_path = OUT_DIR / "counts_matrix.csv"
counts.to_csv(counts_path)
print(f"Counts saved to {counts_path}")
counts_path = OUT_DIR / "counts_matrix.csv"
counts.to_csv(counts_path)
print(f"Counts saved to {counts_path}")

Counts saved to multiomics_run/seqnado_output/counts_matrix.csv

Extract Raw Signal over Regions¶

qn.extract() returns the full per-position signal for each genomic interval as a (intervals × positions × samples) DataArray — useful for metagene plots, coverage tracks, and deep-learning feature engineering.

Setting fixed_width centres all intervals to the same length; bin_size downsamples positions into coarser bins.

In [19]:

Copied!





binned = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)

binned
binned = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)

binned

2026-03-09 17:03:18 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in GTF and 3 in dataset

Out[19]:

<xarray.DataArray 'getitem-14d3d3e9ea3e0346a36143f35918eb1e' (interval: 986,
                                                              bin: 40, sample: 9)> Size: 1MB
dask.array<getitem, shape=(986, 40, 9), dtype=float32, chunksize=(518, 40, 1), chunktype=numpy.ndarray>
Coordinates:
  * interval  (interval) int64 8kB 0 1 2 3 4 5 6 ... 979 980 981 982 983 984 985
    start     (interval) int64 8kB 46707966 46683849 ... 9733506 9689972
    end       (interval) int64 8kB 46717269 46707810 ... 9734109 9690726
    contig    (interval) <U5 20kB 'chr21' 'chr21' 'chr21' ... 'chr21' 'chr21'
    strand    (interval) object 8kB '+' '-' '-' '-' '-' ... '+' '+' '+' '+' '-'
  * bin       (bin) int64 320B -1000 -950 -900 -850 -800 ... 750 800 850 900 950
  * sample    (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
Attributes:
    upstream:    1000
    downstream:  1000
    anchor:      start
    bin_size:    50
    bin_agg:     mean

In [20]:

Copied!





metadata["assay"] = metadata.index.to_series().apply(lambda s: s.split("-")[0])
metadata["condition"] = [
    "control" if "control" in s else "treated" if "treated" in s else "" for s in metadata.index
]
metadata["group"] = metadata["assay"].str.upper()
metadata.loc[metadata["assay"] == "rna", "group"] += " " + metadata["condition"]
metadata.loc[metadata["assay"] == "chip", "group"] += " MLL"
groups_dict = metadata.groupby("group").groups

ax = ds.metaplot(
    binned,
    modality="coverage",
    groups=groups_dict,
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    title="Metagene profile",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot.png",
)
metadata["assay"] = metadata.index.to_series().apply(lambda s: s.split("-")[0])
metadata["condition"] = [
    "control" if "control" in s else "treated" if "treated" in s else "" for s in metadata.index
]
metadata["group"] = metadata["assay"].str.upper()
metadata.loc[metadata["assay"] == "rna", "group"] += " " + metadata["condition"]
metadata.loc[metadata["assay"] == "chip", "group"] += " MLL"
groups_dict = metadata.groupby("group").groups

ax = ds.metaplot(
    binned,
    modality="coverage",
    groups=groups_dict,
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    title="Metagene profile",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot.png",
)

Tornado plot (signal heatmap over TSSs)¶

Each row is one genomic interval (TSS ± window), colour encodes signal intensity.
Intervals are sorted by mean signal of the first panel so all panels share the same row order.

In [21]:

Copied!





axes = ds.tornadoplot(
    binned,
    modality="coverage",
    samples=["atac", "chip-rx_MLL", "rna-spikein-control-rep1", "rna-spikein-treated-rep1"],
    sample_names=["ATAC", "ChIP MLL", "RNA control", "RNA treated"],
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=None,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="Coverage tornado — TSS ± window",
    figsize=(8, 4),
    filepath=FIG_DIR / "tornado_tss.png",
)
axes = ds.tornadoplot(
    binned,
    modality="coverage",
    samples=["atac", "chip-rx_MLL", "rna-spikein-control-rep1", "rna-spikein-treated-rep1"],
    sample_names=["ATAC", "ChIP MLL", "RNA control", "RNA treated"],
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=None,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="Coverage tornado — TSS ± window",
    figsize=(8, 4),
    filepath=FIG_DIR / "tornado_tss.png",
)

CPM-normalised tornado plot¶

Normalise the per-position signal to CPM (Counts Per Million mapped reads) before plotting so that tracks are comparable across samples with different sequencing depths.

Note: CPM is the correct normalisation for signal tracks from extract().
RPKM / TPM apply to summarised feature counts (output of count_features()), not per-position or binned signal.

In [22]:

Copied!





library_sizes = qn.get_library_sizes(ds)

# Extract binned signal (reuse params from the raw tornado above)
binned_cpm_input = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)

# Normalise to cpm
cpm_binned = ds.normalise(binned_cpm_input, method="cpm", library_sizes=library_sizes)

axes = ds.tornadoplot(
    cpm_binned,
    modality="coverage",
    samples=["atac", "chip-rx_MLL", "rna-spikein-control-rep1", "rna-spikein-treated-rep1"],
    sample_names=["ATAC", "ChIP MLL", "RNA control", "RNA treated"],
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=None,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="CPM-normalised coverage tornado — TSS ± 1kb",
    figsize=(8, 4),
    filepath=FIG_DIR / "tornado_tss_cpm.png",
)
library_sizes = qn.get_library_sizes(ds)

# Extract binned signal (reuse params from the raw tornado above)
binned_cpm_input = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)

# Normalise to cpm
cpm_binned = ds.normalise(binned_cpm_input, method="cpm", library_sizes=library_sizes)

axes = ds.tornadoplot(
    cpm_binned,
    modality="coverage",
    samples=["atac", "chip-rx_MLL", "rna-spikein-control-rep1", "rna-spikein-treated-rep1"],
    sample_names=["ATAC", "ChIP MLL", "RNA control", "RNA treated"],
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=None,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="CPM-normalised coverage tornado — TSS ± 1kb",
    figsize=(8, 4),
    filepath=FIG_DIR / "tornado_tss_cpm.png",
)

2026-03-09 17:03:19 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in GTF and 3 in dataset
2026-03-09 17:03:20 [INFO] Normalised xr.DataArray to CPM.

Stranded RNA-seq visualization¶

For samples built with stranded={"sample": "R", ...} (dUTP/TruSeq Stranded) or "F" (ligation/ISF), _fwd and _rev arrays are stored separately using read1/read2 + alignment orientation to assign each read to the correct transcriptional strand. Extract forward- and reverse-strand signal independently with strand="+" / strand="-" and pass both to metaplot / tornadoplot via data_rev.

metaplot: sense profile above zero; antisense profile mirrored below
tornadoplot: heatmap of fwd − rev per interval (divergent colormap)

In [23]:

Copied!





rna_groups = {k: v for k, v in groups_dict.items() if "RNA" in k}

binned_rna_fwd = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
    samples=rna_sample_names,
    strand="+",
)

binned_rna_rev = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
    samples=rna_sample_names,
    strand="-",
)

ax = ds.metaplot(
    binned_rna_fwd,
    data_rev=binned_rna_rev,
    modality="coverage",
    groups=rna_groups,
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Stranded coverage",
    title="RNA-seq stranded metagene — sense (+) vs antisense (−)",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot_rna_stranded.png",
)
rna_groups = {k: v for k, v in groups_dict.items() if "RNA" in k}

binned_rna_fwd = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
    samples=rna_sample_names,
    strand="+",
)

binned_rna_rev = ds.extract(
    modality="coverage",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
    samples=rna_sample_names,
    strand="-",
)

ax = ds.metaplot(
    binned_rna_fwd,
    data_rev=binned_rna_rev,
    modality="coverage",
    groups=rna_groups,
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Stranded coverage",
    title="RNA-seq stranded metagene — sense (+) vs antisense (−)",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot_rna_stranded.png",
)

2026-03-09 17:03:21 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in GTF and 3 in dataset
2026-03-09 17:03:21 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in GTF and 3 in dataset

In [24]:

Copied!





axes = ds.tornadoplot(
    binned_rna_fwd,
    data_rev=binned_rna_rev,
    modality="coverage",
    samples=rna_sample_names,
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="RNA-seq stranded tornado — sense − antisense",
    figsize=(6, 4),
    filepath=FIG_DIR / "tornado_rna_stranded.png",
)
axes = ds.tornadoplot(
    binned_rna_fwd,
    data_rev=binned_rna_rev,
    modality="coverage",
    samples=rna_sample_names,
    flip_minus_strand=True,
    sort_by="mean",
    scale_each=True,
    cmap="mako",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="RNA-seq stranded tornado — sense − antisense",
    figsize=(6, 4),
    filepath=FIG_DIR / "tornado_rna_stranded.png",
)

Extract Signal for a Single Genomic Region¶

Use extract_region to pull the raw per-base coverage for any genomic locus across all samples.

In [25]:

Copied!

REGION="chr21:36193575-36260996"

region = ds.extract_region(REGION)
print(f"Region shape: {region.shape}  (samples × positions)")
REGION="chr21:36193575-36260996"

region = ds.extract_region(REGION)
print(f"Region shape: {region.shape}  (samples × positions)")

2026-03-09 17:03:23 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)

Region shape: (9, 67421)  (samples × positions)

In [26]:

Copied!





plt = ds.locus_plot(
    locus=REGION,
    sample_names=ds.samples,
    modality=[
        "stranded_coverage" if s in rna_sample_names else "coverage"
        for s in ds.samples
    ],
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(12, 6),
    filepath=FIG_DIR / "locus_plot.png",
)
plt = ds.locus_plot(
    locus=REGION,
    sample_names=ds.samples,
    modality=[
        "stranded_coverage" if s in rna_sample_names else "coverage"
        for s in ds.samples
    ],
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(12, 6),
    filepath=FIG_DIR / "locus_plot.png",
)

2026-03-09 17:03:23 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)
2026-03-09 17:03:23 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)
2026-03-09 17:03:23 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)

Stranded RNA-seq locus plot¶

For stranded samples, pass modality="stranded_coverage" — forward and reverse coverage are fetched automatically and plotted mirrored around zero (forward above, reverse below).

In [27]:

Copied!





plt = ds.locus_plot(
    locus=REGION,
    sample_names=rna_sample_names,
    modality=["stranded_coverage"] * len(rna_sample_names),
    palette="tab10",
    title=f"RNA stranded locus — {REGION}",
    figsize=(10, 4),
    filepath=FIG_DIR / "locus_plot_rna_stranded.png",
)
plt = ds.locus_plot(
    locus=REGION,
    sample_names=rna_sample_names,
    modality=["stranded_coverage"] * len(rna_sample_names),
    palette="tab10",
    title=f"RNA stranded locus — {REGION}",
    figsize=(10, 4),
    filepath=FIG_DIR / "locus_plot_rna_stranded.png",
)

2026-03-09 17:03:25 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)
2026-03-09 17:03:25 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)

Methylation Analysis¶

ms.methylation is a MethylStore — a sparse Zarr store where each chromosome has:

positions — the union of CpG start coordinates (0-based) across all samples
methylation_pct — methylation percentage per site per sample (NaN = not covered)
n_methylated / n_unmethylated — read counts

In [28]:

Copied!





meth = ds.methylation
print(f"Samples : {meth.sample_names}")
for chrom in meth.chromosomes:
    print(f"  {chrom}: {len(meth.get_positions(chrom)):,} CpG sites")

meth_xr = meth.to_xarray(variable="methylation_pct")
print()
print(meth_xr["chr21"])
meth = ds.methylation
print(f"Samples : {meth.sample_names}")
for chrom in meth.chromosomes:
    print(f"  {chrom}: {len(meth.get_positions(chrom)):,} CpG sites")

meth_xr = meth.to_xarray(variable="methylation_pct")
print()
print(meth_xr["chr21"])

Samples : ['meth-rep1', 'meth-rep2']
  chr21: 19,751 CpG sites

<xarray.DataArray 'array-b15e15809889fe3f377a3543844cf5ce' (sample: 2,
                                                            position: 19751)> Size: 158kB
dask.array<array, shape=(2, 19751), dtype=float32, chunksize=(1, 19751), chunktype=numpy.ndarray>
Coordinates:
  * sample     (sample) <U9 72B 'meth-rep1' 'meth-rep2'
    completed  (sample) bool 2B True True
  * position   (position) int64 158kB 5067013 5067171 ... 46698163 46698204
Attributes:
    variable:    methylation_pct
    chromosome:  chr21

Methylation distribution¶

Extract methylation over a specific region¶

In [29]:

Copied!





stats, features = ds.methylation.count_features(
    gtf_file=GTF_FILE,
    feature_type="transcript",
    feature_id_col=["gene_id", "transcript_id"],
    integerize=True,
)

print(f"Available stats: {list(stats.keys())}")
print(f"Features: {features.shape}")

stats.keys()
stats, features = ds.methylation.count_features(
    gtf_file=GTF_FILE,
    feature_type="transcript",
    feature_id_col=["gene_id", "transcript_id"],
    integerize=True,
)

print(f"Available stats: {list(stats.keys())}")
print(f"Features: {features.shape}")

stats.keys()

Available stats: ['n_methylated', 'n_unmethylated', 'n_cpg_covered', 'methylation_ratio', 'methylation_pct']
Features: (1087, 8)

Out[29]:

dict_keys(['n_methylated', 'n_unmethylated', 'n_cpg_covered', 'methylation_ratio', 'methylation_pct'])

Methylation Feature Counts¶

Similar to coverage feature counting, you can aggregate methylation statistics over genomic features.

In [30]:

Copied!

region_meth = meth.extract_region(REGION, variable="methylation_pct")
print(f"Shape: {region_meth.shape}  (samples × CpG sites in RUNX1)")
print(f"CpG positions covered: {(~np.isnan(region_meth.values)).sum(axis=1)}")
region_meth = meth.extract_region(REGION, variable="methylation_pct")
print(f"Shape: {region_meth.shape}  (samples × CpG sites in RUNX1)")
print(f"CpG positions covered: {(~np.isnan(region_meth.values)).sum(axis=1)}")

Shape: (2, 51)  (samples × CpG sites in RUNX1)
CpG positions covered: [13 38]

Methylation metaplot over promoters¶

For each 1024 bp promoter window on chr21, CpG sites are assigned to one of 32 bins relative to the TSS (position 0). Minus-strand promoters are strand-corrected so that the TSS is always at the centre. The profile shows mean methylation % per bin across all promoters.

Methylation Signal Visualization¶

Extract methylation % over genomic intervals and visualize as metaplots and tornado plots, similar to the coverage visualizations above.

In [31]:

Copied!





binned_meth = ds.extract(
    modality="methylation",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)
binned_meth = ds.extract(
    modality="methylation",
    feature_type="transcript",
    gtf_path=GTF_FILE,
    upstream=1000,
    downstream=1000,
    anchor="start",
    bin_size=50,
)

2026-03-09 17:03:26 [INFO] Chromosome compatibility check: 1 shared chromosomes out of 1 in intervals and 1 in dataset

In [32]:

Copied!





axes = ds.tornadoplot(
    binned_meth,
    modality="methylation",
    flip_minus_strand=True,
    sort_by="mean_r",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="Methylation tornado — TSS ± 1kb",
    figsize=(4, 4),
    filepath=FIG_DIR / "tornado_tss_methylation.png",
)
axes = ds.tornadoplot(
    binned_meth,
    modality="methylation",
    flip_minus_strand=True,
    sort_by="mean_r",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="Promoters",
    title="Methylation tornado — TSS ± 1kb",
    figsize=(4, 4),
    filepath=FIG_DIR / "tornado_tss_methylation.png",
)

In [33]:

Copied!





ax = ds.metaplot(
    binned_meth,
    modality="methylation",
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="CpG Methylation (%)",
    title="Methylation metaplot — TSS ± 1kb",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot_methylation.png",
)
ax = ds.metaplot(
    binned_meth,
    modality="methylation",
    flip_minus_strand=True,
    error_stat="sem",
    palette="tab10",
    reference_point=0,
    reference_label="TSS",
    xlabel="Distance from TSS (bp)",
    ylabel="CpG Methylation (%)",
    title="Methylation metaplot — TSS ± 1kb",
    figsize=(6, 4),
    filepath=FIG_DIR / "metaplot_methylation.png",
)

In [34]:

Copied!





plt = ds.locus_plot(
    locus=REGION,
    sample_names=["meth-rep1", "meth-rep2"],
    modality=["methylation"] * 2,
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(8, 4),
    filepath=FIG_DIR / "locus_plot_methylation.png",
)
plt = ds.locus_plot(
    locus=REGION,
    sample_names=["meth-rep1", "meth-rep2"],
    modality=["methylation"] * 2,
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(8, 4),
    filepath=FIG_DIR / "locus_plot_methylation.png",
)

Variant Analysis¶

msstore_variants is a VariantStore — a sparse Zarr store where each chromosome has:

positions — 1-based variant positions (union across all samples)
genotype — integer encoded: -1 missing, 0 hom-ref, 1 het, 2 hom-alt
allele_depth_ref / allele_depth_alt — read depths per allele
qual — per-sample variant quality score

In [35]:

Copied!





var = ds.variants
print(f"Samples : {var.sample_names}")
for chrom in var.chromosomes:
    print(f"  {chrom}: {len(var.get_positions(chrom)):,} variant sites")

gt_xr = var.to_xarray(variable="genotype")
print()
print(gt_xr["chr21"])
var = ds.variants
print(f"Samples : {var.sample_names}")
for chrom in var.chromosomes:
    print(f"  {chrom}: {len(var.get_positions(chrom)):,} variant sites")

gt_xr = var.to_xarray(variable="genotype")
print()
print(gt_xr["chr21"])

Samples : ['snp']
  chr21: 148 variant sites

<xarray.DataArray 'array-6a9234c7c1ce7dcac3dfa9cbb9895f17' (sample: 1,
                                                            position: 148)> Size: 148B
dask.array<array, shape=(1, 148), dtype=int8, chunksize=(1, 148), chunktype=numpy.ndarray>
Coordinates:
  * sample     (sample) <U3 12B 'snp'
    completed  (sample) bool 1B True
  * position   (position) int64 1kB 5230802 5230815 ... 13918697 13950515
Attributes:
    variable:    genotype
    chromosome:  chr21

Genotype summary¶

In [36]:

Copied!





gt_data = gt_xr["chr21"].compute()

labels = {-1: "missing", 0: "hom-ref", 1: "het", 2: "hom-alt"}
print("Genotype counts per sample:")
for sample in var.sample_names:
    gt = gt_data.sel(sample=sample).values
    counts = {labels[k]: int((gt == k).sum()) for k in [-1, 0, 1, 2]}
    print(f"  {sample}: {counts}")
gt_data = gt_xr["chr21"].compute()

labels = {-1: "missing", 0: "hom-ref", 1: "het", 2: "hom-alt"}
print("Genotype counts per sample:")
for sample in var.sample_names:
    gt = gt_data.sel(sample=sample).values
    counts = {labels[k]: int((gt == k).sum()) for k in [-1, 0, 1, 2]}
    print(f"  {sample}: {counts}")

Genotype counts per sample:
  snp: {'missing': 0, 'hom-ref': 0, 'het': 125, 'hom-alt': 23}

Variants in a specific region¶

In [39]:

Copied!





gt_region = var.extract_region(REGION, variable="genotype")
adr_region = var.extract_region(REGION, variable="allele_depth_ref")
ada_region = var.extract_region(REGION, variable="allele_depth_alt")

print(f"Plotting {gt_region.shape[1]} variants in {REGION}")

plt = ds.locus_plot(
    locus=REGION,
    genotype=gt_region,
    allele_depth_ref=adr_region,
    allele_depth_alt=ada_region,
    sample_names=["snp"],
    modality=["variant"],
    title=f"Variant allele frequency — {REGION}",
    figsize=(10, 2),
    filepath=FIG_DIR / "locus_plot_variants.png",
)

gt_region = var.extract_region(REGION, variable="genotype")
adr_region = var.extract_region(REGION, variable="allele_depth_ref")
ada_region = var.extract_region(REGION, variable="allele_depth_alt")

print(f"Plotting {gt_region.shape[1]} variants in {REGION}")

plt = ds.locus_plot(
    locus=REGION,
    genotype=gt_region,
    allele_depth_ref=adr_region,
    allele_depth_alt=ada_region,
    sample_names=["snp"],
    modality=["variant"],
    title=f"Variant allele frequency — {REGION}",
    figsize=(10, 2),
    filepath=FIG_DIR / "locus_plot_variants.png",
)

Plotting 0 variants in chr21:36193575-36260996

View all three modalities together for a single genomic region — the same layout as a genome browser, built directly from the MultiomicsStore in a few lines.

Panels (top → bottom, all sharing the same x-axis):

Coverage tracks for selected assay types (ATAC, ChIP, methylation, SNP)
Methylation % at individual CpG sites (one dot per site, per replicate)
Variant allele frequency coloured by genotype (blue = het, red = hom-alt)

In [38]:

Copied!





gt = ds.variants.extract_region(REGION, variable="genotype").compute()
adr = ds.variants.extract_region(REGION, variable="allele_depth_ref").compute()
ada = ds.variants.extract_region(REGION, variable="allele_depth_alt").compute()

plt = ds.locus_plot(
    locus=REGION,
    genotype=gt,
    allele_depth_ref=adr,
    allele_depth_alt=ada,
    sample_names=ds.samples,
    modality=[
        "methylation" if "meth" in s
        else "variant" if "snp" in s
        else "stranded_coverage" if s in rna_sample_names
        else "coverage"
        for s in ds.samples
    ],
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(12, 6),
    filepath=FIG_DIR / "locus_plot.png",
)
gt = ds.variants.extract_region(REGION, variable="genotype").compute()
adr = ds.variants.extract_region(REGION, variable="allele_depth_ref").compute()
ada = ds.variants.extract_region(REGION, variable="allele_depth_alt").compute()

plt = ds.locus_plot(
    locus=REGION,
    genotype=gt,
    allele_depth_ref=adr,
    allele_depth_alt=ada,
    sample_names=ds.samples,
    modality=[
        "methylation" if "meth" in s
        else "variant" if "snp" in s
        else "stranded_coverage" if s in rna_sample_names
        else "coverage"
        for s in ds.samples
    ],
    palette="tab10",
    title=f"Multi-omics locus plot — {REGION}",
    figsize=(12, 6),
    filepath=FIG_DIR / "locus_plot.png",
)

2026-03-09 17:03:27 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)
2026-03-09 17:03:27 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)
2026-03-09 17:03:27 [INFO] Extracted region chr21:36193575-36260996 (67421 bp) for 9 sample(s)