QuantNado — end-to-end tutorial¶
This notebook is a guided tour of QuantNado on a small multi-omics test dataset focused on the RUNX1 locus on chr21. It is organised to follow a typical analysis workflow: build stores, combine them, inspect the dataset, organise samples into reusable groups, explore and normalise signal, call peaks, visualise loci, quantify features, and finish with PCA.
| Section | Details |
|---|---|
| Data | Download the test inputs and reference files |
| Create | Build one QuantNado store per sample |
| Combine | Merge per-sample stores into one multi-sample dataset |
| Open & organise | Inspect the dataset, cache sample groups, and subset cleanly |
| Access data | Slice regions, genes, chromosomes, and assay subsets |
| Normalisation | Compute library sizes and apply CPM/RPKM |
| Peak calling | Call peaks and compare overlap between assays |
| PlotNado peaks | Add called peaks back onto PlotNado tracks |
| Locus views | Plot multi-omics tracks with QuantNado |
| PlotNado | Plot multi-omics tracks with PlotNado |
| Reduce | Summarise signal over annotated features |
| Extract | Build binned profiles for metaplots and tornadoplots |
| Quantify | Turn stored signal into feature-by-sample matrices |
| PCA | Reduce high-dimensional signal to sample-level structure |
import tarfile
import urllib.request
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
import plotnado as pn
import pyranges1 as pr
import quantnado
from quantnado.analysis.core import QuantNadoDataset
Data¶
Test dataset: chr21-only BAM files (ATAC, ChIP, RNA, METH) plus a SNP VCF. The reference annotation is chr21.gtf, and we will derive GENE-focused examples from it throughout the notebook.
DATA_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/seqnado_output.tar.gz"
REF_URL = "https://userweb.molbiol.ox.ac.uk/public/project/milne_group/cchahrou/quantnado/reference.tar.gz"
RUN_DIR = Path("multiomics_run")
RUN_DIR.mkdir(parents=True, exist_ok=True)
for url in [DATA_URL, REF_URL]:
filename = url.split("/")[-1]
basename = filename.split(".tar.gz")[0]
archive = RUN_DIR / filename
outdir = RUN_DIR / basename
if not outdir.exists():
print(f"Downloading {filename}...")
urllib.request.urlretrieve(url, archive)
with tarfile.open(archive) as tf:
tf.extractall(RUN_DIR)
archive.unlink()
print("Done.")
else:
print(f"{filename} already present — skipping.")
Downloading seqnado_output.tar.gz...
/var/folders/bs/nx8yhh5s2tn0r0307mtl2mf40000gn/T/ipykernel_65431/2674900429.py:15: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. tf.extractall(RUN_DIR)
Done. Downloading reference.tar.gz... Done.
DATA_DIR = RUN_DIR / "seqnado_output"
REF_DIR = RUN_DIR / "reference"
OUT_DIR = RUN_DIR / "output"
SAMPLES_DIR = OUT_DIR / "samples"
DATASET = OUT_DIR / "dataset.zarr"
GTF_FILE = REF_DIR / "chr21.gtf"
GENE_GTF_FILE = REF_DIR / "GENE_only.gtf"
FIG_DIR = RUN_DIR / "figures"
for d in [OUT_DIR, SAMPLES_DIR, FIG_DIR]:
d.mkdir(parents=True, exist_ok=True)
GENE_NAME = "RUNX1"
GENE_CHROM = "chr21"
GENE_START = 34_785_801
GENE_END = 35_051_302
GENE_STRAND = "-"
GENE_LOCUS = f"{GENE_CHROM}:{GENE_START}-{GENE_END}"
GENE_PAD = 2_000
gtf = pr.read_gtf(GTF_FILE)
GENE_gtf = gtf[gtf.gene_name == GENE_NAME]
GENE_gtf.to_gtf(str(GENE_GTF_FILE))
print(GENE_LOCUS, "strand=", GENE_STRAND)
print(f"GENE subset annotation written to {GENE_GTF_FILE}")
chr21:34785801-35051302 strand= - GENE subset annotation written to multiomics_run/reference/GENE_only.gtf
Create¶
quantnado.create_dataset dispatches to BamStore, MethylStore, or VariantStore based on assay. Each sample gets its own .zarr directory. Pass test=True to use the default test chromosomes, or test_chromosomes=[...] to provide an explicit list.
samples = [
{"assay": "ATAC", "sample_id": "atac", "bam": "atac/aligned/atac.bam"},
{
"assay": "ChIP",
"sample_id": "chip-rx_MLL",
"ip": "MLLN",
"bam": "chip/aligned/chip-rx_MLL.bam",
},
{
"assay": "METH",
"sample_id": "meth-rep1",
"bam": "meth/aligned/meth-rep1.bam",
"methyl": "meth/methylation/methyldackel/meth-rep1_hg38_CpG_inverted.bedGraph",
},
{
"assay": "METH",
"sample_id": "meth-rep2",
"bam": "meth/aligned/meth-rep2.bam",
"methyl": "meth/methylation/methyldackel/meth-rep2_hg38_CpG_inverted.bedGraph",
},
{"assay": "SNP", "sample_id": "snp", "variant": "snp/variant/snp.vcf.gz"},
{
"assay": "RNA",
"sample_id": "rna-spikein-control-rep1",
"bam": "rna/aligned/rna-spikein-control-rep1.bam",
"stranded": "R",
},
{
"assay": "RNA",
"sample_id": "rna-spikein-control-rep2",
"bam": "rna/aligned/rna-spikein-control-rep2.bam",
"stranded": "R",
},
{
"assay": "RNA",
"sample_id": "rna-spikein-treated-rep1",
"bam": "rna/aligned/rna-spikein-treated-rep1.bam",
"stranded": "R",
},
{
"assay": "RNA",
"sample_id": "rna-spikein-treated-rep2",
"bam": "rna/aligned/rna-spikein-treated-rep2.bam",
"stranded": "R",
},
]
metadata = pd.DataFrame(
[
{
"assay": s["assay"],
"sample_id": s["sample_id"],
"bam_path": str(DATA_DIR / s["bam"]) if "bam" in s else None,
"methylation_path": str(DATA_DIR / s["methyl"]) if "methyl" in s else None,
"variant_path": str(DATA_DIR / s["variant"]) if "variant" in s else None,
"ip": s.get("ip"),
"stranded": s.get("stranded"),
}
for s in samples
]
)
metadata
| assay | sample_id | bam_path | methylation_path | variant_path | ip | stranded | |
|---|---|---|---|---|---|---|---|
| 0 | ATAC | atac | multiomics_run/seqnado_output/atac/aligned/ata... | NaN | NaN | NaN | NaN |
| 1 | ChIP | chip-rx_MLL | multiomics_run/seqnado_output/chip/aligned/chi... | NaN | NaN | MLLN | NaN |
| 2 | METH | meth-rep1 | multiomics_run/seqnado_output/meth/aligned/met... | multiomics_run/seqnado_output/meth/methylation... | NaN | NaN | NaN |
| 3 | METH | meth-rep2 | multiomics_run/seqnado_output/meth/aligned/met... | multiomics_run/seqnado_output/meth/methylation... | NaN | NaN | NaN |
| 4 | SNP | snp | NaN | NaN | multiomics_run/seqnado_output/snp/variant/snp.... | NaN | NaN |
| 5 | RNA | rna-spikein-control-rep1 | multiomics_run/seqnado_output/rna/aligned/rna-... | NaN | NaN | NaN | R |
| 6 | RNA | rna-spikein-control-rep2 | multiomics_run/seqnado_output/rna/aligned/rna-... | NaN | NaN | NaN | R |
| 7 | RNA | rna-spikein-treated-rep1 | multiomics_run/seqnado_output/rna/aligned/rna-... | NaN | NaN | NaN | R |
| 8 | RNA | rna-spikein-treated-rep2 | multiomics_run/seqnado_output/rna/aligned/rna-... | NaN | NaN | NaN | R |
for _, row in metadata.iterrows():
sample, assay = row["sample_id"], row["assay"]
store = SAMPLES_DIR / f"{sample}.zarr"
if store.exists():
print(f"✓ {sample} ({assay})")
else:
quantnado.create_dataset(
sample_id=sample,
assay=assay,
output_path=store,
bam_path=row.get("bam_path"),
methyl_path=row.get("methylation_path"),
variants_path=row.get("variant_path"),
stranded=row.get("stranded"),
test_chromosomes=["chr21"],
)
print(f"✓ {sample} ({assay})")
2026-04-11 20:58:26.932 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/atac/aligned/atac.bam 2026-04-11 20:58:26.933 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:26.949 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks) 2026-04-11 20:58:27.501 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed atac: 7,114 mapped reads 2026-04-11 20:58:27.502 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/chip/aligned/chip-rx_MLL.bam 2026-04-11 20:58:27.503 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:27.514 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ atac (ATAC)
2026-04-11 20:58:28.123 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed chip-rx_MLL: 58,683 mapped reads 2026-04-11 20:58:28.125 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:28.125 | INFO | quantnado.dataset.store_methyl:from_files:343 - Reading methylation data from multiomics_run/seqnado_output/meth/methylation/methyldackel/meth-rep1_hg38_CpG_inverted.bedGraph 2026-04-11 20:58:28.130 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:28.143 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks) 2026-04-11 20:58:28.163 | INFO | quantnado.dataset.store_methyl:from_files:364 - Writing BAM coverage for meth-rep1
✓ chip-rx_MLL (ChIP)
2026-04-11 20:58:28.708 | INFO | quantnado.dataset.store_methyl:from_files:366 - Writing methylation data for meth-rep1 2026-04-11 20:58:29.225 | INFO | quantnado.dataset.store_methyl:_finalise:273 - Completed meth-rep1: 228,719 mapped reads 2026-04-11 20:58:29.227 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:29.227 | INFO | quantnado.dataset.store_methyl:from_files:343 - Reading methylation data from multiomics_run/seqnado_output/meth/methylation/methyldackel/meth-rep2_hg38_CpG_inverted.bedGraph 2026-04-11 20:58:29.232 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:29.245 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks) 2026-04-11 20:58:29.267 | INFO | quantnado.dataset.store_methyl:from_files:364 - Writing BAM coverage for meth-rep2
✓ meth-rep1 (METH)
2026-04-11 20:58:29.872 | INFO | quantnado.dataset.store_methyl:from_files:366 - Writing methylation data for meth-rep2 2026-04-11 20:58:30.367 | INFO | quantnado.dataset.store_methyl:_finalise:273 - Completed meth-rep2: 244,369 mapped reads 2026-04-11 20:58:30.368 | INFO | quantnado.dataset.store_variants:from_vcf:346 - Reading VCF: multiomics_run/seqnado_output/snp/variant/snp.vcf.gz 2026-04-11 20:58:30.368 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:30.370 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:30.397 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ meth-rep2 (METH)
2026-04-11 20:58:30.859 | INFO | quantnado.dataset.store_variants:_finalise:293 - Completed snp: 149 variants 2026-04-11 20:58:30.860 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/rna/aligned/rna-spikein-control-rep1.bam 2026-04-11 20:58:30.860 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:30.872 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ snp (SNP)
2026-04-11 20:58:31.839 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed rna-spikein-control-rep1: 140,385 mapped reads 2026-04-11 20:58:31.840 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/rna/aligned/rna-spikein-control-rep2.bam 2026-04-11 20:58:31.841 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:31.859 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ rna-spikein-control-rep1 (RNA)
2026-04-11 20:58:32.793 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed rna-spikein-control-rep2: 191,124 mapped reads 2026-04-11 20:58:32.794 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/rna/aligned/rna-spikein-treated-rep1.bam 2026-04-11 20:58:32.795 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:32.806 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ rna-spikein-control-rep2 (RNA)
2026-04-11 20:58:33.841 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed rna-spikein-treated-rep1: 364,743 mapped reads 2026-04-11 20:58:33.842 | INFO | quantnado.dataset.store_bam:from_bam_files:512 - Extracting chromsizes from multiomics_run/seqnado_output/rna/aligned/rna-spikein-treated-rep2.bam 2026-04-11 20:58:33.843 | INFO | quantnado.dataset.metadata:_parse_chromsizes:79 - Test mode enabled: keeping chromosomes ['chr21'] 2026-04-11 20:58:33.855 | INFO | quantnado.dataset.store_bam:_resolve_chunk_len:98 - Resolved chunk_len=33554432 (2 chunks)
✓ rna-spikein-treated-rep1 (RNA)
2026-04-11 20:58:34.898 | INFO | quantnado.dataset.store_bam:_finalise:430 - Completed rna-spikein-treated-rep2: 302,428 mapped reads
✓ rna-spikein-treated-rep2 (RNA)
Combine¶
QuantNadoDataset.combine stacks completed per-sample stores along axis 0 ((1, L) × N → (N, L)) and writes a unified multi-sample Zarr. Incomplete stores are silently skipped.
QuantNadoDataset.combine(src=SAMPLES_DIR, output=DATASET)
2026-04-11 20:58:34.950 | INFO | quantnado.analysis.core:_load_directory:286 - Opened 9 stores from multiomics_run/output/samples 2026-04-11 20:58:39.111 | INFO | quantnado.analysis.core:combine:1649 - Combined 9 stores → multiomics_run/output/dataset.zarr
<quantnado.analysis.core.QuantNadoDataset at 0x120dd2850>
Open & organise¶
QuantNadoDataset auto-detects whether you point it at a directory of per-sample stores or a single combined .zarr. In this tutorial we open the combined store, attach a GTF with annotation=, and then use qn.info plus cached sample groups to make later analyses easier to read and subset.
qn = QuantNadoDataset(DATASET, annotation=GTF_FILE)
qn.info
2026-04-11 20:58:39.298 | WARNING | quantnado.analysis.core:set_annotation:671 - No 'gene' features found in GTF; falling back to 'transcript' 2026-04-11 20:58:39.368 | INFO | quantnado.analysis.core:set_annotation:678 - Loaded annotation: 1,087 genes from multiomics_run/reference/chr21.gtf
DatasetInfo
assays : ATAC, CHIP, METH, RNA, SNP
chromosomes : chr21
chromsizes : chr21=46,709,983
layout : combined
path : multiomics_run/output/dataset.zarr
subset : False
groups
ATAC
n : 1
samples : atac
keys : coverage
CHIP
n : 1
samples : chip-rx_MLL
keys : coverage
ips : MLL
METH
n : 2
samples : meth-rep1, meth-rep2
keys : coverage, methyl_pct, n_methylated, n_total
RNA
n : 4
samples : rna-spikein-control-rep1, rna-spikein-control-rep2, rna-spikein-treated-rep1, rna-spikein-treated-rep2
keys : coverage, rna_fwd, rna_rev
SNP
n : 1
samples : snp
keys : AF, DP, GT, MQ, coverage
Group and subset samples¶
group_by() lets you cache reusable sample-group namespaces such as ip, treatment, replicate, or spikein. Later, subset() can intersect assay, sample, IP, and group filters in one call.
With match="contains", a single label can match multiple substrings. In this notebook, the spikein label intentionally groups both RNA spike-in samples and ChIP rx samples.
# Cache reusable sample-group namespaces. With match="contains",
# each label can match one or many sample-name patterns.
qn.group_by(
ip="ip",
treatment={
"control": ["control"],
"treated": ["treated"],
},
replicate={
"rep1": ["rep1"],
"rep2": ["rep2"],
},
spikein={
"spikein": ["spikein", "rx"],
},
match="contains",
)
print(qn.info)
DatasetInfo
assays : ATAC, CHIP, METH, RNA, SNP
chromosomes : chr21
chromsizes : chr21=46,709,983
layout : combined
path : multiomics_run/output/dataset.zarr
subset : False
groups
ip : MLL (1)
treatment: control, treated (2)
replicate: rep1, rep2 (2)
spikein : spikein (1)
ATAC
n : 1
samples : atac
keys : coverage
CHIP
n : 1
samples : chip-rx_MLL
keys : coverage
ips : MLL
METH
n : 2
samples : meth-rep1, meth-rep2
keys : coverage, methyl_pct, n_methylated, n_total
RNA
n : 4
samples : rna-spikein-control-rep1, rna-spikein-control-rep2, rna-spikein-treated-rep1, rna-spikein-treated-rep2
keys : coverage, rna_fwd, rna_rev
SNP
n : 1
samples : snp
keys : AF, DP, GT, MQ, coverage
chip_qn = qn.subset(
assay="CHIP",
ip="MLL",
group={
"spikein": "spikein",
},
)
chip_qn.info
DatasetInfo
assays : CHIP
chromosomes : chr21
chromsizes : chr21=46,709,983
layout : combined
path : multiomics_run/output/dataset.zarr
subset : True
groups
ip : MLL (1)
treatment: control, treated (2)
replicate: rep1, rep2 (2)
spikein : spikein (1)
subset_samples : chip-rx_MLL
CHIP
n : 1
samples : chip-rx_MLL
keys : coverage
ips : MLL
rna_qn = qn.subset(
assay="RNA",
group={
"treatment": ["treated", "control"],
"spikein": "spikein",
"replicate": ["rep1", "rep2"],
},
)
rna_qn.info
DatasetInfo
assays : RNA
chromosomes : chr21
chromsizes : chr21=46,709,983
layout : combined
path : multiomics_run/output/dataset.zarr
subset : True
groups
ip : MLL (1)
treatment: control, treated (2)
replicate: rep1, rep2 (2)
spikein : spikein (1)
subset_samples : rna-spikein-control-rep1, rna-spikein-control-rep2, rna-spikein-treated-rep1, rna-spikein-treated-rep2
RNA
n : 4
samples : rna-spikein-control-rep1, rna-spikein-control-rep2, rna-spikein-treated-rep1, rna-spikein-treated-rep2
keys : coverage, rna_fwd, rna_rev
Access data¶
Once the dataset is open, the core object model is xarray-based. qn.sel() returns an xr.Dataset with shared sample and position coordinates, and the same API works whether you want one region, one gene, one assay, or a whole chromosome tree.
Region slice → xr.Dataset¶
qn.sel(chrom, start, end) returns an xr.Dataset with dims (sample, position). Coordinates are 1-based inclusive.
region = qn.sel(chrom=GENE_CHROM, start=GENE_START, end=GENE_END)
region.info
<bound method Dataset.info of <xarray.Dataset> Size: 136MB
Dimensions: (sample: 9, position: 265502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>>
# Select one sample — all data variables, 1D
region.sel(sample="atac")
region.info
<bound method Dataset.info of <xarray.Dataset> Size: 136MB
Dimensions: (sample: 9, position: 265502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>>
# Sub-range within the region — still multi-sample
region.sel(position=slice(5_220_000, 5_230_000))
region.info
<bound method Dataset.info of <xarray.Dataset> Size: 136MB
Dimensions: (sample: 9, position: 265502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>>
Gene-based access¶
With a GTF annotation loaded, qn.gene_info() returns gene metadata and qn.sel_gene() slices directly to the gene locus with optional padding. This is the most convenient way to move from gene names to genomic coordinates in exploratory work.
gene = GENE_NAME
info = qn.gene_info(gene)
print(f"{gene}: {info['locus']} strand={info['strand']}")
gene_region = qn.sel_gene(gene, padding=GENE_PAD)
gene_region.info
RUNX1: chr21:36160098-36421599 strand=-
<bound method Dataset.info of <xarray.Dataset> Size: 136MB
Dimensions: (sample: 9, position: 265502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 36158098 36158099 ... 36423598 36423599
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
Attributes:
gene_name: RUNX1
gene_id: RUNX1
gene_strand: -
locus: chr21:36158098-36423599
exons: [{'start': 36160098, 'end': 36164907, 'exon_number': '6'}, ...>
Full genome → xr.DataTree¶
qn.to_datatree() returns a lazy xr.DataTree with one child node per chromosome. It is helpful when you want to browse the full store structure or work chromosome-by-chromosome.
tree = qn.to_datatree()
tree
<xarray.DataTree>
Group: /
│ Attributes:
│ chromsizes: {'chr21': 46709983}
└── Group: /chr21
Dimensions: (sample: 9, position: 46709983)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 374MB 1 2 3 4 ... 46709981 46709982 46709983
Data variables:
AF (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
coverage (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
DP (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
GT (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
methyl_pct (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
MQ (sample, position) float32 2GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
n_methylated (sample, position) float64 3GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
n_total (sample, position) float64 3GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
rna_fwd (sample, position) float64 3GB dask.array<chunksize=(1, 33554432), meta=np.ndarray>
rna_rev (sample, position) float64 3GB dask.array<chunksize=(1, 33554432), meta=np.ndarray># Slice any chromosome node like a normal Dataset
tree[GENE_CHROM].ds.sel(position=slice(GENE_START, GENE_END))
<xarray.Dataset> Size: 136MB
Dimensions: (sample: 9, position: 265502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 265502), meta=np.ndarray>Multiple assays¶
Most QuantNado methods can subset by biological assay as well as by explicit sample names. This makes it easy to compare related modalities, such as RNA and ATAC over the same locus, without rebuilding the dataset.
# Slice any chromosome node like a normal Dataset
assays = qn.sel("chr21", GENE_START, GENE_END, assay=["RNA", "ATAC"])
assays.info
<bound method Dataset.info of <xarray.Dataset> Size: 76MB
Dimensions: (sample: 5, position: 265502)
Coordinates:
* sample (sample) <U24 480B 'atac' ... 'rna-spikein-treated-rep2'
assay (sample) <U4 80B 'ATAC' 'RNA' 'RNA' 'RNA' 'RNA'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Data variables:
AF (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
coverage (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
DP (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
GT (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
methyl_pct (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
MQ (sample, position) float32 5MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_methylated (sample, position) float64 11MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
n_total (sample, position) float64 11MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_fwd (sample, position) float64 11MB dask.array<chunksize=(1, 265502), meta=np.ndarray>
rna_rev (sample, position) float64 11MB dask.array<chunksize=(1, 265502), meta=np.ndarray>>
Normalisation¶
Library sizes are stored in each sample metadata block, so CPM/RPKM/TPM-style scaling can be applied directly to xarray or pandas outputs. This section shows both the summary statistics (qn.library_sizes()) and normalisation of locus-level signal.
lib_sizes = qn.library_sizes()
lib_sizes.rename("mapped reads").to_frame()
| mapped reads | |
|---|---|
| atac | 7114.0 |
| chip-rx_MLL | 58683.0 |
| meth-rep1 | 228719.0 |
| meth-rep2 | 244369.0 |
| rna-spikein-control-rep1 | 140385.0 |
| rna-spikein-control-rep2 | 191124.0 |
| rna-spikein-treated-rep1 | 364743.0 |
| rna-spikein-treated-rep2 | 302428.0 |
| snp | 149.0 |
region = qn.sel(chrom=GENE_CHROM, start=GENE_START, end=GENE_END)
# Plot CPM-normalised RNA forward coverage
rna_cpm = qn.normalise(region["rna_fwd"], method="cpm")
print(rna_cpm.dims)
print(rna_cpm.coords)
print(rna_cpm.sizes)
qn.info_of(rna_cpm)
2026-04-11 20:58:43.269 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM.
('sample', 'position')
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34785801 34785802 ... 35051301 35051302
Frozen({'sample': 9, 'position': 265502})
ObjectInfo type : DataArray name : rna_fwd dims : sample, position sizes : sample=9, position=265502 dtype : float64 coords : sample, position, assay
rna_samples = [s for s in qn.sample_names if "rna" in s]
rna_cpm.sel(sample=rna_samples).compute().plot.line(
x="position",
hue="sample",
figsize=(14, 3),
)
plt.title(f"RNA forward strand coverage (CPM) — {GENE_LOCUS}")
plt.tight_layout()
# Plot CPM-normalised RNA reverse coverage
rna_cpm = qn.normalise(region["rna_rev"], method="cpm")
rna_samples = [s for s in qn.sample_names if "rna" in s]
rna_cpm.sel(sample=rna_samples).compute().plot.line(
x="position",
hue="sample",
figsize=(14, 3),
)
plt.title(f"RNA reverse strand coverage (CPM) — {GENE_LOCUS}")
plt.tight_layout()
2026-04-11 20:58:43.985 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM.
Peak calling¶
qn.call_peaks() runs peak calling directly from the QuantNado dataset. qn.available_peak_methods lists the supported algorithms, and you can pass one method or a list of methods.
This section comes after normalisation because it is a good place to contrast two different ideas: explicit dataset-level normalisation for exploratory plots, and method-specific internal normalisation inside the peak callers themselves. In particular, LanceOtron already applies RPKM-normalised per-base coverage internally, so the clearest pattern is usually ATAC with quantile and ChIP with lanceotron.
Below we first call ATAC peaks with quantile, then ChIP peaks with lanceotron, and finally show how to request more than one method at once. We then compare one ATAC sample with one ChIP sample using the overlap table and a Venn diagram.
PEAKS_DIR = OUT_DIR / "peaks"
PEAKS_DIR.mkdir(exist_ok=True)
print("Available peak methods:", qn.available_peak_methods)
# ATAC is often a good fit for quantile-based peak calling.
quantile_beds = qn.call_peaks(
PEAKS_DIR / "atac_quantile",
method="quantile",
assay="ATAC",
)
print("Quantile peaks:", {k: v.name for k, v in quantile_beds.items()})
# LanceOtron internally uses RPKM-normalised per-base coverage, so we do
# not pre-normalise the dataset before calling it.
lanceotron_beds = qn.call_peaks(
PEAKS_DIR / "chip_lanceotron",
method="lanceotron",
assay="CHIP",
)
print("LanceOtron peaks:", {k: v.name for k, v in lanceotron_beds.items()})
# You can also request multiple methods at once.
beds_by_method = qn.call_peaks(
PEAKS_DIR / "multi_method",
method=["quantile", "lanceotron"],
assay=["CHIP", "ATAC"],
)
print({method: {k: v.name for k, v in beds.items()} for method, beds in beds_by_method.items()})
Available peak methods: ['quantile', 'seacr', 'lanceotron']
2026-04-11 20:58:46.778 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:169 - Found 1 completed sample(s) in multiomics_run/output/dataset.zarr 2026-04-11 20:58:46.779 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:182 - Sliding-window tiling chr21 (46,709,983 bp): size=128, overlap=8, step=120 2026-04-11 20:58:49.770 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:244 - Saved log-RPKM tile signal to multiomics_run/output/peaks/atac_quantile/log_rpkm_128bp_8bp_overlap.parquet 2026-04-11 20:58:49.770 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:60 - Calling peaks for sample: atac 2026-04-11 20:58:49.772 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:72 - [atac] Quantile 0.98 threshold = 14.9768 2026-04-11 20:58:49.777 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:104 - [atac] Final peak count: 7 2026-04-11 20:58:49.779 | SUCCESS | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:264 - Peak BED saved to: multiomics_run/output/peaks/atac_quantile/atac.bed
Quantile peaks: {'atac': 'atac.bed'}
2026-04-11 20:58:50.681 | INFO | quantnado.peak_calling._device:get_device:51 - Using device: mps 2026-04-11 20:58:50.682 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:778 - GPU detected (mps); forcing n_workers=1 for thread safety 2026-04-11 20:58:50.682 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:782 - Loading LanceOtron model and scaler assets … 2026-04-11 20:58:50.758 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:806 - Processing 1 sample(s) with 1 worker(s) 2026-04-11 20:58:50.759 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:852 - Sample: chip-rx_MLL 2026-04-11 20:58:54.437 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:640 - [chr21] 13504 candidates 2026-04-11 20:59:01.764 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_normalise_features:561 - GPU normalise_features failed (Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.); using numpy fallback. 2026-04-11 20:59:06.260 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:663 - [chr21] 12400 peaks after score filter 2026-04-11 20:59:06.328 | SUCCESS | quantnado.peak_calling.call_lanceotron_peaks:_write_sample_peaks:846 - [chip-rx_MLL] 12400 peaks → multiomics_run/output/peaks/chip_lanceotron/chip-rx_MLL_lanceotron_peaks.bed 2026-04-11 20:59:06.341 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:169 - Found 1 completed sample(s) in multiomics_run/output/dataset.zarr 2026-04-11 20:59:06.342 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:182 - Sliding-window tiling chr21 (46,709,983 bp): size=128, overlap=8, step=120
LanceOtron peaks: {'chip-rx_MLL_lanceotron_peaks': 'chip-rx_MLL_lanceotron_peaks.bed'}
2026-04-11 20:59:08.927 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:244 - Saved log-RPKM tile signal to multiomics_run/output/peaks/multi_method/quantile/chip/log_rpkm_128bp_8bp_overlap.parquet 2026-04-11 20:59:08.927 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:60 - Calling peaks for sample: chip-rx_MLL 2026-04-11 20:59:08.930 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:72 - [chip-rx_MLL] Quantile 0.98 threshold = 9.2735 2026-04-11 20:59:08.935 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:104 - [chip-rx_MLL] Final peak count: 1830 2026-04-11 20:59:08.940 | SUCCESS | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:264 - Peak BED saved to: multiomics_run/output/peaks/multi_method/quantile/chip/chip-rx_MLL.bed 2026-04-11 20:59:08.944 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:169 - Found 1 completed sample(s) in multiomics_run/output/dataset.zarr 2026-04-11 20:59:08.944 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:182 - Sliding-window tiling chr21 (46,709,983 bp): size=128, overlap=8, step=120 2026-04-11 20:59:11.537 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:244 - Saved log-RPKM tile signal to multiomics_run/output/peaks/multi_method/quantile/atac/log_rpkm_128bp_8bp_overlap.parquet 2026-04-11 20:59:11.538 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:60 - Calling peaks for sample: atac 2026-04-11 20:59:11.539 | DEBUG | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:72 - [atac] Quantile 0.98 threshold = 14.9768 2026-04-11 20:59:11.543 | INFO | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks:104 - [atac] Final peak count: 7 2026-04-11 20:59:11.546 | SUCCESS | quantnado.peak_calling.call_quantile_peaks:call_quantile_peaks_from_zarr:264 - Peak BED saved to: multiomics_run/output/peaks/multi_method/quantile/atac/atac.bed 2026-04-11 20:59:11.547 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:778 - GPU detected (mps); forcing n_workers=1 for thread safety 2026-04-11 20:59:11.548 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:782 - Loading LanceOtron model and scaler assets … 2026-04-11 20:59:11.579 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:806 - Processing 1 sample(s) with 1 worker(s) 2026-04-11 20:59:11.580 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:852 - Sample: chip-rx_MLL 2026-04-11 20:59:15.127 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:640 - [chr21] 13504 candidates 2026-04-11 20:59:22.288 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_normalise_features:561 - GPU normalise_features failed (Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.); using numpy fallback. 2026-04-11 20:59:25.140 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:663 - [chr21] 12400 peaks after score filter 2026-04-11 20:59:25.183 | SUCCESS | quantnado.peak_calling.call_lanceotron_peaks:_write_sample_peaks:846 - [chip-rx_MLL] 12400 peaks → multiomics_run/output/peaks/multi_method/lanceotron/chip/chip-rx_MLL_lanceotron_peaks.bed 2026-04-11 20:59:25.186 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:778 - GPU detected (mps); forcing n_workers=1 for thread safety 2026-04-11 20:59:25.187 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:782 - Loading LanceOtron model and scaler assets … 2026-04-11 20:59:25.221 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:806 - Processing 1 sample(s) with 1 worker(s) 2026-04-11 20:59:25.222 | INFO | quantnado.peak_calling.call_lanceotron_peaks:call_lanceotron_peaks_from_zarr:852 - Sample: atac 2026-04-11 20:59:28.671 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:640 - [chr21] 36 candidates 2026-04-11 20:59:28.704 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_normalise_features:561 - GPU normalise_features failed (Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.); using numpy fallback. 2026-04-11 20:59:28.817 | DEBUG | quantnado.peak_calling.call_lanceotron_peaks:_process_chromosome:663 - [chr21] 36 peaks after score filter 2026-04-11 20:59:28.819 | SUCCESS | quantnado.peak_calling.call_lanceotron_peaks:_write_sample_peaks:846 - [atac] 36 peaks → multiomics_run/output/peaks/multi_method/lanceotron/atac/atac_lanceotron_peaks.bed
{'quantile': {'chip-rx_MLL': 'chip-rx_MLL.bed', 'atac': 'atac.bed'}, 'lanceotron': {'chip-rx_MLL_lanceotron_peaks': 'chip-rx_MLL_lanceotron_peaks.bed', 'atac_lanceotron_peaks': 'atac_lanceotron_peaks.bed'}}
# Pick two samples to compare — one ATAC, one ChIP
chip_mll_peak = next(path for name, path in lanceotron_beds.items() if "MLL" in name)
peak_sets = {
"ATAC": quantile_beds["atac"],
"ChIP-MLL": chip_mll_peak,
}
# Overlap table: one row per exclusive Venn region
overlap_df = qn.peak_overlap(peak_sets)
print(overlap_df)
# Venn diagram of peak overlaps
qn.venn_peaks(
peak_sets,
title="ATAC vs ChIP-MLL peak overlap (chr21)",
)
combination n_peaks pct_of_first 0 (ATAC,) 7 100.0 1 (ChIP-MLL,) 12400 100.0 2 (ATAC, ChIP-MLL) 0 0.0
<Axes: title={'center': 'ATAC vs ChIP-MLL peak overlap (chr21)'}>
beds_by_method
{'quantile': {'chip-rx_MLL': PosixPath('multiomics_run/output/peaks/multi_method/quantile/chip/chip-rx_MLL.bed'),
'atac': PosixPath('multiomics_run/output/peaks/multi_method/quantile/atac/atac.bed')},
'lanceotron': {'chip-rx_MLL_lanceotron_peaks': PosixPath('multiomics_run/output/peaks/multi_method/lanceotron/chip/chip-rx_MLL_lanceotron_peaks.bed'),
'atac_lanceotron_peaks': PosixPath('multiomics_run/output/peaks/multi_method/lanceotron/atac/atac_lanceotron_peaks.bed')}}
# Pick two samples to compare — one ATAC, one ChIP
peak_sets = {
"ATAC LanceOtron": beds_by_method["lanceotron"]["atac_lanceotron_peaks"],
"ATAC Quantile": beds_by_method["quantile"]["atac"],
}
# Overlap table: one row per exclusive Venn region
overlap_df = qn.peak_overlap(peak_sets)
print(overlap_df)
# Venn diagram of peak overlaps
qn.venn_peaks(
peak_sets,
title="ATAC vs ChIP-MLL peak overlap (chr21)",
)
combination n_peaks pct_of_first 0 (ATAC LanceOtron,) 30 83.3 1 (ATAC Quantile,) 0 0.0 2 (ATAC LanceOtron, ATAC Quantile) 6 16.7
<Axes: title={'center': 'ATAC vs ChIP-MLL peak overlap (chr21)'}>
PlotNado peaks¶
The BED files returned by qn.call_peaks() can be added back onto PlotNado tracks, which makes it easy to compare discrete called intervals with the underlying coverage signal at the same locus.
Locus views¶
qn.locus_plot() gives a genome-browser-style multi-track summary directly from QuantNado arrays, while PlotNado provides a more composable plotting interface for publication-style tracks. Together they cover fast inspection and polished visualisation.
| modality | rendering |
|---|---|
coverage |
filled area (per-base depth) |
stranded_coverage |
mirrored strands — forward above zero, reverse below |
methylation |
scatter at CpG positions (methylation %) |
variant |
lollipop plot (allele frequency, coloured by genotype) |
region = qn.sel(
chrom=GENE_CHROM,
start=GENE_START - GENE_PAD,
end=GENE_END + GENE_PAD,
assay=qn.assays,
)
sample_names = region.sample.to_series()
region.info
<bound method Dataset.info of <xarray.Dataset> Size: 138MB
Dimensions: (sample: 9, position: 269502)
Coordinates:
* sample (sample) <U24 864B 'atac' 'chip-rx_MLL' ... 'snp'
assay (sample) <U4 144B 'ATAC' 'CHIP' 'METH' ... 'RNA' 'RNA' 'SNP'
* position (position) int64 2MB 34783801 34783802 ... 35053301 35053302
Data variables:
AF (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
coverage (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
DP (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
GT (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
methyl_pct (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
MQ (sample, position) float32 10MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
n_methylated (sample, position) float64 19MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
n_total (sample, position) float64 19MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
rna_fwd (sample, position) float64 19MB dask.array<chunksize=(1, 269502), meta=np.ndarray>
rna_rev (sample, position) float64 19MB dask.array<chunksize=(1, 269502), meta=np.ndarray>>
QuantNado locus plot¶
qn.locus_plot(
locus=GENE_LOCUS,
sample_names=sample_names,
modality=["coverage"] * 2 + ["methylation"] * 2 + ["stranded_coverage"] * 4 + ["variant"],
coverage=region["coverage"],
coverage_fwd=region["rna_fwd"],
coverage_rev=region["rna_rev"],
methylation=region["methyl_pct"],
allele_freq=region["AF"],
genotype=region["GT"],
figsize=(14, 10),
title=f"{GENE_NAME} ({GENE_LOCUS})",
)
<QuantNadoAxesList n=9>
PlotNado locus plot¶
beds_by_method
{'quantile': {'chip-rx_MLL': PosixPath('multiomics_run/output/peaks/multi_method/quantile/chip/chip-rx_MLL.bed'),
'atac': PosixPath('multiomics_run/output/peaks/multi_method/quantile/atac/atac.bed')},
'lanceotron': {'chip-rx_MLL_lanceotron_peaks': PosixPath('multiomics_run/output/peaks/multi_method/lanceotron/chip/chip-rx_MLL_lanceotron_peaks.bed'),
'atac_lanceotron_peaks': PosixPath('multiomics_run/output/peaks/multi_method/lanceotron/atac/atac_lanceotron_peaks.bed')}}
qn_cpm = qn.normalise(method="cpm")
groups = qn_cpm.groups
height = 1
peak_height = 0.8
gf = pn.GenomicFigure(theme="publication")
gf.scalebar()
gf.genes("hg38", minimum_gene_length=1e5)
for sample in groups["ATAC"]:
col = "black"
gf.quantnado_coverage(
sample,
quantnado=qn_cpm,
color=col,
title=sample,
height=height,
)
gf.spacer(height=0.1)
gf.bed(beds_by_method["quantile"][sample], color=col, edge_color=col, height=peak_height, title=f"{sample} quantile peaks")
gf.bed(beds_by_method["lanceotron"][f"{sample}_lanceotron_peaks"], color=col, edge_color=col, height=peak_height, title=f"{sample} lanceotron peaks")
gf.spacer(height=0.1)
for sample in groups["CHIP"]:
col = "darkblue"
gf.quantnado_coverage(
sample,
quantnado=qn_cpm,
color=col,
title=sample,
height=height,
)
gf.spacer(height=0.1)
gf.bed(beds_by_method["quantile"][sample], color=col, edge_color=col, height=peak_height, title=f"{sample} quantile peaks")
gf.bed(beds_by_method["lanceotron"][f"{sample}_lanceotron_peaks"], color=col, edge_color=col, height=peak_height, title=f"{sample} lanceotron peaks")
gf.spacer(height=0.1)
gf.spacer(height=0.3)
for sample in groups["RNA"]:
gf.quantnado_stranded_coverage(
sample,
quantnado=qn_cpm,
color="green",
reverse_color="darkgreen",
title=sample,
height=height,
)
gf.spacer(height=0.1)
for sample in groups["METH"]:
gf.quantnado_methylation(
sample,
quantnado=qn_cpm,
methylation_variable="methyl_pct",
color="purple",
title=sample,
height=height,
)
gf.spacer(height=0.1)
for sample in groups["SNP"]:
gf.quantnado_variant(
sample,
quantnado=qn_cpm,
genotype_variable="GT",
title=sample,
height=height,
)
gf.axis()
gf.plot_gene(GENE_NAME, extend=0.1)
2026-04-11 20:59:33.674 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:33.841 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:33.960 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.021 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.172 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.233 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.388 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.494 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.655 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM. 2026-04-11 20:59:34.714 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM.
Reduce¶
qn.reduce() summarises per-base signal over BED intervals or annotation-derived features. Here we keep the feature set restricted to the RUNX1 locus so the output stays directly comparable with the earlier region slices and locus plots. Reduced outputs are ideal for heatmaps and correlations.
# Reduce mean RNA signal over GENE transcript bodies
reduced = qn.reduce(
gtf_path=str(GTF_FILE),
feature_type="transcript",
assay=["RNA"],
modality=["rna_fwd"],
reduction="mean",
)
reduced
2026-04-11 20:59:37.516 | INFO | quantnado.analysis.reduce:_log_chromosome_overlap:185 - Chromosome compatibility check: 1 shared chromosomes out of 1 in GTF and 1 in dataset 2026-04-11 20:59:37.516 | DEBUG | quantnado.analysis.reduce:_log_chromosome_overlap:191 - Shared chromosomes: ['chr21']
<xarray.Dataset> Size: 103kB
Dimensions: (ranges: 986, sample: 4)
Coordinates:
* ranges (ranges) int64 8kB 0 1 2 3 4 5 6 ... 980 981 982 983 984 985
range_index (ranges) int64 8kB 3129 3133 3154 3177 ... 20305 20307 20309
start (ranges) int64 8kB 46707966 46683849 ... 9733506 9689972
end (ranges) int64 8kB 46709983 46707810 ... 9734109 9690726
range_length (ranges) int64 8kB 2017 23961 23961 2199 ... 156 842 603 754
contig (ranges) object 8kB 'chr21' 'chr21' ... 'chr21' 'chr21'
strand (ranges) object 8kB '+' '-' '-' '-' '-' ... '+' '+' '+' '-'
* sample (sample) <U24 384B 'rna-spikein-control-rep1' ... 'rna-spik...
Data variables:
sum (ranges, sample) float32 16kB dask.array<chunksize=(986, 4), meta=np.ndarray>
count (ranges, sample) float32 16kB dask.array<chunksize=(986, 4), meta=np.ndarray>
mean (ranges, sample) float32 16kB dask.array<chunksize=(986, 4), meta=np.ndarray>qn.heatmap(
reduced,
variable="mean",
exclude_zeros=True,
log_transform=True,
zscore=0,
title="Mean RNA signal over GENE transcript bodies",
figsize=(6, 7),
)
<seaborn.matrix.ClusterGrid at 0x1370692b0>
corr_df, _ = qn.correlate(
reduced,
variable="mean",
log_transform=True,
method="pearson",
title="RNA sample correlation over GENE transcript bodies",
)
corr_df
/Users/catherine/work/software/QuantNado/.venv/lib/python3.13/site-packages/seaborn/matrix.py:1124: UserWarning: ``square=True`` ignored in clustermap warnings.warn(msg)
| rna-spikein-control-rep1 | rna-spikein-control-rep2 | rna-spikein-treated-rep1 | rna-spikein-treated-rep2 | |
|---|---|---|---|---|
| rna-spikein-control-rep1 | 1.000000 | 0.819546 | 0.847655 | 0.916322 |
| rna-spikein-control-rep2 | 0.819546 | 1.000000 | 0.922547 | 0.843844 |
| rna-spikein-treated-rep1 | 0.847655 | 0.922547 | 1.000000 | 0.863132 |
| rna-spikein-treated-rep2 | 0.916322 | 0.843844 | 0.863132 | 1.000000 |
Extract¶
qn.extract() bins signal into fixed-width windows around genomic anchors such as promoters or transcripts. Those binned matrices are the natural input for metaplots and tornadoplots, which are useful when you want to compare shape as well as magnitude.
# Extract ±2 kb around GENE transcript-promoter TSS windows in 50 bp bins — ATAC signal
binned = qn.extract(
feature_type="promoter",
GTF_FILE=str(GTF_FILE),
anchor_feature="transcript",
upstream=2000,
downstream=2000,
anchor="start",
bin_size=50,
assay="ATAC",
modality="coverage",
)
print(binned.dims, binned.shape)
('interval', 'bin', 'sample') (1087, 80, 1)
# The extracted ATAC view contains one selected sample
list(binned.sample.values)
[np.str_('atac')]
# CPM-normalise then plot the average profile (metaplot)
binned_cpm = qn.normalise(binned, method="cpm")
qn.metaplot(
binned_cpm,
error_stat="sem",
reference_label="TSS",
title="ATAC signal at GENE transcript promoters",
figsize=(8, 4),
)
2026-04-11 20:59:46.815 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM.
<Axes: title={'center': 'ATAC signal at GENE transcript promoters'}, xlabel='Relative position', ylabel='Mean signal'>
# Tornadoplot — heatmap with one row per GENE promoter interval, sorted by mean ATAC signal
qn.tornadoplot(
binned_cpm,
modality="coverage",
sort_by="mean",
reference_label="TSS",
title="ATAC signal across GENE transcript promoters",
figsize=(4, 4),
)
<QuantNadoAxesList n=1>
# RNA — compare control vs treated at the same GENE transcript promoters
binned_rna = qn.extract(
feature_type="promoter",
GTF_FILE=str(GTF_FILE),
anchor_feature="transcript",
upstream=2000,
downstream=2000,
anchor="start",
bin_size=50,
assay="RNA",
modality="rna_fwd",
)
binned_rna_cpm = qn.normalise(binned_rna, method="cpm")
groups = qn.group_by(
groups={
"control": ["control"],
"treated": ["treated"],
},
match="contains",
)
qn.metaplot(
binned_rna_cpm,
groups=groups,
error_stat="sem",
reference_label="TSS",
title="RNA forward signal at GENE transcript promoters — control vs treated",
figsize=(8, 4),
)
2026-04-11 20:59:49.210 | DEBUG | quantnado.analysis.normalise:_normalise_xr_dataarray:583 - Normalised xr.DataArray to CPM.
<Axes: title={'center': 'RNA forward signal at GENE transcript promoters — control vs treated'}, xlabel='Relative position', ylabel='Mean signal'>
# Tornadoplot — heatmap with one row per GENE promoter interval, sorted by mean ATAC signal
qn.tornadoplot(
binned_rna_cpm,
modality="coverage",
sort_by="mean",
reference_label="TSS",
title="RNA signal across transcript promoters",
figsize=(10, 4),
)
<QuantNadoAxesList n=4>
Quantify¶
qn.quantify_signal() is the clearest way to turn stored QuantNado signal into a feature-by-sample matrix. It is great for exploratory RNA, ATAC, or ChIP summaries. qn.count_features(engine="signal") exposes the same backend through a counting-style API, but it is still signal-derived rather than BAM-backed read assignment.
# Quantify RNA signal over transcript bodies — RNA samples only
counts, features = qn.quantify_signal(
gtf_file=str(GTF_FILE),
feature_type="transcript",
assay="RNA",
modality="coverage",
)
rna_cols = list(counts.columns)
print(f"{len(counts)} features × {counts.shape[1]} samples")
counts.head()
2026-04-11 20:59:49.437 | INFO | quantnado.analysis.reduce:_log_chromosome_overlap:185 - Chromosome compatibility check: 1 shared chromosomes out of 1 in input ranges and 1 in dataset 2026-04-11 20:59:49.437 | DEBUG | quantnado.analysis.reduce:_log_chromosome_overlap:191 - Shared chromosomes: ['chr21']
986 features × 4 samples
| rna-spikein-control-rep1 | rna-spikein-control-rep2 | rna-spikein-treated-rep1 | rna-spikein-treated-rep2 | |
|---|---|---|---|---|
| LINC00205 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
# RPKM-normalise using transcript lengths
rpkm = qn.normalise(
counts[rna_cols],
method="rpkm",
feature_lengths=features["range_length"],
)
rpkm.head()
2026-04-11 21:00:03.222 | INFO | quantnado.analysis.normalise:_normalise_dataframe:399 - Normalised DataFrame to RPKM.
| rna-spikein-control-rep1 | rna-spikein-control-rep2 | rna-spikein-treated-rep1 | rna-spikein-treated-rep2 | |
|---|---|---|---|---|
| LINC00205 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
| POFUT2 | 0.0 | 0.0 | 0.0 | 0.0 |
# Heatmap of expressed transcripts
qn.heatmap(
rpkm,
exclude_zeros=True,
zscore=0,
log_transform=True,
title="RPKM — GENE transcripts (RNA forward strand)",
figsize=(5, 8),
)
<seaborn.matrix.ClusterGrid at 0x13642e710>
# Spearman correlation between RNA samples
corr_rna, _ = qn.correlate(
rpkm,
log_transform=True,
method="spearman",
title="Spearman correlation — RNA transcript RPKM",
)
/Users/catherine/work/software/QuantNado/.venv/lib/python3.13/site-packages/seaborn/matrix.py:1124: UserWarning: ``square=True`` ignored in clustermap warnings.warn(msg)
PCA¶
qn.pca() reduces a 2D sample matrix to a few principal components. In this tutorial we use locus-scale coverage to show how the same dataset can move from base-resolution signal to sample-level structure in just a few lines.
# GENE-locus coverage across all samples, randomly subsampled to 5k positions
cov = qn.sel_gene(GENE_NAME, padding=0)
pca_obj, pca_result = qn.pca(
cov,
n_components=3,
assay=["RNA"],
# samples=None,
# modality=None,
chromosome=GENE_CHROM,
nan_handling_strategy="set_to_zero",
subset_size=5_000,
subset_strategy="random",
random_state=42,
)
pca_result
INFO:dask.sizeof:Subsetting to 5000 features for PCA INFO:dask.sizeof:Set NaNs to zero for PCA INFO:dask.sizeof:Using random_state=42 for PCA
<xarray.DataArray 'array-6fc5cef151b0371ce7c47e9c6795222e' (sample: 4,
component: 3)> Size: 48B
dask.array<array, shape=(4, 3), dtype=float32, chunksize=(4, 3), chunktype=numpy.ndarray>
Coordinates:
* sample (sample) <U24 384B 'rna-spikein-control-rep1' ... 'rna-spikein...
* component (component) int64 24B 1 2 3# Scree plot — explained variance per component
qn.pca_scree(pca_obj)
<seaborn.axisgrid.FacetGrid at 0x137f85e80>
# Scatter — PC1 vs PC2, coloured by assay
rna_metadata = metadata[metadata["assay"] == "RNA"]
rna_metadata["treatment"] = rna_metadata["sample_id"].apply(
lambda x: "control" if "control" in x else "treated"
)
qn.pca_scatter(
pca_obj,
pca_result,
xaxis_pc=1,
yaxis_pc=2,
metadata_df=rna_metadata,
colour_by="treatment",
sample_column="sample_id",
)
<seaborn.axisgrid.FacetGrid at 0x138bb6710>