Usage Guide¶
This page reflects the current post-refactor API: datasets are opened from a directory of per-sample .zarr stores or from a combined .zarr, and most analysis methods work through QuantNado or QuantNadoDataset.
Opening a Dataset¶
from quantnado import QuantNado, QuantNadoDataset
qn = QuantNado.open("dataset/")
raw = QuantNadoDataset("dataset/")
combined = QuantNado.open("dataset/combined.zarr")
Useful properties:
qn.info returns a compact dataset summary that includes assay-level sample counts, array keys, inferred IPs, and any cached sample-group namespaces.
Region Selection¶
Filter by assay type or explicit sample names:
rna_region = qn.sel("chr21", 36_000_000, 36_010_000, assay="RNA")
atac_region = qn.sel("chr21", 36_000_000, 36_010_000, samples=["ATAC_1"])
You can also build reusable sample groups and intersect them in subset():
qn.group_by(
    ip="ip",
    treatment={"control": ["control"], "treated": ["treated"]},
    replicate={"rep1": ["rep1"], "rep2": ["rep2"]},
    spikein={"spikein": ["spikein", "rx"]},
    match="contains",
)
chip_subset = qn.subset(
    assay="CHIP",
    ip="MLL",
    group={"spikein": "spikein"},
)
rna_subset = qn.subset(
    assay="RNA",
    group={
        "treatment": ["treated", "control"],
        "spikein": "spikein",
        "replicate": ["rep1", "rep2"],
    },
)
subset() intersects all requested filters. If a combination resolves to no samples, the error message reports which group namespace removed the remaining samples.
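The intersection behaviour described above can be pictured with a small standalone sketch. It does not use QuantNado: `build_groups` and `subset` here are hypothetical helpers written for illustration, the sample names are made up, and two details are assumptions rather than documented behaviour (substring matching for "contains", and treating several labels inside one namespace as alternatives).

```python
def build_groups(samples, namespaces):
    """Return {namespace: {label: set of sample names containing a pattern}}."""
    groups = {}
    for namespace, labels in namespaces.items():
        groups[namespace] = {
            # Assumption: "contains" means plain substring matching.
            label: {s for s in samples if any(p in s for p in patterns)}
            for label, patterns in labels.items()
        }
    return groups


def subset(samples, groups, group):
    """Intersect the requested namespaces and name the one that empties the set."""
    remaining = set(samples)
    for namespace, wanted in group.items():
        labels = [wanted] if isinstance(wanted, str) else list(wanted)
        # Assumption: labels within one namespace are alternatives (union).
        allowed = set().union(*(groups[namespace][label] for label in labels))
        # Namespaces are combined by intersection.
        remaining &= allowed
        if not remaining:
            raise ValueError(
                f"no samples left after filtering on group namespace {namespace!r}"
            )
    return sorted(remaining)


# Hypothetical sample names, for illustration only.
samples = [
    "RNA_control_rep1",
    "RNA_control_rep2",
    "RNA_treated_rep1",
    "RNA_treated_rep2_spikein",
]
groups = build_groups(
    samples,
    {
        "treatment": {"control": ["control"], "treated": ["treated"]},
        "replicate": {"rep1": ["rep1"], "rep2": ["rep2"]},
        "spikein": {"spikein": ["spikein", "rx"]},
    },
)
treated = subset(samples, groups, {"treatment": "treated", "replicate": ["rep1", "rep2"]})
print(treated)
```

Requesting `{"treatment": "control", "spikein": "spikein"}` in this toy example raises an error that names the `spikein` namespace, because no control sample carries a spike-in label.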
Assay vs Modality¶
assay means a biological assay such as ATAC, RNA, or METH
modality means an array key such as coverage, rna_fwd, chip_h3k27ac, methyl_pct, or GT
In practice, use assay= to choose samples and modality= to choose which signal array to analyse.
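As a rough mental model of this split, the sketch below uses a plain dictionary in place of a real store. The layout, the sample names, and the `select` helper are all hypothetical and say nothing about how QuantNado stores data internally; the point is only that `assay` filters samples while `modality` picks the array.

```python
# Toy stand-in for a dataset: each sample has one assay and several arrays.
store = {
    "ATAC_1": {"assay": "ATAC", "arrays": {"coverage": [0, 3, 5, 2]}},
    "RNA_1": {"assay": "RNA", "arrays": {"coverage": [1, 1, 4, 0], "rna_fwd": [1, 0, 3, 0]}},
    "RNA_2": {"assay": "RNA", "arrays": {"coverage": [2, 0, 6, 1], "rna_fwd": [2, 0, 4, 1]}},
}


def select(store, assay, modality):
    """Pick samples by assay, then return the requested array for each of them."""
    return {
        name: record["arrays"][modality]
        for name, record in store.items()
        if record["assay"] == assay and modality in record["arrays"]
    }


rna_forward = select(store, assay="RNA", modality="rna_fwd")
print(rna_forward)
```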
Reducing Signal Over Intervals¶
Intervals can be supplied directly as a BED file or derived from a GTF, as in this example:
gene_signal = qn.reduce(
    gtf_path="genes.gtf",
    feature_type="gene",
    reduction="sum",
    assay="RNA",
    modality="coverage",
)
Returned value: xr.Dataset with variables such as sum, count, and mean.
The output stays indexed by sample, so the usual next step is to pass signal["mean"] to PCA, heatmaps, or correlations.
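To make the sum, count, and mean variables concrete, here is a minimal pure-Python sketch of reducing one sample's per-base signal over intervals. The `reduce_intervals` helper is hypothetical, intervals are assumed to be half-open `(start, end)` pairs, and `count` is assumed to mean the number of positions in the interval; QuantNado's own definitions may differ.

```python
def reduce_intervals(signal, intervals):
    """Summarise per-base signal over half-open (start, end) intervals."""
    rows = []
    for start, end in intervals:
        window = signal[start:end]
        total = sum(window)
        count = len(window)  # assumption: count = number of positions covered
        mean = total / count if count else float("nan")
        rows.append({"sum": total, "count": count, "mean": mean})
    return rows


# Toy per-base signal for one sample on one chromosome.
signal = [0, 0, 2, 4, 4, 2, 0, 0, 1, 1]
reduced = reduce_intervals(signal, [(2, 6), (8, 10)])
for row in reduced:
    print(row)
```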
Counting Features¶
counts, features = qn.count_features(
    gtf_file="genes.gtf",
    feature_type="gene",
    engine="signal",
    assay="RNA",
)
counts is a feature-by-sample pd.DataFrame. features is the aligned feature metadata table.
For explicit signal quantification without the counting terminology:
signal_matrix, signal_features = qn.quantify_signal(
    gtf_file="genes.gtf",
    feature_type="gene",
    assay="RNA",
    modality="coverage",
)
quantify_signal() is the recommended API when you want a feature-by-sample matrix derived from stored QuantNado signal rather than BAM-backed read assignment.
count_features(engine="signal") is still a signal-derived summarisation path. A BAM-backed engine="bam" mode is planned for featureCounts-style read assignment semantics.
You can also count over BED intervals:
counts, features = qn.count_features(
    bed_file="peaks.bed",
    samples=["ATAC_1", "ATAC_2"],
    modality="coverage",
)
Extracting Binned Signal¶
extract() bins signal around promoters, genes, transcripts, or exons.
binned = qn.extract(
    feature_type="promoter",
    GTF_FILE="genes.gtf",
    assay="ATAC",
    modality="coverage",
    upstream=2000,
    downstream=2000,
    bin_size=50,
)
Returned value: xr.DataArray with dimensions (interval, bin, sample).
Use this output with metaplot() or tornadoplot().
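The binning step can be sketched for a single interval and a single sample as follows. This is an illustration under assumptions, not QuantNado's implementation: `bin_window` is a hypothetical helper, each bin holds the mean signal of its positions, and minus-strand features are flipped so that bins always run from upstream to downstream.

```python
def bin_window(signal, anchor, upstream, downstream, bin_size, strand="+"):
    """Average per-base signal into fixed-size bins around an anchor position."""
    if strand == "+":
        start, end = anchor - upstream, anchor + downstream
    else:
        # On the minus strand, upstream lies at higher coordinates.
        start, end = anchor - downstream, anchor + upstream
    if start < 0 or end > len(signal):
        raise ValueError("window extends beyond the available signal")
    window = signal[start:end]
    bins = []
    for i in range(0, len(window), bin_size):
        chunk = window[i:i + bin_size]
        bins.append(sum(chunk) / len(chunk))
    # Assumption: flip minus-strand bins so index 0 is the most upstream bin.
    return bins[::-1] if strand == "-" else bins


# Toy signal whose value equals its position, so bin means are easy to check.
signal = list(range(40))
plus_bins = bin_window(signal, anchor=20, upstream=10, downstream=10, bin_size=5)
minus_bins = bin_window(signal, anchor=20, upstream=10, downstream=10, bin_size=5, strand="-")
print(plus_bins)
print(minus_bins)
```

With the parameters used above (2000 bp upstream, 2000 bp downstream, 50 bp bins), the same arithmetic gives (2000 + 2000) / 50 = 80 bins per interval.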
Normalisation¶
Normalise reduced signal, extracted signal, or count matrices:
cpm_signal = qn.normalise(signal, method="cpm")
rpkm_counts = qn.normalise(
    counts,
    method="rpkm",
    feature_lengths=features["range_length"],
)
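For reference, the textbook definitions behind these two method names can be written out in a few lines. The sketch below assumes the standard formulas (CPM scales by library size in millions; RPKM additionally divides by feature length in kilobases) and takes the library size as an explicit argument. The `cpm` and `rpkm` functions are illustrative only, and how QuantNado derives library sizes is not shown here.

```python
def cpm(counts, library_size):
    """Counts per million: scale each count by the library size in millions."""
    return [c * 1e6 / library_size for c in counts]


def rpkm(counts, lengths, library_size):
    """Counts per kilobase of feature per million: also divide by length in kb."""
    return [c * 1e9 / (length * library_size) for c, length in zip(counts, lengths)]


# Toy values for one sample: three features with their lengths in bp.
counts = [50, 200, 1000]
lengths = [2000, 500, 10000]
library_size = 2_000_000

cpm_values = cpm(counts, library_size)
rpkm_values = rpkm(counts, lengths, library_size)
print(cpm_values)
print(rpkm_values)
```

Note how the second feature has a lower count than the third but a higher RPKM, because it is much shorter.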
Library sizes are available directly:
PCA¶
On a reduced matrix:
Here pca_result is sample-by-component output, ready for pca_scatter().
Or directly from one chromosome and modality:
Visualisation¶
qn.heatmap(signal, variable="mean", title="Promoter signal")
qn.correlate(signal, variable="mean", title="Sample correlation")
qn.metaplot(binned, modality="coverage", title="Metaplot")
qn.tornadoplot(binned, modality="coverage", title="Tornado plot")
Use heatmap() and correlate() on reduced matrices, and metaplot() / tornadoplot() on extracted binned signal.
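What the two binned-signal plots summarise can be sketched without any plotting. For one sample, a metaplot is conventionally the per-bin average across intervals, and a tornado plot shows the individual intervals as rows in some sorted order. The helpers below are hypothetical, and sorting rows by mean signal is an assumption about a common default rather than a statement about tornadoplot().

```python
def metaplot_profile(binned):
    """Average each bin across intervals to get one profile per sample."""
    n_intervals = len(binned)
    n_bins = len(binned[0])
    return [sum(row[b] for row in binned) / n_intervals for b in range(n_bins)]


def tornado_order(binned):
    """Order intervals by mean signal, strongest first (assumed sort key)."""
    return sorted(binned, key=lambda row: sum(row) / len(row), reverse=True)


# Toy (interval, bin) matrix for a single sample.
binned_one_sample = [
    [0, 2, 4, 2, 0],
    [1, 5, 9, 5, 1],
    [0, 1, 2, 1, 0],
]
profile = metaplot_profile(binned_one_sample)
rows = tornado_order(binned_one_sample)
print(profile)
print(rows)
```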
Annotation-Aware Queries¶
qn = QuantNadoDataset("dataset/", annotation="genes.gtf")
info = qn.gene_info("GNAQ")
gene_region = qn.sel_gene("GNAQ", padding=2000)
Combining Per-Sample Stores¶
The combined store keeps the same read API and is often easier to move or share.
SeqNado Integration¶
from quantnado import metadata_from_seqnado
metadata = metadata_from_seqnado("my_seqnado_project", output_dir=".")
This writes a quantnado_metadata.csv that is still useful for bookkeeping or batch scripting, but the main CLI now creates stores one sample at a time with quantnado dataset create.