Cross-validation
The auto-detector is robust on standard drill-stem tests (DSTs), but no
algorithm is infallible. The welltest_pta.cross_validate_detector() function
quantifies how trustworthy the auto-detection is on a given dataset
as a 0–100 confidence index.
Three independent stability checks
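| Check | Method | Weight |
|---|---|---|
| Bootstrap event-count | \(K\) random downsample replicas (default \(f = 0.85\)); report \(\sigma\) of detected \(n_{DD}\) (drawdowns) and \(n_{BU}\) (buildups). | 0.40 |
| Jaccard edge overlap | Compare the binary “is-PTA” mask of each replica to the reference mask via \(J = \lvert A \cap B \rvert / \lvert A \cup B \rvert\). | 0.40 |
| Parameter sensitivity | Sweep \(\pm 20\%\) around each of the detector parameters (the sample report lists hampel_sigma, spike_percentile, min_pta_dp_psi and tail_trim_dev_n_sigma) and record the change in event counts \(\Delta n\). | 0.20 |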
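The first two checks can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not the library's implementation: the helper names `jaccard` and `downsample_indices` are hypothetical, the masks are assumed to be already aligned to a common sample index, and the population standard deviation is an assumption about how \(\sigma\) is computed.

```python
import random
import statistics


def jaccard(mask_a, mask_b):
    """Jaccard overlap J = |A ∩ B| / |A ∪ B| of two equal-length boolean masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def downsample_indices(n_samples, frac=0.85, seed=None):
    """Sorted indices of a random subset keeping `frac` of the samples."""
    rng = random.Random(seed)
    k = max(1, round(frac * n_samples))
    return sorted(rng.sample(range(n_samples), k))


# Toy "is-PTA" masks: the replica shifts the first event by one sample.
reference = [False, True, True, True, False, False, True, True, False, False]
replica = [False, False, True, True, True, False, True, True, False, False]
print(f"J = {jaccard(reference, replica):.3f}")

# Spread of detected drawdown counts across hypothetical replicas.
n_dd_per_replica = [2, 2, 2, 2]
print(f"sigma(n_DD) = {statistics.pstdev(n_dd_per_replica):.2f}")
```

In the toy masks the intersection covers four samples and the union six, so the overlap is 4/6; identical event counts in every replica give a spread of zero.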
Composite score
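The three components combine linearly into a composite score \(S\), using the weights from the table above (0.40, 0.40 and 0.20) and scaled to 0–100.
Here \(s_{\text{boot}}, s_{\text{sens}} \in [0, 1]\) are normalised penalties (lower \(\sigma\) and lower \(\Delta n\) are better) and \(\overline{J}\) is the mean Jaccard overlap.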
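One plausible reading of the weighted combination is sketched below. It rests on explicit assumptions: each penalty is taken to be 0 for a perfectly stable result and 1 for the worst case, so it is inverted before weighting, and the function name `composite_score` is hypothetical. The library's exact normalisation is not shown in this section, so the sketch should not be expected to reproduce the figures in a printed report.

```python
# Weights taken from the stability-check table.
W_BOOT, W_JACCARD, W_SENS = 0.40, 0.40, 0.20


def composite_score(s_boot, mean_jaccard, s_sens):
    """Weighted 0-100 score from two penalties and the mean Jaccard overlap."""
    for name, value in (
        ("s_boot", s_boot),
        ("mean_jaccard", mean_jaccard),
        ("s_sens", s_sens),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # ASSUMPTION: penalties are "lower is better", so invert them to
    # stability terms before applying the weights.
    stability = (
        W_BOOT * (1.0 - s_boot)
        + W_JACCARD * mean_jaccard
        + W_SENS * (1.0 - s_sens)
    )
    return 100.0 * stability


# No penalties but only half the edges overlapping.
print(round(composite_score(0.0, 0.5, 0.0), 1))
```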
The score is mapped to a human-readable grade:
| Score range | Grade |
|---|---|
| 80 – 100 | HIGHLY ROBUST: manual review optional |
| 60 – 80 | REASONABLE: spot-check critical events |
| 40 – 60 | MARGINAL: recommend manual splitting |
| 0 – 40 | UNSTABLE: manual splitting strongly advised |
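The grade lookup amounts to a threshold ladder. The sketch below uses a hypothetical `grade_for` helper and assumes that each lower bound is inclusive (a score of exactly 80 is graded HIGHLY ROBUST); the table leaves the boundary handling open, so treat that choice as an assumption.

```python
def grade_for(score):
    """Map a 0-100 CV score to the grade label from the table."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must lie in [0, 100], got {score}")
    # ASSUMPTION: lower bounds are inclusive, upper bounds exclusive.
    if score >= 80.0:
        return "HIGHLY ROBUST"
    if score >= 60.0:
        return "REASONABLE"
    if score >= 40.0:
        return "MARGINAL"
    return "UNSTABLE"


for s in (86.2, 72.0, 45.5, 12.0):
    print(s, grade_for(s))
```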
Usage
The simplest way is to enable CV when loading the test:
from welltest_pta import WellTest
wt = WellTest.from_file("DST.txt", cross_validate=True, cv_n_bootstrap=8)
A pretty-printed report is emitted to stdout. The result is also
attached to wt.cv_result for programmatic access:
print(wt.cv_result.overall_score) # e.g. 82.4
print(wt.cv_result.grade) # e.g. "HIGHLY ROBUST"
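In a batch workflow the score can gate whether a test is sent for manual review. The sketch below is illustrative only: `needs_manual_review` is a hypothetical helper, the 60-point threshold mirrors the "When the score is low" guidance, and a `SimpleNamespace` stands in for the real result object so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def needs_manual_review(cv_result, threshold=60.0):
    """Return True when the CV score falls below the review threshold."""
    return cv_result.overall_score < threshold


# Stand-ins exposing the two attributes shown above (not real results).
stable = SimpleNamespace(overall_score=82.4, grade="HIGHLY ROBUST")
shaky = SimpleNamespace(overall_score=47.0, grade="MARGINAL")

for result in (stable, shaky):
    action = "review manually" if needs_manual_review(result) else "accept"
    print(f"{result.grade}: {action}")
```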
For finer control, call cross_validate_detector()
directly:
from welltest_pta import cross_validate_detector
from welltest_pta.parser import parse
df = parse("DST.txt")
res = cross_validate_detector(
    df,
    n_bootstrap=12,        # more replicas → steadier σ estimate
    downsample_frac=0.80,  # keep 80% of the samples in each replica
    perturbation=0.25,     # ±25% sensitivity sweep
    seed=42,               # fixed seed for reproducible replicas
)
The call returns a DetectorCVResult object.
Sample report
════════════════════════════════════════════════════════════════════════
EVENT-DETECTOR CROSS-VALIDATION REPORT
════════════════════════════════════════════════════════════════════════
Reference detection: 2 drawdowns, 2 buildups
Bootstrap stability (K = 8 replicas):
n_drawdowns: 2.00 ± 0.00
n_buildups: 2.00 ± 0.00
Edge-position consistency (Jaccard overlap):
mean = 0.952 (1.0 = perfect)
std = 0.018
Parameter sensitivity (Δ events under ±20 % perturbation):
hampel_sigma Δ_dd = +0, Δ_bu = +0
spike_percentile Δ_dd = +0, Δ_bu = +0
min_pta_dp_psi Δ_dd = +0, Δ_bu = +0
tail_trim_dev_n_sigma Δ_dd = +0, Δ_bu = +0
─ OVERALL CV SCORE: 86.2 / 100 (HIGHLY ROBUST)
════════════════════════════════════════════════════════════════════════
When the score is low
If \(S < 60\):
1. Re-run with cv_print=True to see which component is failing.
2. Bootstrap σ high ⇒ the test is short or has few clean events; try increasing min_pta_duration_hr.
3. Jaccard low ⇒ event boundaries are unstable across replicas; tighten hampel_sigma and spike_percentile.
4. Sensitivity high ⇒ events are right at the edge of detection thresholds; raise min_pta_dp_psi.
5. If problems persist, fall back to manual splitting via split_manual().
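The triage above can be condensed into a small lookup. This is a sketch under stated assumptions: `suggest_fix` and the component keys are hypothetical, and each component is assumed to have been normalised to a stability value in \([0, 1]\) where higher is better. The advice strings simply restate the steps listed above.

```python
# Advice per component, restating the troubleshooting steps.
ADVICE = {
    "bootstrap": "increase min_pta_duration_hr",
    "jaccard": "tighten hampel_sigma and spike_percentile",
    "sensitivity": "raise min_pta_dp_psi",
}


def suggest_fix(component_scores):
    """Return (weakest component, advice) for stability values in [0, 1]."""
    unknown = set(component_scores) - set(ADVICE)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    # The component with the lowest stability is the one to address first.
    weakest = min(component_scores, key=component_scores.get)
    return weakest, ADVICE[weakest]


scores = {"bootstrap": 0.90, "jaccard": 0.40, "sensitivity": 0.80}
component, advice = suggest_fix(scores)
print(f"{component}: {advice}")
```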