Cross-validation
================

The auto-detector is robust on standard drill-stem tests (DSTs), but no
algorithm is infallible. :func:`welltest_pta.cross_validate_detector`
quantifies how trustworthy the auto-detection is on **this particular
dataset** and reports the result as a confidence index from 0 to 100.


Three independent stability checks
----------------------------------

.. list-table::
   :header-rows: 1
   :widths: 30 50 20

   * - Check
     - Method
     - Weight
   * - Bootstrap event-count
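     - Re-run the detector on :math:`K` randomly downsampled replicas
       (default downsample fraction :math:`f = 0.85`); report the
       standard deviation :math:`\sigma` of the detected drawdown and
       buildup counts, :math:`n_{DD}` and :math:`n_{BU}`.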
     - 0.40
   * - Jaccard edge overlap
     - Compare the binary "is-PTA" mask of each replica to the
       reference mask via :math:`J = |A \cap B| / |A \cup B|`.
     - 0.40
   * - Parameter sensitivity
     - Sweep :math:`\pm 20\%` around each of
       ``hampel_sigma``, ``spike_percentile``, ``min_pta_dp_psi``,
       ``tail_trim_dev_n_sigma``; report the worst-case event-count
       drift.
     - 0.20
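
As a minimal sketch of the Jaccard check, assume the reference and replica
"is-PTA" masks are available as boolean NumPy arrays on the same sample grid.
The helper name ``jaccard_overlap`` and the toy masks are illustrative only;
they are not part of ``welltest_pta``.

.. code-block:: python

   import numpy as np


   def jaccard_overlap(mask_a, mask_b):
       """Return |A ∩ B| / |A ∪ B| for two equal-length boolean masks."""
       a = np.asarray(mask_a, dtype=bool)
       b = np.asarray(mask_b, dtype=bool)
       union = np.logical_or(a, b).sum()
       if union == 0:
           # Both masks are empty: treated as perfect agreement
           # (a convention assumed for this sketch).
           return 1.0
       return float(np.logical_and(a, b).sum() / union)


   # Toy masks: True marks samples flagged as belonging to a PTA event.
   reference = np.array([0, 1, 1, 1, 0, 0, 1, 1], dtype=bool)
   replica = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=bool)

   print(jaccard_overlap(reference, replica))  # 4 shared / 5 in union = 0.8

A value of 1.0 means the replica reproduces every event boundary of the
reference exactly; values fall as edges shift or events appear and vanish.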


Composite score
---------------

The three components combine linearly:

.. math::

   S \;=\; 100 \cdot \big(\,
     0.40 \cdot s_{\text{boot}}
     + 0.40 \cdot \overline{J}
     + 0.20 \cdot s_{\text{sens}}
   \,\big)

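where :math:`s_{\text{boot}}, s_{\text{sens}} \in [0, 1]` are normalised
stability scores (1 is best; they fall as the bootstrap :math:`\sigma`
and the event-count drift :math:`\Delta n` grow) and
:math:`\overline{J}` is the mean Jaccard overlap across replicas.
Because the weights sum to 1, :math:`S` lies in :math:`[0, 100]`.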
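
The weighted sum can be checked by hand with a short sketch. The function name
``composite_score`` and the input values are illustrative assumptions; only
the weights come from the formula above.

.. code-block:: python

   def composite_score(s_boot, mean_jaccard, s_sens):
       """Combine the three component scores (each in [0, 1]) into 0-100."""
       components = {
           "s_boot": s_boot,
           "mean_jaccard": mean_jaccard,
           "s_sens": s_sens,
       }
       for name, value in components.items():
           if not 0.0 <= value <= 1.0:
               raise ValueError(f"{name} must lie in [0, 1], got {value}")
       return 100.0 * (0.40 * s_boot + 0.40 * mean_jaccard + 0.20 * s_sens)


   # Made-up component values: 100 * (0.36 + 0.32 + 0.14) = 82.0
   print(round(composite_score(0.9, 0.8, 0.7), 1))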

The score is mapped to a human-readable grade:

============  ==========================================================
Score range   Grade
============  ==========================================================
80 – 100      **HIGHLY ROBUST** — manual review optional
60 – 80       **REASONABLE** — spot-check critical events
40 – 60       **MARGINAL** — recommend manual splitting
0 – 40        **UNSTABLE** — manual splitting strongly advised
============  ==========================================================
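
The table translates into a simple threshold lookup. The sketch below assumes
each band includes its lower bound (so a score of exactly 80 is graded HIGHLY
ROBUST); the table does not state how boundary values are handled, and
``grade_for`` is a hypothetical helper, not a library function.

.. code-block:: python

   def grade_for(score):
       """Map a 0-100 CV score to its grade label."""
       if not 0.0 <= score <= 100.0:
           raise ValueError(f"score must lie in [0, 100], got {score}")
       # Assumption: lower bounds are inclusive, upper bounds exclusive.
       if score >= 80.0:
           return "HIGHLY ROBUST"
       if score >= 60.0:
           return "REASONABLE"
       if score >= 40.0:
           return "MARGINAL"
       return "UNSTABLE"


   print(grade_for(86.2))  # HIGHLY ROBUST
   print(grade_for(59.9))  # MARGINAL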


Usage
-----

The simplest way is to enable cross-validation when loading the test:

.. code-block:: python

   from welltest_pta import WellTest

   wt = WellTest.from_file("DST.txt", cross_validate=True, cv_n_bootstrap=8)

A pretty-printed report is emitted to stdout. The result is also
attached to ``wt.cv_result`` for programmatic access:

.. code-block:: python

   print(wt.cv_result.overall_score)   # e.g. 82.4
   print(wt.cv_result.grade)           # e.g. "HIGHLY ROBUST"

For finer control, call :func:`~welltest_pta.cross_validate_detector`
directly:

.. code-block:: python

   from welltest_pta import cross_validate_detector
   from welltest_pta.parser import parse

   df = parse("DST.txt")
   res = cross_validate_detector(
       df,
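       n_bootstrap=12,                 # more replicas → steadier σ estimate
       downsample_frac=0.80,           # downsample fraction f (default 0.85)
       perturbation=0.25,              # ±25% sensitivity sweep (default ±20%)
       seed=42,                        # fixed seed for reproducible replicas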
   )

The function returns a :class:`~welltest_pta.DetectorCVResult`.


Sample report
-------------

.. code-block:: text

   ════════════════════════════════════════════════════════════════════════
     EVENT-DETECTOR CROSS-VALIDATION REPORT
   ════════════════════════════════════════════════════════════════════════
     Reference detection:   2 drawdowns, 2 buildups

     Bootstrap stability  (K = 8 replicas):
       n_drawdowns:      2.00 ± 0.00
       n_buildups:       2.00 ± 0.00

     Edge-position consistency (Jaccard overlap):
       mean = 0.952   (1.0 = perfect)
       std  = 0.018

     Parameter sensitivity (Δ events under ±20 % perturbation):
       hampel_sigma             Δ_dd = +0,  Δ_bu = +0
       spike_percentile         Δ_dd = +0,  Δ_bu = +0
       min_pta_dp_psi           Δ_dd = +0,  Δ_bu = +0
       tail_trim_dev_n_sigma    Δ_dd = +0,  Δ_bu = +0

     ─ OVERALL CV SCORE:   86.2 / 100   (HIGHLY ROBUST)
   ════════════════════════════════════════════════════════════════════════


When the score is low
---------------------

If :math:`S < 60` (grade MARGINAL or UNSTABLE):

1. Re-run with ``cv_print=True`` to see *which* component is failing.
2. **Bootstrap σ high** ⇒ the test is short or has few clean events;
   try increasing ``min_pta_duration_hr``.
3. **Jaccard low** ⇒ event boundaries are unstable across replicas;
   tighten ``hampel_sigma`` and ``spike_percentile``.
4. **Sensitivity high** ⇒ events are right at the edge of detection
   thresholds; raise ``min_pta_dp_psi``.
5. If problems persist, fall back to manual splitting via
   :meth:`~welltest_pta.WellTest.split_manual`.
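
The triage above can be sketched as a lookup on the weakest component. The
function ``weakest_component`` and its three inputs are hypothetical; this
page does not document how :class:`~welltest_pta.DetectorCVResult` exposes
the component scores, so the values are passed in directly here.

.. code-block:: python

   # Suggested first action per component, taken from the list above.
   HINTS = {
       "bootstrap": "try increasing min_pta_duration_hr",
       "jaccard": "tighten hampel_sigma and spike_percentile",
       "sensitivity": "raise min_pta_dp_psi",
   }


   def weakest_component(s_boot, mean_jaccard, s_sens):
       """Return (name, hint) for the lowest of the three [0, 1] scores."""
       scores = {
           "bootstrap": s_boot,
           "jaccard": mean_jaccard,
           "sensitivity": s_sens,
       }
       name = min(scores, key=scores.get)
       return name, HINTS[name]


   # Made-up scores in which the Jaccard overlap is the weak point.
   name, hint = weakest_component(0.9, 0.45, 0.8)
   print(f"{name}: {hint}")

If the suggested adjustment does not lift the score, manual splitting remains
the fallback.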