Prediction and benchmarking

After generating simulation data, use the prediction script to refit the simulated TACs and compare the estimates against the known ground truth.

Prediction script

Entrypoint:

uv run python predict_method.py \
  simulated-data/synth-run \
  lls_4k_vB \
  v1 \
  predictions

Arguments:

  • dataset_dir: directory containing parquet simulation data
  • method: one of lls_3k, lls_3k_vB, lls_4k, lls_4k_vB
  • version: run label used in the output path
  • save_dir: base directory for predictions
  • --vB: required for the fixed-vB methods lls_3k and lls_4k; the _vB variants fit vB instead

Fixed-vB example:

uv run python predict_method.py \
  simulated-data/synth-run \
  lls_4k \
  v1-fixed-vb \
  predictions \
  --vB 0.05

Output layout

Predictions are written to:

<save_dir>/<method>/<version>/

The parquet output contains:

  • fitted parameters such as K1_fit, k2_fit, k3_fit, k4_fit, vB_fit
  • mse
  • fitted TAC columns tac_fit_000 ... tac_fit_XXX
  • the run identifiers sample_id, method, and version
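A minimal sketch of inspecting that output with pandas. The column values and the commented-out parquet filename below are hypothetical; only the schema (the *_fit columns, mse, and the run identifiers) comes from the layout described above.

```python
import pandas as pd

# Hypothetical rows mimicking the prediction parquet schema.
# In practice, read the real file instead, e.g. (hypothetical filename):
#   preds = pd.read_parquet("predictions/lls_4k_vB/v1/predictions.parquet")
preds = pd.DataFrame({
    "sample_id": [0, 1],
    "method": ["lls_4k_vB", "lls_4k_vB"],
    "version": ["v1", "v1"],
    "K1_fit": [0.12, 0.09],
    "k2_fit": [0.30, 0.25],
    "mse": [1.2e-4, 3.4e-4],
})

# Pick out the fitted-parameter columns by their _fit suffix.
param_cols = [c for c in preds.columns if c.endswith("_fit")]
print(preds[["sample_id", *param_cols, "mse"]])
```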

How to use this for validation

The intended workflow is:

  1. Simulate data with known ground truth.
  2. Fit the same samples with one or more solver variants.
  3. Aggregate per-parameter error metrics.
  4. Inspect representative plots and summary tables before accepting a solver change.
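Step 3 above can be sketched as a join of truth and predictions on sample_id followed by simple error statistics. The frames and the ground-truth column name K1 are hypothetical stand-ins for the simulation and prediction parquet files; the metric names are illustrative, not a fixed convention of this repository.

```python
import pandas as pd

# Hypothetical ground truth (from the simulation data) and fitted
# values (from the prediction output), keyed by sample_id.
truth = pd.DataFrame({"sample_id": [0, 1], "K1": [0.10, 0.10]})
preds = pd.DataFrame({"sample_id": [0, 1], "K1_fit": [0.12, 0.09]})

# Align truth and estimates per sample, then compute signed errors.
merged = truth.merge(preds, on="sample_id")
err = merged["K1_fit"] - merged["K1"]

metrics = {
    "bias": err.mean(),                           # signed mean error
    "mae": err.abs().mean(),                      # mean absolute error
    "rel_mae": (err / merged["K1"]).abs().mean(), # relative absolute error
}
print(metrics)
```

Repeating this per parameter (k2, k3, k4, vB) gives the per-parameter summary table that step 4 inspects.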

The benchmarks/ directory in this repository illustrates the kinds of artifacts worth keeping:

  • metric summary tables
  • error distributions
  • truth-versus-estimate plots
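A truth-versus-estimate plot of the third kind can be sketched with matplotlib as below. The data here is synthetic noise around an identity line, purely for illustration; real plots would use the merged truth and *_fit columns.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical truth/estimate pairs for K1; in practice these come
# from joining the simulation data with the prediction parquet.
rng = np.random.default_rng(0)
truth = rng.uniform(0.05, 0.20, size=50)
est = truth + rng.normal(0.0, 0.01, size=50)

fig, ax = plt.subplots()
ax.scatter(truth, est, s=10)
lims = [truth.min(), truth.max()]
ax.plot(lims, lims, "k--", label="identity")  # perfect-recovery line
ax.set_xlabel("true K1")
ax.set_ylabel("estimated K1")
ax.legend()
fig.savefig("k1_truth_vs_estimate.png")
```

Points hugging the dashed identity line indicate good recovery; systematic offsets or fanning suggest bias or noise sensitivity in the solver.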

Interpreting results

Treat benchmark numbers as validation evidence, not general performance guarantees. They are:

  • representative of the chosen simulation regimes
  • sensitive to the forward model and noise assumptions in the YAML configs
  • useful for regression detection across commits