Prediction and benchmarking¶
After generating simulation data, use the prediction script to refit the simulated TACs and compare the parameter estimates against the known ground truth.
Prediction script¶
Entrypoint:
```
uv run python predict_method.py \
  simulated-data/synth-run \
  lls_4k_vB \
  v1 \
  predictions
```
Arguments:
- `dataset_dir`: directory containing parquet simulation data
- `method`: one of `lls_3k`, `lls_3k_vB`, `lls_4k`, `lls_4k_vB`
- `version`: run label used in the output path
- `save_dir`: base directory for predictions
- `--vB`: required for fixed-vB methods such as `lls_3k` and `lls_4k`
Fixed-vB example:
```
uv run python predict_method.py \
  simulated-data/synth-run \
  lls_4k \
  v1-fixed-vb \
  predictions \
  --vB 0.05
```
Output layout¶
Predictions are written to:
```
<save_dir>/<method>/<version>/
```
The parquet output contains:
- fitted parameters such as `K1_fit`, `k2_fit`, `k3_fit`, `k4_fit`, `vB_fit`, along with `mse`
- fitted TAC columns `tac_fit_000` ... `tac_fit_XXX`
- the run identifiers `sample_id`, `method`, and `version`
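As a minimal sketch of working with this column layout, the snippet below splits a prediction table's column names into fitted parameters, fitted TAC samples, and run identifiers. The column names follow the list above; the exact number of `tac_fit_*` columns depends on the simulation and is assumed here for illustration.

```python
# Hypothetical column list mirroring the prediction parquet described above;
# only three TAC columns are shown for brevity.
columns = [
    "sample_id", "method", "version",
    "K1_fit", "k2_fit", "k3_fit", "k4_fit", "vB_fit", "mse",
    "tac_fit_000", "tac_fit_001", "tac_fit_002",
]

# Fitted kinetic parameters end in "_fit" (the TAC columns use a "tac_fit_" prefix).
param_cols = [c for c in columns if c.endswith("_fit") and not c.startswith("tac_")]

# Fitted TAC columns, sorted so the time axis stays in order.
tac_cols = sorted(c for c in columns if c.startswith("tac_fit_"))

# Identifiers linking each row back to its simulation run.
id_cols = ["sample_id", "method", "version"]
```

This kind of split is useful before aggregating errors, since parameter columns and TAC columns are usually analyzed separately.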
How to use this for validation¶
The intended workflow is:
- Simulate data with known ground truth.
- Fit the same samples with one or more solver variants.
- Aggregate per-parameter error metrics.
- Inspect representative plots and summary tables before accepting a solver change.
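The aggregation step above can be sketched as follows. This is a minimal, dependency-free example of per-parameter error metrics; the ground-truth and fitted values would in practice come from the simulation and prediction parquet files, and the numbers below are illustrative.

```python
def error_metrics(truth, fit):
    """Bias, mean absolute error, and RMSE for one kinetic parameter."""
    errors = [f - t for t, f in zip(truth, fit)]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    return {"bias": bias, "mae": mae, "rmse": rmse}

# Illustrative K1 values for three simulated samples: truth vs. estimates.
m = error_metrics([0.10, 0.20, 0.30], [0.11, 0.19, 0.33])
```

Computing these per parameter (K1, k2, k3, k4, vB) gives the metric summary tables mentioned below.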
The benchmarks/ directory in this repository shows the kinds of artifacts worth keeping:
- metric summary tables
- error distributions
- truth-versus-estimate plots
Interpreting results¶
Treat benchmark numbers as validation evidence, not general performance guarantees. They are:
- representative of the chosen simulation regimes
- sensitive to the forward model and noise assumptions in the YAML configs
- useful for regression detection across commits
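One way to use benchmark numbers for regression detection is a simple threshold check between commits. This is a hypothetical sketch: the metric names, values, and tolerance are illustrative, not part of this repository's tooling.

```python
def regressions(baseline, current, tolerance=0.05):
    """Return parameters whose error metric grew by more than `tolerance` (relative).

    `baseline` and `current` map parameter names to an error metric
    (e.g. RMSE) from two different commits.
    """
    return {
        p: (baseline[p], current[p])
        for p in baseline
        if current[p] > baseline[p] * (1 + tolerance)
    }

# Illustrative RMSE summaries from two runs: k2 degraded, K1 did not.
flagged = regressions({"K1": 0.012, "k2": 0.030}, {"K1": 0.0125, "k2": 0.040})
```

A check like this turns the benchmark artifacts into a pass/fail signal without treating the absolute numbers as performance guarantees.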