Simulation pipeline

Use this workflow when you need controlled validation data with known ground-truth kinetic parameters. It is not required for normal library usage.

Entrypoint

uv run python simulate.py -c config.yaml -o simulated-data/run-001

Arguments:

  • -c or --config: simulation config YAML
  • -o or --outdir: output directory
  • -b or --batch_size: number of samples written per batch (default: 10000)
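
For orientation, a parser that accepts the flags listed above could look like the sketch below. It is not the actual simulate.py source. Treating -c and -o as required, and the help strings, are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="simulate.py")
    # Assumption: config and output directory are mandatory.
    parser.add_argument("-c", "--config", required=True,
                        help="simulation config YAML")
    parser.add_argument("-o", "--outdir", required=True,
                        help="output directory")
    # The default matches the documented value of 10000.
    parser.add_argument("-b", "--batch_size", type=int, default=10000,
                        help="write batch size")
    return parser


args = build_parser().parse_args(
    ["-c", "config.yaml", "-o", "simulated-data/run-001"]
)
print(args.config, args.outdir, args.batch_size)
```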

Install simulation dependencies

uv sync --group sim

With pip only:

pip install omegaconf pandas pyarrow rich

Synthetic mode

If aif.measured_files is not configured, the simulator:

  • samples kinetic parameters and noise settings from the YAML config
  • generates synthetic time vectors and AIF curves
  • runs the forward compartment model to create TACs
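
The forward-model step can be pictured with the sketch below. It assumes the standard two-tissue compartment model suggested by the parameter names K1, k2, k3, k4, and vB. The simulator's actual model, time grid, and blood-volume convention are not documented here, so the function name two_tissue_tac, the uniform grid, and the use of the AIF as the whole-blood curve are all illustrative assumptions.

```python
import numpy as np


def two_tissue_tac(time, aif, K1, k2, k3, k4, vB):
    """Sketch of a two-tissue forward model on a uniform time grid (minutes).

    Assumptions: `time` is uniformly spaced, the AIF doubles as the
    whole-blood curve, and the parameters do not make the two exponents
    coincide (which would divide by zero below).
    """
    time = np.asarray(time, dtype=float)
    aif = np.asarray(aif, dtype=float)
    dt = time[1] - time[0]
    t = time - time[0]

    # Exponents of the bi-exponential impulse response.
    s = k2 + k3 + k4
    disc = np.sqrt(s * s - 4.0 * k2 * k4)
    a1 = (s - disc) / 2.0
    a2 = (s + disc) / 2.0

    irf = K1 / (a2 - a1) * (
        (k3 + k4 - a1) * np.exp(-a1 * t) + (a2 - k3 - k4) * np.exp(-a2 * t)
    )

    # Discrete approximation of the convolution of the AIF with the response.
    tissue = np.convolve(aif, irf)[: len(time)] * dt
    return (1.0 - vB) * tissue + vB * aif


# Usage with a made-up gamma-like input curve.
time = np.arange(0.0, 60.0, 0.1)
aif = time * np.exp(-time)
tac = two_tissue_tac(time, aif, K1=0.1, k2=0.2, k3=0.05, k4=0.02, vB=0.05)
print(tac.shape)
```

A plain discrete convolution keeps the sketch short. Non-uniform frame times would need interpolation onto a fine grid first.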

Example:

uv run python simulate.py -c config.yaml -o simulated-data/synth-run

The default config.yaml samples regimes such as:

  • lowflow_lowbind
  • highflow_lowbind
  • modflow_highbind
  • fast_exchange

Measured AIF mode

If aif.measured_files is present, the simulator draws a measured AIF and TIME vector from the configured .npz pool for each sample.

Each .npz file must contain:

  • AIF: one-dimensional array
  • TIME: one-dimensional array in minutes, strictly increasing
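
Before adding a file to the pool, it can help to check it against the requirements above. The following is a minimal sketch. The helper name load_measured_aif and the example values are made up for illustration, and it checks only what is listed above (it does not, for example, compare the lengths of the two arrays).

```python
import tempfile
from pathlib import Path

import numpy as np


def load_measured_aif(path):
    """Load an .npz file and verify the documented AIF/TIME requirements."""
    with np.load(path) as data:
        missing = {"AIF", "TIME"} - set(data.files)
        if missing:
            raise ValueError(f"{path}: missing arrays {sorted(missing)}")
        aif = np.asarray(data["AIF"], dtype=float)
        time = np.asarray(data["TIME"], dtype=float)
    if aif.ndim != 1 or time.ndim != 1:
        raise ValueError(f"{path}: AIF and TIME must be one-dimensional")
    if np.any(np.diff(time) <= 0):
        raise ValueError(f"{path}: TIME must be strictly increasing")
    return time, aif


# Usage: write a small example file, then validate it.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "AIF_example.npz"
    np.savez(example,
             AIF=np.array([0.0, 5.0, 2.0, 1.0]),
             TIME=np.array([0.0, 0.5, 1.0, 2.0]))  # minutes
    time, aif = load_measured_aif(example)
    print(time.shape, aif.shape)
```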

Example configuration fragment:

aif:
  measured_files:
    - data/aif/AIF_001.npz
    - data/aif/AIF_002.npz

Behavior:

  • files are sampled with replacement
  • relative paths are resolved relative to the config file
  • aif.params is not required in this mode
  • output rows still include sampled kinetic ground truth
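
The path resolution and sampling behavior above can be sketched as follows. This illustrates the described behavior rather than the simulator's code. The function names, the seed handling, and the example config location are assumptions.

```python
import random
from pathlib import Path


def resolve_aif_pool(config_path, measured_files):
    """Resolve relative entries against the directory holding the config."""
    base = Path(config_path).parent
    return [p if p.is_absolute() else base / p
            for p in map(Path, measured_files)]


def draw_aif_files(pool, n_samples, seed=0):
    """Draw one file per sample, with replacement."""
    rng = random.Random(seed)
    return rng.choices(pool, k=n_samples)


# Hypothetical config location, used only for this example.
pool = resolve_aif_pool(
    "/tmp/exp/config-measured-aif.yaml",
    ["data/aif/AIF_001.npz", "data/aif/AIF_002.npz"],
)
draws = draw_aif_files(pool, n_samples=5)
for path in draws:
    print(path)
```

Because sampling is with replacement, a pool smaller than the number of samples is fine: the same measured AIF is simply reused across rows.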

Example:

uv run python simulate.py -c config-measured-aif.yaml -o simulated-data/measured-run

Output format

Each run writes:

<outdir>/simulations.parquet

Each row represents one simulation sample and includes:

  • identifiers and metadata such as sample_id, batch_id, protocol, regime, aif_source
  • ground-truth kinetic parameters such as K1, k2, k3, k4, vB
  • n_frames_used, the length of the active prefix of the stored frame vectors: only the first n_frames_used entries of each curve are in use
  • flattened curve columns:
      • time_000 ... time_XXX
      • aif_000 ... aif_XXX
      • tac_000 ... tac_XXX
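
To turn one row back into arrays, collect the numbered columns for each curve and keep only the first n_frames_used entries. The sketch below uses a small hand-made row in place of real output. The helper name extract_curves and the NaN padding in the example are assumptions. Matching on a numeric suffix matters because metadata columns such as aif_source share the aif_ prefix.

```python
import re

import numpy as np
import pandas as pd


def extract_curves(row: pd.Series) -> dict:
    """Return the active time, AIF, and TAC arrays for one simulation row."""
    n = int(row["n_frames_used"])
    curves = {}
    for prefix in ("time", "aif", "tac"):
        # Keep only columns like time_000; zero-padded names sort in frame order.
        cols = sorted(c for c in row.index
                      if re.fullmatch(rf"{prefix}_\d+", c))
        curves[prefix] = row[cols].to_numpy(dtype=float)[:n]
    return curves


# Hand-made stand-in for one row of simulations.parquet.
demo = pd.Series({
    "sample_id": 0, "aif_source": "synthetic", "n_frames_used": 2,
    "time_000": 0.0, "time_001": 1.0, "time_002": np.nan,
    "aif_000": 0.0, "aif_001": 4.0, "aif_002": np.nan,
    "tac_000": 0.0, "tac_001": 0.3, "tac_002": np.nan,
})
curves = extract_curves(demo)
print({name: values.tolist() for name, values in curves.items()})
```

With real output, the same function can be applied to a row selected with df.iloc[i] after pd.read_parquet.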

Sanity checks

Check row and column counts:

uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/synth-run/simulations.parquet")
print("rows:", len(df))
print("cols:", len(df.columns))
print(df[["sample_id", "regime", "n_frames_used"]].head())
PY

Check which measured AIF sources were sampled:

uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/measured-run/simulations.parquet")
print(df["aif_source"].value_counts().head(10))
PY