Simulation pipeline

Use this workflow when you need controlled validation data with known ground-truth kinetic parameters. It is not required for normal library usage.

Entrypoint

uv run python simulate.py -c config.yaml -o simulated-data/run-001

Arguments:

-c or --config: simulation config YAML
-o or --outdir: output directory
-b or --batch_size: write batch size, default 10000
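
For orientation, the argument list above corresponds to a parser roughly like the following. This is a minimal sketch, not the actual contents of simulate.py: the flag names and the default of 10000 come from the list above, while build_parser, the help strings, and marking -c and -o as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: only the flag names and the batch-size
    # default are taken from the documented argument list.
    parser = argparse.ArgumentParser(prog="simulate.py")
    parser.add_argument("-c", "--config", required=True,
                        help="simulation config YAML")
    parser.add_argument("-o", "--outdir", required=True,
                        help="output directory")
    parser.add_argument("-b", "--batch_size", type=int, default=10000,
                        help="write batch size")
    return parser


# Parse the same arguments as the entrypoint command shown above.
args = build_parser().parse_args(
    ["-c", "config.yaml", "-o", "simulated-data/run-001"]
)
print(args.config, args.outdir, args.batch_size)
```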

Install simulation dependencies

uv sync --group sim

With pip only:

pip install omegaconf pandas pyarrow rich

Synthetic mode

If aif.measured_files is not configured, the simulator:

samples kinetic parameters and noise settings from the YAML config
generates synthetic time vectors and AIF curves
runs the forward compartment model to create TACs
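
The three steps above can be illustrated with a small NumPy sketch. It assumes a two-tissue compartment model, which is only inferred from the K1, k2, k3, k4, vB columns in the output; the AIF shape, the sampling ranges, the noise level, and the explicit-Euler integrator are all hypothetical choices made for illustration and are not taken from the simulator.

```python
import numpy as np


def synthetic_aif(t, amplitude=100.0, rate=1.5):
    """Toy bolus-shaped input function (hypothetical, not the simulator's AIF)."""
    return amplitude * t * np.exp(-rate * t)


def forward_2tcm(t, cp, K1, k2, k3, k4, vB):
    """Explicit-Euler integration of an assumed two-tissue compartment model.

    t is in minutes and must be a fine, increasing grid for Euler to be accurate.
    """
    c1 = np.zeros_like(t)  # free / non-displaceable compartment
    c2 = np.zeros_like(t)  # bound compartment
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        dc1 = K1 * cp[i - 1] - (k2 + k3) * c1[i - 1] + k4 * c2[i - 1]
        dc2 = k3 * c1[i - 1] - k4 * c2[i - 1]
        c1[i] = c1[i - 1] + dt * dc1
        c2[i] = c2[i - 1] + dt * dc2
    # Measured signal mixes tissue activity with a blood-volume fraction vB.
    return (1.0 - vB) * (c1 + c2) + vB * cp


rng = np.random.default_rng(0)

# Step 1: sample kinetic parameters (ranges are hypothetical).
params = {
    "K1": rng.uniform(0.05, 0.5),
    "k2": rng.uniform(0.05, 0.5),
    "k3": rng.uniform(0.01, 0.2),
    "k4": rng.uniform(0.01, 0.1),
    "vB": rng.uniform(0.01, 0.1),
}

# Step 2: generate a time vector (minutes) and an AIF curve.
t = np.linspace(0.0, 60.0, 6001)
cp = synthetic_aif(t)

# Step 3: run the forward model, then add Gaussian noise (hypothetical level).
tac = forward_2tcm(t, cp, **params)
noisy_tac = tac + rng.normal(0.0, 0.01 * tac.max(), size=tac.shape)
print({name: round(float(value), 3) for name, value in params.items()})
```

Because the parameters are sampled before the curve is generated, each TAC comes with known ground truth, which is what makes the output usable as validation data.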

Example:

uv run python simulate.py -c config.yaml -o simulated-data/synth-run

The default config.yaml samples regimes such as:

lowflow_lowbind
highflow_lowbind
modflow_highbind
fast_exchange

Measured AIF mode

If aif.measured_files is present, the simulator draws one file from the configured .npz pool for each sample and uses that file's measured AIF and TIME vectors.

Each .npz file must contain:

AIF: one-dimensional array
TIME: one-dimensional array in minutes, strictly increasing
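
A file in this layout can be written with np.savez and checked before it is added to the pool. The sketch below is one way to do that; validate_aif_npz and the file name AIF_demo.npz are hypothetical, and the equal-length check is an assumption that the requirements above do not state explicitly.

```python
import tempfile
from pathlib import Path

import numpy as np


def validate_aif_npz(path):
    """Load one measured-AIF file and check the documented requirements."""
    with np.load(path) as data:
        missing = [key for key in ("AIF", "TIME") if key not in data.files]
        if missing:
            raise KeyError(f"{path}: missing arrays {missing}")
        aif = np.asarray(data["AIF"], dtype=float)
        time = np.asarray(data["TIME"], dtype=float)
    if aif.ndim != 1 or time.ndim != 1:
        raise ValueError(f"{path}: AIF and TIME must be one-dimensional")
    if aif.shape != time.shape:
        # Assumption: one AIF value per time point.
        raise ValueError(f"{path}: AIF and TIME differ in length")
    if not np.all(np.diff(time) > 0):
        raise ValueError(f"{path}: TIME must be strictly increasing")
    return time, aif


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "AIF_demo.npz"  # hypothetical file name
    time = np.array([0.0, 0.5, 1.0, 2.0, 5.0])  # minutes
    aif = np.array([0.0, 40.0, 25.0, 10.0, 3.0])
    np.savez(path, AIF=aif, TIME=time)  # keys must be exactly AIF and TIME
    loaded_time, loaded_aif = validate_aif_npz(path)
    print(len(loaded_time), "frames, last time point:", loaded_time[-1], "min")
```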

Example configuration fragment:

aif:
  measured_files:
    - data/aif/AIF_001.npz
    - data/aif/AIF_002.npz

Behavior:

files are sampled with replacement
relative paths are resolved relative to the config file
aif.params is not required in this mode
output rows still include sampled kinetic ground truth
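
The first two behaviors can be sketched with the standard library alone. The helper names below are hypothetical and the simulator's own implementation may differ; the point is that relative entries are anchored at the directory containing the config file rather than the current working directory, and that the same file can be drawn more than once.

```python
import random
from pathlib import Path


def resolve_measured_files(config_path, measured_files):
    """Anchor relative entries at the directory that contains the config file."""
    base = Path(config_path).resolve().parent
    return [p if p.is_absolute() else base / p for p in map(Path, measured_files)]


def draw_aif_files(files, n_samples, seed=0):
    """Draw one file per sample, with replacement."""
    rng = random.Random(seed)
    return rng.choices(files, k=n_samples)


files = resolve_measured_files(
    "config-measured-aif.yaml",
    ["data/aif/AIF_001.npz", "data/aif/AIF_002.npz"],
)
drawn = draw_aif_files(files, n_samples=5)
for path in drawn:
    print(path.name)
```

With two files and five samples, at least one file necessarily appears more than once, which is the expected outcome of sampling with replacement.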

Example:

uv run python simulate.py -c config-measured-aif.yaml -o simulated-data/measured-run

Output format

Each run writes:

<outdir>/simulations.parquet

Each row represents one simulation sample and includes:

identifiers and metadata such as sample_id, batch_id, protocol, regime, aif_source
ground-truth kinetic parameters such as K1, k2, k3, k4, vB
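n_frames_used, the number of leading entries in each stored frame vector that are valid for that sample (the active prefix)
flattened curve columns, one column per frame with a zero-padded frame index: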
time_000 ... time_XXX
aif_000 ... aif_XXX
tac_000 ... tac_XXX
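
To recover the curves for one sample, read the first n_frames_used entries of each flattened vector. The helper below is a minimal sketch: unpack_curves is a hypothetical name, the three-digit zero-padded suffix is inferred from the column names above, and the demo row (including its trailing zeros) is made up for illustration.

```python
def unpack_curves(row):
    """Return (time, aif, tac) lists restricted to the active prefix.

    `row` can be any mapping from column name to value, for example a dict
    or a pandas row such as df.iloc[0].
    """
    n = int(row["n_frames_used"])

    def take(prefix):
        # Assumes column names like time_000, time_001, ...
        return [float(row[f"{prefix}_{i:03d}"]) for i in range(n)]

    return take("time"), take("aif"), take("tac")


# Hypothetical row with four stored frames, of which three are active.
demo_row = {
    "sample_id": 0,
    "n_frames_used": 3,
    "time_000": 0.5, "time_001": 1.0, "time_002": 2.0, "time_003": 0.0,
    "aif_000": 40.0, "aif_001": 25.0, "aif_002": 10.0, "aif_003": 0.0,
    "tac_000": 1.2, "tac_001": 3.4, "tac_002": 4.1, "tac_003": 0.0,
}

time, aif, tac = unpack_curves(demo_row)
print(time, aif, tac)
```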

Sanity checks

Check row and column counts:

uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/synth-run/simulations.parquet")
print("rows:", len(df))
print("cols:", len(df.columns))
print(df[["sample_id", "regime", "n_frames_used"]].head())
PY

Check which measured AIF sources were sampled:

uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/measured-run/simulations.parquet")
print(df["aif_source"].value_counts().head(10))
PY