Simulation pipeline¶
Use this workflow when you need controlled validation data with known ground-truth kinetic parameters. It is not required for normal library usage.
Entrypoint¶
uv run python simulate.py -c config.yaml -o simulated-data/run-001
Arguments:
- -c or --config: simulation config YAML
- -o or --outdir: output directory
- -b or --batch_size: write batch size, default 10000
Install simulation dependencies¶
uv sync --group sim
With pip only:
pip install omegaconf pandas pyarrow rich
Synthetic mode¶
If aif.measured_files is not configured, the simulator:
- samples kinetic parameters and noise settings from the YAML config
- generates synthetic time vectors and AIF curves
- runs the forward compartment model to create TACs
Example:
uv run python simulate.py -c config.yaml -o simulated-data/synth-run
The default config.yaml samples regimes such as:
- lowflow_lowbind
- highflow_lowbind
- modflow_highbind
- fast_exchange
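The three synthetic-mode steps listed in this section can be illustrated with a minimal sketch. It assumes the forward model is the standard two-tissue compartment model in K1, k2, k3, k4 and vB; the parameter names match the output columns, but the source does not confirm the model or the integrator. The function name `simulate_tac`, the sampling ranges, and the toy AIF shape are hypothetical and are not taken from simulate.py or config.yaml. Noise is omitted.

```python
import numpy as np


def simulate_tac(time_min, aif, K1, k2, k3, k4, vB, dt=0.01):
    """Forward two-tissue compartment model (assumed, not from the source).

        dC1/dt = K1*Cp - (k2 + k3)*C1 + k4*C2
        dC2/dt = k3*C1 - k4*C2
        TAC    = (1 - vB)*(C1 + C2) + vB*Cp
    """
    time_min = np.asarray(time_min, dtype=float)
    aif = np.asarray(aif, dtype=float)
    # Integrate on a fine uniform grid so explicit Euler stays stable.
    n_steps = int(round((time_min[-1] - time_min[0]) / dt)) + 1
    fine_t = np.linspace(time_min[0], time_min[-1], n_steps)
    step = fine_t[1] - fine_t[0]
    cp = np.interp(fine_t, time_min, aif)
    c1 = np.zeros(n_steps)
    c2 = np.zeros(n_steps)
    for i in range(1, n_steps):
        dc1 = K1 * cp[i - 1] - (k2 + k3) * c1[i - 1] + k4 * c2[i - 1]
        dc2 = k3 * c1[i - 1] - k4 * c2[i - 1]
        c1[i] = c1[i - 1] + step * dc1
        c2[i] = c2[i - 1] + step * dc2
    tissue = (1.0 - vB) * (c1 + c2) + vB * cp
    # Sample the fine-grid curve back at the requested time points.
    return np.interp(time_min, fine_t, tissue)


rng = np.random.default_rng(seed=0)

# Step 1: sample kinetic parameters (hypothetical uniform ranges).
params = {
    "K1": rng.uniform(0.05, 0.5),
    "k2": rng.uniform(0.05, 0.5),
    "k3": rng.uniform(0.01, 0.2),
    "k4": rng.uniform(0.01, 0.1),
    "vB": rng.uniform(0.01, 0.1),
}

# Step 2: synthetic time vector in minutes and a toy AIF (hypothetical shape).
time_min = np.linspace(0.0, 60.0, 61)
aif = 100.0 * time_min * np.exp(-time_min / 1.5)

# Step 3: run the forward model to obtain the tissue time-activity curve.
tac = simulate_tac(time_min, aif, **params)
```

Explicit Euler on a fine grid keeps the sketch short and dependency-light; an ODE solver or an analytic convolution would be a reasonable substitute.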
Measured AIF mode¶
If aif.measured_files is present, the simulator draws a measured AIF and TIME vector from the configured .npz pool for each sample.
Each .npz file must contain:
- AIF: one-dimensional array
- TIME: one-dimensional array in minutes, strictly increasing
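The file requirements above can be checked before a run with a short sketch like the following. The helper name `load_measured_aif` and the example file name are hypothetical, and the equal-length check is an assumption that the source does not state.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_measured_aif(path):
    """Load one measured AIF file and check the documented requirements."""
    with np.load(path) as data:
        missing = {"AIF", "TIME"} - set(data.files)
        if missing:
            raise ValueError(f"{path}: missing arrays {sorted(missing)}")
        aif = np.asarray(data["AIF"], dtype=float)
        time_min = np.asarray(data["TIME"], dtype=float)
    if aif.ndim != 1 or time_min.ndim != 1:
        raise ValueError(f"{path}: AIF and TIME must be one-dimensional")
    # Assumption: both arrays describe the same samples, so lengths match.
    if aif.shape != time_min.shape:
        raise ValueError(f"{path}: AIF and TIME differ in length")
    if np.any(np.diff(time_min) <= 0):
        raise ValueError(f"{path}: TIME must be strictly increasing")
    return time_min, aif


# Usage: write a small example file, then load and validate it.
with tempfile.TemporaryDirectory() as tmp:
    example_path = Path(tmp) / "AIF_example.npz"  # hypothetical file name
    np.savez(
        example_path,
        AIF=np.array([0.0, 5.0, 2.0]),
        TIME=np.array([0.0, 1.0, 2.0]),  # minutes
    )
    time_min, aif = load_measured_aif(example_path)
```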
Example configuration fragment:
aif:
measured_files:
- data/aif/AIF_001.npz
- data/aif/AIF_002.npz
Behavior:
- files are sampled with replacement
- relative paths are resolved relative to the config file
- aif.params is not required in this mode
- output rows still include sampled kinetic ground truth
Example:
uv run python simulate.py -c config-measured-aif.yaml -o simulated-data/measured-run
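The path-resolution and sampling behavior described in this section can be sketched as follows. This only illustrates the documented rules (relative paths resolved against the config file's directory, files drawn with replacement); the function names `resolve_aif_paths` and `draw_aif_files` are hypothetical and do not come from simulate.py.

```python
import random
from pathlib import Path


def resolve_aif_paths(config_path, measured_files):
    """Resolve relative entries against the directory of the config file."""
    config_dir = Path(config_path).resolve().parent
    resolved = []
    for entry in measured_files:
        candidate = Path(entry)
        # Absolute entries are kept; relative ones are anchored at the config.
        resolved.append(candidate if candidate.is_absolute() else config_dir / candidate)
    return resolved


def draw_aif_files(paths, n_samples, seed=None):
    """Draw one file per sample, with replacement."""
    rng = random.Random(seed)
    return rng.choices(paths, k=n_samples)


paths = resolve_aif_paths(
    "config-measured-aif.yaml",
    ["data/aif/AIF_001.npz", "data/aif/AIF_002.npz"],
)
draws = draw_aif_files(paths, n_samples=5, seed=0)
```

Because sampling is with replacement, a pool smaller than the number of samples is valid, and the same file can back many rows.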
Output format¶
Each run writes:
<outdir>/simulations.parquet
Each row represents one simulation sample and includes:
- identifiers and metadata such as sample_id, batch_id, protocol, regime, aif_source
- ground-truth kinetic parameters such as K1, k2, k3, k4, vB
- n_frames_used, which indicates the active prefix of the stored frame vectors
- flattened curve columns:
  - time_000 ... time_XXX
  - aif_000 ... aif_XXX
  - tac_000 ... tac_XXX
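To turn one row back into curves, the flattened columns can be collected by prefix and truncated to n_frames_used. The sketch below works on a plain mapping from column name to value (for a pandas row, `df.iloc[i].to_dict()` yields one). The helper name `extract_curves` is hypothetical, and the example row, including the zero padding after the active prefix, is made up for illustration.

```python
import re


def extract_curves(row):
    """Rebuild the time, AIF and TAC vectors from one flattened row.

    `row` maps column names to values.
    """
    n_used = int(row["n_frames_used"])
    curves = {}
    for prefix in ("time", "aif", "tac"):
        pattern = re.compile(rf"^{prefix}_(\d+)$")
        indexed = []
        for name in row:
            match = pattern.match(name)
            if match:
                indexed.append((int(match.group(1)), name))
        # Sort by frame index, then keep only the active prefix.
        indexed.sort()
        curves[prefix] = [row[name] for _, name in indexed][:n_used]
    return curves


# Made-up row with four stored frames, of which three are active.
example_row = {
    "sample_id": 0,
    "n_frames_used": 3,
    "time_000": 0.0, "time_001": 0.5, "time_002": 1.0, "time_003": 0.0,
    "aif_000": 0.0, "aif_001": 8.0, "aif_002": 4.0, "aif_003": 0.0,
    "tac_000": 0.0, "tac_001": 1.5, "tac_002": 2.5, "tac_003": 0.0,
}
curves = extract_curves(example_row)
```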
Sanity checks¶
Check row and column counts:
uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/synth-run/simulations.parquet")
print("rows:", len(df))
print("cols:", len(df.columns))
print(df[["sample_id", "regime", "n_frames_used"]].head())
PY
Check which measured AIF sources were sampled:
uv run python - <<'PY'
import pandas as pd
df = pd.read_parquet("simulated-data/measured-run/simulations.parquet")
print(df["aif_source"].value_counts().head(10))
PY