import pandas as pd
import starsim as ss
import sciris as sc
import typhoidsim as ty

T8 - Calibration
This tutorial outlines how to set up a simple calibration workflow:
- define which data we want to fit the model to
- set up a simulation object
- define which parameters and which parameter values or ranges we want to test
- set up the calibration object
- run the calibration
We usually have some data we’d like to fit the model to. The data below are simulated using specific values for init_prev (0.01) and p_cpg (0.35). The simplest way to pass data to a calibration is as a dataframe whose column names match quantities tracked by the Sim, for instance n_alive, or typhoid.n_chronic – the cumulative number of chroi
We first write a function that creates a Sim object with the ingredients we believe explain our data, over a time span that matches that of the data. The function wraps the basic setup seen in many of the other tutorials.
def make_sim():
    pars = dict(
        start = 1982,      # Starting year
        dur = 9.0,         # Duration of the simulation in years
        dt = 1.0/365.0,    # Timestep of 1 day, expressed in years
        verbose = 0,       # Do not print details of the run
    )
    typhoid = ty.Typhoid()
    ppl = ss.People(10_000)
    sim = ss.Sim(
        pars=pars,
        people=ppl,
        diseases=typhoid,
    )
    return sim

def get_reference_data():
    sim = make_sim()
    sim.init()
    sim.diseases.typhoid.pars.init_prev.pars.p = 0.01
    sim.diseases.typhoid.pars.p_cpg = 0.35
    sim.run()
    df = sim.to_df()
    # Pass plain arrays: given Series, pandas would realign the values to the
    # new time index and could fill the columns with NaN
    expected_data = pd.DataFrame(
        data={"n": df["n_alive"].to_numpy(),
              "x": df["typhoid_n_infected"].to_numpy()},
        index=pd.Index(df["timevec"], name="t"),
    )
    return expected_data

def extract_simulated_data(sim):
    df = sim.to_df()
    # Same columns and index as the reference data (see the note above on arrays)
    simulated_data = pd.DataFrame(
        data={"n": df["n_alive"].to_numpy(),
              "x": df["typhoid_n_infected"].to_numpy()},
        index=pd.Index(df["timevec"], name="t"),
    )
    return simulated_data

Then, we specify which parameters need to be calibrated and which ranges to explore. The calibration parameters have a hierarchical/nested structure similar to that of the Sim class.
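To make that structure concrete, here is a minimal sketch, in plain Python, of what happens to such a dictionary during one trial: a value is drawn within each low/high range and stored under a 'value' key, which is the key the updater function further down reads. The sample_trial helper and the uniform draw are illustrative assumptions only; Starsim delegates the actual sampling to Optuna, whose samplers are more sophisticated.

```python
import copy
import random

def sample_trial(calib_pars, rng):
    """Hypothetical helper: return a copy of calib_pars with a 'value' per entry."""
    trial_pars = copy.deepcopy(calib_pars)  # Leave the caller's specification untouched
    for spec in trial_pars.values():
        # Uniform draw as a stand-in for the optimizer's suggestion (assumption)
        spec["value"] = rng.uniform(spec["low"], spec["high"])
    return trial_pars

# Same ranges as used in this tutorial
calib_pars = dict(
    init_prev=dict(low=0.0, high=0.1, guess=0.05),
    p_cpg=dict(low=0.05, high=0.4, guess=0.05),
)

trial_pars = sample_trial(calib_pars, random.Random(1))
for name, spec in trial_pars.items():
    print(f"{name}: {spec['value']:.4f}")
```

The original specification is left without a 'value' key, so the same dictionary can be reused for every trial.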
def get_calib_pars():
    calib_pars = dict(
        init_prev=dict(low=0.00, high=0.1, guess=0.05),
        p_cpg=dict(low=0.05, high=0.4, guess=0.05),
    )
    return calib_pars

def calib_pars_updater(sim, calib_pars, **kwargs):
    """
    Also referred to as the build_sim function in some of Starsim's tutorials.
    This function tells the Calibration class how to reach and update a parameter
    value for our specific model encapsulated in the sim object.
    The more modules the full model has, the more complex it is to navigate the
    path to find and update the required parameters.
    """
    # Access the module whose parameters we need to modify during optimisation
    typh = sim.pars.diseases
    if 'rand_seed' in calib_pars:
        sim.pars['rand_seed'] = calib_pars.pop('rand_seed')
    for par_name, par_attrs in calib_pars.items():  # Loop over the calibration parameters
        v = par_attrs["value"]
        # Each item in calib_pars is a dictionary with keys like 'low', 'high',
        # 'guess', 'suggest_type', and importantly 'value'. The 'value' key is
        # the one we want to use as that's the one selected by the algorithm
        match par_name:
            case "p_cpg":
                typh.pars.p_cpg = v
            case "init_prev":
                typh.pars.init_prev = ss.bernoulli(v)
            case _:
                raise NotImplementedError(f"Do not know how to update parameter {par_name}.")
    return sim

def get_calib_components():
    expected_data = get_reference_data()
    # In Starsim v3, the likelihood function is chosen by selecting a CalibComponent
    # subclass (e.g. BetaBinomial, Binomial, GammaPoisson, Normal) rather than via the
    # old nll_fn argument. BetaBinomial expects "n" (denominator) and "x" (numerator)
    # columns, which is what get_reference_data and extract_simulated_data provide.
    components = [
        ss.BetaBinomial(
            name="cases",                       # NOTE: can be renamed to something else
            expected=expected_data,             # dataframe
            extract_fn=extract_simulated_data,  # function
            conform="prevalent",
        )
    ]
    return components

Now we put all the elements together using Starsim’s Calibration class, which uses Optuna.
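Before assembling everything, it may help to see the kind of score a beta-binomial component produces from the "n" and "x" columns. The sketch below is a generic textbook formulation and not a copy of Starsim's implementation: it assumes the simulated counts define a Beta(x_sim + 1, n_sim - x_sim + 1) distribution over the prevalence and evaluates the beta-binomial probability of the observed counts under it. The function names and the example counts are made up for illustration.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_nll(n_obs, x_obs, n_sim, x_sim):
    """Negative log-likelihood of x_obs out of n_obs, given simulated counts.

    Assumption: the simulation informs a Beta(x_sim + 1, n_sim - x_sim + 1)
    distribution over the underlying prevalence.
    """
    a = x_sim + 1
    b = n_sim - x_sim + 1
    log_choose = lgamma(n_obs + 1) - lgamma(x_obs + 1) - lgamma(n_obs - x_obs + 1)
    log_lik = log_choose + log_beta(x_obs + a, n_obs - x_obs + b) - log_beta(a, b)
    return -log_lik

# Hypothetical counts: 100 infected out of 10,000 observed
nll_close = beta_binomial_nll(10_000, 100, n_sim=10_000, x_sim=105)
nll_far = beta_binomial_nll(10_000, 100, n_sim=10_000, x_sim=800)
print(f"close simulation: {nll_close:.2f}, distant simulation: {nll_far:.2f}")

# Sanity check: probabilities over all possible observed counts sum to one
total = sum(exp(-beta_binomial_nll(5, k, n_sim=20, x_sim=4)) for k in range(6))
print(f"total probability: {total:.6f}")
```

A simulated prevalence close to the observed one gives the smaller score, which is the quantity an optimizer tries to minimise.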
def run_calibration():
    sc.heading('Testing calibration')
    # The parameters of each ingredient being calibrated need to exist in the sim defined in make_sim()
    # Make the sim and data
    sim = make_sim()
    calib_components = get_calib_components()
    calib_pars = get_calib_pars()
    # Make the calibration
    calib = ss.Calibration(
        sim=sim,
        calib_pars=calib_pars,
        build_fn=calib_pars_updater,
        components=calib_components,
        total_trials=50,
        n_workers=4,  # the underlying library runs in parallel
        db_name="typhoidsim_tutorial",
        die=True,
        debug=True,
    )
    # Perform the calibration
    sc.printcyan('\nPerforming calibration...')
    calib.calibrate()
    print(calib.best_pars)
    return sim, calib

run_calibration()

——————————————————— Testing calibration ———————————————————
Performing calibration...
Removed existing calibration file typhoidsim_tutorial
sqlite:///typhoidsim_tutorial
[I 2026-06-07 20:36:52,900] A new study created in RDB with name: starsim_calibration
[I 2026-06-07 20:37:00,430] Trial 0 finished with value: inf and parameters: {'init_prev': 0.08771719257824223, 'p_cpg': 0.05301682351428011, 'rand_seed': 586907}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:07,810] Trial 1 finished with value: inf and parameters: {'init_prev': 0.09842852377075789, 'p_cpg': 0.050796017006391504, 'rand_seed': 893117}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:14,523] Trial 2 finished with value: inf and parameters: {'init_prev': 0.07511757805853374, 'p_cpg': 0.341788989564013, 'rand_seed': 113951}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:21,971] Trial 3 finished with value: inf and parameters: {'init_prev': 0.07669088847474967, 'p_cpg': 0.06584136979696, 'rand_seed': 896865}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:28,683] Trial 4 finished with value: inf and parameters: {'init_prev': 0.0027571013977813145, 'p_cpg': 0.2589722993699589, 'rand_seed': 525883}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:36,079] Trial 5 finished with value: inf and parameters: {'init_prev': 0.09370866115232589, 'p_cpg': 0.16953206138753363, 'rand_seed': 488691}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:42,799] Trial 6 finished with value: inf and parameters: {'init_prev': 0.0824178224400627, 'p_cpg': 0.2663381943887316, 'rand_seed': 14884}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:50,232] Trial 7 finished with value: inf and parameters: {'init_prev': 0.08203528652011972, 'p_cpg': 0.10708059100802791, 'rand_seed': 985512}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:57,829] Trial 8 finished with value: inf and parameters: {'init_prev': 0.051681553042429024, 'p_cpg': 0.2757170783202996, 'rand_seed': 100128}. Best is trial 0 with value: inf.
[I 2026-06-07 20:38:04,415] Trial 9 finished with value: inf and parameters: {'init_prev': 0.07707698124000974, 'p_cpg': 0.08966386065303814, 'rand_seed': 132735}. Best is trial 0 with value: inf.
[I 2026-06-07 20:38:11,863] Trial 10 finished with value: inf and parameters: {'init_prev': 0.04842845327229383, 'p_cpg': 0.17080646924049664, 'rand_seed': 612269}. Best is trial 0 with value: inf.
[I 2026-06-07 20:38:18,662] Trial 11 finished with value: inf and parameters: {'init_prev': 0.049526718727763465, 'p_cpg': 0.05296598942385594, 'rand_seed': 746893}. Best is trial 0 with value: inf.
[I 2026-06-07 20:38:26,262] Trial 12 finished with value: inf and parameters: {'init_prev': 0.09887706692044018, 'p_cpg': 0.14123239741742843, 'rand_seed': 317742}. Best is trial 0 with value: inf.
Making results structure...
Processed 13 trials; 0 failed
Best pars: {'init_prev': 0.08771719257824223, 'p_cpg': 0.05301682351428011, 'rand_seed': 586907}
Removed existing calibration file typhoidsim_tutorial
{'init_prev': 0.08771719257824223, 'p_cpg': 0.05301682351428011, 'rand_seed': 586907}
(Sim(n=10000; 1982—None; not initialized),
<starsim.calibration.Calibration at 0x754be53ef0e0>
[<class 'starsim.calibration.Calibration'>, <class 'sciris.sc_printing.prettyobj'>]
————————————————————————————————————————————————————————————————————————
Methods:
_eval_fit() make_study() run_sim()
_sample_from_trial() parse_study() run_trial()
build_fn() plot() run_workers()
calibrate() plot_final() to_df()
check_fit() plot_optuna() to_json()
eval_fn() remove_db() worker()
————————————————————————————————————————————————————————————————————————
after_msim: None
before_msim: None
best_pars: {'init_prev': 0.08771719257824223, 'p_cpg':
0.05301682351428011, 'ran [...]
build_fn: <function calib_pars_updater at 0x754be6c2b560>
build_kw: {}
calib_pars: {'init_prev': {'low': 0.0, 'high': 0.1, 'guess': 0.05},
'p_cpg': {'lo [...]
calibrated: True
components: [Calibration component with name cases]
df: index mismatch init_prev p_cpg rand_seed
0 0 i [...]
die: True
elapsed: 93.6798152923584
eval_fn: <bound method Calibration._eval_fit of
<starsim.calibration.Calibrati [...]
eval_kw: {}
label: None
prune_fn: None
reseed: True
run_args: #0. 'n_trials': 13
#1. 'n_workers': 4
#2. 'debug': True
#3 [...]
sim: Sim(n=10000; 1982—None; not initialized)
study: <optuna.study.study.Study object at 0x754be3eaba10>
study_data: #0. 'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12]
#1. 'mism [...]
verbose: True
————————————————————————————————————————————————————————————————————————)
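In the recorded run above, every trial finished with a value of inf, so the reported best parameters are simply those of trial 0 rather than a meaningful optimum. It is worth checking the trial values before trusting best_pars. The sketch below does this on two log lines copied from the output above, using only the standard library. The parse_trials helper is a hypothetical name, and the regular expression assumes the log format shown here.

```python
import ast
import math
import re

# Two lines copied verbatim from the calibration log above
LOG = """\
[I 2026-06-07 20:37:00,430] Trial 0 finished with value: inf and parameters: {'init_prev': 0.08771719257824223, 'p_cpg': 0.05301682351428011, 'rand_seed': 586907}. Best is trial 0 with value: inf.
[I 2026-06-07 20:37:07,810] Trial 1 finished with value: inf and parameters: {'init_prev': 0.09842852377075789, 'p_cpg': 0.050796017006391504, 'rand_seed': 893117}. Best is trial 0 with value: inf.
"""

# Trial number, objective value, and the parameter dictionary of each trial
TRIAL_RE = re.compile(
    r"Trial (\d+) finished with value: (\S+) and parameters: (\{.*?\})\."
)

def parse_trials(log_text):
    """Hypothetical helper: return (number, value, parameters) for each trial line."""
    trials = []
    for number, value, pars in TRIAL_RE.findall(log_text):
        trials.append((int(number), float(value), ast.literal_eval(pars)))
    return trials

trials = parse_trials(LOG)
n_finite = sum(math.isfinite(value) for _, value, _ in trials)
print(f"{n_finite} of {len(trials)} trials have a finite mismatch")
if n_finite == 0:
    print("No finite mismatch: inspect the data and likelihood before using best_pars")
```

With the calib object at hand, the mismatch column of calib.df listed in the printout above carries the same information. A frequent source of non-finite scores is missing values in the "n" or "x" columns. In pandas this can happen when Series are passed together with a different index, because the values are realigned to the new labels. Whether that is what happened in this run would need checking against the actual dataframes.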