utils

General utilities

Functions

Name Description
detect_age_anniversary Detect people who crossed a specific age anniversary; it does not have to be a birthday.
digitize_ages_1yr Returns the indices of the 1-year age bins in the range (0, tyd.max_age).
generate_age_bin_labels Generates consistent age bin labels.
generate_unique_filename Generate a unique filename, useful when running many simulations in a distributed manner.
get_age_distribution_un Parse age distribution data from UN sources.
get_attr_vals Get values of the attribute in attr_path
get_data_home Return a path to the directory for default datasets.
get_dataset_names Provides a list of available datasets in data_home
load_age_dist_un Load UN population age distribution that can be used in place of Pakistan demographics
load_dataset Load default dataset from typhoidsim data directory.
stratify_parameter_by_age Returns a callable that, given an age, returns the value of a parameter
test_cpu_performance Normalize performance across CPUs
to_df Export results as a Pandas dataframe. Available in newer starsim versions (i.e., 2.0).

detect_age_anniversary

utils.detect_age_anniversary(sim, age_anniversary)

Detect people who crossed a specific age anniversary. The age does not have to be a whole-number birthday: age_anniversary is a float, so fractional ages can be used.

Parameters

Name Type Description Default
sim starsim.Sim object the current simulation object required
age_anniversary float the age we wish to detect required

Returns

Name Type Description
reached_anniv Boolean array
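
A minimal sketch of one way such a detection could work, assuming ages are stored in years and advance by a fixed timestep each step. The helper name crossed_age and the arguments ages and dt are hypothetical; the actual function reads its inputs from the sim object and may use a different rule.

```python
import numpy as np

def crossed_age(ages, age_anniversary, dt):
    """Boolean mask of agents whose age reached `age_anniversary` during
    the most recent step of length `dt` (hypothetical helper)."""
    ages = np.asarray(ages, dtype=float)
    # Crossed this step: at or past the anniversary now, but before it one step ago.
    return (ages >= age_anniversary) & (ages - dt < age_anniversary)

ages = np.array([4.5, 5.25, 6.0, 5.0])  # ages in years (made-up values)
mask = crossed_age(ages, age_anniversary=5.0, dt=0.5)
print(mask)  # [False  True False  True]
```

Under this rule, an agent is flagged exactly once: on the step in which the age first reaches the anniversary.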

digitize_ages_1yr

utils.digitize_ages_1yr(ages)

This function returns the indices of the 1-year age bins in the range (0, tyd.max_age). The bin index is used as an integer representation of the agent’s age.
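
As an illustration, 1-year binning of this kind can be sketched with np.digitize. The constant MAX_AGE stands in for tyd.max_age and its value is an assumption, as is the choice of zero-based indices; the helper name digitize_1yr is hypothetical.

```python
import numpy as np

MAX_AGE = 100  # assumed stand-in for tyd.max_age

def digitize_1yr(ages, max_age=MAX_AGE):
    """Map each age to the index of its 1-year bin [k, k+1) (hypothetical helper)."""
    edges = np.arange(0, max_age + 1)  # bin edges 0, 1, ..., max_age
    # np.digitize returns 1-based bin numbers for increasing edges,
    # so subtract 1 to obtain zero-based indices.
    return np.digitize(ages, edges) - 1

idx = digitize_1yr(np.array([0.2, 1.0, 17.9, 64.5]))
print(idx)  # [ 0  1 17 64]
```

With zero-based indices, the bin index equals the agent's age in completed years.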

generate_age_bin_labels

utils.generate_age_bin_labels(age_bins, inclusive_range=False)

Generates consistent age bin labels.

This function creates age bin labels in the format “start-end”. By default, each label denotes the semi-open age interval [start, end). If inclusive_range is True, each label denotes the closed interval [start, end-1].

Parameters

Name Type Description Default
age_bins list A list of age bins given as single age values. required
inclusive_range bool whether the age bin labels should represent an inclusive age range [age_low, age_high] or a semi-open interval [age_low, age_high) False

Returns

Name Type Description
list A list of strings representing each age bin in the format “start-end”.

Example

>>> generate_age_bin_labels([10, 20, 30, 40])
["10-20", "20-30", "30-40"]
>>> generate_age_bin_labels([10, 20, 30, 40], inclusive_range=True)
["10-19", "20-29", "30-39"]
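
The labelling rule described above can be sketched in a few lines. This is an illustrative reimplementation under the documented behaviour, not the library's source; the helper name age_bin_labels is hypothetical.

```python
def age_bin_labels(age_bins, inclusive_range=False):
    """Build "start-end" labels from consecutive bin edges (hypothetical helper)."""
    # For inclusive labels, the upper bound shown is end - 1.
    offset = 1 if inclusive_range else 0
    return [f"{lo}-{hi - offset}" for lo, hi in zip(age_bins[:-1], age_bins[1:])]

half_open = age_bin_labels([10, 20, 30, 40])
inclusive = age_bin_labels([10, 20, 30, 40], inclusive_range=True)
print(half_open)  # ['10-20', '20-30', '30-40']
print(inclusive)  # ['10-19', '20-29', '30-39']
```

Note that n bin edges produce n - 1 labels, since each label spans a pair of consecutive edges.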

generate_unique_filename

utils.generate_unique_filename(root_str='typhoidsim')

Generate a unique filename. This is useful when running many simulations in a distributed manner.

Parameters

Name Type Description Default
root_str str the string that will be used at the beginning of any filename generated by this function. 'typhoidsim'

Returns

Name Type Description
filename str a unique filename string without file extension.
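
How uniqueness is achieved is not documented here. One common approach, shown below as a sketch, combines a UTC timestamp with a random token; the helper name unique_filename and the exact filename layout are assumptions, not the library's format.

```python
import uuid
from datetime import datetime, timezone

def unique_filename(root_str="typhoidsim"):
    """Return root_str plus a timestamp and a random token, without extension
    (hypothetical helper; the real naming scheme may differ)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    token = uuid.uuid4().hex[:12]  # random part keeps concurrent workers apart
    return f"{root_str}_{stamp}_{token}"

name_a = unique_filename()
name_b = unique_filename()
print(name_a)
print(name_b)
```

The random token matters in distributed runs, where several workers may generate a filename within the same second.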

get_age_distribution_un

utils.get_age_distribution_un(loc_type='Low-and-middle-income countries')

Parse age distribution data from UN sources.

Parameters

Name Type Description Default
loc_type str Which country grouping to extract the distribution from. Options include 'Low-and-Lower-middle-income countries' and 'Low-and-middle-income countries'. 'Low-and-middle-income countries'

Returns

Name Type Description
df pd.DataFrame A dataframe (with columns age and value) that specifies the age distribution for this population.

get_attr_vals

utils.get_attr_vals(sim, attr_path, attr_name)

Get values of the attribute in attr_path

get_data_home

utils.get_data_home(data_home=None)

Return a path to the directory for default datasets. This function is needed by load_dataset(), and avoids the problem of using relative paths.

If the data_home argument is not provided, it will use a directory specified by the TYPHOIDSIM_DATA environment variable (if it exists) or otherwise it will use the default data directory.
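
The lookup order described above (explicit argument, then environment variable, then default) can be sketched as follows. The helper name resolve_data_home and the value of DEFAULT_DATA_DIR are hypothetical; the location of the package's real default directory is not stated here.

```python
import os
from pathlib import Path

# Hypothetical default; the real default data directory is not documented here.
DEFAULT_DATA_DIR = Path.home() / "typhoidsim_data"

def resolve_data_home(data_home=None):
    """Resolve the data directory: argument, then TYPHOIDSIM_DATA, then default."""
    if data_home is None:
        data_home = os.environ.get("TYPHOIDSIM_DATA", DEFAULT_DATA_DIR)
    # Return an absolute path so callers never depend on the working directory.
    return Path(data_home).expanduser().resolve()

explicit = resolve_data_home("/tmp/my_datasets")
print(explicit)
```

Returning an absolute path is what avoids the relative-path problem: the result is the same regardless of where the calling script is run from.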

get_dataset_names

utils.get_dataset_names(data_home=None)

Provides a list of available datasets in data_home
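
A minimal sketch of how such a listing could be produced, assuming each dataset is a single {name}.csv file directly inside data_home. The helper name list_dataset_names is hypothetical, and the demo files are created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path

def list_dataset_names(data_home):
    """Return the sorted stems of all .csv files in data_home (hypothetical helper)."""
    return sorted(p.stem for p in Path(data_home).glob("*.csv"))

with tempfile.TemporaryDirectory() as tmp:
    # Create two made-up csv files and one non-csv file for the demo.
    for fname in ("b_data.csv", "a_data.csv", "notes.txt"):
        (Path(tmp) / fname).write_text("age,value\n0,1\n")
    names = list_dataset_names(tmp)

print(names)  # ['a_data', 'b_data']
```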

load_age_dist_un

utils.load_age_dist_un(csv_file='un_pop_dist_bylocation.csv')

Load UN population age distribution that can be used in place of Pakistan demographics

load_dataset

utils.load_dataset(ds_name, data_home=None, **kwargs)

Load default dataset from typhoidsim data directory.

This function provides access to a small collection of datasets that are useful for setting empirical distributions (e.g., demographics, or gallstone probabilities by age and gender), rather than hardcoding those values.

The small datasets are expected to be simple tabular data saved in csv files. This function may apply a small amount of preprocessing, but it is not intended to be a full ingest and preprocessing pipeline. The csv files are expected to be in an already ‘ingestible’ form and are simply loaded with pandas.read_csv().

Use get_dataset_names to see a list of available datasets.

Parameters

Name Type Description Default
ds_name str name of the dataset ({name}.csv). required
data_home str / Path the directory in which to cache data. None
kwargs additional keyword arguments passed through to pandas.read_csv. {}

Returns

The type of the returned data depends on the dataset: either df or arr.

Name Type Description
df pandas.DataFrame Tabular data, with some preprocessing applied (depends on the dataset).
arr numpy.ndarray Array data.

stratify_parameter_by_age

utils.stratify_parameter_by_age(age_bin_edges, par_bin_values)

Returns a callable that, given an age, returns the value of a parameter assigned to the age bin the given age falls into.

Parameters

Name Type Description Default
age_bin_edges np.ndarray The edges of the age bins. Should be in ascending order. required
par_bin_values np.ndarray The parameter values assigned to each age bin. Should have length len(age_bin_edges) - 1. required

Returns

Name Type Description
age_bin_function callable A function that takes an age and returns the parameter value for the bin that the age falls into.

Example

>>> age_bin_edges = np.array([0, 2, 5, 120])
>>> par_bin_values = np.array([904.4, 240.9, 0.0])
>>> age_stratified_parameter = stratify_parameter_by_age(age_bin_edges, par_bin_values)
>>> age_stratified_parameter(25)  # should return 0.0
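
One way to build such a callable is a closure around np.searchsorted, sketched below. This is not the library's implementation: the helper name stratify_by_age is hypothetical, and two behaviours are assumptions, namely that bins are semi-open [low, high) and that ages outside the edges are clipped to the first or last bin.

```python
import numpy as np

def stratify_by_age(age_bin_edges, par_bin_values):
    """Return a function mapping an age to its bin's parameter value
    (hypothetical helper)."""
    edges = np.asarray(age_bin_edges, dtype=float)
    values = np.asarray(par_bin_values, dtype=float)
    if len(values) != len(edges) - 1:
        raise ValueError("par_bin_values must have length len(age_bin_edges) - 1")

    def age_bin_function(age):
        # side="right" places an age equal to an edge into the bin that starts there.
        idx = np.searchsorted(edges, age, side="right") - 1
        # Assumption: out-of-range ages fall into the first or last bin.
        idx = np.clip(idx, 0, len(values) - 1)
        return values[idx]

    return age_bin_function

f = stratify_by_age([0, 2, 5, 120], [904.4, 240.9, 0.0])
print(f(1), f(2), f(25))  # 904.4 240.9 0.0
```

Because np.searchsorted is vectorized, the same callable also accepts an array of ages.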

test_cpu_performance

utils.test_cpu_performance()

Normalize performance across CPUs

to_df

utils.to_df(sim, sep='_')

Export results as a Pandas dataframe. Available in newer starsim versions (i.e., 2.0). Only 1D result arrays are saved; results from analyzers are discarded unless they are 1D arrays.