madminer.ml module

class madminer.ml.EnsembleForge(estimators=None)

Bases: object

Ensemble methods for likelihood ratio and score information.

Generally, EnsembleForge instances can be used very similarly to MLForge instances:

  • The initialization of EnsembleForge takes a list of (trained or untrained) MLForge instances.
  • The methods EnsembleForge.train_one() and EnsembleForge.train_all() train the estimators (this can also be done outside of EnsembleForge).
  • EnsembleForge.calculate_expectation() can be used to calculate the expectation of the estimated likelihood ratio or the expected estimated score over a validation sample. Ideally (and assuming correct sampling), these expectation values should be close to zero; deviations from zero therefore indicate that the estimator is probably inaccurate.
  • EnsembleForge.evaluate() and EnsembleForge.calculate_fisher_information() can then be used to calculate ensemble predictions. The user has the option to treat all estimators equally (‘committee method’) or to give those with expected score / ratio close to zero a higher weight.
  • EnsembleForge.save() and EnsembleForge.load() can store all estimators in one folder.

The individual estimators in the ensemble can be trained with different methods, but they have to be of the same type: either all estimators are single-parameterized likelihood ratio estimators, or all estimators are doubly-parameterized likelihood ratio estimators, or all estimators are local score regressors.
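
A minimal sketch of this workflow for an ensemble of local score (SALLY) estimators, assuming sample files produced with the madminer.sampling.SampleAugmenter functions (all file and folder names here are hypothetical placeholders):

    from madminer.ml import MLForge, EnsembleForge

    # Build an ensemble of five estimators of the same type.
    ensemble = EnsembleForge([MLForge() for _ in range(5)])

    # Train all estimators with the same settings.
    ensemble.train_all(
        method="sally",
        x_filename="train/x_train.npy",
        t_xz0_filename="train/t_xz_train.npy",
    )

    # Expected score on a validation sample -- ideally close to zero.
    expectations = ensemble.calculate_expectation("validation/x_val.npy")

    # Ensemble prediction (per the Returns section of evaluate(): mean and covariance),
    # and persistence of all estimators in one folder.
    mean_score, covariance = ensemble.evaluate("validation/x_val.npy", calculate_covariance=True)
    ensemble.save("models/sally_ensemble")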

Parameters:
estimators : None or int or list of (MLForge or str), optional

If int, sets the number of estimators that will be created as new MLForge instances. If list, sets the estimators directly, either from MLForge instances or filenames (that are then loaded with MLForge.load()). If None, the ensemble is initialized without estimators. Note that the estimators have to be consistent: either all of them are trained with a local score method (‘sally’ or ‘sallino’); or all of them are trained with a single-parameterized method (‘carl’, ‘rolr’, ‘rascal’, ‘scandal’, ‘alice’, or ‘alices’); or all of them are trained with a doubly parameterized method (‘carl2’, ‘rolr2’, ‘rascal2’, ‘alice2’, or ‘alices2’). Mixing estimators of different types within one of these three categories is supported, but mixing estimators from different categories is not and will raise a RuntimeException. Default value: None.

Attributes:
estimators : list of MLForge

The estimators in the form of MLForge instances.

Methods

add_estimator(estimator) Adds an estimator to the ensemble.
calculate_expectation(x_filename[, …]) Calculates the expectation of the estimation likelihood ratio or the expected estimated score over a validation sample.
calculate_fisher_information(x[, …]) Calculates expected Fisher information matrices for an ensemble of SALLY estimators.
evaluate(x_filename[, theta0_filename, …]) Evaluates the estimators of the likelihood ratio (or, if method is ‘sally’ or ‘sallino’, the score), and calculates the ensemble mean or variance.
load(folder) Loads the estimator ensemble from a folder.
save(folder[, save_model]) Saves the estimator ensemble to a folder.
train_all(**kwargs) Trains all estimators.
train_one(i, **kwargs) Trains an individual estimator.
add_estimator(estimator)

Adds an estimator to the ensemble.

Parameters:
estimator : MLForge or str

The estimator, either as MLForge instance or filename (which is then loaded with MLForge.load()).

Returns:
None
calculate_expectation(x_filename, theta0_filename=None, theta1_filename=None)

Calculates the expectation of the estimated likelihood ratio or the expected estimated score over a validation sample. Ideally (and assuming correct sampling), these expectation values should be close to zero; deviations from zero therefore indicate that the estimator is probably inaccurate.

Parameters:
x_filename : str

Path to an unweighted sample of observations, as saved by the madminer.sampling.SampleAugmenter functions.

theta0_filename : str or None, optional

Path to an unweighted sample of numerator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimators were trained with the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘carl’, ‘carl2’, ‘nde’, ‘rascal’, ‘rascal2’, ‘rolr’, ‘rolr2’, or ‘scandal’ method. Default value: None.

theta1_filename : str or None, optional

Path to an unweighted sample of denominator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimators were trained with the ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, or ‘rolr2’ method. Default value: None.

Returns:
expectations : ndarray

Expected score (if the estimators were trained with the ‘sally’ or ‘sallino’ methods) or likelihood ratio (otherwise).
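
A hypothetical call for an ensemble of single-parameterized likelihood ratio estimators (file and folder names are placeholders):

    from madminer.ml import EnsembleForge

    ensemble = EnsembleForge()
    ensemble.load("models/my_ensemble")  # a folder previously written by EnsembleForge.save()

    # For ratio estimators, a numerator parameter sample is required as well.
    expectations = ensemble.calculate_expectation(
        x_filename="validation/x_val.npy",
        theta0_filename="validation/theta0_val.npy",
    )
    print(expectations)  # values far from zero flag probably inaccurate estimators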

calculate_fisher_information(x, obs_weights=None, n_events=1, mode='score', uncertainty='ensemble', vote_expectation_weight=None, return_individual_predictions=False)

Calculates expected Fisher information matrices for an ensemble of SALLY estimators.

There are two ways of calculating the ensemble average. In the default “score” mode, the ensemble average for the score is calculated for each event, and the Fisher information is calculated based on these mean scores. In the “information” mode, the Fisher information is calculated for each estimator separately and the ensemble mean is calculated only for the final Fisher information matrix. The “score” mode is generally assumed to be more precise and is the default.

In the “score” mode, the covariance matrix of the final result is calculated in the following way:

  • For each event x and each estimator a, the “shifted” predicted score is calculated as t_a’(x) = t(x) + 1/sqrt(n) * (t_a(x) - t(x)). Here t(x) is the mean score (averaged over the ensemble) for this event, t_a(x) is the prediction of estimator a for this event, and n is the number of estimators. The ensemble variance of these shifted score predictions is equal to the uncertainty on the mean of the ensemble of original predictions.
  • For each estimator a, the shifted Fisher information matrix I_a’ is calculated from the shifted predicted scores.
  • The ensemble covariance between all Fisher information matrices I_a’ is calculated and taken as the measure of uncertainty on the Fisher information calculated from the mean scores.
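
The following pure-NumPy sketch (with made-up numbers and shapes) illustrates the first step: the ensemble variance of the shifted score predictions equals the variance of the ensemble mean of the original predictions:

    import numpy as np

    rng = np.random.default_rng(42)
    n_estimators, n_events, n_parameters = 5, 1000, 2

    # t[a, i, :] is the score predicted by estimator a for event i.
    t = rng.normal(size=(n_estimators, n_events, n_parameters))
    t_mean = t.mean(axis=0)  # t(x): ensemble mean per event

    # Shifted predictions t_a'(x) = t(x) + (t_a(x) - t(x)) / sqrt(n)
    t_shifted = t_mean + (t - t_mean) / np.sqrt(n_estimators)

    # Their ensemble variance equals the variance of the mean of the original predictions.
    assert np.allclose(t_shifted.var(axis=0), t.var(axis=0) / n_estimators)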

In the “information” mode, the user has the option to treat all estimators equally (‘committee method’) or to give those with expected score close to zero (as calculated by calculate_expectation()) a higher weight. In this case, the ensemble mean I is calculated as I = sum_i w_i I_i with weights w_i = exp(-vote_expectation_weight |E[t_i]|) / sum_j exp(-vote_expectation_weight |E[t_j]|). Here I_i are the Fisher information matrices calculated from the individual estimators and E[t_i] is the expectation value calculated by calculate_expectation().
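
These weights are simply a softmax over the (negated, scaled) absolute expectation values. A short sketch with hypothetical numbers:

    import numpy as np

    vote_expectation_weight = 10.0
    abs_expectations = np.array([0.01, 0.05, 0.30])  # |E[t_i]| per estimator (hypothetical)

    unnormalized = np.exp(-vote_expectation_weight * abs_expectations)
    weights = unnormalized / unnormalized.sum()  # w_i as defined above, summing to 1

    # The ensemble information is then the weighted mean of the individual Fisher
    # information matrices, e.g. np.einsum("i,ijk->jk", weights, individual_infos).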

Parameters:
x : str or ndarray

Sample of observations, or path to numpy file with observations, as saved by the madminer.sampling.SampleAugmenter functions. Note that this sample has to be sampled from the reference parameter where the score is estimated with the SALLY / SALLINO estimator!

obs_weights : None or ndarray, optional

Weights for the observations. If None, all events are taken to have equal weight. Default value: None.

n_events : float, optional

Expected number of events for which the kinematic Fisher information should be calculated. Default value: 1.

mode : {“score”, “information”}, optional

If mode is “information”, the Fisher information is calculated individually for each estimator, and only then are the ensemble mean and covariance calculated. If mode is “score”, the ensemble mean of the score is calculated for each event, and the Fisher information is computed from these mean scores. Default value: “score”.

uncertainty : {“ensemble”, “expectation”, “sum”}, optional

How the covariance matrix of the Fisher information estimate is calculated. With “ensemble”, the ensemble covariance is used (only supported if mode is “information”). With “expectation”, the expectation of the score is used as a measure of the uncertainty of the score estimator, and this uncertainty is propagated through to the covariance matrix. With “sum”, both terms are summed (only supported if mode is “information”). Default value: “ensemble”.

vote_expectation_weight : float or list of float or None, optional

If mode is “information”, this factor determines how much more weight is given to those estimators with small expectation value (as calculated by calculate_expectation()). If a list is given, results are returned for each element in the list. If None, or if calculate_expectation() has not been called, all estimators are treated equally. Default value: None.

return_individual_predictions : bool, optional

If mode is “information”, sets whether the individual estimator predictions are returned. Default value: False.

Returns:
mean_prediction : ndarray or list of ndarray

The (weighted) ensemble mean of the estimators. If the estimators were trained with method=’sally’ or method=’sallino’, this is an array of the estimator for t(x_i | theta_ref) for all events i. Otherwise, the estimated likelihood ratio (if test_all_combinations is True, the result has shape (n_thetas, n_x), otherwise, it has shape (n_samples,)). If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight.

covariance : ndarray or list of ndarray

The covariance matrix of the Fisher information estimate. Its definition depends on the value of uncertainty; by default, the covariance is defined as the ensemble covariance (only supported if mode is “information”). This object has four indices, cov_(ij)(i’j’), ordered as i j i’ j’. It has shape (n_parameters, n_parameters, n_parameters, n_parameters). If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight.

weights : ndarray or list of ndarray

Only returned if return_individual_predictions is True. The estimator weights w_i. If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight.

individual_predictions : ndarray

Only returned if return_individual_predictions is True. The individual estimator predictions.

evaluate(x_filename, theta0_filename=None, theta1_filename=None, test_all_combinations=True, vote_expectation_weight=None, calculate_covariance=False, return_individual_predictions=False)

Evaluates the estimators of the likelihood ratio (or, if method is ‘sally’ or ‘sallino’, the score), and calculates the ensemble mean or variance.

The user has the option to treat all estimators equally (‘committee method’) or to give those with expected score / ratio close to zero (as calculated by calculate_expectation()) a higher weight. In the latter case, the ensemble mean f(x) is calculated as f(x) = sum_i w_i f_i(x) with weights w_i = exp(-vote_expectation_weight |E[f_i]|) / sum_j exp(-vote_expectation_weight |E[f_j]|). Here f_i(x) are the individual estimators and E[f_i] is the expectation value calculated by calculate_expectation().
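
A hypothetical example for an ensemble of single-parameterized ratio estimators; calculate_expectation() has to be called first, otherwise all estimators are weighted equally (file names are placeholders):

    # The expectation values computed here are reused for the vote weights below.
    ensemble.calculate_expectation(
        x_filename="validation/x_val.npy",
        theta0_filename="validation/theta0_val.npy",
    )

    # Per the Returns section: (mean_prediction, covariance); covariance is None
    # here because calculate_covariance defaults to False.
    mean_prediction, covariance = ensemble.evaluate(
        x_filename="test/x_test.npy",
        theta0_filename="test/theta0_grid.npy",
        vote_expectation_weight=10.0,
    )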

Parameters:
x_filename : str

Path to an unweighted sample of observations, as saved by the madminer.sampling.SampleAugmenter functions.

theta0_filename : str or None, optional

Path to an unweighted sample of numerator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimator was trained with the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘carl’, ‘carl2’, ‘nde’, ‘rascal’, ‘rascal2’, ‘rolr’, ‘rolr2’, or ‘scandal’ method. Default value: None.

theta1_filename : str or None, optional

Path to an unweighted sample of denominator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimator was trained with the ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, or ‘rolr2’ method. Default value: None.

test_all_combinations : bool, optional

If method is not ‘sally’ and not ‘sallino’: If False, the number of samples in the observable and theta files has to match, and the likelihood ratio is evaluated only for the combinations r(x_i | theta0_i, theta1_i). If True, r(x_i | theta0_j, theta1_j) for all pairwise combinations i, j are evaluated. Default value: True.

vote_expectation_weight : float or list of float or None, optional

Factor that determines how much more weight is given to those estimators with small expectation value (as calculated by calculate_expectation()). If a list is given, results are returned for each element in the list. If None, or if calculate_expectation() has not been called, all estimators are treated equally. Default value: None.

calculate_covariance : bool, optional

Whether the covariance matrix is calculated. Default value: False.

return_individual_predictions : bool, optional

Whether the individual estimator predictions are returned. Default value: False.

Returns:
mean_prediction : ndarray or list of ndarray

The (weighted) ensemble mean of the estimators. If the estimators were trained with method=’sally’ or method=’sallino’, this is an array of the estimator for t(x_i | theta_ref) for all events i. Otherwise, the estimated likelihood ratio (if test_all_combinations is True, the result has shape (n_thetas, n_x), otherwise, it has shape (n_samples,)). If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight.

covariance : None or ndarray or list of ndarray

The covariance matrix of the (flattened) predictions, defined as the ensemble covariance. If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight. If calculate_covariance is False, None is returned.

weights : ndarray or list of ndarray

Only returned if return_individual_predictions is True. The estimator weights w_i. If more than one value of vote_expectation_weight is given, this is a list with results for all entries in vote_expectation_weight.

individual_predictions : ndarray

Only returned if return_individual_predictions is True. The individual estimator predictions.

load(folder)

Loads the estimator ensemble from a folder.

Parameters:
folder : str

Path to the folder.

Returns:
None
save(folder, save_model=False)

Saves the estimator ensemble to a folder.

Parameters:
folder : str

Path to the folder.

save_model : bool, optional

If True, the whole model is saved in addition to the state dict. This is not necessary for loading it again with EnsembleForge.load(), but can be useful for debugging, for instance to plot the computational graph.

Returns:
None
train_all(**kwargs)

Trains all estimators. See MLForge.train().

Parameters:
kwargs : dict

Parameters for MLForge.train(). If a value in this dict is a list, it has to have length n_estimators and contain one value of this parameter for each estimator. Otherwise, the value is used for the training of all estimators.

Returns:
None
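
For example (hypothetical settings and file names), scalar keywords are shared by all estimators, while a list provides one value per estimator:

    ensemble.train_all(
        method="sally",                         # shared by all estimators
        x_filename="train/x_train.npy",         # shared by all estimators
        t_xz0_filename="train/t_xz_train.npy",  # shared by all estimators
        n_epochs=[20, 30, 50],                  # one value per estimator (length n_estimators)
    )
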
train_one(i, **kwargs)

Trains an individual estimator. See MLForge.train().

Parameters:
i : int

The index 0 <= i < n_estimators of the estimator to be trained.

kwargs : dict

Parameters for MLForge.train().

Returns:
None
class madminer.ml.MLForge

Bases: object

Estimating likelihood ratios and scores with machine learning.

Each instance of this class represents one neural estimator. The most important functions are:

  • MLForge.train() to train an estimator. The keyword method determines the inference technique and whether a class instance represents a single-parameterized likelihood ratio estimator, a doubly-parameterized likelihood ratio estimator, or a local score estimator.
  • MLForge.evaluate() to evaluate the estimator.
  • MLForge.save() to save the trained model to files.
  • MLForge.load() to load the trained model from files.

Please see the tutorial for a detailed walk-through.
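
A minimal sketch of this workflow for a single local score (SALLY) estimator, assuming training and test samples produced with the madminer.sampling.SampleAugmenter functions (all file names and settings are hypothetical):

    from madminer.ml import MLForge

    forge = MLForge()

    # Train a local score regressor.
    forge.train(
        method="sally",
        x_filename="train/x_train.npy",
        t_xz0_filename="train/t_xz_train.npy",
        n_epochs=20,
    )

    # Estimated score t(x | theta_ref) on a test sample.
    t_hat = forge.evaluate("test/x_test.npy")

    # Save and later restore the trained estimator.
    forge.save("models/sally")
    forge2 = MLForge()
    forge2.load("models/sally")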

Methods

calculate_fisher_information(x[, weights, …]) Calculates the expected Fisher information matrix based on the kinematic information in a given number of events.
evaluate(x[, theta0_filename, …]) Evaluates a trained estimator of the log likelihood ratio (or, if method is ‘sally’ or ‘sallino’, the score).
load(filename) Loads a trained model from files.
save(filename[, save_model]) Saves the trained model to four files: a JSON file with the settings, a pickled PyTorch state dict file, and numpy files for the mean and variance of the inputs (used for input scaling).
train(method, x_filename[, y_filename, …]) Trains a neural network to estimate either the likelihood ratio or, if method is ‘sally’ or ‘sallino’, the score.
calculate_fisher_information(x, weights=None, n_events=1)

Calculates the expected Fisher information matrix based on the kinematic information in a given number of events. Currently only supported for estimators trained with method=’sally’ or method=’sallino’.

Parameters:
x : str or ndarray

Sample of observations, or path to numpy file with observations, as saved by the madminer.sampling.SampleAugmenter functions. Note that this sample has to be sampled from the reference parameter where the score is estimated with the SALLY / SALLINO estimator!

weights : None or ndarray, optional

Weights for the observations. If None, all events are taken to have equal weight. Default value: None.

n_events : float, optional

Expected number of events for which the kinematic Fisher information should be calculated. Default value: 1.

Returns:
fisher_information : ndarray

Expected kinematic Fisher information matrix with shape (n_parameters, n_parameters).
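
A hypothetical call for an estimator trained with method=’sally’ (the file name is a placeholder):

    # The test sample has to be generated at the reference parameter point.
    fisher_information = forge.calculate_fisher_information(
        x="test/x_test.npy",
        n_events=10000,  # scales the kinematic information to this expected event count
    )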

evaluate(x, theta0_filename=None, theta1_filename=None, test_all_combinations=True, evaluate_score=False, return_grad_x=False)

Evaluates a trained estimator of the log likelihood ratio (or, if method is ‘sally’ or ‘sallino’, the score).

Parameters:
x : str or ndarray

Sample of observations, or path to numpy file with observations, as saved by the madminer.sampling.SampleAugmenter functions.

theta0_filename : str or None, optional

Path to an unweighted sample of numerator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimator was trained with the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘carl’, ‘carl2’, ‘nde’, ‘rascal’, ‘rascal2’, ‘rolr’, ‘rolr2’, or ‘scandal’ method. Default value: None.

theta1_filename : str or None, optional

Path to an unweighted sample of denominator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required if the estimator was trained with the ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, or ‘rolr2’ method. Default value: None.

test_all_combinations : bool, optional

If method is not ‘sally’ and not ‘sallino’: If False, the number of samples in the observable and theta files has to match, and the likelihood ratio is evaluated only for the combinations r(x_i | theta0_i, theta1_i). If True, r(x_i | theta0_j, theta1_j) for all pairwise combinations i, j are evaluated. Default value: True.

evaluate_score : bool, optional

If method is not ‘sally’ and not ‘sallino’, this sets whether in addition to the likelihood ratio the score is evaluated. Default value: False.

return_grad_x : bool, optional

If True, grad_x log r(x) or grad_x t(x) (for ‘sally’ or ‘sallino’ estimators) are returned in addition to the other outputs. Default value: False.

Returns:
sally_estimated_score : ndarray

Only returned if the network was trained with method=’sally’ or method=’sallino’. In this case, an array of the estimator for t(x_i | theta_ref) is returned for all events i.

log_likelihood_ratio : ndarray

Only returned if the network was trained with neither method=’sally’ nor method=’sallino’. The estimated log likelihood ratio. If test_all_combinations is True, the result has shape (n_thetas, n_x). Otherwise, it has shape (n_samples,).

score_theta0 : ndarray or None

Only returned if the network was trained with neither method=’sally’ nor method=’sallino’. None if evaluate_score is False. Otherwise the derived estimated score at theta0. If test_all_combinations is True, the result has shape (n_thetas, n_x, n_parameters). Otherwise, it has shape (n_samples, n_parameters).

score_theta1 : ndarray or None

Only returned if the network was trained with neither method=’sally’ nor method=’sallino’. None if evaluate_score is False, or the network was trained with any method other than ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, or ‘rolr2’. Otherwise the derived estimated score at theta1. If test_all_combinations is True, the result has shape (n_thetas, n_x, n_parameters). Otherwise, it has shape (n_samples, n_parameters).

grad_x : ndarray

Only returned if return_grad_x is True.
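
A hypothetical evaluation of a single-parameterized ratio estimator over a grid of numerator parameters, illustrating the shape conventions above (file names are placeholders):

    log_r, score_theta0, score_theta1 = forge.evaluate(
        x="test/x_test.npy",
        theta0_filename="test/theta0_grid.npy",
        evaluate_score=True,
    )
    # With test_all_combinations=True (the default), log_r has shape (n_thetas, n_x)
    # and score_theta0 has shape (n_thetas, n_x, n_parameters); score_theta1 is None
    # for a single-parameterized estimator.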

load(filename)

Loads a trained model from files.

Parameters:
filename : str

Path to the files. ‘_settings.json’ and ‘_state_dict.pl’ will be added.

Returns:
None
save(filename, save_model=False)

Saves the trained model to four files: a JSON file with the settings, a pickled PyTorch state dict file, and numpy files for the mean and variance of the inputs (used for input scaling).

Parameters:
filename : str

Path to the files. ‘_settings.json’ and ‘_state_dict.pl’ will be added.

save_model : bool, optional

If True, the whole model is saved in addition to the state dict. This is not necessary for loading it again with MLForge.load(), but can be useful for debugging, for instance to plot the computational graph.

Returns:
None
train(method, x_filename, y_filename=None, theta0_filename=None, theta1_filename=None, r_xz_filename=None, t_xz0_filename=None, t_xz1_filename=None, features=None, nde_type='mafmog', n_hidden=(100, 100), activation='tanh', maf_n_mades=3, maf_batch_norm=False, maf_batch_norm_alpha=0.1, maf_mog_n_components=10, alpha=1.0, trainer='amsgrad', n_epochs=50, batch_size=128, initial_lr=0.001, final_lr=0.0001, nesterov_momentum=None, validation_split=None, early_stopping=True, scale_inputs=True, shuffle_labels=False, grad_x_regularization=None, limit_samplesize=None, return_first_loss=False, verbose=False)

Trains a neural network to estimate either the likelihood ratio or, if method is ‘sally’ or ‘sallino’, the score.

The keyword method determines the structure of the estimator that an instance of this class represents:

  • For ‘alice’, ‘alices’, ‘carl’, ‘nde’, ‘rascal’, ‘rolr’, and ‘scandal’, the neural network models the likelihood ratio as a function of the observables x and the numerator hypothesis theta0, while the denominator hypothesis is kept at a fixed reference value (“single-parameterized likelihood ratio estimator”). In addition to the likelihood ratio, the estimator can also estimate the score at theta0.
  • For ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, and ‘rolr2’, the neural network models the likelihood ratio as a function of the observables x, the numerator hypothesis theta0, and the denominator hypothesis theta1 (“doubly parameterized likelihood ratio estimator”). The score at theta0 and theta1 can also be evaluated.
  • For ‘sally’ and ‘sallino’, the neural network models the score evaluated at some reference hypothesis (“local score regression”). The likelihood ratio cannot be estimated directly from the neural network, but can be estimated in a second step through density estimation in the estimated score space. A sketch of the three cases follows this list.
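
A sketch of how the required input samples differ between the three categories (see also the parameter descriptions below; all file names are hypothetical placeholders, and in practice each call would typically act on its own MLForge instance):

    from madminer.ml import MLForge

    forge = MLForge()

    # Local score regression: observations plus joint scores at the reference point.
    forge.train(method="sally", x_filename="x.npy", t_xz0_filename="t_xz.npy")

    # Single-parameterized ratio estimator (here ALICES): additionally needs labels,
    # numerator parameters, joint ratios, and joint scores at theta0.
    forge.train(
        method="alices",
        x_filename="x.npy",
        y_filename="y.npy",
        theta0_filename="theta0.npy",
        r_xz_filename="r_xz.npy",
        t_xz0_filename="t_xz0.npy",
    )

    # Doubly-parameterized ratio estimator (here ALICES2): additionally needs the
    # denominator parameters and the joint scores at theta1.
    forge.train(
        method="alices2",
        x_filename="x.npy",
        y_filename="y.npy",
        theta0_filename="theta0.npy",
        theta1_filename="theta1.npy",
        r_xz_filename="r_xz.npy",
        t_xz0_filename="t_xz0.npy",
        t_xz1_filename="t_xz1.npy",
    )
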
Parameters:
method : str

The inference method used. Allowed values are ‘alice’, ‘alices’, ‘carl’, ‘nde’, ‘rascal’, ‘rolr’, and ‘scandal’ for a single-parameterized likelihood ratio estimator; ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, and ‘rolr2’ for a doubly-parameterized likelihood ratio estimator; and ‘sally’ and ‘sallino’ for local score regression.

x_filename : str

Path to an unweighted sample of observations, as saved by the madminer.sampling.SampleAugmenter functions. Required for all inference methods.

y_filename : str or None, optional

Path to an unweighted sample of class labels, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘carl’, ‘carl2’, ‘rascal’, ‘rascal2’, ‘rolr’, and ‘rolr2’ methods. Default value: None.

theta0_filename : str or None, optional

Path to an unweighted sample of numerator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘carl’, ‘carl2’, ‘nde’, ‘rascal’, ‘rascal2’, ‘rolr’, ‘rolr2’, and ‘scandal’ methods. Default value: None.

theta1_filename : str or None, optional

Path to an unweighted sample of denominator parameters, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘alice2’, ‘alices2’, ‘carl2’, ‘rascal2’, and ‘rolr2’ methods. Default value: None.

r_xz_filename : str or None, optional

Path to an unweighted sample of joint likelihood ratios, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘alice’, ‘alice2’, ‘alices’, ‘alices2’, ‘rascal’, ‘rascal2’, ‘rolr’, and ‘rolr2’ methods. Default value: None.

t_xz0_filename : str or None, optional

Path to an unweighted sample of joint scores at theta0, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘alices’, ‘alices2’, ‘rascal’, ‘rascal2’, ‘sallino’, ‘sally’, and ‘scandal’ methods. Default value: None.

t_xz1_filename : str or None, optional

Path to an unweighted sample of joint scores at theta1, as saved by the madminer.sampling.SampleAugmenter functions. Required for the ‘rascal2’ and ‘alices2’ methods. Default value: None.

features : list of int or None, optional

Indices of observables (features) that are used as input to the neural networks. If None, all observables are used. Default value: None.

nde_type : {‘maf’, ‘mafmog’}, optional

If the method is ‘nde’ or ‘scandal’, nde_type determines the architecture used in the neural density estimator. Currently supported are ‘maf’ for a Masked Autoregressive Flow with a Gaussian base density, or ‘mafmog’ for a Masked Autoregressive Flow with a mixture of Gaussian base densities. Default value: ‘mafmog’.

n_hidden : tuple of int, optional

Units in each hidden layer in the neural networks. If method is ‘nde’ or ‘scandal’, this refers to the setup of each individual MADE layer. Default value: (100, 100).

activation : {‘tanh’, ‘sigmoid’, ‘relu’}, optional

Activation function. Default value: ‘tanh’.

maf_n_mades : int, optional

If method is ‘nde’ or ‘scandal’, this sets the number of MADE layers. Default value: 3.

maf_batch_norm : bool, optional

If method is ‘nde’ or ‘scandal’, switches batch normalization layers after each MADE layer on or off. Default: False.

maf_batch_norm_alpha : float, optional

If method is ‘nde’ or ‘scandal’ and maf_batch_norm is True, this sets the alpha parameter in the calculation of the running average of the mean and variance. Default value: 0.1.

maf_mog_n_components : int, optional

If method is ‘nde’ or ‘scandal’ and nde_type is ‘mafmog’, this sets the number of Gaussian base components. Default value: 10.

alpha : float, optional

Hyperparameter weighting the score error in the loss function of the ‘alices’, ‘alices2’, ‘rascal’, ‘rascal2’, and ‘scandal’ methods. Default value: 1.

trainer : {“adam”, “amsgrad”, “sgd”}, optional

Optimization algorithm. Default value: “amsgrad”.

n_epochs : int, optional

Number of epochs. Default value: 50.

batch_size : int, optional

Batch size. Default value: 128.

initial_lr : float, optional

Learning rate during the first epoch, after which it exponentially decays to final_lr. Default value: 0.001.

final_lr : float, optional

Learning rate during the last epoch. Default value: 0.0001.

nesterov_momentum : float or None, optional

If trainer is “sgd”, sets the Nesterov momentum. Default value: None.

validation_split : float or None, optional

Fraction of samples used for validation and early stopping (if early_stopping is True). If None, the entire sample is used for training and early stopping is deactivated. Default value: None.

early_stopping : bool, optional

Activates early stopping based on the validation loss (only if validation_split is not None). Default value: True.

scale_inputs : bool, optional

Scale the observables to zero mean and unit variance. Default value: True.

shuffle_labels : bool, optional

If True, the labels (y, r_xz, t_xz) are shuffled, while the observations (x) remain in their normal order. This serves as a closure test, in particular as a cross-check against overfitting: an estimator trained with shuffle_labels=True should predict likelihood ratios around 1 and scores around 0. Default value: False.

grad_x_regularization : float or None, optional

If not None, a term of the form grad_x_regularization * |grad_x f(x)|^2 is added to the loss, where f(x) is the neural network output (the estimated log likelihood ratio or score). Default value: None.

limit_samplesize : int or None, optional

If not None, only this number of samples (events) is used to train the estimator. Default value: None.

return_first_loss : bool, optional

If True, the training routine only proceeds until the loss is calculated for the first time, at which point the loss tensor is returned. This can be useful for debugging or visualization purposes (but of course not for training a model). Default value: False.

verbose : bool, optional

If True, prints loss updates after every epoch. Default value: False.

Returns:
None