madminer.limits module

class madminer.limits.AsymptoticLimits(filename=None, include_nuisance_parameters=False)
Bases: madminer.analysis.DataAnalyzer
Statistical inference based on asymptotic properties of the likelihood ratio as test statistic.
This class provides two high-level functions:
- AsymptoticLimits.observed_limits() calculates p-values over a grid in parameter space for a given set of observed data.
- AsymptoticLimits.expected_limits() calculates expected p-values over a grid in parameter space based on “Asimov data”, a large hypothetical data set drawn from a given parameter point. This method is typically used to define expected exclusion limits or significances.
Both functions support inference based on:
- histograms of kinematic observables,
- histograms of score vectors estimated with the madminer.ml.ScoreEstimator class (SALLY and SALLINO techniques),
- likelihood or likelihood ratio functions estimated with the madminer.ml.LikelihoodEstimator and madminer.ml.ParameterizedRatioEstimator classes (NDE, SCANDAL, CARL, RASCAL, ALICES, and so on).
Currently, this class requires a morphing setup. It does not yet support nuisance parameters.
Parameters:  filename : str
Path to MadMiner file (for instance the output of madminer.delphes.DelphesProcessor.save()).
 include_nuisance_parameters : bool, optional
If True, nuisance parameters are taken into account. Currently not implemented. Default value: False.
Methods
asymptotic_p_value(self, log_likelihood_ratio): Calculates the p-value corresponding to a given log likelihood ratio and number of degrees of freedom, assuming the asymptotic approximation.
event_loader(self[, start, end, batch_size, …]): Yields batches of events in the MadMiner file.
expected_limits(self, mode, theta_true[, …]): Calculates expected p-values over a grid in parameter space.
observed_limits(self, mode, x_observed[, …]): Calculates p-values over a grid in parameter space based on a given set of observed events.
weighted_events(self[, theta, nu, …]): Returns all events together with the benchmark weights (if theta is None) or weights for a given theta.
xsec_gradients(self, thetas[, nus, …]): Returns the gradient of total cross sections with respect to parameters.
xsecs(self[, thetas, nus, partition, …]): Returns the total cross sections for benchmarks or parameter points.
asymptotic_p_value(self, log_likelihood_ratio, dof=None)
Calculates the p-value corresponding to a given log likelihood ratio and number of degrees of freedom, assuming the asymptotic approximation.
Parameters:  log_likelihood_ratio : ndarray
Log likelihood ratio (without the factor -2).
 dof : int or None, optional
Number of parameters / degrees of freedom. None means the overall number of parameters is used. Default value: None.
Returns:  p_values : ndarray
p-values.
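Under the asymptotic approximation (Wilks' theorem), q = -2 log r approximately follows a chi-squared distribution with dof degrees of freedom, where r is the likelihood ratio relative to the best-fit point. The p-value is the chi-squared survival function evaluated at q.

The standard-library sketch below illustrates that calculation. It is not MadMiner's own code; a library routine such as scipy.stats.chi2.sf would normally be used instead. The helper names chi2_sf and asymptotic_p_values are hypothetical.

```python
import math


def chi2_sf(q: float, dof: int) -> float:
    """Survival function P(X > q) of a chi-squared variable with integer dof.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(dof/2, q/2), so only the standard library is needed.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if q <= 0.0:
        return 1.0
    y = q / 2.0
    if dof % 2 == 0:
        # Even dof: Q(n, y) = exp(-y) * sum_{i<n} y^i / i!
        n = dof // 2
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(n))
    # Odd dof: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i<n} y^(i+1/2) / Gamma(i + 3/2)
    n = (dof - 1) // 2
    tail = sum(y ** (i + 0.5) / math.gamma(i + 1.5) for i in range(n))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def asymptotic_p_values(log_likelihood_ratio, dof):
    """p-values from log r = log L(theta) - log L(theta_hat), without the factor -2."""
    # The test statistic is q = -2 log r; the p-value is P(chi2_dof > q).
    return [chi2_sf(-2.0 * llr, dof) for llr in log_likelihood_ratio]
```

For example, asymptotic_p_values([0.0, -1.0], 2) gives [1.0, exp(-1)]. This follows because, for two degrees of freedom, the survival function is exp(-q/2).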

expected_limits(self, mode, theta_true, grid_ranges=None, grid_resolutions=25, include_xsec=True, model_file=None, hist_vars=None, score_components=None, hist_bins=None, thetaref=None, luminosity=300000.0, weighted_histo=True, n_histo_toys=100000, histo_theta_batchsize=1000, dof=None, test_split=0.2, return_histos=True, return_asimov=False, fix_adaptive_binning='autogrid', sample_only_from_closest_benchmark=True, postprocessing=None, n_asimov=None, n_binning_toys=100000, thetas_eval=None)
Calculates expected p-values over a grid in parameter space.
theta_true specifies which parameter point is assumed to be true. Based on this parameter point, the function generates a large artificial “Asimov data set”. P-values are then calculated with frequentist hypothesis tests using the likelihood ratio as test statistic. The asymptotic approximation is used, see https://arxiv.org/abs/1007.1727.
Depending on the keyword mode, the likelihood ratio is calculated with one of several different methods:
- With mode="rate", MadMiner only calculates the Poisson likelihood of the total number of events.
- With mode="histo", the kinematic likelihood is estimated with histograms of a small number of observables given by the keyword hist_vars. hist_bins determines the binning of the histograms. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="ml", the likelihood ratio is estimated with a parameterized neural network. model_file has to point to the filename of a saved LikelihoodEstimator or ParameterizedRatioEstimator instance or a corresponding Ensemble (i.e. be the same filename used when calling estimator.save()). include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="sally", the likelihood ratio is estimated with histograms of the components of the estimated score vector. model_file has to point to the filename of a saved ScoreEstimator instance. With score_components, the histogram can be restricted to some components of the score. hist_bins defines the binning of the histograms. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="adaptive-sally", the likelihood ratio is estimated with histograms of the components of the estimated score vector. The approach is essentially the same as for "sally", but the histogram binning is optimized for every parameter point by adding a new h = score * (theta - thetaref) dimension to the histogram. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="sallino", the likelihood ratio is estimated with one-dimensional histograms of the scalar variable h = score * (theta - thetaref) for each point theta along the parameter grid. model_file has to point to the filename of a saved ScoreEstimator instance. hist_bins defines the binning of the histogram. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
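The "rate" mode can be made concrete as follows. The expected number of events at a parameter point is its total cross section (in pb) times the integrated luminosity (in pb^{-1}). The Asimov data set replaces the observed count by its expectation under theta_true.

The sketch below computes a rate-only log likelihood ratio on that basis. It assumes a Poisson likelihood, extended to non-integer counts via lgamma. How MadMiner itself treats non-integer Asimov counts is not specified here, and the function names and inputs are hypothetical.

```python
import math


def poisson_log_likelihood(n_observed: float, n_expected: float) -> float:
    """log Poisson(n_observed | n_expected); lgamma allows non-integer Asimov counts."""
    return n_observed * math.log(n_expected) - n_expected - math.lgamma(n_observed + 1.0)


def rate_only_log_likelihood_ratios(xsecs_pb, xsec_true_pb, luminosity_ipb=300000.0):
    """Rate-only log likelihood ratios on a grid, evaluated on Asimov data.

    xsecs_pb: total cross sections (pb) at each grid point (hypothetical input).
    xsec_true_pb: total cross section (pb) at theta_true.
    luminosity_ipb: integrated luminosity in pb^-1.
    """
    # Asimov data: the observed count equals its expectation under theta_true.
    n_asimov = xsec_true_pb * luminosity_ipb
    log_l = [poisson_log_likelihood(n_asimov, xs * luminosity_ipb) for xs in xsecs_pb]
    best = max(log_l)
    # Log likelihood ratio relative to the best-fit grid point (<= 0 everywhere).
    return [ll - best for ll in log_l]
```

With the default luminosity of 300000 pb^{-1}, a cross section of 1.0e-3 pb corresponds to 300 expected events.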
MadMiner calculates one p-value for every parameter point on an evenly spaced grid specified by grid_ranges and grid_resolutions. For instance, in a three-dimensional parameter space, grid_ranges=[(-1., 1.), (-2., 2.), (-3., 3.)] and grid_resolutions=[10,10,10] will evaluate the p-values at 10^3 parameter points in a box with edges (-1, 1) in the first parameter, (-2, 2) in the second, and (-3, 3) in the third.
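The grid described above is the Cartesian product of evenly spaced values along each parameter, with both endpoints included. The sketch below builds such a grid with the standard library, including the case where a single int resolution applies to every parameter.

The helper names are hypothetical. The ordering of grid points (last parameter varying fastest) is an assumption and need not match MadMiner's internal ordering.

```python
import itertools


def linspace(start: float, stop: float, num: int):
    """num evenly spaced values from start to stop, both endpoints included."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def make_parameter_grid(grid_ranges, grid_resolutions):
    """Evenly spaced grid as a list of n_grid_points tuples of length n_parameters."""
    if isinstance(grid_resolutions, int):
        # A single int means the same resolution along every parameter.
        grid_resolutions = [grid_resolutions] * len(grid_ranges)
    axes = [linspace(lo, hi, n) for (lo, hi), n in zip(grid_ranges, grid_resolutions)]
    # Cartesian product over all axes; the last parameter varies fastest.
    return list(itertools.product(*axes))
```

For the example in the text, make_parameter_grid([(-1., 1.), (-2., 2.), (-3., 3.)], [10, 10, 10]) returns 1000 points.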
Parameters:  mode : {“rate”, “histo”, “ml”, “sally”, “sallino”, “adaptive-sally”}
Defines how the likelihood ratio test statistic is calculated. See above.
 theta_true : ndarray
Parameter point assumed to be true to calculate the Asimov data.
 grid_ranges : list of (tuple of float) or None, optional
Specifies the boundaries of the parameter grid on which the p-values are evaluated. It should be [(min, max), (min, max), …, (min, max)], where the list goes over all parameters and min and max are float. If None, thetas_eval has to be given. Default value: None.
 grid_resolutions : int or list of int, optional
Resolution of the parameter space grid on which the p-values are evaluated. If int, the resolution is the same along every dimension of the hypercube. If list of int, the individual entries specify the number of points along each parameter individually. Default value: 25.
 include_xsec : bool, optional
Whether the Poisson likelihood representing the total number of events is included in the analysis. Default value: True.
 model_file : str or None, optional
Filename of a saved neural network estimating the likelihood, likelihood ratio, or score. Required if mode is anything except “rate” or “histo”. Default value: None.
 hist_vars : list of str or None, optional
Kinematic variables used in the histograms when mode is “histo”. The names are the same as used for instance in DelphesReader. Default value: None.
 score_components : None or list of int, optional
Defines the score components used when mode is “sally” or “adaptive-sally”. Default value: None.
 hist_bins : int or list of (int or ndarray) or None, optional
Defines the histogram binning when mode is “histo”, “sally”, “adaptive-sally”, or “sallino”. If int, gives the number of bins automatically chosen for each summary statistic. If list, each entry corresponds to one summary statistic (e.g. kinematic variable specified by hist_vars or estimated score component); an int entry corresponds to the number of automatically chosen bins, an ndarray specifies the bin edges along this dimension explicitly. If None, the bins are chosen according to the defaults: for one summary statistic the default is 25 bins, for two it is 8 bins along each direction, for more it is 5 per dimension. Default value: None.
 thetaref : ndarray or None, optional
Defines the reference parameter point at which the score is evaluated for mode “sallino” or “adaptive-sally”. If None, the origin in parameter space, [0., 0., …, 0.], is used. Default value: None.
 luminosity : float, optional
Integrated luminosity in pb^{-1} assumed in the analysis. Default value: 300000.
 weighted_histo : bool, optional
If True, the histograms used for the modes “histo”, “sally”, “sallino”, and “adaptive-sally” are constructed from one set of weighted events at every point along the parameter grid, with different weights for each parameter point on the grid. If False, independent unweighted event samples are drawn for each parameter point on the grid. Default value: True.
 n_histo_toys : int or None, optional
Number of events drawn to construct the histograms used for the modes “histo”, “sally”, “sallino”, and “adaptive-sally”. If None and weighted_histo is True, all events in the training fraction of the MadMiner file are used. If None and weighted_histo is False, 100000 events are used. Default value: 100000.
 histo_theta_batchsize : int or None, optional
Number of histograms constructed in parallel for the modes “histo”, “sally”, “sallino”, and “adaptive-sally” when weighted_histo is True. A larger number speeds up the calculation but requires more memory. Default value: 1000.
 dof : int or None, optional
If not None, sets the number of parameters (degrees of freedom) used in the calculation of the p-values. If None, the overall number of parameters is used. Default value: None.
 test_split : float, optional
Fraction of weighted events in the MadMiner file reserved for evaluation. Default value: 0.2.
 return_histos : bool, optional
If True and if mode is “histo”, “sally”, “adaptive-sally”, or “sallino”, the function returns histogram objects for each point along the grid. Default value: True.
 fix_adaptive_binning : [False, “center”, “grid”, “autogrid”, “autocenter”], optional
If not False and if mode is “histo”, “sally”, “adaptive-sally”, or “sallino”, the automatic histogram binning is the same for every point along the parameter grid. For “center”, the central point in the parameter grid is used to determine the binning; for “grid”, all points in the parameter grid are combined for this. For “autogrid” or “autocenter”, this option is turned on if mode is “histo” or “sally”, but not for “adaptive-sally” or “sallino”. Default value: “autogrid”.
 sample_only_from_closest_benchmark : bool, optional
If True, only events originally generated from the closest benchmarks are used when generating the Asimov data (and, if weighted_histo is False, the histogram data). Default value: True.
 return_asimov : bool, optional
Whether the values of the summary statistics in the Asimov (“expected observed”) data set are returned. Default value: False.
 postprocessing : None or function, optional
If not None, points to a function that processes the summary statistics before being fed into histograms. Default value: None.
 n_binning_toys : int or None, optional
Number of toy events used to determine the binning of adaptive histograms. Default value: 100000.
 n_asimov : int or None, optional
Size of the Asimov sample. If None, all weighted events in the MadMiner file are used. Default value: None.
 thetas_eval : ndarray or None, optional
Manually specifies the parameter points at which the likelihood and p-values are evaluated. If None, grid_ranges and grid_resolutions are used instead to construct a regular grid. Default value: None.
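The hist_bins rules above amount to a small resolution step:
- None falls back to 25 bins for one summary statistic, 8 per dimension for two, and 5 per dimension for more.
- An int applies to every summary statistic.
- A list gives one entry per statistic, either a bin count or explicit bin edges.

The sketch below encodes these documented rules. resolve_hist_bins is a hypothetical helper, not part of the MadMiner API.

```python
def default_n_bins(n_summary_statistics: int) -> list:
    """Default number of bins per dimension when hist_bins is None (25 / 8 / 5 rule)."""
    if n_summary_statistics == 1:
        per_dim = 25
    elif n_summary_statistics == 2:
        per_dim = 8
    else:
        per_dim = 5
    return [per_dim] * n_summary_statistics


def resolve_hist_bins(hist_bins, n_summary_statistics: int) -> list:
    """Expand hist_bins into one entry per summary statistic.

    Each entry is either an int (number of automatically chosen bins) or an
    explicit sequence of bin edges, mirroring the documented options.
    """
    if hist_bins is None:
        return default_n_bins(n_summary_statistics)
    if isinstance(hist_bins, int):
        return [hist_bins] * n_summary_statistics
    if len(hist_bins) != n_summary_statistics:
        raise ValueError("need one hist_bins entry per summary statistic")
    return list(hist_bins)
```

The total number of bins grows quickly with dimension. The defaults give 25, 64, and 125 bins for one, two, and three summary statistics.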
Returns:  parameter_grid : ndarray
Parameter points at which the p-values are evaluated, with shape (n_grid_points, n_parameters).
 p_values : ndarray
Expected p-values (evaluated on the Asimov data) for each parameter point on the grid, with shape (n_grid_points,).
 mle : int
Index of the parameter point with the best fit (largest p-value / smallest -2 log likelihood ratio).
 log_likelihood_ratio_kin : ndarray or None
log likelihood ratio based only on kinematics for each point of the grid, with shape (n_grid_points,).
 log_likelihood_rate : ndarray or None
log likelihood based only on the total rate for each point of the grid, with shape (n_grid_points,).
 histos : None or list of Histogram
None if return_histos is False. Otherwise a list of histogram objects for each point on the grid. This can be useful for debugging or for plotting the histograms.
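The returned quantities can be related as follows, under two assumptions not spelled out in this reference:
- The total log likelihood ratio at each grid point is the sum of the kinematic term and, if present, the rate term.
- mle is the index where this total is largest.

The sketch below expresses the total relative to the best fit, so that it can be passed to a p-value calculation such as asymptotic_p_value. combine_and_locate_best_fit is a hypothetical helper.

```python
def combine_and_locate_best_fit(log_likelihood_ratio_kin=None, log_likelihood_rate=None):
    """Return (mle, log_r): best-fit index and log ratio relative to it.

    Either input may be None (for instance, no kinematic term in a rate-only analysis).
    """
    terms = [t for t in (log_likelihood_ratio_kin, log_likelihood_rate) if t is not None]
    if not terms:
        raise ValueError("at least one log likelihood term is required")
    # Total log likelihood (ratio) per grid point, up to a constant.
    total = [sum(values) for values in zip(*terms)]
    # Best fit: largest total, i.e. smallest -2 log likelihood ratio.
    mle = max(range(len(total)), key=total.__getitem__)
    # Shift so that log_r[mle] == 0 and log_r <= 0 elsewhere.
    log_r = [t - total[mle] for t in total]
    return mle, log_r
```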

observed_limits(self, mode, x_observed, grid_ranges=None, grid_resolutions=25, include_xsec=True, model_file=None, hist_vars=None, score_components=None, hist_bins=None, thetaref=None, luminosity=300000.0, weighted_histo=True, n_histo_toys=100000, histo_theta_batchsize=1000, n_observed=None, dof=None, test_split=0.2, return_histos=True, return_observed=False, fix_adaptive_binning='autogrid', postprocessing=None, n_binning_toys=100000, thetas_eval=None)
Calculates p-values over a grid in parameter space based on a given set of observed events.
x_observed specifies the observed data as an array of observables, using the same observables and their order as used throughout the MadMiner workflow.
The p-values are calculated with frequentist hypothesis tests using the likelihood ratio as test statistic. The asymptotic approximation is used, see https://arxiv.org/abs/1007.1727.
Depending on the keyword mode, the likelihood ratio is calculated with one of several different methods:
- With mode="rate", MadMiner only calculates the Poisson likelihood of the total number of events.
- With mode="histo", the kinematic likelihood is estimated with histograms of a small number of observables given by the keyword hist_vars. hist_bins determines the binning of the histograms. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="ml", the likelihood ratio is estimated with a parameterized neural network. model_file has to point to the filename of a saved LikelihoodEstimator or ParameterizedRatioEstimator instance or a corresponding Ensemble (i.e. be the same filename used when calling estimator.save()). include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="sally", the likelihood ratio is estimated with histograms of the components of the estimated score vector. model_file has to point to the filename of a saved ScoreEstimator instance. With score_components, the histogram can be restricted to some components of the score. hist_bins defines the binning of the histograms. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="adaptive-sally", the likelihood ratio is estimated with histograms of the components of the estimated score vector. The approach is essentially the same as for "sally", but the histogram binning is optimized for every parameter point by adding a new h = score * (theta - thetaref) dimension to the histogram. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
- With mode="sallino", the likelihood ratio is estimated with one-dimensional histograms of the scalar variable h = score * (theta - thetaref) for each point theta along the parameter grid. model_file has to point to the filename of a saved ScoreEstimator instance. hist_bins defines the binning of the histogram. include_xsec sets whether the Poisson likelihood of the total number of events is included or not.
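The scalar h used by "sallino" is the projection of the estimated score vector t(x), evaluated at thetaref, onto the direction theta - thetaref. The "adaptive-sally" mode adds the same quantity as an extra histogram dimension. Since the score is the gradient of the log likelihood at thetaref, h approximates the event's log likelihood ratio between theta and thetaref to first order.

The minimal sketch below computes h for a batch of events, defaulting thetaref to the origin as documented. sallino_projection is a hypothetical name, and the score vectors are plain lists rather than the output of a real ScoreEstimator.

```python
def sallino_projection(scores, theta, theta_ref=None):
    """Project estimated score vectors onto the direction theta - theta_ref.

    scores: one score vector t(x) per event (n_events x n_parameters).
    Returns h = t(x) . (theta - theta_ref) for every event.
    """
    if theta_ref is None:
        # Documented default: the origin in parameter space.
        theta_ref = [0.0] * len(theta)
    delta = [a - b for a, b in zip(theta, theta_ref)]
    # Dot product of each event's score with the parameter shift.
    return [sum(t_i * d_i for t_i, d_i in zip(t, delta)) for t in scores]
```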
MadMiner calculates one p-value for every parameter point on an evenly spaced grid specified by grid_ranges and grid_resolutions. For instance, in a three-dimensional parameter space, grid_ranges=[(-1., 1.), (-2., 2.), (-3., 3.)] and grid_resolutions=[10,10,10] will evaluate the p-values at 10^3 parameter points in a box with edges (-1, 1) in the first parameter, (-2, 2) in the second, and (-3, 3) in the third.
Parameters:  mode : {“rate”, “histo”, “ml”, “sally”, “sallino”, “adaptive-sally”}
Defines how the likelihood ratio test statistic is calculated. See above.
 x_observed : ndarray
Observed data with shape (n_events, n_observables). The observables have to be the same used throughout the MadMiner analysis, for instance specified in the DelphesReader class with add_observables.
 grid_ranges : list of (tuple of float) or None, optional
Specifies the boundaries of the parameter grid on which the p-values are evaluated. It should be [(min, max), (min, max), …, (min, max)], where the list goes over all parameters and min and max are float. If None, thetas_eval has to be given. Default value: None.
 grid_resolutions : int or list of int, optional
Resolution of the parameter space grid on which the p-values are evaluated. If int, the resolution is the same along every dimension of the hypercube. If list of int, the individual entries specify the number of points along each parameter individually. Has no effect if grid_ranges is None. Default value: 25.
 include_xsec : bool, optional
Whether the Poisson likelihood representing the total number of events is included in the analysis. Default value: True.
 model_file : str or None, optional
Filename of a saved neural network estimating the likelihood, likelihood ratio, or score. Required if mode is anything except “rate” or “histo”. Default value: None.
 hist_vars : list of str or None, optional
Kinematic variables used in the histograms when mode is “histo”. The names are the same as used for instance in DelphesReader. Default value: None.
 score_components : None or list of int, optional
Defines the score components used when mode is “sally” or “adaptive-sally”. Default value: None.
 hist_bins : int or list of (int or ndarray) or None, optional
Defines the histogram binning when mode is “histo”, “sally”, “adaptive-sally”, or “sallino”. If int, gives the number of bins automatically chosen for each summary statistic. If list, each entry corresponds to one summary statistic (e.g. kinematic variable specified by hist_vars or estimated score component); an int entry corresponds to the number of automatically chosen bins, an ndarray specifies the bin edges along this dimension explicitly. If None, the bins are chosen according to the defaults: for one summary statistic the default is 25 bins, for two it is 8 bins along each direction, for more it is 5 per dimension. Default value: None.
 thetaref : ndarray or None, optional
Defines the reference parameter point at which the score is evaluated for mode “sallino” or “adaptive-sally”. If None, the origin in parameter space, [0., 0., …, 0.], is used. Default value: None.
 luminosity : float, optional
Integrated luminosity in pb^{-1} assumed in the analysis. Default value: 300000.
 weighted_histo : bool, optional
If True, the histograms used for the modes “histo”, “sally”, “sallino”, and “adaptive-sally” are constructed from one set of weighted events at every point along the parameter grid, with different weights for each parameter point on the grid. If False, independent unweighted event samples are drawn for each parameter point on the grid. Default value: True.
 n_histo_toys : int or None, optional
Number of events drawn to construct the histograms used for the modes “histo”, “sally”, “sallino”, and “adaptive-sally”. If None and weighted_histo is True, all events in the training fraction of the MadMiner file are used. If None and weighted_histo is False, 100000 events are used. Default value: 100000.
 histo_theta_batchsize : int or None, optional
Number of histograms constructed in parallel for the modes “histo”, “sally”, “sallino”, and “adaptive-sally” when weighted_histo is True. A larger number speeds up the calculation but requires more memory. Default value: 1000.
 n_observed : int or None, optional
If not None, the likelihood ratio is rescaled to this number of observed events before calculating p-values. Default value: None.
 dof : int or None, optional
If not None, sets the number of parameters (degrees of freedom) used in the calculation of the p-values. If None, the overall number of parameters is used. Default value: None.
 test_split : float, optional
Fraction of weighted events in the MadMiner file reserved for evaluation. Default value: 0.2.
 return_histos : bool, optional
If True and if mode is “histo”, “sally”, “adaptive-sally”, or “sallino”, the function returns histogram objects for each point along the grid. Default value: True.
 fix_adaptive_binning : [False, “center”, “grid”, “autogrid”, “autocenter”], optional
If not False and if mode is “histo”, “sally”, “adaptive-sally”, or “sallino”, the automatic histogram binning is the same for every point along the parameter grid. For “center”, the central point in the parameter grid is used to determine the binning; for “grid”, all points in the parameter grid are combined for this. For “autogrid” or “autocenter”, this option is turned on if mode is “histo” or “sally”, but not for “adaptive-sally” or “sallino”. Default value: “autogrid”.
 return_observed : bool, optional
Whether the observed values of the summary statistics are returned. Default value: False.
 postprocessing : None or function, optional
If not None, points to a function that processes the summary statistics before being fed into histograms. Default value: None.
 n_binning_toys : int or None, optional
Number of toy events used to determine the binning of adaptive histograms. Default value: 100000.
 thetas_eval : ndarray or None, optional
Manually specifies the parameter points at which the likelihood and p-values are evaluated. If None, grid_ranges and grid_resolutions are used instead to construct a regular grid. Default value: None.
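One natural reading of n_observed is as follows. The kinematic log likelihood is computed as an average over the supplied events and then multiplied by the number of events to be represented. A small x_observed sample can then stand in for a larger data set.

The sketch below illustrates that interpretation only. It is an assumption, not a statement of MadMiner's exact implementation, and rescaled_log_likelihood is a hypothetical helper.

```python
def rescaled_log_likelihood(per_event_log_likelihood, n_observed=None):
    """Scale a per-event log likelihood (ratio) to n_observed events.

    Assumption for illustration: total = n_observed * mean over the supplied events.
    With n_observed=None this reduces to the plain sum over events.
    """
    n_events = len(per_event_log_likelihood)
    if n_events == 0:
        raise ValueError("need at least one observed event")
    mean_log_likelihood = sum(per_event_log_likelihood) / n_events
    if n_observed is None:
        n_observed = n_events
    return n_observed * mean_log_likelihood
```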
Returns:  parameter_grid : ndarray
Parameter points at which the p-values are evaluated, with shape (n_grid_points, n_parameters).
 p_values : ndarray
Observed p-values for each parameter point on the grid, with shape (n_grid_points,).
 mle : int
Index of the parameter point with the best fit (largest p-value / smallest -2 log likelihood ratio).
 log_likelihood_ratio_kin : ndarray or None
log likelihood ratio based only on kinematics for each point of the grid, with shape (n_grid_points,).
 log_likelihood_rate : ndarray or None
log likelihood based only on the total rate for each point of the grid, with shape (n_grid_points,).
 histos : None or list of Histogram
None if return_histos is False. Otherwise a list of histogram objects for each point on the grid. This can be useful for debugging or for plotting the histograms.
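The weighted_histo=True strategy (one set of weighted events, reweighted for each grid point) can be illustrated with a single summary statistic. The same event values fill every histogram, and only the per-event weights change with theta.

In MadMiner these weights would come from the morphing setup. In the sketch below they are plain lists, the bin edges are explicit, and the histograms are normalized so that they describe only the shape of the distribution. weighted_histogram is a hypothetical helper.

```python
import bisect


def weighted_histogram(values, weights, edges):
    """Normalized bin probabilities of `values` filled with per-event `weights`.

    Events outside [edges[0], edges[-1]) are dropped in this sketch.
    """
    contents = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        # Index of the bin whose left edge is the largest edge <= v.
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(contents):
            contents[i] += w
    total = sum(contents)
    if total <= 0.0:
        raise ValueError("no positive weight inside the histogram range")
    return [c / total for c in contents]


# Same events, two hypothetical weight sets standing in for two grid points.
values = [0.1, 0.4, 0.6, 0.9]
edges = [0.0, 0.5, 1.0]
hist_theta_a = weighted_histogram(values, [1.0, 1.0, 1.0, 1.0], edges)
hist_theta_b = weighted_histogram(values, [3.0, 3.0, 1.0, 1.0], edges)
```

Evaluating the likelihood of an observed event then amounts to looking up the normalized content of the bin it falls into, separately for each theta.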