madminer.sampling package
Submodules
madminer.sampling.combine module

madminer.sampling.combine.
combine_and_shuffle
(input_filenames, output_filename, k_factors=None, overwrite_existing_file=True, recalculate_header=True) Combines multiple MadMiner files into one, and shuffles the order of the events.
Note that this function assumes that all samples are generated with the same setup, including identical benchmarks (and thus the same morphing setup). If it is used on samples with different settings, the results will be wrong. There are no explicit cross-checks in place yet.
Parameters:  input_filenames : list of str
List of paths to the input MadMiner files.
 output_filename : str
Path to the combined MadMiner file.
 k_factors : float or list of float, optional
Multiplies the weights in input_filenames with a universal factor (if k_factors is a float) or with independent factors (if it is a list of float). Default value: None.
 overwrite_existing_file : bool, optional
If True and if the output file exists, it is overwritten. Default value: True.
 recalculate_header : bool, optional
Recalculates the total number of events. Default value: True.
Returns:  None
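As a rough illustration of what the k_factors argument and the shuffling amount to, here is a minimal pure-Python sketch. It works on in-memory lists of (observables, weight) pairs rather than MadMiner files; the helper name combine_and_shuffle_sketch and this data layout are assumptions made for the example, and any further weight normalization the real function may apply is deliberately left out.

```python
import random


def combine_and_shuffle_sketch(samples, k_factors=None, seed=None):
    """Hypothetical stand-in: merge weighted event lists and shuffle them.

    samples   : list of event lists, each event being (observables, weight)
    k_factors : None, a single float, or one float per input sample
    """
    n_inputs = len(samples)
    if k_factors is None:
        factors = [1.0] * n_inputs
    elif isinstance(k_factors, (int, float)):
        # A single number is applied universally to all inputs
        factors = [float(k_factors)] * n_inputs
    else:
        factors = [float(k) for k in k_factors]
        if len(factors) != n_inputs:
            raise ValueError("Need exactly one k-factor per input sample")

    combined = []
    for events, k in zip(samples, factors):
        for observables, weight in events:
            combined.append((observables, k * weight))

    # Shuffle the order of the combined events
    random.Random(seed).shuffle(combined)
    return combined


sample_a = [((1.0,), 0.5), ((2.0,), 0.25)]
sample_b = [((3.0,), 0.1)]
combined = combine_and_shuffle_sketch([sample_a, sample_b], k_factors=[1.0, 2.0], seed=0)
```

Only the weights of the second sample are rescaled here, since its k-factor is 2.0 and the first sample's is 1.0.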
madminer.sampling.parameters module

madminer.sampling.parameters.
benchmark
(benchmark_name) Utility function to be used as input to various SampleAugmenter functions, specifying a single parameter benchmark.
Parameters:  benchmark_name : str
Name of the benchmark (as in madminer.core.MadMiner.add_benchmark)
Returns:  output : tuple
Input to various SampleAugmenter functions

madminer.sampling.parameters.
benchmarks
(benchmark_names) Utility function to be used as input to various SampleAugmenter functions, specifying multiple parameter benchmarks.
Parameters:  benchmark_names : list of str
List of names of the benchmarks (as in madminer.core.MadMiner.add_benchmark)
Returns:  output : tuple
Input to various SampleAugmenter functions

madminer.sampling.parameters.
iid_nuisance_parameters
(shape='gaussian', param0=0.0, param1=1.0) Utility function to be used as input to various SampleAugmenter functions, specifying that nuisance parameters are drawn independently (i.i.d.) from a prior.
Parameters:  shape : [“flat”, “gaussian”], optional
Parameter prior shape. Default value: “gaussian”.
 param0 : float, optional
Gaussian mean or flat lower bound. Default value: 0.0.
 param1 : float, optional
Gaussian std or flat upper bound. Default value: 1.0.
Returns:  output : tuple
Input to various SampleAugmenter functions.

madminer.sampling.parameters.
morphing_point
(theta) Utility function to be used as input to various SampleAugmenter functions, specifying a single parameter point theta in a morphing setup.
Parameters:  theta : ndarray or list
Parameter point with shape (n_parameters,)
Returns:  output : tuple
Input to various SampleAugmenter functions

madminer.sampling.parameters.
morphing_points
(thetas) Utility function to be used as input to various SampleAugmenter functions, specifying multiple parameter points theta in a morphing setup.
Parameters:  thetas : ndarray or list of lists or list of ndarrays
Parameter points with shape (n_thetas, n_parameters)
Returns:  output : tuple
Input to various SampleAugmenter functions

madminer.sampling.parameters.
nominal_nuisance_parameters
() Utility function to be used as input to various SampleAugmenter functions, specifying that nuisance parameters are fixed at their nominal values.
Returns:  output : tuple
Input to various SampleAugmenter functions

madminer.sampling.parameters.
random_morphing_points
(n_thetas, priors) Utility function to be used as input to various SampleAugmenter functions, specifying random parameter points sampled from a prior in a morphing setup.
Parameters:  n_thetas : int
Number of parameter points to be sampled
 priors : list of tuples
The prior for each parameter is characterized by a tuple of the form (prior_shape, prior_param_0, prior_param_1). Currently, the supported prior_shapes are flat, in which case the two other parameters are the lower and upper bound of the flat prior, and gaussian, in which case they are the mean and standard deviation of a Gaussian.
Returns:  output : tuple
Input to various SampleAugmenter functions
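To make the priors format concrete, the following sketch draws parameter points from a list of (prior_shape, prior_param_0, prior_param_1) tuples using only the standard library. The helper sample_from_priors is hypothetical and not part of MadMiner; it only illustrates how the two supported shapes are interpreted.

```python
import random


def sample_from_priors(n_thetas, priors, seed=None):
    """Hypothetical helper: draw n_thetas parameter points from per-parameter priors."""
    rng = random.Random(seed)
    thetas = []
    for _ in range(n_thetas):
        theta = []
        for shape, param0, param1 in priors:
            if shape == "flat":
                # param0 / param1 are the lower / upper bound
                theta.append(rng.uniform(param0, param1))
            elif shape == "gaussian":
                # param0 / param1 are the mean / standard deviation
                theta.append(rng.gauss(param0, param1))
            else:
                raise ValueError("Unknown prior shape: {}".format(shape))
        thetas.append(theta)
    return thetas


# One flat and one Gaussian prior, i.e. n_parameters = 2
priors = [("flat", -1.0, 1.0), ("gaussian", 0.0, 0.5)]
thetas = sample_from_priors(5, priors, seed=1)
```

The result has the shape (n_thetas, n_parameters), matching the convention used for morphing_points above.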
madminer.sampling.sampleaugmenter module

class
madminer.sampling.sampleaugmenter.
SampleAugmenter
(filename, disable_morphing=False, include_nuisance_parameters=True) Bases:
madminer.analysis.dataanalyzer.DataAnalyzer
Sampling / unweighting and data augmentation.
After the generated events have been analyzed and the observables and weights have been saved into a MadMiner file, for instance with madminer.delphes.DelphesReader or madminer.lhe.LHEReader, the next step is typically the generation of training and evaluation data for the machine learning algorithms. This generally involves two (related) tasks: unweighting, i.e. the creation of samples that do not carry individual weights but follow some distribution, and the extraction of the joint likelihood ratio and / or joint score (the “augmented data”).
After initializing SampleAugmenter with the filename of a MadMiner file, this is done with a single function call. Depending on the downstream inference algorithm, there are different possibilities:
 SampleAugmenter.sample_train_plain() creates plain training samples without augmented data.
 SampleAugmenter.sample_train_local() creates training samples for local methods based on the score, such as SALLY and SALLINO.
 SampleAugmenter.sample_train_density() creates training samples for nonlocal methods based on density estimation and the score, such as SCANDAL.
 SampleAugmenter.sample_train_ratio() creates training samples for nonlocal, ratio-based methods like RASCAL or ALICE.
 SampleAugmenter.sample_train_more_ratios() does the same, but can extract joint ratios and scores at more parameter points. This additional information can be used efficiently in the setup with a “doubly parameterized” likelihood ratio estimator that models the dependence on both the numerator and denominator hypothesis.
 SampleAugmenter.sample_test() creates evaluation samples for all methods.
Please see the tutorial for a walkthrough.
For the curious, let us explain these steps in a little bit more detail (assuming a morphing setup):
 The sample augmentation step starts from a set of events (x_i, z_i) together with corresponding weights for each morphing basis point theta_b, p(x_i, z_i | theta_b).
 Morphing: Assume we want to generate data sampled from a parameter point theta, which is not necessarily one of the basis points theta_b. Using the morphing structure, the event weights for p(x_i, z_i | theta) can be calculated. Note that the events (phase-space points) (x_i, z_i) are not changed, only their weights.
 Unweighting: For the machine learning part, such a weighted event sample is not practical. Instead we aim for an unweighted one, in which events can appear multiple times. If the user requests N events (which can be larger than the original number of events in the MadGraph runs), SampleAugmenter will draw N samples (x_i, z_i) from the discrete distribution p(x_i, z_i | theta). In other words, it draws (with replacement) N of the original events from MadGraph, with probabilities given by the morphed weights from the previous step. This is similar to what np.random.choice() does.
 Augmentation: For each of the drawn samples, the morphing setup can be used to calculate the joint likelihood ratio and / or the joint score (this depends on which SampleAugmenter function is called).
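The unweighting step in the list above can be sketched in a few lines. This version uses random.choices from the standard library in place of np.random.choice; the function unweight and the toy events are assumptions made for the illustration, and the weights are taken to be non-negative.

```python
import random


def unweight(events, weights, n_samples, seed=None):
    """Draw n_samples events with replacement, with probability proportional to weight."""
    total = sum(weights)
    # Normalize the weights to a discrete probability distribution p(x_i, z_i | theta)
    probabilities = [w / total for w in weights]
    rng = random.Random(seed)
    return rng.choices(events, weights=probabilities, k=n_samples)


events = ["event_a", "event_b", "event_c"]
weights_at_theta = [0.7, 0.2, 0.1]  # toy weights after morphing to theta

# n_samples may exceed the number of original events; events then repeat
drawn = unweight(events, weights_at_theta, n_samples=10, seed=42)
```

Because the draw is with replacement, requesting more samples than there are events is allowed, and events with larger weights appear more often on average.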
Parameters:  filename : str
Path to MadMiner file (for instance the output of madminer.delphes.DelphesReader.save()).
 disable_morphing : bool, optional
If True, the morphing setup is not loaded from the file. Default value: False.
 include_nuisance_parameters : bool, optional
If True, nuisance parameters are taken into account. Default value: True.
Methods
cross_sections(self, theta[, nu]): Calculates the total cross sections for all specified thetas.
event_loader(self[, start, end, batch_size, …]): Yields batches of events in the MadMiner file.
sample_test(self, theta, n_samples[, nu, …]): Extracts evaluation samples x ~ p(x | theta) without any augmented data.
sample_train_density(self, theta, n_samples): Extracts training samples x ~ p(x | theta) as well as the joint score t(x, z | theta), where theta is sampled from a prior.
sample_train_local(self, theta, n_samples[, …]): Extracts training samples x ~ p(x | theta) as well as the joint score t(x, z | theta).
sample_train_more_ratios(self, theta0, …): Extracts training samples x ~ p(x | theta0) and x ~ p(x | theta1) together with the class label y, the joint likelihood ratio r(x, z | theta0, theta1), and the joint score t(x, z | theta0).
sample_train_plain(self, theta, n_samples[, …]): Extracts plain training samples x ~ p(x | theta) without any augmented data.
sample_train_ratio(self, theta0, theta1, …): Extracts training samples x ~ p(x | theta0) and x ~ p(x | theta1) together with the class label y, the joint likelihood ratio r(x, z | theta0, theta1), and, if morphing is set up, the joint score t(x, z | theta0).
weighted_events(self[, theta, nu, …]): Returns all events together with the benchmark weights (if theta is None) or weights for a given theta.
xsec_gradients(self, thetas[, nus, …]): Returns the gradient of total cross sections with respect to parameters.
xsecs(self[, thetas, nus, partition, …]): Returns the total cross sections for benchmarks or parameter points.
cross_sections
(self, theta, nu=None) Calculates the total cross sections for all specified thetas.
Parameters:  theta : tuple
Tuple (type, value) that defines the parameter point or prior over parameter points at which the cross section is calculated. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 nu : tuple or None, optional
Tuple (type, value) that defines the nuisance parameter point or prior over nuisance parameter points at which the cross section is calculated. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points(). Default value: None.
Returns:  thetas : ndarray
Parameter points with shape (n_thetas, n_parameters) or (n_thetas, n_parameters + n_nuisance_parameters).
 xsecs : ndarray
Total cross sections in pb with shape (n_thetas, ).
 xsec_uncertainties : ndarray
Statistical uncertainties on the total cross sections in pb with shape (n_thetas, ).
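For orientation, here is a sketch of how a total cross section and its statistical uncertainty are commonly estimated from weighted Monte Carlo events: the sum of the event weights, and the square root of the sum of squared weights. That MadMiner computes xsecs and xsec_uncertainties in exactly this way is an assumption of this example, and total_cross_section is a hypothetical helper.

```python
import math


def total_cross_section(weights_pb):
    """Standard weighted-MC estimate (assumed, not taken from MadMiner's source)."""
    xsec = sum(weights_pb)
    # Statistical uncertainty of a sum of independent weights
    uncertainty = math.sqrt(sum(w * w for w in weights_pb))
    return xsec, uncertainty


weights_pb = [0.3, 0.4]  # toy event weights at one parameter point, in pb
xsec, xsec_uncertainty = total_cross_section(weights_pb)
```

Repeating this for the weights at each of the n_thetas parameter points would give arrays of shape (n_thetas, ), as in the return values above.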

sample_test
(self, theta, n_samples, nu=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, test_split=0.2, validation_split=0.2, partition='test', n_processes=1, n_eff_forced=None, double_precision=False) Extracts evaluation samples x ~ p(x | theta) without any augmented data.
Parameters:  theta : tuple
Tuple (type, value) that defines the parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 n_samples : int
Total number of events to be drawn.
 nu : None or tuple, optional
Tuple (type, value) that defines the nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “test”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta : ndarray
Parameter points used for sampling with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.
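The definition of effective_n_samples, and the cut described for n_eff_forced, can be written out as follows. This is a sketch of the formulas as stated in the text; the helper names are hypothetical, the value is returned as a float rather than an int, and whether MadMiner renormalizes the remaining weights after the cut is not specified here.

```python
def effective_n_samples(weights):
    """1 / max(event_probabilities), with probabilities = weight / total weight."""
    total = sum(weights)
    event_probabilities = [w / total for w in weights]
    return 1.0 / max(event_probabilities)


def apply_n_eff_forced(weights, n_eff_forced):
    """Keep only events whose relative weight is smaller than 1 / n_eff_forced."""
    total = sum(weights)
    threshold = 1.0 / n_eff_forced
    return [w for w in weights if w / total < threshold]


uniform_weights = [1.0, 1.0, 1.0, 1.0]
skewed_weights = [7.0, 1.0, 1.0, 1.0]

n_eff_uniform = effective_n_samples(uniform_weights)  # every event carries 1/4
n_eff_skewed = effective_n_samples(skewed_weights)    # one event carries 7/10
kept = apply_n_eff_forced(skewed_weights, n_eff_forced=2.0)
```

With equal weights the effective number equals the number of events, while a single dominant event pulls it down towards 1. In the last line, the event carrying 70% of the total weight exceeds the threshold of 1/2 and is dropped.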

sample_train_density
(self, theta, n_samples, nu=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, nuisance_score='auto', test_split=0.2, validation_split=0.2, partition='train', n_processes=1, n_eff_forced=None, double_precision=False) Extracts training samples x ~ p(x | theta) as well as the joint score t(x, z | theta), where theta is sampled from a prior. This can be used for inference methods such as SCANDAL.
Parameters:  theta : tuple
Tuple (type, value) that defines the parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 n_samples : int
Total number of events to be drawn.
 nu : None or tuple, optional
Tuple (type, value) that defines the nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 nuisance_score : bool or “auto”, optional
If True, the score with respect to the nuisance parameters (at the default position) will also be calculated. If False, only the score with respect to the physics parameters is calculated. For “auto”, the nuisance score will be calculated if a nuisance setup is defined. Default value: “auto”.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “train”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta : ndarray
Parameter points used for sampling (and evaluation of the joint score) with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 t_xz : ndarray
Joint score evaluated at theta with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.

sample_train_local
(self, theta, n_samples, nu=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, nuisance_score='auto', test_split=0.2, validation_split=0.2, partition='train', n_processes=1, log_message=True, n_eff_forced=None, double_precision=False) Extracts training samples x ~ p(x | theta) as well as the joint score t(x, z | theta). This can be used for inference methods such as SALLY and SALLINO.
Parameters:  theta : tuple
Tuple (type, value) that defines the parameter point for the sampling. This is also where the score is evaluated. Pass the output of the functions benchmark() or morphing_point().
 n_samples : int
Total number of events to be drawn.
 nu : None or tuple, optional
Tuple (type, value) that defines the nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 nuisance_score : bool or “auto”, optional
If True, the score with respect to the nuisance parameters (at the default position) will also be calculated. If False, only the score with respect to the physics parameters is calculated. For “auto”, the nuisance score will be calculated if a nuisance setup is defined. Default value: “auto”.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “train”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 log_message : bool, optional
If True, log messages are printed. This option is intended for internal use only. Default value: True.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta : ndarray
Parameter points used for sampling (and evaluation of the joint score) with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 t_xz : ndarray
Joint score evaluated at theta with shape (n_samples, n_parameters + n_nuisance_parameters) (if nuisance_score is True) or (n_samples, n_parameters). The same information is saved as a file in the given folder.
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.

sample_train_more_ratios
(self, theta0, theta1, n_samples, nu0=None, nu1=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, additional_thetas=None, nuisance_score='auto', test_split=0.2, validation_split=0.2, partition='train', n_processes=1, n_eff_forced=None, double_precision=False) Extracts training samples x ~ p(x | theta0) and x ~ p(x | theta1) together with the class label y, the joint likelihood ratio r(x, z | theta0, theta1), and the joint score t(x, z | theta0). This information can be used in inference methods such as CARL, ROLR, CASCAL, and RASCAL.
With the keyword additional_thetas, this function allows extracting joint ratios and scores at more parameter points than just theta0 and theta1. This additional information can be used efficiently in the setup with a “doubly parameterized” likelihood ratio estimator that models the dependence on both the numerator and denominator hypothesis.
Parameters:  theta0 : tuple
Tuple (type, value) that defines the numerator parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 theta1 : tuple
Tuple (type, value) that defines the denominator parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 n_samples : int
Total number of events to be drawn.
 nu0 : None or tuple, optional
Tuple (type, value) that defines the numerator nuisance parameter point or prior over parameter points for the sampling. Default value: None
 nu1 : None or tuple, optional
Tuple (type, value) that defines the denominator nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 additional_thetas : list of tuple or None
List of tuples (type, value) that define additional theta points at which ratio and score are evaluated, and which are then used to create additional training data points. These can be used efficiently only in the “doubly parameterized” setup where a likelihood ratio estimator models the dependence of the likelihood ratio on both the numerator and denominator hypothesis. Pass the output of the helper functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points(). Default value: None.
 nuisance_score : bool or “auto”, optional
If True, the score with respect to the nuisance parameters (at the default position) will also be calculated. If False, only the score with respect to the physics parameters is calculated. For “auto”, the nuisance score will be calculated if a nuisance setup is defined. Default value: “auto”.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “train”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta0 : ndarray
Numerator parameter points with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 theta1 : ndarray
Denominator parameter points with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 y : ndarray
Class label with shape (n_samples, n_parameters). y=0 (1) for events sampled from the numerator (denominator) hypothesis. The same information is saved as a file in the given folder.
 r_xz : ndarray
Joint likelihood ratio with shape (n_samples,). The same information is saved as a file in the given folder.
 t_xz : ndarray
Joint score evaluated at theta0 with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.

sample_train_plain
(self, theta, n_samples, nu=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, test_split=0.2, validation_split=0.2, partition='train', n_processes=1, n_eff_forced=None, double_precision=False) Extracts plain training samples x ~ p(x | theta) without any augmented data. This can be used for standard inference methods such as ABC, histograms of observables, or neural density estimation techniques. It can also be used to create validation or calibration samples.
Parameters:  theta : tuple
Tuple (type, value) that defines the parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 n_samples : int
Total number of events to be drawn.
 nu : None or tuple, optional
Tuple (type, value) that defines the nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “train”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta : ndarray
Parameter points used for sampling with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.
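The interplay of test_split, validation_split, and partition can be pictured with the sketch below, which divides event indices into disjoint ranges. The contiguous ordering (train first, then validation, then test) and the helper partition_indices are assumptions made for illustration; the text above only fixes the fractions and the partition names.

```python
def partition_indices(n_events, test_split=0.2, validation_split=0.2):
    """Hypothetical split of event indices into disjoint partitions."""
    # None is treated as "reserve nothing"
    test_split = test_split or 0.0
    validation_split = validation_split or 0.0
    if test_split + validation_split >= 1.0:
        raise ValueError("test_split + validation_split must be below 1")

    n_test = int(round(test_split * n_events))
    n_validation = int(round(validation_split * n_events))
    n_train = n_events - n_test - n_validation

    return {
        "train": range(0, n_train),
        "validation": range(n_train, n_train + n_validation),
        "test": range(n_train + n_validation, n_events),
        "all": range(0, n_events),
    }


partitions = partition_indices(100)
train_indices = partitions["train"]  # what partition='train' would sample from
```

With the default values, 60% of the events remain available for training samples, and the training, validation, and test samples never share an original event.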

sample_train_ratio
(self, theta0, theta1, n_samples, nu0=None, nu1=None, sample_only_from_closest_benchmark=True, folder=None, filename=None, nuisance_score='auto', test_split=0.2, validation_split=0.2, partition='train', n_processes=1, return_individual_n_effective=False, n_eff_forced=None, double_precision=False) Extracts training samples x ~ p(x | theta0) and x ~ p(x | theta1) together with the class label y, the joint likelihood ratio r(x, z | theta0, theta1), and, if morphing is set up, the joint score t(x, z | theta0). This information can be used in inference methods such as CARL, ROLR, CASCAL, and RASCAL.
Parameters:  theta0 : tuple
Tuple (type, value) that defines the numerator parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 theta1 : tuple
Tuple (type, value) that defines the denominator parameter point or prior over parameter points for the sampling. Pass the output of the functions benchmark(), benchmarks(), morphing_point(), morphing_points(), or random_morphing_points().
 n_samples : int
Total number of events to be drawn.
 nu0 : None or tuple, optional
Tuple (type, value) that defines the numerator nuisance parameter point or prior over parameter points for the sampling. Default value: None
 nu1 : None or tuple, optional
Tuple (type, value) that defines the denominator nuisance parameter point or prior over parameter points for the sampling. Default value: None
 sample_only_from_closest_benchmark : bool, optional
If True, only weighted events originally generated from the closest benchmarks are used. Default value: True.
 folder : str or None
Path to the folder where the resulting samples should be saved (ndarrays in .npy format). Default value: None.
 filename : str or None
Filenames for the resulting samples. A prefix such as ‘x’ or ‘theta0’ as well as the extension ‘.npy’ will be added automatically. Default value: None.
 nuisance_score : bool or “auto”, optional
If True, the score with respect to the nuisance parameters (at the default position) will also be calculated. If False, only the score with respect to the physics parameters is calculated. For “auto”, the nuisance score will be calculated if a nuisance setup is defined. Default value: “auto”.
 test_split : float or None, optional
Fraction of events reserved for the evaluation sample (that will not be used for any training samples). Default value: 0.2.
 validation_split : float or None, optional
Fraction of events reserved for the validation sample. Default value: 0.2.
 partition : {“train”, “test”, “validation”, “all”}, optional
Which event partition to use. Default value: “train”.
 n_processes : None or int, optional
If None or larger than 1, MadMiner will use multiprocessing to parallelize the sampling. In this case, n_processes sets the number of jobs running in parallel, and None will use the number of CPUs. Default value: 1.
 return_individual_n_effective : bool, optional
Returns number of effective samples for each set individually. Default value: False.
 n_eff_forced : float, optional
If not None, MadMiner will require the relative weights of the events to be smaller than 1/n_eff_forced and ignore other events. This can help to reduce statistical effects caused by a small number of events with very large weights obtained by the morphing procedure. Default value: None
 double_precision : bool, optional
Use double floating-point precision. Default value: False.
Returns:  x : ndarray
Observables with shape (n_samples, n_observables). The same information is saved as a file in the given folder.
 theta0 : ndarray
Numerator parameter points with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 theta1 : ndarray
Denominator parameter points with shape (n_samples, n_parameters). The same information is saved as a file in the given folder.
 y : ndarray
Class label with shape (n_samples, n_parameters). y=0 (1) for events sampled from the numerator (denominator) hypothesis. The same information is saved as a file in the given folder.
 r_xz : ndarray
Joint likelihood ratio with shape (n_samples,). The same information is saved as a file in the given folder.
 t_xz : ndarray or None
If morphing is set up, the joint score evaluated at theta0 with shape (n_samples, n_parameters). The same information is saved as a file in the given folder. If morphing is not set up, None is returned (and no file is saved).
 effective_n_samples : int
Effective number of samples, defined as 1/max(event_probabilities), where event_probabilities are the fractions of the cross section carried by each event.
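To illustrate the r_xz return value, the sketch below evaluates the joint likelihood ratio r(x, z | theta0, theta1) = p(x, z | theta0) / p(x, z | theta1) for a handful of events, taking p(x_i, z_i | theta) to be the event weight at theta divided by the sum of all weights at theta. The helper joint_likelihood_ratio and the toy weights are assumptions of this example, not MadMiner's implementation.

```python
def joint_likelihood_ratio(weights_theta0, weights_theta1):
    """Per-event ratio of normalized weights at theta0 and theta1."""
    sigma0 = sum(weights_theta0)  # total weight (cross section) at theta0
    sigma1 = sum(weights_theta1)  # total weight (cross section) at theta1
    return [
        (w0 / sigma0) / (w1 / sigma1)
        for w0, w1 in zip(weights_theta0, weights_theta1)
    ]


# Toy weights of the same three events under the two hypotheses
weights_theta0 = [1.0, 3.0, 1.0]
weights_theta1 = [2.0, 1.0, 1.0]

r_xz = joint_likelihood_ratio(weights_theta0, weights_theta1)
```

An event with r_xz above 1 is relatively more likely under the numerator hypothesis theta0, and one with r_xz below 1 under the denominator hypothesis theta1.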