The Uncertainty Quantification 360 (UQ360) toolkit is an open-source Python package that provides a diverse set of
algorithms to quantify uncertainty, as well as capabilities to measure and improve UQ to streamline the development
process. We provide a taxonomy and guidance for choosing these capabilities based on the user’s needs. Further, UQ360
makes the communication method of UQ an integral part of development choices in an AI lifecycle. Developers can make a
user-centered choice by following the psychology-based guidance on communicating UQ estimates,
from concise descriptions to detailed visualizations.
For more information and installation instructions, see our GitHub page.
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
return_dists – If True, the predictive distribution for each instance is returned as a scipy.stats object.
return_epistemic – If True, the epistemic upper and lower bounds are returned.
return_epistemic_dists – If True, the epistemic distribution for each instance is returned as a scipy.stats object.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
y_lower_epistemic: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of epistemic component of the predictive distribution of the test points.
Only returned when return_epistemic is True.
y_upper_epistemic: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of epistemic component of the predictive distribution of the test points.
Only returned when return_epistemic is True.
dists: list of predictive distributions as scipy.stats objects, with length n_samples.
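As an illustration, the returned namedtuple can be unpacked as follows; here model stands for any fitted UQ360 regression algorithm exposing this predict() signature, and X_test is a placeholder test matrix:

    res = model.predict(X_test, return_epistemic=True, return_dists=True)
    y_mean, y_lower, y_upper = res.y_mean, res.y_lower, res.y_upper
    # Epistemic-only bounds, present because return_epistemic=True:
    epi_lo, epi_hi = res.y_lower_epistemic, res.y_upper_epistemic
    # One frozen scipy.stats distribution per test point, e.g. 90% intervals:
    intervals_90 = [d.interval(0.90) for d in res.dists]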
Wrapper for heteroscedastic regression. We learn to predict targets given features,
assuming that the targets are noisy and that the amount of noise varies between data points.
https://en.wikipedia.org/wiki/Heteroscedasticity
Parameters:
model_type – The base model architecture. Currently supported values are [mlp]. The mlp model type
learns a multi-layer perceptron with a heteroscedastic Gaussian likelihood; both the mean and
variance of the Gaussian are functions of the data point: N(y_n | mlp_mu(x_n), mlp_var(x_n)).
model – (optional) The prediction model. Currently supports PyTorch models that return the mean and log variance.
config – Dictionary containing the config parameters for the model.
device – Device used for PyTorch models; ignored otherwise.
verbose – If True, print statements with the progress are enabled.
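A minimal construction sketch follows. The import path reflects the UQ360 repository layout, and the config keys shown are illustrative assumptions; check your installed version for the exact schema:

    from uq360.algorithms.heteroscedastic_regression import HeteroscedasticRegression  # assumed import path

    # Config keys below are illustrative assumptions, not a documented schema:
    config = {"num_features": X_train.shape[1], "num_outputs": 1,
              "num_epochs": 50, "batch_size": 32, "lr": 1e-3}
    model = HeteroscedasticRegression(model_type="mlp", config=config,
                                      device="cpu", verbose=True)
    model.fit(X_train, y_train)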
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
return_dists – If True, the predictive distribution for each instance is returned as a scipy.stats object.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
dists: list of predictive distributions as scipy.stats objects, with length n_samples.
Ensemble Regression assumes an ensemble of models of Gaussian form for the predictive distribution and
returns the mean and log variance of the ensemble of Gaussians.
Initializer for the ensemble of heteroscedastic regression models.
Parameters:
model_type – The base model used in the ensemble. Currently supported values are [heteroscedasticregression].
config – Dictionary containing the config parameters for the model.
device – Device used for PyTorch models; ignored otherwise.
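A construction sketch under stated assumptions (the import path and config layout are guesses; the documented model_type value is used as-is):

    from uq360.algorithms.ensemble_heteroscedastic_regression import EnsembleHeteroscedasticRegression  # assumed import path

    ens = EnsembleHeteroscedasticRegression(
        model_type="heteroscedasticregression",
        config={"num_models": 5, "model_config": base_config},  # assumed config layout
        device="cpu")
    ens.fit(X_train, y_train)   # assumed fit signature
    res = ens.predict(X_test, return_dists=True)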
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
return_dists – If True, the predictive distribution for each instance is returned as a scipy.stats object.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
dists: list of predictive distributions as scipy.stats objects, with length n_samples.
ActivelyLearnedModel assumes an existing BuiltinUQ model and implements active-learning training of this model. This code supports Pestourie et al., “Active learning of deep surrogates for PDEs: application to metasurface design,” npj Computational Materials 6.1 (2020): 1-7.
Initializer for the actively learned model.
Parameters:
config – Dictionary containing the config parameters for the model. For active learning: num_init, T, K, M, sampling_function, querry_function; for the model being learned:
{“model_function”: BuiltinUQ model to actively learn,
“model_args”: same arguments as the BuiltinUQ model used,
“model_kwargs”: same keyword arguments as the BuiltinUQ model used}
device – Device used for PyTorch models; ignored otherwise.
Fit the actively learned model by growing the dataset efficiently. Note that fit() does not take a dataset as an argument, because the model builds its own dataset during training; see the configuration sketch below.
Returns:
self
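A configuration sketch using exactly the documented config keys; the sampler, query function, and numeric values are user-supplied placeholders, and the import path is an assumption:

    from uq360.algorithms.actively_learned_model import ActivelyLearnedModel  # assumed import path

    config = {
        "num_init": 20,                   # initial number of labeled points (illustrative)
        "T": 5, "K": 4, "M": 50,          # active-learning loop parameters (illustrative)
        "sampling_function": my_sampler,  # user-supplied: proposes candidate inputs
        "querry_function": my_query,      # user-supplied: labels chosen candidates (spelling per the spec above)
        "model_function": HeteroscedasticRegression,  # the BuiltinUQ model to actively learn
        "model_args": (),                             # its positional arguments
        "model_kwargs": {"model_type": "mlp", "config": base_config},  # its keyword arguments
    }
    al = ActivelyLearnedModel(config=config, device="cpu")
    al.fit()   # no (X, y): the model builds its own dataset during training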
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
Obtain predictions for the test points.
In addition to the mean and lower/upper bounds, also returns the epistemic uncertainty (return_epistemic=True)
and the full predictive distribution (return_dists=True).
Parameters:
X – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
mc_samples – Number of Monte Carlo samples.
return_dists – If True, the predictive distribution for each instance is returned as a scipy.stats object.
return_epistemic – If True, the epistemic upper and lower bounds are returned.
return_epistemic_dists – If True, the epistemic distribution for each instance is returned as a scipy.stats object.
Returns:
A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
y_lower_epistemic: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of epistemic component of the predictive distribution of the test points.
Only returned when return_epistemic is True.
y_upper_epistemic: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of epistemic component of the predictive distribution of the test points.
Only returned when return_epistemic is True.
dists: list of predictive distributions as scipy.stats objects, with length n_samples.
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes).
Feature vectors of the training data, or the probability scores from the base model.
Ignored if train_loader is not None.
y – array-like of shape (n_samples,) or (n_samples, n_targets).
Target values.
Ignored if train_loader is not None.
Obtain calibrated predictions for the test points.
Parameters:
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes).
Feature vectors of the test points, or the probability scores from the base model.
mc_samples – Number of Monte Carlo samples.
Returns:
A namedtuple that holds
y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
y_prob: ndarray of shape (n_samples, n_classes)
Predicted probability scores of the classes.
y_prob_var: ndarray of shape (n_samples,)
Variance of the prediction on the test points.
y_prob_samples: ndarray of shape (mc_samples, n_samples, n_classes)
Samples from the predictive distribution of the class probability scores.
Auxiliary Interval Predictor [1] uses an auxiliary model to encourage calibration of the main model.
Parameters:
model_type – The model type used to build the main model and the auxiliary model. Currently supported values
are [mlp, custom]. The mlp model type learns an MLP neural network using the PyTorch framework. For custom, the
user provides main_model and aux_model.
main_model – (optional) The main prediction model. Currently supports PyTorch models that return the mean and log variance.
aux_model – (optional) The auxiliary prediction model. Currently supports PyTorch models that return a calibrated log variance.
config – Dictionary containing the config parameters for the model.
device – Device used for PyTorch models; ignored otherwise.
verbose – If True, print statements with the progress are enabled.
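A construction sketch under assumptions (the import path and config keys are illustrative, not the documented schema):

    from uq360.algorithms.auxiliary_interval_predictor import AuxiliaryIntervalPredictor  # assumed import path

    # Config keys below are illustrative assumptions:
    config = {"num_features": X_train.shape[1], "num_outputs": 1, "lr": 1e-3}
    aip = AuxiliaryIntervalPredictor(model_type="mlp", config=config,
                                     device="cpu", verbose=True)
    aip.fit(X_train, y_train)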
Performs a first-order Taylor series expansion around the MLE / MAP fit.
Requires the model being probed to be twice differentiable.
Initialize IJ.
Parameters:
params – MLE / MAP fit around which uncertainty is sought. Shape: d*1.
gradients – Per-data-point gradients, estimated at the MLE / MAP fit. Shape: d*n.
hessian – Hessian evaluated at the MLE / MAP fit. Shape: d*d.
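A shape-checking sketch for the three documented inputs (d parameters, n data points); the import path is an assumption, the arrays are placeholders, and the constructor may take further arguments in your version:

    import numpy as np
    from uq360.algorithms.infinitesimal_jackknife import InfinitesimalJackknife  # assumed import path

    d, n = 10, 500                 # number of parameters / data points
    params = np.zeros((d, 1))      # MLE / MAP fit (placeholder values)
    gradients = np.zeros((d, n))   # per-data-point gradients at the fit
    hessian = np.eye(d)            # Hessian at the fit
    ij = InfinitesimalJackknife(params, gradients, hessian)  # constructor may take extra args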
Post hoc calibration of classification models. Currently wraps CalibratedClassifierCV from sklearn and allows
non-sklearn models to be calibrated.
Parameters:
num_classes – Number of classes.
fit_mode – features or probs. In probs mode, fit and predict operate on the base model’s probability scores,
which is useful when these are precomputed.
method – isotonic or sigmoid.
base_model_prediction_func – The function that takes in the input features and produces the base model’s
probability scores. Ignored when operating in probs mode.
Fits calibration model using the provided calibration set.
Parameters:
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes).
Feature vectors of the training data, or the probability scores from the base model.
y – array-like of shape (n_samples,) or (n_samples, n_targets).
Target values.
Obtain calibrated predictions for the test points.
Parameters:
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes).
Feature vectors of the test points, or the probability scores from the base model.
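A usage sketch in probs mode, where precomputed base-model scores are calibrated directly; the import path is an assumption, while the parameters are the documented ones:

    from uq360.algorithms.classification_calibration import ClassificationCalibration  # assumed import path

    calib = ClassificationCalibration(num_classes=3, fit_mode="probs", method="isotonic")
    calib.fit(probs_cal, y_cal)      # base-model scores + labels on a held-out calibration set
    res = calib.predict(probs_test)  # calibrated predictions for the test scores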
This predictor allows flexible feature and calibrator configurations, and uses a meta-model that is an ensemble of a GBM and a logistic regression model. It returns no error bars (constant zero error bars) of its own.
PostHocUQ model based on the “structured_data” performance predictor
Parameters:
x – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
return_predictions – If True, point-wise predictions are returned.
predicted_probabilities – When the predictor is instantiated without a base model, the pre-trained model’s
predicted probabilities on x should be passed to predict.
Returns:
namedtuple: A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
y_score: ndarray of shape (n_samples,)
Confidence score of the test points.
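An illustrative call, assuming predictor is an already-fitted instance of this performance predictor (construction elided) that was created without a base model; all variable names are placeholders:

    res = predictor.predict(x_test, return_predictions=True,
                            predicted_probabilities=base_probs)  # required when no base model was given
    y_pred, y_score = res.y_pred, res.y_score  # fields per the Returns block above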
This predictor is very similar to the structured-data predictor, but is fine-tuned to handle text data. The meta-model used by the predictor is an ensemble of an SVM, a GBM, and an MLP. Feature vectors can be either raw text or pre-encoded vectors. If raw text is passed and no encoder is specified at initialization, USE (Universal Sentence Encoder) embeddings are used by default.
PostHocUQ model based on the “text_ensemble” performance predictor (uq360.algorithms.blackbox_metamodel.predictors.core.short_text.py).
Returns an instance of a short-text predictor.
Parameters:
base_model – A scikit-learn estimator instance capable of returning confidence scores (predict_proba). base_model can also be None.
Returns:
A predictor instance.
Parameters:
x – array-like of shape (n_samples, n_features).
Feature vectors of the test points.
return_predictions – If True, point-wise predictions are returned.
predicted_probabilities – When the predictor is instantiated without a base model, the pre-trained model’s
predicted probabilities on x should be passed to predict.
Returns:
namedtuple: A namedtuple that holds
y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
y_score: ndarray of shape (n_samples,)
Confidence score of the test points.
Implementation of the Mahalanobis Adversarial/Out-of-distribution detector [1].
[1] “A Simple Unified Framework for Detecting Out-of-Distribution Samples and
Adversarial Attacks”, K. Lee et al., NIPS 2018.
Parameters:
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional.
If no model or layer is provided, it is expected that the inputs are already latent vectors.
If both a model and a layer are provided, inputs are expected to be model inputs
that will be mapped to latent vectors.
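Independent of the toolkit's wrapper, the score of [1] can be sketched in plain NumPy: class-conditional means with a shared (tied) covariance, scoring each input by its distance to the closest class mean. All names here are illustrative:

    import numpy as np

    def mahalanobis_scores(z_train, y_train, z_test):
        # Class-conditional means and a shared ("tied") covariance, as in Lee et al.
        classes = np.unique(y_train)
        mus = np.stack([z_train[y_train == c].mean(axis=0) for c in classes])
        centered = z_train - mus[np.searchsorted(classes, y_train)]
        prec = np.linalg.pinv(centered.T @ centered / len(z_train))
        # Squared Mahalanobis distance to every class mean; keep the minimum.
        diffs = z_test[:, None, :] - mus[None, :, :]          # (m, C, d)
        d2 = np.einsum('mcd,de,mce->mc', diffs, prec, diffs)  # (m, C)
        return d2.min(axis=1)                                 # larger => more anomalous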
KNN-based latent-space anomaly detector. Returns a measure of distance to the training data.
Parameters:
n_neighbors – Number of nearest neighbors to consider in the in-distribution data.
method – One of (“knn”, “avg”, “lid”). These correspond, respectively, to the distance to the k-th nearest
neighbor, the mean distance to the k nearest neighbors, and a local intrinsic dimensionality (LID) estimate.
nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors
nearest_neighbors_kwargs – keyword arguments for the NN algorithm
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional.
If no model or layer is provided, it is expected that the inputs are already latent vectors.
If both a model and a layer are provided, inputs are expected to be model inputs
that will be mapped to latent vectors.
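The “knn” and “avg” scores can be sketched with scikit-learn's NearestNeighbors on latent vectors (z_train and z_test are illustrative latents; the “lid” variant is omitted here):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    k = 10
    nn = NearestNeighbors(n_neighbors=k).fit(z_train)   # in-distribution latents
    dist, _ = nn.kneighbors(z_test)                     # (m, k), sorted ascending
    knn_score = dist[:, -1]         # "knn": distance to the k-th neighbor
    avg_score = dist.mean(axis=1)   # "avg": mean distance to the k neighbors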
Implementation of Averaged K nearest neighbors Localized P-value
Estimation (aK_LPE) [1].
[1] J. Qian and V. Saligrama, “New statistic in P-value estimation for
anomaly detection,” 2012 IEEE Statistical Signal Processing Workshop (SSP)
Parameters:
nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors
nearest_neighbors_kwargs – keyword arguments for the NN algorithm
n_neighbors – Number of nearest neighbors to consider.
n_bootstraps – Number of bootstraps used to estimate the p-value.
batch_size – Batch size.
random_state – Seed for the random number generator.
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional.
If no model or layer is provided, it is expected that the inputs are already latent vectors.
If both a model and a layer are provided, inputs are expected to be model inputs
that will be mapped to latent vectors.
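Continuing the kNN sketch above, the p-value idea behind aK-LPE can be illustrated (without the bootstrap) by ranking each test score against the in-distribution scores; this is didactic, not the toolkit's implementation:

    # Average-kNN scores of the training points themselves (drop the self-distance):
    ref = nn.kneighbors(z_train)[0][:, 1:].mean(axis=1)
    # p-value = fraction of in-distribution scores at least as large:
    p_values = (ref[None, :] >= avg_score[:, None]).mean(axis=1)
    # Small p-values flag likely anomalies.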
Computes risk vs rejection rate curve and the area under this curve. Similar to risk-coverage curves [3] where
coverage instead of rejection rate is used.
Parameters:
y_true – array-like of shape (n_samples,)
ground truth labels.
y_prob – array-like of shape (n_samples, n_classes).
Probability scores from the base model.
y_pred – array-like of shape (n_samples,)
predicted labels.
selection_scores – scores corresponding to certainty in the predicted labels.
risk_func – risk function under consideration.
attributes – (optional) if risk function is a fairness metric also pass the protected attribute name.
num_bins – number of bins.
subgroup_ids – (optional) selectively compute risk on a subgroup of the samples specified by subgroup_ids.
return_counts – Set to True to also return the counts.
Returns:
aurrrc (float): area under risk rejection rate curve.
rejection_rates (list): rejection rates for each bin (returned only if return_counts is True).
selection_thresholds (list): selection threshold for each bin (returned only if return_counts is True).
risks (list): risk in each bin (returned only if return_counts is True).
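To make the computed quantity concrete, here is a hand-rolled sketch of the curve with misclassification rate as the risk function (all array names are placeholders matching the parameters above):

    import numpy as np

    order = np.argsort(selection_scores)             # least certain first
    errors = (np.asarray(y_pred) != np.asarray(y_true)).astype(float)[order]
    rejection_rates, risks = [], []
    for k in range(len(errors)):                     # reject the k least-certain points
        rejection_rates.append(k / len(errors))
        risks.append(errors[k:].mean())              # risk on the retained points
    aurrrc = np.trapz(risks, rejection_rates)        # area under the curve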
Computes the metrics specified in option, which can be a string or a list of strings. The default option all computes
the [aurrrc, ece, auroc, nll, brier, accuracy] metrics.
Parameters:
y_true – array-like of shape (n_samples,)
ground truth labels.
y_prob – array-like of shape (n_samples, n_classes).
Probability scores from the base model.
option – String or list of strings containing the names of the metrics to be computed.
Entropy-based decomposition [2] of predictive uncertainty into aleatoric and epistemic components.
Parameters:
y_prob_samples – ndarray of shape (mc_samples, n_samples, n_classes)
Samples from the predictive distribution. Here mc_samples stands for the number of Monte Carlo samples,
n_samples is the number of data points and n_classes is the number of classes.
Returns:
total_uncertainty: entropy of the predictive distribution.
aleatoric_uncertainty: aleatoric component of the total_uncertainty.
epistemic_uncertainty: epistemic component of the total_uncertainty.
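The decomposition itself is a short NumPy computation over y_prob_samples of shape (mc_samples, n_samples, n_classes), as documented above:

    import numpy as np

    def entropy(p, eps=1e-12):
        return -(p * np.log(p + eps)).sum(axis=-1)

    mean_probs = y_prob_samples.mean(axis=0)                      # (n_samples, n_classes)
    total_uncertainty = entropy(mean_probs)                       # H[E_s p_s]
    aleatoric_uncertainty = entropy(y_prob_samples).mean(axis=0)  # E_s H[p_s]
    epistemic_uncertainty = total_uncertainty - aleatoric_uncertainty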
Plots the risk vs rejection rate curve, showing the risk for different rejection rates. Multiple curves
can be plotted by passing the data as lists.
Parameters:
y_true – array-like or a list of array-like of shape (n_samples,)
ground truth labels.
y_prob – array-like or a list of array-like of shape (n_samples, n_classes).
Probability scores from the base model.
y_pred – array-like or a list of array-like of shape (n_samples,)
predicted labels.
selection_scores – ndarray or a list of ndarray containing scores corresponding to certainty in the predicted labels.
risk_func – risk function under consideration.
attributes – (optional) if risk function is a fairness metric also pass the protected attribute name.
num_bins – number of bins.
subgroup_ids – (optional) ndarray or a list of ndarray containing subgroup_ids to selectively compute risk on a
subgroup of the samples specified by subgroup_ids.
Returns:
aurrrc_list: list containing the area under risk rejection rate curves.
rejection_rate_list: list containing the binned rejection rates.
selection_thresholds_list: list containing the binned selection thresholds.
Computes the metrics specified in option, which can be a string or a list of strings. The default option all computes
the [“rmse”, “nll”, “auucc_gain”, “picp”, “mpiw”, “r2”] metrics.
Parameters:
y_true – Ground truth
y_mean – predicted mean
y_lower – predicted lower bound
y_upper – predicted upper bound
option – String or list of strings containing the names of the metrics to be computed.
nll_fn – Function that evaluates the NLL; if None, a Gaussian NLL is computed using y_mean and y_lower.
Prediction Interval Coverage Probability (PICP). Computes the fraction of samples for which the ground truth lies
within the predicted interval. Measures the prediction-interval calibration for regression.
Parameters:
y_true – Ground truth
y_lower – predicted lower bound
y_upper – predicted upper bound
Returns:
The fraction of samples for which the ground truth lies within the predicted interval.
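The definition reduces to a one-line NumPy computation (array names as documented above):

    import numpy as np

    picp = np.mean((y_true >= y_lower) & (y_true <= y_upper))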
Class with main functions of the Uncertainty Characteristics Curve (UCC).
Parameters:
normalize – set initial axes normalization flag (can be changed via set_coordinates())
precompute_bias_data – if True, fit() will compute statistics necessary to generate bias-based
UCCs (in addition to the scale-based ones). Skipping this precomputation may speed up the fit() call
if bias-based UCC is not needed.
Calculates internal arrays necessary for other methods (plotting, auc, cost minimization).
Re-entrant.
Parameters:
X – [numsamples, 3] numpy matrix, or list of numpy matrices.
Col 1: predicted values
Col 2: lower band (deviate) wrt predicted value (always positive)
Col 3: upper band wrt predicted value (always positive)
If a list is provided, all methods will output corresponding metrics as lists as well!
gt – Ground truth array (i.e., the ‘actual’ values corresponding to the predictions in X).
aucfct – specifies the AUC integrator (can be “trapz” or “simps”)
partial_x – tuple (x_min, x_max) defining the interval on x over which to calculate a partial AUC.
The interval bounds refer to the axes as visualized (i.e., potentially normalized).
partial_y – tuple (y_min, y_max) defining the interval on y over which to calculate a partial AUC. partial_x must be None.
Returns:
list of floats with AUUCCs for each input component, or a single float, if there is only 1 component.
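A usage sketch for fit(): the [numsamples, 3] matrix is assembled from a prediction and its positive lower/upper deviates. The import path is taken from the UQ360 package layout and may differ across versions:

    import numpy as np
    from uq360.metrics.uncertainty_characteristics_curve import UncertaintyCharacteristicsCurve  # assumed import path

    X = np.column_stack([y_mean,
                         y_mean - y_lower,    # Col 2: positive lower deviate
                         y_upper - y_mean])   # Col 3: positive upper deviate
    ucc = UncertaintyCharacteristicsCurve(normalize=True, precompute_bias_data=True)
    ucc.fit(X, y_true)
    # AUC, operating-point, cost-minimization, and plotting calls then follow,
    # per the methods documented around this section.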
Finds the corresponding operating point on the current UCC, given a point on either the x or y axis. Returns
a list of recipes describing how to achieve the point (x, y), for each component. If there is only one component,
a single recipe dict is returned.
Parameters:
req_x_axis_value – requested x value on UCC (normalization status is taken from current display)
req_y_axis_value – requested y value on UCC (normalization status is taken from current display)
vary_bias – Set to True when referring to the bias-induced UCC (the scale-based UCC is the default).
Find minima of a linear cost function for each component.
Cost function C = x_axis_cost * x_axis_value + y_axis_cost * y_axis_value.
A minimum can occur in the scale-based or bias-based UCC (this can be constrained by the ‘search’ arg).
The function returns a ‘recipe’ describing how to achieve the corresponding minimum, for each component.
Parameters:
x_axis_cost – weight of one unit on x_axis
y_axis_cost – weight of one unit on y_axis
augment_cost_by_normfactor – when False, the cost multipliers will apply as is. If True, they will be
pre-normed by the corresponding axis norm (where applicable), to account for range differences between axes.
search – list of types over which minimization is to be performed, valid elements are ‘scale’ and ‘bias’.
Returns:
list of dicts - one per component, or a single dict, if there is only one component. Dict keys are -
‘operation’: can be ‘bias’ (additive) or ‘scale’ (multiplicative), ‘modvalue’: value to multiply by or to
add to error bars to achieve the minimum, ‘new_x’/’new_y’: new coordinates (operating point) with that
minimum, ‘cost’: new cost at minimum point, ‘original_cost’: original cost (original operating point).
Plots/displays the UCC based on the current data and coordinates. Multiple curves will be shown
if there are multiple data components (via fit()).
Parameters:
titlestr – Plot title string
syslabel – List of label strings to appear in the plot legend. Can be a single string if there is one component.
outfn – base name of an image file to be created (will append .png before creating)
vary_bias – True will switch to varying additive bias (default is multiplicative scale)
markers – None or a list of marker styles to be used for each curve.
List must be same or longer than number of components.
Markers can be one among these [‘o’, ‘s’, ‘v’, ‘D’, ‘+’].
xlim – tuple or list specifying the range for the x axis, or None (auto)
ylim – tuple or list specifying the range for the y axis, or None (auto)
**kwargs – Additional arguments passed to the main plot call.
Returns:
list of areas under the curve (or single area, if one data component)
list of operating points (or single op): format of an op is tuple (xaxis value, yaxis value, xunit, yunit)
Assigns user-specified type to the axes and normalization behavior (sticky).
Parameters:
x_axis_name – None-> unchanged, or name from self.axes_name2idx
y_axis_name – ditto
normalize – True/False will activate/deactivate normalization for the specified axes. Axes whose
name is None are not affected.
A value of None leaves the normalization status unchanged.
Note: the ‘missrate’ axis is never normalized, even when normalize == True.
Sets the UCC’s unit to be used when displaying normalized axes.
Parameters:
std_unit – If None, the unit is calculated as the standard deviation of the ground truth data
(a ValueError is raised if the data has not been set at this point);
otherwise, the user-specified value is used.