Extrinsic UQ Algorithms

Auxiliary Interval Predictor

class uq360.algorithms.auxiliary_interval_predictor.AuxiliaryIntervalPredictor(model_type=None, main_model=None, aux_model=None, config=None, device=None, verbose=True)

Auxiliary Interval Predictor [1] uses an auxiliary model to encourage calibration of the main model.

References

Parameters:
  • model_type – The model type used to build the main model and the auxiliary model. Currently supported values are [mlp, custom]. The mlp model type learns an MLP neural network using the PyTorch framework. For custom, the user provides main_model and aux_model.

  • main_model – (optional) The main prediction model. Currently supports PyTorch models that return mean and log variance.

  • aux_model – (optional) The auxiliary prediction model. Currently supports PyTorch models that return calibrated log variance.

  • config – dictionary containing the config parameters for the model.

  • device – device used for PyTorch models; ignored otherwise.

  • verbose – if True, print statements with the progress are enabled.

fit(X, y)

Fit the Auxiliary Interval Predictor model.

Parameters:
  • X – array-like of shape (n_samples, n_features). Feature vectors of the training data.

  • y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.

Returns:

self

predict(X, return_dists=False)

Obtain predictions for the test points.

Returns the mean and the lower/upper bounds. When return_dists=True, the full predictive distribution is returned as well.

Parameters:
  • X – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • return_dists – If True, the predictive distribution for each instance is also returned as a scipy distribution.

Returns:

A namedtuple that holds

y_mean: ndarray of shape (n_samples, [n_output_dims])

Mean of predictive distribution of the test points.

y_lower: ndarray of shape (n_samples, [n_output_dims])

Lower quantile of predictive distribution of the test points.

y_upper: ndarray of shape (n_samples, [n_output_dims])

Upper quantile of predictive distribution of the test points.

dists: list of predictive distribution as scipy.stats objects with length n_samples.

Only returned when return_dists is True.

Return type:

namedtuple
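
The fields of the returned namedtuple can be read by name. The sketch below does not call UQ360: it builds a stand-in namedtuple (the name Result and the synthetic numbers are assumptions for illustration) and shows how the y_lower and y_upper fields might be summarized by empirical coverage and average interval width.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the namedtuple returned by predict();
# only the field names y_mean, y_lower, y_upper come from the documentation.
Result = namedtuple("Result", ["y_mean", "y_lower", "y_upper"])

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_mean = y_true + rng.normal(scale=0.5, size=200)  # synthetic predictions
res = Result(y_mean=y_mean, y_lower=y_mean - 1.0, y_upper=y_mean + 1.0)

# Fraction of targets that fall inside [y_lower, y_upper].
inside = (y_true >= res.y_lower) & (y_true <= res.y_upper)
coverage = inside.mean()

# Average interval width.
mean_width = (res.y_upper - res.y_lower).mean()

print(f"coverage={coverage:.3f}, mean width={mean_width:.3f}")
```

With a real fitted predictor, res would be the value returned by predict(X) and y_true the held-out targets.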

Blackbox Metamodel Classification

Blackbox Metamodel Regression

Infinitesimal Jackknife

class uq360.algorithms.infinitesimal_jackknife.InfinitesimalJackknife(params, gradients, hessian, config)

Performs a first-order Taylor series expansion around the MLE / MAP fit. Requires the model being probed to be twice differentiable.

Initialize IJ.

Parameters:
  • params – MLE / MAP fit around which uncertainty is sought. d*1

  • gradients – Per data point gradients, estimated at the MLE / MAP fit. d*n

  • hessian – Hessian evaluated at the MLE / MAP fit. d*d

approx_ij(w_query)
Parameters:

w_query – An n*1 vector to query parameters at.

Returns:

new parameters at w_query

get_params(deep=True)

This method should not take any arguments and returns a dict of the __init__ parameters.

ij(w_query)
Parameters:

w_query – An n*1 vector to query parameters at.

Returns:

new parameters at w_query
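
As a rough illustration of what a first-order expansion around the fit computes, the sketch below assumes the standard infinitesimal jackknife linearization, theta(w) ≈ theta_hat − H⁻¹ G (w − 1), where G holds the d*n per-data-point gradients and H is the d*d Hessian. The least-squares setup and the name ij_sketch are assumptions for illustration; this is not the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Fit at unit weights (ordinary least squares), playing the role of the MLE fit.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # shape (d,)

# Per-data-point gradients of 0.5 * (x_i^T theta - y_i)^2 at theta_hat: d x n.
residuals = X @ theta_hat - y
gradients = (X * residuals[:, None]).T

# Hessian of the summed loss at theta_hat: d x d.
hessian = X.T @ X


def ij_sketch(w_query):
    """First-order estimate of the parameters under data weights w_query."""
    w_one = np.ones(n)
    return theta_hat - np.linalg.solve(hessian, gradients @ (w_query - w_one))


# Down-weight the first data point and compare with an exact weighted refit.
w = np.ones(n)
w[0] = 0.9
theta_ij = ij_sketch(w)
theta_exact = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
print(np.abs(theta_ij - theta_exact).max())
```

At w equal to all ones the sketch returns theta_hat unchanged. For small weight perturbations the linearized parameters stay close to the exact refit without solving a new optimization problem.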

predict(X, model)
Parameters:
  • X – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • model – model object, must implement a set_parameters function

Returns:

A namedtuple that holds

y_mean: ndarray of shape (n_samples, [n_output_dims])

Mean of predictive distribution of the test points.

y_lower: ndarray of shape (n_samples, [n_output_dims])

Lower quantile of predictive distribution of the test points.

y_upper: ndarray of shape (n_samples, [n_output_dims])

Upper quantile of predictive distribution of the test points.

Return type:

namedtuple

Classification Calibration

class uq360.algorithms.classification_calibration.ClassificationCalibration(num_classes, fit_mode='features', method='isotonic', base_model_prediction_func=None)

Post hoc calibration of classification models. Currently wraps CalibratedClassifierCV from sklearn and allows non-sklearn models to be calibrated.

Parameters:
  • num_classes – number of classes.

  • fit_mode – features or probs. If probs, fit and predict operate on the base model’s probability scores, which is useful when these are precomputed.

  • method – isotonic or sigmoid.

  • base_model_prediction_func – the function that takes in the input features and produces base model’s probability scores. This is ignored when operating in probs mode.

fit(X, y)

Fits calibration model using the provided calibration set.

Parameters:
  • X – array-like of shape (n_samples, n_features) or (n_samples, n_classes). Feature vectors of the training data or the probability scores from the base model.

  • y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.

Returns:

self

get_params(deep=True)

This method should not take any arguments and returns a dict of the __init__ parameters.

predict(X)

Obtain calibrated predictions for the test points.

Parameters:

X – array-like of shape (n_samples, n_features) or (n_samples, n_classes). Feature vectors of the training data or the probability scores from the base model.

Returns:

A namedtuple that holds

y_pred: ndarray of shape (n_samples,)

Predicted labels of the test points.

y_prob: ndarray of shape (n_samples, n_classes)

Predicted probability scores of the classes.

Return type:

namedtuple

UCC Recalibration

class uq360.algorithms.ucc_recalibration.UCCRecalibration(base_model)

Recalibrates a regression model to a specified operating point using the Uncertainty Characteristics Curve.

Parameters:

base_model – pretrained model to be recalibrated.

fit(X, y)

Fit the Uncertainty Characteristics Curve.

Parameters:
  • X – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.

Returns:

self

get_params(deep=True)

This method should not take any arguments and returns a dict of the __init__ parameters.

predict(X, missrate=0.05)

Generate prediction and uncertainty bounds for data X.

Parameters:
  • X – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • missrate – desired missrate of the new operating point, set to 0.05 by default.

Returns:

A namedtuple that holds

y_mean: ndarray of shape (n_samples, [n_output_dims])

Mean of predictive distribution of the test points.

y_lower: ndarray of shape (n_samples, [n_output_dims])

Lower quantile of predictive distribution of the test points.

y_upper: ndarray of shape (n_samples, [n_output_dims])

Upper quantile of predictive distribution of the test points.

Return type:

namedtuple
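
To make the missrate parameter concrete, the sketch below measures the miss rate of a set of intervals (the fraction of targets that fall outside [y_lower, y_upper]) and then rescales the interval half-widths by a single factor chosen on calibration data so that roughly 5% of targets are missed. This is a deliberately simplified stand-in, not the Uncertainty Characteristics Curve procedure used by the class. The synthetic data and the names miss_rate, half_width and scale are assumptions for illustration.

```python
import numpy as np


def miss_rate(y_true, y_lower, y_upper):
    """Fraction of targets that fall outside [y_lower, y_upper]."""
    return np.mean((y_true < y_lower) | (y_true > y_upper))


rng = np.random.default_rng(0)
n = 1000
y_mean = rng.normal(size=n)                      # synthetic base-model predictions
y_true = y_mean + rng.normal(scale=1.0, size=n)  # synthetic targets
half_width = np.full(n, 0.5)                     # deliberately too-narrow bands

before = miss_rate(y_true, y_mean - half_width, y_mean + half_width)

# Pick one scale factor so that about 5% of calibration targets are missed.
target_missrate = 0.05
ratio = np.abs(y_true - y_mean) / half_width
scale = np.quantile(ratio, 1.0 - target_missrate)

after = miss_rate(y_true, y_mean - scale * half_width, y_mean + scale * half_width)
print(f"miss rate before={before:.3f}, after={after:.3f}, scale={scale:.2f}")
```

The trade-off is the usual one: a lower miss rate is bought with wider intervals, so the operating point should be chosen with both quantities in view.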

Structured Data Predictor

class uq360.algorithms.blackbox_metamodel.structured_data_classification.StructuredDataClassificationWrapper(base_model=None)

This predictor allows flexible feature and calibrator configurations, and uses a meta-model which is an ensemble of a GBM and a Logistic Regression model. It returns no error bars (constant zero error bars) of its own. It is a PostHocUQ model based on the “structured_data” performance predictor (uq360.algorithms.blackbox_metamodel.predictors.core.structured_data.py).

Returns an instance of a structured data predictor.

Parameters:

base_model – scikit-learn estimator instance that can return confidence scores (predict_proba). base_model can also be None.

Returns:

predictor instance

fit(x_train, y_train, x_test, y_test, test_predicted_probabilities=None)

Fit base and meta models.

Parameters:
  • x_train – Feature vectors of the training data.

  • y_train – Labels of the training data.

  • x_test – Feature vectors of the test data.

  • y_test – Labels of the test data.

  • test_predicted_probabilities – predicted probabilities on the test data. These should be passed if the predictor is not instantiated with a base model.

Returns:

self

predict(x, return_predictions=True, predicted_probabilities=None)

Generate a base prediction for incoming data x.

Parameters:
  • x – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • return_predictions – point-wise predictions are returned when this flag is True.

  • predicted_probabilities – when the predictor is instantiated without a base model, predicted_probabilities on x from the pre-trained model should be passed to predict

Returns:

namedtuple: A namedtuple that holds

y_mean: ndarray of shape (n_samples, [n_output_dims])

Mean of predictive distribution of the test points.

y_pred: ndarray of shape (n_samples,)

Predicted labels of the test points.

y_score: ndarray of shape (n_samples,)

Confidence score of the test points.

Short Text Predictor

class uq360.algorithms.blackbox_metamodel.short_text_classification.ShortTextClassificationWrapper(base_model=None, encoder=None)

This is very similar to the structured data predictor, but it is fine-tuned to handle text data. The meta-model used by the predictor is an ensemble of an SVM, GBM, and MLP. Feature vectors can be either raw text or pre-encoded vectors. If raw text is passed and no encoder is specified in the initialization, USE embeddings will be used by default. It is a PostHocUQ model based on the “text_ensemble” performance predictor (uq360.algorithms.blackbox_metamodel.predictors.core.short_text.py).

Returns an instance of a short text predictor.

Parameters:

base_model – scikit-learn estimator instance that can return confidence scores (predict_proba). base_model can also be None.

Returns:

predictor instance

fit(x_train, y_train, x_test, y_test, test_predicted_probabilities=None)

Fit base and meta models.

Parameters:
  • x_train – Feature vectors of the training data.

  • y_train – Labels of the training data.

  • x_test – Feature vectors of the test data.

  • y_test – Labels of the test data.

  • test_predicted_probabilities – predicted probabilities on the test data. These should be passed if the predictor is not instantiated with a base model.

Returns:

self

predict(x, return_predictions=True, predicted_probabilities=None)

Generate a base prediction for incoming data x.

Parameters:
  • x – array-like of shape (n_samples, n_features). Feature vectors of the test points.

  • return_predictions – point-wise predictions are returned when this flag is True.

  • predicted_probabilities – when the predictor is instantiated without a base model, predicted_probabilities on x from the pre-trained model should be passed to predict

Returns:

namedtuple: A namedtuple that holds

y_mean: ndarray of shape (n_samples, [n_output_dims])

Mean of predictive distribution of the test points.

y_pred: ndarray of shape (n_samples,)

Predicted labels of the test points.

y_score: ndarray of shape (n_samples,)

Confidence score of the test points.

Confidence Predictor

Latent Space Anomaly Detection Scores

class uq360.algorithms.layer_scoring.mahalanobis.MahalanobisScorer(model=None, layer=None)

Implementation of the Mahalanobis Adversarial/Out-of-distribution detector [1].

[1] “A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks”, K. Lee et al., NIPS 2018.

Parameters:
  • model – torch Module to analyze

  • layer – layer (torch Module) inside the model whose output is to be analyzed

Notes

The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to already be latent vectors. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.

fit(X: ndarray, y: ndarray)

Register data X and class labels y as in-distribution data

get_params()

This method is parameterless

predict(X: ndarray)

Compute the Mahalanobis distance between query data X and the in-distribution data classes
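
A minimal NumPy sketch of a class-conditional Mahalanobis score follows, assuming per-class means and a single covariance matrix shared across classes, in the spirit of [1]. It operates directly on latent vectors and does not reproduce the scorer's internals. The function names fit_gaussians and mahalanobis_score are hypothetical.

```python
import numpy as np


def fit_gaussians(Z, y):
    """Class means and a shared (tied) precision matrix from latent vectors Z."""
    classes = np.unique(y)
    means = np.stack([Z[y == c].mean(axis=0) for c in classes])
    # Center every sample on the mean of its own class.
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(Z)
    return means, np.linalg.pinv(cov)


def mahalanobis_score(Z, means, precision):
    """Squared Mahalanobis distance to the closest class mean, per row of Z."""
    diff = Z[:, None, :] - means[None, :, :]  # (n, n_classes, d)
    dists = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    return dists.min(axis=1)


rng = np.random.default_rng(0)
Z_train = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                     rng.normal(5.0, 1.0, size=(100, 2))])
y_train = np.repeat([0, 1], 100)
means, precision = fit_gaussians(Z_train, y_train)

queries = np.array([[0.1, -0.2],     # close to class 0
                    [20.0, -20.0]])  # far from both classes
scores = mahalanobis_score(queries, means, precision)
print(scores)
```

Larger scores indicate query points that are far from every class in the latent space, which is the signal used to flag out-of-distribution or adversarial inputs.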

class uq360.algorithms.layer_scoring.knn.KNNScorer(n_neighbors: int, method: str = 'knn', nearest_neighbors: Optional[BaseNearestNeighbors] = None, nearest_neighbors_kwargs={}, model=None, layer=None)

KNN-based latent space anomaly detector. Returns a measure of distance to the training data.

Parameters:
  • n_neighbors – number of nearest neighbors to consider in in-distribution data

  • method – one of (“knn”, “avg”, “lid”). These correspond respectively to the distance to the k-th neighbor, the mean of the kNN,

  • nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors

  • nearest_neighbors_kwargs – keyword arguments for the NN algorithm

  • model – torch Module to analyze

  • layer – layer (torch Module) inside the model whose output is to be analyzed

Notes

The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to already be latent vectors. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.

fit(X: ndarray)

Register X as in-distribution data

get_params()

This method should not take any arguments and returns a dict of the __init__ parameters.

predict(X: ndarray, n_neighbors=None, method: Optional[str] = None)

Compute a KNN-distance-based anomaly score on query data X.

Parameters:
  • X – query data

  • n_neighbors – number of nearest neighbors to consider in in-distribution data

  • method – one of (“knn”, “avg”, “lid”). These correspond respectively to the distance to the k-th neighbor, the mean of the kNN,

Returns:

anomaly scores
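
The first two scoring methods can be sketched with a brute-force search in NumPy. The function knn_scores below is a hypothetical illustration on latent vectors, not the scorer's implementation. It covers only the "knn" and "avg" methods, since the description of the third method is incomplete in this reference.

```python
import numpy as np


def knn_scores(X_train, X_query, n_neighbors, method="knn"):
    """Brute-force kNN anomaly score; larger means farther from the training data."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.sort(dists, axis=1)[:, :n_neighbors]
    if method == "knn":
        return nearest[:, -1]        # distance to the k-th neighbor
    if method == "avg":
        return nearest.mean(axis=1)  # mean distance over the k neighbors
    raise ValueError(f"unsupported method in this sketch: {method}")


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
X_query = np.vstack([np.zeros(4),        # inside the training cloud
                     np.full(4, 10.0)])  # far outside it

kth = knn_scores(X_train, X_query, n_neighbors=5, method="knn")
avg = knn_scores(X_train, X_query, n_neighbors=5, method="avg")
print(kth, avg)
```

The average distance is never larger than the distance to the k-th neighbor, so "avg" gives a smoother but slightly less conservative score than "knn".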

class uq360.algorithms.layer_scoring.aklpe.AKLPEScorer(nearest_neighbors: Optional[BaseNearestNeighbors] = None, nearest_neighbors_kwargs={}, n_neighbors: int = 50, n_bootstraps: int = 10, batch_size: int = 1, random_state: int = 123, model=None, layer=None)

Implementation of Averaged K nearest neighbors Localized P-value Estimation (aK_LPE) [1].

[1] J. Qian and V. Saligrama, “New statistic in P-value estimation for anomaly detection,” 2012 IEEE Statistical Signal Processing Workshop (SSP)

Parameters:
  • nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors

  • nearest_neighbors_kwargs – keyword arguments for the NN algorithm

  • n_neighbors – number of NN to consider

  • n_bootstraps – number of bootstraps to estimate the p-value

  • batch_size – int

  • random_state – seed for RNG

  • model – torch Module to analyze

  • layer – layer (torch Module) inside the model whose output is to be analyzed

Notes

The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to already be latent vectors. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.

fit(X: ndarray)

Register X as in-distribution data

get_params()

This method should not take any arguments and returns a dict of the __init__ parameters.

predict(X: ndarray)

Compute the anomaly score based on the AKLPE G-statistics

Parameters:

X – query vector

Returns:

g_stats, p_value containing respectively the G-statistics and the corresponding AKLPE anomaly p-value.

Return type:

pair of numpy arrays

Nearest Neighbors Algorithms for KNN-based anomaly detection

class uq360.utils.transformers.nearest_neighbors.exact.ExactNearestNeighbors

Exact nearest neighbor search using scikit-learn