Extrinsic UQ Algorithms
Auxiliary Interval Predictor
- class uq360.algorithms.auxiliary_interval_predictor.AuxiliaryIntervalPredictor(model_type=None, main_model=None, aux_model=None, config=None, device=None, verbose=True)
Auxiliary Interval Predictor [1] uses an auxiliary model to encourage calibration of the main model.
- Parameters:
model_type – The model type used to build the main model and the auxiliary model. Currently supported values are [mlp, custom]. With mlp, an MLP neural network is trained using the PyTorch framework. With custom, the user provides main_model and aux_model.
main_model – (optional) The main prediction model. Currently supports PyTorch models that return the mean and log variance.
aux_model – (optional) The auxiliary prediction model. Currently supports PyTorch models that return a calibrated log variance.
config – dictionary containing the config parameters for the model.
device – device used for PyTorch models; ignored otherwise.
verbose – if True, print statements with the progress are enabled.
- fit(X, y)
Fit the Auxiliary Interval Predictor model.
- Parameters:
X – array-like of shape (n_samples, n_features). Feature vectors of the training data.
y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.
- Returns:
self
- predict(X, return_dists=False)
Obtain predictions for the test points.
With return_dists=True, the full predictive distribution is returned in addition to the mean and the lower/upper bounds.
- Parameters:
X – array-like of shape (n_samples, n_features). Feature vectors of the test points.
return_dists – If True, the predictive distribution for each instance is returned as a scipy distribution.
- Returns:
A namedtuple that holds
- y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
- y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
- y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
- dists: list of predictive distribution as scipy.stats objects with length n_samples.
Only returned when return_dists is True.
- Return type:
namedtuple
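The main model is described above as returning a mean and a log variance, while predict returns a mean with lower and upper bounds. As a minimal sketch of how such outputs could be mapped to an interval, the snippet below assumes a Gaussian predictive distribution and a 95% central interval. Both are assumptions, as are the `Result` container and the `gaussian_interval` helper, which are hypothetical and not part of uq360.

```python
import math
from collections import namedtuple
from statistics import NormalDist

# Hypothetical container that mirrors the documented field names.
Result = namedtuple("Result", ["y_mean", "y_lower", "y_upper"])

def gaussian_interval(means, log_vars, coverage=0.95):
    """Turn per-point means and log variances into central intervals."""
    # Two-sided Gaussian quantile, about 1.96 for 95% coverage.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    # Standard deviation from log variance: std = exp(log_var / 2).
    stds = [math.exp(0.5 * lv) for lv in log_vars]
    lower = [m - z * s for m, s in zip(means, stds)]
    upper = [m + z * s for m, s in zip(means, stds)]
    return Result(list(means), lower, upper)

# Two test points: variance 1 and variance 4.
res = gaussian_interval([0.0, 2.0], [0.0, math.log(4.0)])
print(res.y_lower, res.y_upper)
```

The second point has twice the standard deviation of the first, so its interval is twice as wide.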
Blackbox Metamodel Classification
Blackbox Metamodel Regression
Infinitesimal Jackknife
- class uq360.algorithms.infinitesimal_jackknife.InfinitesimalJackknife(params, gradients, hessian, config)
Performs a first order Taylor series expansion around MLE / MAP fit. Requires the model being probed to be twice differentiable.
Initialize IJ.
- Parameters:
params – MLE / MAP fit around which uncertainty is sought. d*1
gradients – Per data point gradients, estimated at the MLE / MAP fit. d*n
hessian – Hessian evaluated at the MLE / MAP fit. d*d
- approx_ij(w_query)
- Parameters:
w_query – An n*1 vector to query parameters at.
- Returns:
new parameters at w_query
- get_params(deep=True)
This method should not take any arguments and returns a dict of the __init__ parameters.
- ij(w_query)
- Parameters:
w_query – An n*1 vector to query parameters at.
- Returns:
new parameters at w_query
- predict(X, model)
- Parameters:
X – array-like of shape (n_samples, n_features). Feature vectors of the test points.
model – model object, must implement a set_parameters function
- Returns:
A namedtuple that holds
- y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
- y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
- y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
- Return type:
namedtuple
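The first-order expansion described for this class can be illustrated on a toy problem. The sketch below is a generic infinitesimal jackknife calculation, not the uq360 implementation: it assumes a one-parameter model that minimizes a weighted sum of squared errors, so the exact reweighted fit is the weighted mean and can be compared with the approximation. The names `ij_approx` and `exact_fit` are hypothetical, and the sign and scaling conventions of the class may differ.

```python
# Toy model: theta minimizes sum_i w_i * 0.5 * (theta - x_i) ** 2,
# so the exact solution under weights w is the weighted mean of x.
x = [1.0, 2.0, 4.0, 9.0]
n = len(x)

theta_hat = sum(x) / n                    # fit at unit weights (w = 1)
gradients = [theta_hat - xi for xi in x]  # per-point gradients at theta_hat
hessian = float(n)                        # sum of per-point second derivatives

def ij_approx(w_query):
    """First-order estimate of the fit under new data weights."""
    shift = sum(g * (w - 1.0) for g, w in zip(gradients, w_query))
    return theta_hat - shift / hessian

def exact_fit(w_query):
    """Exact refit, available in closed form for this toy model."""
    return sum(w * xi for w, xi in zip(w_query, x)) / sum(w_query)

w_small = [1.0, 1.0, 1.0, 0.9]  # slightly down-weight the last point
w_loo = [1.0, 1.0, 1.0, 0.0]    # leave the last point out entirely

print(ij_approx(w_small), exact_fit(w_small))
print(ij_approx(w_loo), exact_fit(w_loo))
```

The approximation is close for small weight changes and degrades for larger ones such as leave-one-out, which is expected of a first-order expansion.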
Classification Calibration
- class uq360.algorithms.classification_calibration.ClassificationCalibration(num_classes, fit_mode='features', method='isotonic', base_model_prediction_func=None)
Post hoc calibration of classification models. Currently wraps CalibratedClassifierCV from sklearn and allows non-sklearn models to be calibrated.
- Parameters:
num_classes – number of classes.
fit_mode – features or probs. If probs, fit and predict operate on the base model's probability scores, which is useful when these are precomputed.
method – isotonic or sigmoid.
base_model_prediction_func – the function that takes in the input features and produces base model’s probability scores. This is ignored when operating in probs mode.
- fit(X, y)
Fits calibration model using the provided calibration set.
- Parameters:
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes). Feature vectors of the training data or the probability scores from the base model.
y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.
- Returns:
self
- get_params(deep=True)
This method should not take any arguments and returns a dict of the __init__ parameters.
- predict(X)
Obtain calibrated predictions for the test points.
- Parameters:
X – array-like of shape (n_samples, n_features) or (n_samples, n_classes). Feature vectors of the test points or the probability scores from the base model.
- Returns:
A namedtuple that holds
- y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
- y_prob: ndarray of shape (n_samples, n_classes)
Predicted probability scores of the classes.
- Return type:
namedtuple
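To give a feel for what the isotonic method does to precomputed scores (the probs mode), here is a small pool-adjacent-violators sketch for the binary case. It is an illustration only: scikit-learn's implementation differs (for example, it interpolates between points), and the helpers `pav_fit` and `pav_predict` are hypothetical names. The sketch assumes distinct scores.

```python
def pav_fit(scores, labels):
    """Pool Adjacent Violators: fit a non-decreasing step function."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds this one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    # (largest score covered by the block, calibrated probability)
    return [(b[2], b[0] / b[1]) for b in blocks]

def pav_predict(steps, score):
    """Calibrated probability of the first block whose range covers score."""
    for upper, prob in steps:
        if score <= upper:
            return prob
    return steps[-1][1]

# Base-model scores for the positive class and the true binary labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

steps = pav_fit(scores, labels)
print(steps)
print(pav_predict(steps, 0.5))
```

The out-of-order pair at scores 0.4 and 0.6 is pooled into a single block with calibrated probability 0.5, which keeps the mapping monotone.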
UCC Recalibration
- class uq360.algorithms.ucc_recalibration.UCCRecalibration(base_model)
Recalibrates a regression model to a specified operating point using the Uncertainty Characteristics Curve (UCC).
- Parameters:
base_model – pretrained model to be recalibrated.
- fit(X, y)
Fit the Uncertainty Characteristics Curve.
- Parameters:
X – array-like of shape (n_samples, n_features). Feature vectors of the test points.
y – array-like of shape (n_samples,) or (n_samples, n_targets). Target values.
- Returns:
self
- get_params(deep=True)
This method should not take any arguments and returns a dict of the __init__ parameters.
- predict(X, missrate=0.05)
Generate prediction and uncertainty bounds for data X.
- Parameters:
X – array-like of shape (n_samples, n_features). Feature vectors of the test points.
missrate – desired missrate of the new operating point, set to 0.05 by default.
- Returns:
A namedtuple that holds
- y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
- y_lower: ndarray of shape (n_samples, [n_output_dims])
Lower quantile of predictive distribution of the test points.
- y_upper: ndarray of shape (n_samples, [n_output_dims])
Upper quantile of predictive distribution of the test points.
- Return type:
namedtuple
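The missrate parameter can be read as the fraction of targets that fall outside the predicted bounds. The sketch below shows one simplified way to move to a new operating point: rescale every band by a common factor chosen on held-out data. This is a stand-in for illustration and not the UCC procedure used by the class, which may apply other adjustments. The helpers `miss_rate`, `fit_scale`, and `rescale` are hypothetical and assume bands of positive width.

```python
import math

def miss_rate(y, lower, upper):
    """Fraction of targets falling outside [lower, upper]."""
    misses = sum(1 for t, lo, hi in zip(y, lower, upper) if t < lo or t > hi)
    return misses / len(y)

def fit_scale(y, mean, lower, upper, missrate=0.05):
    """Smallest common scale on the half-widths that meets the miss rate."""
    # Ratio of each error to the half-width on the side where it occurs.
    ratios = sorted(
        (t - m) / (hi - m) if t >= m else (m - t) / (m - lo)
        for t, m, lo, hi in zip(y, mean, lower, upper)
    )
    # Number of points that must be covered (small epsilon guards rounding).
    k = math.ceil((1.0 - missrate) * len(ratios) - 1e-9)
    return ratios[max(k, 1) - 1]

def rescale(mean, lower, upper, scale):
    """Stretch or shrink each band around its mean by the same factor."""
    new_lower = [m - scale * (m - lo) for m, lo in zip(mean, lower)]
    new_upper = [m + scale * (hi - m) for m, hi in zip(mean, upper)]
    return new_lower, new_upper

# Toy held-out set with unit-width bands around a zero mean.
y = [0.5, -1.5, 2.0, 0.1, -0.2, 3.0, 0.4, -0.8, 1.2, 0.9]
mean, lower, upper = [0.0] * 10, [-1.0] * 10, [1.0] * 10

before = miss_rate(y, lower, upper)
scale = fit_scale(y, mean, lower, upper, missrate=0.2)
new_lower, new_upper = rescale(mean, lower, upper, scale)
after = miss_rate(y, new_lower, new_upper)
print(before, scale, after)
```

On this toy data the original bands miss 4 of 10 targets; widening them by a factor of 1.5 brings the miss rate down to the requested 0.2.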
Structured Data Predictor
- class uq360.algorithms.blackbox_metamodel.structured_data_classification.StructuredDataClassificationWrapper(base_model=None)
This predictor allows flexible feature and calibrator configurations, and uses a meta-model that is an ensemble of a GBM and a logistic regression model. It returns no error bars (constant zero error bars) of its own. It is a PostHocUQ model based on the “structured_data” performance predictor (uq360.algorithms.blackbox_metamodel.predictors.core.structured_data.py).
Returns an instance of a structured data predictor.
- Parameters:
base_model – scikit-learn estimator instance that can return confidence scores (predict_proba). base_model can also be None.
- Returns:
predictor instance
- fit(x_train, y_train, x_test, y_test, test_predicted_probabilities=None)
Fit base and meta models.
- Parameters:
x_train – Feature vectors of the training data.
y_train – Labels of the training data.
x_test – Feature vectors of the test data.
y_test – Labels of the test data.
test_predicted_probabilities – predicted probabilities on the test data; should be passed if the predictor is not instantiated with a base model.
- Returns:
self
- predict(x, return_predictions=True, predicted_probabilities=None)
Generate a base prediction for incoming data x
- Parameters:
x – array-like of shape (n_samples, n_features). Feature vectors of the test points.
return_predictions – per-data-point predictions are returned when this flag is True.
predicted_probabilities – when the predictor is instantiated without a base model, the predicted probabilities on x from the pre-trained model should be passed to predict.
- Returns:
A namedtuple that holds
- y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
- y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
- y_score: ndarray of shape (n_samples,)
Confidence score of the test points.
- Return type:
namedtuple
Short Text Predictor
- class uq360.algorithms.blackbox_metamodel.short_text_classification.ShortTextClassificationWrapper(base_model=None, encoder=None)
This is very similar to the structured data predictor, but it is fine-tuned to handle text data. The meta-model used by the predictor is an ensemble of an SVM, a GBM, and an MLP. Feature vectors can be either raw text or pre-encoded vectors. If raw text is passed and no encoder is specified at initialization, USE embeddings are used by default. It is a PostHocUQ model based on the “text_ensemble” performance predictor (uq360.algorithms.blackbox_metamodel.predictors.core.short_text.py).
Returns an instance of a short text predictor.
- Parameters:
base_model – scikit-learn estimator instance that can return confidence scores (predict_proba). base_model can also be None.
- Returns:
predictor instance
- fit(x_train, y_train, x_test, y_test, test_predicted_probabilities=None)
Fit base and meta models.
- Parameters:
x_train – Feature vectors of the training data.
y_train – Labels of the training data.
x_test – Feature vectors of the test data.
y_test – Labels of the test data.
test_predicted_probabilities – predicted probabilities on the test data; should be passed if the predictor is not instantiated with a base model.
- Returns:
self
- predict(x, return_predictions=True, predicted_probabilities=None)
Generate a base prediction for incoming data x
- Parameters:
x – array-like of shape (n_samples, n_features). Feature vectors of the test points.
return_predictions – per-data-point predictions are returned when this flag is True.
predicted_probabilities – when the predictor is instantiated without a base model, the predicted probabilities on x from the pre-trained model should be passed to predict.
- Returns:
A namedtuple that holds
- y_mean: ndarray of shape (n_samples, [n_output_dims])
Mean of predictive distribution of the test points.
- y_pred: ndarray of shape (n_samples,)
Predicted labels of the test points.
- y_score: ndarray of shape (n_samples,)
Confidence score of the test points.
- Return type:
namedtuple
Confidence Predictor
Latent Space Anomaly Detection Scores
- class uq360.algorithms.layer_scoring.mahalanobis.MahalanobisScorer(model=None, layer=None)
Implementation of the Mahalanobis Adversarial/Out-of-distribution detector [1].
[1] “A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks”, K. Lee et al., NIPS 2018.
- Parameters:
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to be latent vectors already. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.
- fit(X: ndarray, y: ndarray)
Register data X and class labels y as in-distribution data
- get_params()
This method is parameterless
- predict(X: ndarray)
Compute the Mahalanobis distance between query data X and the in-distribution data classes
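As a rough illustration of the distance computed here, the sketch below follows the general recipe of [1]: one mean per class, a single covariance shared by all classes, and a score equal to the smallest squared Mahalanobis distance to any class mean. It is not the uq360 implementation. It assumes 2-D latent vectors so that the covariance can be inverted by hand, and the helper names are hypothetical.

```python
def class_means(X, y):
    """Mean latent vector of each class (2-D inputs)."""
    sums, counts = {}, {}
    for (a, b), c in zip(X, y):
        sa, sb = sums.get(c, (0.0, 0.0))
        sums[c] = (sa + a, sb + b)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def tied_precision(X, y, means):
    """Inverse of the covariance shared by all classes (2x2 case)."""
    sxx = sxy = syy = 0.0
    for (a, b), c in zip(X, y):
        da, db = a - means[c][0], b - means[c][1]
        sxx += da * da
        sxy += da * db
        syy += db * db
    n = len(X)
    sxx, sxy, syy = sxx / n, sxy / n, syy / n
    det = sxx * syy - sxy * sxy
    return (syy / det, -sxy / det, sxx / det)  # (p_xx, p_xy, p_yy)

def mahalanobis_score(x, means, prec):
    """Smallest squared Mahalanobis distance from x to any class mean."""
    pxx, pxy, pyy = prec
    best = float("inf")
    for ma, mb in means.values():
        da, db = x[0] - ma, x[1] - mb
        best = min(best, pxx * da * da + 2 * pxy * da * db + pyy * db * db)
    return best

# Two well-separated in-distribution classes.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

means = class_means(X, y)
prec = tied_precision(X, y, means)
print(mahalanobis_score((1, 1), means, prec))  # a training point
print(mahalanobis_score((3, 3), means, prec))  # between the two classes
```

A point lying between the two classes receives a much larger score than a training point, which is the behavior an out-of-distribution detector relies on.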
- class uq360.algorithms.layer_scoring.knn.KNNScorer(n_neighbors: int, method: str = 'knn', nearest_neighbors: Optional[BaseNearestNeighbors] = None, nearest_neighbors_kwargs={}, model=None, layer=None)
KNN-based latent space anomaly detector. Return some measure of distance to the training data.
- Parameters:
n_neighbors – number of nearest neighbors to consider in in-distribution data
method – one of (“knn”, “avg”, “lid”). These correspond respectively to the distance to the k-th neighbor, the mean distance to the k nearest neighbors, and the local intrinsic dimensionality (LID) estimate.
nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors
nearest_neighbors_kwargs – keyword arguments for the NN algorithm
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to be latent vectors already. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.
- fit(X: ndarray)
Register X as in-distribution data
- get_params()
This method should not take any arguments and returns a dict of the __init__ parameters.
- predict(X: ndarray, n_neighbors=None, method: Optional[str] = None)
Compute a KNN-distance-based anomaly score on query data X.
- Parameters:
X – query data
n_neighbors – number of nearest neighbors to consider in in-distribution data
method – one of (“knn”, “avg”, “lid”). These correspond respectively to the distance to the k-th neighbor, the mean distance to the k nearest neighbors, and the local intrinsic dimensionality (LID) estimate.
- Returns:
anomaly scores
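The “knn” and “avg” scores can be sketched with a brute-force search in plain Python. This is an illustration under simple assumptions (Euclidean distance, a single query point) and not the uq360 implementation; the function `knn_score` is a hypothetical name, and the “lid” method is left out of the sketch.

```python
import math

def knn_score(train, query, n_neighbors, method="knn"):
    """Distance-based anomaly score of one query point against train."""
    # Distances to the n_neighbors closest in-distribution points, ascending.
    dists = sorted(math.dist(query, p) for p in train)[:n_neighbors]
    if method == "knn":
        return dists[-1]                # distance to the k-th neighbor
    if method == "avg":
        return sum(dists) / len(dists)  # mean distance over the k neighbors
    raise ValueError(f"unsupported method: {method}")

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

inlier = knn_score(train, (0.5, 0.5), n_neighbors=2)
outlier = knn_score(train, (5.0, 5.0), n_neighbors=2)
outlier_avg = knn_score(train, (5.0, 5.0), n_neighbors=2, method="avg")
print(inlier, outlier, outlier_avg)
```

Points far from the registered in-distribution data get larger scores under both methods; “avg” is smoother because it uses all k distances rather than only the k-th.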
- class uq360.algorithms.layer_scoring.aklpe.AKLPEScorer(nearest_neighbors: Optional[BaseNearestNeighbors] = None, nearest_neighbors_kwargs={}, n_neighbors: int = 50, n_bootstraps: int = 10, batch_size: int = 1, random_state: int = 123, model=None, layer=None)
Implementation of Averaged K nearest neighbors Localized P-value Estimation (aK_LPE) [1].
[1] J. Qian and V. Saligrama, “New statistic in P-value estimation for anomaly detection,” 2012 IEEE Statistical Signal Processing Workshop (SSP)
- Parameters:
nearest_neighbors – nearest neighbor algorithm, see uq360.utils.transformers.nearest_neighbors
nearest_neighbors_kwargs – keyword arguments for the NN algorithm
n_neighbors – number of NN to consider
n_bootstraps – number of bootstraps to estimate the p-value
batch_size – int
random_state – seed for RNG
model – torch Module to analyze
layer – layer (torch Module) inside the model whose output is to be analyzed
Notes
The model and layer arguments are optional. If no model or layer is provided, the inputs are expected to be latent vectors already. If both a model and a layer are provided, the inputs are expected to be model inputs, which are mapped to latent vectors.
- fit(X: ndarray)
Register X as in-distribution data
- get_params()
This method should not take any arguments and returns a dict of the __init__ parameters.
- predict(X: ndarray)
Compute the anomaly score based on the AKLPE G-statistics
- Parameters:
X – query vector
- Returns:
g_stats, p_value containing respectively the G-statistics and the corresponding AKLPE anomaly p-value.
- Return type:
pair of numpy arrays
Nearest Neighbors Algorithms for KNN-based anomaly detection
- class uq360.utils.transformers.nearest_neighbors.exact.ExactNearestNeighbors
Exact nearest neighbor search using scikit-learn