Classification Metrics

uq360.metrics.classification_metrics.area_under_risk_rejection_rate_curve(y_true, y_prob, y_pred=None, selection_scores=None, risk_func=<function accuracy_score>, attributes=None, num_bins=10, subgroup_ids=None, return_counts=False)

Computes the risk vs. rejection rate curve and the area under it. This is similar to a risk-coverage curve [3], which plots coverage (one minus the rejection rate) on the horizontal axis instead.

Parameters:

y_true – array-like of shape (n_samples,) ground truth labels.
y_prob – array-like of shape (n_samples, n_classes). Probability scores from the base model.
y_pred – array-like of shape (n_samples,) predicted labels.
selection_scores – scores corresponding to certainty in the predicted labels.
risk_func – risk function under consideration.
attributes – (optional) if the risk function is a fairness metric, also pass the protected attribute name.
num_bins – number of bins.
subgroup_ids – (optional) selectively compute risk on a subgroup of the samples specified by subgroup_ids.
return_counts – set to True to also return the per-bin rejection rates, selection thresholds, and risks.

Returns:

aurrrc (float): area under risk rejection rate curve.
rejection_rates (list): rejection rates for each bin (returned only if return_counts is True).
selection_thresholds (list): selection threshold for each bin (returned only if return_counts is True).
risks (list): risk in each bin (returned only if return_counts is True).

Return type:

float or tuple
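
The idea behind the curve can be sketched in plain NumPy. This is an illustrative sketch, not UQ360's implementation: the function name risk_rejection_curve is hypothetical, the selection score is assumed to be the maximum predicted probability, the risk is assumed to be the error rate (the library's default risk_func is accuracy_score), and the area is approximated by the mean risk over bins.

```python
import numpy as np

def risk_rejection_curve(y_true, y_prob, num_bins=10):
    """Hypothetical sketch: risk on the retained samples at several rejection rates."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob.argmax(axis=1)   # predicted labels
    scores = y_prob.max(axis=1)      # assumed selection score: top-class probability
    # Sort samples from most to least certain.
    order = np.argsort(-scores, kind="stable")
    n = len(y_true)

    rejection_rates, risks = [], []
    for b in range(1, num_bins + 1):
        # Keep the b/num_bins most certain samples and reject the rest.
        n_keep = max(1, (n * b) // num_bins)
        kept = order[:n_keep]
        rejection_rates.append(1.0 - n_keep / n)
        # Assumed risk: misclassification rate on the kept samples.
        risks.append(float(np.mean(y_pred[kept] != y_true[kept])))

    # Simple approximation of the area: average risk across bins.
    area = float(np.mean(risks))
    return area, rejection_rates, risks

y_true = [0, 1, 0, 1]
y_prob = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.55, 0.45]]
area, rejection_rates, risks = risk_rejection_curve(y_true, y_prob, num_bins=4)
print(area, rejection_rates, risks)
```

In this toy input the only misclassified sample is also the least certain one, so the risk stays at zero until nothing is rejected.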

uq360.metrics.classification_metrics.compute_classification_metrics(y_true, y_prob, option='all')

Computes the metrics specified by option, which can be a string or a list of strings. The default option, 'all', computes the [aurrrc, ece, auroc, nll, brier, accuracy] metrics.

Parameters:

y_true – array-like of shape (n_samples,) ground truth labels.
y_prob – array-like of shape (n_samples, n_classes). Probability scores from the base model.
option – string or list of strings containing the names of the metrics to be computed.

Returns:

a dictionary containing the computed metrics.

Return type:

dict
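
The way a string or list of strings can select metrics and yield a dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: compute_metrics, METRICS, nll, and accuracy are hypothetical names, and only two of the listed metrics are included to keep the sketch short.

```python
import numpy as np

def nll(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of the true class."""
    p_true = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def accuracy(y_true, y_prob):
    """Fraction of samples whose top-probability class matches the label."""
    return float(np.mean(y_prob.argmax(axis=1) == y_true))

# Hypothetical registry mapping metric names to functions.
METRICS = {"nll": nll, "accuracy": accuracy}

def compute_metrics(y_true, y_prob, option="all"):
    """Hypothetical sketch: return {metric name: value} for the requested metrics."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    if option == "all":
        names = list(METRICS)
    elif isinstance(option, str):
        names = [option]
    else:
        names = list(option)
    return {name: METRICS[name](y_true, y_prob) for name in names}

y_true = [0, 1]
y_prob = [[0.5, 0.5], [0.25, 0.75]]
print(compute_metrics(y_true, y_prob))              # both metrics
print(compute_metrics(y_true, y_prob, "accuracy"))  # a single metric
```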

uq360.metrics.classification_metrics.entropy_based_uncertainty_decomposition(y_prob_samples)

Entropy-based decomposition [2] of predictive uncertainty into aleatoric and epistemic components.

Parameters:

y_prob_samples – ndarray of shape (mc_samples, n_samples, n_classes) Samples from the predictive distribution. Here mc_samples stands for the number of Monte-Carlo samples, n_samples is the number of data points and n_classes is the number of classes.

Returns:

total_uncertainty: entropy of the predictive distribution.
aleatoric_uncertainty: aleatoric component of the total_uncertainty.
epistemic_uncertainty: epistemic component of the total_uncertainty.

Return type:

tuple
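
A common form of this decomposition takes the total uncertainty as the entropy of the mean predictive distribution, the aleatoric part as the mean entropy of the individual Monte-Carlo samples, and the epistemic part as their difference. The sketch below assumes that form; the function names are hypothetical and it is not taken from the library's source.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) along the last axis."""
    return -np.sum(p * np.log(np.clip(p, eps, 1.0)), axis=-1)

def decompose_uncertainty(y_prob_samples):
    """Hypothetical sketch for input of shape (mc_samples, n_samples, n_classes)."""
    samples = np.asarray(y_prob_samples, dtype=float)
    mean_prob = samples.mean(axis=0)            # (n_samples, n_classes)
    total = entropy(mean_prob)                  # entropy of the averaged prediction
    aleatoric = entropy(samples).mean(axis=0)   # average entropy of each MC sample
    epistemic = total - aleatoric               # disagreement between MC samples
    return total, aleatoric, epistemic

# Two MC samples, two data points, two classes.
# Point 0: the samples disagree completely (purely epistemic).
# Point 1: both samples are equally unsure (purely aleatoric).
y_prob_samples = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]
total, aleatoric, epistemic = decompose_uncertainty(y_prob_samples)
print(total, aleatoric, epistemic)
```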

uq360.metrics.classification_metrics.expected_calibration_error(y_true, y_prob, y_pred=None, num_bins=10, return_counts=False)

Computes the reliability curve and the expected calibration error [1].

Parameters:

y_true – array-like of shape (n_samples,) ground truth labels.
y_prob – array-like of shape (n_samples, n_classes). Probability scores from the base model.
y_pred – array-like of shape (n_samples,) predicted labels.
num_bins – number of bins.
return_counts – set to True to also return the per-bin confidences, accuracies, and sample fractions.

Returns:

ece (float): expected calibration error.
confidences_in_bins: average confidence in each bin (returned only if return_counts is True).
accuracies_in_bins: accuracy in each bin (returned only if return_counts is True).
frac_samples_in_bins: fraction of samples in each bin (returned only if return_counts is True).

Return type:

float or tuple
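
The expected calibration error is usually computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin, weighted by the fraction of samples in the bin. The sketch below assumes equal-width bins on [0, 1] and top-class probability as the confidence; the function name ece_sketch is hypothetical and the binning details may differ from the library's.

```python
import numpy as np

def ece_sketch(y_true, y_prob, num_bins=10):
    """Hypothetical sketch of ECE with equal-width confidence bins."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob.argmax(axis=1)
    confidence = y_prob.max(axis=1)
    correct = (y_pred == y_true).astype(float)

    # Interior bin edges; bin b covers (b / num_bins, (b + 1) / num_bins].
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    bin_ids = np.digitize(confidence, edges[1:-1], right=True)

    ece = 0.0
    for b in range(num_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by fraction of samples in the bin
    return float(ece)

y_true = [0, 0, 1, 1]
y_prob = [[0.95, 0.05], [0.95, 0.05], [0.45, 0.55], [0.55, 0.45]]
print(ece_sketch(y_true, y_prob))
```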

uq360.metrics.classification_metrics.multiclass_brier_score(y_true, y_prob)

Computes the Brier score for multi-class classification.

Parameters:

y_true – array-like of shape (n_samples,) ground truth labels.
y_prob – array-like of shape (n_samples, n_classes). Probability scores from the base model.

Returns:

Brier score.

Return type:

float
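
One common definition of the multi-class Brier score is the squared distance between the predicted probability vector and the one-hot label, averaged over samples. The sketch below assumes that definition and integer labels in the range 0 to n_classes - 1; the function name is hypothetical and the library's normalization may differ.

```python
import numpy as np

def multiclass_brier_sketch(y_true, y_prob):
    """Hypothetical sketch: mean squared distance to the one-hot label."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    # Row i of the identity matrix is the one-hot encoding of class i.
    one_hot = np.eye(y_prob.shape[1])[y_true]
    return float(np.mean(np.sum((y_prob - one_hot) ** 2, axis=1)))

# First prediction is perfect (0.0); second contributes 0.25 + 0.25 = 0.5.
print(multiclass_brier_sketch([0, 1], [[1.0, 0.0], [0.5, 0.5]]))
```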

uq360.metrics.classification_metrics.plot_reliability_diagram(y_true, y_prob, y_pred, plot_label=[''], num_bins=10)

Plots the reliability diagram showing the calibration error for different confidence scores. Multiple curves can be plotted by passing data as lists.

Parameters:

y_true – array-like or a list of array-like of shape (n_samples,) ground truth labels.
y_prob – array-like or a list of array-like of shape (n_samples, n_classes). Probability scores from the base model.
y_pred – array-like or a list of array-like of shape (n_samples,) predicted labels.
plot_label – (optional) list of names identifying each curve.
num_bins – number of bins.

Returns:

ece_list: list containing the expected calibration error for each curve.
accuracies_in_bins_list: list containing binned average accuracies for each curve.
frac_samples_in_bins_list: list containing binned sample frequencies for each curve.
confidences_in_bins_list: list containing binned average confidence for each curve.

Return type:

tuple

uq360.metrics.classification_metrics.plot_risk_vs_rejection_rate(y_true, y_prob, y_pred, selection_scores=None, plot_label=[''], risk_func=None, attributes=None, num_bins=10, subgroup_ids=None)

Plots the risk vs. rejection rate curve showing the risk for different rejection rates. Multiple curves can be plotted by passing data as lists.

Parameters:

y_true – array-like or a list of array-like of shape (n_samples,) ground truth labels.
y_prob – array-like or a list of array-like of shape (n_samples, n_classes). Probability scores from the base model.
y_pred – array-like or a list of array-like of shape (n_samples,) predicted labels.
selection_scores – ndarray or a list of ndarray containing scores corresponding to certainty in the predicted labels.
risk_func – risk function under consideration.
attributes – (optional) if the risk function is a fairness metric, also pass the protected attribute name.
num_bins – number of bins.
subgroup_ids – (optional) ndarray or a list of ndarray containing subgroup_ids to selectively compute risk on a subgroup of the samples specified by subgroup_ids.

Returns:

aurrrc_list: list containing the area under risk rejection rate curves.
rejection_rate_list: list containing the binned rejection rates.
selection_thresholds_list: list containing the binned selection thresholds.
risk_list: list containing the binned risks.

Return type:

tuple