Sigmoid function in PrecisionRecallCurve leads to information loss #1526

Open
djdameln opened this issue Feb 20, 2023 · 3 comments · May be fixed by #1676

Labels: bug / fix, help wanted, v0.11.x

Comments

@djdameln

🐛 Bug

Hello, first of all, thank you for the awesome library! I am a maintainer of the Anomalib library, and we are using TorchMetrics extensively throughout our code base to evaluate our models.

The most recent version of TorchMetrics introduced some changes to the PrecisionRecallCurve metric, which are causing some problems in one of our components. The problems are related to the re-mapping of the prediction values to the [0,1] range by applying a sigmoid function.

Some context

The goal of the models in our library is to detect anomalous samples in a dataset that contains both normal and anomalous samples. The task is similar to a classical binary classification problem, but instead of generating a class label and a confidence score, our models generate an anomaly score, which quantifies the distance of the sample to the distribution of normal samples seen during training. The range of possible anomaly score values is unbounded and may differ widely between models and/or datasets, which makes it tricky to set a good threshold for mapping the raw anomaly scores to a binary class label (normal vs. anomalous). This is why we apply an adaptive thresholding mechanism as a post-processing step. The adaptive threshold mechanism returns the threshold value that maximizes the F1 score over the validation set.

Our adaptive thresholding class inherits from TorchMetrics' PrecisionRecallCurve class. After TorchMetrics computes the precision and recall values, our class computes the F1 scores for the range of precision and recall values, and finally returns the threshold value that corresponds to the highest observed F1 score.
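
For illustration, here is a minimal sketch of what such a class could look like (the AdaptiveThreshold name and the code are hypothetical, not our actual implementation), written against the pre-0.11 API where the returned thresholds stay in the prediction domain:

import torch
from torchmetrics import PrecisionRecallCurve

class AdaptiveThreshold(PrecisionRecallCurve):
    # Hypothetical sketch: return the threshold that maximizes F1 over the
    # predictions accumulated via update().
    def compute(self) -> torch.Tensor:
        precision, recall, thresholds = super().compute()
        # F1 for each point on the curve; the epsilon avoids division by zero.
        f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
        # precision and recall contain one more entry than thresholds (the final
        # point with recall == 0), so restrict the argmax to the candidates.
        return thresholds[torch.argmax(f1_score[: thresholds.numel()])]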

The problem

In the latest version of the PrecisionRecallCurve metric, the update method now re-maps the predictions to the [0, 1] range by applying a sigmoid function. As a result, the thresholds variable returned by compute is now not in the same domain as the original predictions, and the values are not usable for our purpose of finding the optimal threshold value.

In addition, the sigmoid saturates for inputs of large magnitude, which reduces the resolution at the extremes of the input range and in some cases even leads to outright information loss.

To Reproduce

Here's an example to illustrate the problem. Let's say we have a set of binary targets and a set of model predictions in the range [12, 17]. Previously, the PrecisionRecallCurve metric would return the values of precision and recall for the different thresholds that occur naturally in the data.

v0.10.3

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([12, 13, 14, 15, 16, 17])
>>>
>>> metric = PrecisionRecallCurve()
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.7500, 0.6667, 1.0000, 1.0000, 1.0000])
>>> recall
tensor([1.0000, 0.6667, 0.6667, 0.3333, 0.0000])
>>> thresholds
tensor([14., 15., 16., 17.])

Given these outputs, it is straightforward to compute the F1 score for each threshold value and pick the threshold that maximizes F1.
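
For example (a quick sketch using the tensors returned above; the last precision/recall entry has no matching threshold, so it is dropped from the argmax):

>>> import torch
>>> f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
>>> optimal_threshold = thresholds[torch.argmax(f1_score[:-1])]
>>> optimal_threshold
tensor(14.)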

After the recent changes, the predictions are now re-mapped by the sigmoid function. While we can still compute the F1 scores, we can no longer find the value of the threshold that yields the highest F1 score, because the values of the thresholds variable are no longer in the same domain as the original predictions.

v0.11.1

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([12, 13, 14, 15, 16, 17])
>>>
>>> metric = PrecisionRecallCurve(task="binary")
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.7500, 0.6667, 1.0000, 1.0000, 1.0000])
>>> recall
tensor([1.0000, 0.6667, 0.6667, 0.3333, 0.0000])
>>> thresholds
tensor([1.0000, 1.0000, 1.0000, 1.0000])

Note that the elements of the thresholds variable all appear as 1.0000 because the sigmoid maps the predictions so close to 1 that the differences between the threshold candidates nearly vanish.

It gets even worse when we increase the absolute values of the predictions to [22, 27]. The output of the sigmoid now evaluates to 1.0 for all predictions due to rounding, and the metric is not able to compute any meaningful precision and recall values.

v0.11.1

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([22, 23, 24, 25, 26, 27])
>>>
>>> metric = PrecisionRecallCurve(task="binary")
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.5000, 1.0000])
>>> recall
tensor([1., 0.])
>>> thresholds
tensor(1.)
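
The saturation is easy to verify by applying the sigmoid directly; in single precision these outputs round to exactly 1.0:

>>> import torch
>>> torch.sigmoid(Tensor([22., 27.]))
tensor([1., 1.])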

I guess this change was made to accommodate classical binary classification problems, where the predictions are generally confidence scores in the [0, 1] range, but I feel this is too restrictive for other classes of problems. Mathematically, there is no reason why the precision-recall curve cannot be computed from predictions that fall outside this range.

Expected behavior

The re-mapping of the prediction values to [0,1] by applying a sigmoid function should be optional.

Environment

  • TorchMetrics 0.11.1 (pip)
  • Python 3.8
  • PyTorch 1.13.1
@djdameln added the bug / fix and help wanted labels on Feb 20, 2023
@github-actions

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

Hi @djdameln, thanks for reporting this issue. Sorry for not getting back to you sooner.
I opened PR #1676 with a proposed solution: a new format_input=True/False argument that can be used to enable/disable the internal input formatting:

    # Proposed formatting step inside the metric (from PR #1676): apply the
    # sigmoid only when formatting is enabled and preds fall outside [0, 1].
    if format_input:
        if not torch.all((preds >= 0) * (preds <= 1)):
            preds = preds.sigmoid()
        target = target.long()

Thus, in your case, simply adding format_input=False when you initialize the metric should work.
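
For illustration, usage would then look roughly like this (a sketch, assuming the argument lands under the format_input name proposed in the PR):

>>> metric = PrecisionRecallCurve(task="binary", format_input=False)  # hypothetical argument from #1676
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()  # thresholds stay in the original prediction domain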

On a side note, you are completely right that we introduced this as our standard input formatting, to normalize all possible inputs for further processing. This was especially related to the new thresholds argument for the precision-recall curve: thresholds=100 means that we need to pre-set 100 thresholds before seeing the user's input, and in that case the only workable solution is to standardize everything to one format.
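
For example, with thresholds=100 the metric pre-allocates 100 evenly spaced thresholds in [0, 1] before seeing any data, roughly equivalent to:

>>> import torch
>>> torch.linspace(0, 1, 100)[:5]  # first few of the 100 pre-set thresholds
tensor([0.0000, 0.0101, 0.0202, 0.0303, 0.0404])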

@djdameln
Author

djdameln commented Apr 5, 2023

@SkafteNicki Thanks for clarifying. The proposed solution should solve our problem!
