yzhao062/pyod

Question regarding precision@n and roc(@n?)

Open

#120 opened on Jul 2, 2019

enhancement, good first issue, help wanted

Description

Hello,

First and foremost, thank you for building this wrapper; it is of great use to me and many others.

I have a question regarding the evaluation. Most outlier detection evaluation settings set the ranking cutoff n equal to the number of outliers (aka contamination), and so did I in my experiments.

My thoughts concerning the ROC curve and the AUC score were:

  1. Don't we have to rank the outlier scores from highest to lowest and evaluate the ROC only on the top n? That is, don't we need a ROC@n curve?
  2. Why do people use ROC and AUC for outlier detection problems, which are by nature heavily skewed and imbalanced? Hitting a lot of true negatives is easy, even guaranteed, if the algorithm knows that there are only n outliers.

In my case, the precision@n values of my chosen algorithms are in the range of 0.2-0.4 because it is a difficult dataset. However, the AUC score is quite high at the same time.
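
To make the combination of a low precision@n and a high AUC concrete, here is a minimal sketch in plain Python. The labels and scores are synthetic and purely illustrative (they are not the dataset discussed here), and the helper names `precision_at_n` and `roc_auc` are hypothetical, not PyOD functions. It assumes 1,000 points with 10 outliers, of which only 3 receive the very highest scores while the other 7 sit just below a few dozen high-scoring inliers.

```python
def precision_at_n(y_true, scores, n=None):
    """Fraction of true outliers among the n highest-scored points.

    If n is None, n is set to the number of true outliers.
    """
    if n is None:
        n = sum(y_true)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:n]) / n


def roc_auc(y_true, scores):
    """ROC AUC computed as the probability that a randomly chosen outlier
    scores higher than a randomly chosen inlier (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic data (an assumption for illustration only):
# 990 inliers with scores 0.000, 0.001, ..., 0.989
inlier_scores = [i / 1000 for i in range(990)]
# 3 outliers clearly above every inlier, 7 outliers ranked below 39 inliers
outlier_scores = [2.0, 2.1, 2.2] + [0.9505] * 7

y_true = [0] * len(inlier_scores) + [1] * len(outlier_scores)
scores = inlier_scores + outlier_scores

p_at_n = precision_at_n(y_true, scores)  # n defaults to 10, the number of outliers
auc = roc_auc(y_true, scores)

print(f"precision@n = {p_at_n:.2f}")  # precision@n = 0.30
print(f"ROC AUC     = {auc:.3f}")     # ROC AUC     = 0.972
```

In this toy setup, the 7 missed outliers still outrank 951 of the 990 inliers, so the AUC stays high even though only 3 of the top 10 points are true outliers. For real experiments, a library routine such as scikit-learn's `roc_auc_score` would be the more usual choice than the quadratic loop above.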

I would appreciate any thoughts on this since I am fairly new to the topic and might not grasp the intuition of the ROC curve for this task.

Best regards

Hlam
