Making Sense of the ROC Curve
Today I would like to talk a little about the ROC curve, or Receiver Operating Characteristic curve, where the horizontal axis shows the False Positive Rate and the vertical axis shows the True Positive Rate.
Given a set of data points that we want to classify into reds and blues, we want a single number between 0 and 1 that tells us whether a split is good, bad, or perfect, where 1 corresponds to a perfect split.
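To do this, we compute the true positive rate (the fraction of actual positives that are classified as positive) and the false positive rate (the fraction of actual negatives that are classified as positive) at every possible split. The two extremes are always (1, 1), where everything is classified as positive, and (0, 0), where everything is classified as negative.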
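To make the "all possible splits" step concrete, here is a minimal sketch in plain Python. The function name `roc_points`, the toy scores, and the choice to treat a point as positive when its score is at or above the threshold are all assumptions made for illustration; they are not taken from any particular library.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per possible split of the scores.

    labels: 1 for a positive (say, red), 0 for a negative (blue).
    Assumption: a point is classified positive when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # One threshold per distinct score, highest first, plus an infinite
    # threshold so that nothing is classified positive (the 0, 0 corner).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical toy data: higher scores should mean "more likely red".
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The first pair printed is the (0, 0) extreme and the last is the (1, 1) extreme; everything in between traces out the curve.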
We plot all of these points as a curve and calculate the area under it. The area under the curve of a good split is shown below.
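One simple way to get the area is the trapezoidal rule over the plotted points. The sketch below assumes the points are (FPR, TPR) pairs; the helper name `area_under_curve` and the example points are made up for illustration.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) pairs."""
    pts = sorted(points)  # order by x, then by y for vertical steps
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each neighbouring pair of points contributes one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points for a reasonably good split.
good_roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
            (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]
good_area = area_under_curve(good_roc)

# Two reference shapes: a perfect split and the diagonal of random guessing.
perfect_area = area_under_curve([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
diagonal_area = area_under_curve([(0.0, 0.0), (1.0, 1.0)])

print(good_area, perfect_area, diagonal_area)  # 0.8125 1.0 0.5
```

The perfect split hugs the top-left corner and covers the whole unit square, while the diagonal covers exactly half of it.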
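To summarize, given 3 possible splits for the dataset (good, bad, and perfect), we can put a number on how nicely our model has split the data by looking at the area under the ROC curve (often shortened to AUC). A perfect split gives an area of 1, while a split that does no better than random guessing gives an area of about 0.5. Hence, the closer the area under the ROC curve is to 1, the better our classifier is!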
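As a quick sanity check on the three cases, the area under the ROC curve can also be computed without drawing anything: it equals the fraction of (positive, negative) pairs in which the positive gets the higher score, with ties counted as half. The sketch below uses that equivalence; the function name `auc_by_ranking` and the three sets of scores are invented for illustration.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical data: four blues (0) followed by four reds (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
perfect = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # no overlap at all
good = [0.1, 0.2, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9]    # a little overlap
bad = [0.1, 0.4, 0.6, 0.9, 0.2, 0.3, 0.7, 0.8]      # heavily mixed

auc_perfect = auc_by_ranking(perfect, labels)
auc_good = auc_by_ranking(good, labels)
auc_bad = auc_by_ranking(bad, labels)
print(auc_perfect, auc_good, auc_bad)  # 1.0 0.8125 0.5
```

The nested loop is quadratic in the number of points, which is fine for a toy example but would be slow on a large dataset.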