In machine learning, a two-class (binary) classification problem is one where we want to predict which of two classes an instance belongs to given some attributes (e.g. spam/not spam, sick/not sick, pregnant/not pregnant, cancer/not cancer). We keep track of prediction accuracy by constructing a confusion matrix, which tallies the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
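To make the tallying concrete, here is a minimal Python sketch; the label vectors are made-up example data, with 1 marking the positive class:

```python
# Minimal sketch: tally a 2x2 confusion matrix by hand.
# y_true and y_pred are hypothetical example data (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model's predictions

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
```

In practice a library routine such as sklearn.metrics.confusion_matrix does the same bookkeeping for you.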
Here are some useful statistical metrics for two-class classification problems (a code sketch computing all of them follows the list):
Accuracy: The proportion of all instances that are correctly predicted.
- Accuracy = (TP + TN)/(TP + TN + FP + FN)
Specificity: The proportion of actual negatives (i.e. 0) that were correctly predicted as such (e.g., the percentage of people who are healthy who were correctly predicted as being healthy).
- Specificity = TN/(TN + FP)
Precision: The proportion of all positive predictions that were correct (e.g. the percentage of people who were predicted to have the disease and actually had the disease).
- Precision = TP/(TP + FP)
Recall: The proportion of actual positives that were correctly identified as such (e.g., the percentage of people who have the disease who were correctly predicted to have the disease).
- Recall = TP/(TP + FN)
Negative Predictive Value: The proportion of all negative predictions that were correct (e.g. the percentage of people who were predicted to be healthy who actually are healthy).
- Negative Predictive Value = TN/(TN + FN)
Miss Rate: The proportion of actual positives that were predicted to be negative (e.g. the percentage of people with the disease who were predicted to be healthy).
- Miss Rate = FN/(FN + TP) (i.e. 1 - Recall)
Fall-Out: The proportion of actual negatives that were predicted to be positive (e.g. the percentage of healthy people who were predicted to have the disease).
- Fall-Out = FP/(FP + TN) (i.e. 1 - Specificity)
False Discovery Rate: The proportion of all positive predictions that were incorrect (e.g. the percentage of people who were predicted to have the disease who are actually healthy).
- False Discovery Rate = FP/(FP + TP) (i.e. 1 - Precision)
False Omission Rate: The proportion of all negative predictions that were incorrect (e.g. the percentage of people who were predicted to be healthy who actually have the disease).
- False Omission Rate = FN/(FN + TN) (i.e. 1 - Negative Predictive Value)
F1 Score: Measures prediction accuracy as the harmonic mean of precision and recall. An F1 score of 1 is the best case (perfect precision and recall); an F1 score of 0 is the worst case.
- F1 Score = 2TP/(2TP + FP + FN)
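Putting the formulas together, here is a sketch that computes every metric above from the four counts. The counts reuse the hypothetical tallies from the earlier example; real code should guard against denominators that come out to zero:

```python
# Sketch: compute each metric above from the four confusion-matrix counts.
# tp, fp, fn, tn reuse the hypothetical tallies from the earlier example.
tp, fp, fn, tn = 3, 1, 1, 3

accuracy    = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
npv         = tn / (tn + fn)            # negative predictive value
miss_rate   = fn / (fn + tp)            # 1 - recall
fall_out    = fp / (fp + tn)            # 1 - specificity
fdr         = fp / (fp + tp)            # 1 - precision
fomr        = fn / (fn + tn)            # false omission rate, 1 - npv
f1          = 2 * tp / (2 * tp + fp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.75 precision=0.75 recall=0.75 f1=0.75
```

Notice how the last four rate metrics are just the complements of recall, specificity, precision, and negative predictive value, so in practice you only need to compute one of each pair.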