on
7 mins to read.
Multiclass Classification - One-vs-Rest / One-vs-One
Although many classification problems can be defined using two classes (they are inherently multi-class classifiers), some are defined with more than two classes which requires adaptations of machine learning algorithm.
Logistic Regression can be naturally extended to multi-class learning problems by replacing the sigmoid function with the softmax function. The KNN algorithm is also straightforward to extend to multiclass case. When we find $k$ closest examples using a distance metric such as Euclidean Distance, for the input $x$ and examine them, we return the class that we saw the most among the $k$ examples. Multi-class labeling is also trivial with Naive Bayes classifier.
SVM cannot be naturally extended to multi-class problems. Other algorithms can be implemented more efficiently in the binary case. What should you do if you have a multi-class problem but a binary classification learning algorithm?
One common strategy is called One-vs-All (usually referred to as One-vs-Rest or OVA classification). The idea is to transform a multi-class problem into C binary classification problem and build C different binary classifiers. Here, you pick one class and train a binary classifier with the samples of selected class on one side and other samples on the other side. Thus, you end up with C classifiers. While testing, you simply classify the sample as belonging to the class with maximum score among C classifier. For example, if we have three classes, $y \in \{1, 2, 3\}$, we create copies of the original dataset and modify them. In the first copy, we replace all labels not equal to 1 by 0. In the second copy, we replace all labels not equal to 2 by 0. In the third copy, we replace all labels not equal to 3 by 0. Now we have three binary classification problems where we have to learn to distinguish between labels 1 and 0; 2 and 0; and 3 and 0. Once we have the three models, to classify the new input feature vector, we apply the three models to the input and we get three predictions. We then pic the prediction of a non-zero class which is the most certain.
Another strategy is One-vs-One (OVO, also known as All-versus-All or AVA). Here, you pick 2 classes at a time and train a binary classifier using samples from the selected two-classes only (other samples are ignored in this step). You repeat this for all the two-class combinations. So, you end up with $\frac{C (C-1)}{2}$ number of classifiers. At prediction time, a voting scheme is applied: all $C (C − 1) / 2$ classifiers are applied to an unseen sample and the class that got the highest number of “+1” predictions gets predicted by the combined classifier. All-versus-all tends to be superior to one-versus-all.
A problem with the previous schemes is that binary classifiers are sensitive to errors. If any classifier makes an error, it can affect the vote count.
In One-vs-One scheme, each individual learning problem only involves a small subset of data whereas with One-vs-All, the complete dataset is used number of classes
times.
OneVsRestClassifier of Sci-kit Learn
OneVsRestClassifier
is designed to model each class against all of the other classes independently, and create a classifier for each situation. The way I understand this process is that OneVsRestClassifier
grabs a class, and creates a binary label for whether a point is or isn’t that class. Then this labelling gets fed into whatever estimator you have chosen to use. I believe the confusion comes in in that SVC also allows you to make this same choice, but in effect with this implementation the choice will not matter because you will always only be feeding two classes into the SVC.
- https://stackoverflow.com/a/43506826/1757224
- https://stackoverflow.com/questions/39604468/what-is-the-difference-between-onevsrestclassifier-with-svc-and-svc-with-decisio?rq=1
REFERENCES
- https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsOneClassifier.html
- https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
- https://scikit-learn.org/stable/modules/multiclass.html
- http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS289G_Fall2015/lecture9.pdf
- https://gemfury.com/stream/python:scikit-learn/-/content/tests/test_multiclass.py