Decision tree algorithms such as ID3 provide a convenient way to show all of the possible outcomes of a decision. Decision trees can be used for either classification or regression. Below are some of the advantages of using decision trees as opposed to other types of machine learning models.
One of the things that I like about decision trees is that you can easily explain the model to somebody who has a non-technical background. Decision trees create straightforward if-then-else rules which could be communicated to a boss, project manager, product manager, or outside stakeholder.
Contrast decision trees with other more black box-like machine learning algorithms such as logistic regression, neural networks, or reinforcement learning method, and you can see that decision trees would provide a refreshing level of transparency not always common in machine learning.
No Large Data Requirement
If you take a look at an algorithm like the k nearest neighbors algorithm, which classifies an unseen instance based on instances that are most similar to that instance, you need a lot of data in order to get accurate results. The more data you have, the better.
However, there may be certain instances or certain problems when a lot of data is not available. The benefit of using decision trees is that you do not not need a lot of data in order to create something useful.
Best and Worst Case
In some settings, you want to be able to determine a worst case, a best case, and a management case. With a decision tree, you can easily see all of the possible outcomes. Each test instance gets put into one of the outcomes, so you pretty much know what to expect ahead of time. Even outliers don’t phase a decision tree.
Continuous and Discrete Data
Decision trees can handle both continuous and discrete data depending on which decision tree model you use. In contrast, many machine learning algorithms can only handle either continuous or discrete data, but not both.
Decision trees can capture nonlinear relationships.
Classifying a test instance is fast and just depends on the depth of the tree.
Because of the way decision trees are computed using the Information Gain, irrelevant attributes are handled with ease.