Having built a decision tree, we are using reduced error pruning to reduce its size, and we select a node to collapse. For this node, the left branch contains three training data points with outputs 5, 7, and 9.6, and the right branch contains four training data points with outputs 8.7, 9.8, 10.5, and 11. What were the original responses for data points along the two branches (left and right, respectively), and what is the new response after collapsing the node?

A. 10.8, 13.33, 14.48
B. 10.8, 13.33, 12.06
C. 7.2, 10, 8.8
D. 7.2, 10, 8.6

The correct answer is C. 7.2, 10, 8.8.

Reduced error pruning shrinks a decision tree by replacing the subtree under an internal node with a single leaf whenever doing so does not increase the error on a held-out validation set. Because the outputs here are real numbers, this is a regression tree, and each leaf predicts the mean of the training responses that reach it.
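A minimal sketch of the pruning test for a single node with two leaf children, assuming squared error as the regression error measure and assuming the validation points have already been routed to the left or right branch by the node's split. The names `sse` and `should_prune` are hypothetical, and the question itself supplies no validation data, so this only illustrates the decision rule.

```python
from statistics import mean


def sse(values, prediction):
    """Sum of squared errors of a constant prediction (assumed error measure)."""
    return sum((v - prediction) ** 2 for v in values)


def should_prune(left_train, right_train, left_val, right_val):
    """Hypothetical helper: True if collapsing the node does not increase validation error.

    left_train / right_train: training responses that reach each leaf.
    left_val / right_val: validation responses routed to each leaf by the split.
    """
    # Before pruning: each leaf predicts the mean of its own training responses.
    before = sse(left_val, mean(left_train)) + sse(right_val, mean(right_train))
    # After pruning: a single leaf predicts the mean of all training responses.
    merged_prediction = mean(list(left_train) + list(right_train))
    after = sse(list(left_val) + list(right_val), merged_prediction)
    # Prune on ties so the smaller tree is preferred.
    return after <= before
```

Pruning on ties (`<=`) favors the smaller tree, matching the rule of collapsing a node whenever validation error does not increase.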

Before the node is collapsed, each branch ends in its own leaf. The left leaf predicts the mean of its three responses, (5 + 7 + 9.6) / 3 = 21.6 / 3 = 7.2, and the right leaf predicts the mean of its four responses, (8.7 + 9.8 + 10.5 + 11) / 4 = 40 / 4 = 10. These are the original responses, 7.2 and 10.
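The leaf responses can be checked directly, assuming (as is standard for regression trees) that a leaf predicts the mean of the training outputs reaching it. The variable names are illustrative only.

```python
from statistics import mean

# Training responses reaching each leaf (from the question).
left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]

# Each regression-tree leaf predicts the mean of its training responses.
left_prediction = mean(left)    # 21.6 / 3
right_prediction = mean(right)  # 40 / 4

print(round(left_prediction, 2), round(right_prediction, 2))  # 7.2 10.0
```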

After the node is collapsed, the two branches are merged into a single leaf that holds all seven training points, and its response is the mean of all seven outputs: (21.6 + 40) / 7 = 61.6 / 7 = 8.8. This equals the average of the two branch means weighted by branch size, (3 × 7.2 + 4 × 10) / 7 = 8.8. Therefore, the new response after collapsing the node is 8.8.
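A short check that the collapsed leaf's response is the mean over all seven points, and that it matches the size-weighted average of the two branch means. This is a sketch using the question's numbers; the variable names are illustrative.

```python
from statistics import mean

left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]

# After collapsing, one leaf holds all seven training points.
collapsed = mean(left + right)  # 61.6 / 7

# Equivalently, weight each branch mean by the number of points in that branch.
n_left, n_right = len(left), len(right)
weighted = (n_left * mean(left) + n_right * mean(right)) / (n_left + n_right)

print(round(collapsed, 2), round(weighted, 2))  # 8.8 8.8
```

Weighting by branch size matters: the right branch has more points, so the merged response sits closer to 10 than to 7.2.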

Options A and B are incorrect because their branch values, 10.8 and 13.33, come from dividing each branch's sum by one less than its number of points (21.6 / 2 and 40 / 3) instead of taking the mean. Option D is incorrect because 8.6 is the unweighted average of the two branch means, (7.2 + 10) / 2, which ignores that the right branch holds four points and the left only three.
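The distractor values can be reproduced with two common mistakes: dividing a branch sum by n − 1 instead of n, and averaging the branch means without weighting by branch size. This is one plausible reading of how the options were constructed, not something the question states; the variable names are illustrative.

```python
from statistics import mean

left = [5, 7, 9.6]
right = [8.7, 9.8, 10.5, 11]

# Mistake 1: divide each branch sum by (n - 1) instead of n.
left_n_minus_1 = sum(left) / (len(left) - 1)     # 21.6 / 2
right_n_minus_1 = sum(right) / (len(right) - 1)  # 40 / 3

# Mistake 2: average the two leaf means without weighting by branch size.
unweighted = (mean(left) + mean(right)) / 2

print(round(left_n_minus_1, 2))   # 10.8
print(round(right_n_minus_1, 2))  # 13.33
print(round(unweighted, 2))       # 8.6
```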
