Building a regression tree
1. Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
2. Apply cost complexity pruning to the large tree in order to obtain a sequence of best subtrees, as a function of α (sketched in the first code example after this list).
3. Use K-fold cross-validation to choose α. That is, divide the training observations into K folds. For each k = 1, ..., K:
   (a) Repeat Steps 1 and 2 on all but the kth fold of the training data.
   (b) Evaluate the mean squared prediction error on the data in the left-out kth fold, as a function of α.
   Then average the results for each value of α, and pick α to minimize the average error (see the second sketch after this list).
4. Return the subtree from Step 2 that corresponds to the chosen value of α.
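A minimal sketch of Steps 1 and 2 using scikit-learn's cost-complexity pruning API (one possible implementation; the algorithm itself is library-agnostic). The synthetic data here is a hypothetical stand-in for any training set, and min_samples_leaf=5 is an arbitrary choice for the minimum leaf size:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: 200 observations, 3 predictors.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(200, 3))
y_train = np.sin(4 * X_train[:, 0]) + rng.normal(scale=0.2, size=200)

# Step 1: grow a large tree by recursive binary splitting, stopping
# only when each terminal node has fewer than 5 observations.
large_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)
large_tree.fit(X_train, y_train)

# Step 2: cost complexity pruning. ccp_alphas holds the effective
# alpha values at which successive best subtrees appear, from the
# full tree (alpha = 0) up to the root-only tree.
path = large_tree.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas
```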
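Continuing the same sketch, Steps 3 and 4 with K = 5 folds (again an arbitrary choice). For each candidate α, cross_val_score regrows and prunes the tree on the K − 1 training folds and scores the held-out fold, which matches Steps 3(a) and 3(b):

```python
from sklearn.model_selection import cross_val_score

# Step 3: K-fold cross-validation over the candidate alphas.
# Scores are negated MSE (larger is better), so flip the sign back.
cv_mse = []
for alpha in ccp_alphas:
    model = DecisionTreeRegressor(
        min_samples_leaf=5, ccp_alpha=alpha, random_state=0
    )
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_squared_error"
    )
    cv_mse.append(-scores.mean())

# Pick the alpha that minimizes the average cross-validated error.
best_alpha = ccp_alphas[int(np.argmin(cv_mse))]

# Step 4: return the subtree corresponding to the chosen alpha,
# refit on the full training set.
final_tree = DecisionTreeRegressor(
    min_samples_leaf=5, ccp_alpha=best_alpha, random_state=0
).fit(X_train, y_train)
```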
Tags
Data Science