Exercises Unit D - Applied

Bachelor’s Degree Programme in Philosophy, International and Economic Studies, Ca’ Foscari University of Venice.

Author: Aldo Solari

Affiliation: Department of Economics, Ca’ Foscari University of Venice

Chapter 8 Exercise 7.

In the lab, we applied random forests to the Boston data using mtry = 6, with ntree = 25 and ntree = 500. Create a plot showing the test error from random forests on this data set over a broader range of values for mtry and ntree. You may model your plot after Figure 8.10.

Describe the results you obtain.
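The structure of this exercise can be sketched as follows. This is only an illustration under stated assumptions, not the lab's code: it is written in plain Python rather than R, it uses a small synthetic data set in place of Boston, and the helper names (`build_tree`, `predict_tree`, `forest_error_curve`) and the `min_size` stopping rule are hypothetical. Each tree is grown on a bootstrap sample and tries only `mtry` randomly chosen predictors at every split. The test MSE is recorded after each additional tree, which gives one curve per value of `mtry`.

```python
import random

def build_tree(X, y, rows, mtry, rng, min_size=5):
    """Grow a regression tree on the given row indices.

    At every split only a random subset of `mtry` predictors is tried."""
    n = len(rows)
    mean = sum(y[i] for i in rows) / n
    if n < 2 * min_size:
        return mean  # leaf: predict the mean response
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        order = sorted(rows, key=lambda i: X[i][j])
        total = sum(y[i] for i in order)
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            if k < min_size or n - k < min_size:
                continue
            if X[order[k]][j] == X[order[k - 1]][j]:
                continue  # cannot split between tied values
            # Maximizing this score is equivalent to minimizing the split's RSS.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                threshold = (X[order[k - 1]][j] + X[order[k]][j]) / 2
                best = (score, j, threshold, order[:k], order[k:])
    if best is None:
        return mean
    _, j, threshold, left, right = best
    return (j, threshold,
            build_tree(X, y, left, mtry, rng, min_size),
            build_tree(X, y, right, mtry, rng, min_size))

def predict_tree(node, x):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if x[j] <= threshold else right
    return node

def forest_error_curve(X_tr, y_tr, X_te, y_te, mtry, max_trees, seed=1):
    """Test MSE of the forest after 1, 2, ..., max_trees trees."""
    rng = random.Random(seed)
    n = len(X_tr)
    totals = [0.0] * len(X_te)
    curve = []
    for b in range(1, max_trees + 1):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree(X_tr, y_tr, rows, mtry, rng)
        for i, x in enumerate(X_te):
            totals[i] += predict_tree(tree, x)
        curve.append(sum((t / b - yt) ** 2 for t, yt in zip(totals, y_te)) / len(y_te))
    return curve

# Synthetic stand-in for the Boston data: 6 predictors, only 2 of them informative.
rng = random.Random(0)
p = 6
X = [[rng.random() for _ in range(p)] for _ in range(200)]
y = [5 * row[0] + 3 * row[1] + rng.gauss(0, 0.5) for row in X]
X_tr, y_tr, X_te, y_te = X[:120], y[:120], X[120:], y[120:]

curves = {m: forest_error_curve(X_tr, y_tr, X_te, y_te, m, max_trees=30)
          for m in (1, 3, 6)}
for m, curve in curves.items():
    print(f"mtry={m}: test MSE after 5 trees {curve[4]:.3f}, after 30 trees {curve[-1]:.3f}")
```

Plotting each list in `curves` against the number of trees gives a figure in the spirit of Figure 8.10. Setting `mtry` equal to the number of predictors (6 here) corresponds to bagging.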

Chapter 8 Exercise 8.

In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will instead predict Sales using regression trees and related methods, treating the response as a quantitative variable.

(a) Split the data set into a training set and a test set.
(b) Fit a regression tree to the training set. Plot the tree and interpret the results. What test MSE do you obtain?
(c) Use cross-validation to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?
(d) Apply the bagging approach to this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.
(e) Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.
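Every part of this exercise relies on the same two ingredients: a reproducible train/test split and the test MSE. A minimal sketch of both is given here in plain Python, as an illustration rather than the lab's R code. The helper names (`train_test_split`, `mse`) are hypothetical, and the synthetic rows only stand in for the Carseats data. The mean-only predictor at the end is a reference point against which any tree-based test MSE can be judged.

```python
import random

def train_test_split(rows, test_fraction=0.5, seed=1):
    """Shuffle row indices reproducibly and split the rows into train and test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic (predictor, response) rows standing in for the real data set.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 1)))

train, test = train_test_split(data, test_fraction=0.5, seed=1)

# Baseline: predict the training-set mean response for every test observation.
train_mean = sum(resp for _, resp in train) / len(train)
baseline_test_mse = mse([resp for _, resp in test], [train_mean] * len(test))
print(f"{len(train)} training rows, {len(test)} test rows")
print(f"test MSE of the mean-only baseline: {baseline_test_mse:.3f}")
```

Fixing the seed keeps the split identical across parts (b) to (e), so the test MSEs of the different methods are computed on the same observations and are directly comparable.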

Chapter 8 Exercise 10.

We now use boosting to predict Salary in the Hitters data set.

(a) Remove observations for which the salary information is missing, and log-transform Salary.
(b) Create a training set consisting of the first 200 observations and a test set consisting of the remaining observations.
(c) Perform boosting on the training set with 1,000 trees for a range of values of the shrinkage parameter λ. Produce a plot with shrinkage values on the x-axis and the corresponding training set MSE on the y-axis.
(d) Produce a plot with shrinkage values on the x-axis and the corresponding test set MSE on the y-axis.
(e) Compare the test MSE of boosting with the test MSE obtained from applying two regression methods from Chapters 3 and 6.
(f) Which variables appear to be the most important predictors in the boosted model?
(g) Now apply bagging to the training set. What is the test set MSE for this approach?
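The shrinkage sweep asked for in this exercise can be sketched as below. This is a simplified illustration under stated assumptions, not the lab's R code: it is plain Python, it uses a single synthetic predictor instead of the Hitters data (so there is no log-transform step), and it boosts stumps (trees with one split). The helper names (`fit_stump`, `boost`, `predict`) and the grid of λ values are hypothetical. Each round fits a stump to the current residuals, adds a shrunken copy of it to the model, and updates the residuals.

```python
import random

def fit_stump(x, residuals):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]
    n = len(xs)
    total = sum(rs)
    best = None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += rs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between tied values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing the split's RSS.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (xs[k - 1] + xs[k]) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # all x values identical: fall back to a constant fit
        return float("inf"), total / n, total / n
    return best[1], best[2], best[3]

def boost(x, y, n_trees, shrinkage):
    """Boosting for regression with stumps, starting from a zero model."""
    stumps = []
    residuals = list(y)
    for _ in range(n_trees):
        t, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((t, left_mean, right_mean))
        residuals = [r - shrinkage * (left_mean if xi <= t else right_mean)
                     for xi, r in zip(x, residuals)]
    return stumps

def predict(stumps, shrinkage, xi):
    """Sum of the shrunken stump predictions for one observation."""
    return sum(shrinkage * (lm if xi <= t else rm) for t, lm, rm in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic data: first 200 observations for training, the rest for testing.
rng = random.Random(0)
x_all = [rng.uniform(0, 10) for _ in range(300)]
y_all = [3 * (xi > 5) + 0.5 * xi + rng.gauss(0, 0.3) for xi in x_all]
x_tr, y_tr, x_te, y_te = x_all[:200], y_all[:200], x_all[200:], y_all[200:]

results = []  # (shrinkage, training MSE, test MSE)
for lam in (0.001, 0.01, 0.1, 0.5):
    model = boost(x_tr, y_tr, n_trees=1000, shrinkage=lam)
    train_mse = mse(y_tr, [predict(model, lam, xi) for xi in x_tr])
    test_mse = mse(y_te, [predict(model, lam, xi) for xi in x_te])
    results.append((lam, train_mse, test_mse))
    print(f"lambda={lam}: training MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Plotting the second and third entries of `results` against the first gives the two figures requested in parts (c) and (d).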

Chapter 8 Exercise 12.

Apply boosting, bagging, and random forests to a data set of your choice. Be sure to fit the models on a training set and to evaluate their performance on a test set. How accurate are the results compared to simple methods like linear regression? Which of these approaches yields the best performance?