
p. 33 | The URL for the dataset no longer works. Instead, go to https://data.boston.gov/dataset/property-assessment and choose Property Assessment FY2014 |

p. 58, para 3 | Text should read: "For example, in the left panel of Figure 3.2, there are about 20 tracts where the median value (MEDV) is between $5000 and $10,000." |

Fig 3.16 caption | "blue and orange" should be "light pins"; "red" should be "dark pins" |

Figures 4.8-4.11 | The first print of the textbook had an error with these four figures. See this PDF file for the correct figures. |

p. 98 | Sentence should read "High scores on principal component 1 mean that the cereal is low in calories and the amount per bowl, and high in protein and potassium." |

pp. 98 and 100 | The end of p. 98 should read "as we move from right (bran cereals) to left"; the top of p. 100 should read "middle-left" |

p. 102 Problem 4.2(a) | Some print editions are missing the file name. It should be Wine.xls |

p. 103 Problem 4.4 | Some print editions are missing the file name. It should be ToyotaCorolla.xls |

p. 119 | One-variable tables in Excel: bullet 3 should be "A13 to A33" (instead of "B13 to B33") |

p. 120 | Text about the ROC diagonal should read: "The comparison curve is the diagonal, which reflects the average performance of a guessing classifier that has no information about the predictors or outcome variable. This guessing classifier guesses that a proportion alpha of the records is 1's and therefore assigns each record an equal probability P(Y=1)=alpha. In this case, on average, a proportion alpha of the 1s will be correctly classified (sensitivity=alpha), and a proportion alpha of the 0s will be correctly classified (1-specificity=alpha). As we increase the cutoff value alpha from 0 to 1, we get the diagonal line Sensitivity = 1-Specificity. Note that the naive rule is one point on this diagonal line, where alpha=proportion of actual 1's. A common metric to summarize an ROC curve is area under the curve (AUC), which ranges from 1 (perfect discrimination between classes) to 0.5 (no better than random guessing)" |
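The diagonal argument can be checked with a short simulation. This is a minimal sketch, assuming a hypothetical validation set with 30% actual 1s and a guessing classifier that labels each record 1 with probability alpha regardless of its predictors. It reports sensitivity and 1 − specificity (the share of actual 0s labeled 1). Both should land near alpha, which places every such guesser on the line Sensitivity = 1 − Specificity.

```python
import random

def guess_rates(actual, alpha, seed=0):
    """Label each record 1 with probability alpha, ignoring all predictors.

    Returns (sensitivity, 1 - specificity) for the guessed labels.
    """
    rng = random.Random(seed)
    predicted = [1 if rng.random() < alpha else 0 for _ in actual]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    n1 = sum(actual)
    n0 = len(actual) - n1
    return tp / n1, fp / n0

# Hypothetical validation set: 30% actual 1s (an assumption, not book data).
actual = [1] * 3000 + [0] * 7000

for alpha in (0.1, 0.3, 0.7):
    sens, fpr = guess_rates(actual, alpha)
    print(f"alpha={alpha:.1f}  sensitivity={sens:.3f}  1-specificity={fpr:.3f}")
```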

p. 121 box | In the box, replace "False-Positive Rate" with "False Discovery Rate", and replace "False-Negative Rate" with "False Omission Rate" |
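The renamed quantities condition on the *predicted* class, whereas the false-positive and false-negative rates condition on the *actual* class. This is a minimal sketch, assuming the standard definitions FDR = FP / (TP + FP) and FOR = FN / (FN + TN). The confusion-matrix counts are hypothetical.

```python
def fdr_for(tp, fp, fn, tn):
    """Return (false discovery rate, false omission rate) from confusion-matrix counts."""
    fdr = fp / (tp + fp)   # share of records classified as 1 that are actually 0
    f_or = fn / (fn + tn)  # share of records classified as 0 that are actually 1
    return fdr, f_or

# Hypothetical counts, for illustration only.
fdr, f_or = fdr_for(tp=40, fp=10, fn=20, tn=130)
print(round(fdr, 3), round(f_or, 3))  # 0.2 0.133
```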

p. 132 | Text about the ROC diagonal should read: "The comparison curve is the diagonal, which reflects the average performance of a guessing classifier that has no information about the predictors or outcome variable. This guessing classifier guesses that a proportion alpha of the records is 1's and therefore assigns each record an equal probability P(Y=1)=alpha. In this case, on average, a proportion alpha of the 1s will be correctly classified (sensitivity=alpha), and a proportion alpha of the 0s will be correctly classified (1-specificity=alpha). As we increase the cutoff value alpha from 0 to 1, we get the diagonal line Sensitivity = 1-Specificity. Note that the naive rule is one point on this diagonal line, where alpha=proportion of actual 1's. A common metric to summarize an ROC curve is 'area under the curve' (AUC), which ranges from 1 (perfect discrimination between classes) to 0.5 (no better than random guessing)" |
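The two endpoints of the AUC range can be reproduced with the trapezoidal rule. This is a minimal sketch, assuming the ROC curve is given as (1 − specificity, sensitivity) points on a monotone curve, so sorting the points orders them along the curve. The diagonal yields about 0.5 and a perfect classifier yields 1.

```python
def auc_trapezoid(points):
    """Area under an ROC curve given (1 - specificity, sensitivity) points."""
    pts = sorted(points)  # assumes a monotone ROC curve
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

diagonal = [(a / 10, a / 10) for a in range(11)]  # guessing classifier
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]    # perfect discrimination
print(auc_trapezoid(diagonal))  # approximately 0.5 (floating point)
print(auc_trapezoid(perfect))   # 1.0
```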

p. 133 | Table "Classification Matrix, Reweighted", row "Actual 1": the numerators should be 80 and 420 (not 19,180 and 5,420) |

p. 151, para 3 | Paragraph should read: "For the Toyota Corolla price example, forward selection yields exactly the same results as those found in an exhaustive search: For each number of predictors (up to 6 predictors) the same subset is chosen (it therefore gives a table identical to the one in Figure 6.4 for up to 7 coefficients)... In other words, it correctly identifies CC and Met_Color as the least useful predictors." [and delete last part "Backward elimination... Age and HP"] |

p. 151, last para | The corrected paragraph should read: "The results for stepwise selection can be seen in Figure 6.6. It chooses the same subsets as exhaustive search for subset size of one to 9 predictors. R^{2}-adj is largest at 9 predictors, and Cp also indicates the 9-predictor model is best." |
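Adjusted R² penalizes each added predictor, so it can peak at an intermediate subset size even though raw R² never decreases. This is a minimal sketch, assuming the usual definition R²-adj = 1 − (1 − R²)(n − 1)/(n − p − 1) for n records and p predictors. The R² values are hypothetical and are not the Toyota Corolla results.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with p predictors fit to n records."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: the 10-predictor model has the higher raw R^2
# but the lower adjusted R^2.
print(round(adjusted_r2(0.800, n=101, p=9), 4))   # 0.7802
print(round(adjusted_r2(0.801, n=101, p=10), 4))  # 0.7789
```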

p. 152 | Delete the second-to-last sentence ("This example shows clearly that it is not always so") |

p. 153 | Problem 6.1 part (c), ignore the final text "What is the prediction error?" |

Table 8.1 | "(X=1)" should be under "Prior Legal", "(X=0)" should be under "No Prior Legal", and "Total" should be in the last column. |

Table 8.3 | The last line should read: Weather - Coded as 1 if inclement, 0 otherwise |

p. 191 | [This is a clarification] Addition: "As with k-nearest-neighbors, a predictor with m categories (m>2) should be factored into m dummies (not m-1). In addition, whether predictors are numerical or categorical, it does not make any difference whether they are standardized (normalized) or not." |
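The encoding in this addition keeps one dummy per category, with no reference category dropped. This is a minimal sketch in plain Python. The function name and the color categories are hypothetical. In pandas, `get_dummies` with its default `drop_first=False` gives the same m-dummy layout.

```python
def one_hot_all_levels(values):
    """Encode a categorical predictor into m dummies (one per category), not m - 1."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]

levels, rows = one_hot_all_levels(["red", "blue", "green", "blue"])
print(levels)  # ['blue', 'green', 'red']
print(rows)    # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```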

Prob 9.3 | In parts (a) and (b), replace the instruction "Keep the minimum... least restrictive." with "Set the parameters for the tree so as to produce as deep a tree as possible and obtain scores from this deep tree." In (a)(iv), change "full tree" to "deep tree". |

Prob 9.3(a)iii | Due to the software change, replace this problem with "How might we achieve better validation predictive performance at the expense of training performance?" |

Prob 9.3(a)iv | Replace text with "Create a best pruned tree using the same data partitioning. Compared to the deeper tree, what is the predictive performance on the validation set? and on the training set?" |

Ch 10, Sec. 10.2 | The text right after eq. (10.5) should read: "a unit increase in predictor x_j is associated with an average increase of e^{β_j} × 100% in the odds" |
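Under a logistic model the odds equal e^{β_0 + β_1 x}, so a one-unit increase in x multiplies the odds by e^{β_1}. This is a minimal sketch with a single predictor. The coefficients b0 and b1 are hypothetical values chosen for illustration.

```python
import math

def odds(b0, b1, x):
    """Odds p / (1 - p) under a one-predictor logistic model."""
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))
    return p / (1 - p)

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.5, 0.4
ratio = odds(b0, b1, x=3.0) / odds(b0, b1, x=2.0)
print(round(ratio, 6), round(math.exp(b1), 6))  # the two printed values agree
```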

p. 240 | The corrupted word in the title should read "Profiling" |

p. 252 | The acceptance score for observation 4 should be "dislike" |

p. 255 | For Output6, the last term in the exponent should be (-0.02)(0.52) |

p. 258 | For output node 6 the error is 0.481(1-0.481)(0-0.481)= -0.120 |
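The corrected arithmetic follows the back-propagation error for a logistic output node, err = ŷ(1 − ŷ)(y − ŷ). This form is read directly off the expression 0.481(1-0.481)(0-0.481). The sketch below assumes that form and reproduces the corrected value for output node 6.

```python
def output_node_error(output, actual):
    """Back-propagation error for a logistic output node: output * (1 - output) * (actual - output)."""
    return output * (1 - output) * (actual - output)

err6 = output_node_error(output=0.481, actual=0)
print(round(err6, 3))  # -0.12
```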

p. 283 | -50.58 should be -51.58 |

p. 285 | Sentence should read: "For instance, the no-injury classification score for the first accident in the training set is -24.5+(1.95)(1)+(1.19)(0) +...+ (16.36)(1) = 31.42. The nonfatal score is similarly computed as 30.93, and the fatal score as 25.94." |
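The record is assigned to the class with the highest classification score, which here is no injury. This is a minimal sketch using the three corrected scores. It also assumes one common convention for turning scores into probabilities, exp(score_k) / Σ_j exp(score_j). The function name is hypothetical.

```python
import math

def classify_by_scores(scores):
    """Pick the class with the highest score and convert scores to probabilities
    via exp(score_k) / sum_j exp(score_j)."""
    top = max(scores.values())
    # Subtract the max before exponentiating to avoid overflow; the ratios are unchanged.
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    return max(scores, key=scores.get), probs

label, probs = classify_by_scores({"no injury": 31.42, "nonfatal": 30.93, "fatal": 25.94})
print(label)  # no injury
print({k: round(v, 3) for k, v in probs.items()})
```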

p. 294, eq (13.2) | On the right-hand side, replace 1/2 with 1/4. The last term should be 2 × (1/4) Cov(e_{1i}, e_{2i}) |
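The factor 1/4 comes from Var((e_1 + e_2)/2) = (1/4)Var(e_1) + (1/4)Var(e_2) + 2 × (1/4)Cov(e_1, e_2). This sketch assumes that eq. (13.2) expresses the variance of the average of two models' errors e_{1i} and e_{2i}. The identity holds exactly for population-style sample moments, so it checks numerically on hypothetical, correlated simulated errors.

```python
import random
import statistics

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
# Hypothetical, correlated errors from two models.
e1 = [rng.gauss(0, 1) for _ in range(1000)]
e2 = [0.6 * x + rng.gauss(0, 0.8) for x in e1]
avg = [(x + y) / 2 for x, y in zip(e1, e2)]

lhs = statistics.pvariance(avg)
rhs = 0.25 * statistics.pvariance(e1) + 0.25 * statistics.pvariance(e2) + 2 * 0.25 * cov(e1, e2)
print(abs(lhs - rhs) < 1e-9)  # True
```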

p. 300 | Paragraph after table 13.2, last sentence should read "...the lift from the flyer is 5.8" (instead of 4.8) |

p. 300, Table 13.3 | Voter 1's value should be 0 for "Flyer" and 1 for "Moved_AD" |

p. 304 | Problem 13.2 part (a) should read "setting... terminal nodes to 50" |

p. 311, Table 14.1 | Row 7 should read "red, blue" instead of "white, orange" |

p. 344 (top) | In Distance Measures for Categorical Data, replace "x_ij's" with "p measurements", and replace n with p in the table and in the Matching coefficient formula. |
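With p binary measurements per record, the matching coefficient is the share of measurements on which two records agree. This is a minimal sketch, assuming the usual 2×2 layout where a counts measurements that are 0 in both records and d counts those that are 1 in both, giving (a + d)/p. The records are hypothetical.

```python
def matching_coefficient(x, y):
    """Share of the p binary measurements on which records x and y agree: (a + d) / p."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of measurements")
    p = len(x)
    agreements = sum(1 for xi, yi in zip(x, y) if xi == yi)  # a (both 0) + d (both 1)
    return agreements / p

# Two hypothetical records with p = 5 binary measurements.
print(matching_coefficient([1, 0, 0, 1, 1], [1, 1, 0, 0, 1]))  # 0.6
```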

p. 357, 2nd para | "cluster 1" should be "cluster 6". "cluster 3" should be "cluster 4" |

p. 383 | In last sentence, replace "Month" with "Season" |

Ch 15-17 | Several of the time series datasets used in the problems (souvenir sales, shampoo sales, Australian wine sales) have a new source reference: Hyndman, R., and Yang, Y. Z. (2018). tsdl: Time Series Data Library. v0.1.0. https://pkg.yangzhourang.com/tsdl/ |

END ERRATA |