Errata (Python Edition)


Chap 2, Table 2.6 (code) remove: "# use drop_first=True to drop the first dummy variable
housing_df = pd.get_dummies(housing_df, prefix_sep='_', drop_first=True)"

add: "# the missing values will create a third category
# use the arguments drop_first and dummy_na to control the outcome
housing_df = pd.get_dummies(housing_df, prefix_sep='_', dtype=int)"
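As a minimal sketch of how the arguments named in the comment above change the outcome, the example below uses a small hypothetical frame (toy_df, not the book's housing data) with one categorical column that contains a missing value. The column names and values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: REMODEL has two levels plus one missing value
toy_df = pd.DataFrame({
    'REMODEL': ['Old', 'Recent', np.nan, 'Old'],
    'ROOMS': [5, 6, 7, 4],
})

# Default: one column per observed level; the row with the missing value
# gets 0 in every REMODEL_* column, so it acts as an implicit extra category
default_df = pd.get_dummies(toy_df, prefix_sep='_', dtype=int)

# dummy_na=True adds an explicit indicator column for missing values
with_na_df = pd.get_dummies(toy_df, prefix_sep='_', dummy_na=True, dtype=int)

# drop_first=True drops the first level ('Old'), leaving it as the reference
dropped_df = pd.get_dummies(toy_df, prefix_sep='_', drop_first=True, dtype=int)

print(list(default_df.columns))
print(list(with_na_df.columns))
print(list(dropped_df.columns))
```

Which variant is appropriate depends on the model: a regression with an intercept usually needs a reference level dropped, while tree-based methods can use all columns.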

Chap 3, Fig 3.6 Right panel bar charts should not use multiple colors 
Chap 5, Fig 5.1 (code) should be: boxdata_df = pd.concat([pred_error_train, pred_error_valid])
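A minimal sketch of the corrected call, assuming pred_error_train and pred_error_valid are data frames with the same columns. The column names and residual values below are hypothetical stand-ins, not the book's data. pd.concat stacks the rows of the frames in the list.

```python
import pandas as pd

# Hypothetical stand-ins for the prediction-error frames in Fig 5.1
pred_error_train = pd.DataFrame({'residual': [1.0, -2.0], 'data set': 'training'})
pred_error_valid = pd.DataFrame({'residual': [0.5, 3.0, -1.5], 'data set': 'validation'})

# Stack the two frames row-wise; each frame keeps its own index labels,
# so pass ignore_index=True if a fresh 0..n-1 index is needed
boxdata_df = pd.concat([pred_error_train, pred_error_valid])

print(boxdata_df)
```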
Chap 5, Fig 5.2 code update: https://github.com/gedeck/dmba/issues/11
Chap 5, p. 140 should be: "As we increase the cutoff value 1-alpha from 0 to 1..."
Chap 5, mid p. 147 should be: "we see that taking 10% of the records ... selection of 10% of the records."
Chap 5, Fig 5.10, 5.11 "Classify as 'x'" should be at bottom and "Classify as 'o'" should be at top
Chap 7, Table 7.2 should be: outcome = 'Ownership'
Chap 8, Table 8.5 should be: pd.set_option('display.precision', 4)
Chap 9, footnote 1 should be removed
Chap 10, after eq (10.5) should be: "a unit increase in predictor x_j is associated with an
average increase of e^(β_j) × 100% in the odds"
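A small numeric check of the factor in the sentence above, using hypothetical coefficients for a one-predictor logistic model (beta_0 and beta_j are illustrative values, not taken from the book). It shows that a one-unit increase in the predictor multiplies the odds by e^(β_j), whatever the starting value.

```python
import math

# Hypothetical coefficients, chosen for illustration only
beta_0 = -1.0
beta_j = 0.5

def odds(x):
    """Odds of the outcome under the logistic model: exp(beta_0 + beta_j * x)."""
    return math.exp(beta_0 + beta_j * x)

# Ratio of the odds after a one-unit increase in x
ratio = odds(3.0) / odds(2.0)

# The two printed values agree: the ratio equals exp(beta_j)
print(round(ratio, 4), round(math.exp(beta_j), 4))
```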
Chap 10, eq. (10.8) in denominator, the first term in exponent, 6.04892 should not have a minus sign
Chap 10, Table 10.2 (code) should be: "bank_df.Education = bank_df.Education.cat.rename_categories(new_categories)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True, dtype=int)"
Chap 10, Fig 10.3 code update: https://github.com/gedeck/dmba/issues/11
Chap 11, p. 290 in Back Propagation of Error, the text should read: "in Figure 11.3, for a person with output class “like” we have y6 = 1)."
Chap 11, p. 295 MLPClassify() should be MLPClassifier()
Chap 11, pp. 297-298 (twice) MLPCRegressor() should be MLPRegressor()
Chap 11, Table 11.2 should be: hidden_layer_sizes=[3]
Chap 11, Table 11.6 should be: hidden_layer_sizes=[2]
Chap 12, eq 12.2 Formulas should have square-root
Chap 12, Table 12.4 (code) should be: "fct = pd.concat([
    pd.DataFrame([lda_reg.intercept_], columns=lda_reg.classes_, index=['constant']),
    pd.DataFrame(lda_reg.coef_.transpose(), columns=lda_reg.classes_,
                 index=list(accidents_df.columns)[:-1])])"
Chap 12, problem 12.3 d+e should be "Compute the intercept of the classification function"
Chap 14, Table 14.12 should be: "print('Top-4 recommended items for each user')"
Chap 14, problem 14.3 replace "You will get a Null matrix." with "All recommendations will be 1."
Chap 17, Fig 17.1 code replace "# shorter and longer time series" with "# plot the time series"
Chap 17, Fig 17.7 add to end of caption: "Autocorrelation plot for lags 1-12 (for first 24 months of Amtrak ridership, 95% confidence region is blue shaded)"
Chap 17, Table 17.8 (code) should be: "train_res_arima = ARIMA(train_lm_trendseason.resid, order=(1, 0, 0), freq='MS', trend='c').fit()
forecast = train_res_arima.get_forecast(1)
conf_int = forecast.conf_int()"
Chap 17, Table 17.9 (code) should be: sp500_arima = ARIMA(sp500_ts, order=(1, 0, 0)).fit()
Chap 17, p. 435 add clarification after "is at lag 6 and is negative (exceeding the 95% confidence interval). Autocorrelations that fall outside the confidence interval point to possible model improvement."
Chap 19, mid p. 481 should be "by the number of all possible shortest paths between the other nodes (n-1)(n-2)/2"
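As a quick sanity check on the denominator (n-1)(n-2)/2, the sketch below counts the unordered pairs of "other" nodes directly, assuming an undirected network. The node labels and the helper name pairs_excluding are hypothetical.

```python
from itertools import combinations

def pairs_excluding(nodes, v):
    """All unordered pairs of nodes that do not include node v."""
    others = [u for u in nodes if u != v]
    return list(combinations(others, 2))

# Hypothetical five-node network; only the node count matters here
nodes = ['A', 'B', 'C', 'D', 'E']
n = len(nodes)

pairs = pairs_excluding(nodes, 'A')

# Direct count versus the closed-form expression (n-1)(n-2)/2
print(len(pairs), (n - 1) * (n - 2) // 2)
```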
Chap 21, case 21.1, p. 521 should be: "The full set of 16 predictors in the dataset"
Chap 21, case 21.2 should be: "Use this vector to create a cumulative gains chart for the validation set that incorporates the net profit."
Chap 21, case 21.5, p. 536 In second bullet, delete sentence "The data file contains... date/time field"
Chap 21, case 21.5, p. 536 first line: NULL should be NaN
END ERRATA