The textbook is:
Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner, 3rd edition
The authors are Galit Shmueli, Peter C. Bruce, and Nitin R. Patel.
The question I need answered:
I. Exercise 6.3: Predicting Airfares on New Routes (pp. 154-156).
a. Explore the numerical predictors and response (FARE) by creating a correlation table and examining some scatterplots between FARE and those predictors. What seems to be the best single predictor of FARE?
b. Explore the categorical predictors (excluding the first four) by computing the percentage of flights in each category. Create a pivot table with the average fare in each category. Which categorical predictor seems best for predicting FARE?
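The category percentages and the pivot of average fare per level can be computed in one pass. This is a minimal sketch assuming one categorical column and the FARE column are parallel Python lists; `category_summary` and the toy values are hypothetical. A predictor whose levels show a large gap in average fare is a natural candidate for the best categorical predictor.

```python
from collections import defaultdict


def category_summary(levels, fares):
    """For one categorical predictor, return {level: (percent_of_flights, average_fare)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for level, fare in zip(levels, fares):
        counts[level] += 1
        totals[level] += fare
    n = len(levels)
    return {level: (100.0 * counts[level] / n, totals[level] / counts[level]) for level in counts}


# Toy values for illustration only (not from the Airfares data).
sw = ["Yes", "No", "No", "Yes"]
fare = [100.0, 200.0, 220.0, 120.0]
print(category_summary(sw, fare))
# {'Yes': (50.0, 110.0), 'No': (50.0, 210.0)}
```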
c. Find a model for predicting the average fare on a new route.
i. Convert categorical variables (e.g., SW) into dummy variables. Then partition the data into training and validation sets. The model will be fit to the training data and evaluated on the validation set.
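The two preprocessing steps in (i) can be sketched with the standard library, assuming each column is a plain Python list. `to_dummies` drops one reference level so that a k-level variable yields k-1 dummies, which avoids perfect collinearity with the intercept; `partition` shuffles row indices with a fixed seed so the split is reproducible. Both helper names and the 60/40 split are assumptions for illustration, not XLMiner's settings.

```python
import random


def to_dummies(values, reference):
    """Encode a categorical column as 0/1 dummies, dropping the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def partition(n_rows, train_fraction=0.6, seed=1):
    """Randomly split row indices into training and validation sets."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = round(train_fraction * n_rows)
    return sorted(idx[:cut]), sorted(idx[cut:])


print(to_dummies(["No", "Yes", "No"], reference="No"))  # {'Yes': [0, 1, 0]}
train, valid = partition(10)
print(len(train), len(valid))  # 6 4
```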
ii. Use stepwise regression to reduce the number of predictors. You can ignore the first four predictors (S_CODE, S_CITY, E_CODE, E_CITY). Report the estimated model selected.
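Stepwise selection can be sketched with NumPy least squares. This version swaps in AIC as the entry and removal criterion, whereas XLMiner's stepwise option works with F-statistic thresholds, so the selected subset may differ. The helper names are hypothetical and the data below are simulated (only columns 0 and 1 drive `y`), not the Airfares data; in practice each column index would map to a predictor such as COUPON or HI.

```python
import numpy as np


def design(X, cols):
    """Intercept column plus the selected predictor columns."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def aic(A, y):
    """AIC (up to an additive constant) of an OLS fit of y on design matrix A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ beta) ** 2))
    n, p = A.shape
    return n * np.log(sse / n) + 2 * p


def stepwise(X, y):
    """Bidirectional stepwise: repeatedly make the single add or drop that lowers AIC most."""
    selected = []
    best = aic(design(X, selected), y)
    while True:
        moves = [selected + [j] for j in range(X.shape[1]) if j not in selected]
        moves += [[k for k in selected if k != j] for j in selected]
        if not moves:
            break
        scores = [aic(design(X, m), y) for m in moves]
        i = int(np.argmin(scores))
        if scores[i] >= best:  # no move improves AIC: stop
            break
        selected, best = moves[i], scores[i]
    return sorted(selected)


# Simulated data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(stepwise(X, y))
```

Each accepted move strictly lowers AIC and there are finitely many subsets, so the loop always terminates.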
iii. Repeat (ii) using exhaustive search instead of stepwise regression. Compare the resulting best model to the one you obtained in (ii) in terms of the predictors that are in the model.
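Exhaustive search fits every subset of the candidate predictors; with the 13 predictors listed in (v) that is 2^13 - 1 = 8191 regressions, which is still cheap. This sketch ranks subsets by adjusted R-squared, which is an assumption: XLMiner's best-subsets output lists several criteria, including Mallows' Cp. Passing only the pre-flight predictors as `candidates` gives the restricted search asked for in (viii). Helper names and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y, cols):
    """Adjusted R-squared of an OLS fit of y on an intercept plus the columns in cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ beta) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    n, p = len(y), len(cols)
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


def exhaustive_search(X, y, candidates):
    """Fit every non-empty subset of candidate columns; return the best by adjusted R-squared."""
    best_cols, best_score = None, -np.inf
    for size in range(1, len(candidates) + 1):
        for cols in combinations(candidates, size):
            score = adjusted_r2(X, y, cols)
            if score > best_score:
                best_cols, best_score = cols, score
    return best_cols, best_score


# Simulated data for illustration only: y depends on columns 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(exhaustive_search(X, y, candidates=[0, 1, 2, 3, 4]))
```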
iv. Compare the predictive accuracy of both models, (ii) and (iii), using measures such as RMSE and average error, and lift charts.
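The validation-set accuracy measures in (iv) are simple to compute directly. Average error here is the mean of actual minus predicted, so a negative value means the model over-predicts on average. `lift_curve` returns the cumulative actual fare after sorting records by predicted fare, highest first; plotting it against the straight line from 0 to the total fare gives a lift chart. This is a minimal sketch with hypothetical helper names and toy numbers.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_error(actual, predicted):
    """Mean residual (actual - predicted); near 0 means little systematic bias."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def lift_curve(actual, predicted):
    """Cumulative actual values with records sorted by predicted value, highest first."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    cumulative, total = [], 0.0
    for i in order:
        total += actual[i]
        cumulative.append(total)
    return cumulative


actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(average_error(actual, predicted))  # -1.0
print(lift_curve(actual, predicted))  # [30.0, 50.0, 60.0]
```

Running the same three functions on each model's validation predictions puts models (ii) and (iii) side by side.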
v. Using model (iii), predict the average fare on a route with the following characteristics: COUPON = 1.202, NEW = 3, VACATION = No, SW = No, HI = 4442.141, S_INCOME = $28,760, E_INCOME = $27,664, S_POP = 4,557,004, E_POP = 3,195,503, SLOT = Free, GATE = Free, PAX = 12,782, DISTANCE = 1976 miles.
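A prediction from a fitted linear model is the intercept plus each coefficient times the corresponding predictor value, with the categorical answers converted to the same dummies used in training. This sketch encodes the route in (v) with hypothetical dummy names (for example `SW_Yes` = 0 for SW = No). The coefficients passed in must come from the model actually selected in (iii); the ones in the usage line are placeholders, not estimates.

```python
# Predictor values for the new route in part (v). The dummy names
# (VACATION_Yes, SW_Yes, SLOT_Controlled, GATE_Constrained) are hypothetical.
new_route = {
    "COUPON": 1.202, "NEW": 3, "VACATION_Yes": 0, "SW_Yes": 0, "HI": 4442.141,
    "S_INCOME": 28760, "E_INCOME": 27664, "S_POP": 4557004, "E_POP": 3195503,
    "SLOT_Controlled": 0, "GATE_Constrained": 0, "PAX": 12782, "DISTANCE": 1976,
}


def predict_fare(intercept, coefficients, record):
    """Linear-regression prediction: intercept + sum of coefficient * predictor value."""
    return intercept + sum(b * record[name] for name, b in coefficients.items())


# Placeholder coefficients only; substitute the estimates from model (iii).
print(predict_fare(10.0, {"DISTANCE": 0.1}, new_route))
```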
vi. Using model iii, predict the reduction in average fare on the route in (v) if Southwest decides to cover this route.
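For (vi), the predicted reduction is the prediction with the Southwest dummy set to 0 minus the prediction with it set to 1, holding every other predictor fixed. In a linear model without interaction terms this equals minus the SW coefficient. This is a sketch with a hypothetical `SW_Yes` dummy name and placeholder coefficients.

```python
def southwest_effect(intercept, coefficients, record, sw_dummy="SW_Yes"):
    """Predicted fare without Southwest minus predicted fare with Southwest."""

    def predict(r):
        return intercept + sum(b * r[name] for name, b in coefficients.items())

    without_sw = predict({**record, sw_dummy: 0})
    with_sw = predict({**record, sw_dummy: 1})
    return without_sw - with_sw


# Placeholder coefficients only; substitute the estimates from model (iii).
route = {"DISTANCE": 1976, "SW_Yes": 0}
print(southwest_effect(100.0, {"SW_Yes": -40.0, "DISTANCE": 0.05}, route))
```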
vii. In reality, which of the factors will not be available for predicting the average fare from a new airport (i.e., before flights start operating on those routes)? Which ones can be estimated? How?
viii. Select a model that includes only factors that are available before flights begin to operate on the new route. Use an exhaustive search to find such a model.
ix. Use the model in (viii) to predict the average fare on a route with characteristics COUPON = 1.202, NEW = 3, VACATION = No, SW = No, HI = 4442.141, S_INCOME = $28,760, E_INCOME = $27,664, S_POP = 4,557,004, E_POP = 3,195,503, SLOT = Free, GATE = Free, PAX = 12,782, DISTANCE = 1976 miles.
x. Compare the predictive accuracy of this model with model iii. Is this model good enough, or is it worthwhile re-evaluating the model once flights begin on the new route?
d. In competitive industries, a new entrant with a novel business plan can have a disruptive effect on existing firms. If a new entrant’s business model is sustainable, other players are forced to respond by changing their business practices. If the goal of the analysis were to evaluate the effect of Southwest Airlines’ presence on the airline industry rather than to predict fares on new routes, how would the analysis be different? Describe the technical and conceptual aspects.