Open source textbook. A professor using an open source introductory statistics book predicts that 60% of the students will purchase a hard copy of the book, 25% will print it out from the web, and 15% will read it online. At the end of the semester he asks his students to complete a survey where they indicate what format of the book they used. Of the 126 students, 71 said they bought a hard copy of the book, 30 said they printed it out from the web, and 25 said they read it online. (a) State the hypotheses for testing if the professor’s predictions were inaccurate. (b) How many students did the professor expect to buy the book, print the book, and read the book exclusively online? (c) This is an appropriate setting for a chi-square test. List the conditions required for a test and verify they are satisfied. (d) Calculate the chi-squared statistic, the degrees of freedom associated with it, and the p-value. (e) Based on the p-value calculated in part (d), what is the conclusion of the hypothesis test? Interpret your conclusion in this context.

The professor wants to test whether the observed data follow the percentages he predicted. The appropriate tool is a chi-square goodness-of-fit test.

a) The hypotheses are:

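H₀: P(A)= 0.60; P(B)= 0.25; P(C)= 0.15, where A = bought a hard copy, B = printed it out from the web, C = read it online

H₁: The observed frequencies do not fit the theoretical model (at least one proportion differs from its predicted value)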

α: 0.05

b) For this test you can calculate the expected frequency for each category using the formula:

[tex]E_i= n* P_i[/tex]

[tex]E_A= n * P_A= 126*0.60= 75.6[/tex]

[tex]E_B= n * P_B= 126*0.25= 31.5[/tex]

[tex]E_C= n * P_C= 126* 0.15= 18.9[/tex]

Note: if the expected frequencies for all the categories are calculated correctly, then ∑Ei= n. In this case:

75.6+31.5+18.9= 126 ⇒ This is a quick way to check that the calculations are right, especially when you are working with more categories.
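
The expected counts and the sum check can be reproduced in a few lines of Python. This is only a minimal sketch; the dictionary labels and variable names are illustrative and not part of the original problem.

```python
import math

n = 126  # number of students surveyed

# Professor's predicted proportions (labels are illustrative)
predicted = {"hard copy": 0.60, "print": 0.25, "online": 0.15}

# E_i = n * P_i for each category
expected = {fmt: n * p for fmt, p in predicted.items()}

# Sanity check: the expected counts must add up to the sample size
total_matches_n = math.isclose(sum(expected.values()), n)
```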

c) The conditions for this test are:

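-All observations should be independent. Each student reports only his or her own format, so it is reasonable to treat the responses as independent.

-The expected frequency for every category should be at least 5

Ei≥5. Here the expected frequencies are 75.6, 31.5 and 18.9, so this condition is satisfied.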
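
The expected-count condition can also be checked programmatically. A minimal sketch, with the expected counts from part (b) typed in directly; the name `MIN_EXPECTED` is just an illustrative choice.

```python
# Expected counts from part (b)
expected = [75.6, 31.5, 18.9]

MIN_EXPECTED = 5  # usual rule-of-thumb threshold for the chi-square test

# Any category falling below the threshold would be collected here
too_small = [e for e in expected if e < MIN_EXPECTED]
condition_met = not too_small
```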

d) The statistic is:

[tex]X^2= \sum \frac{(O_i-E_i)^2}{E_i} \sim X^2_{k-1}[/tex]

k= number of categories of the variable. Here k= 3, so the degrees of freedom are k-1= 2.

[tex]X^2_{H_0}= \frac{(71-75.6)^2}{75.6} +\frac{(30-31.5)^2}{31.5} +\frac{(25-18.9)^2}{18.9}= 2.32[/tex]
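
The same arithmetic can be checked in Python. This is a sketch under the assumption that the observed and expected counts are entered in the same category order (hard copy, print, online); the variable names are illustrative.

```python
observed = [71, 30, 25]        # survey results
expected = [75.6, 31.5, 18.9]  # expected counts from part (b)

# Contribution of each category: (O_i - E_i)^2 / E_i
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

chi_sq = sum(contributions)
df = len(observed) - 1  # k - 1

print(f"chi-square = {chi_sq:.2f}, df = {df}")  # chi-square = 2.32, df = 2
```

The list of contributions also shows that the "online" category accounts for most of the statistic.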

This test is always one-tailed to the right, which means that you reject the null hypothesis for large values of X²

The critical value is [tex]X^2_{k-1;1-\alpha }= X^2_{2;0.95}= 5.991[/tex]

The rejection region is [tex]X^2_{H_0} \geq 5.991[/tex]

Since 2.32 < 5.991, the statistic does not fall in the rejection region.

The p-value for this test is also one-tailed to the right:

P(X₂²≥2.32)= 1 - P(X₂²<2.32)= 1 - 0.6866 = 0.3134
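
For 2 degrees of freedom specifically, the chi-square distribution has a simple closed-form upper-tail probability, exp(-x/2), so the p-value and the critical value can be checked without tables. The sketch below relies on that special case and would not work for other degrees of freedom; the variable names are illustrative.

```python
import math

chi_sq = 2.32  # test statistic from part (d)
alpha = 0.05   # significance level chosen in part (a)

# Special case df = 2 only: P(X² >= x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)

# Inverting the same formula gives the upper-tail critical value
critical_value = -2 * math.log(alpha)

# True only if the statistic is at least as large as the critical value
in_rejection_region = chi_sq >= critical_value
```

For any other number of degrees of freedom, a chi-square table or a statistics library would be needed instead.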

e) The decision rule using the p-value approach is:

If the p-value ≤ α, the decision is to reject the null hypothesis.

If the p-value > α, the decision is to not reject the null hypothesis.

In this case the p-value, 0.3134, is greater than the significance level I've selected, α: 0.05, so the decision is to not reject the null hypothesis.

Using a level of significance of 5%, there is not enough evidence to reject the null hypothesis. In other words, the data do not show that the professor's predictions were inaccurate: the formats the students reported using are consistent with the predicted 60%, 25% and 15%.

I hope this helps!