[include in exam: z-score table]

1. Tony is in the 10% tax bracket. Edie is in the 35% tax bracket. They each itemize their deductions, and they each pay $20,000 in (fully tax-deductible) mortgage interest payments during the year. What is the difference between Tony's true cost for mortgage interest payments and Edie's true cost for mortgage interest payments?
A. No difference
B. $5000
C. $13,000
D. $18,000
Solution: The true cost of a mortgage payment is the amount of the payment minus the tax benefit. Since Tony is in the 10% tax bracket, a tax-deductible payment of $20,000 reduces his taxable income by $20,000, and hence reduces his tax by 10% of $20,000, which is $2000. So his true cost is $20,000 minus $2000, or $18,000. Likewise, Edie's true cost is $20,000 minus 35% of $20,000, i.e., $20,000 minus $7000, which is $13,000. The difference between $18,000 and $13,000 is $5000, so the answer is B.

2. Jane Jones, after leaving UW, started with $0 in the bank (a "graduation gift" from her parents: "We want you to have the experience of starting with nothing, just like we did"). In her first year, Jones had an after-tax income of $20,000, with total expenses of $18,000. In the second year, her after-tax income increased to $21,000, but her total expenses increased to $25,000. In the third year, her after-tax income increased to $24,000, and her total expenses decreased to $23,000. During which of these three years was Jane in debt? (If and when Jane goes into debt, she gets an interest-free loan from Grandma Jones, so she doesn't have to worry about interest payments.)
A. Year 2 only
B. Year 3 only
C. Years 2 and 3 only
D. None
Solution: In each year, we compute the surplus (or deficit) by subtracting expenses from after-tax income:
Year 1: $20K - $18K = $2K (surplus)
Year 2: $21K - $25K = -$4K (deficit)
Year 3: $24K - $23K = $1K (surplus)
At the end of Year 1, Jane has no debt (and has a cushion of $2K).
In Year 2, her $4K deficit erases that cushion and puts her $2K into debt. In Year 3, her $1K surplus reduces the $2K debt to $1K, but she is still in debt. (As I wrote in the solutions to practice problem #2, don't confuse deficit with debt!) So the answer is C.

3. A study is done to determine whether a certain pill helps people who suffer from migraines. 1000 people come to a clinic to receive either the pill being tested or a placebo. The pills that contain the actual drug are round, and the placebo pills are square; the doctors who give the subjects the pills know this, but the subjects don't. How would you describe this study?
A. Single-blind experiment
B. Double-blind experiment
C. Case-control observational study
D. Non-case-control observational study
Solution: This is an experiment, since there is a treatment (in the form of a pill) that is being applied by the researchers. Since the experimenter knows whether any given subject is part of the treatment group or the control group (because of the shape of the pill) but the subject does not, this is a single-blind experiment. So the answer is A.

4. A UW student writes: "To determine Wisconsin tax-payers' attitudes towards state subsidization of higher education, I administered a short poll in my poli sci class. 100% of the students in my class filled out the questionnaire (anonymously). 75% of the respondents said that their parents would favor state tax increases if the extra funds went to higher education. Hence I believe that a tax-hike earmarked for purposes of higher education would almost certainly be supported by a majority of Wisconsin's tax-payers." What is most likely to be the main problem with this study?
A. Selection bias
B. Participation bias
C. Confounding variables
D. Setting and wording
Solution: There are two main sources of selection bias: some of the students' parents are not from Wisconsin, and the parents of a college-age student are more likely to support state subsidies for higher education than the average tax-payer. So the best answer is A.
There is no participation bias, since participation bias only arises when (a) some of the people that the pollster tried to include in the sample opted not to participate, and (b) the people who chose to participate have a different profile (vis-a-vis the things that the pollster is trying to measure) than the people who chose not to participate. If 50% of the students in the class had participated and 50% had not, then we might worry that there was participation bias. But since 100% of the students whom the pollster tried to sample participated, there's no potential for participation bias.
It remains possible that the *reason* why everyone in the class participated is that they all care strongly about the issue (after all, they're in a political science class), and that might mean that their parents also care more strongly about the issue, so the parents might vote in atypical ways. But in that case we're back to selection bias: students in a political science class may have been the wrong people to ask if we want information about the typical voter.
Note that if we'd done the poll in an art history class, and only 50% of the students had participated, then we could say that there were two significant sources of bias: selection bias (the pollster chose to poll an art history class: why?), and participation bias (some of the students in the class took the survey and others didn't: why?). But since the pollster asked a group in which everyone participated, there's no participation bias, just selection bias.
If the survey hadn't been anonymous, then it's possible that the students would have given the answer they thought the professor agreed with, so that they'd get a better grade.
That would be an example of bias due to setting. But that doesn't apply here, since the survey was anonymous.

5. One of the bars in this histogram has been erased. What proportion of students got a C or higher on the exam?
[show figure from page 332, with the middle bar erased]
A. About 36%
B. About 56%
C. About 80%
D. Any of the above could be correct; there is not enough information
Solution: Since 12% of the students got a D and 8% of the students got an F, a total of 12% + 8% = 20% of the students got a D or an F, and hence 100% - 20% = 80% of the students got an A, B, or C; that is, 80% of the students got a C or higher. (You could also compute this number by using the histogram to compute what proportion of the students DIDN'T get a C, then subtracting this from 100% to find out what proportion of the students DID get a C, and then adding the proportion of students who got an A or a B. But the method I used here involves less calculation and is more accurate.) So the answer is C.
(If you correctly computed that 36% of the students got a C and then carelessly wrote "A" as your answer, you'll get half credit. When you're taking an exam and you have time to check your work, it pays to spend time re-reading the question as well, to make sure that you haven't missed short phrases like "or higher", or small words like "not", that can totally change the answer.)

6. Examine the stack plot shown below.
[show figure from page 345]
What happened to the combined death rate for cardiovascular illness and tuberculosis over the period from 1900 to 1950?
A. It increased significantly.
B. It decreased significantly.
C. It stayed roughly constant.
D. One can't tell (there is not enough information).
Solution: The combined death rate for cardiovascular illness and tuberculosis is given by the green and blue regions together. Specifically, it's given by the difference between the top of the green region and the bottom of the blue region.
Since the slope of the top of the green region is about the same as the slope of the bottom of the blue region, the combined death rate did not change very much. The best answer is C.
You could also work out the numbers in detail from the plot: the death rate per 100,000 for cardiovascular illness went from 620 - 250 = 370 in 1900 to 670 - 160 = 510 in 1950, and the death rate per 100,000 for tuberculosis went from 250 - 70 = 180 in 1900 to 160 - 140 = 20 in 1950, so the combined rate for the two illnesses went from 370 + 180 = 550 in 1900 to 510 + 20 = 530 in 1950. Or you could treat the green and blue regions as one big region, and save yourself some trouble: the thickness of the green-blue region went from 620 - 70 = 550 in 1900 to 670 - 140 = 530 in 1950.
Note that, according to these calculations, the combined death rate from cardiovascular illness and tuberculosis went DOWN very slightly. We can also see this from the picture: the bottom border of the blue region is slightly steeper than the top of the green region. But you have to squint and/or use measuring instruments to see this, so there's no significant increase or decrease.
Note that you'll get full credit if you correctly calculated that the combined death rate went down slightly but gave the answer B instead of C because you thought that the roughly 4% decrease (from 550 to 530) was significant. However, if you thought that the combined death rate went up, you probably used an incorrect method of calculation.

7. Examine the following scatter diagram, in which the horizontal axis measures how many years of post-high-school education a person has had and the vertical axis measures how many times per week a person has sex. (I made up the data, but they are consistent with reality; see http://www.mydna.com/health/sexual/relationship/sex_america.html .)
[Scatter diagram with eight points: horizontal axis = years of post-high-school education (0 to 8); vertical axis = times per week a person has sex (0 to 5).]
Assuming the data are correct, what sort of correlation exists between how many years of post-high-school education a person has had and how many times per week a person has sex?
A. Positive correlation
B. Negative correlation
C. No correlation
D. Question does not make sense because there is no cause-and-effect relationship between sex and education
Solution: There is a clear downward trend; the line that best fits the data runs from the upper left of the diagram to the lower right. Note that answer D is wrong because a mathematical correlation can exist even when there is no cause-and-effect relationship. So the answer is B.

8. Consider a data set whose histogram is as follows:
Value (horizontal axis):      1   2   3   4   5
Frequency (vertical axis):    2   6   5   7   2
What is the relationship between the median and the mode of the data set?
A. The median is greater than the mode
B. The median is smaller than the mode
C. The median equals the mode
D. It is impossible to tell from the information provided
Solution: This is a roughly symmetrical distribution, so the median should be near the middle value, which is 3. (More specifically, writing the data set out in full as 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, we see that there are 22 values and that the 11th and 12th values are both 3, so the median is 3.) But the mode is 4, so the median is smaller than the mode, and the answer is B.

9. Compare the standard deviations of two data sets called "X" and "Y":
Data set X: 2 ft., 2 ft., 3 ft., 4 ft., 4 ft.
Data set Y: 2 in., 2 in., 14 in., 26 in., 26 in.
(Recall that 1 ft. = 12 in.) Which data set has the larger standard deviation?
A. Data set X has larger standard deviation than data set Y
B. Data set Y has larger standard deviation than data set X
C. The two data sets have the same standard deviation
D. It is impossible to compare
Solution: For data set X, the mean is (2 + 2 + 3 + 4 + 4 ft.)/5 = 3 ft., so the deviations are -1 ft., -1 ft., 0 ft., +1 ft., and +1 ft. For data set Y, the mean is (2 + 2 + 14 + 26 + 26 in.)/5 = 14 in., so the deviations are -12 in., -12 in., 0 in., +12 in., and +12 in. Since 12 in. = 1 ft., those are the same deviations as we got for data set X. So when we apply the formula for standard deviation, we will get the same answer both ways. (We could also work this out in detail, but then calculation errors are more likely.) So the answer is C.

10. Suppose that the scores of some students on some test follow a normal distribution. The mean score of the students was 85 points, and the standard deviation was 2.5 points. About what percentage of the students scored between 80 and 90 points?
A. 68%
B. 80%
C. 90%
D. 95%
Solution: 80 points is 2 standard deviations below the mean (85 - 2 * 2.5 = 80), and 90 points is 2 standard deviations above the mean (85 + 2 * 2.5 = 90). According to the 68%-95%-99.7% rule, about 95% of the data in a normally distributed data set lie within 2 standard deviations of the mean. Hence about 95% of the students scored between 80 and 90 points, and the answer is D.

11. A marketing company claims that the mean household income of SUV owners across the population is greater than $80,000. A random sample of 1700 households with SUVs shows that the mean household income is $82,364. Assuming that the true mean is $80,000, the probability of selecting a sample of size 1700 with a mean income of $82,364 or more is 0.007. What should the marketing company do?
A. They should reject the null hypothesis with a statistical significance at the 0.01 level.
B. They should reject the null hypothesis with a statistical significance at the 0.05 level (but not at the 0.01 level).
C. They should fail to reject the null hypothesis.
D. They should accept the null hypothesis.
Solution: The null hypothesis is "The true mean household income of households with SUVs is $80,000". The probability of seeing an outcome as extreme as what was observed (a sample mean of $82,364 or more), if the null hypothesis were true, is only 0.007. Since 0.007 is less than 0.01, the observed outcome of the survey would be very unlikely if the null hypothesis were true. Hence they should reject the null hypothesis with a statistical significance at the 0.01 level. So the answer is A.

12. A math class holds a carnival to celebrate the end of the semester, and at the carnival there is a gambling game. When you play this game, 10% of the time your profit is $2; 20% of the time your profit is $1; 30% of the time your profit is $0; and 40% of the time your profit is -$1. How much money do you win/lose on average per game?
A. You break even on average
B. You win twenty cents per game on average
C. You lose twenty cents per game on average
D. You lose forty cents per game on average
Solution: The expected value of this game is (10% * $2) + (20% * $1) + (30% * $0) - (40% * $1) = $(0.20 + 0.20 + 0.00 - 0.40) = $0. So the answer is A.
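For anyone who wants to double-check the arithmetic in Problem 1, here is a minimal Python sketch of the true-cost calculation. It assumes, as the solution does, that the whole $20,000 deduction is taxed at the person's bracket rate; the function name `true_cost` is hypothetical, chosen for illustration.

```python
from fractions import Fraction

def true_cost(payment, bracket_rate):
    """Payment minus the tax it saves, assuming the entire deduction is taxed at bracket_rate."""
    tax_saving = payment * bracket_rate
    return payment - tax_saving

# Exact fractions avoid floating-point rounding in the dollar amounts.
tony = true_cost(20_000, Fraction(10, 100))   # Tony: 10% bracket
edie = true_cost(20_000, Fraction(35, 100))   # Edie: 35% bracket
print(tony, edie, tony - edie)  # 18000 13000 5000
```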
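The distinction in Problem 2 between a yearly deficit and an accumulated debt can be made concrete by tracking a running balance. This is a minimal sketch that, like the solution, only looks at the balance at the end of each year; the variable names are illustrative.

```python
# Jane's yearly (after-tax income, total expenses) in dollars, from Problem 2.
years = [(20_000, 18_000), (21_000, 25_000), (24_000, 23_000)]

balance = 0      # she starts with $0 in the bank
in_debt = []     # years that end with a negative balance
for year, (income, expenses) in enumerate(years, start=1):
    balance += income - expenses          # surplus (+) or deficit (-) for this year
    print(f"Year {year}: balance {balance:+,}")
    if balance < 0:
        in_debt.append(year)

print("In debt in years:", in_debt)  # [2, 3]
```

Year 3 has a surplus, yet the balance is still negative, which is exactly the "deficit is not debt" point.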
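The two ways of reading the stack plot in Problem 6 (region by region, or the green and blue regions as one) can be checked against each other. This sketch uses the boundary readings quoted in the solution, which are approximate readings from the figure; the dictionary keys are hypothetical names.

```python
# Boundary heights (deaths per 100,000) read off the stack plot, as quoted in the Problem 6 solution.
# The bottom of the green (cardiovascular) region is also the top of the blue (tuberculosis) region.
readings = {
    1900: {"green_top": 620, "green_bottom": 250, "blue_bottom": 70},
    1950: {"green_top": 670, "green_bottom": 160, "blue_bottom": 140},
}

combined_rates = {}
for year, r in readings.items():
    cardio = r["green_top"] - r["green_bottom"]    # thickness of the green region
    tb = r["green_bottom"] - r["blue_bottom"]      # thickness of the blue region
    combined = r["green_top"] - r["blue_bottom"]   # green and blue treated as one region
    assert cardio + tb == combined                 # the two methods must agree
    combined_rates[year] = combined
    print(year, cardio, tb, combined)

change = (combined_rates[1950] - combined_rates[1900]) / combined_rates[1900]
print(f"Relative change: {change:.1%}")  # -3.6%
```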
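The median and mode in Problem 8 can be confirmed with Python's standard `statistics` module, after expanding the histogram's frequencies back into the full data set. A minimal sketch:

```python
import statistics

# Frequencies read off the histogram in Problem 8: value -> number of data points.
frequencies = {1: 2, 2: 6, 3: 5, 4: 7, 5: 2}

# Expand into the full data set: 1, 1, 2, 2, 2, 2, 2, 2, 3, ...
data = [value for value, count in frequencies.items() for _ in range(count)]

median = statistics.median(data)   # 22 values, so the mean of the 11th and 12th values
mode = statistics.mode(data)       # the most frequent value
print(len(data), median, mode)     # 22 3.0 4
```

`statistics.median` averages the two middle values for an even-length data set, which is why it reports `3.0` rather than `3`.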
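Problem 9 hinges on comparing the two data sets in the same units. This sketch converts X to inches and compares deviations and standard deviations; it uses the population standard deviation `statistics.pstdev`, though the sample version `statistics.stdev` would also give equal values for X and Y.

```python
import statistics

x_ft = [2, 2, 3, 4, 4]          # data set X, in feet
y_in = [2, 2, 14, 26, 26]       # data set Y, in inches

x_in = [12 * v for v in x_ft]   # convert X to inches (1 ft. = 12 in.) so the units match

mean_x, mean_y = statistics.mean(x_in), statistics.mean(y_in)
dev_x = [v - mean_x for v in x_in]   # deviations from the mean, in inches
dev_y = [v - mean_y for v in y_in]

sd_x = statistics.pstdev(x_in)
sd_y = statistics.pstdev(y_in)
print(dev_x, dev_y)   # identical deviation lists
print(sd_x, sd_y)     # identical standard deviations
```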
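The 68%-95%-99.7% rule used in Problem 10 is a rounded version of exact normal-distribution probabilities. This sketch uses `statistics.NormalDist` (Python 3.8+) to compute the z-scores and the exact share between 80 and 90 points.

```python
from statistics import NormalDist

scores = NormalDist(mu=85, sigma=2.5)

# z-scores: how many standard deviations each cutoff is from the mean.
z_low = (80 - scores.mean) / scores.stdev
z_high = (90 - scores.mean) / scores.stdev

# Probability that a score falls between 80 and 90.
share = scores.cdf(90) - scores.cdf(80)
print(z_low, z_high, f"{share:.1%}")  # -2.0 2.0 95.4%
```

The exact value, about 95.4%, is what the rule of thumb rounds to 95%.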
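The expected-value calculation in Problem 12 multiplies each profit by its probability and adds the results. A minimal sketch with exact fractions, which also checks that the probabilities sum to 1:

```python
from fractions import Fraction

# (probability, profit in dollars) for each outcome of the carnival game in Problem 12.
outcomes = [
    (Fraction(10, 100), 2),
    (Fraction(20, 100), 1),
    (Fraction(30, 100), 0),
    (Fraction(40, 100), -1),
]

assert sum(p for p, _ in outcomes) == 1   # the outcomes cover every case

expected_profit = sum(p * profit for p, profit in outcomes)
print(expected_profit)  # 0
```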