Boxplot: A graph of the five-number summary. A box spans the quartiles, with an interior line marking the median. Lines extend out from this box to the extreme high and low observations.
Correlation: A measure of the direction and strength of the linear relationship between two variables. Correlations take values between -1 and 1, where 0 indicates no linear relationship and ±1 indicates a perfect straight-line relationship.
Distribution: The pattern of outcomes of a variable. The distribution describes what values the variable takes and how often each value occurs.
Exploratory data analysis: The practice of examining data for unanticipated patterns or effects, as opposed to seeking answers to specific questions.
Five-number summary: A summary of a distribution of values consisting of the median, the first and third quartiles, and the largest and smallest observations.
Histogram: A graph of the distribution of outcomes (often divided into classes) for a single variable. The height of each bar is the number of observations in the class of outcomes covered by the base of the bar. All classes should have the same width.
Individuals: The people, animals, or things described by a data set.
Least squares regression line: A line drawn on a scatterplot that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. The regression line can be used to predict the response variable y for a given value of the explanatory variable x.
Mean: The ordinary arithmetic average of a set of observations. To find the mean, add all the observations and divide the sum by the number of observations summed.
Median: The midpoint of a set of observations. Half the observations fall below the median and half fall above.
Outlier: A data point that falls clearly outside the overall pattern of a set of data.
Quartiles: The first quartile of a distribution is the point with 25% of the observations falling below it; the third quartile is the point with 75% below it.
Regression line: Any line that describes how a response variable y changes as we change an explanatory variable x. The most common such line is the least squares regression line.
Response variable, explanatory variable: A response variable measures an outcome of a study. An explanatory variable attempts to explain the observed outcomes.
Scatterplot: A graph of the values of two variables as points in the plane. The value of the explanatory variable is plotted on the horizontal axis and the value of the response variable for the same individual is plotted on the vertical axis.
Skewed distribution: A distribution in which observations on one side of the median extend notably farther from the median than do observations on the other side. In a right-skewed distribution, the larger observations extend farther to the right of the median than the smaller observations extend to the left.
Standard deviation: A measure of the spread of a distribution about its mean as center. It is the square root of the average squared deviation of the observations from their mean.
Stemplot: A display of the distribution of a variable that attaches the final digits of the observations as leaves on stems made up of all but the final digit.
Symmetric distribution: A distribution with a histogram or stemplot in which the part to the left of the median is roughly a mirror image of the part to the right of the median.
Variable: Any measured characteristic of an individual.
Variance: A measure of the spread of a distribution about its mean. It is the average squared deviation of the observations from their mean. The square root of the variance is the standard deviation.
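The one-variable summaries defined above (five-number summary, mean, variance, standard deviation) can be illustrated with a small Python sketch. The data set `scores` and the helper `five_number_summary` are made up for illustration. Two assumptions are worth flagging: quartiles are computed here as the medians of the lower and upper halves of the sorted data, which is only one of several conventions, and "average squared deviation" is taken literally as a division by the number of observations n (some texts divide by n - 1 instead).

```python
from statistics import mean, median, pstdev, pvariance


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data, leaving out the overall median when the count is odd.
    Other quartile conventions can give slightly different values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical observations, chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = five_number_summary(scores)  # (2, 4.0, 4.5, 6.0, 9)
average = mean(scores)                 # 40 / 8 = 5

# pvariance and pstdev divide by n; statistics.variance and
# statistics.stdev would divide by n - 1 instead.
spread_squared = pvariance(scores)     # 32 / 8 = 4
spread = pstdev(scores)                # square root of the variance, 2.0

print(summary, average, spread_squared, spread)
```

The five values in `summary` are exactly what a boxplot draws: the box runs from 4.0 to 6.0, the interior line sits at 4.5, and the lines extend to 2 and 9.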
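The two-variable entries (correlation, least squares regression line) can be sketched the same way. The function names `correlation` and `least_squares_line` and the sample data are illustrative assumptions, not part of the glossary. The formulas are the standard ones: the slope is the sum of products of deviations divided by the sum of squared x-deviations, and the line passes through the point of means.

```python
from math import sqrt


def _deviation_sums(xs, ys):
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def correlation(xs, ys):
    """Correlation r between -1 and 1; assumes neither variable is constant."""
    _, _, sxy, sxx, syy = _deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared
    vertical distances from the points to the line."""
    mean_x, mean_y, sxy, sxx, _ = _deviation_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x is the explanatory variable, y the response variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)                        # about 0.77, a positive association
slope, intercept = least_squares_line(x, y)  # about 0.6 and 2.2
predicted = intercept + slope * 6            # predicted response at x = 6, about 5.8

print(round(r, 4), round(slope, 4), round(intercept, 4), round(predicted, 4))
```

Points that lie exactly on a rising straight line, such as `y = [3, 5, 7, 9, 11]` for the same `x`, give a correlation of 1, matching the "perfect straight-line relationship" case in the definition.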