Wading Through the Data Swamp:
Program Evaluation 201
Glossary
- Attrition
- A gradual, natural reduction in membership or personnel, as through study participation drop-out.
- Best-fit line
- Also known as a regression line, a best-fit line is a straight line that passes through, or as close as possible to, as many of the data points as possible. By drawing such a line, we attempt to minimize the effects of random errors in measurement. The best-fit line also tells you the approximate relationship between variables. For example, a straight line going up and to the right indicates a positive linear relationship.
- Chance
- An unexpected, random or unpredictable event.
- Chi-square
- A statistic used in testing a hypothesis concerning the discrepancy between observed and expected results.
- Correlation
- A synonym for association or the relationship between variables.
- Degrees of freedom
- Any of the unrestricted, independent random variables that make up a statistic.
- Dependent variable
- A variable that may, it is believed, be predicted by or caused by one or more other variables called independent variables. FOR EXAMPLE, if it is hypothesized that the treatment will reduce rearrest for drug use, then "rearrest for drug use" is the dependent variable, which is impacted by the independent variable or treatment.
- Descriptive statistics
- Statistics used to describe a set of cases upon which observations were made. Descriptive statistics include quantitative information, such as correlation coefficients, measures of central tendency, and measures of variability, that describes and summarizes specific measures.
- Direction of a relationship
- The manner in which one variable changes as another variable changes: it either increases as the other variable increases (positive) or decreases as the other variable increases (negative).
- Equality of variance
- When the standard deviations of the two groups are fairly equal.
- Expected frequencies
- When using contingency tables, the expected frequencies are the frequencies that you would predict or expect in each cell of the table if you knew only the row and column totals and assumed the row and column variables were independent.
- Extreme scores
- Scores that are aberrant or do not fit with other scores: scores that, compared to others, are at the extremes on relevant dimensions.
- GPRA
- The Government Performance and Results Act of 1993 seeks to shift the focus of government decisionmaking and accountability away from a preoccupation with the activities that are undertaken - such as grants dispensed or inspections made - to a focus on the results of those activities, such as real gains in employability, safety, responsiveness, or program quality.
- Independent t-test
- Used to determine whether the mean value of a continuous outcome variable in one group differs significantly from that in another, separate group, e.g., comparing the effect on lowering blood pressure (a continuous outcome variable) of drug A versus drug B.
- Independent variable
- A variable that may, it is believed, predict or cause fluctuation in a dependent variable. FOR EXAMPLE, if it is believed that age influences the frequency of delinquent behavior, age is the independent variable and frequency of delinquent behavior is the dependent variable.
- Interval data
- A quantitative measure with equal intervals between categories, but with no absolute zero. FOR EXAMPLE, IQ scores.
- Loss to follow-up
- Loss to follow-up occurs when a participant has not dropped out of the program but was not present on the day outcome measures were collected.
- Mean
- The arithmetic average of a set of scores and the most widely used statistic for describing central tendency. It is used primarily with interval-ratio variables following symmetrical distributions (e.g., the average age or average height of a group of middle school students).
- Median
- A statistic describing the point below which 50 percent of the observed data fall. A measure of central tendency that is preferred when the distribution of cases is highly skewed, since the median is not sensitive to outliers. The median is the value of the case marking the midpoint of an ordered list of values of all cases; it is a statistic used primarily with ordinal variables and asymmetrically distributed interval-ratio variables.
- Mode
- A measure of central tendency: the value of a variable that occurs most frequently (the most frequently occurring score). It is a statistic used primarily with nominal variables.
- Nominal data
- A categorical variable whose attributes have no inherent order. FOR EXAMPLE, "sex" or "race."
- Non-parametric statistics
- Statistics that do not make assumptions about parameters of the population being tested, such as the normality of the data or the homogeneity of the variance.
- Normality assumption
- Assuming a variable, or group of variables, will conform or adhere to a typical or usual pattern or distribution (usually a bell-shaped curve).
- Normative education
- Normative education focuses on correcting misconceptions about the prevalence and acceptability of drug use and on establishing a more accurate perception of drug use norms.
- Not linear
- When data do not conform to a straight line.
- Null hypothesis
- A statement that is the opposite of what you are trying to prove.
- Observed frequencies
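- The number of times an event actually occurs in the data; in a contingency table, the count recorded in each cell. The ratio of the number of times an event occurs in a series of trials of a chance experiment to the number of trials performed is its relative frequency.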
- One-tailed test
- The test of a given statistical hypothesis in which only a value of the statistic that is, for example, sufficiently large will lead to rejection of the hypothesis tested.
- Ordinal data
- A quantitative variable whose attributes are ordered but for which the numerical differences between adjacent attributes are not necessarily interpreted as equal. FOR EXAMPLE, amount of school completed: (1) elementary school, (2) middle school, (3) high school, (4) college.
- Outcome
- The results of a program or activity (e.g., anticipated outcomes of after-school prevention programs may include increased knowledge about drugs and alcohol).
- Outliers
- Instances that are aberrant or do not fit with other instances: instances that, compared to other members of a population, are at the extremes on relevant dimensions. FOR EXAMPLE, while sentences for most criminal offenders may involve between one and twenty years, extreme cases may involve sentences (multiple consecutive sentences) of 300 years or more.
- Paired t-test
- Used to determine whether the mean value of a continuous outcome variable differs significantly between two related sets of observations, e.g., comparing the effect on lowering blood pressure (a continuous outcome variable) of drug A versus drug B in the same or matched patients. The paired t-test considers the difference between pairs of observations on either the same individual or on matched individuals.
- Participation scale
- An aggregate measure that assigns a value to a case based on a pattern obtained from a group of related measures; in this case, the level and degree of involvement of a study participant.
- Pearson's correlation coefficient
- A measure of association; a statistic used with interval-ratio variables.
- Quantify
- To attach numbers to an observation.
- P value
- This represents the probability of obtaining results at least as extreme as the study results by chance alone if the null hypothesis is true. The null hypothesis is rejected in favor of an alternative one if the p value is less than a predetermined level of statistical significance.
- Ratio-level data
- A level of measurement which has all the attributes of nominal, ordinal, and interval measures, and is based on a "true zero" point. As a result, the difference between two values or cases may be expressed as a ratio. FOR EXAMPLE, it may be reported that person A weighed twice as much as person B, because weight is typically measured using a ratio measure (i.e., pounds).
- Relationship statistics
- Statistics that describe a relationship between variables, such as chi-square or Pearson's correlation coefficient.
- Research hypothesis
- What you would like to prove.
- Scatterplots
- A scatter plot is a plot of a set of data. For example, each participant’s score is represented by one data point on the scatter plot, resulting in a pattern of dots that indicates the type and strength of relationship between two variables.
- Sign in
- Participants were required to sign in and out every day.
- Significance level
- The probability of rejecting a set of assumptions when they are in fact true.
- Skewed distributions
- A distribution is the variation of characteristics across cases. When a distribution is skewed, the data are not symmetric about the mean.
- Standard deviation
- A measure of spread; the square root of the variance.
- Statistically significant
- The degree to which a value is greater or smaller than would be expected by chance. Typically, a relationship is considered statistically significant when the probability of obtaining that result by chance is less than 5% if there were, in fact, no relationship in the population.
- Strength of a relationship
- The degree to which one variable will generate a reaction or effect on another.
- Subgroup analysis
- Examining the data from a distinct group within a larger group.
- T-critical
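- The threshold value of t, set by the significance level and the degrees of freedom, against which the t value obtained from a t-test is compared. This is the value used to determine statistical significance.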
- T-test
- Used to determine whether the mean value of a continuous outcome variable in one group differs significantly from that in another group, e.g., comparing the effect on lowering blood pressure (a continuous outcome variable) of drug A versus drug B.
- Tail
- An area at the extreme of a random distribution, where the degree of extremity is sufficient to be notable judged against some nominal value.
- Two-tailed test
- The test of a given statistical hypothesis in which a value of the statistic that is either sufficiently small or sufficiently large will lead to rejection of the hypothesis tested.
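
Several of the terms above can be illustrated with a short calculation. The following is a minimal sketch in Python using only the standard library. It is not part of the original course material, and all of the numbers are hypothetical values made up for illustration. It computes the mean, median, mode, and standard deviation for a small set of scores containing one outlier, then computes expected frequencies and a chi-square statistic for a 2 x 2 contingency table. The function names (describe, expected_frequencies, chi_square) are assumptions introduced for this example.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical sentence lengths in years; 300 is an outlier (extreme score).
sentences = [1, 2, 2, 3, 5, 8, 300]


def describe(scores):
    """Return the descriptive statistics defined in the glossary."""
    return {
        "mean": mean(scores),       # arithmetic average, pulled upward by the outlier
        "median": median(scores),   # midpoint of the ordered scores, not sensitive to the outlier
        "mode": mode(scores),       # most frequently occurring score
        "std_dev": pstdev(scores),  # square root of the (population) variance
    }


def expected_frequencies(observed):
    """Expected cell counts if rows and columns were independent:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


summary = describe(sentences)

# Hypothetical observed frequencies.
# Rows: treatment group, comparison group. Columns: rearrested, not rearrested.
table = [[10, 40], [20, 30]]
expected = expected_frequencies(table)
statistic = chi_square(table)

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)

print(summary)
print(expected, statistic, degrees_of_freedom)
```

In this made-up data set the single extreme score pulls the mean far above the median, which is why the median is preferred for skewed distributions. The chi-square statistic would then be compared against a critical value for the chosen significance level and the computed degrees of freedom.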








