- Disclosure
- List of Tables & Figures
- Chapter 1. Introduction
- Chapter 2. The Late Pretest Problem
- Chapter 3. Measuring the Variance-Bias Tradeoff
- Chapter 4. Considered Designs
- Chapter 5. Theoretical Framework
- Chapter 6. The Variance-Bias Tradeoff for Various ATE Estimators
- Chapter 7. Empirical Analysis
- Chapter 8. Summary and Conclusions
- References
- Appendix A: Proof of Asymptotic Results for the ANCOVA Estimator

Pretest-posttest experimental designs are often used to examine the impacts of educational interventions on student achievement test scores. In these designs, a test is administered to students in the fall of the school year (the pretest) and at a spring follow-up (the posttest). Average treatment effects are then estimated either by examining treatment-control differences in pretest-posttest gain scores or by including pretests as covariates in posttest regression models.
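As a minimal sketch of these two estimators, the following uses simulated (not actual) data, with an assumed true effect of 0.25 standard deviations and an assumed pretest-posttest coefficient of 0.7:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n)          # random assignment indicator
pretest = rng.normal(0, 1, n)          # fall (pretest) score
# Posttest: assumed true effect of 0.25 SD plus pretest dependence and noise
posttest = 0.7 * pretest + 0.25 * treat + rng.normal(0, 0.7, n)

# Estimator 1: treatment-control difference in pretest-posttest gain scores
gain = posttest - pretest
gain_est = gain[treat == 1].mean() - gain[treat == 0].mean()

# Estimator 2: ANCOVA-style regression of posttest on treatment and pretest
X = np.column_stack([np.ones(n), treat, pretest])
beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
ancova_est = beta[1]  # coefficient on the treatment indicator
```

Both estimators recover the assumed 0.25 effect in expectation; they differ in precision, which is why the covariate-adjustment approach is generally preferred.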

In clustered randomized control trials (RCTs) in the education field, the availability of pretests on individual students is critical for obtaining, at reasonable cost, precise posttest impact estimates (Schochet 2008; Bloom et al. 2005). In these RCTs, groups (such as schools or classrooms) rather than students are typically randomly assigned to the treatment or control conditions. This clustering considerably reduces statistical power due to the dependency of student outcomes within groups. The inclusion of pretests in the analysis, however, can substantially increase precision levels, because group-level pretest-posttest correlations tend to be large. Schochet (2008), for example, demonstrates that for a design in which schools are the unit of random assignment, about 44 total schools are required to detect an impact of 0.25 standard deviations if pretests are used in the analysis, compared to about 86 schools if pretest data are not available. This occurs because pretests tend to explain a large proportion of the variance in posttest scores.
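The sample-size arithmetic behind figures of this kind can be sketched with the standard cluster-design formula, in which the pretest enters through the school- and student-level R-squared values. The parameter values below (ICC of 0.15, 60 tested students per school, R-squared of 0.5 at both levels) are illustrative assumptions, not the exact values used by Schochet (2008), so the outputs are ballpark rather than exact reproductions:

```python
import math

def schools_needed(mde, icc, n_per_school, r2_school=0.0, r2_student=0.0,
                   p_treat=0.5, m_factor=2.8):
    """Total schools needed to detect an impact of `mde` (in SD units) in a
    school-randomized design; m_factor ~ 2.8 for 80% power, 5% two-sided test.
    Standard cluster-design variance formula; covariates enter via R-squared."""
    var_per_school = (icc * (1 - r2_school)
                      + (1 - icc) * (1 - r2_student) / n_per_school)
    return math.ceil(m_factor**2 * var_per_school
                     / (p_treat * (1 - p_treat) * mde**2))

no_pretest = schools_needed(0.25, 0.15, 60)                      # no covariates
with_pretest = schools_needed(0.25, 0.15, 60,
                              r2_school=0.5, r2_student=0.5)     # pretest used
```

Under these assumed values the required sample roughly halves when pretests are available, the same pattern as the 44-versus-86-school comparison cited above.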

For logistical reasons, however, pretests on individual students are typically collected *after* the start of the school year. In these cases, including late pretests in the analysis could bias the posttest impact estimates in the presence of early treatment effects. Because of variance gains, however, these biased estimators could yield impact estimates that tend to be *closer* to the truth than unbiased estimators that exclude the late pretests. Thus, the issue of whether to collect and use late pretest data in RCTs involves a variance-bias tradeoff.
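The tradeoff can be made concrete through mean squared error, which combines both terms. The numbers below are hypothetical, chosen only to show how a biased estimator can still be closer to the truth on average:

```python
import math

def rmse(bias, variance):
    """Root mean squared error: sqrt(bias^2 + variance)."""
    return math.sqrt(bias**2 + variance)

# Hypothetical values: excluding the late pretest gives an unbiased but
# noisier estimator; including it adds bias from early treatment effects
# but shrinks the variance.
exclude_pretest = rmse(bias=0.00, variance=0.010)
include_pretest = rmse(bias=0.03, variance=0.004)
```

Here the biased estimator has the smaller RMSE, so whether inclusion wins depends on how large the early-effect bias is relative to the variance reduction.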

This paper is the first to systematically examine, both theoretically and empirically, the late pretest problem in education RCTs for several commonly-used impact estimators. The paper addresses three main research questions:

1. *Under what conditions does the variance-bias tradeoff favor the inclusion rather than exclusion of late pretests in the posttest impact models?* These conditions are important for assessing whether or not to collect expensive pretest data.

2. *What are the statistical power losses when late pretests are included in the estimation models?* Large-scale RCTs in the education field are typically powered to detect minimum detectable posttest impacts of about 0.15 to 0.30 standard deviations, ignoring the potential late pretest problem. If pretest data are to be collected, how much larger do school sample sizes need to be in the presence of late pretests to achieve posttest impact estimates with the same level of statistical precision?

3. *Instead of collecting pretest data, under what conditions is it preferable to collect "true" baseline test score data from alternative sources?* For example, historic aggregate school-level data could be collected on test scores that are related to the posttest. The correlations between these alternative test scores and the posttests are likely to be smaller than the pretest-posttest correlations, and thus, the alternative test scores will reduce variance less. However, these data are likely to be uncontaminated, and thus, will not bias the posttest impact estimates.

The theory presented in this paper is based on a unified regression approach for group-based RCTs that is anchored in the causal inference and hierarchical linear modeling (HLM) literature. The empirical analysis quantifies the late pretest problem in education RCTs using simulations that are based on key parameter values found in the literature that pertain to achievement test scores of elementary school and preschool students in low-performing school districts. The focus on test scores is consistent with accountability provisions of the No Child Left Behind Act of 2001, and the ensuing federal emphasis on testing interventions to improve reading and mathematics scores of young students.

The rest of this paper is in seven chapters. Chapter 2 discusses the late pretest problem in more detail, and Chapter 3 discusses two measures for quantifying the variance-bias tradeoff when late pretests are included in the impact models. Chapter 4 discusses the considered school-based designs, and Chapter 5 presents the causal inference statistical theory underlying the late pretest problem. Chapter 6 applies this theory to several commonly-used impact estimators, and Chapter 7 presents simulation results. Finally, Chapter 8 presents a summary and conclusions.