Comparison of NAEP Assessments With Current-Generation State Assessments

As the “Nation’s Report Card,” the National Assessment of Educational Progress (NAEP) conducts assessments and publishes reports on student achievement in a number of subject areas, including mathematics, reading, writing, and science, that are widely used by federal, state, and local policymakers, educators, researchers, and the public. To sustain the program’s deserved reputation as the gold standard for information on the achievement of the nation’s students, these reports, and the assessments on which they are based, must be reliable, valid, and useful.

Since the mid-1990s, the NAEP Validity Studies (NVS) Panel has identified and pursued critical research to support these goals. One focus of this research has been the content of the NAEP assessments and the extent to which this content appropriately balances the continuity required for trend reporting with the evolution necessary to reflect more recent learning goals established by states. If there is too much divergence from state goals, the utility of NAEP reports is diminished and their validity is brought into question: NAEP could underestimate student learning if students are learning material that NAEP is not assessing, or if NAEP is assessing material that students are not being taught.

Comparing NAEP Data with State Assessments

It is in this context that NVS undertook the current studies, which use the judgments of expert panelists to compare 2017 NAEP assessment items in mathematics (study 1) and reading and writing (study 2) with 2017 items from a sample of state assessments based on College and Career Readiness (CCR) standards. Given the shifts in the educational landscape since 2009, the focus of the comparisons was broadened beyond just those assessments developed by the Common Core State Standards (CCSS) test consortia.

While the labor-intensive judgmental methodology of the current studies precluded the inclusion of a large number of state assessments, the study teams did include, in addition to the two consortia assessments, one assessment from a state that was using a non-consortium test to measure the CCSS and one from a state that had not adopted the CCSS but was measuring its own state-specific CCR standards. To complement these reports, a statistical analysis (study 3) was conducted to illustrate the relationship (if any) between item attributes such as content, focus, and complexity, as catalogued by expert judgments, and empirical item difficulty, and to compare the distribution of item difficulty on NAEP with that on state assessments.

Reports