As researchers have learned more in recent years about the importance of quality in early care and education (ECE), policymakers have been searching for the best ways to raise the caliber of programs. The primary policy solution in almost all states has been to create Quality Rating and Improvement Systems (QRISs) that evaluate the quality of ECE programs against a common set of metrics established by the state.
The logic behind these systems goes like this: ECE programs earn ratings; parents use these ratings to identify high-quality programs (recent research suggests they may not be able to judge quality on their own); parents choose to send their children to better-rated programs; and low-rated programs, responding to market demand, take steps to improve their quality in order to keep up enrollment. The ultimate goal, improving child outcomes, then follows. At least in theory.
As Jill Cannon and her colleagues at RAND explain in a recent policy paper, there are many assumptions in that logic model that do not always hold true. Unexpected complications arise with most policy solutions, especially in the early stages of implementation, and QRISs are no exception. Quality Rating and Improvement Systems for Early Care and Education Programs: Making the Second Generation Better examines how the “first generation” of QRISs are faring and offers recommendations for reform so that future iterations can better measure and improve ECE program quality.
According to RAND, 49 states have a QRIS either implemented or in the planning or piloting stages. While these accountability systems already existed in some states prior to 2008, the Obama administration accelerated their implementation by making them a key priority in Race to the Top – Early Learning Challenge (RTT-ELC). To be eligible for close to one billion dollars in competitive RTT-ELC funding, states needed to use or create a QRIS with multiple tier levels reflecting varying program quality. Federal and state officials made massive investments in QRISs, even though research on the effectiveness of these systems was limited at the time.
A lot more research on QRIS effectiveness has recently become available, however, thanks to a requirement that RTT-ELC grantees conduct a validation study to determine whether their tiers accurately reflect meaningful differences in program quality and whether different levels of quality lead to different child outcomes. QRIS validation studies have been published in 11 states so far: California, Colorado, Delaware, Indiana, Maine, Minnesota, Missouri, Oklahoma, Pennsylvania, Virginia, and Wisconsin. (I wrote about Minnesota’s validation study when it was released last year.)
Unfortunately, it’s still somewhat unclear whether the benefits of these systems, in their current form, outweigh the costs. RAND’s review of the validation studies found some progress toward developing valid rating systems, but concludes that “evidence is still quite limited and often contradictory, preventing firm conclusions about the validity of QRIS ratings as currently designed.” The validation studies found that QRIS ratings are related to “one or more independent measures of program quality,” but the relationships were modest and the differences in quality between tiers were usually small. Several studies also showed a positive relationship between QRIS ratings and child outcomes in at least one domain, but again, the relationships tended to be weak and limited. The RAND authors do note that some of the validation studies may have been limited because some QRISs were still being implemented or had few participating programs.
As RAND explains, when QRISs were first being designed, early childhood experts used the information available to them to design the best systems they could. But experts knew little at the time about which quality indicators actually support child development, how to weight different indicators, and how much it would cost programs to meet certain requirements.
In recent years, there has been more research on what makes a high-quality program. Unfortunately, what matters most is difficult to measure. For instance, research shows that having a curriculum is essential. It’s easy to check a box on whether a program has a curriculum in place, or even a developmentally appropriate one. It’s much harder to measure whether a program is implementing that curriculum well, and fidelity of implementation is the essential component. We also know that young children’s learning depends largely on the quality of their interactions and relationships with adults, but measuring the quality of adult-child interactions is costly and time-consuming.
Another aspect of the QRIS that often gets overlooked is the “I,” or improvement, component. There is limited research on what types of supports are most effective for helping programs improve quality. Many QRISs offer multiple types of supports and interventions, including research-supported options like coaching as well as options with no research base, such as peer-supported activities. Individualized coaching is one of the few types of professional learning shown to work, but again, it is expensive and time-intensive.
While there have been challenges in designing QRISs to best measure quality and support improvement, there have also been unexpected complications that defy the original logic of the intervention. For instance, many Americans live in child care deserts where child care options are extremely limited, regardless of quality. Implementing a QRIS doesn’t make more child care centers appear. Some providers also don’t feel incentivized to join the QRIS because they already have full enrollment, even if they are low quality. Parents sometimes choose to stay in a program considered low quality because it appeals to them for other reasons, such as proximity to home or work. Other programs may want to improve but lack the resources to do so. Many states do offer higher reimbursement rates to participating providers that accept child care subsidies, but sometimes the higher rate is not enough to justify the cost of making the quality improvements.
Across the country, state officials have invested a great deal of time and money in this policy intervention with the hope of improving child outcomes. More research is needed to determine how to effectively and efficiently measure program quality and how to support programs as they try to improve. States must be willing to use the results of validation studies and other research to refine these rating systems if they want to see meaningful results.
[Cross-posted at Ed Central]