Forty-two states and the District of Columbia are now using the same math and English standards, but the tests they use to determine how well students have mastered them still vary significantly.

One of the goals of the Common Core State Standards was to be able to compare student performance from state to state on a yearly basis. Five years ago, it looked like that would happen. Nearly all Common Core adopters were in at least one of two national consortia that would be creating new exams to accompany the standards: the Smarter Balanced Assessment Consortium and the Partnership for Assessment of Readiness for College and Careers, known as PARCC.

Those numbers have dwindled. Just 20 states and the District of Columbia plan to give one of the two tests this spring. Others are back where they started: using tests unique to their state. So even though, in theory, students in Connecticut, Wisconsin and Arizona are all learning the same thing, they’ll be measured differently.

The Common Core writers were very interested in improving how fractions are taught in U.S. classrooms, so we looked at six of these tests to compare how they deal with word problems involving fractions in the fifth grade: the state exams from New York, Wyoming and Florida, as well as PARCC, Smarter Balanced and ACT Aspire, an exam made by the group that produces the ACT college readiness exam. (So far, Aspire is only given in Alabama.)

As always, when we’re talking about testing, there are caveats. Even though all these questions deal with fractions, they may be testing different standards. They’re also a mix of actually tested items and sample questions. The sample questions never appeared on the tests but were published ahead of the exams to give teachers and students an idea of what to expect, and the actual test items were released after appearing on an exam last spring. In both cases, just because a type of question doesn’t show up here doesn’t mean it wasn’t on the test – or won’t be on future tests.

In other words, without an army of undercover fifth-grade reporters spying for us, it’s impossible to do a comprehensive comparison of the exams.

Nevertheless, while we can’t draw conclusions about which test was best from this sample of questions, we can see some important differences in each one’s approach and how they differ – or not – from the old way of doing things.

Let’s start with the obvious: whether a question is multiple choice. Smarter Balanced and PARCC are computer-based assessments, meaning it’s easy to go beyond multiple choice and require students to type in open-response answers. The contrast is seen most starkly when comparing these questions from ACT Aspire, a paper-based test, and PARCC.

It’s virtually the same question, but it’s much easier to guess the right answer on ACT Aspire than on PARCC. That’s compounded by the fact that the correct answer on the ACT question, 36, is a clear outlier.

“You don’t want the correct answer to stand out,” said Andrew Latham, director of Assessment & Standards Development Services at WestEd, which developed test questions for both PARCC and Smarter Balanced. If students understand enough to know the answer must be greater than 9, they don’t have to do any math to get the right answer, so the question doesn’t necessarily test whether they’re able to divide by a fraction.

The test makers also made different choices about how many steps each problem would take to solve. Look at these two similar questions from Smarter Balanced and Wyoming’s state exam.

The Smarter Balanced question only requires one step to get the right answer, dividing 2 by 1/5. To get Wyoming’s test question right, students first need to be able to use the number line to figure out the shortest and longest distances before they can do the rest of the math. “If you can’t read a number line, it doesn’t matter if you can subtract fractions or not, you’ll get it wrong,” Latham said.

And that’s not a good thing, he added. “I prefer when you focus more on a given standard. If they got it wrong, I don’t know why they got it wrong.”

Phil Daro, one of the lead writers of the Common Core math standards, pointed out, however, that students are asked to do multistep problems in the classroom. “You have to have them on the test,” he said.

Of course, paper-and-pencil tests don’t rely only on multiple-choice questions. They’ll also include open-response questions, like the New York examples, where students are given a standard word problem and asked to show their work before writing down an answer.

These questions take more time and money to grade, and are more prone to human error while grading, but there are pluses. Like the write-in answer on the above PARCC example, short-answer math questions eliminate students’ ability to guess. They have the added benefit of allowing students to get partial credit if, for instance, they set up the problem correctly but make an error adding two fractions.

PARCC sometimes mimics that process with a question like this, which requires students to first write out the expression they’d use to find the answer.

Regardless of whether a question is multiple choice or open response, clarity matters a lot. That’s a particularly important consideration on math tests, where you run the risk of a student getting a question wrong because of weak reading skills rather than weak math skills.

Daro pointed to this ACT Aspire question as an example of “inconsiderate” writing:

He suggested the phrasing could have been made better for a fifth grader by using consistent language and saying “Mario divided his circle into two equal sections.” And the instructions were particularly confusing. “Selecting a word that names the fraction?” Daro said. “That’s like a grammarian talking, not a fifth grader.”

The range of ambiguity in question phrasing is highlighted by these two PARCC items.

The first question’s wording is straightforward, the experts said. The second is more convoluted. In part, that’s because the questions are attempting to assess different things. The first just checks whether students can subtract fractions, while the second tries to measure students’ reasoning. (If you’re curious but all the fractions are making you cross-eyed, the correct answers are B and E.)

Daro criticized the test makers for using multiple choice at all to attempt to test how students think. “Multiple choice is the wrong genre for that,” he said. “Either have the kid produce the argument or show them a single argument and have them critique that.”

He also cautioned against blaming Common Core for poorly written test questions – particularly when many, if not all, of these items could have been on old exams. Many people think that “whenever you see something that looks odd, it’s because of the Common Core, but that’s just not true,” he said. “Standards can differ in ways that don’t manifest in different items on a test.”