The state has just begun to look at the student scores from this spring’s standardized tests — but already the results are raising concern.

At least 3,546 exams have come back with missing scores, according to a review released Wednesday. It follows a year-end testing window marked by major computer glitches and massive delays. And the review is only the first analysis of the numbers.

It’s possible more tests will have low or inaccurate scores because students were locked out of the online exams midway through entering their answers. Other kids might not be accounted for at all after many reported being unable to log in. In some schools, the results are likely wrong because the grade shown on screen when students finished didn’t match the reports that teachers later downloaded.

As the review cautioned: “[We] may not fully understand these impacts for months to come.”

Still, the new report from the Office of the Legislative Auditor General begins to quantify for the first time just how costly the disruptions were and confirms the worst fears of many teachers and administrators. Even based on these earliest results, the review concludes, the validity of the data is seriously in question.

That could have extensive and unforeseeable consequences for annual school grades, teacher performance evaluations and federal grant money, all of which are based on the test scores.

“We’re trying to understand just how far-reaching those were,” said Matt Harvey, who supervised the review, during a state legislative hearing Wednesday.

But the issues weren’t entirely unexpected. The review hints, in fact, that had Utah not decided to switch testing companies, it’s possible the glitches could have been avoided altogether.

In February 2018, the state signed a contract with a new company — Questar Assessment Inc. — to conduct its year-end tests. Despite concerns reported in other states, Utah Board of Education members approved a 10-year, $44 million deal as part of an effort to rebrand the annual exam, give it a new name, RISE, and encourage more students to take it.

On Nov. 27, Questar had issues developing the student rosters. On Jan. 9 of this year, it reported problems attaching IDs to files. On March 19, it failed to go live with spring testing as scheduled.

By the end of April, Utah had experienced five major testing interruptions. The outages delayed more than 18,000 public school students in completing their assessments in April and May. For one day, no one was able to take a science exam. On at least four others, testing was stopped entirely for some school districts. The window to complete the assessments had to be extended into June.

Because the computers froze when students were in the middle of tests, no other student could log on to the same device. And Questar couldn’t fix the problem without rebooting the systems — something that took 24 hours. Each glitch, then, pushed back testing another day. Sometimes, it wiped out everything kids had submitted and left no record that they had started an exam.

The review, the first to come in a series that will examine the testing problems, noted: “Failures by Questar’s testing system caused disruptions that negatively impacted schools across the state and raised questions about the accuracy and reliability of the assessment data.”

Roughly a million tests are completed in Utah each year (with many students taking more than one, depending on subjects). Though the more than 3,500 that have so far come back without a score represent only a small fraction of the total results, they suggest deeper issues with the data and its dependability.

The reviewers surveyed school district superintendents and charter school administrators statewide about the effect. Of the 90 they talked to, 82 reported their schools were disrupted and expected negative effects on results. And of those, 44 said they thought the impact would be “extremely significant” given their experiences with the interruptions.

Several said the disruptions took time away from teaching. Many noted that they affected student morale and concentration. One reported having to cancel classes for the day because the computers were not working. Most suggested they had no confidence in the numbers.

In response to the review, Patty Norman, the deputy superintendent of student achievement for the state, wrote: "The Utah State Board of Education concurs that the impact of the standards assessment is significant to stakeholders of public education, particularly our students.”

Debbie Davis, the chief audit executive for the school board, added during the legislative hearing Wednesday that she agrees with the findings. And, she noted, “We want to get it right.”

The board received the scores on Aug. 2 — when Questar delivered them weeks after the July 15 deadline — and began analyzing them. Members have also hired a third party to review the accuracy of the data, which should be completed by next month. Further results and recommendations are expected at an Oct. 3 board meeting.

Already, members have promised that if the numbers are not reliable, they will not be used for school grades or have an impact on those districts that are low-performing and in turnaround status. The review, though, is critical of the switch to Questar altogether — regardless of the proposed solutions.

The state switched vendors last year after previously contracting with American Institutes for Research (AIR) to conduct what was then called the SAGE test. AIR had administered the tests with few problems, but SAGE had failed to gain traction in Utah since its implementation in 2013.

More and more parents each year opted their students out of the test. That is allowed under state law, though the school board has said it undermines the accuracy of using the exams for accountability rankings.

Most board members had hoped the newly named RISE tests conducted by Questar would encourage more parents to have their kids take the assessment. The review suggests the state should have better weighed and understood the costs of switching companies — when there was likely no pressing need to do so.

“We met with school administrators who questioned the need for the time and expense of changing from AIR to Questar if the delivery system was reportedly going to be virtually identical to SAGE,” the report noted. “As we met with various stakeholders, some questioned why [the Utah Board of Education] needed to switch from an established platform into an untested one.”

Sen. Karen Mayne, D-West Valley City, a former paraeducator who worked in Granite School District, echoed that point, asking why the state school board felt it needed to switch vendors.

Harvey, the review supervisor, told her: “The contract that they had expired.” It could have been renewed, though, he noted, adding that he was unsure if the state was dissatisfied with the performance of AIR.

“It sounds like they need to think this through long term," Mayne responded. “It really is so disruptive in a school.”

In response to criticisms over the issues — and questions about why the state Board of Education signed a deal with a company that has a history of similar malfunctions — members voted in June to cancel the state’s contract with Questar.

The board had said it would terminate the agreement “rather than risk continued interruptions." Meanwhile, it can still seek damages from Questar. According to the contract, the state can charge the company up to $50,000 for each day there was a major disruption; with five such outages, that would amount to about $250,000.

Since then, the board also announced it would return to using AIR to conduct the exams. The contract will be for three years and $21 million.

The review, though, said the unnecessary switch not only cost money and raised concerns about test validity, but also consumed a significant amount of employee time.

The state spent hundreds of hours implementing the change, holding meetings to understand the new software and conducting trainings on how to use it. In switching from AIR to Questar, the state’s assistant superintendent of student learning said staff spent 480 hours in meetings. The assessment and development coordinator told reviewers that employees also spent between 720 and 800 hours testing the platform for weaknesses and working with Questar to resolve those issues.

That also doesn’t count the time that the state spent training school administrators, who then taught teachers how to use the software. Those educators then turned around and spent time instructing students in it.

AIR had bid against Questar for the new contract and was declined. In its proposal, AIR had written: “Utah has tested 100% online with virtually no disruptions for multiple years.” It was late once in delivering its scores to the state and paid $200,000 for that mistake. But it also helped Utah lease its test questions to other states, which earned the state $15 million.

Meanwhile, in the disclosure section of the Questar contract, where the bidding vendor was required to detail whether it had been accused of poor performance, three separate incidents were listed. The oldest dated to 2014. And the issues involved software from Educational Testing Service (ETS), the same software used in Utah.

In Texas, Questar said, its parent company, ETS, paid $5 million for “the disruption of testing and reporting schedules.” In California, ETS lost more than $3 million for not providing the correct testing materials and not delivering scores within the set timeframe. And with the AP and SAT exams ETS administered through the College Board, it had “minor to moderate failures” in the schools where it held contracts.

Additionally, in some states, the tests were marked by cyberattacks. One school district threw out its results because the software was so unreliable. In another, all of the students had to start over when the programming shut down and didn’t save their responses.

Questar, which is based in Minnesota, previously told The Salt Lake Tribune after its contract was cancelled: "While we regret this decision, Questar Assessment Inc. is going to do everything possible to ensure a smooth transition.”

Annual testing is required by federal law in grades three through eight (as well as at least once in high school). The exams focus on language arts, writing, science and math and are used to assess how well students are improving year to year. Results can tell teachers, most importantly, who is falling below grade level.