I wanted to take the draft more seriously this year.
Some of the reasoning behind that is obvious: The Utah Jazz have more at stake. While they didn’t win the lottery, the tank season of 2024-25 meant that they ended up with the franchise’s first top-5 pick since 2014. Making the right decision here is paramount.
The second reason is that I was a little bit embarrassed by how my preferred selection last year turned out. Like the Jazz, I also liked Cody Williams in last year’s NBA draft at the No. 10 pick. And unfortunately, Williams put up one of the worst rookie statistical seasons in recent memory, looking like he didn’t belong on an NBA court for much of the year. (That’s not to say Williams can’t recover and have a fine career, but the early signs haven’t been very positive.)
I was wooed by Williams’ smooth athleticism. The way his shot looked, and the way he deferred to his more experienced teammates. The things he said about growing as a player. And, of course, I was wooed by the fact that his brother is an All-Star in Oklahoma City.
Those are the same mistakes the scouts at the beginning of “Moneyball” made. They cared too much about the intangibles and not enough about the production. We’re not selling jeans here, even if Williams did make the LeagueFits All-Rookie team for his ability to wear them well.
(Trent Nelson | The Salt Lake Tribune) Utah Jazz forward Cody Williams (5) as the Utah Jazz host the Miami Heat, NBA basketball in Salt Lake City on Thursday, Jan. 9, 2025.
So I decided it was time to return to my roots: numbers. I have a math degree, after all. It’s time to use it to model the age-old question:
Which college basketball players will be most successful in the NBA?
At least, that’s an age-old question to me. In fact, I did my college thesis on the subject.
My previous attempts
The next two sections are going to be filled with the methodology of how this model was created. If you’re not particularly data-inclined, there’s a decent chance you’ll be bored out of your mind here, and I officially give you permission to skip directly to the results section.
Anyway: Richard Wellman, a math professor at Westminster College when I attended from 2008-12, was an early pioneer in machine learning. I had him for a class or two, and thanks to his good teaching, became pretty engaged in the topic as well. Eventually, I merged his passion with mine to decide to study how to apply advanced data prediction methods to the age-old basketball question. Even better, Westminster gave me a financial grant to spend a summer full-time working on it — cleaning data, coding, analyzing results, then iterating on the process.
In the end, Kali Wickens, another of Richard’s students, and I decided to attack the project using two different leading machine learning techniques of the time: I would try support vector machines, and Kali would use decision trees. “Beyond Regression: Using learning machines to predict NBA performance” was the title of the resulting paper and presentation, which we delivered in Boston at the Joint Mathematics Meetings in 2012.
I would say our results were good, considering, but certainly not great. Our models hinted at the value of the process, but I certainly don’t think they would have beaten traditional scouting at the time.
Since 2012, as you’ve probably noticed, things have gotten a lot more advanced. Scientists have made great strides in computing’s ability to model and solve problems; machine learning, deep learning, and yes, artificial intelligence have seemingly taken over the world. I’ve always wanted to see if the new and improved techniques would do a better job than we did then.
The first and obvious thing to do, then, is just ask ChatGPT to do it all for me. It’s an artificial intelligence that solves problems at an incredibly impressive level. Can you ask it to simply create a model for you to decide which college basketball players will be most successful in the NBA?
You can, of course ... but it does a bad job. If you allow it to use the internet, it will just regurgitate old mock drafts and big boards.
If you run ChatGPT offline, and feed it all of the data you have on players, it still does a bad job. Interestingly, it turns out that the specific kind of artificial intelligence that ChatGPT uses simply isn’t great at working with the kind of data we have on basketball players. (More on this and why it might be here.)
How I built my NBA draft model
(Nam Y. Huh | AP) Ace Bailey talks to media at the 2025 NBA basketball draft combine in Chicago, Wednesday, May 14, 2025.
So what does work? Essentially, an evolved version of the decision tree method Kali explored back in 2012. XGBoost is an open-source library that has won a number of machine learning prediction contests over the years, with its most recent version coming out in March. Because of the way it works, though, in many ways it’s a black box: You give XGBoost your numbers, it does fancy math on them, and then it creates excellent models that are difficult to understand but just work.
So into XGBoost I fed college basketball stats from barttorvik.com, perhaps the most useful college basketball stat website out there. The data there only goes back to 2008, but provides a database of both basic and advanced stats for every college basketball player since. Figuring that, nearly always, a player’s final season in college basketball (typically either their senior season or the one before they entered pro basketball) was the one NBA teams would be evaluating most heavily, I simplified the data to include only each player’s most recent season, which makes our data two-dimensional: one row per player, one column per stat.
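For the data-inclined, here is a minimal sketch of that "most recent season only" step. The player names, the column layout and the function name `latest_season_only` are all made up for illustration; this is not the actual barttorvik.com export or the script used for the model.

```python
# Hypothetical rows: (player, season, points_per_game). Invented values,
# not real barttorvik.com data.
seasons = [
    ("Player A", 2023, 8.1),
    ("Player A", 2024, 14.6),
    ("Player B", 2024, 11.2),
    ("Player C", 2022, 5.0),
    ("Player C", 2023, 9.4),
    ("Player C", 2024, 12.9),
]


def latest_season_only(rows):
    """Return one row per player: the row with the highest season year."""
    latest = {}
    for player, season, ppg in rows:
        # Keep this row if the player is new or this season is more recent.
        if player not in latest or season > latest[player][1]:
            latest[player] = (player, season, ppg)
    # Sort by player name so the output order is predictable.
    return sorted(latest.values())


final_seasons = latest_season_only(seasons)
for row in final_seasons:
    print(row)
```

The trade-off is the one described above: earlier seasons are thrown away entirely, so a player's year-over-year improvement is invisible to the model.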
One thing barttorvik doesn’t have in its player data: strength of schedule. It’s obviously easier to score 20 points per game at Westminster than it is at Duke. So I pulled KenPom.com’s strength of schedule for every college basketball team since 2008, and fed it into the model for each college basketball player.
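Attaching a team-level number to each player is a simple lookup by team and season. The sketch below assumes hypothetical team names, field names and ratings; the real KenPom.com data has its own format and scale.

```python
# Hypothetical player rows and team schedule ratings; all values are invented.
players = [
    {"player": "Player A", "team": "Big State", "season": 2024, "ppg": 14.6},
    {"player": "Player B", "team": "Small College", "season": 2024, "ppg": 20.3},
    {"player": "Player C", "team": "Unknown U", "season": 2024, "ppg": 9.9},
]

# Keyed by (team, season). In this sketch, a higher number means a tougher
# schedule; that scale is an assumption.
schedule_strength = {
    ("Big State", 2024): 11.5,
    ("Small College", 2024): -6.2,
}


def attach_sos(player_rows, sos_lookup):
    """Return copies of the player rows with an added 'sos' field."""
    merged_rows = []
    for row in player_rows:
        merged = dict(row)  # copy, so the input rows are left untouched
        # Teams missing from the lookup get None rather than a guessed value.
        merged["sos"] = sos_lookup.get((row["team"], row["season"]))
        merged_rows.append(merged)
    return merged_rows


with_sos = attach_sos(players, schedule_strength)
for row in with_sos:
    print(row["player"], row["ppg"], row["sos"])
```

Leaving unmatched teams as `None` makes mismatched team names easy to spot before any modeling happens.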
Finally, we needed a measure of NBA success. Obviously a controversial topic, how to define who is better in the NBA has been the topic of barbershop conversations since George Mikan. As longtime readers will know, I’m relatively partial to DunksAndThrees’ EPM metric, as it tries to include contributions on both the offensive and defensive side of the ball with both box-score statistics and plus-minus impact.
Because many of the players in our sample have not yet finished their careers, and because injuries have an unpredictable impact on overall career success, I decided to simply use one-year peak EPM as the success metric. By that measure, Steph Curry, James Harden, Joel Embiid, Nikola Jokic, and LeBron James had the highest peaks of the post-2008 era.
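"One-year peak" just means taking each player's best single season. A small sketch, with invented players and invented EPM values rather than real DunksAndThrees numbers:

```python
# Hypothetical rows: (player, nba_season, epm). Values are invented.
epm_by_season = [
    ("Player A", 2021, -1.2),
    ("Player A", 2022, 0.8),
    ("Player A", 2023, 2.4),
    ("Player A", 2024, 1.1),
    ("Player B", 2023, -3.0),
    ("Player B", 2024, -2.1),
]


def peak_epm(rows):
    """Map each player to the highest single-season EPM in their rows."""
    peaks = {}
    for player, _season, epm in rows:
        if player not in peaks or epm > peaks[player]:
            peaks[player] = epm
    return peaks


targets = peak_epm(epm_by_season)
print(targets)
```

Note that a peak can still be negative, as with the hypothetical Player B, and that a player with only a season or two in the league may not have reached his peak yet.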
Initial results were center-heavy — a frequent scourge of draft models because very tall people are much more likely to make it into the NBA than short people. I tried a number of methods to address this, including creating separate models for each position, but ultimately found the best mix by simply adjusting the results after the model ran to take big-man bias into account.
When a black box model trains on a data set, it can become overfitted: it can develop affinities for, say, 6-foot-2 guards who scored big points at Weber State like Damian Lillard did, then use those harebrained ideas on 2025 data and come up with crazy conclusions. So you have to tune its “hyperparameters” (think of them as knobs on the outside of the black box) until you get good results in cross-validation. A separate automated script did that.
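To show what "tuning a knob with cross-validation" looks like, here is a sketch that deliberately swaps in a much simpler stand-in for XGBoost: a one-variable ridge regression whose only knob is a penalty, `lam`. XGBoost's real knobs are things like tree depth and learning rate, and the data here is randomly generated, so treat this as an illustration of the loop and not of the actual tuning script.

```python
import random


def fit_ridge(xs, ys, lam):
    """Fit y = a + b*x, with the slope b shrunk toward zero by penalty lam."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return mean_y - slope * mean_x, slope


def cv_error(xs, ys, lam, k=5):
    """Average held-out mean squared error across k folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point is held out; the rest are used for fitting.
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        squared = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Synthetic data: a made-up college stat (x) and a made-up NBA outcome (y).
rng = random.Random(0)
xs = [rng.uniform(0, 30) for _ in range(40)]
ys = [0.1 * x - 2 + rng.gauss(0, 1.5) for x in xs]

# Try each knob setting and keep the one with the lowest held-out error.
grid = [0.0, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_error(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print("best penalty:", best_lam)
```

The key idea is that every setting is graded only on players the model did not see while fitting, which is what keeps a Lillard-style coincidence from being rewarded.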
While ChatGPT was awful and either bugged out or returned nonsense if you asked it to do any of these various tasks together, it turned out to be pretty proficient at helping with any one step at a time. Walk and chew bubblegum though it cannot, ChatGPT can code short routines, perform simple data scraping, and debug quite well — it certainly helped me with my lagging coding skills.
Note that this model has no inputs or outputs for non-NCAA players. So it’s no use for international guys, or domestic players who take unusual routes to the NBA, like Overtime Elite or G League Ignite. Maybe next year.
(Nam Y. Huh | AP) Jeremiah Fears talks to media at the 2025 NBA basketball draft combine in Chicago, Wednesday, May 14, 2025.
The results of my NBA draft analysis
OK! With all that out of the way, here’s how the model ranks the set of 2025 college basketball players — the model’s big board, if you will.
Cooper Flagg’s historic freshman season earns him the No. 1 spot on this list. That’s a good sign for the model! So are its high rankings of other players generally predicted to be in the draft’s top 10. Getting such high agreement between the standard NBA mock draft and the stats-only model above certainly wasn’t happening in my 2012-era modeling.
The most surprising placement in the top 10 is Asa Newell at No. 2. The athletic and mobile Georgia big man was impressive as a freshman last year, putting up high scoring totals thanks to very solid finishing around the rim. Basically, the question for him is whether the perimeter game will come around to match; if it doesn’t, he may not be tall enough to play center full time. The model believes his production is incredibly promising, though.
If you wanted to mix and match the model’s outcomes with scouts’ big boards, as ESPN analyst Kevin Pelton does, you might prioritize V.J. Edgecombe, Tre Johnson, then Khaman Maluach if you’re the Jazz picking at No. 5.
After the top 10, though, the model differs from the consensus quite a bit. You’ll notice many players who chose to accept NIL money and go back to college — I decided to keep those players in above because it tells an interesting story of the changing strength of the NBA draft. Later on, some players who aren’t really considered future NBA guys at all are included.
You’ll also notice some projected early draft picks like Jeremiah Fears and BYU’s Egor Demin fall significantly compared to where they’re projected now by other draft analysts.
I’m not that worried about that, though, because that is the nature of the NBA draft in general: After the lottery it simply becomes a crapshoot, with players at least as likely to fail as to succeed. The model is doing its best.
Heck, given the nature of the NBA draft in general and the relative limitations of the inputs — the model doesn’t know players’ measurements beyond height, combine results, interview scores, teammates’ abilities, performance beyond the most recent season, and so on — I think it’s a remarkably solid list.
Time will tell if this statistical modeling method is more accurate than the tape-and-vibes based approach I was using last year to evaluate Cody Williams.
But overall, I do feel better about the rigor used here. Now, players’ on-court production matters most, just as it should.