Many people have developed models for predicting the outcomes of college basketball games. For those who have made their picks publicly available, ThePredictionTracker does a great service by tracking the live performance of each model over the course of the season. Unfortunately, it's difficult to compare models directly using the summary page on the Tracker. For one thing, each model has predicted a different subset of games. In many cases this is accidental (schedules get modified and web scrapers don't pick up the changes), but some models don't start making picks at all until weeks or months into the season. Further, there are a few misprinted lines in the Tracker data. For example, the Tracker shows an opening line of -22 and a closing line of +64.5 for UCLA vs. Presbyterian on 11/19/2018 (the line closed at -23 or -23.5 depending on the book).

Throughout the 2018-19 season, I'll try to update this page with further analysis of the Tracker data. For what follows, I chose a subset of models that have made picks since the very beginning of the season, and I threw out any game for which at least one of those models did not make a pick. In a (somewhat lazy) attempt to address misprinted lines, I also filtered out any game for which the opening and closing lines differed by more than 5 points (a genuine move of that size is very rare).
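As a rough illustration of the two filters just described, here is a minimal sketch in plain Python. The record layout and the field names (line_open, line, model_x, model_y) are hypothetical, chosen only for this example; they are not the Tracker's actual column names.

```python
def filter_games(games, models, max_line_move=5.0):
    """Keep games that every chosen model predicted and whose
    opening and closing lines differ by at most max_line_move points."""
    kept = []
    for game in games:
        # Drop the game if any of the chosen models is missing a pick.
        if any(game.get(model) is None for model in models):
            continue
        # Drop the game if either line is missing.
        open_line, close_line = game.get("line_open"), game.get("line")
        if open_line is None or close_line is None:
            continue
        # Drop the game if the line moved suspiciously far (likely a misprint).
        if abs(close_line - open_line) > max_line_move:
            continue
        kept.append(game)
    return kept


# Hypothetical records: one misprinted line, one missing pick, one clean game.
games = [
    {"home": "A", "line_open": -22.0, "line": 64.5, "model_x": -21.0, "model_y": -20.0},
    {"home": "B", "line_open": 3.0, "line": 4.5, "model_x": 5.0, "model_y": None},
    {"home": "C", "line_open": -1.5, "line": -2.0, "model_x": 0.5, "model_y": -3.0},
]
clean = filter_games(games, ["model_x", "model_y"])
```

Only game "C" survives both filters here. The 5-point threshold is the one quoted above; everything else about the data layout is an assumption.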

Results are shown as of 2019-03-13 for a set of 4876 games:
Mean Squared Error (MSE) Model
123.717 Line
125.066 Opening Line
125.447 Erik Forseth
126.209 TeamRankings
127.318 Dokter Entropy
129.065 Sagarin Rating
129.450 Sagarin Predictor
129.745 Sagarin Golden Mean
131.098 Kenneth Massey
131.895 ESPN BPI
133.166 DRatings.com
134.225 Sonny Moore
138.077 StatFox
138.136 ComPughter Ratings
145.951 Sagarin Recent

Binary Straight Up (%) Model
74.17 Opening Line
74.15 Erik Forseth
73.97 Line
73.67 Sagarin Predictor
73.60 TeamRankings
73.56 Sagarin Rating
73.38 Sagarin Golden Mean
73.35 ESPN BPI
73.28 Dokter Entropy
73.09 Kenneth Massey
72.94 DRatings.com
72.62 Sonny Moore
72.55 StatFox
72.55 ComPughter Ratings
71.40 Sagarin Recent
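
The two metrics tabulated above can be computed along the following lines. This is a sketch under stated assumptions: predictions and observed margins are both expressed from the same team's perspective (for example, home minus away), and a prediction of exactly zero is counted as a miss. The function names are illustrative.

```python
def mean_squared_error(predictions, margins):
    """Average squared difference between predicted and observed margins."""
    return sum((p - m) ** 2 for p, m in zip(predictions, margins)) / len(margins)


def straight_up_pct(predictions, margins):
    """Percentage of games in which the prediction has the same sign as the
    observed margin, i.e. the model picked the actual winner."""
    correct = sum(1 for p, m in zip(predictions, margins) if p * m > 0)
    return 100.0 * correct / len(margins)


# Hypothetical predictions and results for four games.
predictions = [5.0, -3.0, 2.0, -7.0]
margins = [10, 4, -1, -12]

mse = mean_squared_error(predictions, margins)
straight_up = straight_up_pct(predictions, margins)
```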

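By MSE, the line is clearly the best statistical predictor of the outcome, although the opening line and Erik Forseth's model edge it out slightly on straight-up picks. Nevertheless, we can ask how each model would have done against the spread, shown below: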
Against the Spread (%) Model
51.39 ESPN BPI
50.69 Sagarin Golden Mean
50.18 Sagarin Rating
50.11 TeamRankings
50.05 Sonny Moore
49.99 Erik Forseth
49.87 Dokter Entropy
49.85 Sagarin Predictor
49.57 StatFox
49.51 DRatings.com
49.30 Kenneth Massey
49.24 Sagarin Recent
49.16 Opening Line
48.24 ComPughter Ratings
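
For reference, here is one way an against-the-spread percentage could be computed. The conventions are assumptions made for this sketch, not necessarily the ones used for the table above: a model "picks" whichever side of the line its prediction falls on, and games where the model agrees exactly with the line, or where the result lands exactly on the line (a push), are excluded.

```python
def against_the_spread_pct(predictions, lines, margins):
    """Percentage of decided games in which the prediction and the observed
    margin fall on the same side of the line."""
    wins = 0
    decided = 0
    for pred, line, margin in zip(predictions, lines, margins):
        if pred == line or margin == line:
            continue  # no pick, or a push: excluded by assumption
        decided += 1
        if (pred - line) * (margin - line) > 0:
            wins += 1
    return 100.0 * wins / decided if decided else float("nan")


# Hypothetical data: a win, a loss, a no-pick, and a push.
predictions = [6.0, -2.0, 3.0, 1.0]
lines = [4.0, -5.0, 3.0, 2.0]
margins = [7, -8, 10, 2]

ats = against_the_spread_pct(predictions, lines, margins)
```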

Although no individual model predicts the margin of victory as well as the line, we might ask whether any linear combination of models can do so. Let's regress the observed margins of victory onto the predictions of each model, but constrain the regression to have nonnegative coefficients. This gives the optimal (backward-looking) mixture of predictors. We find:
Coefficient Model
0.489 Erik Forseth
0.267 Dokter Entropy
0.168 ESPN BPI
0.045 Sagarin Golden Mean
0.038 Sonny Moore
0.009 StatFox
0.000 Kenneth Massey
0.000 Sagarin Rating
0.000 TeamRankings
0.000 Sagarin Recent
0.000 Sagarin Predictor
0.000 DRatings.com
0.000 ComPughter Ratings

The MSE of this hypothetical predictor would be 124.59. Note that this figure is optimistic: in addition to being backward-looking, the mixture was both fit and evaluated on the full dataset.
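
A nonnegative regression of this kind can be sketched with scipy.optimize.nnls. The data below is synthetic (three made-up models with different noise levels), and fitting without an intercept is an assumption. The second fit uses only the first half of the games and scores the second half, which is one simple way to get a less optimistic MSE than the in-sample figure.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_games = 2000

# Synthetic stand-ins: an unobserved expected margin, noisy observed margins,
# and three hypothetical models that estimate the expected margin.
expected = rng.normal(0.0, 10.0, n_games)
margins = expected + rng.normal(0.0, 11.0, n_games)
X = np.column_stack([
    expected + rng.normal(0.0, 2.0, n_games),
    expected + rng.normal(0.0, 3.0, n_games),
    expected + rng.normal(0.0, 6.0, n_games),
])

# Fit and evaluate on all games (the optimistic, in-sample number).
coef_full, _ = nnls(X, margins)
mse_in_sample = np.mean((X @ coef_full - margins) ** 2)

# Fit on the first half, evaluate on the held-out second half.
half = n_games // 2
coef_train, _ = nnls(X[:half], margins[:half])
mse_holdout = np.mean((X[half:] @ coef_train - margins[half:]) ** 2)

# MSE of each model on its own, for comparison.
mse_single = np.mean((X - margins[:, None]) ** 2, axis=0)
```

Because putting all the weight on a single model is itself a feasible solution, the in-sample MSE of the mixture can never exceed that of the best individual model. The held-out MSE carries no such guarantee.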

Out of curiosity, what if we included the line itself in the above regression? Can any of our models add value when combined with the line?
Coefficient Model
0.731 Line
0.146 Erik Forseth
0.100 ESPN BPI
0.054 Dokter Entropy
0.000 Opening Line
0.000 Kenneth Massey
0.000 Sagarin Rating
0.000 Sonny Moore
0.000 TeamRankings
0.000 StatFox
0.000 Sagarin Recent
0.000 Sagarin Predictor
0.000 Sagarin Golden Mean
0.000 DRatings.com
0.000 ComPughter Ratings

It seems that a few of the models are able to add (marginal) value to the line. The hypothetical MSE of this mixture would be 123.366.