Week 14 TNF CPR

[Editor’s note: I’m trying to add as many acronyms into the headline as possible.]

A quick data analysis of the Raiders @ Chiefs Week 14 game:
The crowd swung over to the Raiders close to kickoff. Looking at the data, we had just five total predictions, and two came in on Thursday. One predicted a Raiders win, one predicted a Chiefs win. The difference is that the Raiders win prediction expected a 13-point victory while the Chiefs win prediction expected just a 6-point victory.

Overall, of the 5 people who predicted the game, 2 predicted the Raiders to win while 3 predicted the Chiefs, and the 13-point spread was twice as big as any other prediction.

It goes to show that the volume of predictions is critical to testing the hypothesis: if we had 1,000 predictions, one outlier wouldn't throw the average off much.
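To put a rough number on that, here's a minimal sketch (with made-up margins, positive meaning a Chiefs win) of how a single 13-point outlier moves a 5-prediction average versus a 1,000-prediction one:

```python
# Made-up prediction margins; positive = Chiefs win, negative = Raiders win.
def average(predictions):
    return sum(predictions) / len(predictions)

small_crowd = [6, 3, 4, 5, -13]      # 5 predictions, one -13 outlier
large_crowd = [4] * 999 + [-13]      # 1,000 predictions, same single outlier

# The outlier drags the small-crowd average down near zero...
print(average(small_crowd))   # 1.0
# ...but barely dents the large-crowd average.
print(average(large_crowd))   # 3.983
```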


Week 13 CPR

Hi all, Week 13 results are in and, while the crowd was just better than 50% this week, there were some pretty interesting results to be had:

  • Predictions per game: 5-8 (some people didn't complete all of the predictions; prompting for completion would be a good feature addition)
  • ATS: 8-7
  • Straight-up: 11-4
  • Against the Total: 8-7

Next-Level Crowd Wisdom

The crowd predicted that the Chiefs would defeat the Falcons by 1 point and that the Patriots would defeat the Rams by 16 points. Every time one of these happens, it strikes me as pretty astonishing.

There were no perfect predictions this week, so we’re hoping that if we can get some better participation, the likelihood of more next-level crowd wisdom will go up.

The Grain of Salt

The main issue that we are facing at this point is simply the volume of predictions, so if the CSS Crowd out there can grow some more, we may have a better idea about whether the idea has merit.

Having said that, it was encouraging to see some overs predicted this week. In the first week of predictions (Week 11), 11 of 14 games went under and the crowd predicted all 14 as under, and I was concerned that the crowd would always be conservative. However, in Week 13, there were quite a few high predictions, though only one correct over prediction.


Week 11 Crowd Performance Review (CPR)

Hi all, the first results are in, and you all have some prognosticating game!
High-level summary:
  • ATS: 8-6
  • Straight up: 10-4
  • Total: 11-3

Next Level Crowd Wisdom:

The crowd predicted within a point the Baltimore-Dallas score of 27-17, the Bears-Giants score of 22-16, and the Eagles-Seahawks score of 26-15, meaning the crowd nailed both the spread and the game total in each.
The crowd predicted within a point the spreads for Steelers-Browns (15) and Dolphins-Rams (5).
Even if the WOTC theory doesn't ultimately hold water, this is a pretty astonishing outcome, I'd say.

Key Performers (names anonymized):

Game    | ATS               | Total
NO-CAR  | 3: CA, MR         | -2: KE
BAL-DAL | 0 (on the $): GA  | 1: MG, KE, PC
JAC-DET | 0 (on the $): PC  | 2: PC, GA
BUF-CIN | 1: CA, PC         | 12: JL
TB-KC   | 1: JL, MR         | 2: CC
CHI-NYG | 0 (on the $): CA  | 1: JL
MIA-LA  | 0 (on the $): JL  | 1: JC, CC
NE-SF   | 1: CC             | 1: GA
PHI-SEA | 1: AJ, SW         | 0 (on the $): SB, EP
GB-WAS  | 7: SB             | 14: PC

Week 11 lowest combined delta: PC (81), JL (122)

Grain of Salt

While I would love it if the crowd kept up this performance, I’d say that there are a couple of factors worth noting:

  • The crowd predicted that the total would be under the Vegas total in 14 of 14 games, and the actual totals were under in 11 of 14. It seems unlikely that that trend will continue; in Week 10, only 6 games came in under the Vegas total.
  • Having said that, the entire premise of WOTC is that the crowd weighs various factors, and it may have been possible that the crowd leaned towards lower scores this particular weekend. Time will ultimately tell the tale.

Nice job Crowd! Please keep up your participation, and please share if you can.

Thanks in advance!

-PCSM Chris


The First Week Results

Thanks to some dedicated volunteers, we have seen our first set of results come through. While it would be nice to draw some conclusions, there were only 4 submissions, so we’re still likely in the realm of random chance.

Still, a couple of things:

  1. There was some very useful feedback that Paul and I are very grateful to have received.
  2. Overall, the crowd followed the favorites, generally selecting them to win and usually cover the spread.
    1. This past weekend, this didn’t work out so well.
    2. Correct picks against the spread:
      1. Tampa Bay
      2. Kansas City
      3. Dallas
      4. Seattle
      5. Cincinnati

Getting 5 out of 13 against the spread is not far from a coin flip, so don't feel too bad, everyone :). I'll do a bit more digging to see if there were any outlier picks that may have skewed it one way or the other.

We’re hoping to get some more activity over the course of the season, so please ask your friends to help out if they can!


Crowdsourcing Spread vs. Margin of Victory

This is a follow-up to our post on how crowdsourcing scores can work. In that post, we talked about how who-picked-whom is a crude metric for crowdsourced predictions; its main gap is that it doesn't capture margin of victory, which is needed for a more accurate crowdsourced prediction. As a result, even if the majority thinks that the home team will cover the spread, that doesn't tell you what the crowd believes the margin of victory will be or whether there is value in the spread set by the sportsbook.

Paul and I were talking this week about how to present spread prediction alongside margin of victory. Here’s the basic idea:

Team | CSS Score | CSS Margin of Victory | # Win Predictions
Away | 24        | +4                    | 200
Home | 27        | +7                    | 100

Total Predictions: 300
CSS Game Total: 51
CSS Spread: Home -3

I'll quickly summarize what this means. "CSS Score" is the average of all of the score predictions for each team: Σ(Away)/#Predictions and Σ(Home)/#Predictions. The "CSS Game Total" is the average total across all predictions: (Σ(Away) + Σ(Home))/#Predictions. The CSS Spread is the average delta between the home and away scores: (Σ(Home) − Σ(Away))/#Predictions.
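The three formulas can be sketched in a few lines of Python, using a made-up set of (away, home) predictions that happens to reproduce the table above:

```python
# Hypothetical (away, home) score predictions chosen to reproduce the
# table above: CSS Score 24-27, Game Total 51, Spread Home -3.
predictions = [(24, 28), (21, 24), (27, 29)]
n = len(predictions)

css_away = sum(a for a, h in predictions) / n        # Σ(Away)/#Predictions
css_home = sum(h for a, h in predictions) / n        # Σ(Home)/#Predictions
css_total = sum(a + h for a, h in predictions) / n   # (Σ(Away)+Σ(Home))/#Predictions
css_spread = sum(h - a for a, h in predictions) / n  # (Σ(Home)-Σ(Away))/#Predictions

print(css_away, css_home, css_total, css_spread)     # 24.0 27.0 51.0 3.0
```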

The Margin of Victory (MoV) is the data point I want to address. Where does MoV come in? This is kind of fascinating to me, actually. Margin of victory is, for example, the delta between the home team's score and the away team's score, averaged only over the predictions in which the home team's score is greater. That tells you a bit more than the spread by itself. As I discussed before, knowing that more people believe the home team will cover doesn't tell you how confident they are; margin of victory provides that detail. Say the CSS spread is Home -3. That tells you where the average delta sits across all predictions. MoV, on the other hand, provides a window into how confident users are in a win by either team. Suppose more people think the away team will win, but the people who pick the home team expect a larger margin. In that scenario, you'd see the CSS Spread lean more towards the home team than the raw pick counts would suggest.
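Here's a minimal sketch of that MoV calculation, with hypothetical (away, home) predictions chosen to mirror the +4/+7 split above:

```python
# Hypothetical (away, home) predictions mirroring the example: three users
# pick the away team (by 4 on average), one picks the home team (by 7).
predictions = [(24, 20), (27, 23), (28, 24), (20, 27)]

def margin_of_victory(preds, side):
    """Average winning margin over only the predictions where `side` wins."""
    if side == "home":
        margins = [h - a for a, h in preds if h > a]
    else:
        margins = [a - h for a, h in preds if a > h]
    return sum(margins) / len(margins) if margins else None

print(margin_of_victory(predictions, "away"))  # 4.0
print(margin_of_victory(predictions, "home"))  # 7.0
```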

I am very excited to see where this data takes us. We don't have enough of it yet, but I think it'll provide a lot of insight and an advantage when making your picks.



Crowdsourced Scores as a Leading Indicator

I listened to a couple of podcasts this week and, combined with my own experience, the limited consensus is that Week 3 was a rough week for most people. The Sports Gambling Podcast hypothesized that it's because our conclusions have formed but are based on iffy data. You have 2 weeks of data, and in reality you have no idea how to read it, but because you have 2 weeks of data, you think it's locked.

For example, consider the Arizona at Buffalo game. The Cardinals had just whipped the Bucs. The Bucs beat the Falcons, the Falcons beat the Raiders, the Raiders beat the Saints. Meanwhile, the Bills lost to the Ravens and to the Jets, neither of whom are highly regarded this season outside their core fan bases. Additionally, all of the money was on the Cardinals. All signs pointed to the continuing trend.

So of course, the Bills smacked the Cardinals around and won outright. And this wasn't the only one; there were a lot of surprising results in Week 3.

So what happened?

Well, it's hard to say in any kind of precise way, but what is really interesting is whether crowdsourcing the scores would have indicated something more granular afoot. When people base decisions on assumptions, it's hard to put numbers on their predictions. In the Cardinals-Bills game, you might assume that Arizona will score 40 points again and the Bills will lay down, but once you put actual numbers against it, maybe it doesn't seem so likely. You start thinking through it a bit more, and maybe the crowdsourced results land closer to what actually happened than what all the signs pointed to.

At the end of the week, maybe the crowdsourced scores don’t say much different from the actuals, but it would be fascinating to see if maybe there was some sort of indicator that it was going to be a weird week.


How Can Crowdsourcing Scores Work?

One of the earliest forms of pushback that we received regarding the idea of crowdsourcing game scores was simply that there would not be enough expertise available to predict accurately.

“There are too many variables.”

“No one will ever have enough information.”

“You’ll never get it right.”

To be honest, when we heard such gut-reaction pushback, it made us all the more excited to try the experiment. But beyond that, it’s important to recognize that those objections are addressed precisely by the Wisdom of the Crowds hypothesis.

Additional Reading – Wikipedia

One of the first experiments testing the WotC was at a county fair in the early 20th century, estimating the weight of an ox. The result of the 1,000+ guesses was an estimate within 1% of the actual weight. I think we can agree that there weren't 1,000 ox-weighing experts attending a single county fair, so how did they get so close?

Here is where the WotC model makes so much sense. It relies on 4 key pillars:

Criteria             | Description
Diversity of opinion | Each person should have private information, even if it's just an eccentric interpretation of the known facts.
Independence         | People's opinions aren't determined by the opinions of those around them.
Decentralization     | People are able to specialize and draw on local knowledge.
Aggregation          | Some mechanism exists for turning private judgments into a collective decision.

Each person at the fair brought their own expertise about how much things weigh. Even though it’s unlikely that many people could easily identify objects that weighed over 1000 pounds, they likely could identify objects that weighed 100 or 200 pounds, and they made their best educated guess based on that experience (“My son weighs 50 pounds and the ox looks like 20 of him, so it’s probably 1000 pounds”). When all the results were tallied, those who underestimated were balanced by those who overestimated, and the final result was almost exactly right.
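As a toy illustration (not the actual fair data), here's a simulation of 1,000 attendees making noisy guesses at the ox's weight (1,198 pounds in the usual telling of the story). Assuming the individual errors are unbiased, over- and under-estimates cancel and the crowd average lands close:

```python
import random

# A toy simulation (not the actual fair data): 1,000 attendees make noisy
# guesses at the ox's 1,198-pound weight. Assuming the errors are unbiased,
# over- and under-estimates cancel and the crowd average lands close.
random.seed(42)  # fixed seed so the run is reproducible
true_weight = 1198
guesses = [true_weight + random.gauss(0, 150) for _ in range(1000)]

crowd_estimate = sum(guesses) / len(guesses)
error_pct = abs(crowd_estimate - true_weight) / true_weight * 100
print(round(crowd_estimate), round(error_pct, 2))
```

Even with each individual guess off by 150 pounds on average, the crowd's mean ends up well within a couple of percent of the true weight.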

What we are assuming is that the principles that allowed a group of people to predict the weight of an ox would apply to game scores as well. Specifically, by building a user base of independent, decentralized users from all over the country, we will have a group that provides little pieces of information with each prediction. CSS aggregates the result of that information in the form of final scores to generate predictions, and as a result, we expect that the final tallies will reflect a conclusion based on a wide variety of information.

Let’s look at a specific example: the Week 2 49ers-Panthers game in which the Panthers won by a score of 46-27.

In this example, let's assume something that we expect is not realistic: our users come from 4 primary locations – Charlotte, San Francisco, New York City, and Dallas (4 metropolises, 2 associated with the specific teams, 2 independent).

Let's also assume that the users in SF and Charlotte weigh their own team's performance more highly than the independent cities do: in predictions submitted by users in San Francisco, for example, the 49ers score more and their defense allows fewer points. The local cities reflect the biases of their local media. In the independent cities, the average prediction is more evenly distributed, based on non-local media. That is one way the scores could balance out. You could also add predictions from people who believe Blaine Gabbert is bound to have a great game and that Cam Newton will feel the effects of the hits he took in Week 1. Those would be balanced by the people who don't think Gabbert is very good and expect Newton to play like a madman. Some individuals would weigh ESPN's sources more heavily, others another outlet's, and everywhere in between. Some would look at DVOA and consider it the definitive statistic for their prediction; others would look at a variety of data to figure out what the result will be.

The point is that there are many, many data points that feed perspectives, and the WotC is a very powerful aggregator.

When the aggregation is completed, what we expect to see is a spread of predictions that reflects information from a wide variety of sources, weighed by individuals. In the end, each user's final submission is the result of their own investigation and weighing of the data, and the wisdom of the crowd assumes that aggregating all of those different pieces of information across all of the users produces a more accurate prediction than any single person could muster on their own.

Accounting for Expected Divergence

It seems unlikely that the consensus opinion for the 49ers-Panthers game would predict a final score of 46-27. As a general rule, a score this high is pretty rare, and we do not expect that the WotC would successfully predict it.

However, we are optimistic that the WotC would be accurate about two pieces of data that are really critical to our core value proposition:

  1. The game total of 45: it is unlikely that the median prediction would land on the actual total of 73, but we expect that the median would predict whether the Vegas total is too high or too low.
  2. The spread of 13 1/2: we expect that the WotC would indicate whether the spread was too high or too low independent of the actual scores.

Therefore, whereas guessing the weight of an ox calls for precision, we think that the true value of CSS lies in users accurately predicting whether the lines provided by the sportsbooks have value. In our world, it doesn't matter whether the final totals beat the lines by 1 point or by 20. Either case is a winning proposition for our users.

On the other hand, who knows? Maybe there were enough people out there predicting 50-point blowouts by the Panthers. We’ll see over time.



Welcome to Crowdsourced Scores!

We are very excited to have you join our little experiment!

Alright, so what is this all about?

Crowdsourced Scores is based on the Wisdom of the Crowds: the theory that the average prediction across a group of individuals will outperform any single individual over time. Our goal is to collect predictions from you all about the final scores of games and compare those predictions to the actual results.

We'll also be presenting the results against the spread and totals as defined by the published sportsbook lines.

To make it appealing to return every week, we'll be keeping track of how accurate you all are, showing who the leaders are for each game, each week, and each season.

Why should I participate?

There are a couple of reasons that we think you’ll like. First and foremost, we think that you’re kind of like us and you’ll be interested to see whether the hypothesis actually works. You like football and you like to demonstrate your understanding of all of the information at your fingertips.

Second, if you're involved in a weekly pick'em pool and are anything like Chris, you can use all the help you can get just to stay in the mix past the opening weekend. If the wisdom of crowds proves correct, it will provide you with an edge over your competitors.

How is this different from the Who Picked Whom (WPW) graphs on other pick'em sites?

CSS is more detailed than the WPW graphs. A WPW graph is binary, showing only where the majority of picks are, but it omits critical details. For example, in the opening game of Carolina (-2.5) at Denver, imagine that 60 of 100 users picked Carolina. The WPW graph would imply that the crowd expects Carolina to cover. But what if the average margin among Carolina pickers was Carolina by 4, while the average margin among Denver pickers was Denver by 7? The crowd's spread would then actually be Denver by 0.4 (60 * 4 = 240 CAR; 40 * 7 = 280 DEN; (280 - 240) / 100 = +0.4 DEN).

So ultimately, the WotC would have proven correct because the extra detail of the actual scores would have indicated that, while more people picked Carolina, more people expected Denver to win by a bigger margin.
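The arithmetic above can be sketched directly; the pick counts and average margins are the hypothetical ones from the example:

```python
# The hypothetical Carolina (-2.5) at Denver example: weight each side's
# average predicted margin by its pick count to get one crowd spread.
car_picks, car_margin = 60, 4   # 60 users, Carolina by 4 on average
den_picks, den_margin = 40, 7   # 40 users, Denver by 7 on average

# Positive = Denver favored, negative = Carolina favored.
crowd_spread = (den_picks * den_margin - car_picks * car_margin) / (car_picks + den_picks)
print(crowd_spread)  # 0.4, i.e. Denver by 0.4 despite fewer Denver picks
```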

We want to see if this will play out over the course of the season.

What is a good forecasting record?

We'll consider the wisdom of the crowds hypothesis feasible if the crowd can achieve 60% accuracy or better. For comparison, AccuScore boasts anywhere between 50% and 75% depending on how its results are sliced. The overall success or failure of the hypothesis will be determined at the end of the season, though we will certainly provide week-by-week updates.

Why should I use your site over a site like AccuScore or Sportsline?

First, CSS is free this season; you have to pay for many other sites. So there’s that. In the long term, to be sure, AccuScore looks like a very promising option. It seems to show a strong success rate. The main differences between us and AccuScore are two-fold:

  1. Rather than spend money on building an algorithm and passing the cost through to the customers, we simply ask the customer to provide their own knowledge and trust the hypothesis of the wisdom of crowds to provide the accuracy. As such, we will never charge as much as AccuScore will because we’re just making sure that we can keep the site going.
  2. Our site will have a strong community focus. AccuScore is one-way: they tell you their predictions and keep the model secret. Here, we're going to provide predictions and eventually enable community discussion and feedback. That is actually a critical component of good forecast accuracy: having conversations with people and changing your mind about your predictions. A great forecaster doesn't pick a result and hold onto it; they continue to evaluate their prediction based on new information.