A Quick Recap of the Theory

With the season just about to start, we wanted to recap how we think the Wisdom of the Crowd can work with NFL betting. (A more detailed post is here.)

TL;DR: Vegas’ goal is to match betting lines to public sentiment, not to predict final outcomes. We believe that the crowd can identify when public sentiment differs significantly from the predicted final outcome, so we can take advantage of value in the betting line.

Thanks for visiting Crowdsourced Scores! Please get your Week 1 predictions in now!

What is the Wisdom of the Crowd?

The Wisdom of the Crowd is the theory that, over time, the crowd is a more effective predictor than any single individual. For the crowd wisdom to be “valid”, the crowd has to be:

  • Diverse: the crowd has to have a significant variety of opinion to ensure that biases are normalized.
  • Independent: each individual in the crowd develops their opinion on their own, without being influenced by another person.
  • Decentralized: the crowd is formed by people who come from different backgrounds and experiences.
  • Aggregated: the individual opinions can be aggregated appropriately.

In short, the crowd needs to have a diverse group of individuals who are able to generate their opinions on their own.

How can the Wisdom of the Crowd Work for Sports Scores?

We believe that sports fans satisfy the three core components of a crowd: diversity, independence, and decentralization. Crowdsourced Scores will do the work of aggregating. Each individual in the crowd is their own supercomputer, processing all of the news and information out there and putting a prediction together based on their own calculations, and we aggregate the results.

How does this differ from what Vegas does? Doesn’t Vegas move the line based on the crowd behavior?

The main difference between us and Vegas is that Vegas is interested in matching the perception of each side, and we are interested in identifying where the public perception may be inaccurate. Most importantly, Vegas wants to balance the money. If one side has a heavy amount of money riding on it, the book is in serious danger if that side wins. A perfect 50/50 split BEFORE THE GAME is Vegas’ goal. They are not interested in matching the final outcome; they are interested in perfectly matching public perception to keep money flowing evenly on both sides.

Additionally, Vegas moves the line based on dollar values rather than on gross volume of bets. One $10,000 bet on the favorite would require 100 $100 bets on the underdog to match it, so the line may move to encourage bets on the underdog even though only one person has bet on the favorite.

What we are trying to do is identify value in the line. In the stock market, this is called the margin of safety. If the crowd predicts a significant deviation from the Vegas line, we want to see if that prediction has value (we believe that it does).

What about Who Picked Whom?

Who-picked-whom seems like it is the same as crowd-sourcing predictions. If 70% of users are picking the underdog, how is that different from what Crowdsourced Scores is doing?

The short answer is that Who-Picked-Whom is a binary question that is limited by the same lack of detail as the betting lines. Do you believe that the underdog will simply cover the spread or win outright? Do you believe that the favorite will not only cover the spread but beat the tar out of the underdog?

In this situation, all choices fall into one of two buckets, and they are not distinguished at all. Therefore, it may tell you where the public sentiment is, but it doesn’t tell you if there is any value in the line. Additionally, the public’s picks hit at roughly 50% season over season.

We’re aiming to top that.

By aggregating score predictions, we are adding fidelity that will identify not only how many users choose one side or the other, but also how much each side believes its pick will win by. This will help identify when the crowd sees value in a given betting line and inform bettors.
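As a rough sketch of what that aggregation could look like, here is a small Python example. The function name, the sample predictions, and the sign convention (spread = away score minus home score, so a negative number means the home team is ahead) are assumptions made for illustration, not a description of how the site is actually built.

```python
from statistics import mean

def crowd_lean(predictions, line_spread, line_total):
    """Average a list of (away_score, home_score) predictions and compare
    the result with the betting line.

    Assumed convention: spreads are away minus home, so -10 means the
    home team is favored by 10.
    """
    away_avg = mean(away for away, _ in predictions)
    home_avg = mean(home for _, home in predictions)
    crowd_spread = away_avg - home_avg
    crowd_total = away_avg + home_avg

    # Which side of the spread does the crowd land on?
    if crowd_spread < line_spread:
        side = "home"
    elif crowd_spread > line_spread:
        side = "away"
    else:
        side = "push"

    # Which side of the game total does the crowd land on?
    if crowd_total > line_total:
        total_lean = "over"
    elif crowd_total < line_total:
        total_lean = "under"
    else:
        total_lean = "push"

    return {
        "spread": crowd_spread,
        "total": crowd_total,
        "side": side,
        "total_lean": total_lean,
        # How far the crowd sits from the line, in points.
        "spread_edge": abs(crowd_spread - line_spread),
    }

# Three made-up predictions for a game with a line of home -10 and a total of 46.5.
result = crowd_lean([(10, 27), (14, 28), (17, 31)], line_spread=-10, line_total=46.5)
print(result)
```

The "spread_edge" value is the piece that who-picked-whom cannot give you: it says how far from the line the crowd is, not just which side it is on.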

Best of luck!


Wild Card CPR

Saturday: The crowd went 4-2 across the spread, game totals, and straight-up winners.

Sunday: The crowd went 5-1 across the spread, game totals, and straight-up winners.

Game | Spread: Crowd (Odds) | Spread: Actual | Total: Crowd (Odds) | Total: Actual
OAK (15.6)-HOU (17.2) | -1.6 (-4) | -13 | 32.8 (36.5) | 41
DET (12.75)-SEA (27.25) | -14.5 (-8) | -20 | 40 (43) | 32
MIA (13.67)-PIT (29) | -15.33 (-10) | -18 | 42.67 (46.5) | 42
NYG (17)-GB (22.5) | -5.5 (-4.5) | -25 | 39.5 (44.5) | 51

Overall, a 9-3 result is quite respectable, and 3-1 against the spread is good as well.

  • The crowd did particularly well in the MIA-PIT result, predicting the game total within a point and coming within two points of each team’s actual score.

The interesting trend to point out is in the DET-SEA and MIA-PIT results. In both cases, the home teams were big favorites, and the crowd not only predicted them correctly but predicted them with pretty high confidence. Specifically, the crowd predicted the favorite to outperform the spread by at least 50%. Across those two games, the crowd went 6-0 on the key metrics. Conversely, in the other two wild card games, the difference between the crowd spread and the actual odds was small, and the crowd went 3-3.

As we go forward, it will be interesting to consider the results in which the crowd predicts a significant over- or under-performance. Many touts use 3-star designations when they see a big opportunity, and as time goes on, we will track these types of performances to see if the crowd is particularly good (or bad) about smoking out the big opportunities in the odds.
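For anyone who wants to check the math, the records in this review can be re-derived from the table above. The sketch below is only an illustration: the variable names and the `same_side` helper are made up here, spreads are taken as away minus home, and a push would count as a miss (none occur in these four games).

```python
# (game, crowd away score, crowd home score, odds spread, actual spread,
#  odds total, actual total), copied from the wild card table.
# Spreads are away minus home, so a negative number means the home team is ahead.
GAMES = [
    ("OAK-HOU", 15.6, 17.2, -4.0, -13, 36.5, 41),
    ("DET-SEA", 12.75, 27.25, -8.0, -20, 43.0, 32),
    ("MIA-PIT", 13.67, 29.0, -10.0, -18, 46.5, 42),
    ("NYG-GB", 17.0, 22.5, -4.5, -25, 44.5, 51),
]

def same_side(crowd, actual, line):
    """True when the crowd and the actual result land on the same side of the line."""
    return (crowd - line) * (actual - line) > 0

ats = totals = straight_up = 0
for game, away, home, odds_spread, actual_spread, odds_total, actual_total in GAMES:
    crowd_spread = away - home
    crowd_total = away + home
    ats += same_side(crowd_spread, actual_spread, odds_spread)
    totals += same_side(crowd_total, actual_total, odds_total)
    # Straight up: did the crowd pick the team that actually won?
    straight_up += (crowd_spread * actual_spread) > 0

wins = ats + totals + straight_up
losses = 3 * len(GAMES) - wins
print(f"ATS {ats}-{len(GAMES) - ats}, "
      f"totals {totals}-{len(GAMES) - totals}, "
      f"straight up {straight_up}-{len(GAMES) - straight_up}, "
      f"overall {wins}-{losses}")
```

Run as written, this reproduces the 3-1 mark against the spread and the 9-3 overall record.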



Week 14 CPR

The week 14 results are in, and the crowd performance was average this week, hitting on 50% both against the spread and against the total:

  • ATS: 8-8
  • Total: 8-8
  • Straight-up: 10-6

The mode for the number of predictions per game was 5, and there were 4 games that received 7 predictions. The crowd did not perform significantly differently in games with more predictions than in games with fewer.

Next-Level Crowd Wisdom

The crowd came within 1 point of predicting the score of the Bengals-Browns game and the Jets-49ers game:

  • Bengals-Browns:
    • Crowd prediction: 23-9
    • Actual score: 23-10
  • Jets-49ers:
    • Crowd prediction: 24-17
    • Actual score: 23-17

The crowd also predicted exactly the actual spread in the Cardinals-Dolphins game, and was within one point of the actual spread in the Saints-Bucs game.

Key Performers

Game | Spread | Total
Raiders-Chiefs | 2: C.C. | 3: C.A.
Steelers-Bills | 4: G.A., C.A. | 1: P.C.
Broncos-Titans | 5: P.C. | 10: D.B.
Saints-Bucs | 4: P.C. | 3: G.A.
Redskins-Eagles | 1: P.C., C.A. | 1: M.R., P.C.
Cardinals-Dolphins | On the $: B.C. | 1: P.C.
Chargers-Panthers | 8: D.B. | 4: P.C.
Bengals-Browns | 2: B.C. | 2: B.C.
Bears-Lions | 6: G.A. | 3: D.B., C.A.
Texans-Colts | 2: B.C. | On the $: P.C.
Vikings-Jaguars | 2: P.C. | 17: D.B.
Jets-49ers | 4: G.A. | 5: B.C.
Falcons-Rams | 10: B.C. | 2: G.A.
Seahawks-Packers | 21: D.B. | 3: G.A.
Cowboys-Giants | On the $: B.C., P.C. | 24: D.B.
Ravens-Patriots | 4: G.A., B.C., P.C. | 1: C.A.
Week 14 Lowest Combined Delta | 128: P.C. | 147: C.A.

A couple of interesting notes from the data:

  • B.C. was all over the Bengals-Browns game, predicting both the spread and the total within 2 points of the actual results.
  • P.C. was first in with his picks; D.B. was last in.
  • Biggest out-on-a-limb prediction: G.A., who was within 2 points of the highest total of the week (Falcons-Rams at 56 points).

On a personal note, I picked the Texans-Colts total within 1 but had the winner incorrect. I point that out for two reasons:

  1. It goes to show that the granularity of the predictions enables the wisdom of the crowd to provide greater guidance on more levels than a simple binary question of who will win/who will cover.
  2. Despite obsessing over this stuff to an unhealthy degree, I clearly don’t know anything.

Thanks for your continuing support!


Week 14 TNF CPR

[Editor’s note: I’m trying to add as many acronyms into the headline as possible.]

A quick data analysis of the Raiders @ Chiefs Week 14 game:
The crowd swung over to the Raiders close to kickoff. Looking at the data, we had just five total predictions, and two came in on Thursday. One predicted a Raiders win, one predicted a Chiefs win. The difference is that the Raiders win prediction expected a 13-point victory while the Chiefs win prediction expected just a 6-point victory.

Overall, of the 5 people who predicted the game, 2 predicted the Raiders to win while 3 predicted the Chiefs, and the 13-point spread was twice as big as any other prediction.

It goes to show that volume of predictions is critical to the success of the hypothesis: if we had 1000 predictions, one outlier wouldn’t throw the average off too much.
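To put rough numbers on that, here is a quick sketch. The four "other" margins are made up for illustration (only the 13-point Raiders pick and the 6-point Chiefs pick come from the actual data), and margins are written as Raiders minus Chiefs, so a negative number is a Chiefs win.

```python
from statistics import mean

def shift_from_outlier(crowd, outlier):
    """How many points a single extra prediction moves the crowd average."""
    return mean(crowd + [outlier]) - mean(crowd)

# Hypothetical margins (Raiders minus Chiefs) for the four other predictions:
# three Chiefs wins and one narrow Raiders win.
small_crowd = [-6, -3, -4, 3]
outlier = 13  # the 13-point Raiders prediction

# The same mix of opinions, but with 1000 predictions instead of 4.
large_crowd = small_crowd * 250

small_shift = shift_from_outlier(small_crowd, outlier)
large_shift = shift_from_outlier(large_crowd, outlier)

print(f"Shift with {len(small_crowd)} other predictions: {small_shift:.2f} points")
print(f"Shift with {len(large_crowd)} other predictions: {large_shift:.3f} points")
```

With only four other predictions, the one outlier moves the average by about 3 points, enough to flip the crowd from the Chiefs to the Raiders. With 1000, the same outlier moves it by a small fraction of a point.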


Week 13 CPR

Hi all, Week 13 results are in and, while the crowd was just better than 50% this week, there were some pretty interesting results to be had:

  • Total predictions per game: 5 to 8 (some people didn’t complete all of the predictions; enforcing that would be a good feature addition)
  • ATS: 8-7
  • Straight-up: 11-4
  • Against the Total: 8-7

Next-Level Crowd Wisdom

The crowd predicted that the Chiefs would defeat the Falcons by 1 point and that the Patriots would defeat the Rams by 16 points, and both margins matched the final results. Every time one of these happens, it strikes me as pretty astonishing.

There were no perfect predictions this week, so we’re hoping that if we can get some better participation, the likelihood of more next-level crowd wisdom will go up.

The Grain of Salt

The main issue that we are facing at this point is simply the volume of predictions, so if the CSS Crowd out there can grow some more, we may have a better idea about whether the idea has merit.

Having said that, it was encouraging to see some overs predicted this week. In the first week of predictions (Week 11), 11 of 14 games went under and the crowd predicted all 14 as under, and I was concerned that the crowd would always be conservative. However, in Week 13, there were quite a few high predictions, though only one correct over prediction.


Week 11 Crowd Performance Review (CPR)

Hi all, the first results are in, and you all have some prognosticating game!
High-level summary:
  • ATS: 8-6
  • Straight up: 10-4
  • Total: 11-3

Next Level Crowd Wisdom:

The crowd predicted within a point the Baltimore-Dallas score of 27-17, the Bears-Giants score of 22-16, and the Eagles-Seahawks score of 26-15, meaning the crowd nailed both the spread and the game total.
The crowd predicted within a point the spreads for Steelers-Browns (15) and Dolphins-Rams (5).
Even if the WOTC theory doesn’t ultimately hold water, this is a pretty astonishing outcome, I’d say.

Key Performers (names anonymized):

Game | ATS | Total
NO-CAR | 3: CA, MR | -2: KE
BAL-DAL | 0 (on the $): GA | 1: MG, KE, PC
JAC-DET | 0 (on the $): PC | 2: PC, GA
BUF-CIN | 1: CA, PC | 12: JL
TB-KC | 1: JL, MR | 2: CC
CHI-NYG | 0 (on the $): CA | 1: JL
MIA-LA | 0 (on the $): JL | 1: JC, CC
NE-SF | 1: CC | 1: GA
PHI-SEA | 1: AJ, SW | 0 (on the $): SB, EP
GB-WAS | 7: SB | 14: PC
Week 11 Lowest Combined Delta | PC (81) | JL (122)

Grain of Salt

While I would love it if the crowd kept up this performance, I’d say that there are a couple of factors worth noting:

  • The crowd predicted that the total would be under the Vegas total in 14 of 14 games, and the actual totals were under the Vegas total in 11 of 14 games. It seems unlikely that this trend will continue, as in Week 10 only 6 games came in under the Vegas total.
  • Having said that, the entire premise of WOTC is that the crowd weighs various factors, and it may have been possible that the crowd leaned towards lower scores this particular weekend. Time will ultimately tell the tale.

Nice job Crowd! Please keep up your participation, and please share if you can.

Thanks in advance!

-PCSM Chris


The First Week Results

Thanks to some dedicated volunteers, we have seen our first set of results come through. While it would be nice to draw some conclusions, there were only 4 submissions, so we’re still likely in the realm of random chance.

Still, a couple of things:

  1. There was some very useful feedback that Paul and I are very grateful to have received.
  2. Overall, the crowd followed the favorites, generally selecting them to win and usually cover the spread.
    1. This past weekend, this didn’t work out so well.
    2. Correct picks against the spread:
      1. Tampa Bay
      2. Kansas City
      3. Dallas
      4. Seattle
      5. Cincinnati

Getting 5 out of 13 is not far from a coin flip, so everyone, don’t feel too bad :). I’ll do a bit more digging to see if there were any outlier picks that may have skewed it one way or the other.
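For the curious, here is a back-of-the-envelope check on the coin-flip comparison. It assumes every pick against the spread is an independent 50/50 proposition and ignores pushes, which is a simplification; the function name is just for illustration.

```python
from math import comb

def prob_at_most(k, n, p=0.5):
    """Probability of k or fewer correct picks out of n when each pick
    is an independent coin flip with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Chance that pure coin flipping gets 5 or fewer of 13 games right.
p_five_or_fewer = prob_at_most(5, 13)
print(f"{p_five_or_fewer:.1%}")
```

Under those assumptions, pure guessing lands on 5 or fewer correct picks a bit less than 30% of the time, so a 5-for-13 week says very little on its own.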

We’re hoping to get some more activity over the course of the season, so please ask your friends to help out if they can!


Crowdsourcing Spread vs. Margin of Victory

This is a follow-up to our post on how crowdsourcing scores can work. In that post, we talked about how who-picked-whom is a crude metric of crowdsourcing predictions, but its main gap is that it doesn’t provide margin of victory to determine a more accurate crowdsourced prediction. As a result, even if the majority thinks that the home team will cover the spread, it does not indicate what the crowd believes the margin of victory will be and whether there is value in the spread set by the sportsbook.

Paul and I were talking this week about how to present spread prediction alongside margin of victory. Here’s the basic idea:

Team | CSS Score | CSS Margin of Victory | # Win Predictions
Away | 24 | +4 | 200
Home | 27 | +7 | 100

Total Predictions: 300
CSS Game Total: 51
CSS Spread: Home -3

I’ll quickly summarize so everyone understands what this means. “CSS Score” is the average of all of the score predictions for each team: Σ(Away)/#Predictions and Σ(Home)/#Predictions. The “CSS Game Total” is the average of all of the totals for each prediction: (Σ(Away) + Σ(Home))/#Predictions. The “CSS Spread” is the average delta between the home scores and the away scores: (Σ(Home) - Σ(Away))/#Predictions.

The Margin of Victory (MoV) is the data point I want to address. Where does MoV come in? This is kind of fascinating to me, actually. Margin of victory would be, for example, the delta between the home team’s score and the away team’s score only in predictions in which the home team’s score is greater. This would indicate a bit more than the spread by itself. As I discussed before, if more people believe that the home team will cover the spread, it doesn’t necessarily tell you how confident they are. Margin of victory provides that detail. So, let’s say that the CSS Spread prediction is Home -3. That tells you where the average delta is across all predictions. But MoV provides a window into how confident users are in a win by either team. Suppose more people think that the away team will win, but the people who think that the home team will win think that they’ll win by a larger margin. In that scenario, you’d see the CSS Spread lean more towards the home team.
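Here is a minimal sketch of how those numbers could be computed from raw predictions. The function name, the dictionary keys, and the three sample predictions are all made up for illustration (they are not the figures in the mock-up table above), and this is not necessarily how the site will end up implementing it.

```python
from statistics import mean

def css_summary(predictions):
    """Summarize a list of (away_score, home_score) predictions."""
    n = len(predictions)
    away_score = sum(away for away, _ in predictions) / n
    home_score = sum(home for _, home in predictions) / n

    # Margin of victory only looks at the predictions that pick that team to win.
    away_margins = [away - home for away, home in predictions if away > home]
    home_margins = [home - away for away, home in predictions if home > away]

    return {
        "away_score": away_score,
        "home_score": home_score,
        "game_total": away_score + home_score,
        # Positive means the crowd has the home team winning on average.
        "home_minus_away": home_score - away_score,
        "away_wins": len(away_margins),
        "home_wins": len(home_margins),
        "away_mov": mean(away_margins) if away_margins else None,
        "home_mov": mean(home_margins) if home_margins else None,
    }

# Hypothetical crowd: two narrow away-win picks and one big home-win pick.
summary = css_summary([(24, 21), (23, 20), (17, 30)])
print(summary)
```

In this made-up crowd, two of three users pick the away team, but the average spread still leans to the home team by a little over 2 points because the lone home pick is a 13-point win. That is the confidence signal that the win counts alone would hide.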

I am very excited to see where this data takes us. There is not enough of it to go around, and I think it’ll provide a lot of insight and advantage when making your picks.



Crowdsourced Scores as a Leading Indicator

I listened to a couple of podcasts this week and, combined with my own experience, my limited consensus says that Week 3 was a rough week for most people. The Sports Gambling Podcast hypothesized that it was because our conclusions are formed but are based on iffy data. You have 2 weeks of data, and in reality you have no idea how to read it, but because you have 2 weeks of data, you think it’s locked.

For example, consider the Arizona at Buffalo game. The Cardinals had just whipped the Bucs. The Bucs beat the Falcons, the Falcons beat the Raiders, the Raiders beat the Saints. Meanwhile, the Bills lost to the Ravens and to the Jets, neither of whom is highly regarded this season outside their core fan bases. Additionally, all of the money was on the Cardinals. All signs pointed to the continuing trend.

So of course, the Bills smacked the Cardinals around and won outright. But this wasn’t the only one; there were a lot of surprising results in Week 3.

So what happened?

Well, it’s hard to say in any kind of precise way, but what is really interesting is whether crowdsourcing the scores would have indicated something more granular afoot. For example, when people base decisions off assumptions, it gets harder to put numbers on their predictions. In other words, in the Cardinals-Bills game, you assume that Arizona would score 40 points again and the Bills would lay down, but if you put actual numbers against it, maybe it doesn’t seem so likely. You then start thinking through it a bit more, and maybe the crowdsourced results get closer to what actually happened than what all the signs pointed to happening.

At the end of the week, maybe the crowdsourced scores don’t say much different from the actuals, but it would be fascinating to see if maybe there was some sort of indicator that it was going to be a weird week.


How Can Crowdsourcing Scores Work?

One of the earliest forms of pushback that we received regarding the idea of crowdsourcing game scores was simply that there would not be enough expertise available to predict accurately.

“There are too many variables.”

“No one will ever have enough information.”

“You’ll never get it right.”

To be honest, when we heard such gut-reaction pushback, it made us all the more excited to try the experiment. But beyond that, it’s important to recognize that those objections are addressed precisely by the Wisdom of the Crowds hypothesis.

Additional Reading – Wikipedia

One of the first experiments testing the WotC took place at a county fair in the early 20th century, where attendees estimated the weight of an ox. The result of the roughly 800 guesses was an estimate within 1% of the actual weight. I think we can agree that there weren’t 800 ox-weighing experts attending a single county fair, so how did they get so close?

Here is where the WotC model makes so much sense. It relies on 4 key pillars:

Criteria | Description
Diversity of opinion | Each person should have private information even if it’s just an eccentric interpretation of the known facts.
Independence | People’s opinions aren’t determined by the opinions of those around them.
Decentralization | People are able to specialize and draw on local knowledge.
Aggregation | Some mechanism exists for turning private judgments into a collective decision.

Each person at the fair brought their own expertise about how much things weigh. Even though it’s unlikely that many people could easily identify objects that weighed over 1000 pounds, they likely could identify objects that weighed 100 or 200 pounds, and they made their best educated guess based on that experience (“My son weighs 50 pounds and the ox looks like 20 of him, so it’s probably 1000 pounds”). When all the results were tallied, those who underestimated were balanced by those who overestimated, and the final result was almost exactly right.
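A toy simulation makes the balancing-out effect easier to see. Everything here is an assumption for illustration: the 1000-pound ox, the 1000 guessers, and especially the idea that each guess is the true weight plus independent, unbiased noise. If the crowd shared a common bias, the errors would not cancel this neatly.

```python
import random
from statistics import mean

def simulate_crowd(true_weight, n_guessers, noise_sd, seed=0):
    """Simulate independent guesses scattered around the true weight.

    Returns (crowd_error, typical_individual_error), both in pounds.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    guesses = [rng.gauss(true_weight, noise_sd) for _ in range(n_guessers)]

    crowd_error = abs(mean(guesses) - true_weight)
    typical_individual_error = mean(abs(g - true_weight) for g in guesses)
    return crowd_error, typical_individual_error

# Hypothetical fair: a 1000-pound ox, 1000 guessers, each off by about 150 pounds.
crowd_error, individual_error = simulate_crowd(1000, 1000, 150)
print(f"Crowd average is off by {crowd_error:.1f} pounds")
print(f"A typical individual is off by {individual_error:.1f} pounds")
```

Under these assumptions, the crowd average misses by only a few pounds while a typical individual misses by a hundred or more, because the overestimates and underestimates largely cancel.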

What we are assuming is that the principles that allowed a group of people to predict the weight of an ox would apply to game scores as well. Specifically, by building a user base of independent, decentralized users from all over the country, we will have a group that provides little pieces of information with each prediction. CSS aggregates the result of that information in the form of final scores to generate predictions, and as a result, we expect that the final tallies will reflect a conclusion based on a wide variety of information.

Let’s look at a specific example: the Week 2 49ers-Panthers game in which the Panthers won by a score of 46-27.

In this example, let’s assume something that we expect is not realistic: our users come from 4 primary locations – Charlotte, San Francisco, New York City, and Dallas (4 metropolises, 2 associated with the specific teams, 2 independent).

Let’s also assume that the users in SF and Charlotte weigh their team’s performance more highly than the independent cities: in predictions submitted by users in San Francisco, for example, the 49ers score more and their defense allows fewer points. The local cities reflect the biases of their local media. In the independent cities, the average prediction is more evenly distributed based on non-local media. So that is one way that the scores could balance out. You could also add predictions that believe that Blaine Gabbert is bound to have a great game and Cam Newton will feel the effects of the hits he took in Week 1. Those would be balanced by the people who didn’t believe Gabbert was very good and thought that Newton would play like a madman and go nuts. Some individuals would weigh ESPN’s sources more heavily, others would weigh another outlet’s more heavily, and others would fall everywhere in between. Some would look at DVOA and consider that the definitive statistic for determining their prediction; others would look at a variety of data to figure out what the result will be.

The point is that there are many, many data points that feed perspectives, and the WotC is a very powerful aggregator.

When the aggregation is completed, what we expect to see is an even spread that reflects information from a wide variety of sources weighed by individuals. In the end, the final submission for each of the users is the result of their investigation and weighing of data to form a conclusion, and the wisdom of the crowd assumes that all of the different pieces of information from all of the users aggregate to a more accurate prediction than any single person could muster on their own.

Accounting for Expected Divergence

It seems unlikely that the consensus opinion for the 49ers-Panthers game would predict a final score of 46-27. As a general rule, a score this high is pretty rare, and we do not expect that the WotC would successfully predict it.

However, we are optimistic that the WotC would be accurate about two pieces of data that are really critical to our core value proposition:

  1. The game total of 45: it is unlikely that the median prediction would land on the actual total of 73, but we expect that the median would indicate whether the line of 45 is too high or too low.
  2. The spread of 13 1/2: we expect that the WotC would indicate whether the spread was too high or too low independent of the actual scores.

Therefore, whereas a guess at the weight of an ox should be precise, we think that the true value of CSS lies in users accurately predicting whether the lines provided by the sportsbooks have value. In our world, it doesn’t matter whether the final totals beat the lines by 1 point or 20. Either case is a winning proposition for our users.

On the other hand, who knows? Maybe there were enough people out there predicting 50-point blowouts by the Panthers. We’ll see over time.