How Can Crowdsourcing Scores Work?

One of the earliest forms of pushback that we received regarding the idea of crowdsourcing game scores was simply that there would not be enough expertise available to predict accurately.

“There are too many variables.”

“No one will ever have enough information.”

“You’ll never get it right.”

To be honest, when we heard such gut-reaction pushback, it made us all the more excited to try the experiment. But beyond that, it’s important to recognize that those objections are addressed precisely by the Wisdom of Crowds (WotC) hypothesis.

Additional Reading – “Wisdom of the Crowd” (Wikipedia)

One of the first experiments testing the WotC was Francis Galton’s study of a weight-judging contest at an English country fair in the early 20th century, where attendees guessed the weight of an ox. The aggregate of the roughly 800 guesses came within 1% of the actual weight. I think we can agree that there weren’t 800 ox-weighing experts attending a single fair, so how did they get so close?

Here is where the WotC model makes so much sense. It relies on 4 key pillars:

Diversity of opinion – Each person should have private information, even if it’s just an eccentric interpretation of the known facts.
Independence – People’s opinions aren’t determined by the opinions of those around them.
Decentralization – People are able to specialize and draw on local knowledge.
Aggregation – Some mechanism exists for turning private judgments into a collective decision.

Each person at the fair brought their own expertise about how much things weigh. Even though it’s unlikely that many people could easily identify objects that weighed over 1000 pounds, they likely could identify objects that weighed 100 or 200 pounds, and they made their best educated guess based on that experience (“My son weighs 50 pounds and the ox looks like 20 of him, so it’s probably 1000 pounds”). When all the results were tallied, those who underestimated were balanced by those who overestimated, and the final result was almost exactly right.
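That balancing-out can be sketched with a quick simulation. This is a toy model, not Galton’s actual data: we assume each guesser anchors on a weight they know well and scales up with some individual error, and the crowd’s median still lands near the truth.

```python
import random
import statistics

random.seed(42)

TRUE_WEIGHT = 1200  # pounds; an illustrative "true" ox weight, not Galton's figure

# Each fairgoer anchors on something familiar (a child, a sack of feed) and
# estimates how many of that object the ox looks like. The count is misjudged
# by up to +/-30%, so individual guesses are noisy but unbiased on average.
guesses = []
for _ in range(1000):
    anchor = random.uniform(40, 250)                 # weight of the familiar object
    true_count = TRUE_WEIGHT / anchor                # how many anchors the ox really is
    perceived_count = true_count * random.uniform(0.7, 1.3)  # individual misjudgment
    guesses.append(anchor * perceived_count)

# Aggregation: the median cancels overestimates against underestimates.
crowd_estimate = statistics.median(guesses)
error_pct = abs(crowd_estimate - TRUE_WEIGHT) / TRUE_WEIGHT * 100
print(f"crowd median: {crowd_estimate:.0f} lb ({error_pct:.1f}% off)")
```

Run it a few times with different seeds: individual guesses swing by hundreds of pounds, but the median reliably stays within a percent or two of the true weight.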

What we are assuming is that the principles that allowed a group of fairgoers to estimate the weight of an ox apply to game scores as well. Specifically, by building a user base of independent, decentralized users from all over the country, we will have a group in which each prediction contributes its own little pieces of information. CSS aggregates that information in the form of final scores to generate predictions, and as a result, we expect the final tallies to reflect a conclusion drawn from a wide variety of information.

Let’s look at a specific example: the Week 2 49ers-Panthers game in which the Panthers won by a score of 46-27.

In this example, let’s make an admittedly unrealistic simplifying assumption: our users come from 4 primary locations – Charlotte, San Francisco, New York City, and Dallas (4 metropolises, 2 associated with the specific teams, 2 independent).

Let’s also assume that the users in San Francisco and Charlotte weigh their team’s performance more highly than users in the independent cities do – in predictions submitted from San Francisco, for example, the 49ers score more and their defense allows fewer points. The local predictions reflect the biases of the local media, while in the independent cities the average prediction is more evenly distributed, shaped by national coverage. That is one way the scores could balance out. You could also add predictions from people who believe Blaine Gabbert is bound to have a great game and that Cam Newton will feel the effects of the hits he took in Week 1; those would be balanced by people who didn’t think Gabbert was very good and expected Newton to go off. Some individuals would weigh ESPN’s sources more heavily, others a different outlet’s, and everywhere in between. Some would treat a stat like DVOA as the definitive input for their prediction; others would weigh a variety of data to figure out what the result will be.
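Here is a minimal sketch of how those opposing local biases could cancel in the aggregate. Every number below – the neutral expectation, the per-city biases, the user counts – is invented for illustration; none of it is real CSS data.

```python
import random
import statistics

random.seed(7)

# Hypothetical shared expectation for the game, in points.
NEUTRAL = {"panthers": 27.0, "niners": 21.0}

# Hypothetical per-city biases layered on top of the neutral expectation:
# fan cities boost their own team and shade the opponent down.
CITY_BIAS = {
    "Charlotte":     {"panthers": +4.0, "niners": -3.0},
    "San Francisco": {"panthers": -3.0, "niners": +4.0},
    "New York":      {"panthers": 0.0, "niners": 0.0},   # independent market
    "Dallas":        {"panthers": 0.0, "niners": 0.0},   # independent market
}

# 250 users per city, each adding their own private noise to the city view.
predictions = []
for city, bias in CITY_BIAS.items():
    for _ in range(250):
        predictions.append({
            team: NEUTRAL[team] + bias[team] + random.gauss(0, 4)
            for team in NEUTRAL
        })

# Aggregation: the median prediction per team.
consensus = {
    team: statistics.median(p[team] for p in predictions)
    for team in NEUTRAL
}
print(consensus)
```

Because Charlotte’s optimism and San Francisco’s optimism pull in opposite directions, the median lands close to the neutral expectation even though half the users are biased.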

The point is that there are many, many data points that feed perspectives, and the WotC is a very powerful aggregator.

When the aggregation is completed, what we expect to see is a spread of predictions that reflects information from a wide variety of sources, weighed by individuals. In the end, each user’s final submission is the result of their own investigation and weighing of data to form a conclusion, and the wisdom of the crowd assumes that aggregating all of those different pieces of information across all of the users produces a more accurate prediction than any single person could muster on their own.

Accounting for Expected Divergence

It seems unlikely that the consensus opinion for the 49ers-Panthers game would have predicted a final score of 46-27. As a general rule, a score this high is rare, and we do not expect that the WotC would successfully predict it.

However, we are optimistic that the WotC would be accurate about two pieces of data that are really critical to our core value proposition:

  1. The game total of 45: it is unlikely that the median prediction would land on the actual total of 73, but we expect that the median would indicate whether the posted total of 45 is too high or too low.
  2. The spread of 13 1/2: we expect that the WotC would indicate whether the spread is too high or too low, independent of the actual scores.
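As a sketch of that idea, here is one way such a signal could be computed from crowd predictions. The function name, input shape, and sample predictions are all hypothetical – this is not CSS’s actual implementation.

```python
import statistics

def crowd_signal(predictions, book_total, book_spread):
    """Compare the crowd's median total and margin against the sportsbook lines.

    predictions: list of (favorite_pts, underdog_pts) tuples.
    Returns over/under and spread signals rather than an exact score.
    """
    totals = [fav + dog for fav, dog in predictions]
    margins = [fav - dog for fav, dog in predictions]
    median_total = statistics.median(totals)
    median_margin = statistics.median(margins)
    return {
        "total": "over" if median_total > book_total else "under",
        "spread": "favorite covers" if median_margin > book_spread else "underdog covers",
    }

# Hypothetical crowd predictions for Panthers-49ers, Panthers as the favorite.
sample = [(31, 24), (28, 20), (34, 27), (24, 21), (30, 17)]
print(crowd_signal(sample, book_total=45, book_spread=13.5))
# → {'total': 'over', 'spread': 'underdog covers'}
```

Note that the crowd’s median total here is 48 – nowhere near the actual 73 – yet it still points the right way on the posted total of 45, which is exactly the kind of directional signal described above.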

Therefore, whereas guessing the weight of an ox demands precision, we think that the true value of CSS lies in users accurately predicting whether the lines provided by the sportsbooks have value. In our world, it doesn’t matter whether the final totals beat the lines by 1 point or by 20 – either case is a winning proposition for our users.

On the other hand, who knows? Maybe there were enough people out there predicting 50-point blowouts by the Panthers. We’ll see over time.

