The inner crowd or just statistics?
Synopsis: A recent journal article claims (literally) that second guessing ourselves gives better accuracy. I think this is wrong, and that what they’re seeing is just statistics. I’d like some opinions on this!
A recent paper in the journal Psychological Science (and much publicised by The Economist in an easier-to-digest article) makes some interesting claims. Basically the story goes that if you ask two people to estimate something (like the number of jelly beans in a jar, or the percentage of world airports in the USA), then averaging their two guesses gives you a better estimate, on average, than either guess alone. It’s called the wisdom of crowds; extend it to a hundred people and, up to a point, the group guesses get better.
Fair enough, I could believe this, I think. (More specifically, the group will average towards the “group bias”, and you’re assuming this is a “good” guess. But I digress…) The new paper goes one step further, though: it suggests that even one person can improve their guess by making two guesses and averaging them, improving accuracy by 10 percent. The article’s authors suggest:
Although people assume that their first guess about a matter of fact exhausts the best information available to them, a forced second guess contributes additional information, such that the average of two guesses is better than either guess alone. This observed benefit of averaging multiple responses from the same person suggests that responses made by a subject are sampled from an internal probability distribution, rather than deterministically selected on the basis of all the knowledge a subject has.
Translation: we don’t come up with one best guess straight off; instead, each “guess” comes from a range of possible values our brain has computed. Furthermore, they note that a delay of three weeks between the first and second guess improves the average, presumably by making the guesses more “independent”.
But something about this bugged me, so I did a little computational experiment myself: Generate a random number (between 0.0 and 1000.0, fractions allowed) which is the “right” answer and generate two more random numbers which are my guesses (also between 0.0 and 1000.0). Then, look at the difference between the first guess and the “correct” answer, and between the average of my two guesses and the “correct” answer. Repeat.
So to be clear: I’m randomly choosing the correct answer, and then I’m randomly making two guesses with no information other than a lower and upper bound. This would be perfectly reasonable for questions like the airport percentage above (which was mentioned in the article) where you know the answer is between 0 and 100. I did 1200 tests (in Excel; that’s not enough for the averages to be absolutely constant, but it doesn’t change the qualitative results).
The results: On average, the first guess was off by about 330 (which I’m sure you could argue theoretically, too). But for the average (mean) of my two guesses, I was only off by about 290, which is a guess more than 10% better than before! Here’s a summary:
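For anyone who would rather not use a spreadsheet, the experiment described above can be sketched in a few lines of Python. This is only a rough equivalent, not the original Excel sheet: the function name `simulate`, the fixed seed and the default arguments are all illustrative choices.

```python
import random

def simulate(trials=1200, low=0.0, high=1000.0, seed=0):
    """Return (mean error of the first guess, mean error of the averaged guess)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    total_first = 0.0
    total_avg = 0.0
    for _ in range(trials):
        answer = rng.uniform(low, high)  # the "correct" answer
        guess1 = rng.uniform(low, high)  # first guess: no information beyond the bounds
        guess2 = rng.uniform(low, high)  # second guess, independent of the first
        total_first += abs(guess1 - answer)
        total_avg += abs((guess1 + guess2) / 2 - answer)
    return total_first / trials, total_avg / trials

if __name__ == "__main__":
    first, averaged = simulate()
    print(f"first guess off by {first:.0f}, averaged guess off by {averaged:.0f}")
```

With only 1200 trials the two numbers wobble from seed to seed, so it is worth raising `trials` to check that the gap between them is not a fluke.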
| Average “answer” | 492 |
| First Guess average | 495 |
| Average of two guesses average | 499 |
| Deviation of Guess 1 (average) | 332 |
| Deviation of AvGuess (average) | 293 |
| Percentage change | -12% |
| RMSE Guess 1 | 406 |
| RMSE AvGuess | 360 |
| Percentage change in RMSE | -11% |
Notice the averages are in about the right place; my data is pretty random, and yet simply taking the average gets me closer.
I’ve also included there the root mean square errors (RMSE), which were discussed in the paper. [Disclaimer: I don’t really have a stats background: for the RMSE I took the difference between each guess and the corresponding correct answer, squared each difference and added the results, then divided this sum by the number of tests and took the square root. I hope that’s right…] Unfortunately, I don’t know exactly the data numbers and ranges used in the article, so I’m not quite sure how to compare them directly, but the percentage changes are what’s important, I think, and they fit nicely with the paper’s predictions (of between 5% and 15%).
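The RMSE recipe in the disclaimer above is the standard one. As a minimal sketch (the helper name `rmse` and the toy numbers are made up purely for illustration):

```python
import math

def rmse(guesses, answers):
    """Root mean square error between paired guesses and correct answers."""
    if len(guesses) != len(answers) or not guesses:
        raise ValueError("need two non-empty lists of equal length")
    # Square each difference, average the squares, then take the square root.
    squared = [(g - a) ** 2 for g, a in zip(guesses, answers)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data: the differences are 0, 0 and -2, so the RMSE is sqrt(4/3).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```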
Obviously, I made some assumptions, in particular that no person had the faintest clue what the “right” answer should be. But that’s not ridiculous, and it shouldn’t be hard to redo the data with a normal distribution of guesses around the mean; I’d expect it to give the same result, but the conclusion is now trivial (average two deliberately normally distributed guesses, and you’re going to get closer to the mean, on average).
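The normally distributed variant could be sketched as below. Everything specific here is an assumption rather than something taken from the paper or the spreadsheet: the guesses are centred on the correct answer, the spread `sigma` is an arbitrary 100, the two guesses are independent, and nothing is clipped to the 0 to 1000 range.

```python
import random

def simulate_normal(trials=100_000, answer=500.0, sigma=100.0, seed=0):
    """Mean absolute error of one normal guess vs. the mean of two.

    Assumes (hypothetically) that both guesses are independent draws
    from a normal distribution centred on the correct answer.
    """
    rng = random.Random(seed)
    total_first = 0.0
    total_avg = 0.0
    for _ in range(trials):
        guess1 = rng.gauss(answer, sigma)
        guess2 = rng.gauss(answer, sigma)
        total_first += abs(guess1 - answer)
        total_avg += abs((guess1 + guess2) / 2 - answer)
    return total_first / trials, total_avg / trials

if __name__ == "__main__":
    first, averaged = simulate_normal()
    print(f"single guess: {first:.1f}, averaged guess: {averaged:.1f}")
```

Qualitatively this gives the same result, although under these particular assumptions the averaged guess's error shrinks by a factor of 1/√2 (about 29%), which is more than in the uniform case.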
I also did a couple of trials where I fixed the “correct answer” to the same value for every person. If the correct answer was 0 (out of 1000), then, as you’d expect, a single guess and an averaged guess both produced the same average deviation, 500. Interestingly, however, their RMSEs were quite different, and the averaged guess again came out 6% better by this metric. Same for a correct answer of 1000. Why? Averaging the two numbers produces a lower standard deviation around the average (500); the RMSE weights big differences more, so the values further away from 0 (or 1000, respectively) contribute much more, even allowing for the correspondingly closer values. (Does that make sense?) So RMSE might not be a great measure here.
What about if we take 500 as the correct answer? Then the effect is even more pronounced: the averaged guess has a lower standard deviation and is closer to 500 on average, so the improvement is much larger. 250 or 750 were similar, and I’d expect this to be true for the whole range.
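The fixed-answer trials can be reproduced with a sketch like the following. The function name `fixed_answer_trial`, the seed, the trial count and the dictionary keys are all illustrative; only the set-up (uniform guesses between 0 and 1000, one fixed correct answer) comes from the description above.

```python
import math
import random

def fixed_answer_trial(answer, trials=100_000, low=0.0, high=1000.0, seed=0):
    """Mean absolute deviation and RMSE for one guess vs. the mean of two,
    with the correct answer held at a single fixed value."""
    rng = random.Random(seed)
    abs_first = abs_avg = sq_first = sq_avg = 0.0
    for _ in range(trials):
        guess1 = rng.uniform(low, high)
        guess2 = rng.uniform(low, high)
        avg = (guess1 + guess2) / 2
        abs_first += abs(guess1 - answer)
        abs_avg += abs(avg - answer)
        sq_first += (guess1 - answer) ** 2
        sq_avg += (avg - answer) ** 2
    return {
        "mad_first": abs_first / trials,
        "mad_avg": abs_avg / trials,
        "rmse_first": math.sqrt(sq_first / trials),
        "rmse_avg": math.sqrt(sq_avg / trials),
    }

if __name__ == "__main__":
    for answer in (0.0, 500.0, 1000.0):
        r = fixed_answer_trial(answer)
        print(answer, {name: round(value) for name, value in r.items()})
```

Printing both metrics side by side makes it easy to see where the mean deviation and the RMSE disagree at the ends of the range.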
In conclusion, I would argue that any benefit seen in this study is simply from statistics, not from some innate feature of our brain’s estimating abilities. The only thing they did get right is that the time delay probably allows the second guess to be more “random” which seems to be useful, on average!
Thoughts? If this makes sense, I’m going to write a rebuttal/similar, but I’d love some feedback first (particularly if I’m completely wrong and/or have missed something obvious!) The preprint article is missing a key figure, but I don’t think I’ve misunderstood or missed anything important (except for their actual data values). All comments welcome!
Addendum: I finally did some simple analytical calculations: when both guesses and the answer are chosen randomly and independently from 0-1000, the RMSE of one guess should be 410, compared to 355 for two averaged guesses, just as the simulations found! I’ve also checked that the average absolute difference for one guess should be 1000/3, which looks right, but I haven’t yet slogged through the maths for the averaged-guess case (does anyone know a shortcut for analytical expectation values of absolute values?!)
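For reference, here is one way the RMSE figures can be derived. The notation is mine: G_1 and G_2 are the two guesses, A is the correct answer, all independent and uniform on [0, R] with R = 1000. The only facts used are that a uniform variable on [0, R] has variance R²/12 and that variances of independent variables add.

```latex
\begin{align*}
\operatorname{Var}(G_1) = \operatorname{Var}(G_2) = \operatorname{Var}(A) &= \frac{R^2}{12},
  \qquad \mathbb{E}[G_1] = \mathbb{E}[G_2] = \mathbb{E}[A] = \frac{R}{2} \\
\mathbb{E}\big[(G_1 - A)^2\big] &= \operatorname{Var}(G_1) + \operatorname{Var}(A) = \frac{R^2}{6}
  &&\Rightarrow\ \mathrm{RMSE}_{1} = \frac{R}{\sqrt{6}} \approx 408 \\
\mathbb{E}\Big[\big(\tfrac{G_1 + G_2}{2} - A\big)^2\Big] &= \frac{R^2}{24} + \frac{R^2}{12} = \frac{R^2}{8}
  &&\Rightarrow\ \mathrm{RMSE}_{\mathrm{avg}} = \frac{R}{2\sqrt{2}} \approx 354
\end{align*}
```

Rounded, these are the 410 and 355 quoted above, and the ratio √3/2 ≈ 0.87 corresponds to a reduction of roughly 13%.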
Addendum #2: I’ve now done the averaged-guess case for the absolute difference (I’m so slow sometimes…) and I predict 333 for the single guess or 291 for the averaged guess, right on the money! Thanks to Tim for useful comments about triangular probability distributions!
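One possible route to the absolute-difference numbers, which may not be the calculation actually used here: first average over the answer A for a fixed guess x, which turns the absolute value into a polynomial in x, and then only second moments of the guess are needed. As before, the notation is mine: G_1 and G_2 are the guesses, A is the answer, all independent and uniform on [0, R] with R = 1000, and M = (G_1 + G_2)/2 is the averaged guess (the one with a triangular distribution).

```latex
\begin{align*}
\mathbb{E}_A\,|x - A| &= \frac{1}{R}\int_0^R |x - a|\,da = \frac{x^2 + (R - x)^2}{2R} \\
\mathbb{E}[G_1^2] = \mathbb{E}[(R - G_1)^2] &= \frac{R^2}{12} + \frac{R^2}{4} = \frac{R^2}{3}
  &&\Rightarrow\ \mathbb{E}\,|G_1 - A| = \frac{2 \cdot R^2/3}{2R} = \frac{R}{3} \approx 333.3 \\
\mathbb{E}[M^2] = \mathbb{E}[(R - M)^2] &= \frac{R^2}{24} + \frac{R^2}{4} = \frac{7R^2}{24}
  &&\Rightarrow\ \mathbb{E}\,|M - A| = \frac{2 \cdot 7R^2/24}{2R} = \frac{7R}{24} \approx 291.7
\end{align*}
```

The ratio of the two is 7/8, so the averaged guess is 12.5% closer on average, in line with the simulated table.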