Sometimes we have to spell the basics out.
I’ve seen online trolls react to the news that Cambridge Analytica harvested 6,000 Facebook accounts from Malta, and they speak like the proper idiots they are.
One retort is: ‘Why be too bothered about 6,000 accounts when Labour’s majority was 40,000?’ The logic (oh, what a mistaken use of that term) is that even if all 6,000 people were manipulated by abusive use of data and would otherwise have voted PN, moving those votes back would narrow the gap by 12,000, and Labour would still enjoy a majority of 28,000.
No one can tell from the available data how successful the manipulation that has likely occurred actually was, though anecdotally the case that elections have been swayed in the direction opposite to the political discourse is irresistibly compelling.
But 6,000 are a sample, and an enormous one at that.
Many still don’t understand the basic rules of statistical probability. By determining the tastes and preferences of a group of people selected randomly, you can forecast what the population they belong to is likely to do.
Now, by understanding the wishes of 100 people out of an entire population, we can measure how certain we can be that the responses of the sample predict the behaviour of the population. That certainty decreases if we have a sample of 80 instead of 100, and increases if we have a sample of 200 instead of 100.
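For anyone who wants to check the arithmetic, here is a minimal sketch of how that certainty is usually measured. It assumes a simple random sample, a 95% confidence level and the worst-case 50/50 split of opinion; the function name `margin_of_error` and those defaults are my own choices for illustration, not anything taken from the reported data.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (an assumed, conventional choice)


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error for a proportion estimated from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# The three sample sizes mentioned in the text.
for n in (80, 100, 200):
    print(f"sample of {n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

The margin of error shrinks as the sample grows, which is the same thing as saying the certainty goes up.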
You’d think that’s obvious.
But then you see the other troll response making the rounds: ‘They only collected 6,000 profiles from Malta. They collected more from other countries, so they can’t have been that interested in Malta.’
It is the size of the sample as a fraction of the population that determines the level of certainty. The smaller the population, the smaller the sample needs to be to retain the same level of certainty.
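The relationship between population size and the sample you need can be sketched with the standard sample-size formula for a proportion, adjusted with a finite population correction. This is an illustration under stated assumptions: a simple random sample, a 95% confidence level, a margin of plus or minus 3 percentage points, and population figures that are hypothetical round numbers, not counts of Facebook users in any country. The function name `required_sample_size` is likewise my own.

```python
import math


def required_sample_size(population, margin=0.03, z=1.96, proportion=0.5):
    """Sample size needed for a given margin of error, with finite population correction."""
    # Sample size that would be needed if the population were effectively infinite.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Finite population correction: a smaller population needs a smaller sample.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Hypothetical population sizes, chosen only to show the trend.
for population in (5_000, 50_000, 400_000, 40_000_000):
    print(f"population {population:>10,}: sample needed = {required_sample_size(population)}")
```

The required sample gets smaller as the population gets smaller, although the difference only becomes noticeable once the sample is a meaningful share of the population.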
Of the samples reported in the communication between the EU and Facebook, the Maltese sample is the second largest in Europe. Almost ‘l-aqwa fl-Ewropa’ (‘the best in Europe’), except that Cambridge Analytica’s work in the UK connected to the Brexit referendum may have had something to do with the British sample ranking number one.
Here’s a ranking of sample sizes as a share of the known number of Facebook users: