Thread: How Twitter?
  #43  
Old 07-20-2022, 07:33 PM
Ari
I read some of your foolish scree, then just skimmed the rest.
 
Join Date: Jan 2005
Location: Bay Area
Gender: Male
Posts: XMCMLVII
Blog Entries: 8
Re: How Twitter?

:giggle:
How to take a random sample, according to Musk: “Toss out the first 1000 followers then pick every 10th, I’m open to better ideas.”

That he’s asking Twitter instead of, ya know, an expert, really doesn’t bode well for the rest of this case.

JunkCharts goes into detail as to why this is a poor idea:

The Musk Sampling Plan by JunkCharts

So what happens when the first 1,000 followers are ignored, as Musk stipulated? Presumably he believes the earlier followers of any account are unrepresentative. It's not clear this exclusion does what he intends. The exclusion rule won't affect his own account as 1,000 is a rounding error. The set of Musk followers - with or without the first 1,000 - is essentially the same.

Applying this exclusion to other people's accounts may be problematic because 1,000 followers is a pretty high bar. 2838, 70, 133, 564, 3432, 608, 16K, 10K, 982, 4648. Those are the follower counts for the first 10 accounts shown on Musk's feed. Half of those accounts have fewer than 1,000 followers. After removing the first 1,000 from these smaller accounts, there is no follower left to pick from. The next three accounts have roughly 3,000 to 5,000 followers; excluding a third to a fifth of their followers from sampling seems severe. Thus, the Musk sampling plan introduces a large-account bias.
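
As a rough check on the arithmetic in that paragraph, here is a minimal Python sketch. It only uses the ten follower counts quoted above; reading "16K" and "10K" as exactly 16,000 and 10,000 is an assumption, and the helper name `excluded_share` is made up for illustration.

```python
# Follower counts quoted above for the first 10 accounts on Musk's feed.
# Assumption: "16K" and "10K" are read as exactly 16,000 and 10,000.
follower_counts = [2838, 70, 133, 564, 3432, 608, 16_000, 10_000, 982, 4648]
EXCLUDED = 1000  # "toss out the first 1000 followers"


def excluded_share(count, excluded=EXCLUDED):
    """Fraction of an account's followers removed by the exclusion rule."""
    return min(excluded, count) / count


# Accounts with 1,000 or fewer followers have nobody left to sample.
emptied = [c for c in follower_counts if c <= EXCLUDED]
print(f"{len(emptied)} of {len(follower_counts)} accounts have no followers left")

for c in sorted(follower_counts):
    print(f"{c:>6} followers: {excluded_share(c):.0%} excluded")
```

Running it shows the five small accounts losing 100% of their followers, while the three mid-sized ones lose somewhere between about a fifth and a third, which is the large-account bias the post describes.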

***

The second step of the Musk sampling plan is described as pick every tenth follower. So, take Musk's 90 million followers minus the first 1,000, and we'll get a gigantic sample of 9 million.

Here comes the hard part. What he wants to know is the proportion of those 9 million accounts that are spam accounts. Who is going to decide whether each of the 9 million accounts is "spam", and how?

Nine million is too big a sample to handle. He may be applying the "rule of thumb" that a random sample should be 10% of the population. That's a myth busted in any intro Stats class. We typically only need to interview 1,000 Americans to generalize the sample responses to the entire population of 300 million, which is far, far, far, far smaller than 10%.

The required sample size is not a fixed number. If one can tolerate a larger margin of error, the sample can be reduced. What's our tolerable margin of error? If the true proportion of spam accounts is 5%, as Twitter management asserted, we may want a rather precise estimate, within plus or minus 2%. Reviewing Stats 101, we learn that translates to a standard error of 1%, and a sample size of 475.

What that means is if we take a random sample of 475 of Musk's followers, and learn that 5% (24) of those are spam accounts, then we can conclude that 5% of his followers are spam accounts, plus or minus 2%. (Others may disagree.)
The Musk sampling plan, thought through - Big Data, Plainly Spoken (aka Numbers Rule Your World)
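
For anyone who wants to check the 475 figure, here is a minimal sketch of the Stats 101 calculation, assuming the usual normal-approximation formula for a proportion, SE = sqrt(p(1-p)/n), and the post's own convention that the margin of error is two standard errors. The function names and the `z=2.0` default are my own choices for illustration, not anything from JunkCharts.

```python
import math


def sample_size(p, margin, z=2.0):
    """Sample size needed to estimate a proportion p within +/- margin.

    Solves SE = sqrt(p * (1 - p) / n) for n, with SE = margin / z.
    z=2.0 mirrors the quoted "plus or minus 2% -> standard error of 1%";
    1.96 is the more common value for a 95% interval.
    """
    se = margin / z
    n = p * (1 - p) / se**2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n, 6))


def margin_of_error(p, n, z=2.0):
    """Margin of error (z standard errors) for proportion p and sample size n."""
    return z * math.sqrt(p * (1 - p) / n)


print(sample_size(0.05, 0.02))       # 5% spam, +/- 2%
print(margin_of_error(0.5, 1000))    # typical 1,000-person poll, worst case p = 0.5
```

The first call reproduces the 475 from the quote. Note that the population size never appears in the formula (for a population this large the finite-population correction is negligible), which is why 1,000 respondents can give roughly plus or minus 3% whether the population is 300 million people or 90 million followers.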
Thanks, from:
ceptimus (07-21-2022), Crumb (07-20-2022), Ensign Steve (07-20-2022), JoeP (07-20-2022), Stephen Maturin (07-20-2022), Stormlight (07-21-2022)