Friday, 28 November 2014

Scientists Warn About Bias In The Facebook And Twitter Data Used In Millions ...

Social media like Facebook and Twitter are far too biased to be used blindly by social science researchers, two computer scientists have warned.


Writing in today's issue of Science, Carnegie Mellon's Juergen Pfeffer and McGill's Derek Ruths have warned that scientists are treating the wealth of data gathered by social networks as a goldmine of what people are thinking - but frequently they aren't correcting for inherent biases in the dataset.


Anyone who didn't already know that scientists were turning to social media for easy access to ready-made statistics on thousands of people found out when Facebook allowed researchers to adjust users' news feeds to manipulate their emotions.



The poorly handled research resulted in headlines screaming about 'secret experiments', while data watchdogs in Europe launched investigations and the Electronic Privacy Information Center filed an official complaint with the US Federal Trade Commission.


The outrage arose because Facebook had allowed the researchers to conduct their experiment without users' explicit permission (asking for it would, of course, have biased the results), but the situation was exacerbated by the social network's failure to get in front of the story and apologise before it was forced to.


Pfeffer told Forbes.com that embedded researchers, like those working on this Facebook study, were a big issue.


'Moral issues and privacy are big issues,' he said, adding that it was up to universities to ensure that their ethical review boards took care of these issues.


What's more concerning for Pfeffer and Ruths from a scientific standpoint is the lack of context for the data that researchers are compiling. The scientists said that thousands upon thousands of papers are now produced every year based on data from social media, a source that wasn't even around five years ago.


'The amount of research done from Twitter is enormous!' Pfeffer said. 'For instance, search for "Twitter" on Google Scholar and you will get 4.9 million results. This is more than almost every other keyword possible, e.g. "Sociology" (2.5M).'


Although social networks seem like a fount of free data, they often have substantial population biases that prevent that data from being extrapolated to the general public.


'Instagram, for instance, has special appeal to adults between the ages of 18 and 29, African-Americans, Latinos, women and urban dwellers, while Pinterest is dominated by women between the ages of 25 and 34 with average household incomes of $100,000,' Pfeffer and Ruths said.


Even worse, social sites use proprietary algorithms to create or filter their data streams, algorithms that they could change at a moment's notice and that the scientists know nothing about.


'How can anybody evaluate the results if neither the data nor the exact methods are allowed to be published?' asked Pfeffer.


Those lucky enough to get privileged access to this proprietary information, so-called embedded researchers, are creating a divided social media research community, making it difficult to compare competing outcomes or objectively analyse a paper derived from social data.


On top of all that, many of the 'users' on social media sites aren't real people at all: they're celebrity staff tweeting on behalf of their employer, PRs promoting a company, or fake accounts for people who don't exist. In fact, half of all Twitter accounts created in 2013 have already been deleted.


These fake accounts are often created by unscrupulous firms that will beef up your follower count in return for cold hard cash.


'Twitter is in the centre of public interest and politicians or companies are often ranked by number of followers or re-tweets or the like - so, there is a whole 'web optimisation' industry offering services to make you look better on Twitter - everybody can buy 10,000 followers for $5,' Pfeffer said.


Most scientists doing real social science are already aware of these issues and correct for the biases in social media data, Pfeffer said, using existing techniques from epidemiology, statistics and machine learning. But if scientists want to keep mining social data, they may need to come up with new ways to manage analytic bias.
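To make the idea of correcting for population bias concrete, here is a minimal sketch of one standard statistical technique, post-stratification, in which each demographic group's average is reweighted by that group's share of the general public. It is an illustration only, not the method Pfeffer and Ruths propose: the function names, age groups, sample and population shares are all invented for the example.

```python
# Illustrative post-stratification sketch. Every number and name here is
# hypothetical; none of it comes from the article or the Science paper.

def naive_mean(sample):
    """Average over all users, ignoring who is over-represented."""
    values = [v for group_values in sample.values() for v in group_values]
    return sum(values) / len(values)


def poststratified_mean(sample, population_share):
    """Reweight each group's mean by its share of the target population.

    sample: dict mapping group -> list of observed values (1 = agrees)
    population_share: dict mapping group -> fraction of the general public
    """
    total_share = sum(population_share[group] for group in sample)
    estimate = 0.0
    for group, values in sample.items():
        group_mean = sum(values) / len(values)
        estimate += group_mean * population_share[group] / total_share
    return estimate


# Hypothetical platform on which young users are heavily over-represented.
sample = {
    "18-29": [1] * 60 + [0] * 20,  # 80 users, 75% agree
    "30+": [1] * 5 + [0] * 15,     # 20 users, 25% agree
}
population_share = {"18-29": 0.2, "30+": 0.8}  # assumed census-style shares

print(naive_mean(sample))                             # roughly 0.65
print(poststratified_mean(sample, population_share))  # roughly 0.35
```

In this made-up case the raw platform average suggests that about two-thirds of people agree, while the reweighted estimate is closer to one-third. Reweighting of this kind only helps with biases that are measured; it cannot account for fake accounts or for filtering by proprietary algorithms.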



