As we all know, one of the largest social networks on the internet is Facebook, with approximately 600 million active users and upwards of a billion links (as of June 2011). You could do a lot of interesting data mining if you could get your hands on Facebook’s data… and it seems you can. Apparently, almost 75% of Facebook users keep the default privacy settings, which expose their personal data to web-crawlers. These crawlers can (and do) hop from link to link within Facebook, which makes it possible for them to navigate a large proportion of the entire Facebook listing. That listing amounts to a dataset of 44TB of subscriber and link information.
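To make the "hop from link to link" idea concrete, here is a minimal sketch of breadth-first search (BFS), a common strategy for this kind of link-following crawl. It is illustrative only: it walks an in-memory dictionary that stands in for the friend lists a real crawler would download, and the function `bfs_crawl`, its `max_nodes` budget and the toy network are hypothetical.

```python
from collections import deque


def bfs_crawl(graph, start, max_nodes):
    """Breadth-first traversal of a friendship graph starting from `start`.

    `graph` maps each user ID to a list of friend IDs and stands in for the
    friend-list pages a real crawler would download. At most `max_nodes`
    users are visited; links to users beyond that budget are still recorded.
    Returns (users in discovery order, set of undirected links seen).
    """
    visited = [start]
    seen = {start}
    links = set()
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for friend in graph.get(user, []):
            # Store each undirected link once, as a sorted pair.
            links.add(tuple(sorted((user, friend))))
            if friend not in seen and len(visited) < max_nodes:
                seen.add(friend)
                visited.append(friend)
                queue.append(friend)
    return visited, links


# Hypothetical toy network standing in for crawled friend lists.
toy = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

order, links = bfs_crawl(toy, "alice", max_nodes=10)
print(order)       # ['alice', 'bob', 'carol', 'dave', 'erin']
print(len(links))  # 5
```

The first-in, first-out queue (`deque`) is what makes the crawl breadth-first: users are expanded in the order they were discovered. Swapping it for a stack would turn the same loop into a depth-first crawl.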
Once the data has been gathered, the community structure of the network of connected Facebook users can be examined in more detail, and behavioural trends among similar users can be inferred by applying various linear-time and heuristic community-detection algorithms.
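Label propagation is one example of the near-linear-time heuristics mentioned above, and it is the algorithm the Leung et al. paper studies and refines. The sketch below is only the basic asynchronous version, not Leung et al.'s refinements: every user starts with a unique label and repeatedly adopts the label most common among their friends, until a full pass changes nothing. The tie-breaking rule (keep the current label if it is among the winners), the function name, the `seed` parameter and the toy graph of two four-person friend groups joined by a single link are assumptions for illustration.

```python
import random
from collections import Counter


def label_propagation(graph, max_iter=100, seed=None):
    """Basic asynchronous label propagation on an undirected graph.

    `graph` maps every node to a list of its neighbours (every neighbour
    must also be a key). Each node starts with its own label; on each pass
    nodes are visited in random order and adopt the label most common among
    their neighbours. Ties keep the current label if it is among the winners,
    otherwise a winner is picked at random. Stops after a pass with no
    changes, or after `max_iter` passes.
    """
    rng = random.Random(seed)
    labels = {node: node for node in graph}
    nodes = list(graph)
    for _ in range(max_iter):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            top = max(counts.values())
            best = [label for label, c in counts.items() if c == top]
            if labels[node] not in best:
                labels[node] = rng.choice(best)
                changed = True
        if not changed:
            break
    # Group nodes that ended up with the same label into one community.
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


# Hypothetical toy network: two four-person friend groups joined by one link.
def clique(members):
    return {m: [o for o in members if o != m] for m in members}


graph = {**clique([1, 2, 3, 4]), **clique([5, 6, 7, 8])}
graph[4].append(5)
graph[5].append(4)

communities = label_propagation(graph, seed=42)
print(communities)
```

Each pass looks at every link about twice, so the cost per pass is linear in the number of links, which is where the "linear-time" description comes from when only a few passes are needed. The updates are randomised, so the outcome can depend on the seed: on this toy graph both "one community per friend group" and "one merged community" are stable end states.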
One question we have to ask here is: cui bono? Who benefits? These outside web-crawlers are not affiliated with Facebook and have not asked Facebook users for permission to trawl and gather their data. Couple this with face-recognition software, which is becoming quite sophisticated, and it becomes apparent that our data is quite likely to fall into the wrong hands.
People underestimate the power of data mining and how much can be learnt about a person from the tiniest fragments of data. Careless Facebook users are ideal targets for identity thieves because they give away far too much information and, once that information is out there, it is very difficult to pull back.
The bottom line is: don’t ever underestimate the power of data mining. We all want to be better understood as customers, and data mining can really help in that regard, but we should always remember to value our data and to take steps to protect it in this digitally connected world.
 – Catanese S., De Meo P., Ferrara E., Fiumara G. and Provetti A., Crawling Facebook for social network analysis purposes, International Conference on Web Intelligence, Mining and Semantics, 2011.
 – Leung I., Hui P., Liò P. and Crowcroft J., Towards Real-time Community Detection in Large Networks, Physical Review E, 2009.
 – Catanese S., De Meo P., Ferrara E., Fiumara G. and Provetti A., Extraction and Analysis of Facebook Friendship Relations, Computational Social Networks: Mining and Visualization, 2011.