Friday, December 5, 2014

Big Data and Social Network Analysis


Networked thinking!


Network analysis examines the interactions between similar or dissimilar entities in order to identify characteristics of those interactions. A network consists of nodes, which can be any entity such as a person or a computer, and edges, each of which represents an interaction between two nodes. An edge can also connect a node to itself, meaning that the entity interacts with itself. This happens, for example, when a user comments on their own status update on Facebook.
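
To make the node and edge idea concrete, here is a minimal sketch in plain Python. The user names and the `build_network` helper are made up for illustration; a real project would more likely rely on a tool such as Gephi or a graph library.

```python
from collections import defaultdict

# Hypothetical interactions as (source, target) pairs. All names are invented.
interactions = [
    ("alice", "bob"),    # alice commented on bob's post
    ("bob", "carol"),    # bob commented on carol's post
    ("alice", "alice"),  # self-loop: alice commented on her own status update
]

def build_network(edges):
    """Return a directed adjacency map: node -> set of nodes it points to."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touching the key registers the target as a node
    return dict(adjacency)

network = build_network(interactions)

# A node with an edge to itself shows up in its own neighbour set.
self_loops = [node for node, targets in network.items() if node in targets]

print(sorted(network))
print(self_loops)
```

Storing neighbours in a set keeps repeated interactions from creating duplicate edges; if the number of interactions matters, a counter per edge would be a better fit.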



Networks can be created with GUI tools like Gephi or built manually with a network library like NodeXL. While NodeXL is much more customizable, tools like Gephi make it much easier to visualize networks and calculate metrics. Gephi also offers a variety of layouts for visualizing the network in different ways. For example, if the network needs to be plotted on a geographical map, Gephi provides a GeoLayout for that purpose.

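In addition, a network is characterized by certain metrics: degree, betweenness centrality, closeness centrality and others. The degree of a node is the number of edges attached to it. Betweenness centrality measures how often a node lies on the shortest paths between other nodes, so it indicates how much the node acts as a bridge within the network. Closeness centrality is high for a node that is, on average, only a short distance away from all the other nodes in the network.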
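
These metrics can be sketched in plain Python on a small, made-up undirected network. This is only an illustration under stated assumptions: the edge list is hypothetical, the graph is unweighted and connected, betweenness is left unnormalized and computed with Brandes' algorithm, and closeness is taken as (n - 1) divided by the sum of shortest-path distances. Tools like Gephi may normalize these values differently.

```python
from collections import deque

# Hypothetical undirected network: a path a - b - c - d plus a branch b - e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]

def adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

def bfs_distances(adj, start):
    """Shortest-path length (in hops) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(adj):
    """(reachable nodes - 1) / sum of distances; assumes a connected graph."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []                        # nodes in order of discovery
        preds = {v: [] for v in adj}      # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)   # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

adj = adjacency(edges)
print(degree(adj))
print(closeness(adj))
print(betweenness(adj))
```

In this toy network, node b has the highest value on all three metrics, since every path between the a, e and c-d branches has to pass through it.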

The following analysis of Cryptocurrency shows how social network data can be used to derive interesting insights. The data comprises:


 1. Twitter users who are interested in Cryptocurrency
 2. Facebook pages data of Cryptocurrency businesses

The data includes interconnection information in the form of “Follower/Following/Likes” fields. One of the major challenges with social network data is cleaning and preprocessing, as the data generally contains many inaccurate and missing records. Here, data cleaning was performed using Excel and R.
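
The cleaning for this analysis was done in Excel and R, and the exact steps are not reproduced here. Purely as an illustration of what such preprocessing might involve, the Python sketch below drops records with missing values, normalizes user names and removes duplicate edges. The field names `user` and `follows`, the sample records and the `clean_edges` helper are all assumptions, not the actual data schema.

```python
# Hypothetical raw follower records; field names and values are invented.
raw_records = [
    {"user": "Alice ", "follows": "bob"},
    {"user": "alice", "follows": "bob"},   # duplicate once names are normalized
    {"user": "carol", "follows": None},    # missing target
    {"user": "", "follows": "dave"},       # missing source
    {"user": "dave", "follows": "Alice"},
]

def clean_edges(records):
    """Return unique (source, target) edges, skipping incomplete records."""
    seen = set()
    edges = []
    for record in records:
        # Treat None and empty strings alike, then normalize case and spacing.
        source = (record.get("user") or "").strip().lower()
        target = (record.get("follows") or "").strip().lower()
        if not source or not target:
            continue  # drop records with a missing endpoint
        edge = (source, target)
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return edges

edges = clean_edges(raw_records)
print(edges)
```

Whether to drop incomplete records or try to repair them depends on the data set; dropping is simply the easiest option to show.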

Social network analysis helps segment users into subsets based on similar attributes. The network analysis of the Twitter users in this case gave the following six clusters:


To assess these clusters, a word cloud was created for each cluster from the user profile descriptions. Based on this information, the users were classified to understand their profiles. This segmentation shows that most of the users of Cryptocurrency are Entrepreneurs, Techies or Traders. This information can be used for targeted marketing to optimize business strategies.
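
A word cloud is essentially a picture of word frequencies, so the counting step behind it can be sketched in a few lines of Python. The profiles, cluster numbers, stop-word list and `top_words` helper below are invented for illustration and are not the data or tooling used in this analysis.

```python
import re
from collections import Counter

# Hypothetical user profiles, already assigned to clusters.
profiles = [
    {"cluster": 0, "description": "Entrepreneur and startup founder"},
    {"cluster": 0, "description": "Serial entrepreneur, investor"},
    {"cluster": 1, "description": "Software developer. Bitcoin developer and techie"},
]

# A tiny stop-word list for the sketch; a real one would be much longer.
STOPWORDS = {"and", "the", "a", "of", "in"}

def top_words(profiles, n=3):
    """Return the n most frequent words per cluster as (word, count) pairs."""
    counts = {}
    for profile in profiles:
        words = re.findall(r"[a-z]+", profile["description"].lower())
        counter = counts.setdefault(profile["cluster"], Counter())
        counter.update(w for w in words if w not in STOPWORDS)
    return {cluster: counter.most_common(n) for cluster, counter in counts.items()}

print(top_words(profiles, n=1))
```

The most frequent words per cluster are what would appear largest in the corresponding word cloud, and they suggest a label for the cluster.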


As the above example shows, network analysis can provide insights into the structure of a community and its interactions. Here it was straightforward to identify the different communities and to see how the overall community is divided. This was just one of the many applications of social network analysis, which has emerged as a key technique in various fields.