Friday, December 5, 2014

Big Data and Social Network Analysis


Networked thinking!


Network analysis involves analyzing the interactions between similar or dissimilar entities in order to identify characteristics of those interactions. A network consists of nodes, which can be any entity such as a person or a computer, and edges, which represent interactions between two nodes. An edge can also connect a node to itself; such a self-loop represents an interaction from an entity to itself, as when a user comments on his own status update on Facebook, for example.
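As a minimal sketch, such a network can be built in Python with the networkx library (the user names are hypothetical):

```python
import networkx as nx

# Each node is an entity (here, a social-media user); each edge is an interaction.
G = nx.Graph()
G.add_edge("alice", "bob")      # alice interacted with bob
G.add_edge("alice", "carol")    # alice interacted with carol
G.add_edge("alice", "alice")    # self-loop: alice commented on her own post

print(G.number_of_nodes())        # 3
print(G.number_of_edges())        # 3 (the self-loop counts as one edge)
print(nx.number_of_selfloops(G))  # 1
```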



Networks can be created using GUI tools like Gephi as well as manually using a network library like NodeXL. While NodeXL is much more customizable, tools like Gephi make it much easier to visualize networks and calculate metrics. Gephi also offers various customized layouts that can be used to visualize the network in different ways. For example, if the network needs to be plotted on a geographical map, Gephi provides a GeoLayout for exactly that purpose.

In addition, a network is associated with certain metrics: degree, betweenness centrality, closeness centrality and others. Betweenness centrality indicates the extent to which a node is ‘central’ to a network, i.e., how often it lies on the shortest paths between other nodes, whereas closeness centrality is high for a node whose average distance to all other nodes in the network is small.
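As a small illustration, networkx computes each of these metrics in one call; on a five-node path graph, the middle node scores highest on both centralities because it sits on the most shortest paths and is, on average, nearest to everyone:

```python
import networkx as nx

G = nx.path_graph(5)  # a toy network: nodes 0-1-2-3-4 in a line

print(dict(G.degree()))              # number of edges touching each node
print(nx.betweenness_centrality(G))  # peaks at node 2, which lies on the most shortest paths
print(nx.closeness_centrality(G))    # peaks at node 2, the node closest on average to all others
```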

The following analysis of cryptocurrency shows how social network data can be used to derive interesting insights. The data consists of:


 1. Twitter users who are interested in cryptocurrency
 2. Facebook page data of cryptocurrency businesses

The data includes interconnection information in the form of “Follower/Following/Likes” fields. One of the major challenges with social network data is cleaning and preprocessing, as the data generally contains many inaccurate and missing records. Here, data cleaning was performed using Excel and R.
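The cleaning itself was done in Excel and R; purely as an illustration of the same steps, here is a pandas sketch in Python (the file name and column names are hypothetical):

```python
import pandas as pd

# Hypothetical export of the Twitter user records.
df = pd.read_csv("twitter_users.csv")

# Drop exact duplicates and rows missing the key interconnection fields.
df = df.drop_duplicates()
df = df.dropna(subset=["followers", "following"])

# Coerce numeric fields; malformed values become NaN and are then removed.
df["followers"] = pd.to_numeric(df["followers"], errors="coerce")
df = df.dropna(subset=["followers"])

df.to_csv("twitter_users_clean.csv", index=False)
```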

Social network analysis helps segment users into subsets based on similar attributes. In this case, network analysis of the Twitter users yielded six clusters.
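The post does not record which algorithm produced the clusters; Gephi typically uses modularity-based community detection, and as an assumption-laden sketch, networkx offers a comparable greedy modularity method:

```python
import networkx as nx
from networkx.algorithms import community

# In practice G would be the follower/following graph built from the cleaned data;
# the karate club graph is a stand-in example that ships with networkx.
G = nx.karate_club_graph()

clusters = community.greedy_modularity_communities(G)
for i, members in enumerate(clusters, start=1):
    print(f"Cluster {i}: {sorted(members)}")
```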


In order to assess these clusters, a word cloud was created for each cluster from the users’ profile descriptions. Based on this information, the users were classified to understand their profiles. The segmentation shows that most cryptocurrency users are entrepreneurs, techies or traders. This information can be used for targeted marketing to optimize business strategies.
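As a sketch of the word-cloud step using the third-party wordcloud package (the profile texts are made up):

```python
from wordcloud import WordCloud

# Profile descriptions for one cluster, concatenated into a single string.
profiles = " ".join([
    "entrepreneur and bitcoin enthusiast",
    "crypto trader | markets | blockchain",
    "software engineer, distributed systems",
])

# Generate the word cloud and save it as an image for inspection.
WordCloud(width=800, height=400, background_color="white") \
    .generate(profiles) \
    .to_file("cluster_wordcloud.png")
```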


As this example shows, network analysis can provide insights into the structure of a community and its interactions. Here it was intuitive to see what the different communities were and how the overall network divided into them. This is just one of many applications of social network analysis, which has clearly emerged as a key technique in a variety of fields.










Wednesday, November 12, 2014

Big Data and Analytics in Higher Education

The era of Big Data has arrived in higher education as IT becomes progressively embedded in the activities that make up “attending a university”, such as course enrollment, classroom instruction, and student services. Information about student journeys, successes, and failures can be captured to improve outcomes, both individual and aggregate, across all of higher education when it is given back to students in valuable ways.

Here is an interesting video about the possible applications of big data and analytics in higher education:


Importance of Big Data in Higher Education:

Why is big data important to higher education?


  • It permits institutions to use Facebook as a marketing platform, since Facebook advertisers are willing to pay for targeted advertisements based on user behaviour
  • It enables alumni and career services offices to tap the potential of LinkedIn’s enormous career and employment data repositories to network and enable connections for job seekers
  • It allows your conference hashtags to trend on Twitter; the trends and responses to the tags can later be analyzed and measured to derive valuable insights

For example:
  • Big data, through predictive modeling, lets us tackle problems with student recruitment, retention and job placement before they arise (see the sketch below)
  • Analytical data from sources such as Google Analytics provides huge potential for strategic decisions about how to enhance websites and retain visitors over time
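To make the retention example concrete, here is a hedged sketch of what such predictive modeling might look like; the features, labels and data are entirely synthetic, not a real institutional model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic student records: GPA (0-4), attendance rate (0-1), weekly logins (0-20).
X = rng.random((500, 3)) * [4.0, 1.0, 20.0]
# Synthetic label: 1 = retained; higher GPA and attendance make retention more likely.
y = (X[:, 0] / 4.0 + X[:, 1] + rng.normal(0.0, 0.3, 500) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Students flagged with a low predicted probability of retention could be
# offered support before a problem actually starts.
print("Holdout accuracy:", model.score(X_test, y_test))
```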


How can Big Data Help Higher Education:
Feedback: Learning data can be informative from a feedback and context perspective. A student may regularly underperform in a topic but have no idea why. It gets intriguing when the learner can compare himself not only against his own record but against all other individuals who have had similar experiences. This may help him gain insight into what is going wrong, so that instead of becoming frustrated he can use the knowledge to rectify mistakes and improve his chances of success.
Motivation: If big data is implemented comprehensively, individuals will be willing to invest in putting data into the process, because they will be able to see the impact of how it works.
Personalization: Big data will help change the way courses and learning plans are designed by enabling developers to customize courses to the individual needs of learners. This will allow learners to raise their standards and enjoy a more efficient learning environment.
Efficiency: Big Data can improve efficiency by saving us hours upon hours of time and effort when it comes to understanding our objectives and the methods we need to accomplish them. Suppose somebody wants to take job B, having done job A for a year. Big data would show, first of all, the number of individuals who did job A and then landed job B. Of the individuals who landed job B, what qualities or skill sets did they have? It would likewise show which learning programs were most effective, and what the timing was when they decided to change to job B.


Tracking: Big data can help institutions understand the real patterns of students much more effectively by letting them track a learner’s involvement in a course. By examining the digital footprints or ‘breadcrumbs’ learners leave behind, institutions can track a learner’s trajectory throughout the learning experience.


Understanding the learning process: By monitoring Big Data, we can see which parts of a task or exam were too simple and which parts were so difficult that a learner got stuck. Other parts of the flow that can now be tracked and investigated include pages returned to frequently, sections recommended to peers, preferred learning styles, and the time of day when learning works best.
Challenges of Big Data for Higher Education
What are the challenges that stop higher education institutions from utilizing the true potential of big data?
Data quality: The biggest problem is the lack of good data. The data should be well-structured and there should be clear policies defining the data structure and how it should be used.
Infrastructure: The systems should be able to work with big data. This includes software systems that can generate user-friendly actionable reports, tools to extract data from multiple data sources and a reliable database.
Dearth of resources: One must know what to do with the data in order to take action. This requires people who are proficient at interpreting data and cleaning up existing data sources.
Awareness: Decision makers should know why big data is worth having in the organization. They should know of the potential of big data. Management and leadership should be aware of the best sources of information on the topic.
Although there are challenges facing the higher education domain, we believe that with time and financial resources these can be overcome to utilize big data to its fullest.

Wednesday, October 22, 2014

The Art of Visualization and Storytelling

Data visualization is the representation of data in a graphical format. Data by itself can be boring to look at. It is difficult to analyze and draw conclusions by just looking at rows of a table. Data visualization provides an easier, more convenient way for the end user to digest information and make inferences.

In this Information Age, it is absolutely critical for organizations to make sense of the vast streams of data available to them. Understanding the underlying patterns and relationships within the data while tying them to the business problem is the key to effective decision making. Data visualizations help end users do just that!

A classic example which highlights the potential of data visualization is the Periodic Table. The hidden relationships are well captured and the key chemical elements, their atomic numbers and other properties can be effortlessly understood.


Data visualization is used primarily for two reasons: exploring and explaining.

  • Visualizations for exploring are very useful when you are not sure what the data is telling you. They can help establish relationships and patterns in the data.
  • Visualizations for explaining are useful once you understand the data and are trying to communicate an idea to the audience (see the sketch below).
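A minimal matplotlib sketch of the contrast, with made-up data: the exploratory panel just shows everything, while the explanatory panel mutes the background and calls out the one point the story is about.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.random(50), rng.random(50)

fig, (explore, explain) = plt.subplots(1, 2, figsize=(10, 4))

# Exploring: plot everything and look for patterns.
explore.scatter(x, y)
explore.set_title("Exploring: what is the data telling us?")

# Explaining: de-emphasize the crowd, highlight and annotate the key point.
explain.scatter(x, y, color="lightgray")
explain.scatter(x[0], y[0], color="red")
explain.annotate("the point our story is about", (x[0], y[0]),
                 xytext=(0.35, 0.9), arrowprops={"arrowstyle": "->"})
explain.set_title("Explaining: here is the story")

plt.tight_layout()
plt.show()
```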

Visualization research has traditionally focused on the exploration of data. But as we use data more and more to drive decisions, it is important to focus more on explaining, or ‘storytelling’.

Al Shalloway, founder and CEO of Net Objectives, says, “Visualizations act as a campfire around which we gather to tell stories.” Senior executives are flooded with dashboards and scorecards carrying an overwhelming amount of analytics. Because these managers do not understand the story behind the data, they struggle with data-driven decision making. Storytelling helps a user gain insights through visualization that the data genuinely supports.

What are some of the essentials of storytelling?

Understand your dataset
It is essential to have a good understanding of your dataset. You must know the source of the data, the field it relates to, and the target audience. Knowing your data gives you a sense of authority and credibility.

Find a story and create a good structure
Once you have a good understanding of the data, you need to find a story to tell. It is important to have a narrative that is compelling and engaging.

Guide, don’t push
Your story must be a guide to the user experience. The visualizations must encourage users to understand the facts and draw their own conclusions which are meaningful to them. This kind of experience is more trustworthy and more personal which in turn makes it more memorable.

Keep it simple
It is easy to get carried away by the data and present the user with an overwhelming amount of information. Hence, it is essential that you prioritize and keep the story simple. Provide only those statistics which help create a compelling narrative.

In this blog post, we are trying to convey that “how you say it” is just as important as, if not more important than, “what you say”. Here is an interesting story about storytelling which summarizes our thoughts!


References:

What is data visualization?
http://visual.ly/what-is-data-visualization

Why data visualization matters
http://radar.oreilly.com/2012/02/why-data-visualization-matters.html

10 Quotes on Data Visualization
http://blog.fusioncharts.com/2014/05/10-quotes-on-data-visualization/

Storytelling: The Next Step for Visualization
http://kosara.net/papers/2013/Kosara_Computer_2013.pdf

Tell a Meaningful Story With Data
http://www.thinkwithgoogle.com/articles/tell-meaningful-stories-with-data.html

Visual Storytelling: Why Data Visualization is a Content Marketing Fairytale
http://www.searchenginejournal.com/visual-storytelling-data-visualization-content-marketing-fairytale/92513/

Wednesday, October 1, 2014

Is the Internet of Things really here?

The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing internet infrastructure. In the IoT, objects and people are assigned unique identifiers and can transfer data about their activities without requiring human-to-human or human-to-computer interaction.

Brendan O’Brien, Chief Architect at Aria Systems, says, “If you think that the internet has changed your life, think again. The IoT is about to change it all over again!” And we don’t think that is an exaggeration. Every “thing” that exists in the world will have the capability to be connected, communicated with and operated automatically in a system that bridges the gap between the digital and physical worlds.

There are three key factors to consider for IoT:
  • What different types of “things” are getting connected?
  • What kind of data is being collected? (A toy sketch of a device reporting data follows this list.)
  • How can the collected data be analyzed and monetized?
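As a toy sketch of the data-collection step, here is one way a connected device might report a sensor reading over HTTP; the endpoint and payload fields are hypothetical, and real deployments often use lighter protocols such as MQTT:

```python
import time
import requests

# Hypothetical ingestion endpoint for a fleet of connected thermostats.
ENDPOINT = "https://example.com/api/v1/readings"

reading = {
    "device_id": "thermostat-0042",  # the unique identifier assigned to this "thing"
    "timestamp": int(time.time()),   # when the reading was taken
    "temperature_c": 21.5,           # the actual sensor measurement
}

# The device pushes its data with no human-to-computer interaction required.
response = requests.post(ENDPOINT, json=reading, timeout=5)
response.raise_for_status()
```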

An interesting implementation of IoT is GE’s Industrial Internet, which is essentially the integration of complex physical machinery with networked sensors and software. The Industrial Internet is bringing about an insightful transformation by connecting intelligent machines, advanced analytics, and people at work. The following video explains the Industrial Internet in detail:


While we have commonly heard of thermostats and fitness devices generating data for analysis, the above video features much larger ‘things’: ships, trains, windmills and so on. Analysis of thermostat data affects our personal lives; analysis of data from ships, trains, power plants and windmills has macro consequences for nations and for the world. It can save lives by predicting the failure of aircraft, trains and power plants, and it can conserve precious natural resources, both preserving them for the future and reducing the amount of carbon emitted into the atmosphere.

The internet of things also raises security concerns. Connecting all these devices to the internet makes them vulnerable to cyber attacks. One illustration is Shodan, a search engine for finding devices connected to the internet: webcams, baby monitors and others. Shodan’s founder, John Matherly, revealed in an interview that Shodan has found vulnerabilities in control panels of power and utility systems, a giant hydroelectric dam in France, crematoriums and even a particle accelerator. Further, an HP study revealed that 70% of devices connected to the internet of things are vulnerable in some respect.

Although security concerns pose a serious threat, the internet of things represents significant opportunities. Given the rate of growth in embedded technology that can communicate, the Internet of Things could become widely available in the market as early as 2020. There are significant security hurdles to overcome before the internet of things can be fully commercialized, but it has arrived and will come into its own within a few years.

References

[1] Internet of Things - Wikipedia (J. Höller, V. Tsiatsis, C. Mulligan, S. Karnouskos, S. Avesand, D. Boyle: From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence.)

[2] Is Shodan really the world's most dangerous search engine?

[3] 70 Percent of Internet of Things Devices Vulnerable to Attack

[4] Internet of Things Installed Base Will Grow to 26 Billion Units By 2020

Wednesday, September 10, 2014

Datafication: Applications and Implications

“Wal-Mart is able to take data from your past buying patterns, their internal stock information, your mobile phone location data, social media as well as external weather information and analyze all of this in seconds so it can send you a voucher for a BBQ cleaner to your phone - but only if you own a barbeque, the weather is nice and you currently are within a 3 miles radius of a Wal-Mart store that has the BBQ cleaner in stock.” - Bernard Marr

Everything that was previously invisible can be quantified and measured today, thanks to datafication. Datafication is the process of transforming an activity into data that can be tracked, monitored and analyzed, opening up new opportunities in business intelligence and making it a key contributor to the Big Data revolution.

There are many examples of datafication in a person’s daily life. For instance, exercise activities are tracked by wearable fitness devices, which record calories burnt and the time and intensity of workouts. They also suggest new routines and nutrition based on one’s workout history.




The GPS application in one’s phone suggests an optimal route based on data such as location, traffic and weather conditions. As one browses the internet, behavior such as the number of clicks, the web pages visited and the time spent on a page is analyzed to improve the user experience of websites.

Social networking websites generate humongous amounts of data. A person’s thoughts, expressed through tweets and status updates on social networks, are being datafied. Sentiment analysis, social network analysis and similar techniques are applied to this data to understand public emotions, identify communities of closely-connected people and derive statistics on social causes and health issues. Other websites such as LinkedIn, Foursquare and Spotify are tracking and quantifying professional connections, location and music preferences respectively, and using them in predictive analysis.
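As a minimal sketch of the sentiment-analysis step using the TextBlob library (the tweets are made up):

```python
from textblob import TextBlob

tweets = [
    "Loving the new update, everything feels so much faster!",
    "This outage has ruined my whole morning.",
]

for text in tweets:
    # Polarity ranges from -1.0 (most negative) to +1.0 (most positive).
    polarity = TextBlob(text).sentiment.polarity
    print(f"{polarity:+.2f}  {text}")
```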

This data is often used by organizations for targeted advertising and marketing. Consumers are not aware of the legal contracts they agree to simply by using mobile applications and social networks. While targeted advertisements are useful, the consumer has neither complete control over nor full awareness of the permissions he grants. The recent Facebook social experiment, in which users’ news feeds were manipulated, created an uproar among users who believed the experiment invaded their privacy. Facebook, however, stated that signing up for Facebook itself gives the company the authority to use data for its own purposes. In other words, there was a clear misconception among users about the permissions they had given Facebook.

This kind of business intelligence is useful in many ways, but it also has privacy and security implications. Users must have more control over their own data. Further, governments and other regulatory bodies must come up with a standard framework for protecting data from misuse.

It is our team’s belief that while datafication is key to the ‘Big Data’ revolution, there are legal ‘grey areas’ that need to be addressed by regulatory bodies.

Please feel free to leave any comments or feedback.

Tuesday, August 26, 2014

Forty Two Zettabytes




42 is the answer to the Ultimate Question of Life, the Universe, and Everything. Unfortunately, no one knows the question. ‘Zettabyte’ is a reference to the huge volume of data available on the internet.

Forty Two Zettabytes is a group of nerdy data enthusiasts enrolled in MIS 586 at Eller College of Management. The crew members are an eclectic mix of six individuals looking for important questions to answer.

Introducing the crew

The crew comprises Akash, Amogh, Elton, Pradeep, Shajay and Sukhada. There is a respectable blend of technical and non-technical expertise, including business acumen, creativity and critical thinking to complement core technical skills in software development, data integration, data visualization and advanced analytics. They believe that this composition will help them accomplish their ultimate goal of formulating the ultimate question.

Big Data - Team’s Perspective

To Forty Two Zettabytes, Big Data has a rather simple meaning on Earth. They do not believe that the size of data is overly significant. Mere Earthlings think that bigger data is better and more useful, but then they still think digital watches are a cool idea, so what do they know? Sure, the more the better, but insights can be extracted from a data set regardless of its size. In fact, ‘bigger’ data is often noisier and may contain an invisible skew. Having said that, it must be noted that with the advent of social media like Twitter (a phase that characterizes the ‘Medieval Ages’ of this planet), we are inevitably dealing with big and potentially noisy data. Therefore, a larger portion of time and effort must be devoted to extracting as clean a data set as possible. This is what Forty Two Zettabytes will attempt to accomplish, and stay true to the golden words - ‘Don’t Panic’.

Where does the team expect to be at the end of this class?

The data explosion has led to the birth of so-called “Big Data”. It is a mystery to the team, and the team hopes that by the end of this course the mystery will reveal itself. They measure it in terms of the 3 V’s: volume, variety and velocity. The team expects to touch on each of these aspects during the project work. The team is excited to learn new techniques to excavate, synthesize and envision “Big Data”. Skiing together, the team is confident it can conquer this uphill climb.

Stay tuned for more updates!