A sentiment analysis of a Premier League game: Man United vs Arsenal

One of the biggest rivalries in the English Premier League is the one between Manchester United and Arsenal. The rivalry has cooled somewhat since Alex Ferguson's retirement, but the fixture is still a hugely anticipated match between two of the world's biggest clubs.

One of the major outlets for fans of these two clubs to voice their opinions and feelings about upcoming games is Twitter. With a Twitter fanbase of 9.31M for Man United and 8.48M for Arsenal, it can be a great source of data about how the fans are feeling in the build-up to a game and their emotions in the hours after it.

On the 19th of November 2016, Man United and Arsenal went head to head in a match that, in all honesty, won't go down in history as a classic. Although both teams are battling to stay in the running for the title, the game ended in a 1-1 draw, leaving little for Twitter users to remark on (with the exception of the equalising goal).

But we can still see some interesting results when we analyse the tweets posted about the game using sentiment analysis. A sentiment analysis essentially takes a piece of text and assigns emotions to the specific words used in it. The text as a whole can then be classified as positive or negative, and we can work out the specific emotions being expressed. We can then plot these emotions on a graph and examine how they change over time.
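As a rough illustration of the idea, here is a minimal lexicon-based sketch in Python. The tiny word list and the emotions attached to each word are hypothetical and were made up for this example. A real analysis would use a full word-emotion lexicon with thousands of entries, and this is not necessarily how Red Sqirl scores tweets.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only: each word maps to the
# polarity and emotion labels it carries. Real lexicons are far larger.
LEXICON = {
    "abysmal": {"negative", "sadness", "disgust"},
    "gutless": {"negative", "anger", "disgust"},
    "brilliant": {"positive", "joy", "trust"},
    "equaliser": {"surprise"},
}

def score_text(text: str) -> Counter:
    """Count every polarity/emotion label attached to the words in `text`."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(LEXICON.get(word, ()))
    return counts

def polarity(counts: Counter) -> str:
    """Classify the whole text by comparing positive and negative hits."""
    balance = counts["positive"] - counts["negative"]
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"

counts = score_text("Abysmal and gutless from United today")
print(counts["negative"], counts["disgust"], polarity(counts))  # 2 2 negative
```

Words missing from the lexicon are simply ignored. The per-emotion counters can then be summed across many tweets to build the kind of charts shown in this article.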

For example, the tweet below can be classed as an overall negative one. Each of the words used (“abysmal”, “gutless”, etc.) can be grouped into specific emotions, which helps us understand the feelings being expressed in the tweet.

[Image: example of a negative tweet about the game]

Below is a graph of the results. As you can see, tweets about this game started growing strongly an hour or two before the match, peaked towards the end of the match, and declined steadily until ten hours after the match. 

[Chart: emotions expressed in tweets about the game over time]
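Charting volume over time comes down to grouping tweets into time buckets. The sketch below counts tweets per hour relative to kickoff. The 12:30 kickoff time is an assumption, and the input is simply a list of `datetime` timestamps rather than real Twitter data.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed kickoff time, for illustration only.
KICKOFF = datetime(2016, 11, 19, 12, 30)

def hourly_volume(timestamps, kickoff=KICKOFF):
    """Count tweets per whole hour relative to kickoff.

    Bucket 0 is the first hour after kickoff. Negative buckets are hours
    before it (floor division puts 10 minutes before kickoff in bucket -1).
    """
    buckets = Counter()
    for ts in timestamps:
        buckets[(ts - kickoff) // timedelta(hours=1)] += 1
    return dict(sorted(buckets.items()))

sample = [KICKOFF + timedelta(minutes=m) for m in (-90, -10, 5, 40, 100)]
print(hourly_volume(sample))  # {-2: 1, -1: 1, 0: 2, 1: 1}
```

The same bucketing can be run separately for each emotion's matching tweets to get one line per emotion.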

Two interesting points worth highlighting from the results are the levels of surprise and trust:

[Chart: surprise expressed in tweets about the game over time]

Looking at the surprise, we can see a clear spike towards the end of the game, most likely caused when Olivier Giroud scored the equalising goal in the 89th minute.
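One simple way to flag a spike like this automatically is to compare each time bucket with the buckets just before it. The sketch below flags a bucket when it exceeds the trailing mean by a multiple of the standard deviation. The window size, the threshold and the counts are all hypothetical choices for illustration, not the method used to produce the chart.

```python
from statistics import mean, pstdev

def find_spikes(counts, window=3, threshold=2.0, min_sigma=1.0):
    """Indices whose count exceeds the trailing-window mean by more than
    `threshold` standard deviations. `min_sigma` stops a perfectly flat
    history (standard deviation 0) from flagging every tiny increase."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        sigma = max(pstdev(history), min_sigma)
        if counts[i] > mean(history) + threshold * sigma:
            spikes.append(i)
    return spikes

# Hypothetical per-interval counts of 'surprise' tweets.
print(find_spikes([4, 5, 4, 5, 4, 30, 6]))  # [5]
```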

[Image]

The trust results are quite interesting. A huge number of people tweeting expressed a lot of trust before the game kicked off. This drops slightly after kickoff but starts to rise again halfway through the game. Possibly, with the score at 0-0 at halftime, the fans felt it was all still to play for.

It's easy to see the potential use cases for a system such as this: a complex analysis of a large, constantly updating dataset, scheduled to run at predefined intervals. For example, we’ve previously used this approach to explore sentiment around the US presidential election.

The difficulty with an analytical project like this is setting it up. Building the data pipeline that goes from gathering the data, to building the analysis workflow, to scheduling that workflow to run periodically and then to display the results, usually takes a lot of expertise and overhead. However, Idiro Analytics have developed a tool called Red Sqirl which can perform each of these steps in one intuitive interface.
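To make the shape of such a pipeline concrete, here is a minimal Python sketch of the gather, analyse and publish cycle run at a fixed interval. The stage functions are hypothetical stand-ins with no real Twitter or dashboard calls. In a Hadoop setting the scheduling would be handled by a workflow scheduler rather than a Python loop.

```python
import time

def gather():
    # Hypothetical stand-in for collecting new tweets from Twitter.
    return ["What a finish!", "Abysmal defending again"]

def analyse(tweets):
    # Hypothetical analysis step: here it only counts the tweets.
    return {"tweets": len(tweets)}

def publish(results):
    # Hypothetical stand-in for updating a chart or dashboard.
    return results

def run_pipeline():
    """One end-to-end run: gather -> analyse -> publish."""
    return publish(analyse(gather()))

def run_periodically(job, interval_seconds, runs, sleep=time.sleep):
    """Run `job` `runs` times, waiting `interval_seconds` between runs."""
    results = []
    for i in range(runs):
        results.append(job())
        if i < runs - 1:
            sleep(interval_seconds)
    return results

# Example: three hourly runs, with sleeping disabled so it returns at once.
print(run_periodically(run_pipeline, 3600, 3, sleep=lambda s: None))
```

Passing `sleep` in as a parameter keeps the scheduling logic testable without actually waiting an hour between runs.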

Modern sports and data analytics now go hand in hand; it would be hard to imagine a professional sports organisation that isn't using data analytics in some form. And with data becoming more easily obtainable, many more opportunities open up. With the right tools, data analytics can be accessible to a lot more people.

Red Sqirl

Red Sqirl is a flexible drag-and-drop Big Data analytics platform with a unique open architecture.

Red Sqirl makes it easy for your analysts and data scientists to analyse the data you hold on your Hadoop platform.

For more information visit RedSqirl.com, and for a guide on how to build the entire process of analysing Twitter data using Red Sqirl, as outlined above, please read our detailed guide.

Title image courtesy of Premier League ©

What do the Irish think about Trump and Clinton?

Comparing Irish people's opinions to the rest of the world

It’s everyone's favourite subject right now: the US election. Unless you've somehow avoided consuming any form of media over the last few months, you'll no doubt have been exposed to a lot of opinions and "facts" about the two front runners in the US election, Hillary Clinton and Donald Trump.

With such a media overload, it's hard not to form our own ideas about who should be elected and who shouldn't. It's a strange phenomenon: the world being so invested in an election in a country where we have no vote. The American people will vote for an American president, and yet the rest of the world seems to feel like we're involved in the decision.

With this in mind, we here at Idiro Analytics decided to get a clearer understanding of the opinions of people here in Ireland surrounding the election. Do the opinions of the Irish people differ from those of the rest of the world?

To do this, we chose Twitter as our source of public opinion. We gathered thousands of tweets posted about the election over a 24-hour period in the days leading up to the election and ran a sentiment analysis on them.

This means we were able to break down each tweet and work out the sentiment (overall feelings) being expressed by analysing the types of words used in it. From this, we can chart whether the majority of tweets posted about Clinton and Trump are positive or negative, and the general feelings behind each.
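A minimal sketch of that per-candidate breakdown, assuming each tweet has already been reduced to plain text. A tweet is assigned to every candidate it mentions, scored against a small word list, and tallied as positive, negative or neutral. The keyword sets and word lists are illustrative assumptions, not the ones used in this analysis.

```python
import re
from collections import Counter

# Hypothetical word lists and candidate keywords, for illustration only.
POSITIVE = {"great", "good", "best", "hope"}
NEGATIVE = {"bad", "worst", "awful", "fear"}
CANDIDATES = {"Clinton": {"clinton", "hillary"}, "Trump": {"trump", "donald"}}

def label(words):
    """Positive/negative/neutral by the balance of matching (distinct) words."""
    balance = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if balance > 0 else "negative" if balance < 0 else "neutral"

def tally(tweets):
    """Per-candidate counts of positive, negative and neutral tweets."""
    result = {name: Counter() for name in CANDIDATES}
    for tweet in tweets:
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        for name, keywords in CANDIDATES.items():
            if words & keywords:  # a tweet can count towards both candidates
                result[name][label(words)] += 1
    return result

print(tally(["Hillary has a great plan", "Trump is the worst"]))
```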

First, let’s look at the sentiment for both Clinton and Trump worldwide:
[Chart: sentiment of tweets about Hillary Clinton worldwide]

[Chart: sentiment of tweets about Donald Trump worldwide]

One interesting note from the two charts above is the huge difference in the number of tweets posted about each candidate: there are over three times as many tweets about Trump as about Hillary.

If we break this down further into just positive and negative sentiment, we can see that the majority of tweets posted worldwide about both Clinton and Trump are negative.

[Chart: positive vs negative tweets about Hillary Clinton worldwide]

[Chart: positive vs negative tweets about Donald Trump worldwide]

Now let's look at the sentiment of Irish people towards the two. (Note that, to get a large enough sample to analyse, we used tweets posted by people in Ireland over the four to six days leading up to the election.)

[Chart: sentiment of tweets from Ireland about Hillary Clinton and Donald Trump]

From looking at the chart above, it's strange that even after all we've read about Trump over the past year, we're still surprised by him.

[Chart: positive vs negative tweets from Ireland about Hillary Clinton]

[Chart: positive vs negative tweets from Ireland about Donald Trump]

Although the margin is not huge, sentiment towards Hillary in Ireland is positive, compared with the negative worldwide sentiment towards her, whereas sentiment towards Trump remains negative.

Lastly, let’s combine the worldwide sentiment for both Hillary and Trump versus the sentiment towards them in Ireland.

[Chart: sentiment towards Hillary Clinton, worldwide vs Ireland]

[Chart: sentiment towards Donald Trump, worldwide vs Ireland]

From these last two charts we can see that the Irish people have a little more fear and anger about the future than the rest of the world. Is there something we know that they don't?
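The Irish sample is far smaller than the worldwide one, so this comparison only makes sense on proportions rather than raw counts. Below is a minimal sketch of that normalisation. The emotion counts in the example are invented for illustration and are not the figures behind the charts.

```python
def shares(counts):
    """Convert raw emotion counts into fractions of the total."""
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()} if total else {}

def share_difference(local, worldwide):
    """Local share minus worldwide share for every emotion seen in either."""
    local_s, world_s = shares(local), shares(worldwide)
    return {
        e: local_s.get(e, 0.0) - world_s.get(e, 0.0)
        for e in sorted(set(local_s) | set(world_s))
    }

# Invented counts: a small local sample versus a large worldwide one.
ireland = {"fear": 30, "anger": 20, "trust": 50}
world = {"fear": 2000, "anger": 1500, "trust": 6500}
print(share_difference(ireland, world))
```

A positive difference means that emotion makes up a larger share of the local tweets than of the worldwide ones, whatever the sample sizes.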

About Idiro
Based in Dublin, Ireland, Idiro Analytics is an award-winning provider of analytics to businesses around the world.

For an overview of Idiro’s analytics services, see our homepage www.idiro.com

Media contact information
Simon Rees, Clients & Marketing Director, Idiro Analytics.

simon.rees@idiro.com

+353 1671 9036

Red Sqirl
The data analytics work for this article was performed using Red Sqirl. From within Red Sqirl, we were able to build a data pipeline that gathered thousands of tweets, sorted each tweet, ran multiple analysis steps on the data and output the results into visualisations in real time. Visit the Red Sqirl website for more details.

Can Ireland beat the All Blacks? Irish people really believe they can

There's an excited buzz in the air for Irish rugby supporters right now. Tomorrow, once again, Ireland will take on the All Blacks, unquestionably the greatest rugby nation in the world, in an attempt to end a run of 28 games without a win (27 defeats and one draw).

And you might think: what's the big deal? This isn't a major tournament, there are no records to be broken, it's being played in a country that doesn't have a love for the sport, and nothing is really on the line. This is just one more attempt to end a 111-year wait for a win against the All Blacks.

And yet, the Irish never really look at things like that. When faced with impossible odds in any sport, when everyone has all but written us off, the Irish supporters always have the mindset of 'yeah but what if...'


But what if they're just underestimating us?

But what if they slip up?

But what if they get overwhelmed by the Irish supporters?

But what if ...

As such a small nation, we'll always be considered to be punching above our weight. But when it comes to rugby, we really do stand proudly up there with the best in the world. We truly believe that if everything goes right, we can beat any team.

The emotions felt for the Irish team are difficult to put down on paper. People talk of the mood in the Irish camp, the atmosphere around the stadium before a game and the emotions of the supporters, but it's not really something that can be drawn on a chart.

But what if there was a way?

If we were to use a data analytics technique called sentiment analysis, could we understand the overall emotions being felt about the game tomorrow?

Sentiment analysis essentially means taking a piece of text and, by looking at the words used, determining whether it is overall positive, negative or neutral. Where this becomes interesting is when we apply it to something like Twitter.

If we apply this technique to all of the tweets posted by people in Ireland about the Ireland vs All Blacks game, could we get a picture of the overall feelings towards the game?

So, looking at the weeks leading up to the game, we took all of the tweets from people in Ireland talking about Ireland vs the All Blacks and performed a sentiment analysis on them. We did the same with all of the tweets coming out of New Zealand about the game, and plotted both on the charts below.

[Charts: sentiment of tweets about the game from Ireland and from New Zealand]

What these two charts show us is the overall emotions being felt by people in both countries about tomorrow's game. Now, we know this is far from definitive: these charts only show the feelings of the people talking about the game on Twitter, and that’s only going to be a fraction of the overall supporters. Interestingly, though, this would include any journalists posting about the game, the people others look to when forming their opinions.

Looking at the results, we can see that a high proportion of the tweets from both countries carry a sentiment of 'anticipation'. This may seem obvious, but it does help validate the technique.

The next highest sentiment from both countries is 'trust'. Again, this may seem obvious for the people in New Zealand; of course they would have trust in their team, it's the All Blacks. But this brings us back to the point we made earlier: when up against a team we’ve never been able to beat, the Irish people still trust that we can win.

Another interesting point to take from these charts is that, for New Zealand, some fear does seem to be creeping in. We know, and the All Blacks know, that Ireland are a good team, and there is a real possibility that they can win this game. And what may be weighing on the minds of New Zealand supporters is that all of the pressure is on them. If Ireland lose, they've lost to the best, but the All Blacks are the ones with the winning streak to protect.

One last analysis we did was to gauge the overall feeling towards Joe Schmidt. With him having committed his future to Ireland, we wanted to see what the Irish people thought. And as you can see from the chart below, we have complete trust in him.

[Chart: sentiment towards Joe Schmidt]

COME ON IRELAND!

Where Red Sqirl lies in the Big Data landscape

Today, Big Data is the platform of choice for storing, exploring, visualising and modelling your data.

To get where it is today, Big Data has gone through a number of distinct generations, each one building on the one before it. The first generation simply gave us the ability to analyse petabytes of data with tools like MapReduce. The next generation gave us more responsive tools for analysing data, such as Spark, Impala and PrestoDB. Now, the third generation sees the emergence of tools for moving data around, such as Kafka and Kudu.

The Data Lake & Real Time Analytics

These tools have changed the way that data is analysed and how data solutions are now built. The emergence of Big Data has brought with it many new concepts, one of them being the Data Lake.

A Data Lake is a massive, enterprise-wide data repository to which analysts can contribute, and from which they can cherry-pick the data they need, with each dataset stored in the format best suited to it. The Data Lake aims to solve the problem of data silos by replacing dozens of independently managed data collections with one combined collection. Data lakes have become essential to Big Data projects due to increasing demand for data to be accessible and agile.

Another term becoming popular right now is real-time analytics. Essentially, it means triggering an action in real time when an event meets a given condition. The term is somewhat misleading, though: you can't actually analyse Big Data in real time, only act on it. Rather than analysing the entire dataset, real-time analytics relies on intelligently interacting with parts of the data lake in order to perform actions on a user-by-user basis.

Real-time analytics relies heavily on periodic batch analyses that continuously evaluate the impact of new data, allowing the action taken for each user to evolve over time. Without these batch processes, the analytics would not be performed on up-to-date data.
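Here is a toy sketch of that division of labour, under clearly stated assumptions. A periodic batch job folds new events into a per-user score using a hypothetical decay factor. The real-time step does no analysis at all: it only looks up the latest precomputed score and acts on it. The names, the decay and the threshold are all invented for illustration.

```python
DECAY = 0.5  # hypothetical weight kept from the previous batch's score

def batch_update(profiles, new_events):
    """Periodic batch step: fold (user, value) events into running scores."""
    for user, value in new_events:
        profiles[user] = profiles.get(user, 0.0) * DECAY + value
    return profiles

def on_event(profiles, user, threshold=1.0):
    """Real-time step: act on the precomputed score without re-analysing."""
    return "send_offer" if profiles.get(user, 0.0) >= threshold else "no_action"

profiles = {}
print(on_event(profiles, "user-1"))  # no_action
batch_update(profiles, [("user-1", 0.8), ("user-1", 0.9)])
print(on_event(profiles, "user-1"))  # send_offer
```

The same real-time rule gives a different answer after the batch run, which is the sense in which the behaviour for each user evolves over time.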

Data Pipelines

The key to any Big Data analytics job, and to an up-to-date Big Data warehouse, is these periodic background processes. If they are done well, a huge range of services can be built on top of them.

For example, you can easily perform ad-hoc analyses, and you can maintain analytics jobs and upgrade or update them quickly. The method for creating these background processes is known as building data pipelines, and building data pipelines is an essential part of analysing data using Big Data techniques.

It's for this reason that all of the most popular Hadoop distributions (Cloudera, Hortonworks and MapR) include a tool for periodic processing: Apache Oozie.

Apache Oozie is the tool that triggers processes based on time and data availability. Oozie supports any data format and language, and it is fault tolerant. Apache Oozie is, however, very difficult to use, because there is a lot of overhead in going from a process that runs once to one that runs on a regular basis.
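To illustrate what "triggering on time and data availability" means, here is a small Python sketch of the decision such a scheduler makes. It first works out which scheduled runs are due, then starts only those whose input data has landed. This is not Oozie's API. The hourly directory layout and the `_SUCCESS` marker file (a convention Hadoop jobs commonly use to signal completed output) are assumptions for the example.

```python
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

def due_runs(start, now, frequency):
    """Nominal run times from `start` up to and including `now`."""
    runs, t = [], start
    while t <= now:
        runs.append(t)
        t += frequency
    return runs

def input_ready(run_time, data_root):
    """Hypothetical layout: data_root/YYYYMMDDHH/_SUCCESS marks complete input."""
    return (Path(data_root) / run_time.strftime("%Y%m%d%H") / "_SUCCESS").exists()

def runs_to_start(start, now, frequency, data_root):
    """Runs that are due by time AND have their input data available."""
    return [t for t in due_runs(start, now, frequency) if input_ready(t, data_root)]

# Demo: only the 01:00 input has landed, so only that run is started.
with tempfile.TemporaryDirectory() as root:
    start = datetime(2016, 11, 19, 0)
    landed = Path(root) / "2016111901"
    landed.mkdir()
    (landed / "_SUCCESS").touch()
    ready = runs_to_start(start, start + timedelta(hours=3), timedelta(hours=1), root)
print(ready)
```

A real coordinator also has to remember which runs have already been executed, retry failures and time out stale runs. That bookkeeping is much of the overhead that makes recurring jobs harder than one-off ones.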

So we built Red Sqirl

Red Sqirl is a drag & drop analytics tool which can also build Oozie workflows in the background. With Red Sqirl you can build, deploy and maintain data pipelines more easily than ever before using an intuitive drag and drop interface.

The counties with the most dangerous roads in Ireland ahead of the bank holiday weekend

On Bank Holiday weekends we’re used to reading about people being killed on Irish roads. But which counties have the most dangerous roads?


Although Dublin and Cork have had the highest number of fatalities, does that mean they have the most dangerous roads in the country or do other factors need to be taken into account?

As with most bank holiday weekends, driving over the next few days carries a heightened risk. This is mainly due to higher volumes of traffic, as many people visit family and friends and, in doing so, undertake long road journeys.

The Road Safety Authority (RSA) have issued statements about taking extra care on the roads this weekend. And looking at the numbers over the last 20 years, it's clear the RSA are succeeding in their goal of making our roads safer. Even though the number of fatalities so far this year is higher than for the same period in 2015, the overall trend is that our roads are becoming safer.

[Chart: road fatalities in Ireland over time]

In the interest of improving the safety of Irish roads, we here at Idiro Analytics wanted to shed some light on details of road safety statistics that are often overlooked or misinterpreted, leading to the wrong conclusions.

In 2014 and 2015, the number of road fatalities in Ireland was 193 and 166 respectively. Studying the charts below, it's easy to see how one might assume that the two most dangerous counties for road accidents are Dublin and Cork. However, these figures don't tell the full story, because there are many other variables to take into consideration.

[Chart: road fatalities by county]

Other details that need to be taken into account are:

  • The length of road in each county
  • The number of vehicles on the roads
  • The average distance travelled
  • The total population sizes

Below you can see details for each of these variables (note: these are summary tables and do not contain all of the information):

[Tables: road length, vehicles, distance travelled and population by county]

When we analyse all of the information, a clear picture starts to form. Although Dublin and Cork may at first glance seem to have the most hazardous roads, and are noticed more in the national press, it is Longford and Monaghan that rank first and second respectively for the most dangerous roads.

[Chart: road fatalities per km of road by county]

Both Longford and Monaghan have low populations, short total road lengths, low numbers of vehicles on the road and low average distances travelled, yet they have a high proportional fatality rate averaged over 2014 and 2015:

  • Longford and Monaghan: 2 fatalities per 10,000 vehicles
  • Longford and Monaghan: 3 fatalities per 300 million km travelled
  • Dublin: 0 fatalities per 10,000 vehicles
  • Dublin: 1 fatality per 300 million km travelled
  • Cork: 1 fatality per 10,000 vehicles
  • Cork: 1 fatality per 300 million km travelled
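The normalisation behind these figures is simple division. The sketch below shows the idea: average fatalities over two years, then divide by vehicles, vehicle-kilometres and road length. The two counties and every number in the example are hypothetical, chosen only to show how a county with more deaths can still rank as safer.

```python
def fatality_rates(yearly_fatalities, vehicles, vehicle_km, road_km):
    """Average annual fatalities, normalised three ways."""
    avg = sum(yearly_fatalities) / len(yearly_fatalities)
    return {
        "per_10k_vehicles": avg / vehicles * 10_000,
        "per_300m_vehicle_km": avg / vehicle_km * 300_000_000,
        "per_100_km_of_road": avg / road_km * 100,
    }

def rank(counties, measure):
    """County names ordered from most to least dangerous on one measure."""
    return sorted(counties, key=lambda name: counties[name][measure], reverse=True)

# Hypothetical counties: "Big" has far more deaths, "Small" a higher rate.
counties = {
    "Big": fatality_rates([20, 18], vehicles=600_000, vehicle_km=9e9, road_km=6_000),
    "Small": fatality_rates([3, 3], vehicles=15_000, vehicle_km=3e8, road_km=600),
}
print(rank(counties, "per_10k_vehicles"))  # ['Small', 'Big']
```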

Looking for a cause


With this in mind, we can now try to work out some of the possible causes and determine areas that may need further investigation.

Access to public transport could be one possible factor. Both Longford and Monaghan have a low number of public service vehicles (buses and taxis) per km per head of population.

[Chart: public service vehicles per km per head of population by county]

Another possible factor can be found in a road surface survey carried out by the Department of Transport, Tourism and Sport (DTTAS) and the National Roads Authority (NRA) in 2011/2012.

The survey found that although Longford and Monaghan rank low among counties needing 'routine maintenance', 'surface restoration' or 'road reconstruction', they are ranked first and second for needing resealing and restoration of skid resistance.

[Chart: road maintenance needs by county]

We all know, from the information given to us by the RSA, that on a bank holiday weekend we need to be extra careful when travelling. We also know that more lives have been lost on the roads in Dublin and Cork than in any other county.


But one thing we need to be aware of is that, although the number of road deaths in those two counties is high, they do not have the most dangerous roads in the country. Per km, the roads in Co Longford and Co Monaghan pose a greater risk, and extra care needs to be taken there.


Therefore, be careful out on the roads this weekend & especially so in Cos Longford & Monaghan.

In order to make this article more accessible, we've only included summaries of the overall data that we analysed. But, we invite anybody who finds an interest in these figures to contact us if you have questions or would like to discuss any part in detail. We'll be happy to discuss the findings with the hope that the information can lead to safer Irish roads.


About Idiro
Based in Dublin, Ireland, Idiro Analytics is an award-winning provider of analytics to businesses around the world.

For an overview of Idiro’s analytics services, see our homepage www.idiro.com

Media contact information
Simon Rees, Clients & Marketing Director, Idiro Analytics.

simon.rees@idiro.com

+353 1534 30 34

Mayo still to beat Dublin by one point in the All-Ireland replay

In the two weeks following an All-Ireland final containing two own goals and a draw that nobody predicted, a consensus seems to be forming among GAA supporters: Dublin underperformed and Mayo may have missed their chance. The general view is that for the replay, Dublin will ‘click into gear’, play ‘their’ game and come out on top.

But is this really the case?

Most seemed to think the 2016 title belonged to Dublin before they even stepped into Croke Park on that rainy Sunday afternoon, but by looking at the numbers, we found that the pundits’ confidence was unfounded.

If you read our last article, you'll have seen that Dublin were not our predicted winners of the final: we had Mayo to win by one point. And although we admit that predicting the exact score of a game isn't really possible, we ended up being pretty close.

We had set ourselves the challenge of building a model to predict who would be the 2016 champions (and get an edge on the bookmakers). We did this by looking at the information available on the performances of both the Dublin and Mayo teams over time. Key areas included the goal difference between Mayo and Dublin, the point difference between them, regular differences in finals, average goals and points that season, and differences between average goals/points that season and in the finals.

We came up with the prediction of Mayo to win by just one point.

Now, even if we were to analyse every single data point and statistic since the GAA was formed in 1884, we still wouldn't have been able to predict two own goals in an All-Ireland final. But given that our prediction went against the general view of Dublin as favourites, and that the game ended up being so close, we thought it might be worth trying again.

The difference this time is that we now have more data to work with. Not only have both teams played another game that we can factor into our original predictive model, but we now have more data on how the two teams perform against each other.

Results of previous fixtures

[Table: results of recent Dublin vs Mayo fixtures]

Over the past four years, Dublin and Mayo have now played each other four times. The particular details of those matches play a key role in predicting the outcome of this Saturday's match, with a higher weighting on the most recent games as they are the most relevant to each team's current form.
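One common way to give recent games more say is exponential decay. The newest match gets weight 1, the one before it gets `decay`, the one before that gets `decay` squared, and so on. The sketch below applies this to score margins. The decay value and the margins are hypothetical, and this is not necessarily the weighting used in our model.

```python
def weighted_margin(margins, decay=0.5):
    """Recency-weighted average of margins ordered oldest -> newest.

    A positive margin means Mayo outscored Dublin (in points).
    """
    n = len(margins)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)

# Hypothetical margins for four meetings, oldest first.
print(weighted_margin([-3, 0, -1, 2]))
```

A smaller `decay` forgets older games faster. A `decay` of 1 reduces to a plain average.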

GAA Football All-Ireland Senior Championship final 2016

[Table: statistics from the 2016 final]

With these extra details in mind, we were able to refine our original prediction and develop a new one.

The Idiro Analytics official prediction for the All-Ireland final replay

Mayo 1:13 - 1:12 Dublin

Mayo to win by just one point

Now, there’s no doubt that the weather had a major effect on the performance of both teams on that error-filled Sunday two weeks ago. But with milder weather forecast for this weekend, we should see a much-improved display from both teams. However, looking at the numbers, we still stand by our original analysis that these two teams are more evenly matched than people seem to think.

Analysis performed by Eduards Vanags

Mayo to beat Dublin by one point

All the Dubs and Mayo people will give you an answer, of course, but can data analytics predict the outcome of this weekend's All-Ireland final, or more impressive yet, the score?

Predicting the outcome of a single game is difficult; predicting the winner of a league competition is a much safer bet.

In a league competition, teams play many games, which diminishes the impact of any single loss on their overall performance. And although they may falter a few times, underperform or fail to capitalise on chances over the course of a season, it is usually the best teams that come out on top. But in a knockout competition, anything can happen. This is one argument for why Leicester City winning the English Premier League was a bigger achievement than Portugal winning the Euros. In that knockout competition, Portugal were crowned champions after winning only four games, only one of them within 90 minutes and one on penalties. How much of that success was down to luck, and would they still have won a league-style competition?
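A toy calculation shows why luck matters less over many games. Assume, purely for illustration, that the stronger side wins any single game with probability 0.6, that games are independent and that there are no draws. The chance of that side winning a majority of the games then climbs steadily as the number of games grows. This is a simplification, not a model of a real league table.

```python
from math import comb

def prob_wins_majority(p, games):
    """P(a side with per-game win probability p wins more than half of `games`).

    Assumes independent games and no draws; use an odd number of games.
    """
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )

for n in (1, 3, 37):
    print(n, round(prob_wins_majority(0.6, n), 3))
```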

But let’s say we want to work out who’s going to win the All-Ireland final this weekend, Dublin or Mayo. Is it even possible? The short answer: no! But let's give it a shot anyway.

Now, some other sports (e.g. soccer) have the luxury of huge pools of data and statistics. In those sports, we can base predictions for games in the Euros and the World Cup heavily on player performance rankings, and compare performances when teams have played against the same opponents. But the GAA isn’t quite there yet in terms of individual player data. Also, the way the league is structured means that Mayo and Dublin rarely come up against the same teams on a regular basis (over the last four years, Dublin and Mayo have each played Kerry just twice).

What we do have to work with is the performance of both teams over time. Our data analysts broke this down and looked into key areas such as goal difference between Mayo/Dublin, point differences between them, regular differences in finals, average goals and points that season, differences between average goals/points that season and the finals etc.

[Table: summary of Dublin and Mayo results used in the analysis]

By excluding any emotional bias and purely looking at the history and current form of both teams, Idiro Analytics have calculated a prediction of:

Mayo 1:15 - 2:11 Dublin

Mayo to beat Dublin by one point
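For readers less familiar with GAA scoring: a goal is worth three points, so a score written as goals:points converts to a single total. This small helper, our own illustration, shows why 1:15 against 2:11 is a one-point margin.

```python
def gaa_total(score: str) -> int:
    """Convert a 'goals:points' score such as '1:15' into total points."""
    goals, points = (int(part) for part in score.split(":"))
    return goals * 3 + points

mayo, dublin = gaa_total("1:15"), gaa_total("2:11")
print(mayo, dublin, mayo - dublin)  # 18 17 1
```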

Again, predicting the result of a single game is definitely not an exact science. That’s especially true in such a fast-paced, high-scoring sport, where one misplaced pass or slip could sway the game one way or the other. But interestingly, by focusing only on the numbers and not the emotional elements of the game, our prediction seems to go against the general consensus of Dublin having the edge on Mayo.

If you were to base your opinion of who will win purely on the odds set by the bookmakers, you might be led to believe that Dublin are 8 times more likely to win. But the thing to keep in mind here is the relative number of people placing bets. The population of Dublin is roughly ten times that of Mayo, and with matches like this many punters bet with their hearts, not their heads, which means the odds may look disproportionate. Another thing to remember is that bookmakers set the odds with the sole intention of making a profit no matter who wins. So although Dublin may look like they have this all wrapped up, that might not be the case.

[Chart: population of Dublin vs Mayo]
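The bookmaker's margin is visible in the odds themselves. Converting decimal odds to probabilities (1 divided by the odds) gives numbers that add up to more than 1. The excess, often called the overround, is the bookmaker's built-in margin, and dividing it out gives the probabilities the odds actually imply. The odds in the example are hypothetical, not the real prices for this final.

```python
def implied_probabilities(decimal_odds):
    """Return (normalised probabilities, overround) for a set of decimal odds."""
    raw = {outcome: 1 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}, overround

# Hypothetical three-way market (a draw is possible in Gaelic football).
probs, overround = implied_probabilities({"Dublin": 1.25, "Draw": 9.0, "Mayo": 4.5})
print(round(overround, 3), {k: round(v, 3) for k, v in probs.items()})
```

Even after normalising, these are the bookmaker's prices, shaped by where the money is being placed. They are not an independent estimate of each side's chances.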

Our predictive model has Mayo to win by a margin of one point, which at first glance may not seem like such a big deal considering how evenly matched these two counties are (by looking at their results over the last number of years).

But for Mayo to be so close to Dublin really is a major achievement, especially when we again take into consideration the relative populations of each county.

According to the most recent Irish Sports Council monitor report, the percentage of people actively playing Gaelic football in Connacht is 3.7%, whereas in Dublin county it’s just 0.6%. Yet once population size is taken into account, the number of active players the Dublin team could potentially choose from is roughly 8,070, while Mayo has only 4,014.

[Chart: percentage of people actively playing Gaelic football, Connacht vs Dublin]

Now, if Mayo had the same population as Dublin (1,345,000 people), with an active player percentage of 3.7%, they would have a pool of roughly 50,000 players to choose from, compared to Dublin's 8,070.
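The pool-size arithmetic is just population multiplied by participation rate. Here it is using the Dublin population and the two participation rates quoted above:

```python
def player_pool(population: int, active_rate: float) -> int:
    """Estimated number of active players: population x participation rate."""
    return round(population * active_rate)

DUBLIN_POPULATION = 1_345_000  # figure quoted in the article

print(player_pool(DUBLIN_POPULATION, 0.006))  # 8070  (Dublin's 0.6% rate)
print(player_pool(DUBLIN_POPULATION, 0.037))  # 49765 (Connacht's 3.7% rate)
```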

The Mayo players will know that looking at the history it’s too close to call, but looking at how well they’ve played given the disproportionate advantage Dublin have in terms of population, they may just feel they deserve it more.

Dublin supporters might not want to be too confident.

The benefits of playing Pokemon Go

There are plenty of articles about Pokemon Go around the internet, some outlining the frustrations of trying to go about your day without having to sidestep someone staring at their phone, others showing the "mass hysteria" caused by the sighting of a Charizard, and many more about the accidents people have gotten themselves into while playing.

But few people are discussing why this game has become such a huge success, or embracing it for what it is: a fun, trending topic with potential benefits.

So, to understand this rise in popularity, let's cast our minds back to January 2014 and the sudden rise of another game, Flappy Bird. If you're unaware, Flappy Bird was a side-scroller game where the player tries to control a small bird through obstacles using very limited controls.

[Image: Flappy Bird]

Flappy Bird becoming one of the most successful mobile games of its time defied all logic as far as professional game designers were concerned. It had little to no original game mechanics or design, and it was even openly accused of plagiarising other game designers' work, yet it took off to become the most downloaded free game in the iOS App Store in early 2014.

Others have written about the addictive nature of Flappy Bird, and about how people liked that it was fresh and simple when so many games were becoming overly complicated. But put simply, it became popular because everyone's friends were playing it.

 


 And now it's the same story with Pokemon Go.

Pokemon Go is essentially a simplified version of the game Ingress (both games were created by Niantic), designed to appeal to diehard Pokemon fans. Pokemon Go and Ingress share many features and use the same game mechanics; one just has some Pokemon thrown into the mix. But for Pokemon Go, things have escalated dramatically.

Pokemon Go started to build momentum: a few people started playing, then more. It got noticed and talked about on sites like Reddit, and the more it was talked about, the more successful it became.

Every once in a while something like this happens, and it doesn't have to be a game. Think of the rise of Tinder (100 million downloads as of March 2016). People interested in dating apps already had plenty of options, but Tinder became so popular because everyone knew someone who was talking about it.

It becomes the thing to be a part of, to stay in the loop, to stay cool. If everyone playing Pokemon Go was a huge Pokemon fan, they would have already been playing one of the many other Pokemon games out there.

What makes Pokemon Go interesting, compared to other mobile games/apps that have had their moment in the sun, is that non-players can't avoid interacting with players. Step outside and look around and you'll most likely spot someone playing the game.

People may complain, but then again, people tend to complain about everything. The game brings people out to interact with real world places, making it more difficult for others to ignore. But anything that encourages exercise and gets people of all ages to get on their feet and move around is a great thing.

So, being data analysts (and nerds), we wanted to encourage the exercise aspect of Pokemon Go by working out specific numbers that show players the physical benefits of playing. We wanted to help players justify their Pokemon-hunting habit with solid data to back up the 'it's good for you to keep playing' argument.

So let's break down what you would need to do to reach level 20 in the game. Of course, people can go higher than level 20, but we’ll set that as a nice target for now.

 


 

First, let's break down some of the numbers:

 

idiro-analytics-pokemon-go-table1

 


 

To reach level 20, you would need to gain 210,000 experience points overall.

 

idiro-analytics-pokemon-go-image

 

 

Reaching that goal by focusing only on catching Pokemon would involve one of the following:

 

idiro-analytics-pokemon-go-bonus-image

- Catching 1909 Pokemon with a Curveball bonus
- Catching 1909 Pokemon with a Nice! throw bonus
- Catching 1400 Pokemon with a Great! throw bonus
- Catching 1050 Pokemon with an Excellent! throw bonus
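The catch counts above come from dividing the 210,000 XP target by the XP earned per catch. Below is a minimal Python sketch of that arithmetic, assuming a standard catch is worth 100 XP and the throw bonuses are 10 XP (Curveball, Nice!), 50 XP (Great!) and 100 XP (Excellent!). These XP values are our assumptions, chosen because they reproduce the figures above. Rounding up (a partial catch earns nothing), the 110 XP cases come out at 1,910 catches, one more than the 1909 quoted.

```python
# Assumed XP values (not stated in the post; chosen because they reproduce its figures).
BASE_CATCH_XP = 100
THROW_BONUS_XP = {
    "Curveball": 10,
    "Nice!": 10,
    "Great!": 50,
    "Excellent!": 100,
}

TARGET_XP = 210_000  # total XP needed to reach level 20


def catches_needed(target_xp: int, xp_per_catch: int) -> int:
    """Catches needed to reach target_xp, rounded up because a partial catch earns nothing."""
    return -(-target_xp // xp_per_catch)  # integer ceiling division


for bonus_name, bonus_xp in THROW_BONUS_XP.items():
    xp_per_catch = BASE_CATCH_XP + bonus_xp
    print(f"{bonus_name:<11} {xp_per_catch} XP per catch: {catches_needed(TARGET_XP, xp_per_catch)} catches")
```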
 
 

 
 

But from the chart above, we know that catching Pokemon isn't the only way to gain experience points (XP). 

Another way to build up XP is to incubate eggs. If you have an egg and place it in an incubator, you can hatch that egg and earn XP. The egg hatches after you travel 5 km, and the speed at which you cover that 5 km determines how quickly you earn the XP.

 

idiro-analytics-pokemon-go-eggs-table

- Walking 1 egg (5 km): you can earn 8 XP per min
- Jogging 1 egg (5 km): you can earn 15 XP per min
- Running 1 egg (5 km): you can earn 24 XP per min
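Assuming an egg pays out a fixed amount of XP once its 5 km is covered, the XP-per-minute rate is simply that payout divided by the time it takes to cover the distance. Here is a rough Python sketch. The 500 XP payout and the three speeds are hypothetical values we picked for illustration (they happen to reproduce the rates above); they are not measurements from our experiment.

```python
# Hypothetical inputs: the post reports only the XP-per-minute results. The egg payout
# and the three speeds are our assumptions, picked to illustrate the calculation.
EGG_DISTANCE_KM = 5.0
EGG_XP = 500  # assumed fixed XP payout for hatching one 5 km egg

ASSUMED_SPEEDS_KMH = {"Walking": 5.0, "Jogging": 9.0, "Running": 14.4}


def xp_per_minute(egg_xp: float, distance_km: float, speed_kmh: float) -> float:
    """Spread a fixed egg payout over the minutes needed to cover the egg's distance."""
    minutes = distance_km / speed_kmh * 60
    return egg_xp / minutes


for pace, speed in ASSUMED_SPEEDS_KMH.items():
    print(f"{pace}: {xp_per_minute(EGG_XP, EGG_DISTANCE_KM, speed):.1f} XP per min")
```

Doubling your speed halves the time on the egg and so doubles the rate, which is why running comes out ahead.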

 


 

We decided to run an experiment to get a benchmark. We started at beginner level 5 and played Pokemon Go for 94 minutes in a city centre.

These were our results:

idiro-analytics-pokemon-go-experiment

 


 
 
So, taking these numbers as our baseline, to reach level 20 we would need to play Pokemon Go for a total of 67 hours, travelling 202 km and catching 562 Pokemon along the way.
 
In other words, that's playing Pokemon Go for just under three days straight without stopping, and travelling the distance of the London Marathon almost five times. Not so bad, right?
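A quick Python check of these comparisons, using only the totals from the text plus the standard 42.195 km marathon distance. The same totals also give the average pace and catch rate that the extrapolation implies.

```python
# Totals from the text above.
TOTAL_HOURS = 67
TOTAL_KM = 202
TOTAL_CATCHES = 562

MARATHON_KM = 42.195  # standard marathon distance

days = TOTAL_HOURS / 24
marathons = TOTAL_KM / MARATHON_KM
avg_speed_kmh = TOTAL_KM / TOTAL_HOURS
catches_per_hour = TOTAL_CATCHES / TOTAL_HOURS

print(f"{days:.2f} days of non-stop play")   # just under three days
print(f"{marathons:.2f} marathon distances")  # almost five marathons
print(f"{avg_speed_kmh:.2f} km/h, {catches_per_hour:.1f} catches per hour")
```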
 
In terms of calories burned while walking that distance (we'll assume we won't be running for those three days), we would burn 27,259 calories*.
*The number of calories burned here is calculated based on our particular weight and average walking speed.
 
 

 
 
idiro-analytics-pokemon-go-experiment-table-2 

Other, less fun activities we would need to do to burn that many calories would be*:

 idiro-analytics-pokemon-go-experiment-table-3 

 


 

Now, our numbers here are based on our particular experiment, and we would need a larger dataset to get more solid results. But there is no reason why you can’t use this as a baseline reference when arguing with friends and family about whether playing Pokemon Go is a waste of time.

We've also run a more complex experiment using the data analytics tool Red Sqirl, applying advanced predictive analytics techniques to work out where Pokemon will appear in the game. You can read more about this experiment here on hack.guides().


 

 

When it comes to buying houses, people in Dublin are clearly superstitious



 

Who would have thought that, in this day and age, the Irish people would still be suffering from this ancient affliction: the terrible problem of triskaidekaphobia, the fear of the number thirteen?

The Irish people, as a nation, have achieved many great things. We’ve become one of the biggest technology capitals in Europe, we’ve produced some of the world's greatest athletes and sports stars, and we’ve led the way in giving equal rights to every citizen, not to mention the musicians and actors that other countries would love to claim as their own (but we know they’re Irish in their hearts).

But alas, we still have trouble shaking the quaint “luck of the Irish” image that American tourists hope to see when they set foot in Temple Bar: the image of a superstitious nation that bases its decisions on old wives’ tales and mythology.

We may say to ourselves that this isn’t the case, that it’s just how the Irish people are portrayed on tea towels found in Carroll’s. But like everything in life, you can only really find the truth by looking at the data.

So that’s what we here at Idiro Analytics did. We are experts in data analytics for business. We looked at the data, to prove how far we’ve come as a nation, that we base our decisions on reason and logic and not on whether or not our palm itches (so we know we’ll be coming into some money). But unfortunately, the data showed us our true colours.

We looked at the price of houses in Ireland over the past six years. We took the data from the Property Services Regulatory Authority, showing every house sold in the Republic of Ireland since January 1st, 2010. We analysed housing data from every corner of Ireland, looking at the values, the locations, the house names etc.

And we found that when it comes to a large decision, such as buying a house, a lot of our nation are still as superstitious as ever. The value of properties sold in counties such as Dublin, Cork, Kildare, Cavan and Longford is significantly lower if the house is a number thirteen.

When analysing the average prices of houses, we can see the drop in value for houses numbered 13 compared to their neighbours numbered 12 and 14. It seems the Dublin population (a 4.01% drop) is slightly more superstitious than the people of Cork (a 3.46% drop). In Longford, the drop in value is as much as 23.8%.
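The comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual pipeline: the records are hypothetical, house numbers are assumed to be already parsed out of the register's free-text addresses, and "neighbours" is taken to mean the pooled mean price of all sales numbered 12 or 14 in the same county.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (county, house_number, price_eur). Parsing house numbers
# out of the Property Services Regulatory Authority addresses is not shown.
sales = [
    ("Dublin", 12, 400_000),
    ("Dublin", 13, 380_000),
    ("Dublin", 14, 420_000),
    ("Galway", 12, 200_000),
    ("Galway", 13, 230_000),
    ("Galway", 14, 210_000),
]


def thirteen_effect(records):
    """Per county: % difference between the mean price of No. 13 and the pooled mean of Nos. 12 and 14."""
    groups = defaultdict(lambda: {"13": [], "neighbours": []})
    for county, number, price in records:
        if number == 13:
            groups[county]["13"].append(price)
        elif number in (12, 14):
            groups[county]["neighbours"].append(price)

    effect = {}
    for county, prices in groups.items():
        # Only counties with sales on both sides of the comparison get a figure.
        if prices["13"] and prices["neighbours"]:
            neighbour_avg = mean(prices["neighbours"])
            effect[county] = 100 * (mean(prices["13"]) - neighbour_avg) / neighbour_avg
    return effect


for county, pct in sorted(thirteen_effect(sales).items()):
    print(f"{county}: {pct:+.2f}% for No. 13 vs. Nos. 12 and 14")
```

A negative figure is a "superstition discount"; a positive one means No. 13 sells for more than its neighbours.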

So all that hard work done by Brian O'Driscoll, all of those times he put his body on the line to dispel the unlucky nature of the number 13, it appears, have all been in vain.

This isn’t the case for the entire country, though: the west of Ireland can be proud that it has bucked the trend. In County Galway, houses numbered 13 are worth 8.67% more than their neighbours numbered 12 and 14, while Mayo shows a 3.28% increase and Sligo a massive 20.22% increase.

Another insight we’ve pulled from the data is that houses with particular words in their names have a higher average value. If you’re looking to buy a house in Dublin with “Mara” (the sea) in the name, you might have to be willing to pay on average 115.18% more than for a house named “An Tigin” (The Cottage).
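A similar Python sketch for the house-name comparison, again with hypothetical records. It matches “Mara” as a whole word (so a name like “Tamara Lodge” is not counted) and compares against houses named exactly “An Tigin”; both matching rules are our assumptions for illustration.

```python
import re
from statistics import mean

# Hypothetical (house_name, price_eur) records for one county.
named_sales = [
    ("Cois Mara", 600_000),
    ("Radharc na Mara", 700_000),
    ("Tamara Lodge", 900_000),  # should NOT count as a "Mara" house
    ("An Tigin", 300_000),
    ("An Tigin", 200_000),
]

MARA = re.compile(r"\bmara\b", re.IGNORECASE)  # "Mara" as a whole word


def mean_price(records, matches):
    """Mean price of records whose name satisfies `matches`, or None if none match."""
    prices = [price for name, price in records if matches(name)]
    return mean(prices) if prices else None


mara_avg = mean_price(named_sales, lambda name: MARA.search(name) is not None)
tigin_avg = mean_price(named_sales, lambda name: name.strip().lower() == "an tigin")
premium_pct = 100 * (mara_avg - tigin_avg) / tigin_avg
print(f"'Mara' houses cost {premium_pct:.1f}% more on average than 'An Tigin' houses")
```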

The two most popular saints in Ireland to name a house after are St. Patrick and St. Mary, although we probably could have guessed that one. With the choice of over 10,000 named saints (it’s difficult to get a definitive ‘headcount’), the Irish people prefer to keep it traditional.

Idiro Analytics provides Big Data Analytics solutions to businesses across Ireland. We help businesses gain a better return on investment by helping them understand and use the data they already have.

- Ends -
About Idiro
Based in Dublin, Ireland, Idiro Analytics is an award-winning provider of analytics to businesses around the world. For an overview of Idiro’s analytics services, see our homepage www.idiro.com.
Media contact information
Simon Rees, Clients & Marketing Director, Idiro Analytics.
087 240 5999
simon.rees@idiro.com

Three and a half degrees of separation?


 

Last month, a team of researchers at Facebook posted an article updating the "mean degree of separation" of Facebook users. You have most probably heard of the "Six Degrees of Separation" legend: between you and me, as between any two people in the world, there is a chain of acquaintances that connects us, and this chain is at most 6 steps long. In other words, you know somebody, who knows somebody, ..., who knows me! Apparently, the idea dates back to Frigyes Karinthy, a Hungarian writer from the first half of the 20th century. It was later investigated by social scientists and, with the arrival of social networks and Big Data, people have started using online social networks to test it experimentally.

In 2011, researchers at Cornell, the Università degli Studi di Milano, and Facebook computed the mean degree of separation across the 721 million people using Facebook at the time and found that it was 3.74. Here the separation is defined as the number of intermediate individuals between a given pair, rather than the number of steps. The news is that Facebook's user base has grown to 1.59 billion and the mean degree of separation has shrunk to 3.57. If you visit the page, a fast algorithm calculates your own mean degree of separation.
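To make the definition concrete, here is a small Python sketch that computes the mean degree of separation exactly on a toy friendship graph. A breadth-first search from every person gives the hop count to everyone else, and subtracting one turns hops into intermediaries. At Facebook's scale this brute-force approach is impractical, and the published studies relied on approximate counting methods instead. The graph and names below are hypothetical.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: number of hops from `source` to every reachable person."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def mean_degree_of_separation(graph):
    """Mean number of intermediaries (hops minus one) over all connected ordered pairs."""
    total_intermediaries, pairs = 0, 0
    for source in graph:
        for target, hops in hops_from(graph, source).items():
            if target != source:
                total_intermediaries += hops - 1
                pairs += 1
    return total_intermediaries / pairs


# Hypothetical friendships: Ann - Bob - Cat - Dan in a chain (links listed both ways).
friends = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Bob", "Dan"],
    "Dan": ["Cat"],
}
print(f"{mean_degree_of_separation(friends):.2f}")  # 8 intermediaries over 12 ordered pairs
```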

However, one may ask how representative Facebook is of real-world social acquaintances. Maintaining a real-world social relationship is expensive in terms of time and energy, while Facebook "friendship" comes almost for free. Therefore, one can argue that Facebook connectedness overestimates the real connectedness of individuals.

It is reasonable to believe that some of those links are so weak as to be meaningless. Also, social contacts change over time, while one may expect that most Facebook users don't regularly prune their inactive links.

Finally, albeit large, the Facebook world is still a sample of humankind as a whole, and it is certainly not a random sample. To mention just a few reasons, access to the Internet in Africa is still much more difficult than on other continents (even if things are changing fast), elderly people are under-represented on Facebook, and so on.

As a result, it is possible that the Facebook sample has a lower mean degree of separation than the world population as a whole.  But it is still a very large sample.

All these remarks are quite intuitive, but the network scientist Duncan Watts has criticized both the work and the approach, highlighting the counter-intuitive behaviour of so-called "small-world networks". In a famous 1998 paper with Steven Strogatz, he proposed a very simple model showing that adding a few "shortcuts" to a network (links connecting random pairs of individuals) quickly reduces the shortest path length (i.e. the degree of separation) and, quite interestingly, that the path length doesn't get much shorter if you keep adding random links after this initial drop. The argument, therefore, is that the world already became small decades ago, and it is quite unlikely to shrink much further.
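The small-world effect is easy to reproduce. The Python sketch below starts from a ring where everyone knows their two nearest neighbours on each side, then adds random shortcuts and recomputes the average shortest path length. Note that the original Watts-Strogatz model rewires existing links; adding shortcuts on top of the ring (a variant later studied by Newman and Watts) is a swap we made because it matches the description above. The sizes (300 people, then 10 and 300 shortcuts in total) are arbitrary choices for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n people, each linked to their k nearest neighbours on either side."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


def add_random_shortcuts(graph, count, rng):
    """Add `count` new links between random pairs that are not already connected."""
    nodes = list(graph)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in graph[u]:
            graph[u].add(v)
            graph[v].add(u)
            added += 1


def average_path_length(graph):
    """Mean shortest-path length in hops over all ordered pairs, via BFS from each node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)
g = ring_lattice(300, 2)
lengths = {0: average_path_length(g)}
shortcuts_so_far = 0
for target in (10, 300):  # cumulative number of shortcuts
    add_random_shortcuts(g, target - shortcuts_so_far, rng)
    shortcuts_so_far = target
    lengths[target] = average_path_length(g)

for shortcuts, length in lengths.items():
    print(f"{shortcuts:>3} shortcuts: average path length {length:.2f}")
```

Because the shortcuts are added cumulatively to the same graph, the path length can only stay the same or fall; the interesting part is how much of the fall happens with the first handful of links.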

This, in my opinion, raises the question of whether a mathematical model could quantitatively fit the measured reduction in Facebook's mean degree of separation from 2011 to 2016. Such a model could help us better understand which topological features of the network are relevant to the process.

Anyway, there are many more details of real-world social networks that still need to be understood besides the degrees of separation. In a recent paper, the sociologist Robin Dunbar has investigated the relationship between the number of "friends" reported on Facebook and the number that individuals personally perceive as friends. In earlier papers, he and his co-workers had identified progressively self-contained layers of closeness in human acquaintance. They showed that, typically, human layers of social closeness contain approximately 5, 15, 50 and 150 individuals, plus two outer layers of 500 and 1500 alters. In his recent paper and the references therein, Dunbar shows that online social networks can realistically approximate not only the 150-friend layer but also the two innermost layers. He also claims that the number of online contacts is usually no larger than the number of offline contacts.

Indeed, a minority of users report a larger set of friends online than offline, but it is hypothesised that those extra online connections are weak acquaintances, which online social networks typically cannot distinguish from close friends. The difference only becomes visible when investigating the traffic between individuals, by counting direct posts on Facebook or replies on Twitter.
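One simple way to turn traffic into layers is sketched below in Python: rank a person's contacts by number of direct messages and cut the ranking at the cumulative layer sizes quoted above (5, 15, 50, 150, 500, 1500). Fixed rank cut-offs are a deliberately crude assumption for illustration; this is not Dunbar's procedure, and not necessarily how layers are detected in practice. The message log is hypothetical.

```python
from collections import Counter

# Cumulative layer sizes quoted above: 5, 15, 50, 150, plus outer layers of 500 and 1500.
LAYER_SIZES = (5, 15, 50, 150, 500, 1500)


def assign_layers(messages, layer_sizes=LAYER_SIZES):
    """Map each contact to the innermost layer whose cumulative size covers its traffic rank.

    `messages` holds one contact id per direct post, reply or call. Contacts ranked
    beyond the outermost layer are left out. Ties keep first-seen order (Counter behaviour).
    """
    ranked = [contact for contact, _ in Counter(messages).most_common()]
    layers = {}
    for rank, contact in enumerate(ranked, start=1):
        for size in layer_sizes:
            if rank <= size:
                layers[contact] = size
                break
    return layers


# Hypothetical log: contact "c0" was messaged 10 times, "c1" 9 times, ..., "c6" 4 times.
log = [f"c{i}" for i in range(7) for _ in range(10 - i)]
print(assign_layers(log))
```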

Besides online social networks, mobile phone networks are a valid alternative for measuring social acquaintances. Phones are still widespread in areas with low Internet connectivity, communication often carries a cost, and it is possible to measure the traffic between individuals.

At Idiro, we investigate these problems every day and are able to detect layers of social acquaintance to improve digital marketing strategies and provide value for our customers.