Microsoft Power Query – Quick & Easy Data Visuals

As an example of what can be achieved quickly with Microsoft Excel Power Query and Pivot Charts, I created a simple but effective visual alternative to the English Premier League table using some web-based match data.  Here is how it looks after twelve matches into the 2018/19 season.

Premier League Positions Graphic

The green-shaded area part of the chart shows the points accumulated so far for each team.  The values are interpreted from the Y-axis to the right of the chart.  The teams are sorted in descending order according to the points they have achieved (i.e. league table position).  Man City are topping the league (32 points) and Fulham are propping things up with 5 points.

The stacked-columns show matches won (blue part), drawn (yellow part) and lost (green part).   The value for these columns follow the scale on the Y-axis on the left hand side of the chart.  Because the teams are shown in descending league table position, as you would expect, the size of the blue columns gets progressively smaller whilst the the green columns grow in size – that is, teams at the top of the league have won more games and those at the bottom have lost more.  The visual is very effective – at a glance it is easy to see that three teams – Man City, Liverpool and Chelsea – have not lost any matches as they have no green areas in their stacks.  Only one team has no draws so far this season – Tottenham.  At the other end of the table, although Fulham have won as many games as Southampton, they have lost more whilst Southampton have managed more draws leaving them a little higher up the table.

I also added some lines to the chart  – for goals scored and goals conceded.  And a nice extra one for clean sheets – that is, the number of matches in which the team didn’t concede any goals.  These all also use the scale on the Y-axis shown on the right hand side of the chart.  As you would expect, the league leaders have the most number of clean sheets but, some of the teams lower down the league are doing pretty well – Newcastle and Southampton for example.  They just need to start scoring a few more goals.  Fulham have the most leaky defense – they are the only team who haven’t managed a clean sheet so far into the season.  Observe how the lines for goals scored and goals conceded cross at Wolves thus splitting the teams nicely into two equal groups of ten – those whose goal difference (goals scored minus goals conceded) is positive versus those whose goal difference is negative.

This is an excellent example of a custom combo chart in Excel.  It combines three chart types – stacked column, area and line charts.  It uses two Y axes, ideal for when the range of values of constituent data wouldn’t work with just one Y-axis.  For example, in this chart the maximum number of games played is just twelve so the Y-axis on the left has a maximum value just beyond that whereas the maximum number of goals scored is 36 so the Y-axis on the other side of the chart has a scale going right up, and just beyond that value.  If all the parts of the charts used just that latter scale the stacked columns would be very small and the chart would not be very visually pleasing at all.

So, how was the visual made?  As I mentioned at the beginning I used Microsoft Power Query to grab the data from a web resource.  Here is how Microsoft themselves describe Power Query

Power Query is the Microsoft Data Connectivity and Data Preparation technology that enables business users to seamlessly access data stored in hundreds of data sources and reshape it to fit their needs, with an easy to use, engaging and no-code user experience.

The data came from a web source – primarily the historical data section of football-data.co.uk (“Free football betting odds & results data archive, live scores, match statistics, odds comparison, betting advice & football news”).  I won’t go into the details of how to get the data into Excel – that is documented elsewhere (docs.microsoft.com/en-us/power-query/).

Once the data was in it just needed a small amount of modelling.   The data needed to be team-based rather than match-based so I referenced the original query twice to create new queries; one for home games and the other for away games.  I then added custom columns so that home teams, away teams and season names could be identified – I didn’t need all of that specific information for now but, in a full reporting or dashboard solution, I certainly would – it’s quick and easy to do so I went ahead ready for the anticipated use later on.  Finally, the home games and away games queries were appended to create a single query for all games in the data model.

Actually, because I also knew that users would be interested in cutting this data by season, although again not used here, I imported some data from another web source – the very interesting kaggle.com (“Kaggle is the place to do data science projects”).  A data-set is available there in which all the historical data from football-data has already been combined by another user.  With a little more modeling I was able to append this to the rest of my data.

The final part of the modelling process was to add “Measures” for the games played, won, drawn, lost etc.   Measures (also known as calculated fields) are one of the most powerful features in Power Pivot. You will use them frequently in your data analysis. They are calculations you create for measuring a result relative to other factors pertinent to your analysis, such as total sales calculated for time, geography, organization, or product. (Create a Measure in Power Pivot).

And now for the visualization – PivotCharts.  Sometimes it’s hard to see the big picture when your raw data hasn’t been summarized. Your first instinct may be to create a PivotTable, but not everyone can look at numbers in a table and quickly see what’s going on. PivotCharts are a great way to add data visualizations to your data.  (Create a PivotChart).

And one final but significant point to add.  Excel Power Queries are refresh-able.  The latest data can be imported without having to recreate the query.  I’ll click the refresh button in a few weeks to see how things look then.  I’ll append the new visual to this post.