Excel Add-in & User Defined Function

User Defined Functions are really useful.  With a little bit of VBA you can really streamline your workflow and by putting them into an Add-in they are permanently available every time you open Excel.

In this post, I’ll give a very simple example to give a taste of what can be achieved.  Imagine you are working on an Excel-based version of Scrabble – the crossword game (https://scrabble.hasbro.com/en-us).  Let’s say, in your version of the game, the player types his word, one letter per cell, like this example:

Let’s not worry about vertical words for the purpose of this basic version!  What we would like now is a way to calculate the value of the word in Scrabble.  S is worth 1 point, C is worth 3 points, R is worth 1 point, A is worth 1 point, B is worth 3 points, L Is worth 1 point and E is also worth 1 point.  So the total value of the word SCRABBLE is 14 points (also no special squares for now!).  Wouldn’t it be nice if there were an Excel formula that would do that calculation?  Something like this:

=WORDVALUE(C3:J3)

With a User Defined Function you can achieve this.  I won’t show any ways this can be done using standard Excel formulas – suffice to say, for each letter in the word, you would probably need a lot of nested IF formulas or, if you had the foresight to make a lookup table of letter values, a lot of nested VLOOKUP formulas.  All the letter values would then have to be summed for the whole word (which could vary in length).  It doesn’t take much thinking about to realise that the resultant formula would be very complex and error-prone.

Let’s simplify this with a User Defined Function.  The best place to start is with a table of letter values – we’ll definitely need that.

I’ve also given the sheet a new name – “Letter Values”.  Next, in the Visual Basic editor, let’s add a new function called WORDVALUE, that looks like this:

Function WORDVALUE(Word As Range)
    Dim Score As Integer
    Score = 0
    For I = 1 To Word.Columns.Count
        Letter = LCase(Word.Cells(1, I))
        LetterValue = Application.WorksheetFunction.VLookup(Letter, Sheets("Letter Values").Cells.Range("A:B"), 2, False)
        Score = Score + LetterValue
    Next I
    WORDVALUE = Score
End Function

As you can see, it is a small block code.  It is easy to see how the score is calculated for the Word that is passed to the function as an Excel Range of cells.  Please note though – the final version will be more complex than this as some error handling code should be added (what if the Range is not a single row?  What if one of the cells is not a letter?  And so on.  And, of course, we haven’t catered for the special Scrabble tile – BLANK).

Now, back in the Worksheet, type the word SCRABBLE, in cells C3 through H3 and the formula =WORDVALUE(C3:H3) in another cell and that cell will show the expected result – 14!

Now, for the final touch – to make the function permanently available in Excel by saving the function into an Add-in.  First of all, update the VLOOKUP in the VBA code by adding the workbook name:

And then use “Save as…” to save the workbook as an Excel Add-in, called “scrabble.xlam” (as preempted in the VBA code above).  Now, as long as you have that Add-in activated in your installation of Excel, the WORDVALUE function will always be available.

Adding Run Time Parameters to Excel Power Query

The Benefit of Parameters

Take the scenario where you have a workbook with generic data connections that needs to be distributed to several different users, each with their own unique set of run time requirements meaning that the generic connections need tweaking for each one.  That is fine if there are only one or two connections – the queries can be edited, one by one, in each copy of the workbook, before sharing them.

However, what if the parameters could be declared in the workbooks and by choosing the Data | Refresh All option the entire workbook would then be good to go?  No problem!

A Simplistic Example

For the example, here is a data query that gets a simple data set (a subset of the American Trends Panel, courtesy of http://www.pewsocialtrends.org):

If you look at the steps that went into getting this data in the Power Query Advanced Editor, you would see something familiar like this:

let
    Source = Csv.Document(File.Contents("C:\Documents\American Trends Panel Wave 28\ATPW28.dat"),[Delimiter=",", Columns=139, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Use First Row as Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Removed Other Columns" = Table.SelectColumns(#"Use First Row as Headers",{"QKEY", "Device_Type_W28", "LANGUAGE_W28", "FORM_W28", "EXPECTLIFE_W28"}),
    #"Change Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"QKEY", type text}, {"Device_Type_W28", Int64.Type}, {"LANGUAGE_W28", Int64.Type}, {"FORM_W28", Int64.Type}, {"EXPECTLIFE_W28", Int64.Type}}),
in
    #"Change Type"

Now, let’s say we need two users – each to work on the data for one of the languages in the “LANGUAGE_W28” column (which could be “9” = English or “10” = Spanish) so we need to share only the relevant language data sets with them.  To update the query prior to sharing the workbook, of course, the following step could be added:

= Table.SelectRows(#"Change Type", each ([LANGUAGE_W28] = 9))

It can be seen, as the example stands, that the English speakers data set would be selected.  It would be necessary to modify the value being compared in the query to 10 to get the Spanish speakers data set.

Take a look at this more generic filter:

= Table.SelectRows(#"Change Type", each ([LANGUAGE_W28] = language_parameter))

All we need now is a way to get language_parameter to be set – on the fly – to  the value we would like to be compared to and upon which to filter the data set.  First of all, create a lookup table in the workbook:

Next step, is to allow the user to select the language they would like to filter on – by name rather than code is my preference, being much more user-friendly and less prone to error.  Create a new table – Table_Parameters – in the workbook for the run-time parameters):

The second column in the table uses a VLOOKUP to fetch the code from our languages table.  That way the user chooses the language in the drop down and the code is updated.   The drop down is enabled through Data Validation options in Excel – I’ll be covering that in more detail in a future post.

The final step then is to get Power Query to grab the language code from the table during execution.  In the Advanced Editor for the query, create a reference to the table:

Workbook_Parameters = Excel.CurrentWorkbook(){[Name="Table_Parameters"]}[Content],

Make sure that the Language_Parameter is going to be the same data type as the column in the original data set:

parameters = Table.TransformColumnTypes(Source,{ {"Language_Parameter", Int64.Type}})

And – finally – set the language_parameter variable that is going to be used in the filtering later on:

language_parameter = parameters{0}[Language_Parameter],

in the line above, language_parameter is being set to the single value that is in the Language_Parameter column (part in square brackets) of the first record (part in curly braces) in the parameters table.  Records in tables start from record zero, so the {0} seen above means the first record.

Now, the desired final data set is easily obtained by choosing the language in the drop down and then pressing the Data | Refresh All menu option.  No need to edit the Power Query ever again – even if additional language options are added in the future.

Here is the final Power Query in full:

let

Workbook_Parameters = Excel.CurrentWorkbook(){[Name="Table_Parameters"]}[Content],
parameters = Table.TransformColumnTypes(Workbook_Parameters,{{"Language_Parameter", Int64.Type}}),
language_parameter = parameters{0}[Language_Parameter],

Source = Csv.Document(File.Contents("C:\American Trends Panel Wave 28\ATPW28.dat"),[Delimiter=",", Columns=139, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Use First Row as Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Removed Other Columns" = Table.SelectColumns(#"Use First Row as Headers",{"QKEY", "Device_Type_W28", "LANGUAGE_W28", "FORM_W28", "EXPECTLIFE_W28"}),
#"Change Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"QKEY", type text}, {"Device_Type_W28", Int64.Type}, {"LANGUAGE_W28", Int64.Type}, {"FORM_W28", Int64.Type}, {"EXPECTLIFE_W28", Int64.Type}}),
#"Filtered Rows" = Table.SelectRows(#"Change Type", each ([LANGUAGE_W28] = language_parameter))
in
#"Filtered Rows"

Another thing that can be achieved with parameters like this is to add a form of dynamism, within Power Query scripts, to column names in tables – something that isn’t usually possible with Power Query.  I will cover this, with examples, in a future post.

How The Hotel Luxuria started to use Excel Tables

The Hotel Luxuria is a fictitious boutique hotel with ten rooms.  The owner wanted to start analysing the bookings a bit more so he asked the Manager to “put something together”!

The first thing is to understand the potential of the hotel – what the revenue would look like at full occupancy – the manager copied the current room rates into an Excel Worksheet

He knew that this data would be a lot more manageable if it were an Excel Table.  It is easy to convert the range of cells to a table as well as getting them nicely formatted at the same time by selecting the relevant cells and then choosing the Excel menu item “Format as table”, which is in the Home menu ribbon.  The drop-down presents a lot of table styles to suit every taste.

Format as table

Next thing is to sum all the room rates to show the revenue per night if the hotel were full.  With any cell in the table selected, on the “Design” menu ribbon, from “Table Style Options”, select “Total Row”.

Add total row

Excel adds the total row and the desired calculation.  However, as can be seen from the drop down now available in the table, this not limited to straightforward totals – many functions are available.  They can even be mixed and matched along the total row.

Total Row options

So far so good.  Now the manager wants to start looking at actual bookings.  He created a new table with all the bookings for the week, along with the discount given on the full room rate.  One row per night per room and no entry at all for nights where the room wasn’t used at all.

Hotel Luxuria Occupancy Rates

Notice how each of the tables was intuitively given meaningful header labels as we went along.  It is also a good idea to give meaningful names to entire tables as well.  This will make the entire solution much more user-friendly later on.  You need the Name Manager for this; it’s an option in the “Defined Names” drop-down in the Formulas menu ribbon.

Excel Name Manager

In the Name Manager, edit each of the tables in turn and give them names, like “Room_Rates” and “Occupancy_Rates”.

Now to start analysing the occupancy rates.  Back in the Room Rates table let’s add a new column to show how many nights each room was used for during the week for which we have data.  Right-click any cell in the last column and select the menu options Insert | Table Column to the Right.  Give the new column a meaningful name – like “Occupancy”.

Add Columns to an Excel Table

Now we are going to add a formula to show, for each room, how many nights (out of seven) the room was used.  In any cell in the Occupancy column you can enter the formula, which Excel will then apply to the entire column in the table body.  The formula needed in this case is

=COUNTIF(Occupancy_Rates[Room Number],[@[Room Number]])

This formula is probably a familiar one – COUNTIF – but the rest of the formula may look a little unfamiliar at first, but not too difficult to comprehend after a few moments.  This readability is thanks to the meaningful table names and column header names.  I will go into all the detail around these kinds of formulas in a future article.  Select the sum function in the drop down in the Total Row for the Occupancy column as well to show the total room-nights occupancy.

Room Occupancy for week

With the techniques already covered, it wouldn’t be a huge leap to start summing the revenue received and cross-referencing that against expected incomes, taking into account discounts given and so on.  Obviously, there are many extras that the manager will be adding to this data model – Room Service, Restaurant and Bar bills, Gym and Health facilities, Excursions etc.

Room Occupancy for week - enhanced

 

Microsoft Power Query – Quick & Easy Data Visuals

As an example of what can be achieved quickly with Microsoft Excel Power Query and Pivot Charts, I created a simple but effective visual alternative to the English Premier League table using some web-based match data.  Here is how it looks after twelve matches into the 2018/19 season.

Premier League Positions Graphic

The green-shaded area part of the chart shows the points accumulated so far for each team.  The values are interpreted from the Y-axis to the right of the chart.  The teams are sorted in descending order according to the points they have achieved (i.e. league table position).  Man City are topping the league (32 points) and Fulham are propping things up with 5 points.

The stacked-columns show matches won (blue part), drawn (yellow part) and lost (green part).   The value for these columns follow the scale on the Y-axis on the left hand side of the chart.  Because the teams are shown in descending league table position, as you would expect, the size of the blue columns gets progressively smaller whilst the the green columns grow in size – that is, teams at the top of the league have won more games and those at the bottom have lost more.  The visual is very effective – at a glance it is easy to see that three teams – Man City, Liverpool and Chelsea – have not lost any matches as they have no green areas in their stacks.  Only one team has no draws so far this season – Tottenham.  At the other end of the table, although Fulham have won as many games as Southampton, they have lost more whilst Southampton have managed more draws leaving them a little higher up the table.

I also added some lines to the chart  – for goals scored and goals conceded.  And a nice extra one for clean sheets – that is, the number of matches in which the team didn’t concede any goals.  These all also use the scale on the Y-axis shown on the right hand side of the chart.  As you would expect, the league leaders have the most number of clean sheets but, some of the teams lower down the league are doing pretty well – Newcastle and Southampton for example.  They just need to start scoring a few more goals.  Fulham have the most leaky defense – they are the only team who haven’t managed a clean sheet so far into the season.  Observe how the lines for goals scored and goals conceded cross at Wolves thus splitting the teams nicely into two equal groups of ten – those whose goal difference (goals scored minus goals conceded) is positive versus those whose goal difference is negative.

This is an excellent example of a custom combo chart in Excel.  It combines three chart types – stacked column, area and line charts.  It uses two Y axes, ideal for when the range of values of constituent data wouldn’t work with just one Y-axis.  For example, in this chart the maximum number of games played is just twelve so the Y-axis on the left has a maximum value just beyond that whereas the maximum number of goals scored is 36 so the Y-axis on the other side of the chart has a scale going right up, and just beyond that value.  If all the parts of the charts used just that latter scale the stacked columns would be very small and the chart would not be very visually pleasing at all.

So, how was the visual made?  As I mentioned at the beginning I used Microsoft Power Query to grab the data from a web resource.  Here is how Microsoft themselves describe Power Query

Power Query is the Microsoft Data Connectivity and Data Preparation technology that enables business users to seamlessly access data stored in hundreds of data sources and reshape it to fit their needs, with an easy to use, engaging and no-code user experience.

The data came from a web source – primarily the historical data section of football-data.co.uk (“Free football betting odds & results data archive, live scores, match statistics, odds comparison, betting advice & football news”).  I won’t go into the details of how to get the data into Excel – that is documented elsewhere (docs.microsoft.com/en-us/power-query/).

Once the data was in it just needed a small amount of modelling.   The data needed to be team-based rather than match-based so I referenced the original query twice to create new queries; one for home games and the other for away games.  I then added custom columns so that home teams, away teams and season names could be identified – I didn’t need all of that specific information for now but, in a full reporting or dashboard solution, I certainly would – it’s quick and easy to do so I went ahead ready for the anticipated use later on.  Finally, the home games and away games queries were appended to create a single query for all games in the data model.

Actually, because I also knew that users would be interested in cutting this data by season, although again not used here, I imported some data from another web source – the very interesting kaggle.com (“Kaggle is the place to do data science projects”).  A data-set is available there in which all the historical data from football-data has already been combined by another user.  With a little more modeling I was able to append this to the rest of my data.

The final part of the modelling process was to add “Measures” for the games played, won, drawn, lost etc.   Measures (also known as calculated fields) are one of the most powerful features in Power Pivot. You will use them frequently in your data analysis. They are calculations you create for measuring a result relative to other factors pertinent to your analysis, such as total sales calculated for time, geography, organization, or product. (Create a Measure in Power Pivot).

And now for the visualization – PivotCharts.  Sometimes it’s hard to see the big picture when your raw data hasn’t been summarized. Your first instinct may be to create a PivotTable, but not everyone can look at numbers in a table and quickly see what’s going on. PivotCharts are a great way to add data visualizations to your data.  (Create a PivotChart).

And one final but significant point to add.  Excel Power Queries are refresh-able.  The latest data can be imported without having to recreate the query.  I’ll click the refresh button in a few weeks to see how things look then.  I’ll append the new visual to this post.