Pages

Tuesday, November 27, 2012

Classifying and Solving the Events in Progress Problem in PowerPivot

Last week, I was going through the PowerPivot forums as usual and was answering a question on How to handle count of data with “valid to/from” dates in DAX. This was a classic case of the “Events in Progress” or finding the number of active events between a start date and an end date (In case you haven’t read, Chris Webb (blog | twitter) has got some great posts on the Events in Progress solutions using both DAX and MDX). I replied back with a possible solution and soon, Marco Russo (blog | twitter) came up with a much more elegant solution. But more than the solution, what caught my eye was that our results were not matching and after a few email exchanges with Marco (and a big thanks to Marco for taking time off to answer my questions), we came up to the conclusion that it was because of the difference in our interpretations of the Event in Progress. That is when I thought of classifying the different interpretations of the “Events in Progress” as well as extending Marco’s solution for the same.

image

In a typical Events in Progress problem, there are fact rows with a start and end date and each row denotes an event. The requirement would be to find the number of events that were in progress on any given date. At a date level, there is no ambiguity as the number of events in progress for a particular day, say D would be the number of those events whose start date is less than or equal to D and the end date is greater than or equal to D (Actually, you can further sub-categorize this into whether the start dates and end dates are inclusive or exclusive i.e. whether the start date is less than or less than or equal to D and whether end date is greater than or greater than or equal to D. For the sake of simplicity, we are assuming that both start and end dates are inclusive). However, the moment you start analysing on a level above the dates, say a month or a year, the question arises on which date of the month to use for comparing the start and end date. Should we be using the first date of the month? Or the last date of the month? Or should we be using a combination of both? There is no wrong answer here and the right definition completely depends on the business requirement. So let us look at the four possible combinations that are possible:-

image

To give an example, say we are analysing the events in progress for January 2012. For Scenario S2, this would mean the number of events which have a From Date (or start date) less than or equal to January 1 2012 (which is the minimum date for January 2012) and To Date (or end date) greater than or equal to 31 January 2012 (which is the maximum date for the January 2012). Here also, we can further sub-categorize whether the start and end dates are inclusive or exclusive but this is not done for the sake of simplicity and also because our solution can easily be modified by removing the equal to symbol.

It might be easier to understand the relationship between the four scenarios with the help of a Venn Diagram.

image

As you can see, Scenario S3 is the superset and includes any event that was active on any date in the selected time period. Scenario S1 requires that the event started before the time period but can end during any day on or after the time period. Conversely, scenario S4 requires that the event started on or before the time period and ends on any day after the time period. Scenario S2 is the intersection of the scenarios S1 and S4 and requires that the event starts before the time period and ends after the time period. This means that those events starting or ending in the selected time period will not be counted.

Lets have a look at the four scenarios from a different visualization perspective.

image

Now that you have got a good idea of the different scenarios, let us look at some sample data that we can use for testing the solution. Given below is a simple fact table which contains Order Numbers with a From Date and To Date and also the Order quantity.

image

The PowerPivot data model that we are going to use is given below

1 PowerPivot model

I just have a simple calendar table (where I have just populated days from 2011 to 2013, again for simplicity) and the above Orders fact table. It is interesting to note that we will not be creating any relationships between the two tables as our solution does not require them. Now follow the steps below to get the solution for the four scenarios:-

1) Before we start, it might be a good exercise to review the original formula that I wrote and why we are moving to a different solution. The original formula I wrote was

OldMsr:=Calculate (
                  CountRows ( Fct ),
                 
Filter (
                           Fct,

                         
CountRows (
                                          Filter (
                                                  Values ( Calendar[DateKey] ),
                                                  ( Calendar[DateKey] >= VALUE ( Fct[FromDate] )  ) && ( Calendar[DateKey] <= VALUE ( Fct[ToDate] ) )
                                                  )
                                            )
> 0
                              )
                      )

To explain this formula, I am dividing it into 3 parts. The red highlighted area gives us the count of the fact rows which is the count of events in progress, provided the correct set of rows in the Fact table is passed as a table filter. Now to pass the correct table filter, we need to filter the fact table which is the part highlighted in orange and the condition to filter it is highlighted in yellow.  Since we don't have any relations between the tables, what I am doing is to filter all the fact rows which has at least one row where there is a date greater than the From Date but less than the To Date. When we are analysing by month, what this will mean is that even if there is a single day in that month which satisfies the condition, include that fact row. This is exactly similar to our scenario S3.

Now some of the problems associated with this:-
- Formula is difficult to understand and write
- Formula can’t be easily changed to that of other scenarios
- Formula doesn’t perform well

2) Now let us look at the formula which was given by Marco in the forums

S3:= Calculate (
  CountRows ( Fct ),
  Filter (Values ( Fct[FromDate] ), Fct[FromDate] <= Max ( Calendar[DateKey] )  ),
  Filter (Values ( Fct[ToDate] ), Fct[ToDate] >= Min ( Calendar[DateKey] )  )
                 )

You can already see how simple it is to understand. We are just passing the From dates which are less than the max of the time period and those To Dates which are greater than min of the time period as a table filter to the calculate statement and there is no need to filter the fact table explicitly. Not only that, it performs much better as it doesn’t have to parse through the entire fact table and just has to parse through two tables having the distinct values of From Date and To Date respectively (Always remember, it is faster to parse through two smaller tables than one big table). Now the best part is that this can be easily extended to the other scenarios as shown below

S1:= Calculate (
CountRows ( Fct ),
Filter (Values ( Fct[FromDate] ), Fct[FromDate] <= Min ( Calendar[DateKey] ) ),
Filter (Values ( Fct[ToDate] ), Fct[ToDate] >= Min ( Calendar[DateKey] ) )
)

S2:= Calculate (
CountRows ( Fct ),
Filter (Values ( Fct[FromDate] ), Fct[FromDate] <= Min ( Calendar[DateKey] ) ),
Filter (Values ( Fct[ToDate] ), Fct[ToDate] >= Max ( Calendar[DateKey] ) )
)

S4:= Calculate (
CountRows ( Fct ),
Filter (Values ( Fct[FromDate] ), Fct[FromDate] <= Max ( Calendar[DateKey] ) ),
Filter (Values ( Fct[ToDate] ), Fct[ToDate] >= Max ( Calendar[DateKey] ) )
)

Wasn’t that simple?

3) Now let us have a look at the results to make sure everything is coming correctly.

image

I have filtered for January 2012 and you can see that the dates in that table are showing the same values for all the five measures. Let us pick 10 January 2012 as an example. Only 11001, 11003 and 11004 suffice the conditions and hence the measures are all 3. The results come up correctly even if we look at a month or year level.

image

Let us take January 2012 as an example, so the max will be 31 Jan 2012 and minimum will be 1 Jan 2012. The order numbers which match the given conditions are given below and matches with the measure counts

S1 = 11001, 11002, 11003
S2 = 11001, 11003
S3 = 11001, 11002, 11003, 11004, 11005, 11006
S4 = 11001, 11003. 11006

4) Another benefit of this pattern is that it can easily be applied to other problems where you might need to find the sum or distinct count or any other aggregations between a from date and to date. Let us say, if I want to find the sum of Order Quantity between the from date and to date according to scenario S1, all I need to do is as follows

S1Sum:= Calculate (
Sum ( Fct[OrderQty] ),
Filter (Values ( Fct[FromDate] ), Fct[FromDate] <= Min ( Calendar[DateKey] ) ),
Filter (Values ( Fct[ToDate] ), Fct[ToDate] >= Min ( Calendar[DateKey] ) )
)

If you want to try this on your own, feel free to download the completed excel file from here.

Friday, November 23, 2012

Changing PivotTable Names in Excel 2013 + Bug Alert

Today is Black Friday here in the US and I don’t have time to write a regular post (as I am scourging for deals online and planning to get out for shopping as soon as the insane lines outside the stores dwindle in numbers). So this is just going to be a quick tip for those who have tried upgrading their PowerPivot models with linked tables from Excel 2010 to Excel 2013 and found out that their PivotTable names don’t match anymore.

1 Changing PivotTable names in Excel 2013

For those who are not familiar, let me give an account of the issue. If you upgrade a PowerPivot model which has linked tables from Excel 2010 to 2013, you will notice that the table names in the field list don’t match the table names in your model. As per my testing, this doesn’t break the measures that you might have created using the old table names (as the measures still use the table names in the data model and this name change looks to be more of a front-end display name) but might be a real inconvenience. Follow the steps below to reproduce and solve the issue:-

1) For the purpose of this demo, let us use the PowerPivot file that I created in Excel 2010 for my last post - Measure Selection using Slicers in PowerPivot. You can download it from this link. If you open it in Excel 2010, you can see the original table names in the field list as well as the linked tables used to create the model above.

2 Table names in Excel 2010

2) Now close the file and then re-open in Excel 2013. Julie Koesmarno (blog | twitter) has put a step by step post to upgrade this workbook from 2010 to 2013 here. Also have a look at this post – Upgrade PowerPivot Data Models to 2013 for any potential issues while upgrading your workbook. Now once you have done that, you will see that all the table names have changed in the field list.

3 Changed table names in Excel 2013

However, there is no change to the table names in the underlying PowerPivot data model.

4 Name in PowerPivot model

3) To change the name in the PivotTable field list, all you have to do is to select a cell in the respective linked table, click on Design and then enter the original / required name in the Table Name cell as shown below

5 Rename table 

4) You can see that the changes are reflected in the PivotTable field list now.

6 Name change in Excel 2013

You could avoid doing these steps in Excel 2013 if you had renamed your linked tables in Excel 2010 using the same way before upgrading. If not, it will just take the name of the linked table when you upgrade.

Apparently, this is a known issue and will be fixed in the next release as per this thread.

BUG ALERT!!!

If you paid close attention to the file that you just upgraded, you can notice that the slicer for measure name is not working. Now this seems to work for people who have installed the Office Professional Plus (v 15.0.4420.1017) but for people like me who have installed the Excel Preview (v 15.0.4128.1025), the slicers don’t work. Turns out the reason was that FORMAT function doesn’t play well within a measure expression. All you have to do is to change the measure expression for MsrValue from

MsrValue:=switch (TRUE,
Min ( Msr[MsrId] ) = 1,
FORMAT ( Sum ( Fct[Sales] ), "$#,#0" ),
Min ( Msr[MsrId] ) = 2,
format ( Sum ( Fct[Quantity] ), "#,#0" ),
Min ( Msr[MsrId] ) = 3,
format ( Sum ( Fct[Profit] ), "$#,#0" )
)

to

MsrValue:=switch (TRUE,
Min ( Msr[MsrId] ) = 1,
Sum ( Fct[Sales] ),
Min ( Msr[MsrId] ) = 2,
Sum ( Fct[Quantity] ),
Min ( Msr[MsrId] ) = 3,
Sum ( Fct[Profit] )
)

Of course this does mean that you lose the formatting. Now you can see that the slicers are working. This clearly is a bug as it works in Excel 2010 (as well as in the Professional Plus edition of Excel 2013). I have raised a Connect issue here, please vote for it so that it can be rectified in the next release. Ok, time to run for the Black Friday shopping now!

Wednesday, November 21, 2012

Measure Selection using Slicers in PowerPivot

As a Business Intelligence professional, I find it really fascin(/frustr)ating working with Excel users. You get exposed to a completely different way of thinking when you work with them, not to say that they are extremely smart and demanding. From the moment I started working with PowerPivot and interacting with Excel users, I knew it was not going to be easy tackling their questions and requirements. You simply cant escape by saying that the required feature is not available because they will push you till you find a workaround. This was the case when one of my users came back to me asking for a slicer which will dynamically help them to select the measures. I replied saying that the slicers are not intended for that purpose and should be used for filtering the data (and not for just selecting or deselecting values like Measures, which can be easily done from the Field List pane). However, the user pushed back saying that he would like to create a dashboard for the top management and didn’t want them to be using the Field List pane. That is when I had to put on my thinking cap.

1 Measure selection using slicers in powerpivot

For the purpose of this post, I am going to use some simple mocked up data which involves three measures (Sales, Quantity and Profit) for country and year.

2 Data model and sample data

Follow the steps below to recreate a slicer which can be used to select or deselect the measures in a pivot table:-

1) Create a table which will have the list of measure names needed in the slicer in the MeasureName column and their representative integer ids in the MsrId column. Import this table into PowerPivot using Linked Tables and name it as Msr.

3 Create Linked table in powerpivot

There is no need to make any relations with the existing tables, and Msr table should be a standalone table.

2) Now make a new measure called MsrValue with the formula given below

MsrValue:=switch (TRUE,
Min ( Msr[MsrId] ) = 1,
Sum ( Fct[Sales] ),
Min ( Msr[MsrId] ) = 2,
Sum ( Fct[Quantity] ),
Min ( Msr[MsrId] ) = 3,
Sum ( Fct[Profit] )
)

4 New measure

This measure will check the value of the MsrId column, and then use the appropriate measure. However, it should be noted that when there are more than one value of MsrId (or if more than one measures are selected), only the measure having minimum value of MsrId would be displayed. So how will we display more than one measures on the pivot table? Read on.

3) With this very simple setup, we are ready to view the results in the Pivot table. Drag and drop MeasureName column into the Slicer, Year into the Column Labels and Country in the Row Labels. Then drag and drop MsrValue in the Values and we should have something similar to the image below. I have also included the fact rows above the pivot so that we can compare.

5 Drag column to pivot table and insert slicer

Now there are a couple of problems with this pivot table. You would have noticed that the pivot table works correctly if only one measure is selected, but if there are more than one selections for the measure name, it will only show the measure with the minimum of the MsrId (eg, Sales in the image above). Also, we would like to see the name of the measure in the column above for clarity. To solve both of these problems, follow the next step.

4) Just drag and drop the MeasureName column in the Column Labels above the Year.

7 Field list pane

Now you can see that multiple measures are displayed and also that the measure names are displayed in the column above.

6 Measure getting selected in pivot table as per the slicer selection

You can also change the formatting of the measures if needed

MsrValue:=switch (TRUE,
Min ( Msr[MsrId] ) = 1,
FORMAT ( Sum ( Fct[Sales] ), "$#,#0" ),
Min ( Msr[MsrId] ) = 2,
format ( Sum ( Fct[Quantity] ), "#,#0" ),
Min ( Msr[MsrId] ) = 3,
format ( Sum ( Fct[Profit] ), "$#,#0" )
)

8 formatted measures

Luckily, that wasn’t as hard as I thought (which means that I could still use my evening free time to focus on more important problems like whether the chicken or egg came first into the world!). If you don’t have such important problems to focus on like me, you can download the completed file from this link and test it out on your own!

Sunday, November 18, 2012

PASS Summit 2012 Hangover

November has always been my favourite month of the year though I don’t really have a clear cut reason for that. Maybe it is because that my birthday falls in this month and I look forward to opening all those lovely gifts that I usually get. Or maybe because it is the time of the year when you get to see nature in her splendid fall colours and her transition to winter. Or could it be the simple reason that you get 50% off Halloween candy? (in case you didn’t know, I have a BIG sweet tooth) Anyways, I always look forward to November eagerly. Moreover, this year’s November was extra special -  I was going to attend my first PASS Summit.

Summit2012_header_passlogo

I started my baby steps in Microsoft BI six years back although I was blissfully ignorant of whatever community activities was happening then. I have always been an inquisitive mind (or a heckler, depends on whom you ask) and used to ask a lot of questions. Usually, my technical doubts were answered by the senior consultants in my company, and it was by chance that I got redirected to the MSDN forums (the truth is, they were tired of my constant questions). I think that was when I started getting cognizant of the SQL world and the community. I asked more questions, started recognizing the big names in the BI world, read more blogs by them and before I knew it, I was hooked onto this technology. I started answering questions in the forums apart from just asking them and I really enjoyed the kick that you get out of helping people. It also fuelled my personal growth and soon, I decided to share my learning by starting a blog. Fortunately for me, that was the time I moved to London, UK and I got a first real taste of user groups. I started attending the bi-annual SQLBits conference and it sort of inspired me to start speaking. However, I had never given a talk before (not even in my company) and my bouts of stage fright were legendary in my friend circles. But when you wish for something really hard, the entire universe conspires to make it happen. I decided to mail Chris Webb (Blog | Twitter) asking him if I could speak on one of the user groups and he set me up for my first talk (I don’t think I would have ever had the guts to ask again if I was rejected then). He also encouraged me to submit my abstract for SQLBits (which eventually got picked) and I slowly started getting a little more confident about public speaking. This was one of the many pushes forward that I got from my #sqlfamily (oh yes, that is what they are to me now, not just a community). When I moved to the US, I started submitting to the SQL Saturdays with the encouragement of many people in the community, and I am proud to say that the Washington DC one is going to be my 10th SQL Saturday this year (and this is apart from the user group, code camp and virtual chapter talks that I gave). For a person who would have his knees shake if he was standing and speaking in front of 4 people, this definitely was a big achievement and I have my #sqlfamily to thank for. For me, this cooperation and mutual encouragement was best epitomised in SQL Saturday New York City. I was the next speaker after Kevin Kline (Blog | Twitter) and just when I was setting up my laptop, I realized that I didn’t have my HDMI-to-VGA connector for projecting (and I could have sworn that I had checked it before leaving the speaker room, must have dropped it somewhere). I was totally freaked out - none in the speaker room had a converter and I had the demos only on my laptop. I was almost on the verge of cancelling my session. As a last try, I asked Kevin, who was on his way out, whether he could lend me his laptop so that I could try to use his laptop (which had a VGA port) for setting up a live meeting between mine and his laptops, and then projecting the demos from his laptop. Not only did he agree, but he came to my room to calm down my nerves, helped me in setting up the laptops and was throughout my side for the entire session. The session went great and I had lots of my attendees saying that it was really good. If it wasn’t for Kevin, I would have had an embarrassing situation and he saved my day (and to put things to perspective, he barely knew me!!!). I couldn’t even say thanks properly to him because he was already late for some other appointment and had too rush off. That is how selfless people in the #sqlfamily are and I really wanted to be a part of the biggest family reunion of the #sqlfamily – the PASS Summit.

I reached Seattle on Tuesday afternoon and straightaway went to the Chapter Leaders meeting where we discussed on how we can make our user groups more effective. It was very useful to hear from other chapter leaders on how they are organizing their user groups in terms of finance, attendees, marketing, speakers, etc. I also made sure to attend the #sqlFirstTimers meeting where all of the first time attendees were given advice on how to network and make the most of the PASS Summit.

PASS Photos1

Over the next three days, all the attendees were presented with a rich selection of sessions across all tracks and difficulty levels. I chose the Intermediate to Advanced topics for Business Intelligence and was truly happy with the results. It was great to attend sessions by Chris Webb, Marco Russo, Alberto Ferrari, Stacia Misner, Allan Folting & Akshai Mirchandani, Paul Turley, Melissa Coates, etc and I learned a lot from these great speakers. It was very difficult making a choice between all the great sessions and I hope to catch up with the rest of the sessions through the recordings.

PASS Photos2

As I mentioned before, the PASS Summit was also a place to meet my #sqlfamily. There were a lot of people I had met over twitter and it was a fabulous opportunity to associate faces to those twitter handles. I was also able to reconnect with my friends from UK and India and it was so much like meeting your family again after a long interval.

PASS Photos3

The various parties after the sessions kept me busy in the night too. It was awesome getting my first dose of #sqlkaraoke and I was up all the 4 nights – thanks to the Quizball event by SQL Sentry, Hard Rock Cafe event by Pragmatic Works, the Experience Music Project event by Microsoft and of course, the traditional #sqlkaraoke event at Bush Gardens.

PASS Party Photos

I also made sure to stay a couple of days after the event to explore the city as it was my first time in Seattle. I must say the weather gods took mercy on us and we had some nice sunshine, so unlike Seattle Smile.

PASS Photos4

All in all, it was a great trip and it was hard to bid goodbye to the city when it all ended. People usually say that when you go somewhere with a lot of expectations, you usually end up with disappointment. However, the PASS Summit 2012 was above my expectations and I really savoured every moment of it. Guess it is time to end my long post. Hope to see all of you guys for the next PASS Summit, which will be held in the beautiful city of Charlotte. It will be much easier for me as I am based out of Charlotte and I will be there to welcome all of you guys to our annual family reunion!

Monday, November 5, 2012

SQL Saturday Charlotte Journal

Was it fun? Oh yeah, it was. Was it perfect? Hmm, maybe not. But was it awesome? Definitely yes! Let me tell you, it was an amazing experience being a part of the whole process. SQL Saturday Charlotte was my 9th SQL Saturday of this year, but there was a big difference from the other ones – this time, I was wearing the Organizer hat and not the Speaker hat.

logo1

It all started around March of this year when me and Javier Guillen (blog | twitter) were seriously contemplating on starting a BI user group in Charlotte. We pulled in our common friend, Melissa Coates (blog | twitter) to this plan and she in turn informed us that Rafael Salas (blog | twitter) was also trying to do the same. We joined forces and this marked the start of the evening #BeerAndBI rituals, where we started to come up with ideas and plans to start our new user group.

Beer and BI

In one of those sober (?) rituals, the SQL Saturday idea was introduced and then there was no turning back from there. We informed Karla Landrum (twitter) of our intentions and she was extremely supportive. We started with checking venues around June, and I must say we were really lucky to have Alberto Botero of Central Piedmont Community College as our friend. He acted as a bridge between the college and us, and ensured we had a venue to start off with. We confirmed the dates for SQL Saturday with Karla by June end and started our preparations. Having a big team has its own advantages, it was easier to split tasks without over-burdening anyone. Rafael was in charge of the sponsors and food section, Javier was in charge of finance and venue related section, Melissa was in charge of social media, website, emails, general coordination, etc and me in charge of recruiting speakers and making the schedule. Even though we were assigned specific tasks, it was great to see everyone pooling in their efforts wherever they could. We also got Shannon Lowder (blog | twitter) to help us coordinate all the volunteers. And when the day finally came for setting up the venue on the day before, there were lot of unforgettable moments, for example stuffing the bags on the eve of the event with Javier, Rafael setting off the fire alarm when we were already running late for the speaker dinner, setting up the tables and the cute little raffle boxes made by Melissa, last minute printing with the help of Shannon’s iPhone, etc. The speaker dinner was well attended and was fabulous, we even managed to have a few shots before we left.

Event pics

On the D-day, we reached the venue around 6:45 am in the morning and from that time on till the afternoon, we were so busy that we couldn’t even speak much with each other. I was always of the mentality that we had planned more than necessary and things will flow pretty easily during the event. I couldn’t have been more wrong and it really took a lot of effort from our as well as our volunteers’ side to make sure everything went well. This made me appreciate the great work done by all our volunteers and organizers of past SQL Saturdays all the more. At the end of the day, it felt great to hear all the positive responses about the event from our attendees. A big thanks to all our sponsors, speakers and volunteers who made this possible.

Some of the volunteers of #sqlsat174

It was fun and although it took me around a week to recover completely, I sort of miss all the madness involved in planning a SQL Saturday right now. Anyways, I am sure there will be much more to it with the start of the new BI group in Charlotte.

CBIG

Time to pack up my bags for the PASS Summit and reward myself with a relaxing week at Seattle!