Pages

Thursday, June 7, 2012

LastNonEmpty in Tabular mode : Part 1

My experience as a consultant has mostly been in the manufacturing sector and no wonder, I had frequent encounters with the notorious LastNonEmpty aggregation type in SSAS. I wouldn’t be exaggerating if I say that I had struggled quite a bit to understand and adapt this aggregation type to my needs when dealing with inventory, stock and status related data. I had dealt with varying scenarios but never thought of classifying these findings until a chance discussion with Javier Guillen (blog | twitter) prompted me otherwise. Together, we went through the scenarios and tried classifying them as well as applying them to the tabular mode and this post will be the first part of that effort.

image

For the sake of simplicity, we adapted our discussions to the manufacturing scenario and the key dimensions used in the explanation below would be Time, Store and Product. However, this can be easily translated to other scenarios where the LastNonEmpty might be required. The LastNonEmpty (LNE) scenarios can be broadly classified into 2:- the typical Last Non Empty behavior of SSAS (where you would want to find the last non empty value within the selected time period) and the Last Ever Non Empty (credit for term usage - Chris Webb in this post) behavior (where you would want to find the last non empty value from all preceding time periods instead of just the selected time period). These types can be further broken down to at an overall level or at a store level as is shown in the chart below.

 image

Let me briefly try to explain the four scenarios with an example.

image

Assume that S1 - S8 are stores and C1, C2, C3, C4 are their parent groups as shown in the figure. The stock for each of the stores as well as the date in which they were recorded is also shown above. Now, the 4 LNE scenarios are given below:-

1) Last Non Empty(All Stores)

for C1 in 2012: 5+5 = 10 (last non empty date is 31-May-2012)
for C2 in 2012: 25 (last non empty date is 12-May-2012)
for C3 in 2012: None (no value in 2012)
for C4 in 2012: None (no value in 2012)
for Total in 2012: 5+5 = 10 (last non empty date is 31-May-2012)

This is equivalent to the LastNonEmpty aggregation type in SSAS multi-dimensional, where the last non empty date is calculated across all the stores within the selected time period and the values in that date are summed up. 

2) Last Non Empty(Each Store)

for C1 in 2012: 5+5 = 10 (last non empty date is 31-May-2012 for both stores)
for C2 in 2012: 10+25 = 35 (last non empty date is 12-Jan-2012 for S3 and 12-May-2012 for S4)
for C3 in 2012: None (no value in 2012)
for C4 in 2012: None (no value in 2012)
for Total in 2012: 5+5+10+25 = 45 (sum of all the last non empty dates for each store)

Here, as shown in the example, the last non empty value is calculated for each store within the selected time period and then summed up. The equivalent of this aggregation type in SSAS multi-dimensional has already been explained by Teo Lachev in this post.

3) Last Ever Non Empty(All Stores)

for C1 in 2012: 5+5 = 10 (last non empty date is 31-May-2012)
for C2 in 2012: 25 (last non empty date is 12-May-2012)
for C3 in 2012: 15 (no value in 2012, but there is value in 12-Dec-2011 for S6)
for C4 in 2012: 10 (no value in 2012, but there is value in 12-Dec-2011 for S8)
for Total in 2012: 5+5 = 10 (last non empty date is 31-May-2012)

Here, the last non empty date is calculated across all the stores from the very first date in the database to the last date of the selected time period and the values in that date only are summed up. Note that the last non empty date need not belong to the selected time period here, as is seen from the values for C3 and C4. The equivalent of this aggregation type in SSAS multi-dimensional has already been explained by Chris Webb in this post.

4) Last Ever Non Empty(Each Store)

for C1 in 2012: 5+5 = 10 (last non empty date is 31-May-2012)
for C2 in 2012: 10+25 = 35 (last non empty date is 12-Jan-2012 for S3 and 12-May-2012 for S4)
for C3 in 2012: 10+15 = 25 (no value in 2012, but there is value in 12-Nov-2011 for S5 and 12-Dec-2011 for S6)
for C4 in 2012: 5+10 = 15 (no value in 2012, but there is value in 12-Nov-2011 for S7 and 12-Dec-2011 for S8)
for Total in 2012: 10+35+25+15 = 85 (sum of all the last non empty dates for each store)

Here, the last non empty date is calculated for each store from the very first date in the database to the last date of the selected time period and the values in that date are summed up for each store. Note that the last non empty date need not belong to the selected time period here, as is seen from the values for C3 and C4. It is not possible to have a really well performing method in SSAS multi-dimensional for this aggregation type without insanely extrapolating the fact data; this leaves us with only the extremely slow choice of making a calculated measure.

DAX is very powerful for performing such calculations and by the end of this series, you will see the performance gains that comes with having these scenarios in SSAS tabular. Let me go ahead and give the solutions for the first two types of Last Non Empty in DAX. I am using the Contoso database for my examples and they can be downloaded from this location.

1) Last Non Empty(All Stores)

An example of this scenario is when the product stock for all stores are recorded on a regular interval on the same date (daily, weekly, monthly, etc). As long as all the stores have data for the same dates, this formula will work in calculating the last non empty value. This is the simplest calculation of all the four and can be done using the DAX formula below

LNE_1:= Calculate (
Sum ( FactInventory[OnHandQuantity] ),
Filter ( DimDate, DimDate[DateKey] = Max ( FactInventory[DateKey] ) )
)

or

LNE_2:= Calculate (
Sum ( FactInventory[OnHandQuantity] ),
LastDate ( FactInventory[DateKey] )
)

Performance wise, the results of both of them came back pretty fast and at almost the same time.

Perf_LNEAllStores

It is interesting to note that we are using FactInventory[DateKey] to get the last non empty date at which data is present. If we substitute the date key from the fact with the dimension in the above formula, we will get the DAX formula equivalent to the LastChild aggregation type in SSAS multidimensional.

LastChild:= Calculate (
Sum ( FactInventory[OnHandQuantity] ),
LastDate ( DimDate[DateKey] )
)

2) Last Non Empty(Each Store)

Now, this scenario is used when the stock is recorded for all stores on a regular interval but not necessarily on the same date for all the stores (However, it is assumed that the stock for all the products in a store would be taken on the same day). For eg, the stock for all the stores may be recorded monthly, but the stock would be recorded for some stores on the first week of the month, and some may be on the third or last week of the month. There are different ways in which you can write this calculation, and some of the ways I could find is written below:-

LNE_St_1:=
SumX (
Values ( DimStore[StoreKey] ),
Calculate (
Sum ( FactInventory[OnHandQuantity] ),
LastNonBlank (
FactInventory[Datekey],
1 )
)
)

or

LNE_St_2:=
SumX (
Values ( DimStore[StoreKey] ),
Calculate (
Sum ( FactInventory[OnHandQuantity] ),
LastDate ( FactInventory[Datekey] )
)
)

If you had read these posts (1 and 2), you might notice the recurring use of the pattern SumX(Values()) explained in those posts in both of the above formulas. You can also substitute the SumX(Values()) pattern with a SumX(Summarize()) pattern as shown below

LNE_St_3:=
SumX (
  Summarize (
    DimStore,
    DimStore[StoreKey],
    "LNE_P",
   Calculate ( Max ( FactInventory[DateKey] ) )
  ),
  Calculate (
    Sum ( FactInventory[OnHandQuantity] ),
    Filter (
      All ( Dimdate ),
      DimDate[DateKey] = [LNE_P]
    )
  )
)

The performance of all the formulas listed above were really good (all of them came back in 1 second). But the moment you add one more dimension to the formula (for eg, you need to find the last non empty for each store and product combination), there seems to be significant differences between the performance and I am currently doing some more testing on them (Well, might be a topic for another blog!). Meanwhile, you can go ahead and read the next part of this series in Javier’s blog.

Reference & Interesting Reads

1) Chris Webb - Last Ever Non Empty – a new, fast MDX approach
2) Teo Lachev - Last Non Empty Affairs
3) Jason Thomas - Changing Granularity of Leaf Level Calculations in SSAS Tabular
4) Javier Guillen - SCOPING at different granularities in DAX (Part I)
5) Paul te Braak - The Last Non-Empty Dilemma TABULAR Style
6) Hilmar Buchta - Sparse Snapshots in DAX / BISM Tabular

20 comments:

  1. Thanks!
    You helped me solve Debts Closing Period problem!
    http://social.msdn.microsoft.com/Forums/en-US/sqlkjpowerpivotforexcel/thread/a9904962-ed5d-4c7c-9ee4-88b506732e5c
    Michael Shparber

    ReplyDelete
  2. If more facets of the SSAS/SSDT world were explained like this, the world of data mining would be a much happier place... Thanks!

    ReplyDelete
  3. Hello,
    Any idea how to get the value for the lastnonempty "previous day"? For example, let's say I want to compare the current day to the previous day only if there is a value for the previous day, i.e. exclude holidays and weekends. I'm using the following DAX to get the previous day:
    Previous Day Net Fees:=Calculate([Net Funded Fees],PREVIOUSDAY(DimDate[FullDate]))

    Tried adding a firstnonblank but it didn't work.

    Thanks,
    Ben

    ReplyDelete
  4. Hi Jason,

    The DAX I used below only sums the last value for accounts that have records in the pivot date context. Hence, the sum is equivalent to the sum without a derived DAX calculation.
    Any thoughts?
    SUMX (
    VALUES ( DimAccount[CPAccountID] ),
    CALCULATE (
    SUM ( vwCPAccountRevenueRecognition[DeferredIncomeCP] ),
    LASTNONBLANK (
    vwCPAccountRevenueRecognition[RecognitionDate],
    CALCULATE ( COUNTROWS ( vwCPAccountRevenueRecognition ) )
    )
    )
    )

    ReplyDelete
    Replies
    1. If you want the sum outside the PIVOT date context, you will need the pattern for Last Ever Non Empty (link at the end of the blog). It is important to understand the difference between the 4 different patterns and determining which pattern you need for your scenario.

      Delete
    2. Thanks, but how do you do it with DAX? Chris Webb's solution uses MDX.

      Delete
    3. As I mentioned before, Part 2 of this blog is written by Javier - https://javierguillen.wordpress.com/2012/06/09/lastnonempty-in-tabular-mode-part-2-last-ever-non-empty-calculations-in-dax/

      This should give you the Last Ever Non Empty pattern in DAX. Maybe I should make the Part 2 links more obvious :)

      Delete
  5. Yeah, when I try his solution I get “table of multiple values was supplied where a single value was expected” even though I have the same data structure, but using accounts instead of stores. The irony is that it's a simple query with T-SQL.

    ReplyDelete
    Replies
    1. You shouldn't get that. Can you paste the DAX formula that you are using (with the way that Javier explained)?

      Delete
  6. Here's the calculation that does work ONLY if I select "year" for the row values. Previously I was using a calendar hierarchy so that makes sense. But I need the lastevernonemptydate (ultimately value) at the day level.
    LastEverNonEmptyDate:=IF(HASONEVALUE(DimAccount[CPAccountID]) , MAXX(
    SUMMARIZE
    (
    DimAccount,
    DimAccount[CPAccountID]
    ,"LastEverNonEmptyDate", CALCULATE( MAX( vwCPAccountRevenueRecognition[RecognitionDate])
    , DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear])
    ) )
    , [LastEverNonEmptyDate] ) )

    Thanks,
    Ben

    ReplyDelete
    Replies
    1. Ultimately, I don't need to group it by accounts but I need the aggregate value of all accounts at a daily grain. So the final pivot would have a calendar hierarchy populated with the total balance from all accounts based on the lastevernonemptydate. And then if necessary the specific account could be filtered by using a slicer. Looks like Javier's solution is designed to include the dimension attribute, aka stores, included in the pivot (rows).

      Delete
    2. I think you can avoid the error by putting MAX instead of VALUES (if you are not planning to select a year). I couldn't understand the full requirements from the information that you have given. Which pattern are you looking at - LENE for All stores or Each store? LENE for each store should give you the aggregated value, wouldn't it? That would not need the Stores to be on the Pivot as the SUMX will do it in memory.

      Delete
    3. Thanks, I'll try that and get back to you. Appreciate your feedback!

      Delete
  7. Hi Jason, I did get it to work exactly the way Javier's post explains. However, I still need to get it to work with multiple calendar grains simultaneously, so that I can drag a calendar hierarchy into the pivot and have the aggregated closing balance populated for each day, month, quarter, and year.
    In any case, really appreciate your feedback since the solution is working the way that you and Javier intended!
    Thanks,
    Ben

    ReplyDelete
  8. Hi Jason, I did find a DAX solution (see below) but it only works for a unique date filter. Any thoughts on how to make it work with a date (or date hierarchy) as a row attribute in a pivot?

    =SUMX(

    ADDCOLUMNS(

    ADDCOLUMNS(

    VALUES(DimAccount[CPAccountID])

    ,"LastEverNonEmptyDate", CALCULATE( MAX( vwCPAccountRevenueRecognition[RecognitionDate] )

    , DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear]) )

    )

    , "EndingDeferredIncomeCP", CALCULATE ( SUM( vwCPAccountRevenueRecognition[EndingDeferredIncomeCP] )

    , FILTER(ALL(Dimdate), DimDate[FullDate] = [LastEverNonEmptyDate] ) )

    )

    , [EndingDeferredIncomeCP] )

    Thanks,

    Ben

    ReplyDelete
  9. Hi Jason,
    I've read a number of articles about semi-additive measures, but none have addressed the problem I'm facing. Imagine that in your example C1-C4 are customer ans S1-S8 are products. We assume that if a customer has reported inventory for a particular day, any products they did not report on that day have an inventory of zero. So, the total I'd want to see for C3 2011 would be 15, not 25, because we assume S5 had zero stock on Dec 12 2011 due to the reporting of inventory for S6 on that same date.

    To be clear, the inventory fact table is by product, customer, and date. The qty recorded is the inventory balance for the given product/customer/date. The inventory is recorded sparsely, with customers reporting on different dates (some once a week, some twice a week, not necessarily on the same days every week). If a customer has no inventory for a particular product/date, instead of sending a row with 0 qty, they do not send a row.

    I imagine we could fill in the missing dates in the fact table, but that would severely increase the size of the fact table (this is currently in Excel Powerpivot, not SSAS Tabular).

    So, I am looking for the inventory qty, per customer & product, for the last day that a customer reported inventory for any product, in the current date context. I would also hope that this measure would sum properly to higher level aggregates (ie: Product Family, Customer Region, Calendar Month).

    ReplyDelete

  10. I just came up with this, which is close, but will only show data for an aggregate date period (week, month, etc.) if there is data in the last day of that period.

    Inventory Qty (last):=CALCULATE(
    SUM(Inventory[Qty]),
    TOPN(
    1,
    FILTER(Inventory, Inventory[Key_Date] = MAX(Calendar[Key_Date])),
    Inventory[Key_Date],
    0
    )
    )

    ReplyDelete
    Replies
    1. I might have not understood completely, but based on what you say, doesn't option 3 fit what you need? One question I will still have is - how do you differentiate a customer-date combination where inventory was taken that day and inventory was not taken that day? Also, if different customers record inventories on different days, how do you want the total? For eg, if A records on Jan 1, and B records inventory on 2, do you want the inventory on 2 Jan to be A+B (because that is the last known value) or do you still want only A?

      Delete
    2. TLDR; Since we require customers to report at least once weekly, not daily, I should probably limit viewing of inventory to a weekly granularity. And if I'm limiting this to week granularity, then Option #1 still wouldn't work as the C3 2012 value should be none instead of 25, and Option #3 also wouldn't work because C2-C4 2012 should all be none. However, the measure I posted, changed for week instead of day granularity should work fine, or perhaps even something else more simple.

      Below is my ramblings about this as I through through the problem, maybe they'd be helpful to someone else.

      Thanks much for asking the questions to get me thinking about this properly.

      -Jeff




      >>Also, if different customers record inventories on different days, how do you want the total?
      At first I jumped at the answer A+B, but I think I'm now seeing that this is at odds with what I'm looking for at week/month/etc. aggregate levels, as that would cross the current date context of Jan 2. If the lowest granularity was week, the answer would be B, and this is because we require customers to report at least once weekly.

      Perhaps the measure should function differently depending on how many days are in context...if only one day in context, then open up the date context to the start of the week, but if more than one day in context don't go outside the date context.

      But that is probably getting overly complicated and I should limit viewing inventory to a weekly granularity, as that is more valuable to the business users.

      Delete
  11. So, the difference between what I need and #3 is that #3 crosses the current date context. In your example the value for C3 2012 is 15 and C4 2012 is 10, I'd want these to be zero since 2011 is not in the 2012 date context.

    So, you'd think that #1 would be more appropriate. But the difference between what I'm looking for and #1 is that looking across "All Stores" needs to stay bounded by the customer. The values for C1 2012, C2 2012, C3 2012, and C4 2012 all match what I need (10, 25, none, none). But the Total 2012, instead of 10 (C1:5+5), would be 35: (C1: 5+5) + (C2: 25). ie: a sum of the measure values for C1-C4 2012.

    ReplyDelete