Pages

Friday, November 5, 2010

The Curious Case of Joins while Cube Processing

One of those days, I was just chatting with a colleague of mine about some database designs and he was showing me some exquisite piece of work in the database. That is when I happened to glance upon a stored procedure which apparently seemed to materialize an already existing account dimension view as a table. This stroked my inherent sense of curiosity and I unleashed a barrage of questions at him regarding it’s relevance. My colleague started explaining that somehow the dimensions were getting joined with all the fact tables during cube processing and hence the processing takes a very long time to complete if the dimension view was used as such. To add some justice to his words, the dimension view indeed was doing some complex logic and it looked natural that the cube processing would take time if the view was used in the joins instead of a table. But what didn’t look natural to me was that the dimension views were being use din the joins. After all, I was under the assumption that it was just a ‘select from fact table’ that gets executed or at max, the fact views if some logic is used. So I decided to open this curious case of joins for investigation.

Even though it wasn't because I didnt trust my colleague, I just had to see with my own eyes to believe, the doubting Thomas that I am. So I asked him to process a particular fact table and I looked up the query that was being used.

With materialized Reference relation

Sure enough, the dimension view was being used in the joins. I was dumbfounded and decided to take a break to think. Adding tobacco to my pipe, I put on my thinking cap and stared aimlessly at the setting sun. Random thoughts started racking my brain and I even ventured far to think that it could be a bug in analysis services. Suddenly, something clicked in my mind and I started running towards my colleague and delivered in true Sherlock Holmes style - “Come, Watson, come! The game is afoot. Not a word! Into your clothes and come!”. My colleague also decided to humour me and tagged along.

Me : Chance has put in our way a most singular and whimsical problem, and it’s solution is it’s own reward.

Colleague : Can you please cut the crap and let me know why this happens?

Me : Watson, you know I can’t resist a touch of the dramatic. I have no desire to make mysteries, but it is impossible at the moment of action not to enter into long and complex explanations. Let’s go step by step and have a look at the query once again.

Colleague : Duh, ok!

Me : Perhaps when a man has special knowledge and special powers like my own, it rather encourages him to seek a complex explanation when a simpler one is at hand. The importance of careful observation can never be emphasized enough. If you would remember the way that you explained the problem to me, you mentioned that all the dimension views were getting joined in the fact table. But if you look at the query (especially the highlighted part), what do you observe?

Colleague : Hmmm, just the account dimension is getting joined.

Me : How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? We know now that all the dimensions are not getting joined. We also know that the account dimension is getting joined for some reason. What could be the reason?

Colleague : <light dawning on him> It could only mean that we have defined the usage of this dimension differently.

Me : Elementary my dear Watson! So let us open the dimension usage tab and check.

Dimension Usage

Colleague : Yes!!! There is a reference relationship between the Geography dimension and the fact table through Account, that should be the reason!

Me : To let the brain work without sufficient material is like racing an engine. It racks itself to pieces. Why should a reference relationship induce a join? After all, it could also be just resolved at run time like the regular relationship. To understand more, click to view the relationship details.

materialized option

Colleague : Finally! It is because the Materialize option is ticked.

Me : Elementary, my dear Watson. When a reference relationship is materialized, the joining across dimensions is performed during processing and not in querying. Also, the attributes in the materialized reference dimensions follow the aggregation rules of standard dimensions. If you remove the tick in the materialize checkbox, you will notice that the joins would not be present anymore.

without Materialize

Colleague : OK, so this was it. So simple once we know the reason.

Me : There, Watson, this infernal case had haunted me for days. I hereby banish it completely from my presence.

Colleague : Could you please cut out the Sherlock Holmes part, it’s freaking me out!

So, here rests the curious case of joins while cube processing in my chronicles.

Note : Special mention for eQuotes for help with the Sherlock Holmes quotes. Also, to read on how the ticking of the Materialize option would affect your data, click on the blog below by Alberto Ferrari - SSAS: Reference materialized dimension might produce incorrect results

1 comment:

  1. Hi Jason,

    Had sort of similar problem in referenced dimension of missing data for certain reports.

    Bumped into an intresting paragraph in SSAS 2005 Performance guide which solved the issue.

    Posting it here for all SSAS Guys :

    To improve the query performance of reference relationships, you can choose to materialize them. Note that by default, reference relationships are not materialized. When a reference relationship is materialized, the joining across dimensions is performed during processing as opposed to querying. In addition, the attributes in the materialized reference dimensions follow the aggregation rules of standard dimensions. For more information on these rules, see The aggregation usage rules. Since the join is performed during processing and aggregations are possible, materialized reference relationships can significantly improve query performance when compared to unmaterialized relationships.
    Some additional considerations apply to materialized reference relationships. During processing, the reference dimension is processed independently. At this time, if any row in the measure group does not join to the reference dimension, the record is removed from the partition.

    ReplyDelete