Pages

Wednesday, June 22, 2011

Let’s Go Spatial – Session slides for download

So that’s one objective struck off from my list for this year. Yes, I made a presentation on the mapping features in SSRS 2008 R2 for the London BI User Group on 21 June, 2010. The UG meeting was held in the Hitachi office near London Bridge and was attended by around 30 people. A big thanks to Hitachi Consulting for sponsoring the beer / pizza and of course, for allowing us to use their office for the meeting. Another round of applause to Chris Webb (blog)  for organizing this UG meeting and personally, my gratitude for giving me the opportunity to speak. Though my presentation was more focussed on demos, I am attaching the slides that I used for my session – ‘Let’s go Spatial’ along with this post. I will try to upload the rdls of the reports also but that might take some time. So keep checking or ping me if you have got some urgent need for the sample reports used in my presentation.

Click to download - Let's go SPATIAL

The demos included

  • Creating map reports from ESRI shapefiles
    • How to deal with deployment errors with large shapefiles
  • Creating map reports from Spatial Data
    • How to import shapefiles into your database
  • Visualizing your map reports using measures
  • How to drill up and down your map reports
  • How to do custom aggregation in your map reports
  • Creating bubble maps in your map reports

This was followed by an amazing session on Change Data Capture by the charismatic Duncan Sutcliffe (blog | twitter). Overall a nice day and looking forward for more BI user group meetings.

Thursday, June 9, 2011

When and How to Snowflake Dimension Sources : SSAS Design Part 3

In my last post, I had quickly hinted why I prefer using a star schema and also explained the second scenario in which I would snowflake the dimension sources for the DSV. 

Star to Snowflake Part 3

The three scenarios as well as the example product dimension are repeated below for quick reference

  • Multiple lower levels for the dimension (Scenario 1)
  • Multiple facts coming at multiple granularity (Scenario 2)
  • Multiple levels joining to the same parent (Scenario 3)

Facts at different granularity_

Scenario 3 – Multiple Levels joining to same Parent

Cyclical relationship in Product Dimension

Consider the above example where there are multiple levels (Local SKU and Nielsen SKU) joining to the same parent. Invariably, a “diamond” or cyclic relationship would be involved the moment you say that there are multiple levels joining to the same parent in SSAS dimensions (You could avoid this by splitting the dimensions and making referenced dimensions in SSAS but then you would lose the ability to make hierarchies between the levels in the two split dimensions). For Scenario 3 to be applied, an additional condition also needs to be satisfied – at least one of the intermediate attributes involved in the cyclic relationship needs to have fact coming in at that level and this attribute should not be the start point or end point of the cyclic relationship. Some examples of diamond relationship are given below

 

Examples of Cyclic Relationship 

In our example, both Local SKU and Nielsen SKU have fact coming in at their levels and they are neither the start point (Dummy Level) nor the end point (Retail SKU) of the cyclic relationship, and hence Retail SKU qualifies under Scenario 3. In this scenario, the following views would be made

a)      View for parent level – A view is made for the parent entity (Retail SKU in our example) which will contain all the attributes / entities till the next entity where one of the three scenarios’ happens.  For our example, one view is made.

// View for Retail SKU and attributes

SELECT DISTINCT

   // Scenario 2 at Brand Pack

Retail_SKU, Standard_SKU, BrandPack#FK

FROM

<table>

WHERE

   // <levels inside “diamond” at which fact granularity comes>

<level=Local_SKU_ID or level=Nielsen_SKU_ID>

 

Some things to note here

  • As mentioned in the note section of my previous post, there is a basic difference in the WHERE clause of the views in Scenario 3 and the rest. For the rest of the scenarios, the clause would be for the lowest level of the view. So if the same principle was applied to the above view, then the WHERE clause would have been for <level=Retail_SKU>. Consider a case where there are Retail SKUs which don’t have any Local SKUs or Nielsen SKUs. There would not be any facts associated with them, and would be a pain for the business users if they start appearing in the dropdowns for the reports. Having the Scenario 3 WHERE clause would prevent this from happening as only the Retails SKUs which have facts associated with them would be brought in the cube.
  • The reason why Brand Family doesn’t qualify for Scenario 3 is because apart from the lowest level (Brand Pack), there are no attributes in the “diamond” relationship which has fact coming at that level. Hence, all these attributes have come under the view for Brand Pack in Scenario 2.
  • In both Scenario 1 and Scenario 3, there is a cyclic relationship or diamond relationship present in the model. These views will only ensure that the master data is correct. Care should be taken to handle the cyclic relationships within SSAS by defining multiple hierarchies or by making the relationship linear. Else you might see the blue squiggly line in the dimension attribute relationship manager in SSAS warning you of the occurrence of cyclic relationships.

That said, this marks the end of my trilogy on When and How to Snowflake Dimension Sources. I have made an effort to make it as generic as possible, but there could be scenarios which I have missed. Feel free to throw bricks or debate on the above concepts, constructive criticism is always welcome!

Wednesday, June 1, 2011

When and How to Snowflake Dimension Sources : SSAS Design Part 2

As a response to my previous post, one of my readers asked me this question – “ why are you hell-bent on making a star schema? You yourself have told that a snowflake schema would give the best performance during cube processing and once the cube is processed, it doesn’t matter which schema was used – the query performance would be the same. So make the designer’s life easier by making all the dimension attributes in 3NF, obviously nothing can go wrong there! “

Well, I have been extremely lucky to work along with a highly technical support & maintenance team at the client site, who challenges and makes me give appropriate reasons for every design decision I take. We as a vendor develop our applications and pass it on to the support team for the final code review and acceptance (this is in addition to the UAT or data testing part which would be done along with the business users). This made me think of them as two separate customers with two different priorities for the same application. If I have to sell my application to them, I need to appease both of them and the only way I can do is by doing a fine balancing act. The Business Users would want the best performing system (both in terms of cube processing as well as query performance) while the Support Team would want a system which is easier to support and maintain. If I designed all my attributes in 3NF, the support team would straightaway reject the application as they wouldn’t make sense what is happening in the DSV with the hundreds of attributes and relations. Since my cubes are usually less than 50GB, the processing time difference between the two schemas is less than 15 minutes which my business users would not even notice. In this case, it makes sense to use a star schema and appease my other customer, the Support Team. Your support team would love you for the additional work you do in your DSV’s diagram organizer when you neatly show them the star schemas for every fact.

 

Star to Snowflake

In my last post, I had quickly summarized some of the best practices that I apply in the design of my dimension sources and also explained the first scenario in which I would snowflake the dimension sources for the DSV. The three scenarios as well as the example product dimension are repeated below for quick reference

  • Multiple lower levels for the dimension (Scenario 1)
  • Multiple facts coming at multiple granularity (Scenario 2)
  • Multiple levels joining to the same parent (Scenario 3)

Product dimension linked to facts

 

Scenario 2 – Multiple facts at multiple granularity

 

clip_image002

 

Consider the example in the above figure where there are facts coming in at multiple levels of the same dimension (at Prod Group, Brand Pack, Local SKU and Nielsen SKU levels). In this scenario, all the members of the levels at which the fact comes are needed and hence, the following views are made

a)    Individual views for each of the level at which fact granularity comes – For each level at which the fact comes, a view is made which will contain all the attributes / entities till the next entity where one of the three scenarios’ happen. Since there are 4 levels at which fact comes in this example, the following views are made

// View for Local SKU and attributes

SELECT

   // Scenario 3 at Retail SKU

Local_SKU, RSKU#FK

FROM

<table>

 

// View for Nielsen SKU and attributes

SELECT

   // Scenario 3 at Retail SKU

Nielsen_SKU, RSKU#FK

FROM

<table>

 

// View for Brand Pack attributes

SELECT

   // Scenario 2 at Product Group

BrandPack, BrandSF, LocalBrandSF, BrandFamily, RPCT, RPCTGrp, PG#FK

FROM

<table>

 

 

// View for Product Group attributes

SELECT   

ProductGroup, ReportingProductGroup

FROM

<table>

Some things to note here

  • If the entity for which the view is made is the lowest level of the dimension or if the entity’s immediate child is having a mandatory relationship with it, then there is no need to include the Unknown Member. Else, we would have to provide the Unknown Member for that entity. (A mandatory relationship indicates that for every occurrence of entity A there must exist an entity B, and vice versa)
  • There is an overlap of Scenario 1 and Scenario 2 for Local SKU and Nielsen SKU views but the resultant views are the same. These views are repeated in this post just for completeness of Scenario 2.
  • In the view for Brand Pack attributes, it might look like Scenario 3 applies at Brand Family level. But Scenario 3 needs an extra condition to be satisfied which is not done here. This will be explained in detail in the next post.
  • General rule for each individual view in Scenario 1 and Scenario 2 is to select all the attributes from the  table which has children at the lowest level of that view. For eg, in the view for Brand Pack attributes, all Brand Sub Families would be present which have Brand Packs. If there is a Brand Sub Family which doesn’t have a Brand Pack, it will not be present in the view. But that particular value of Brand Sub Family can be safely omitted as there are no facts coming in at a Brand Sub Family level and hence it would not be needed for analysis (in most of the cases). This general rule would not be applicable to Scenario 3 and would be explained in the next post.

Meanwhile, as usual, let me know till what level you agree or disagree on these points.