Pages

Wednesday, June 23, 2010

SSAS Dimension Attribute Properties : Part 2

Yesterday, I was reading through some articles and happened to glance through a very memorable quote by  Berthold Auerbach – “ The little dissatisfaction which every artist feels at the completion of a work forms the germ of a new work “. That was when I realized I haven’t yet completed my dimension attribute article and so here I am with the second part.

In SSAS Dimension Attribute Properties : Part 1, we reviewed the Advanced and Basic properties of dimension attributes. This article would be continuing with the Misc, Parent-Child and Source properties.

Dimension Attribute Property

The properties are explained below:-

MISC

1) AttributeHierarchyOrdered : This particular property specifies whether the members of the attribute are ordered or not. The values of this property can just be True or false. If the order of the members do not matter, setting this property to false for attributes where the attribute hierarchy is enabled or high cardinality attributes can significantly increase the processing performance.

2) GroupingBehavior :  This property is used to give a hint to the client application whether to encourage or discourage users to group on this attribute and does not affect any of the structures on disk. The values are EncurageGrouping and DiscourageGrouping.

3) InstanceSelection : This property also is used to give a hint to the client application’s UI on the recommended means of selection of a member from the attribute. The options available are

  • None - (Default) no selection used.
  • DropDown - Appropriate for situations where the number of items is small enough to display within a dropdown list.
  • List - Appropriate for situations where the number of items is too large for a dropdown list, but not large enough to require filtering.
  • Filtered List - Most useful in scenarios where the number of items is large enough to require users to filter the items to be displayed.
  • Mandatory Filter - Appropriate for situations where the number of items is so large that the display must always be filtered.

4) MemberNamesUnique : This property indicates whether the member names are unique across the attribute hierarchy and this property affects the way member unique names are generated. The available options are True or False.

Parent-Child

1) MembersWithData : In a parent-child hierarchy, some of the non-leaf members may also have data associated with them (unlike normal hierarchies, where the non-leaf members have a value equal to the sum of it’s leaf values). These members are called data members and are present only for parent-child hierarchies. This particular property is used to set the visibility of the data members in parent-child hierarchies and the available options are NonLeafDataHidden and NonLeafDataVisible. This MSDN article - Working with Attributes in Parent-Child Hierarchies does a very nice job of explaining this property with an example.

2) MembersWithDataCaption : This particular property is used as a naming template for the system generated data members. For eg, if we have the MembersWithData property set to NonLeafDataVisible, then a leaf member representation of the data member is added. For eg, if Jason is a data member, with Thomas and Tom as his leaf members, then there would be an additional Jason added as a leaf member. Now to differentiate between the leaf member and the data member (in this case, both are Jason) would be difficult and hence we can use a template like *(leaf member) as the value of this property. The asterisk symbol is a placeholder for the original name. So our example would become Jason for the data member and Jason(leaf member) for the leaf member.

3) NamingTemplate : This property specifies how levels in a particular parent-child hierarchy would be named. Click the ellipsis button (..) in this property’s cell and then you should be able to view a popup window as shown below:-

Level Naming Template

You can specify a name for the level by clicking on the Name column of the second row and entering for eg, Employee *. This will ensure that instead of Level 02, Level 03, etc., you will be getting Employee 02, employee 03 and so on. For more details, refer to the MSDN article - Defining Parent Attribute Properties in a Parent-Child Hierarchy.

4) RootMemberIf : This property is used to specify the criteria by which we can identify the members of the highest level (excluding the ALL level). Again quoting from an article of William Pearson -

The four selection options include the following:

  • ParentIsBlankSelfOrMissing - (Default) Only members that meet one or more of the conditions described for ParentIsBlank, ParentIsSelf, or ParentIsMissing are treated as root members.
  • ParentIsBlank - Only members with a null, a zero, or an empty string in the key column or columns are treated as root members.
  • ParentIsSelf - Only members with themselves as parents are treated as root members.
  • ParentIsMissing - Only members with parents that cannot be found are treated as root members.

The behavior of the RootMemberIf property in determining how the root or topmost members of a parent-child hierarchy are identified, is, therefore, dependent upon which of the selections above is made. The default, as noted above, is ParentIsBlankSelfOrMissing.

5) UnaryOperatorColumn : We can control how members on a hierarchy (usually parent/child, but not necessarily) aggregate up to their parents by defining unary operators. To do this, you must create a new column in your dimension table to hold the unary operator values and then set the attribute's UnaryOperatorColumn property to point to it. This property specifies the column which holds the unary operator. You will find a very good example under the Unary Operators and Weights heading of this article - Measures and Measure Groups in Microsoft Analysis Services: Part 1. The values are (none) and (new) for this property. On clicking New, a dialog box opens which will prompt us to select the binding type, source table and the source column.

Source

1) CustomRollupColumn : Unary operators do not give enough flexibility for rollup, and in such cases, we can write our own rollup formulas as MDX expressions. This property is used to specify a column which will contain the custom rollup formula. A valid expression will ensure that the aggregation logic defined in the AggregateFunction property of the measure would be overridden for this attribute.

2) CustomRollupPropertiesColumn : This property is used to contain the properties of a custom rollup column. Refer Custom Member Formulas heading of this article - Measures and Measure Groups in Microsoft Analysis Services: Part 1 to learn more about the above two properties.

3) KeyColumns : This property contains the column/columns that constitute the key for the attribute, which is the column in the underlying relational table in the data source view to which the attribute is bound. If the NameColumn property is not defined, the value of KeyColumns property would be used to display the attribute members.

4) NameColumn : In most of the cases, the key of the attribute would be a integer value, and this would not make any sense to the user who is viewing the attribute members. For this, we can specify a column in this property which will have the user friendly name of the attribute member.

5) ValueColumn : This property identifies the column that provides the value of the attribute. If the NameColumn element of the attribute is specified, the same DataItem values are used as default values for the ValueColumn element. If the NameColumn element of the attribute is not specified and the KeyColumns collection of the attribute contains a single KeyColumn element representing a key column with a string data type, the same DataItem values are used as default values for the ValueColumn element.

In KeyColumns and NameColumn property, there are additional properties which can be got on expanding the plus symbol on the left. This article gives a pretty good overview on them.

With this info, you should be all set and ready to give that killer answer in your next interview. Godspeed and good luck to you! :)

Sunday, June 13, 2010

SSAS Dimension Attribute Properties : Part 1

There could not be any SSAS interview where at least one question is not asked about the dimension, attribute or measure properties. Many at times, I myself have kept on searching the web to find more information about a particular property only to find that the information is pretty scattered and you really have to spend some effort to consolidate all of them. Through this post, I will consolidate all relevant information regarding the attribute properties and try to put it under one roof.

A dimension attribute contains a limited list of key values. Each unique key is associated with a dimension member and these dimension members included in the attribute are called attribute members. In the multidimensional model, the attribute also contains properties that define different characteristics of the attribute members called attribute properties. All attribute members have the same properties. To see all the properties of an attribute, click on a particular attribute and press F4. On doing so, you would be able to see the property panel as shown below

SSAS Dimension Properties 

Before you go forward and read, you might want to understand some of the terms that are going to keep coming in the explanations below:-

  • Discrete Attributes and Contiguous Attributes : Quoted from Introduction to Attribute Discretization (William Pearson) - “ Whenever we work with attributes, we can expect to encounter two general types of values:
    • Discrete attributes: Discrete values stand apart distinctly, and have clearly defined logical “boundaries” between themselves. The possible values are naturally discrete for the lion’s share of attributes occurring in the business world.
      Example: The Gender attribute, within the Customer dimension of the Adventure Works sample UDM, is (at least for purposes of the sample cube) considered to have only one of two discrete values, female or male.
    • Contiguous attributes: Contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Contiguous values, especially within large populations, can have very large numbers of possible values. Information consumers can find it difficult to work effectively and efficiently within such wide ranges of values.
      Example: the Vacation Hours attribute, within the Employee dimension of the Adventure Works sample UDM, can have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many vacation days they can accumulate, and considerations of this nature. The member values are based directly upon the unique values contained within the VacationHours column of the DimEmployee table (with many of the unique values shared among multiple employees). The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.

The properties are explained below:-

ADVANCED

1) AttributeHierarchyDisplayFolder : This property is used to group attribute hierarchies into a particular folder. For eg, if you assign this particular property to Stocking for the attributes Class and Color in Product dimension, you would be able to see as shown below when you browse through the dimension

AttributeHierarchyDisplayFolder

2) AttributeHierarchyEnabled : The values of this property can just be True or False and is used to determine the absence or presence of attribute hierarchies. By default, all the attributes in the dimension would be assigned a value of true which would mean that attribute hierarchies would be enabled by default. From a performance point of view, this property is pretty important which is quite evident from the quote given below from OLAP Design Best Practices for Analysis Services 2005

Attributes add to the complexity and storage requirements of a dimension, and the number of attributes in a dimension can significantly affect performance.  This is especially of attributes which have AttributeHierachyEnabled set to True. Although SQL Server 2005 Analysis Services can support many attributes in a dimension, having more attributes than are actually used decreases performance unnecessarily and can make the end-user experience more difficult.

It is usually not necessary to create an attribute for every column in a table. Even though the wizards do this by default in Analysis Services 2005, a better design approach is to start with the attributes you know you'll need, and later add more attributes.  Adding attributes as you discover they are needed is generally better a better practice than adding everything and then removing attributes.

3) AttributeHierarchyOptimizedState : The values of this property can be FullyOptimized and NotOptimized (beats me why it shouldn’t have been a True or False instead) and is used to enable or disable the optimization of the hierarchy. Turning this property to a NotOptimized state would save resources when the hierarchy is being built during the processing and hence, should be turned off for those attributes which wouldn’t be used frequently for analysis. Turning this off would mean a slowdown in the requests that reference this attribute hierarchy which is the reason why we should just turn this off for less frequently used attributes.

4) AttributeHierarchyVisible :The values of this property can be True or False and is used to set the visibility of the attribute to a client application. If it is set as false, the client application would not be able to determine the presence of this attribute hierarchy (which means it would be invisible when the dimension is browsed through the application) but it would still be able to use the attribute hierarchy in queries. It is a best practice to make this property as False for those attributes which are also used in user defined hierarchies as anyways, you would be able to access that attribute from the user defined hierarchy and you can eliminate redundancy.

5) DefaultMember : This property can be used to specify the default member of the attribute. If no value is specified, then by default the ALL level is used. MDX expressions can also be used to set the default member of the attribute to any attribute member by clicking on the button besides the property which will open up the pop-up window as shown below

DefaultMember

6) DiscretizationBucketCount : Discretization is basically the process of grouping contiguous values into sets of discrete values.

Attribute Discretization 

Analysis Services supports several variations of attribute discretizations based on different algorithms and to enable it, DiscretizationBucketCount is one of the two properties that you have to set. As the name suggests, this property defines the number of groups that the values will be placed in.

7) DiscretizationMethod : This property basically determines which discretization algorithm should be applied. The values of this property are

  • None – Discretization would not be done.
  • Automatic – Automatically chooses the best grouping technique among the available values.
  • EqualArea – If the distribution of contiguous values are plotted as a curve, then the areas under the curve covered by wach range would be equal.
  • Cluster – Uses the K-Means algorithm to find ranges on the input values.

Apart from the above algorithms present in the Dimension Editor, it is also possible to specify a user defined discretization method through Data Definition Language (DDL) by defining the boundaries of every group.

8) EstimatedCount : This property specifies the estimated number of members in the attribute. This is usually populated when aggregations are designed (which could mean that when you change environments, the counts could be of the previous environment unless you count again) or specified by the user. This property is used when aggregations are being built and helps the server to make decisions on how to make the aggregations.

9) IsAggregatable : This property is used to determine whether the data associated with the attribute members can be aggregated. If True, then the system will define an ALL level which will have the aggregated data which will be used as the default member when queries which doesn’t reference a particular member of this attribute are executed. For the same reason, if this property is set to False, then a default member should be specified.

10) OrderBy : This property specifies the method by which the attribute members should be sorted and the options are Name, Key, AttributeName and AttributeKey.

11) OrderByAttribute : This property is used to select the attribute on which the ordering should happen if AttributeKey or AttributeName has been used in the OrderBy property.

BASIC

1) Description : This property specifies the description of the attribute and any free text can be entered.

2) ID : This property specifies the unique identifier of the dimension and is non-editable usually in the dimension editor.

3) Name : This property specifies the name of the attribute.

4) Type : I found the best explanation of this property from an article of William Pearson which is quoted below :-

“ The value of the Type property for an attribute determines the attribute type – and specifies the type of information contained by - that attribute. Within Analysis Services 2005, attribute types help to classify an attribute based upon its business utility or functionality. Many of the available options represent types which are used by client applications to display or support an attribute. However, some attribute types also have specific meaning to Analysis Services. Some attribute types identify attributes that represent time periods in various calendars for time dimensions.

Many attribute types for dimensions or attributes are set via the associated wizard that we use when defining these objects. Attribute types can also be set when we employ wizards, such as the Business Intelligence Wizard, to add functionality to dimensions within Analysis Services. A good example is the application of various attribute types to attributes in a dimension when we use the Business Intelligence Wizard to add Account Intelligence to a given dimension: the wizard applies several attribute types to attributes in the affected dimension, for example, to identify attributes that contain the names, codes, numbers, and structure of accounts within the dimension.

Attribute types in Analysis Services fall in into five categorized groups. These attribute type groups include:

  • General: These values are available to all attributes, and exist only to enable classification of attributes for client application purposes.
  • Account Dimension: These values identify an attribute that belongs to an account dimension.
  • Currency Dimension: These values identify an attribute that belongs to a currency dimension.
  • Slowly Changing: These values identify an attribute that belongs to a slowly changing dimension.
  • Time Dimension: These values identify an attribute that belongs to a time dimension. “

5) Usage : This property defines whether the attribute is a key attribute, an additional attribute for the dimension or a parent attribute.

(The rest of the properties would be continued in SSAS Dimension Attribute Properties : Part 2)

Monday, June 7, 2010

Calculating visit length by first and last visit time in SSRS

Writing a blog is an exhausting experience, most of the times you would be writing your blogs at the expense of your sleep or after a hard day’s work. But then there is a very fulfilling feeling, a joy that you get when you come to know that your work is helping people all around the world. I am sure most of the authors get a huge kick by the number of hits their blog receives and by the number of different countries visitors come from. For collecting such statistics, I use StatCounter and I rate it one of the best free add-ons for getting real-time web stats. There are a lot of features and one among the many is the Visit Length stat.

Let me quote the definition of Visit Length from the StatCounter site itself – “ The time between when a visitor accesses your first webpage of their visit, and when they access the last is what we call the 'Visit Length'. Unfortunately it is not possible to detect with the Standard StatCounter Project when they left your website for a different website. Instead we take the time they accessed your last webpage of their visit as their 'exit time'. Although this isn't 100% accurate it is very close and nonetheless provides a valuable insight into your visitors. Based on this information you can see how much 'pull' and 'interest' your website is generating for your visitors. If you have a low 'Visit Length' you may want to think up more ways to encourage your visitors to stay around for longer “. Now you might be wondering why I am advertising StatCounter here ( nopes, I don’t get paid by them, I am just a happy customer ). The actual reason for all this introduction was to introduce you to a scenario where we have data of the visits stored and we need to calculate the visit length in a SSRS report.

Let me first get the data before I explain the scenario. For getting the data having the blog visitor information, I will be using a random page of my blog’s StatCounter and then importing the data into the database. A sample page from StatCounter is given below:-

Visitor stats 

After importing the data, I created a report which will show the visitor info as shown below:-

Report showing Visitor data

Problem

Now that the source data is ready, I will explain the requirement. We need to find the first visit time and last visit time for the same visitor from the above data. A visitor is considered to be the same if he has the same Country, Place, Browser, Resolution and back to back entries. For eg, the first 2 entries for Coimbatore, India is considered to be the same visitor as it has back to back entries and also all the other fields except time is the same. Using the same logic, Mountain View, USA is not considered to be the same visitor in the last and third last rows as it has no back to back entries even though all the other columns are same. If there is no back to back entry, the last visit time would be the same as the first visit time. The required output would look like shown below:-

Expected Output

Solution

The very first thing that comes to our mind on seeing such a problem is to group by all the fields except time and then get the first and last Time for that group. However, we can’t do that here as

a) we need to consider visitors that do not have back to back visits as separate visitors. A Group By clause will group all the visitors together even if it doesn't have back to back entries. In the above example, it will group the visitor from Mountain View as one entry even though we require it as two.

b) it will group entries from different dates also as one entry. So suppose one visitor came on the first and last day of the month, he would be grouped as one entry and his visit length would become 1 month which would be incorrect.

After brainstorming and googling upon this for a lot of time, I stumbled upon a valuable post in the MSDN forums by Raymond Lee (moderator) in which he posts a custom code for achieving this type of grouping. Follow the steps below to apply his solution in our problem:-

1) Go to the Design section of the report and press Alt+R. Select Report properties from the menu and select the code tab. Paste the code given below there

Dim Address As System.Collections.Hashtable
Dim sumGroup as Integer
 
Function MyFunc(ByVal _Address  As Object) As integer
Dim flag as integer
If (Address Is Nothing) Then
Address = New System.Collections.Hashtable
End If
If (Not Address .Contains(_Address)) Then
Address .clear()
sumGroup = sumGroup + 1
Address .Add(_Address, nothing)
else
sumGroup  = sumGroup
End If
MyFunc = sumGroup
End Function

Paste vb code in Code section

2) Drag and drop a table into the design and add all the required fields except time into the table. Make a group for the fields with group expression as given below

=Code.MyFunc(Fields!Place.Value+Fields!Country.Value+Fields!Browser.Value+Fields!Resolution.Value)

Group Expression

The argument for the function would be all the fields on the basis of which we need to check if there is a back to back entry. In our case, it is all the fields except time and we are concatenating it as one field and sending it as input to the function.

3) Once this is done, we should be having the table in the design section looking as shown below

Table structure in Design 

Notice that there is only one group and the group name is Group2.

4) Now all we need to do is to get the first and last visit time for each entry which is easily done with the below expression

=first(Fields!Time.Value,"Group2")+" - "+last(Fields!Time.Value,"Group2")

Take care to substitute Group2 in the above expression with the group name of your table.

5) With a little bit of tidying up and colouring, we should be able to attain the results shown below

Final Output 

Wednesday, June 2, 2010

Display Total on top of Stacked Chart

It is a normal phenomenon that human beings take most of the simple things for granted. Take this case, one of my blog readers sent me a mail asking how to display the total on top of a stacked chart in SSRS. The first time I read it, I was almost going to blurt - just enable some property to display subtotals. But the more I thought about it, the more elusive the property became and that is when I realized there is no such property. So for the benefit of all, I thought about posting the workaround that I devised for this.

Suppose I have a simple stacked column chart with Month on Categories, Product Category on Series and Sales as the measure. The requirement is to display the total of all Product Categories for a month on top of the stacked column chart. The required output is shown in the below image.

Stacked Column Chart - Total

For this, follow the steps below:-

1) The original query was

SELECT  Month,
            Category,
            Sales
FROM    Sales

Modify it as follows

SELECT   Month,
             Category,
             Sales
FROM     Sales
UNION
SELECT   Month,
             'Total' AS Category,
             0.1 * SUM(Sales) AS Sales
FROM     Sales
GROUP BY Month;

This is done to include an extra row called Total for Product Categories. The value for Sales for this row would be 10% of the total Sales of that month.

2) Go to the Design tab, right click on the chart and enable the Show Data Labels option. Then select the series label properties by right clicking on the data labels.

Show Data Labels

3) Edit the Label data and enter the following expression

=iif(Fields!Category.Value="Total",sum(Fields!Sales.Value,"Chart1_CategoryGroup")-sum(Fields!Sales.Value),"")

Label Data Expression

The above expression displays blank if the Product Category is not “Total” and displays the sum of the entire Product Categories for that month (including the value for Total) – sum of the Total field.

P.S. : “Chart1_CategoryGroup” would have to be replaced with the category group name of your stacked column chart. To find this, click on the Month (category) button in the chart and press F4. You will find the Name of the group in the property panel that opens up as shown below.

Find category group name

4) Now we should be getting the following output when we click on the Preview tab.

Statcked column chart with Total

Go back to the Design tab and click on Product Category (series) in the chart and press F4 to open the property panel. Enter the below expression in the Label property

=iif(Fields!Category.Value="Total"," ",Fields!Category.Value)

Edit Label expression for series

This is done so that the Total value will not appear in the legend.

5) Right click on the stacked columns and select series properties.

Series properties

6) Go to the Fill tab and enter the following expression for the color

=iif(Fields!Category.Value="Total","Transparent","Automatic")

Color expression

This is done so that the Total value will be transparent and hence, would look like it is not present.

7) Preview the report and you should be getting the required output

Final result 

This report has been done with SSRS 2008 R2 but the concept should work in SSRS 2008 more or less the same way.

Update

For SSRS 2008 users, it has been noticed that all the bars become black when the above fill expression is used. That is because the Automatic colour can not be used in expressions. So as a workaround, follow the step below instead of the 6th step above

1) In BIDS, press Alt+R to open the Report Tab and select Report Properties. Then go to the code section and paste the following code

Private colorPalette As String() = {"Green", "Blue", "Red", "Orange", "Aqua", "Teal", "Gold", "RoyalBlue", "#A59D93", "#B8341B", "#352F26", "#F1E7D6", "#E16C56", "#CFBA9B"}
Private count As Integer = 0
Private mapping As New System.Collections.Hashtable()
Public Function GetColor(ByVal groupingValue As String) As String
  If mapping.ContainsKey(groupingValue) Then
    Return mapping(groupingValue)
  End If
  Dim c As String = colorPalette(count Mod colorPalette.Length)
  count = count + 1
  mapping.Add(groupingValue, c)
  Return c
End Function

Rep properties

Right now, there are only 14 colours in the palette but more colours can be added if you have more than 14 values in the series.

2) Use the following expression for colour in the fill tab of series properties.

=iif(Fields!Category.Value="Total", "Transparent",Code.GetColor(Fields!Category.Value))

This would do the trick :)

Tuesday, June 1, 2010

Collation conflict error

Yesterday, I was working on SQL Server 2008 R2 for a Proof of Concept regarding map reports. I was doing some querying on the database and hence, had to join a table in a SQL 2008 DB and SQL 2008 R2 DB. So for this purpose, I created a linked server and executed the following query.

SELECT *
FROM   mapdata AS a
       LEFT OUTER JOIN 
       CHAOS.diw.dbo.DIM_SALES_ORGANISATION AS b
       ON a.[Sales Rep] = b.[employee_name];

That was when I hit upon the following error

Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.

Collation Confliction Error

On further reading, I found out that the default collation in R2 is SQL_Latin1_General_CP1_CI_AS and in SQL 2008 is Latin1_General_CI_AS. Since the databases have different collation, it would not be possible to compare the fields from those databases directly.

To solve this issue, I just added the following keyword before the = operator like shown below

SELECT *
FROM   mapdata AS a
       LEFT OUTER JOIN 
       CHAOS.diw.dbo.DIM_SALES_ORGANISATION AS b
       ON a.[Sales Rep] COLLATE database_default = b.[employee_name] COLLATE database_default;