Pages

Monday, January 28, 2013

Scatter Line Charts in SSRS

Sometime during the last month, I saw an image of an Archimedean Spiral and was wondering whether I could recreate the same in SSRS. The Archimedean Spiral is actually a scatter line chart (a combination of scatter chart and line chart). I have over 6+ years of experience creating visualizations using SSRS, but the closest I had come to such a visualization was the Scatter Chart.

 Scatter Line charts in SSRS

A scatter line chart is used either to show the relationship among the items in several distinct series of data or to plot a line using x and y coordinates. The x and y coordinates of the data marker are determined by two data values. A scatter line chart draws attention to uneven intervals in data. This type of chart is often used to plot scientific data, and can highlight the deviation of collected data from predicted results. A scatter line chart can also be used to organize data chronologically (even if the data is not in chronological order). For eg, consider the Scatter Chart below

Scatter chart

A scatter chart doesn’t tell the full story as we don’t know the chronology in which the data was collected. 

For the same scatter chart, the scatter-line charts can be different

A scatter line chart can give that extra bit of information as you can see from the image above. To create a scatter line chart, you need to have at least two measures (for the X & Y coordinates) and a category group. Follow the steps below to create an Archimedean spiral using the scatter chart:-

1) Create a dataset with the two measures (X & Y coordinates) and a category. Since I am trying to replicate the Archimedean spiral, I used the formula given below to come up with the coordinates and category.

\, r=a+b\theta

Dataset query for Archimedean spiral

You can see that my category (the Pos field) is just a running number used for determining the order.

2)  Drag and drop a Smooth Line with Markers chart into the layout. Use Y as the value and Pos as the Category group.

Smooth line chart with markers

3) Once you are done with that, click on the values and press F4 to bring up the chart series properties. Under Data, expand DataPoint property and then expand Values. Now, type the following formula in the X property as shown in the image below

=sum(Fields!X.Value)

Modifying X axis properties

Note that X is the name of the field in our dataset and should be replaced by whatever is the field name that should be present in the X axis.

4) Preview it, and you should be able to see our Archimedean Spiral.

Archimedean spiral in ssrs

One thing to note here is that the lines are connected based on the sorting order of the Category group field. So in case you are using a name field like month (January, February, etc), ensure that the sorting order is based on the month number (else you might not get the right results). Feel free to download the report file from here. Though scatter line charts might not be as useful as sparklines, they have their place in data visualization and it is always good to know that you can make one using SSRS.

Wednesday, January 23, 2013

I’m speaking at the PASS Business Analytics Conference

The speaker lineup for the PASS Business Analytics Conference has been announced and I’m really excited to see my name in the list. In case you haven’t heard, PASS BA Conference is the premium conference for business analytics professionals. If you're a business analyst, data scientist, architect, or BA/BI practitioner, you can’t afford to miss this conference. Attendees will be exposed to an unparalleled range of information about business analytics—with over 60 sessions to choose from—covering data discovery, data exploration and visualization, predictive analytics, content management and architecture, collaboration, information strategies, and much more. You can register for the conference by clicking on this link.

PASS BA Conference I am speaking

I am really honoured to be presenting along with such an impressive list of speakers. My session details are given below

GeoSpatial Analytics Using Microsoft BI

Speaker(s)Jason Thomas

Duration: 60 minutes

Track: Data Analytics and Visualization

In this age of big data, location matters more than many people imagine. Almost every set of data has some geographic information tied to it, and it is all the more pertinent to understand and unravel the patterns and trends behind them. This session will start by explaining why GeoSpatial analytics is important in today's world and then continue by exploring how we can share, analyze, and visualize data using the comprehensive Microsoft BI toolset, including Power View, GeoFlow, SQL Server Reporting Services, and more. The session will also feature practical demos for each of these tools, during which we will work on some of the publicly available datasets to uncover some interesting facts.

Don’t forget to drop in and say a hi to me, in case you are attending the conference.

Thursday, January 17, 2013

Row Selection Using Slicers in PowerPivot - Part 2

This post is a continuation of the Row Selection Using Slicers in PowerPivot – Part 1 (so it might be a good idea to read that post before this). In our last post, we discussed the basic solution pattern for row selection using slicers in PowerPivot. In this post, we will discuss two other scenarios which are slightly more complex but still uses the same pattern.

Row selection using slicers in PowerPivot part 2

Scenario III - GroupNames in multiple dimensions
Our data model for this scenario will have a Product table, a Customer table and a Fact table which shows the sales by Product and Customer. The requirement is to select either Product or Customer in the slicer and display the appropriate rows in the pivot table.

Scenario 3 - data

1) As per the solution steps, we first have to create the Group table

Group table for scenario 3

Note that we need two grain columns here, as the GroupNames belong to two dimensions – Product and Customer. The GroupName will be Customer and Product, and the GroupValues will be the actual product or customer values depending on the GroupName. As with the previous posts, I use MS Query to generate this table.

MS Query scenario 3 

SELECT Expr1000 AS GroupName, Customer1 AS GroupValue, ProductKey1 AS Product, Customer FROM
(
SELECT 'Customer' , C.Customer1, `Sheet1$`.ProductKey1, `Sheet1$`.Customer
FROM `Sheet1$` `Sheet1$`
INNER JOIN `Sheet1$` C
ON `Sheet1$`.Customer=C.CustomerKey
UNION ALL
SELECT 'Product', P.Product, `Sheet1$`.ProductKey1 , `Sheet1$`.Customer
FROM `Sheet1$` `Sheet1$`
INNER JOIN `Sheet1$` P
ON `Sheet1$`.ProductKey1=P.ProductKey
)

2) Add this new table (Group) to the data model and then relate the Product & Customer column in this table to the ProductKey column  in the Product table & CustomerKey column in the Customer table.

Scenario 3 data model

3) Hide the original measure Sales, and then create a new measure with the formula given below

NewSales:=CALCULATE(SUM(Fact[Sales]), 'Group')

4) Check out the results by creating a pivot table with GroupValue on the rows, NewSales on Values and GroupName in Slicers.

Scenario 3 results

Download the Excel 2013 file with the solution from here.

Scenario IV – Hybrid Scenario 
In this scenario, we are going to combine all the different scenarios we discussed till now and see how to handle that. The data model for this scenario will contain a Fact table which shows sales by Product and Customer, a Product table with the Product and SubCategory information, a Customer table and a Promotions table which has a many to many relationship with Product. The requirement is to have SubCategory (Scanario I), Promotions (Scenario II) and Customer (Scenario III) on slicers.

Scenario 4 data

1) As usual, the first step is to generate the Group table.

Group table for scenario 4

Note that even though we have only 4 subcategories, we end up with 6 rows because we will have to ensure that the grain is at a Product and Customer level. Similarly, we end up with  9 rows for Promotions when we convert the granularity of the Promotions table from just Product to the Product and Customer level. Find below the MS Query way of generating this table

MS Query for scenario 4

SELECT Expr1000 AS GroupName, SubCategory AS GroupValue, ProductKey1 AS Product, Customer FROM
(
SELECT 'SubCategory',S.SubCategory, `Sheet1$`.ProductKey1, `Sheet1$`.Customer
FROM `C:\Users\Jason\Desktop\Blog\RowSelection\Scenario4_Hybrid.xlsx`.`Sheet1$` `Sheet1$`
INNER JOIN `C:\Users\Jason\Desktop\Blog\RowSelection\Scenario4_Hybrid.xlsx`.`Sheet1$` `S`
ON `Sheet1$`.ProductKey1=S.ProductKey
UNION ALL
SELECT 'Promotions',P.Promotions, P.Product1, S.Customer
FROM `C:\Users\Jason\Desktop\Blog\RowSelection\Scenario4_Hybrid.xlsx`.`Sheet1$` `P`
INNER JOIN `C:\Users\Jason\Desktop\Blog\RowSelection\Scenario4_Hybrid.xlsx`.`Sheet1$` `S`
ON P.Product1=S.ProductKey1
UNION ALL
SELECT 'Customer', Customer, ProductKey1, Customer
FROM `C:\Users\Jason\Desktop\Blog\RowSelection\Scenario4_Hybrid.xlsx`.`Sheet1$`
WHERE Customer IS NOT NULL
)

2) Add this new table (Group) to the data model and then relate the Product & Customer column in this table to the ProductKey column  in the Product table & CustomerKey column in the Customer table.

Scenario 4 data model

3) Hide the original measure Sales, and then create a new measure with the formula given below

NewSales:=CALCULATE(sum(Fact[Sales]), 'Group', Promotions)

Also note the use of the Promotions table (which is the many to many table) in the filter part of the calculate statement.

4) Check out the results by creating a pivot table with GroupValue on the rows, NewSales on Values and GroupName in Slicers.

Scenario 4 results

Download the Excel 2013 file with the solution from here.

With that, we come to the end of this 2 part series on how to change the rows using slicers in PowerPivot. As you would have seen, the bulk of the work lies in generating the Group table and the rest is pretty simple. Hope these scenarios will help you in case you ever need to do the same.

Tuesday, January 15, 2013

Row Selection Using Slicers in PowerPivot - Part 1

A couple of months back, I had written a post on ‘Measure selection using Slicers in PowerPivot’ and this turned out to be one of my most popular ones in the PowerPivot / DAX category. It was just a matter of time before someone asked how to do the same with rows or dimension attributes using slicers in PowerPivot. And the case supporting it is also becoming strong with more and more dashboards being built in Power View and Excel. Well, as usual, this functionality is not available out of the box but you could simulate it with a little change to your data model. Read on for the solution.

Row Selection using Slicers in PowerPivot Part 1

I will be first briefly explain the solution pattern and then apply it to a couple of scenarios.

Solution
1) A new table should be built and added to the existing data model. The new table should have at least 3 columns
- GroupName : the text displayed in the slicers (for e.g., Product, Customer, etc.)
- GroupValue : the actual values that should be displayed in the rows on selecting the slicer values (for e.g., if Product is selected in the slicer for GroupName, then GroupValues will be the Product values)
- Grain of the Tables
If all the GroupNames displayed in the slicer belong to the same dimension (for eg, Product, Category and SubCategory in the Product dimension), then only one column is required which would be the grain of the dimension (to be precise, the column of the table linked to the facts which is usually the grain of the dimension). If multiple dimensions are involved in the GroupNames (for eg, Product and Customer), then  the number of columns increases based on the grain of the participating dimensions.For each GroupName, all valid combinations of GroupValue and Grain columns should be generated.
2) The grain columns should then be related to the dimensions.
3) Create new calculated measures for each of the required measure by passing the new table in the filter part of a calculate statement and then use them in the pivot tables or charts.

Scenario I - GroupNames in one dimension and 1-to-Many Relationship
Our data model for this scenario will have a Product table and a Fact table which shows the sales by Product and Customer. The Product table has ProductKey, Product, SubCategory and Category columns. Our requirement is that we should be able to select Product, SubCategory or Category in the slicers and display the selected values on the row of the pivot table. 

Source tables

1) As per the solution steps, we will have to create a new table as shown below

Group table for Scenario 1

Since the slicers need to display the three levels, the GroupName values will be Category, SubCategory and Product. The granularity is at the Product level and hence that will be the third column. Note that for each value in GroupName, there should be a row mapping the GroupValue to the grain column (which is Product here). This sort of table is very easy to create if the source is a relational database. For this demo, I used linked tables and hence to automate the creation of this table, I had to use Microsoft Query as a datasource to pull data from the linked tables.

MS Query for Scenario 1

SELECT Expr1000 AS GroupName, Category AS GroupValue, ProductKey AS Product FROM (
SELECT 'Category' , 'Sheet1$'.Category , 'Sheet1$'.ProductKey
FROM 'Sheet1$' 'Sheet1$'
WHERE 'Sheet1$'.ProductKey  IS NOT NULL
UNION ALL
SELECT 'SubCategory' , 'Sheet1$'.SubCategory , 'Sheet1$'.ProductKey
FROM 'Sheet1$' 'Sheet1$'
WHERE 'Sheet1$'.ProductKey  IS NOT NULL
UNION ALL
SELECT 'Product' , 'Sheet1$'.Product , 'Sheet1$'.ProductKey
FROM 'Sheet1$' 'Sheet1$'
WHERE 'Sheet1$'.ProductKey  IS NOT NULL
)

(Please note that I am not a MS query datasource expert and I had to workaround getting the column alias names by bringing the outermost select statement. This should have been a simple UNION ALL in SQL Server). If there is any data change in the linked tables, the Refresh button should be clicked twice in the Excel window (so that the Query can populate the changed data in the new table as well as refresh the PowerPivot model with the changed data).

2) Add this new table (Group) to the data model and then relate the Product column in this table to the ProductKey column in the Product table.

Data Model for Scenario 1

3) Hide the original measure Sales, and then create a new measure with the formula given below

NewSales:=CALCULATE(SUM(Fact[Sales]), 'Group')

4) Check out the results by creating a pivot table with GroupValue on the rows, NewSales on Values and GroupName in Slicers.

End result for Scenario 1

Download the Excel 2013 file with the solution from here.

Scenario II - GroupNames in one dimension and Many-to-Many Relationship
Our data model for this scenario will have a Product table, Fact table which shows the sales which shows the sales by Product and Customer, and two tables – Sales Rep and Promotions which will have a many to many relationship with Product.

Source tables for scenario 2

1) The Group table should have the below format

 Group table for Scenario 2

To create the Group table, we again use Microsoft Query as a datasource to pull data from the linked tables.

MS Query for Scenario 2

SELECT Expr1000 AS GroupName, SalesRep AS GroupValue, Product1 AS Product FROM (
SELECT 'SalesRep', `Sheet1$`.SalesRep, `Sheet1$`.Product1
FROM `Sheet1$` `Sheet1$`
WHERE `Sheet1$`.Product1  IS NOT NULL
UNION ALL
SELECT 'Promotions' , `Sheet1$`.Promotions , `Sheet1$`.Product2
FROM `Sheet1$` `Sheet1$`
WHERE `Sheet1$`.Product2  IS NOT NULL
)

2) Add this new table (Group) to the data model and then relate the Product column in this table to the ProductKey column in the Product table.

Data model for Scenario 2 

3) Hide the original measure Sales, and then create a new measure with the formula given below

NewSales:=CALCULATE(sum(Fact[Sales]), 'Group', 'Promotions', 'Rep')

4) Check out the results by creating a pivot table with GroupValue on the rows, NewSales on Values and GroupName in Slicers.

End result for Scenario 2

Download the Excel 2013 file with the solution from here.

I have got two more scenarios to show but guess that will have to wait till the second part of this series – Row Selection Using Slicers in PowerPivot – Part 2.

Note
There are some unique concepts here which might raise questions like
- why didn't I use DAX queries to seed my Group table? Well, I couldn't find a easy way to do union in DAX and that is the reason why I used MS query. It might not be the best way but it sure is an option.
- Is there any other way to avoid the double refresh of the file? I couldn't find any other option to do so (actually, you will need to do that even if you are using DAX queries to seed the Group table). That is, if you are avoiding VBA. Javier Guillen has written a post on how to do it with a single refresh using VBA (check out the final part of his blog for the VBA code).

Monday, January 7, 2013

Group Multiple Rows to Single Delimited Row in PowerPivot

When PowerPivot was announced a couple of years back, I was one among the many who started feeling insecure on having to learn a new trade. I could have sat there and kept on cribbing, or picked myself up and started my baby steps towards the new technology. As Hillary Clinton said, the challenges of change are always hard. It is important that we begin to unpack those challenges and realize that we each have a role that requires us to change and become more responsible for shaping our own future. I am glad that I gave PowerPivot a chance and kept on learning (it sure helps sitting next to a passionate PowerPivot supporter like Javier Guillen!), else I wouldn’t have realized how much useful it is to me. Agreed that it has it’s fair sets of limitations, but as the community grows and the technology matures, we will see that many of it gets resolved or have workarounds. For eg, if you had asked me before on whether there was any way in PowerPivot to group all the row values into a single delimited row, I would have said no. But seeing this question in the PowerPivot forum, I decided to spend a little more time in the weekend and found a workaround, which I am sharing through this post.

Group Multiple Rows to Single Delimited Row in PowerPivot

Scenario
Let us say that we have a table which shows the sales by Country and State. The result should be a table which shows the sum of sales by country along with a delimited row of states as shown below

Scenario

Solution
There is only one function (atleast as far as I know) in PowerPivot that returns a delimited list from multiple rows – PATH(). But the issue is that PATH() works only with a parent child hierarchy. So the trick for getting this piece done would be to build an artificial parent child hierarchy. Follow the steps below to get the solution:-

1) Make a calculated column called Rnk which will generate a running number of the states within a country.

=RANKX (
  FILTER ( All ( Table1 ), Table1[Country] = EARLIER ( Table1[Country] ) ),
    Table1[State],    ,    1,   DENSE
  )

This will allow us to uniquely identify each state value within a country.

Rank States within country using RANKX

2) Make a calculated column called ParRnk with the formula below

=IF (
  Table1[Rnk] <> 1,
  Table1[Rnk] - 1
)

This will allow us to get the parent node id of the parent child hierarchy.

Parent Rank

3) Now, make a calculated column called ParState which will get the Parent State for the current row (based on the ParRnk column).

=CALCULATE (
FIRSTNONBLANK ( Table1[State], 1 ),
FILTER (
ALLEXCEPT ( Table1, Table1[Country] ),
    Table1[Rnk] = EARLIER ( Table1[ParRnk] )
  )
)

Get value of State for the parent

4) The final part of the solution is to make the delimited list column by using the path function where State is the child node and ParState is the parent node. But if we directly apply the PATH function in each row, we will not get the intended results as you can see in the image below.

PATH() without passing the relevant context

What we need is the delimited list value for the row with the maximum rank in each country applied across all other rows in that country. For that, make a calculated column called ConcatStates with the formula below

=CALCULATE (
PATH ( Table1[State], Table1[ParState] ),
CALCULATETABLE (
FILTER ( Table1, Table1[Rnk] = Max ( Table1[Rnk] ) ),
ALLEXCEPT ( Table1, Table1[Country] )
  )
)

Now the results should look like below

PATH() with the correct table context

Now make a new pivot table and drag and drop the Country, ConcatStates and Sales fields to see the intended results

solution

Wasn’t that a fascinating workaround? If you want to play around, you can download the excel file from here.

Thursday, January 3, 2013

2012 in a Nutshell

No, the world didn’t end in 2012 as many expected (looks like the Mayan developer was just damn lazy and decided to take a nap after entering till 2012 in the calendar. Guess who got fired after that!). It took a few days for me to get over the disappointment that I really have to pay off my car loan now, lucky I didn’t get a Ferrari thinking the world would end. Jokes apart, 2012 has been a good year for me and I thought of noting down a few things that happened over the last year and setting new goals.

image

 

1) New blog domain - www.SqlJason.com and Twitter handle – @SqlJason
I finally took the plunge and bought my blog a new domain. I had been thinking about this for a long time and finally did it in November. This also meant that my blog lost it’s existing page rank and my stats have fallen down, but I guess it is going to pay off in the long run. On the positive side, it’s much easier to communicate my blog url in presentations. So for all my readers, make sure that you update the links from www.road-blogs.blogspot.com to www.SqlJason.com. I have set up a redirect from the old url to the new one, so the existing links will also work.

I also changed my twitter handle from @de_unparagoned to @SqlJason. This means that I don’t get questions anymore on what de_unparagoned means. Also, makes communicating it easier.

2) Presentations and Talks
This year was crazy in terms of the presentations that I did. When I came to the US this year, I had a goal to do more presentations than I did in the previous year (which was 2 – one for the London UG and one for SQLBits). Well, I am happy to say that I presented in 8 SQLSaturdays, 1 Code Camp, 1 BI Virtual Chapter and 1 UG.

1.    Dec 08, 2012 - SQLSaturday #173 (Washington DC 2012)
2.    Sep 29, 2012 - SQLSaturday #149 (Minnesota 2012)
3.    Aug 04, 2012 - SQLSaturday #158 (New York City 2012)
4.    Jul 21,  2012 - SQLSaturday #122 (Louisville 2012)
5.    Jun 26, 2012 – PASS BI Virtual Chapter (online)
6.    Jun 09, 2012 - SQLSaturday #121 (Philadelphia 2012)
7.    May 12, 2012- SQLSaturday #129 (Rochester 2012)
8.    May 05, 2012- Carolina Code Camp (Charlotte 2012)
9.    Apr 28, 2012 - SQLSaturday #130 (Jacksonville 2012)
10.   Apr 21, 2012 - SQLSaturday #118 (Madison 2012)
11.   Apr 19, 2012 – Charlotte User Group (Charlotte 2012)

Thanks to all the organizers and attendees, I got a lot of love from all of you and it was a wonderful experience (in fact I got selected at all of the SQL Saturdays that I submitted to). I intend to keep on doing more of SQL Saturdays in 2013 though I am not sure whether I can match up to the previous year because of personal reasons. One of my other goals would be to prepare better and start trying for the big conferences like PASS Summit, PASS BA Conference, etc though I am not sure how that will turn up due to the immense quality of presentations and speakers.

3) Blogging
Blogging has also been pretty consistent and I wrote 33 posts - one more than I did last year. From a numbers perspective, I had 112,614 page views this year (which is a big jump as I just have ~178,000 page views totally) and this is excluding the views that I get from my syndicated posts at other sites like www.BeyondRelational.com, www.ssas-info.com, etc. Also, my feedburner reader count has almost doubled to 350+ which is good. My goal for this year would be to write at least one post more than the previous year.

4) Organized a SQLSaturday and started a new User Group
This year, I was able to get more involved with the community and I ended up organizing a SQL Saturday in Charlotte along with Javier Guillen, Rafael Salas and Melissa Coates. This was a great experience and all of us had a lot of fun doing that. We also started a new BI group in Charlotte - CBIG (www.CharBIGroup.com) and our first meeting will take place next week. If you live in or around Charlotte, make sure that you register and drop in for our fun meetings.

5) MSDN Forums
I started my community involvement through the MSDN forums 3 years back and that is where I discovered my passion for MSBI and community in general. This is one part where I would like to keep going and helping as far as I can. I had set a goal for myself to break the 10,000 points barrierthis year  and I am glad to say that I did it by participating in the SSRS, SSAS and PowerPivot forums. I got 2300+ points this year (total of 10,100+ points) from 140+ answers and though it is nowhere near to the points I racked in my first year, I think this is a reasonable target for next year due to the additional community involvement.

6) PASS Summit
Attending the PASS Summit was one of my goals ever since I started knowing about this community, and again, I am happy to say that I did it this year for the very first time, thanks to my employer – Mariner, who sponsored it for me. It was such a wonderful experience meeting #sqlfamily and attending sessions (in fact I registered for the coming PASS Summit also, which is going to be held in the city that I am based out of - Charlotte).

7) Certifications – MCSE in SQL Server BI 2012
Thanks to the beta invitations, I was able to complete my MCSE certification in BI 2012. There were 5 exams which included some basic DBA  topics (which I am not at all familiar with) but guess what, I ended up passing all of those exams in my first try itself. You can read more about my certification experience here.

All in all, 2012 was a great year for me and I hope 2013 is going to be kind on me… And also for all of you…