Pages

Tuesday, March 12, 2013

Unpivoting Data in Data Explorer

Data Explorer is generating a lot of positive buzz within the community and everyone I speak to seems to be really excited about this. (If you are wondering what Data Explorer is, you might want to quickly catch up on my previous post - Introduction to Data Explorer Preview for Excel). And just today, the Data Explorer team announced A new build of “Data Explorer”, and an Auto Update feature (must say I really like the idea of Update button). This was followed by a post from Jamie Thomson aka SSIS Junkie (blog | twitter) on the query language ( M ) in Data Explorer and you can read more about that here. All this made me really excited and I also decided to contribute something through this post.

Unpivot rows in data explorer

It is a pretty common requirement to unpivot data, especially when you are scourging the net for open data. Data Explorer currently does not have any feature to unpivot data but the query language looked pretty solid and I decided to give it a try (actually, this exercise started as a way to test the new query editing functionality in Data Explorer). Well, I did succeed though it might not look that easy (if you do find an easier way, please share it!). Read on for the solution:-

1) The source for my post is given below

source data

I just made a simple table in the excel spreadsheet which gives the Sales by State for the years 2010 to 2012. I then click on the From Table option in Data Explorer tab to get the query window as shown below.

Import data from excel

2) I add a new column called JCol with a value of 1 using the expression below

= Table.AddColumn(Excel.CurrentWorkbook(){[Name="Table37"]}[Content], "JCol", each 1)

Add join column to first table 

Now this is my first table, and I store the expression with me.

3) My next objective is to make a transpose of the original table. This can be done with the expression below

= Table.PromoteHeaders(Table.Transpose(Table.DemoteHeaders(Excel.CurrentWorkbook(){[Name="Table37"]}[Content])))

Transpose table

4) Now we need to rename the State Column to Year.

= Table.RenameColumns(Table.PromoteHeaders(Table.Transpose(Table.DemoteHeaders(Excel.CurrentWorkbook(){[Name="Table37"]}[Content]))), {"State","Year"})

Rename column

5) Now, we need to add a column to this resultant table called JCol (just as we did to the first table in Step 2).

= Table.AddColumn(Table.RenameColumns(Table.PromoteHeaders(Table.Transpose(Table.DemoteHeaders(Excel.CurrentWorkbook(){[Name="Table37"]}[Content]))), {"State","Year"}), "JCol", each 1)

Add join column to second table

6) You might be wondering at this stage why we created the JCol column in both the tables. The reason is that we need to cross join both of these tables so that we get extra rows, and the join column is going to be JCol. Since the values for the join column are all the same, we get a cross join. To do the join of both the tables, delete the existing expression and enter the expression below

= Table.Join(Table.AddColumn(Excel.CurrentWorkbook(){[Name="Table37"]}[Content], "JCol", each 1), "JCol", Table.AddColumn(Table.RenameColumns(Table.PromoteHeaders(Table.Transpose(Table.DemoteHeaders(Excel.CurrentWorkbook(){[Name="Table37"]}[Content]))), {"State","Year"}), "JCol", each 1), "JCol")

Join both the table expressions

7) Now add a new calculated column which will give you the value in NC column if State is NC, SC column if State is SC and FL column if State is FL.

Add custom column for Sales

The expression for the calculated column is given below

if [State] = "NC" then [NC] else if [State]= "SC" then [SC] else [FL]

custom column expression

Now your result should look like shown below

end result of custom calculation

8) Now all you have to do is to rename the Custom column to Sales and hide the unnecessary columns. Then you will be able to see the unpivoted data as shown below

rename and hide unnecessary columns

9) Also, have a look at how the actual query looks like in the advanced query editor

Advanced query editor

I would like to see a word-wrap option for the advanced query editor so that I don’t have to scroll over to the right to see long formulas. Apart from that, looks great! Don’t forget to share your comments as well as your posts / experiences with Data Explorer and it’s query language.

Update

I got quite some requests asking for the actual query. So here it is

let
    Source = Table.Join(Table.AddColumn(Excel.CurrentWorkbook(){[Name="Table37"]}[Content], "JCol", each 1), "JCol", Table.AddColumn(Table.RenameColumns(Table.PromoteHeaders(Table.Transpose(Table.DemoteHeaders(Excel.CurrentWorkbook(){[Name="Table37"]}[Content]))), {"State","Year"}), "JCol", each 1), "JCol"),
   InsertedCustom = Table.AddColumn(Source, "Custom", each if [State] = "NC" then [NC] else if [State]= "SC" then [SC] else [FL]),
    RenamedColumns = Table.RenameColumns(InsertedCustom,{{"Custom", "Sales"}}),
    HiddenColumns = Table.RemoveColumns(RenamedColumns,{"2010", "2011", "2012", "JCol", "NC", "SC", "FL"})
in
    HiddenColumns

Remember to change the name of the table name when you use it.

6 comments:

  1. Hi Jason,
    Nice post. One suggestion, instead of sharing a screenshot of the query why not copy/paste the query text into this blog post. The beauty of being able to access the textual query is that its easy to share it - if you put in on the blog anyone can paste it into their own Data Explorer query and straightaway they're partying on the same query.

    Regards
    Jamie

    ReplyDelete
    Replies
    1. Thanks Jamie :)
      And yes, I have posted the query now.

      Delete
  2. Alternatively, you can group the table by [State], and use "All Rows" as an aggregation. This will return a table with the original [State] column and a new table column.

    You can use Table.TransformColumns to modify the nested tables:
    - remove the [State] column
    - demote headeres
    - transpose

    Then expand the Table column.

    The process also works with minor changes to pivot data (except you are in trouble if some values are the same as one of your column headers).

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Even easier, get Tableau's free Excel data reshaping tool add-in. Put your cursor on the field where you want to start unpivoting and hit a button.

    ReplyDelete
  5. Jason, great stuff, thanks! How would you recommend adapting this to very large tables? Specifically, I'm working with csv files with hundreds of thousands of rows.

    ReplyDelete