{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating Subgroup Ranks in Power Query Using Table.AddRankColumn Function\n", "> Table.AddRankColumn is the latest addition to the M language. This post shows how you can use it to create subgroup rank column \n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- categories: [powerbi, M]\n", "- hide: false" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table.AddRankColumn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Table.AddRankColumn is the latest addition to the M language which lets you create ranks in Power Query directly. No more buffer-sort-add index column trick which was computationally expensive and could be less accurate. With this function you can specify one or more columns to rank by and also the order (ascending/descending). This is not a replacement for the `RANKX` function in DAX. If your rank order is not going to change or need it for validation purposes, this is a great function. \n", "\n", "Read the [official documentation](https://docs.microsoft.com/en-us/powerquery-m/table-addrankcolumn) for all the function parameters. But Reza Rad has the most [comprehensive overview](https://radacad.com/adding-pre-calculating-rank-in-power-bi-using-power-query) of the function with examples so definitely check it out." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">note: You need to upgrade to June 2022 version of Power BI to use it in Desktop. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subgroup Rank" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example below, we have three products, their colors and the Units. `Rank1` column shows the rank by number of units for all the products in the ascending order. What we are looking for is the subgroup rank, i.e rank within each product type as shown in `Rank2` column. " ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ProductColorUnitsRank1Rank2
1Product ARed454961
2Product AGreen5562102
0Product ABlack7635113
3Product AYellow15000124
4Product BBlack151321
5Product BRed296932
8Product BBlack386043
7Product BYellow540984
6Product BGreen549195
11Product CYellow60011
10Product CGreen426452
9Product CRed483973
\n", "
" ], "text/plain": [ " Product Color Units Rank1 Rank2\n", "1 Product A Red 4549 6 1\n", "2 Product A Green 5562 10 2\n", "0 Product A Black 7635 11 3\n", "3 Product A Yellow 15000 12 4\n", "4 Product B Black 1513 2 1\n", "5 Product B Red 2969 3 2\n", "8 Product B Black 3860 4 3\n", "7 Product B Yellow 5409 8 4\n", "6 Product B Green 5491 9 5\n", "11 Product C Yellow 600 1 1\n", "10 Product C Green 4264 5 2\n", "9 Product C Red 4839 7 3" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = (pd.read_clipboard()\n", " .assign(Units = lambda s:s['Units'].astype('int'))\n", " .assign(Rank1 = lambda s:s['Units'].rank().astype('int'))\n", " .assign(Rank2 = lambda s:s.groupby('Product')['Units'].transform('rank').astype('int'))\n", " .sort_values(['Product','Rank2'])\n", " \n", " )\n", "df\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is my walkthrough of how you can create subgroup ranks in Power Query.\n", "\n", ">youtube: https://youtu.be/ysDBHMbtXsk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Important Notes:\n", "\n", "1. As I mention in the video, this function is **not query foldable** so be sure to plan your steps to utilize folding first. If you don't know what is query folding, [watch my in-depth session](https://player.vimeo.com/video/714177719?h=466f2d1c1b) on query folding. \n", "\n", "2. When you create the rank column, it will always sort the table by the newly created rank column. This may not always be desired. For example if your table is sorted by date or a key column when imported, that sort order will be disrupted when you create the rank column. I am not sure if there is a way to prevent that from happening. This also has an unintended effect that vertipaq compression may not be optimal. As you know, cardinality of the column matters in [vertipaq compression](https://www.red-gate.com/simple-talk/databases/sql-server/bi-sql-server/vertipaq-optimization-and-its-impact-on-columnstore-compression/) but distribution of values matters as well. If the sort order is disrupted it may increase the size of the dataset. Be sure to check before and after. \n", "\n", "3. I haven't tested this on a large dataset but be sure to check the refresh time and dataset size after creating the rank column. \n", "\n", "4. Look at the `RankKind` [parameter](https://docs.microsoft.com/en-us/powerquery-m/rankkind-type) to specify how the ranking should be done." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }