{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating Subgroup Ranks in Power Query Using Table.AddRankColumn Function\n",
    "> Table.AddRankColumn is the latest addition to the M language. This post shows how you can use it to create subgroup rank column \n",
    "\n",
    "- toc: true \n",
    "- badges: true\n",
    "- comments: true\n",
    "- categories: [powerbi, M]\n",
    "- hide: false"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table.AddRankColumn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Table.AddRankColumn is the latest addition to the M language which lets you create ranks in Power Query directly. No more buffer-sort-add index column trick which was computationally expensive and could be less accurate. With this function you can specify one or more columns to rank by and also the order (ascending/descending). This is not a replacement for the `RANKX` function in DAX. If your rank order is not going to change or need it for validation purposes, this is a great function.  \n",
    "\n",
    "Read the [official documentation](https://docs.microsoft.com/en-us/powerquery-m/table-addrankcolumn) for all the function parameters. But Reza Rad has the most [comprehensive overview](https://radacad.com/adding-pre-calculating-rank-in-power-bi-using-power-query) of the function with examples so definitely check it out."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">note: You need to upgrade to June 2022 version of Power BI to use it in Desktop. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Subgroup Rank"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the example below, we have three products, their colors and the Units. `Rank1` column shows the rank by number of units for all the products in the ascending order. What we are looking for is the subgroup rank, i.e rank within each product type as shown in `Rank2` column.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Product</th>\n",
       "      <th>Color</th>\n",
       "      <th>Units</th>\n",
       "      <th>Rank1</th>\n",
       "      <th>Rank2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Product A</td>\n",
       "      <td>Red</td>\n",
       "      <td>4549</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Product A</td>\n",
       "      <td>Green</td>\n",
       "      <td>5562</td>\n",
       "      <td>10</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Product A</td>\n",
       "      <td>Black</td>\n",
       "      <td>7635</td>\n",
       "      <td>11</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Product A</td>\n",
       "      <td>Yellow</td>\n",
       "      <td>15000</td>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Product B</td>\n",
       "      <td>Black</td>\n",
       "      <td>1513</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Product B</td>\n",
       "      <td>Red</td>\n",
       "      <td>2969</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Product B</td>\n",
       "      <td>Black</td>\n",
       "      <td>3860</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Product B</td>\n",
       "      <td>Yellow</td>\n",
       "      <td>5409</td>\n",
       "      <td>8</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Product B</td>\n",
       "      <td>Green</td>\n",
       "      <td>5491</td>\n",
       "      <td>9</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Product C</td>\n",
       "      <td>Yellow</td>\n",
       "      <td>600</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Product C</td>\n",
       "      <td>Green</td>\n",
       "      <td>4264</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Product C</td>\n",
       "      <td>Red</td>\n",
       "      <td>4839</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Product   Color  Units  Rank1  Rank2\n",
       "1   Product A     Red   4549      6      1\n",
       "2   Product A   Green   5562     10      2\n",
       "0   Product A   Black   7635     11      3\n",
       "3   Product A  Yellow  15000     12      4\n",
       "4   Product B   Black   1513      2      1\n",
       "5   Product B     Red   2969      3      2\n",
       "8   Product B   Black   3860      4      3\n",
       "7   Product B  Yellow   5409      8      4\n",
       "6   Product B   Green   5491      9      5\n",
       "11  Product C  Yellow    600      1      1\n",
       "10  Product C   Green   4264      5      2\n",
       "9   Product C     Red   4839      7      3"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = (pd.read_clipboard()\n",
    "      .assign(Units = lambda s:s['Units'].astype('int'))\n",
    "      .assign(Rank1 = lambda s:s['Units'].rank().astype('int'))\n",
    "      .assign(Rank2 = lambda s:s.groupby('Product')['Units'].transform('rank').astype('int'))\n",
    "      .sort_values(['Product','Rank2'])\n",
    "           \n",
    "     )\n",
    "df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is my walkthrough of how you can create subgroup ranks in Power Query.\n",
    "\n",
    ">youtube: https://youtu.be/ysDBHMbtXsk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Important Notes:\n",
    "\n",
    "1. As I mention in the video, this function is **not query foldable** so be sure to plan your steps to utilize folding first. If you don't know what is query folding, [watch my in-depth session](https://player.vimeo.com/video/714177719?h=466f2d1c1b) on query folding. \n",
    "\n",
    "2. When you create the rank column, it will always sort the table by the newly created rank column. This may not always be desired. For example if your table is sorted by date or a key column when imported, that sort order will be disrupted when you create the rank column. I am not sure if there is a way to prevent that from happening. This also has an unintended effect that vertipaq compression may not be optimal. As you know, cardinality of the column matters in [vertipaq compression](https://www.red-gate.com/simple-talk/databases/sql-server/bi-sql-server/vertipaq-optimization-and-its-impact-on-columnstore-compression/) but distribution of values matters as well. If the sort order is disrupted it may increase the size of the dataset. Be sure to check before and after. \n",
    "\n",
    "3. I haven't tested this on a large dataset but be sure to check the refresh time and dataset size after creating the rank column. \n",
    "\n",
    "4. Look at the `RankKind` [parameter](https://docs.microsoft.com/en-us/powerquery-m/rankkind-type) to specify how the ranking should be done."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}