{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SV-Softmax math" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.python.ops import array_ops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Support vector guided softmax loss - is a novel loss function which adaptively emphasizes the mis-classified points (support vectors) to guide the discriminative features learning. It makes it close to hard negative mining and the Focal loss techniques.\n", "\n", "Let's define a binary mask to adaptively indicate whether a sample is selected as the support vector by a specific classifier in the current stage. To the end, the binary mask is defined as follows:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "I_k = \\left\\{\n", " \\begin{array}{ll}\n", " 0, & \\quad \\cos(\\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) \\ge 0 \\\\\n", " 1, & \\quad \\cos(\\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) < 0\n", " \\end{array}\n", " \\right.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where $\\cos(\\theta_{w_k}, x) = w_k^Tx$ is the cosine similarity and $θ_{w_k,x}$ is the angle between $w_k$ and $x$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Logits:\n", "[[ 2. 3. 1. -1. ]\n", " [-1. 2.1 2. 6. ]\n", " [-2. 3. 4. -2.1]]\n", "\n", "GT:\n", "[[0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [1. 0. 0. 0.]]\n", "\n", "Binary mask:\n", "[[0. 0. 0. 0.]\n", " [0. 1. 0. 1.]\n", " [0. 1. 1. 0.]]\n" ] } ], "source": [ "# placeholders\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "with tf.Session() as sess:\n", " logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", " y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " binary_mask = (sess.run(I_k, feed_dict={logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"Logits:\")\n", "print(logits_array)\n", "print('')\n", "print(\"GT:\")\n", "print(y_true_array)\n", "print('')\n", "print(\"Binary mask:\")\n", "print(binary_mask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also define indicator function $h(t, θ_{w_k}, x, I_k)$ with preset hyperparameter t:\n", "\n", "$$h(t, θ_{w_k}, x, I_k) = e^{s(t−1)(\\cos(\\theta_{w_k, x})+1)I_k}$$\n", "\n", "Obviously, when t = 1, the designed SV-Softmax loss becomes identical to the original softmax loss. Let's implement it in a naive way." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "h with t=1.2:\n", "[[1. 1. 1. 1. ]\n", " [1. 1.8589282 1. 4.055201 ]\n", " [1. 2.2255414 2.7182825 1. ]]\n", "\n", "h with t=1.0:\n", "[[1. 1. 1. 1.]\n", " [1. 1. 1. 1.]\n", " [1. 1. 1. 
1.]]\n" ] } ], "source": [ "# placeholders\n", "t = tf.placeholder(tf.float32)\n", "s = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "epsilon = 1.e-9\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "h = tf.exp(s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k)))\n", "\n", "# Let's check\n", "logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", "y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " \n", "with tf.Session() as sess:\n", " h_array_12 = (sess.run(h, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " h_array_1 = (sess.run(h, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"h with t=1.2:\")\n", "print(h_array_12)\n", "print('')\n", "print(\"h with t=1.0:\")\n", "print(h_array_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Full loss is formulated:\n", "$$\\mathcal{L} = -log\\frac{e^{s\\cos(\\theta_{w_y}, x)}}{e^{s\\cos(\\theta_{w_y}, x)}+\\sum_{k\\ne y}^Kh(t, θ_{w_k}, x, I_k)e^{s\\cos(\\theta_{w_k, x})}}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Softmax with t=1.2:\n", "[[2.4178252e-01 6.5723300e-01 8.8946812e-02 1.2037643e-02]\n", " [2.2175812e-04 4.9225753e-03 4.4541308e-03 2.4318731e-01]\n", " [6.9986947e-04 1.0386984e-01 2.8234750e-01 6.3326815e-04]]\n", "\n", "Softmax with t=1.0:\n", "[[2.4178252e-01 6.5723300e-01 8.8946812e-02 1.2037643e-02]\n", " [8.7725709e-04 1.9473307e-02 1.7620180e-02 9.6202922e-01]\n", " [1.8058796e-03 2.6801631e-01 7.2854382e-01 1.6340276e-03]]\n", "\n", "Pure softmax:\n", "[[2.4178253e-01 6.5723306e-01 8.8946819e-02 1.2037644e-02]\n", " [8.7725703e-04 1.9473307e-02 1.7620180e-02 9.6202922e-01]\n", " [1.8058793e-03 2.6801628e-01 7.2854376e-01 1.6340273e-03]]\n", "\n", "Maximum absolute error between our and tf sodtmax:\n", "5.9604645e-08\n", "\n", "Loss with t=1.2:\n", "4.366085\n", "\n", "Loss with t=1.0:\n", "3.5917118\n", "\n", "tf loss with t=1.0:\n", "3.5917118\n" ] } ], "source": [ "# placeholders\n", "t = tf.placeholder(tf.float32)\n", "s = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "epsilon = 1.e-9\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "h = tf.exp(s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k)))\n", "\n", "\n", "softmax = tf.exp(s * logits) / (tf.reshape(\n", " tf.reduce_sum(tf.multiply(tf.exp(s * logits), h), axis=-1, keepdims=True), \n", " [-1, 1]) + epsilon)\n", "\n", "tf_softmax = tf.nn.softmax(logits)\n", "\n", "# Let's check softmax\n", "logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", "y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " \n", "with tf.Session() as sess:\n", " softmax_array_12 = (sess.run(softmax, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "with 
"with tf.Session() as sess:\n", " softmax_array_1 = (sess.run(softmax, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "with tf.Session() as sess:\n", " tf_softmax_array = (sess.run(tf_softmax, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"Softmax with t=1.2:\")\n", "print(softmax_array_12)\n", "print('')\n", "print(\"Softmax with t=1.0:\")\n", "print(softmax_array_1)\n", "print('')\n", "print(\"Pure softmax:\")\n", "print(tf_softmax_array)\n", "print('')\n", "print(\"Maximum absolute error between our and tf softmax:\")\n", "print(abs((tf_softmax_array-softmax_array_1)).max())\n", "print('')\n", "\n", "# Full loss:\n", "softmax = tf.add(softmax, epsilon)\n", "ce = tf.multiply(y_true, -tf.log(softmax))\n", "ce = tf.reduce_sum(ce, axis=1)\n", "ce = tf.reduce_mean(ce)\n", "\n", "# tf loss:\n", "tf_ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits)\n", "tf_ce = tf.reduce_mean(tf_ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_tf = (sess.run(tf_ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)\n", "print('')\n", "print(\"tf loss with t=1.0:\")\n", "print(loss_tf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you remember, $h(t, \\theta_{w_k}, x, I_k) = e^{s(t−1)(\\cos(\\theta_{w_k}, x)+1)I_k}$, so we can rewrite our loss as: $$\\mathcal{L} = -\\log\\frac{e^{s\\cos(\\theta_{w_y}, x)}}{e^{s\\cos(\\theta_{w_y}, x)}+\\sum_{k\\ne y}^Ke^{s(t−1)(\\cos(\\theta_{w_k}, x)+1)I_k+s\\cos(\\theta_{w_k}, x)}}$$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss with t=1.2:\n", "4.3660855\n", "\n", "Loss with t=1.0:\n", "3.5917118\n" ] } ], "source": [ "# placeholders\n", "epsilon = 1.e-9\n", "s = tf.placeholder(tf.float32)\n", "t = tf.placeholder(tf.float32)\n", "m = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "# score of groundtruth\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "\n", "# binary mask for support vectors\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "# log of the indicator function h, added directly to the scaled logits\n", "h = s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k))\n", "logits_h = tf.add(s * logits, h)\n", "ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits_h)\n", "ce = tf.reduce_mean(ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same results as with our previous code." ] },
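{ "cell_type": "markdown", "metadata": {}, "source": [ "For a quick cross-check, below is a minimal NumPy-only sketch that re-evaluates the SV-Softmax formula directly on `logits_array` and `y_true_array` from the cells above; the helper name `sv_softmax_loss_np` and its default arguments are introduced here purely for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a minimal NumPy sketch of the SV-Softmax loss, re-evaluating the formula above\n", "def sv_softmax_loss_np(logits, y_true, s=1.0, t=1.2, epsilon=1.e-9):\n", "    # score of the ground-truth class, kept as a column for broadcasting\n", "    logit_y = np.sum(y_true * logits, axis=-1, keepdims=True)\n", "    # binary support-vector mask: 1 where a wrong class scores above the ground truth\n", "    I_k = (logits > logit_y).astype(np.float64)\n", "    # indicator function h(t, theta_{w_k}, x, I_k)\n", "    h = np.exp(s * (t - 1.) * (logits + 1.) * I_k)\n", "    # SV-Softmax probability of each class\n", "    probs = np.exp(s * logits) / (np.sum(np.exp(s * logits) * h, axis=-1, keepdims=True) + epsilon)\n", "    # cross-entropy of the ground-truth class, averaged over the batch\n", "    return np.mean(np.sum(y_true * -np.log(probs + epsilon), axis=-1))\n", "\n", "# should reproduce the losses computed with TensorFlow above (up to float32 rounding)\n", "print(sv_softmax_loss_np(logits_array, y_true_array, s=1., t=1.2))\n", "print(sv_softmax_loss_np(logits_array, y_true_array, s=1., t=1.))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The printed values should match the TensorFlow losses above (4.366085 and 3.5917118) up to float32 rounding." ] },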
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SV-Softmax loss semantically fuses the motivation of mining-based and margin-based losses into one framework, but from different viewpoints. Therefore, we can also absorb their strengths into our SV-Softmax loss. Specifically, to increase the mining range, we adopt the margin-based decision boundaries to indicate the support vectors. Consequently, the improved SV-X-Softmax loss can be formulated as: $$\\mathcal{L} = -log\\frac{e^{sf(m, \\theta_{w_y}, x)}}{e^{sf(m, \\theta_{w_y}, x)}+\\sum_{k\\ne y}^Kh(t, θ_{w_k}, x, I_k)e^{s\\cos(\\theta_{w_k, x})}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where X is the margin-based losses. It can be A-Softmax, AM-Softmax and Arc-Softmax etc. The indicator mask I k is re-computed according to margin-based decision boundaries. Specifically:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "I_k = \\left\\{\n", " \\begin{array}{ll}\n", " 0, & \\quad f(m, \\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) \\ge 0 \\\\\n", " 1, & \\quad f(m, \\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) < 0\n", " \\end{array}\n", " \\right.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's implement SV-AM-Softmax loss and done with it (for AM-Softmax we have $f(m, \\theta_{w_y}, x) = \\cos( \\theta_{w_y}, x) − m$)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss with t=1.2:\n", "4.6436086\n", "\n", "Loss with t=1.0:\n", "3.8678675\n" ] } ], "source": [ "# placeholders\n", "epsilon = 1.e-9\n", "s = tf.placeholder(tf.float32)\n", "t = tf.placeholder(tf.float32)\n", "m = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "# score of groundtruth\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "\n", "# binary mask for support vectors\n", "I_k = array_ops.where(logit_y - m >= logits, zeros, ones)\n", "\n", "# I_k should be zero for GT score\n", "I_k = I_k * tf.cast(tf.not_equal(y_true, 1), tf.float32)\n", "\n", "# indicator function\n", "h = s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k))\n", "\n", "logits_m = tf.add(s * (logits - m * y_true), h)\n", "ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits_m)\n", "ce = tf.reduce_mean(ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={m: 0.35, t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={m: 0.35, t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }