{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SV-Softmax math" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.python.ops import array_ops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Support vector guided softmax loss - is a novel loss function which adaptively emphasizes the mis-classified points (support vectors) to guide the discriminative features learning. It makes it close to hard negative mining and the Focal loss techniques.\n", "\n", "Let's define a binary mask to adaptively indicate whether a sample is selected as the support vector by a specific classifier in the current stage. To the end, the binary mask is defined as follows:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "I_k = \\left\\{\n", " \\begin{array}{ll}\n", " 0, & \\quad \\cos(\\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) \\ge 0 \\\\\n", " 1, & \\quad \\cos(\\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) < 0\n", " \\end{array}\n", " \\right.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where $\\cos(\\theta_{w_k}, x) = w_k^Tx$ is the cosine similarity and $θ_{w_k,x}$ is the angle between $w_k$ and $x$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Logits:\n", "[[ 2. 3. 1. -1. ]\n", " [-1. 2.1 2. 6. ]\n", " [-2. 3. 4. -2.1]]\n", "\n", "GT:\n", "[[0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [1. 0. 0. 0.]]\n", "\n", "Binary mask:\n", "[[0. 0. 0. 0.]\n", " [0. 1. 0. 1.]\n", " [0. 1. 1. 0.]]\n" ] } ], "source": [ "# placeholders\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "with tf.Session() as sess:\n", " logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", " y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " binary_mask = (sess.run(I_k, feed_dict={logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"Logits:\")\n", "print(logits_array)\n", "print('')\n", "print(\"GT:\")\n", "print(y_true_array)\n", "print('')\n", "print(\"Binary mask:\")\n", "print(binary_mask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also define indicator function $h(t, θ_{w_k}, x, I_k)$ with preset hyperparameter t:\n", "\n", "$$h(t, θ_{w_k}, x, I_k) = e^{s(t−1)(\\cos(\\theta_{w_k, x})+1)I_k}$$\n", "\n", "Obviously, when t = 1, the designed SV-Softmax loss becomes identical to the original softmax loss. Let's implement it in a naive way." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "h with t=1.2:\n", "[[1. 1. 1. 1. ]\n", " [1. 1.8589282 1. 4.055201 ]\n", " [1. 2.2255414 2.7182825 1. ]]\n", "\n", "h with t=1.0:\n", "[[1. 1. 1. 1.]\n", " [1. 1. 1. 1.]\n", " [1. 1. 1. 
1.]]\n" ] } ], "source": [ "# placeholders\n", "t = tf.placeholder(tf.float32)\n", "s = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "epsilon = 1.e-9\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "h = tf.exp(s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k)))\n", "\n", "# Let's check\n", "logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", "y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " \n", "with tf.Session() as sess:\n", " h_array_12 = (sess.run(h, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " h_array_1 = (sess.run(h, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"h with t=1.2:\")\n", "print(h_array_12)\n", "print('')\n", "print(\"h with t=1.0:\")\n", "print(h_array_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Full loss is formulated:\n", "$$\\mathcal{L} = -log\\frac{e^{s\\cos(\\theta_{w_y}, x)}}{e^{s\\cos(\\theta_{w_y}, x)}+\\sum_{k\\ne y}^Kh(t, θ_{w_k}, x, I_k)e^{s\\cos(\\theta_{w_k, x})}}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Softmax with t=1.2:\n", "[[2.4178252e-01 6.5723300e-01 8.8946812e-02 1.2037643e-02]\n", " [2.2175812e-04 4.9225753e-03 4.4541308e-03 2.4318731e-01]\n", " [6.9986947e-04 1.0386984e-01 2.8234750e-01 6.3326815e-04]]\n", "\n", "Softmax with t=1.0:\n", "[[2.4178252e-01 6.5723300e-01 8.8946812e-02 1.2037643e-02]\n", " [8.7725709e-04 1.9473307e-02 1.7620180e-02 9.6202922e-01]\n", " [1.8058796e-03 2.6801631e-01 7.2854382e-01 1.6340276e-03]]\n", "\n", "Pure softmax:\n", "[[2.4178253e-01 6.5723306e-01 8.8946819e-02 1.2037644e-02]\n", " [8.7725703e-04 1.9473307e-02 1.7620180e-02 9.6202922e-01]\n", " [1.8058793e-03 2.6801628e-01 7.2854376e-01 1.6340273e-03]]\n", "\n", "Maximum absolute error between our and tf sodtmax:\n", "5.9604645e-08\n", "\n", "Loss with t=1.2:\n", "4.366085\n", "\n", "Loss with t=1.0:\n", "3.5917118\n", "\n", "tf loss with t=1.0:\n", "3.5917118\n" ] } ], "source": [ "# placeholders\n", "t = tf.placeholder(tf.float32)\n", "s = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "epsilon = 1.e-9\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "h = tf.exp(s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k)))\n", "\n", "\n", "softmax = tf.exp(s * logits) / (tf.reshape(\n", " tf.reduce_sum(tf.multiply(tf.exp(s * logits), h), axis=-1, keepdims=True), \n", " [-1, 1]) + epsilon)\n", "\n", "tf_softmax = tf.nn.softmax(logits)\n", "\n", "# Let's check softmax\n", "logits_array = np.array([[2., 3., 1., -1.], [-1., 2.1, 2., 6], [-2., 3., 4, -2.1]])\n", "y_true_array = np.array([[0., 1., 0., 0], [0., 0., 1., 0], [1., 0., 0., 0.]])\n", " \n", "with tf.Session() as sess:\n", " softmax_array_12 = (sess.run(softmax, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "with 
"with tf.Session() as sess:\n", " softmax_array_1 = (sess.run(softmax, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "with tf.Session() as sess:\n", " tf_softmax_array = (sess.run(tf_softmax, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " \n", "print(\"Softmax with t=1.2:\")\n", "print(softmax_array_12)\n", "print('')\n", "print(\"Softmax with t=1.0:\")\n", "print(softmax_array_1)\n", "print('')\n", "print(\"Pure softmax:\")\n", "print(tf_softmax_array)\n", "print('')\n", "print(\"Maximum absolute error between our and tf softmax:\")\n", "print(abs((tf_softmax_array-softmax_array_1)).max())\n", "print('')\n", "\n", "# Full loss:\n", "softmax = tf.add(softmax, epsilon)\n", "ce = tf.multiply(y_true, -tf.log(softmax))\n", "ce = tf.reduce_sum(ce, axis=1)\n", "ce = tf.reduce_mean(ce)\n", "\n", "# tf loss:\n", "tf_ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits)\n", "tf_ce = tf.reduce_mean(tf_ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_tf = (sess.run(tf_ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)\n", "print('')\n", "print(\"tf loss with t=1.0:\")\n", "print(loss_tf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you remember, $h(t, \\theta_{w_k}, x, I_k) = e^{s(t−1)(\\cos(\\theta_{w_k}, x)+1)I_k}$, so we can rewrite our loss as: $$\\mathcal{L} = -\\log\\frac{e^{s\\cos(\\theta_{w_y}, x)}}{e^{s\\cos(\\theta_{w_y}, x)}+\\sum_{k\\ne y}^Ke^{s(t−1)(\\cos(\\theta_{w_k}, x)+1)I_k+s\\cos(\\theta_{w_k}, x)}}$$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss with t=1.2:\n", "4.3660855\n", "\n", "Loss with t=1.0:\n", "3.5917118\n" ] } ], "source": [ "# placeholders\n", "epsilon = 1.e-9\n", "s = tf.placeholder(tf.float32)\n", "t = tf.placeholder(tf.float32)\n", "m = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "# score of groundtruth\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "\n", "# binary mask for support vectors\n", "I_k = array_ops.where(logit_y >= logits, zeros, ones)\n", "\n", "# log of the indicator function h, added directly to the scaled logits\n", "h = s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k))\n", "logits_h = tf.add(s * logits, h)\n", "ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits_h)\n", "ce = tf.reduce_mean(ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same results as with our previous code." ] },
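{ "cell_type": "markdown", "metadata": {}, "source": [ "For a quick cross-check, below is a minimal NumPy-only sketch that re-evaluates the SV-Softmax formula directly on `logits_array` and `y_true_array` from the cells above; the helper name `sv_softmax_loss_np` and its default arguments are introduced here purely for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a minimal NumPy sketch of the SV-Softmax loss, re-evaluating the formula above\n", "def sv_softmax_loss_np(logits, y_true, s=1.0, t=1.2, epsilon=1.e-9):\n", "    # score of the ground-truth class, kept as a column for broadcasting\n", "    logit_y = np.sum(y_true * logits, axis=-1, keepdims=True)\n", "    # binary support-vector mask: 1 where a wrong class scores above the ground truth\n", "    I_k = (logits > logit_y).astype(np.float64)\n", "    # indicator function h(t, theta_{w_k}, x, I_k)\n", "    h = np.exp(s * (t - 1.) * (logits + 1.) * I_k)\n", "    # SV-Softmax probability of each class\n", "    probs = np.exp(s * logits) / (np.sum(np.exp(s * logits) * h, axis=-1, keepdims=True) + epsilon)\n", "    # cross-entropy of the ground-truth class, averaged over the batch\n", "    return np.mean(np.sum(y_true * -np.log(probs + epsilon), axis=-1))\n", "\n", "# should reproduce the losses computed with TensorFlow above (up to float32 rounding)\n", "print(sv_softmax_loss_np(logits_array, y_true_array, s=1., t=1.2))\n", "print(sv_softmax_loss_np(logits_array, y_true_array, s=1., t=1.))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The printed values should match the TensorFlow losses above (4.366085 and 3.5917118) up to float32 rounding." ] },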
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SV-Softmax loss semantically fuses the motivation of mining-based and margin-based losses into one framework, but from different viewpoints. Therefore, we can also absorb their strengths into our SV-Softmax loss. Specifically, to increase the mining range, we adopt the margin-based decision boundaries to indicate the support vectors. Consequently, the improved SV-X-Softmax loss can be formulated as: $$\\mathcal{L} = -log\\frac{e^{sf(m, \\theta_{w_y}, x)}}{e^{sf(m, \\theta_{w_y}, x)}+\\sum_{k\\ne y}^Kh(t, θ_{w_k}, x, I_k)e^{s\\cos(\\theta_{w_k, x})}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where X is the margin-based losses. It can be A-Softmax, AM-Softmax and Arc-Softmax etc. The indicator mask I k is re-computed according to margin-based decision boundaries. Specifically:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "I_k = \\left\\{\n", " \\begin{array}{ll}\n", " 0, & \\quad f(m, \\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) \\ge 0 \\\\\n", " 1, & \\quad f(m, \\theta_{w_y}, x) − \\cos(\\theta_{w_k}, x) < 0\n", " \\end{array}\n", " \\right.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's implement SV-AM-Softmax loss and done with it (for AM-Softmax we have $f(m, \\theta_{w_y}, x) = \\cos( \\theta_{w_y}, x) − m$)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loss with t=1.2:\n", "4.6436086\n", "\n", "Loss with t=1.0:\n", "3.8678675\n" ] } ], "source": [ "# placeholders\n", "epsilon = 1.e-9\n", "s = tf.placeholder(tf.float32)\n", "t = tf.placeholder(tf.float32)\n", "m = tf.placeholder(tf.float32)\n", "logits = tf.placeholder(tf.float32)\n", "y_true = tf.placeholder(tf.float32)\n", "\n", "zeros = array_ops.zeros_like(logits, dtype=logits.dtype)\n", "ones = array_ops.ones_like(logits, dtype=logits.dtype)\n", "\n", "# score of groundtruth\n", "logit_y = tf.reduce_sum(tf.multiply(y_true, logits), axis=-1, keepdims=True)\n", "\n", "# binary mask for support vectors\n", "I_k = array_ops.where(logit_y - m >= logits, zeros, ones)\n", "\n", "# I_k should be zero for GT score\n", "I_k = I_k * tf.cast(tf.not_equal(y_true, 1), tf.float32)\n", "\n", "# indicator function\n", "h = s * tf.multiply(t - 1., tf.multiply(logits + 1., I_k))\n", "\n", "logits_m = tf.add(s * (logits - m * y_true), h)\n", "ce = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true, logits=logits_m)\n", "ce = tf.reduce_mean(ce)\n", "\n", "with tf.Session() as sess:\n", " loss_12 = (sess.run(ce, feed_dict={m: 0.35, t: 1.2, s: 1, logits: logits_array, y_true: y_true_array}))\n", " loss_1 = (sess.run(ce, feed_dict={m: 0.35, t: 1., s: 1, logits: logits_array, y_true: y_true_array}))\n", "\n", "print(\"Loss with t=1.2:\")\n", "print(loss_12)\n", "print('')\n", "print(\"Loss with t=1.0:\")\n", "print(loss_1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }