{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Introduction to Transducers\n", "\n", "by Amit Ramon , 2019\n", "\n", "This Jupyter notebook contains a brief introduction to *Transducers* in *Clojure*. The purpose of it is to present the topic mainly from a user's perspective, trying not to get too deep into the gory implementation details." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### A Sample Task\n", "\n", "Assume we want to process the following data (sample of the [*Iris* flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set))" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "#'user/iris-data" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def iris-data\n", " [{:sepal_length 5.1 :sepal_width 3.5 :petal_length 1.4 :petal_width 0.2 :species \"setosa\"}\n", " {:sepal_length 6.9 :sepal_width 3.1 :petal_length 4.9 :petal_width 1.5 :species \"versicolor\"}\n", " {:sepal_length 4.7 :sepal_width 3.2 :petal_length 1.3 :petal_width 0.2 :species \"setosa\"}\n", " {:sepal_length 7.1 :sepal_width 3.0 :petal_length 5.9 :petal_width 2.1 :species \"virginica\"}\n", " {:sepal_length 4.6 :sepal_width 3.1 :petal_length 1.5 :petal_width 0.2 :species \"setosa\"}\n", " {:sepal_length 5.0 :sepal_width 3.6 :petal_length 1.4 :petal_width 0.2 :species \"setosa\"}\n", " {:sepal_length 7.0 :sepal_width 3.2 :petal_length 4.7 :petal_width 1.4 :species \"versicolor\"}\n", " {:sepal_length 6.5 :sepal_width 2.8 :petal_length 4.6 :petal_width 1.5 :species \"versicolor\"}\n", " {:sepal_length 4.9 :sepal_width 3.0 :petal_length 1.4 :petal_width 0.2 :species \"setosa\"}\n", " {:sepal_length 5.7 :sepal_width 2.8 :petal_length 4.5 :petal_width 1.3 :species \"versicolor\"}\n", " {:sepal_length 6.3 :sepal_width 3.3 :petal_length 6.0 :petal_width 2.5 :species \"virginica\"}\n", " {:sepal_length 6.4 :sepal_width 3.2 :petal_length 4.5 :petal_width 1.5 :species \"versicolor\"}\n", " {:sepal_length 5.8 :sepal_width 2.7 :petal_length 5.1 :petal_width 1.9 :species \"virginica\"}\n", " {:sepal_length 5.5 :sepal_width 2.3 :petal_length 4.0 :petal_width 1.3 :species \"versicolor\"}\n", " {:sepal_length 6.3 :sepal_width 2.9 :petal_length 5.6 :petal_width 1.8 :species \"virginica\"}])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### Task Definition\n", "1. Analyze \"virginica\" samples\n", "2. Take a random sample\n", "3. Calculate petal's length-to-width ratio and area\n", "4. Return the result as a vector of maps\n", "\n", "### What Is Required From A Good Solution?\n", "\n", "We are seeking a method that is:\n", "- modular\n", "- efficient\n", "- clean & legible\n", "\n", "Before presenting a candidate solution, we need some background..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Transducers To The Rescue\n", "\n", "### What Are \"Transducers\"?\n", "\n", "In Clojure's context, **Transducing** is a technique for processing sequential data by applying to it a series of one or more transformations.\n", "\n", "The term **\"Transduce\"** is a combination of the terms **\"Transform\"** and **\"Reduce\"**.\n", "\n", "> [`transducers` were added in Clojure 1.7]\n", "\n", "Think of shell pipe in Unix:\n", "\n", " cat some-file | grep hello | cut -f 2 | sort | uniq | wc -l\n", "\n", "Before diving into `transducers`, let's see some examples—we'll start by explaining **\"Transformation\"** and **\"Reduction\"**.\n", "\n", "## Transformation\n", "\n", "Apply a function to items of a collection, get a new, transformed collection.\n", "\n", "#### The `map` function\n", " map\n", " [f coll]\n", "\n", "#### The mapping function `f`\n", " [item] --> new-item\n", " \n", "*Examples:*" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(2 3 4 5)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(map inc [1 2 3 4])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(1 4 9 16)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(map #(* % %) [1 2 3 4])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "({\"hello\" 5} {\"world\" 5} {\"Clojure\" 7} {\"is\" 2} {\"great\" 5})" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(map (fn [x] {x (count x)}) [\"hello\" \"world\" \"Clojure\" \"is\" \"great\"])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "#### Transformation can be done with functions other than `map`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "((0 1 2) (3 4 5) (6 7 8) (9 10 11))" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(partition 3 (range 12))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(1 3 5 7 9)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(filter odd? (range 10))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Reduction\n", "\n", "Reduce a collection to a *scalar* value (a somewhat oversimplified definition)\n", "\n", " #### The `reduce` function\n", " reduce\n", " [f coll]\n", " [f val coll]\n", "\n", "#### The reducing function `f`\n", " [acc item] --> new-acc\n", " \n", "*Examples:*" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "362880" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; no initial value supplied\n", "(reduce * (range 1 10)) ; 9!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "362880" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; supplying an initial value\n", "(reduce * 1 (range 1 10)) ; 9!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you must provide an initial value. Look at the following example—what would happen if you do not provide an initial value?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "50" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; sum of squares - notice the significance of the initial value\n", "(reduce\n", " (fn [acc item] (+ acc (* item item))) ; reducing function\n", " 0 ; initial value\n", " [3 4 5]) ; collection" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "#### The result of reduction need not be a scalar! (But you'll have to supply an appropriate initial value.)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "#{:c :b :a}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; converting a vector to a set\n", "(reduce conj #{} [:a :b :c])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Combining Transformations \"pre-transducers\" era\n", "\n", "*Almost there :-)*" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "42" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def simple-data [2 -3 7 27 -8 2 5 9 -21 11])\n", "(def step-1 (map inc simple-data))\n", "(def step-2 (filter even? step-1))\n", "(reduce + step-2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "42" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(reduce + (filter even? (map inc simple-data)))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "42" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(->> simple-data (map inc) (filter even?) (reduce +))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "How many passes over the data have we made in each of the above methods?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Combining Transformations \"pre-transducers\" era (cont.)\n", "Cons:\n", "- intermediate sequences\n", "- memory usage\n", "- non-modular" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Transducers Are Here\n", "\n", "\n", "The **transduce** function\n", "\n", " transduce\n", " [xform f coll]\n", " [xform f init coll]\n", "\n", "Where:\n", "- `xform` - a transducer\n", "- `f` - a reducing function\n", "- `init` - an initial value for the reducing function\n", "- `coll` - a collection to process\n", "\n", "### What does a `transducer` do?\n", "A `transducer` (or `xform`) is a function that\n", "\n", "- takes a `reducing function`\n", "- wraps it with some transformation\n", "- returns a new `reducing function`, with the transformation 'built-in'\n", "\n", "Note: the `reducing` operation is not changed, but a transformation is added 'on top of it'\n", "\n", "This enables chaining of `xforms`\n", "\n", "*Example*: `transduce` vs. `reduce`" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(transduce (map inc) + (range 5))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(reduce + (map inc (range 5)))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "#### Some more `transducer` examples" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[1 8 27 64]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; no need to provide an initial value!\n", "(transduce (map #(* % % %)) conj [1 2 3 4])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[1 8 27 64]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ ";; here you must provide an initial value\n", "(reduce conj [] (map #(* % % %) [1 2 3 4]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Functions That Create Transducers\n", "\n", "The following Clojure's built-in functions create a `transducer` when called without a collection:\n", "\n", "`map` `cat` `mapcat` `filter` `remove` `take` `take-while`\n", "`take-nth` `drop` `drop-while` `replace` `partition-by`\n", "`partition-all` `keep` `keep-indexed` `map-indexed` `distinct`\n", "`interpose` `dedupe` `random-sample`\n", "\n", "[defining transformations with transducers](https://clojure.org/reference/transducers#_defining_transformations_with_transducers)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## xforms" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[2 -3 7 27 -8 2 5 9 -21 11]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple-data ;; defined above, just a reminder" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[-2 8 28 6 10 -20 12]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(transduce (comp (map inc) (filter even?)) conj simple-data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The same as above, in a (maybe) cleaner and more modular syntax" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[-2 8 28 6 10 -20 12]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def xform (comp (map inc) (filter even?)))\n", "\n", "(transduce xform conj simple-data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We can easily plug-in different `reducing functions`" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "6451200" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(transduce xform * simple-data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## More `xform` Examples\n", "\n", "A 'complex' transducer. Notice that the type of the output of each transformation should be compatible with the type of input expected by the transformation following it.\n", "\n", "For example, `(random-sample 0.42)` produces (in this example) an `int`, which is compatible with what `(filter odd?)` expects. As another example, `(partition-all 2)` produces pairs of integers, which are compatible with the input type expected by `(map #(apply * %))`." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "198" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def test-data (range 100))\n", "\n", "(def test-xform-1\n", " (comp\n", " (random-sample 0.42)\n", " (filter odd?)\n", " (remove #(> % 23))\n", " (map inc)\n", " (take 5)\n", " (partition-all 2)\n", " (map #(apply * %))))\n", "\n", "(transduce test-xform-1 + test-data)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[80 22]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(transduce test-xform-1 conj test-data)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[24 10]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(transduce (comp test-xform-1 (map #(/ % 2))) conj test-data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Back To The Iris Data\n", "[*Iris* data](#iris_data)\n", "\n", "Here is our solution. To make it more modular, separate `xforms` are defined for intermediate tasks, and are later combined for making the final `xform`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[{:petal_ratio 2.8095238095238098, :petal_area 12.39} {:petal_ratio 2.4, :petal_area 15.0} {:petal_ratio 2.6842105263157894, :petal_area 9.69} {:petal_ratio 3.1111111111111107, :petal_area 10.08}]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def xf-prepare-data (comp\n", " (filter #(= (:species %) \"virginica\"))\n", " (random-sample 0.7)))\n", "\n", "(def xf-calculate (comp\n", " (map (fn [item] [(:petal_length item) (:petal_width item)]))\n", " (map (fn [[l w]] {:petal_ratio (/ l w) :petal_area (* l w)}))))\n", " \n", "(def xf (comp xf-prepare-data xf-calculate)) \n", "\n", "(transduce xf conj iris-data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Benefits Of Transduers\n", "\n", "- Processing is done in one pass\n", "- Initial value is generated automatically (when not provided)\n", "- Modularity: xforms can be created using `comp` from built-in\n", " functions and/or other xforms\n", "- The \"transformation\" (described by an xform) is separated from\n", " the \"reduction\" (described by a reducing function); thus the same\n", " xform can be used with different reducing functions and vice versa\n", "\n", "## Additional Usages\n", "\n", "Using the `transduce` function is perhaps the most common way for\n", "using transducers, but transducers can be used via other methods:\n", "\n", "- sequence - lazy pass\n", "- into\n", "- eduction\n", "- core.async channels\n", "\n", "### sequence\n", "\n", " [xform coll]\n", " [xform coll & colls]\n", "\n", "The `sequence` function is called with a `transducer` and one or more\n", "collections, and return a lazy sequence of applications of the\n", "transform to the items of the collection(s)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(1 2 3 4 5)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def my-seq (sequence (map inc) (range 100)))\n", "(take 5 my-seq)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### into\n", "\n", " [to-coll xform from-coll]\n", "\n", "The `into` function adds the items of a given collection to a target\n", "collection. When it is given a `transducer`, it applies the transform\n", "to the items of the given collection and then adds the transformed\n", "items to the target collection.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "#{7 1 4 6 3 2 9 5 10 8}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(into #{} (map inc) (range 10))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### eduction\n", "\n", " [xform* coll]\n", " \n", "The `eduction` function takes one or more transducers and a\n", "collection, and returns a reducible/iterable application of the\n", "transducers to the items of the collection.\n", "\n", "Notice that no new collection is created when calling `eduction` - the\n", "transform is applied only when iterating over the object returned by\n", "`eduction`. Also note that you can call reduce or iterate the eduction\n", "object as many times as you want." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "35" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(def ed (eduction (map inc) (filter odd?) [1 2 3 4 5 6 7 8 9 10]))\n", "\n", "(reduce + 0 ed)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(3 5 7)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(take 3 ed)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "`eduction` is useful for bundling together a transform and a collection\n", "in advance. You will later be able to apply `reduce` to the\n", "bundled object at different times and locations, possibly with different\n", "reducing functions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### Related Topics\n", "\n", "Some stuff worth looking at:\n", "\n", "- [clojure.core.async](https://clojure.github.io/core.async/) (channels, \"streaming\", ...)\n", "- [clojure.core.reducers](https://clojure.github.io/clojure/clojure.core-api.html#clojure.core.reducers)\n", "\n", "### Suggested Reading\n", "- [Clojure - Transducers](https://clojure.org/reference/transducers)\n", "- [Transducers Are Coming / Rich Hickey](http://blog.cognitect.com/blog/2014/8/6/transducers-are-coming)\n", "- [Understanding Transducers / Elben Shira](http://elbenshira.com/blog/understandingtransducers/)\n", "- [Deriving Transducers from First Principles / Rob Smallshire](https://sixty-north.com/blog/deriving-transducers-from-first-principles.html)\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Lein-Clojure", "language": "clojure", "name": "lein-clojure" }, "language_info": { "file_extension": ".clj", "mimetype": "text/x-clojure", "name": "clojure", "version": "1.9.0" } }, "nbformat": 4, "nbformat_minor": 2 }