--- title: Sizes and Indices author: nbloomf date: 2017-10-12 tags: ml, literate-haskell --- First some boilerplate. > {-# LANGUAGE LambdaCase #-} > module Indices where > > import Data.List > import Test.QuickCheck > import Test.QuickCheck.Test > import System.Exit This post is just some preliminary ideas about tensors - nothing learning-specific yet. Fundamentally, supervised learning models are (nonlinear) functions involving vector spaces over $\mathbb{R}$. A lot of literature refers to the elements of these spaces as "tensors", because, well, that's what they are. But I think this word "tensor" can be unhelpful in this context for several reasons. For one thing, the correct answer to the question "what is a tensor?" quickly veers into multilinear functions and massive quotient spaces and universal properties and omg I just wanted to write a program to tell the difference between cats and dogs. So I'll just say that a tensor "is" just a **multidimensional array**, in the sense that a linear transformation "is" a matrix, since for the most part we really want to think of tensors as data, and don't care so much about the more abstract bits. With that said, what exactly is a multidimensional array? One definition is that it's an element of a set like $$\mathbb{R}^{k_1 \times k_2 \times \cdots \times k_t}$$ where each $k_t$ is a natural number. And this is totally appropriate. But I am going to do something a little different. I'm not really comfortable defining things in terms of ellipses; that "dot dot dot" hides enough details to make rigorous calculations awkward to my taste. And since from a machine learning perspective we don't really need the full power of tensor algebra, I suspect we can afford to take a different tack. Instead, let's think for a minute about what we want out of a *multidimensional* array. An array is a really simple kind of data structure, consisting of entries that are accessed using their index or position in the array. That word dimension is a funny thing -- what does it really mean here? In the strictest sense, it measures the number of "coordinates" needed to specify an entry in the array. So, for example, an array in $\mathbb{R}^{2 \times 3 \times 4}$ has "dimension" 3, since each entry has an address along 3 different "axes". But then $\mathbb{R}^{5 \times 6 \times 7}$ also has dimension 3 in this sense. (In tensor language we'd call this the *rank* rather than dimension.) The reason why we attach numbers to things is typically to quantify how alike they are. So: how are arrays in $\mathbb{R}^{2 \times 3 \times 4}$ and $\mathbb{R}^{5 \times 6 \times 7}$ alike? Are they alike enough to warrant using a hefty word like *dimension* to express their similarity, especially when there's a more relevant notion of vector space dimension floating around? I don't think so. In this post I'll define a couple of algebras in an attempt to nail down a useful notion of *dimension*, as well as *shape*, *size*, and *index* for multidimensional arrays. For now, when I say *algebra* I mean [*universal algebra*](https://en.wikipedia.org/wiki/Universal_algebra); that is, a set with some functions on it that satisfy 0 or more universally quantified axioms. Let's think again about that vector dimension $2 \times 3 \times 4$. This is a funny way to write a dimension. Yes, we can think of a natural number as the set of numbers less than itself, and that $\times$ like the cartesian product of sets, and then $\mathbb{R}^{2 \times 3 \times 4}$ can be thought of as a literal set of functions from the set $2 \times 3 \times 4$ to $\mathbb{R}$, as the notation suggests. But that $2 \times 3 \times 4$ makes it look like we want to express some kind of arithmetic that remembers where it came from, in the sense that $2 \times 3$ and $3 \times 2$ are different. Doing this with actual sets is a little awkward, though, so lets make an algebra instead.

We denote by $\mathbb{S}$ the free algebra over $\mathbb{N}$ with two function symbols of arity 2, denoted $\oplus$ and $\otimes$. Elements of $\mathbb{S}$ are called sizes, and we'll sometimes refer to $\mathbb{S}$ as the algebra of sizes.

For example, $2 \oplus 4$ and $5 \otimes (3 \oplus 6)$ are elements of $\mathbb{S}$. Eventually we'll use, for instance, the element $2 \otimes 3$ to describe the "size" of a $2 \times 3$ matrix. The size algebra has no axioms, so for example $a \otimes (b \otimes c)$ and $(a \otimes b) \otimes c$ are not equal. And just ignore the $\oplus$ for now. :) So the elements of $\mathbb{S}$ look like unevaluated arithmetic expressions with plus and times. By the way, one benefit of using free algebras is we can implement them in Haskell with algebraic data types. > data Size > = Size Integer > | Size :+ Size > | Size :* Size > deriving Eq > > instance Show Size where > show = > let > p x = if ' ' `elem` x > then "(" ++ x ++ ")" else x > in > \case > Size k -> show k > a :+ b -> concat [p $ show a, " + ", p $ show b] > a :* b -> concat [p $ show a, " x ", p $ show b] > > -- so we can define them with numeric literals > instance Num Size where > fromInteger k = if k >= 0 > then Size k > else error "sizes cannot be negative." > (+) = (:+) > (*) = (:*) > > abs = error "Size Num instance: abs makes no sense." > signum = error "Size Num instance: signum makes no sense." > negate = error "Size Num instance: negate makes no sense." If you're following along with GHCi, try defining some ``Size``s. (The ``Num`` instance is just there to make the notation less awkward.) ```haskell $> 4 :: Size $> 2*3 :: Size $> 2+(3*4) :: Size ``` Another nice thing about free algebras is that we get universal mappings for free! For example:

We denote by $\mathbb{H}$ the free algebra over $\ast = \{\ast\}$ with two function symbols of arity 2, denoted $\oplus$ and $\otimes$. Elements of $\mathbb{H}$ are called shapes, and we'll sometimes refer to $\mathbb{H}$ as the algebra of shapes. Define $h : \mathbb{N} \rightarrow \ast$ by $h(k) = \ast$, and let $H : \mathbb{S} \rightarrow \mathbb{H}$ be the map induced by $h$. If $s \in \mathbb{S}$, we say $H(s)$ is the shape of $s$.

Note that $(\mathbb{N},+,\times)$ is an algebra with two function symbols of arity 2. Let $D : \mathbb{S} \rightarrow \mathbb{N}$ be the map induced by the identity function on $\mathbb{N}$. If $s \in \mathbb{S}$, we say $D(s)$ is the dimension of $s$.

Again, we can implement these in code in the usual way. > data Shape > = HAtom > | HPlus Shape Shape > | HTimes Shape Shape > deriving Eq > > instance Show Shape where > show = \case > HAtom -> "*" > HPlus a b -> concat ["(", show a, " + ", show b, ")"] > HTimes a b -> concat ["(", show a, " x ", show b, ")"] > > shapeOf :: Size -> Shape > shapeOf = \case > Size _ -> HAtom > a :+ b -> HPlus (shapeOf a) (shapeOf b) > a :* b -> HTimes (shapeOf a) (shapeOf b) > > dimOf :: Size -> Integer > dimOf = \case > Size k -> k > a :+ b -> (dimOf a) + (dimOf b) > a :* b -> (dimOf a) * (dimOf b) Eventually, $s \in \mathbb{S}$ will represent the "size" of a tensor and $D(s) \in \mathbb{N}$ will be the vector space dimension of the space it comes from. Indices ------- This is well and good; we have a type, ``Size``, that will eventually represent the size of a multidimensional array, and we can extract the "shape" and "dimension" of a size. But we also need a reasonable understanding of how to refer to the entries of an array of a given size. However we define indices, which indices make sense for a given size will depend on the structure of the size. For instance, a natural number size $k$ might be indexed by $k$ contiguous natural numbers, starting from 0 or 1 or whatever. A product-shaped size like $a \otimes b$ might be indexed by a pair $(u,v)$, where $u$ is an index of $a$ and $v$ an index of $b$. The sum size is a little stranger: to index $a \oplus b$, we need an index for either $a$ or $b$, and some way to distinguish which is which. Putting this together, we will define an algebra of indices like so.

We denote by $\mathbb{I}$ the free algebra over $\mathbb{N}$ with two function symbols of arity 1 and one of arity 2, denoted $\mathsf{L}$, $\mathsf{R}$, and $\&$. Elements of $\mathbb{I}$ are called indices, and we'll sometimes refer to $\mathbb{I}$ as the algebra of indices.

Again, since $\mathbb{I}$ is a free algebra we can represent it as an algebraic type. > data Index > = Index Integer > | L Index > | R Index > | Index :& Index > deriving Eq > > > instance Show Index where > show = \case > Index k -> show k > L a -> "L(" ++ show a ++ ")" > R b -> "R(" ++ show b ++ ")" > a :& b -> concat ["(", show a, ",", show b, ")"] > > > instance Num Index where > fromInteger = Index > > (+) = error "Index Num instance: (+) does not make sense" > (*) = error "Index Num instance: (*) does not make sense" > negate = error "Index Num instance: negate does not make sense" > abs = error "Index Num instance: abs does not make sense" > signum = error "Index Num instance: signum does not make sense" Now given an index and a size, it may or may not make sense to talk about an entry at the index in a structure of the given size -- like asking for the item at index 10 in an array of length 5. To capture this, we define a compatibility relation to detect when an index can be used on a given size. > isIndexOf :: Index -> Size -> Bool > (Index t) `isIndexOf` (Size k) > = 0 <= t && t < k > (L u) `isIndexOf` (a :+ _) > = isIndexOf u a > (R v) `isIndexOf` (_ :+ b) > = isIndexOf v b > (u :& v) `isIndexOf` (a :* b) > = (u `isIndexOf` a) && (v `isIndexOf` b) > _ `isIndexOf` _ > = False From now on, if $s$ is a size, I'll also use $s$ to denote the set of indices compatible with $s$. So for example, if $s = 5$, we might say somthing like $$\sum_{i \in s} f(i)$$ without ambiguity. We'd like to be able to construct $s$ as a list; this is what ``indicesOf`` does. I'm going to play a little fast and loose with the proof because laziness. > indicesOf :: Size -> [Index] > indicesOf = \case > Size k -> map Index [0..(k-1)] > a :+ b -> map L (indicesOf a) ++ map R (indicesOf b) > a :* b -> [ u :& v | v <- indicesOf b, u <- indicesOf a ] The number of different indices for a given size should be equal to the size's dimension. This suggests a simple test: the length of the index list is the dimension, and all entries of the index list are distinct. > _test_index_count :: Test (Size -> Bool) > _test_index_count = > testName "dimOf s == length $ indicesOf s" $ > \s -> > (dimOf s) == (fromIntegral $ length $ indicesOf s) > > _test_indices_distinct :: Test (Size -> Bool) > _test_indices_distinct = > testName "indicesOf s all distinct" $ > \s -> (indicesOf s) == (nub $ indicesOf s) In later posts, $s \in \mathbb{S}$ will represent the size (and shape) of the elements in a vector space consisting of tensors, which itself has vector space dimension $D(s)$. But it will sometimes be convenient to think of these tensors canonically as $D(s)$-dimensional vectors. To do this, we'll set up a bijection between the indices of a given size $s$ and the natural numbers less than $D(s)$. I'll call the function from indices to numbers "flatten", since it turns a complicated thing into a one-dimensional thing, and call the inverse "buildup". > flatten :: Size -> Index -> Integer > flatten (Size k) (Index t) > = if 0 <= t && t < k then t else error "index out of bounds" > flatten (a :+ _) (L u) > = flatten a u > flatten (a :+ b) (R v) > = (dimOf a) + (flatten b v) > flatten (a :* b) (u :& v) > = (flatten a u) + (flatten b v)*(dimOf a) > > > buildup :: Size -> Integer -> Index > buildup (Size k) t > = if 0 <= t && t < k > then Index t > else error "integer index out of bounds" > buildup (a :+ b) t > = if t < dimOf a > then L $ buildup a t > else R $ buildup b (t - dimOf a) > buildup (a :* b) t > = (buildup a (t `rem` (dimOf a))) :& (buildup b (t `quot` (dimOf a))) Now ``flatten`` and ``buildup`` should be inverses of each other, which we can test. > _test_flatten_buildup :: Test (Size -> Bool) > _test_flatten_buildup = > testName "flatten s . buildup s == id" $ > \s -> > let ks = [0..((dimOf s) - 1)] > in ks == map (flatten s . buildup s) ks > > _test_buildup_flatten :: Test (Size -> Bool) > _test_buildup_flatten = > testName "buildup s . flatten s == id" $ > \s -> > let ks = indicesOf s > in ks == map (buildup s . flatten s) ks To wrap up, in this post we defined two algebraic types, ``Size`` and ``Index``, to represent the sizes and indices of multidimensional arrays, and two functions, ``flatten`` and ``buildup``, that canonically map the indices of a given size to a 0-indexed list of natural numbers. In the next post, we'll use ``Size`` and ``Index`` to define and manipulate multidimensional arrays. Tests ----- Math heavy code is well suited to automated tests, so we'll write some along the way using the ``QuickCheck`` library. First off, we won't be needing the full complexity of QuickCheck, so here are some helper functions to make the tests a little simpler to write. > type Test prop = (String, prop) > > testName :: String -> prop -> Test prop > testName name prop = (name, prop) > > runTest, chattyTest, skipTest :: Testable prop => Args -> Test prop -> IO () > runTest args (name, prop) = do > putStrLn ("\x1b[1;34m" ++ name ++ "\x1b[0;39;49m") > result <- quickCheckWithResult args prop > if isSuccess result > then return () > else (putStrLn (show result)) >> exitFailure > > chattyTest args (name, prop) = do > putStrLn ("\x1b[1;35m" ++ name ++ "\x1b[0;39;49m") > result <- verboseCheckWithResult args prop > if isSuccess result > then return () > else (putStrLn (show result)) > > -- when testing tests > skipTest _ (name, _) = > putStrLn ("\x1b[1;33mskipped: " ++ name ++ "\x1b[0;39;49m") > > testLabel :: String -> IO () > testLabel msg = putStrLn ("\n\x1b[1;32m" ++ msg ++ "\x1b[0;39;49m") > > class TypeName t where > typeName :: t -> String > > instance TypeName Int where typeName _ = "Int" > instance TypeName Integer where typeName _ = "Integer" > instance TypeName Double where typeName _ = "Double" > > pairOf :: (Monad m) => m a -> m b -> m (a,b) > pairOf ma mb = do > x <- ma > y <- mb > return (x,y) > > forAll2 :: (Show a, Show b, Testable prop) > => Gen a -> Gen b -> (a -> b -> prop) -> Property > forAll2 ga gb f = forAll genPair (uncurry f) > where > genPair = do > x <- ga > y <- gb > return (x,y) > > forAll3 :: (Show a, Show b, Show c, Testable prop) > => Gen a -> Gen b -> Gen c -> (a -> b -> c -> prop) -> Property > forAll3 ga gb gc f = forAll genTriple g > where > genTriple = do > x <- ga > y <- gb > z <- gc > return (x,y,z) > > g (x,y,z) = f x y z To write QuickCheck tests for a given type it needs to be an instance of ``Arbitrary``, which provides two basic functions: ``arbitrary``, which generates a "random" element of the type, and ``shrink``, which takes an element and makes it "smaller" in some way. Defining these functions for a given type may be ugly, but only has to be done once. > instance Arbitrary Size where > arbitrary = sized arbSize > > shrink = \case > Size k -> if k <= 0 then [] else [Size (k-1)] > u :+ v -> [u, v] > u :* v -> [u, v] > > arbSize :: Int -> Gen Size > arbSize 0 = do > k <- elements [0,1,1,2,2,2,2] > return (Size k) > arbSize n = do > switch <- arbitrary :: Gen Int > m <- choose (1,n) > case switch `mod` 5 of > 0 -> do > u <- arbSize $ n-1 > v <- arbSize $ n-1 > return (u :* v) > 1 -> do > u <- arbSize $ n-1 > v <- arbSize $ n-1 > return (u :+ v) > _ -> do > k <- elements [0,1,2] > return (Size k) Now we can wrap up our tests in a little suite, ``_test_index``. The arguments for this function are (1) the number of test cases to generate and (2) how big they should be. > -- run all tests for Size and Index > _test_index :: Int -> Int -> IO () > _test_index num size = do > testLabel "Size and Index" > > let > args = stdArgs > { maxSuccess = num > , maxSize = size > } > > runTest args _test_index_count > runTest args _test_indices_distinct > runTest args _test_flatten_buildup > runTest args _test_buildup_flatten > > main_index :: IO () > main_index = _test_index 200 20