---
title: Sizes and Indices
author: nbloomf
date: 2017-10-12
tags: ml, literate-haskell
---
First some boilerplate.
> {-# LANGUAGE LambdaCase #-}
> module Indices where
>
> import Data.List
> import Test.QuickCheck
> import Test.QuickCheck.Test
> import System.Exit
This post is just some preliminary ideas about tensors - nothing learning-specific yet.
Fundamentally, supervised learning models are (nonlinear) functions involving vector spaces over $\mathbb{R}$. A lot of literature refers to the elements of these spaces as "tensors", because, well, that's what they are. But I think this word "tensor" can be unhelpful in this context for several reasons. For one thing, the correct answer to the question "what is a tensor?" quickly veers into multilinear functions and massive quotient spaces and universal properties and omg I just wanted to write a program to tell the difference between cats and dogs. So I'll just say that a tensor "is" just a **multidimensional array**, in the sense that a linear transformation "is" a matrix, since for the most part we really want to think of tensors as data, and don't care so much about the more abstract bits.
With that said, what exactly is a multidimensional array? One definition is that it's an element of a set like $$\mathbb{R}^{k_1 \times k_2 \times \cdots \times k_t}$$ where each $k_t$ is a natural number. And this is totally appropriate. But I am going to do something a little different. I'm not really comfortable defining things in terms of ellipses; that "dot dot dot" hides enough details to make rigorous calculations awkward to my taste. And since from a machine learning perspective we don't really need the full power of tensor algebra, I suspect we can afford to take a different tack.
Instead, let's think for a minute about what we want out of a *multidimensional* array. An array is a really simple kind of data structure, consisting of entries that are accessed using their index or position in the array. That word dimension is a funny thing -- what does it really mean here? In the strictest sense, it measures the number of "coordinates" needed to specify an entry in the array. So, for example, an array in $\mathbb{R}^{2 \times 3 \times 4}$ has "dimension" 3, since each entry has an address along 3 different "axes". But then $\mathbb{R}^{5 \times 6 \times 7}$ also has dimension 3 in this sense. (In tensor language we'd call this the *rank* rather than dimension.)
The reason why we attach numbers to things is typically to quantify how alike they are. So: how are arrays in $\mathbb{R}^{2 \times 3 \times 4}$ and $\mathbb{R}^{5 \times 6 \times 7}$ alike? Are they alike enough to warrant using a hefty word like *dimension* to express their similarity, especially when there's a more relevant notion of vector space dimension floating around? I don't think so.
In this post I'll define a couple of algebras in an attempt to nail down a useful notion of *dimension*, as well as *shape*, *size*, and *index* for multidimensional arrays. For now, when I say *algebra* I mean [*universal algebra*](https://en.wikipedia.org/wiki/Universal_algebra); that is, a set with some functions on it that satisfy 0 or more universally quantified axioms.
Let's think again about that vector dimension $2 \times 3 \times 4$. This is a funny way to write a dimension. Yes, we can think of a natural number as the set of numbers less than itself, and that $\times$ like the cartesian product of sets, and then $\mathbb{R}^{2 \times 3 \times 4}$ can be thought of as a literal set of functions from the set $2 \times 3 \times 4$ to $\mathbb{R}$, as the notation suggests. But that $2 \times 3 \times 4$ makes it look like we want to express some kind of arithmetic that remembers where it came from, in the sense that $2 \times 3$ and $3 \times 2$ are different. Doing this with actual sets is a little awkward, though, so lets make an algebra instead.
We denote by $\mathbb{S}$ the free algebra over $\mathbb{N}$ with two function symbols of arity 2, denoted $\oplus$ and $\otimes$. Elements of $\mathbb{S}$ are called sizes, and we'll sometimes refer to $\mathbb{S}$ as the algebra of sizes.
For example, $2 \oplus 4$ and $5 \otimes (3 \oplus 6)$ are elements of $\mathbb{S}$. Eventually we'll use, for instance, the element $2 \otimes 3$ to describe the "size" of a $2 \times 3$ matrix. The size algebra has no axioms, so for example $a \otimes (b \otimes c)$ and $(a \otimes b) \otimes c$ are not equal. And just ignore the $\oplus$ for now. :)
So the elements of $\mathbb{S}$ look like unevaluated arithmetic expressions with plus and times. By the way, one benefit of using free algebras is we can implement them in Haskell with algebraic data types.
> data Size
> = Size Integer
> | Size :+ Size
> | Size :* Size
> deriving Eq
>
> instance Show Size where
> show =
> let
> p x = if ' ' `elem` x
> then "(" ++ x ++ ")" else x
> in
> \case
> Size k -> show k
> a :+ b -> concat [p $ show a, " + ", p $ show b]
> a :* b -> concat [p $ show a, " x ", p $ show b]
>
> -- so we can define them with numeric literals
> instance Num Size where
> fromInteger k = if k >= 0
> then Size k
> else error "sizes cannot be negative."
> (+) = (:+)
> (*) = (:*)
>
> abs = error "Size Num instance: abs makes no sense."
> signum = error "Size Num instance: signum makes no sense."
> negate = error "Size Num instance: negate makes no sense."
If you're following along with GHCi, try defining some ``Size``s. (The ``Num`` instance is just there to make the notation less awkward.)
```haskell
$> 4 :: Size
$> 2*3 :: Size
$> 2+(3*4) :: Size
```
Another nice thing about free algebras is that we get universal mappings for free! For example:
We denote by $\mathbb{H}$ the free algebra over $\ast = \{\ast\}$ with two function symbols of arity 2, denoted $\oplus$ and $\otimes$. Elements of $\mathbb{H}$ are called shapes, and we'll sometimes refer to $\mathbb{H}$ as the algebra of shapes. Define $h : \mathbb{N} \rightarrow \ast$ by $h(k) = \ast$, and let $H : \mathbb{S} \rightarrow \mathbb{H}$ be the map induced by $h$. If $s \in \mathbb{S}$, we say $H(s)$ is the shape of $s$.
Note that $(\mathbb{N},+,\times)$ is an algebra with two function symbols of arity 2. Let $D : \mathbb{S} \rightarrow \mathbb{N}$ be the map induced by the identity function on $\mathbb{N}$. If $s \in \mathbb{S}$, we say $D(s)$ is the dimension of $s$.
Again, we can implement these in code in the usual way.
> data Shape
> = HAtom
> | HPlus Shape Shape
> | HTimes Shape Shape
> deriving Eq
>
> instance Show Shape where
> show = \case
> HAtom -> "*"
> HPlus a b -> concat ["(", show a, " + ", show b, ")"]
> HTimes a b -> concat ["(", show a, " x ", show b, ")"]
>
> shapeOf :: Size -> Shape
> shapeOf = \case
> Size _ -> HAtom
> a :+ b -> HPlus (shapeOf a) (shapeOf b)
> a :* b -> HTimes (shapeOf a) (shapeOf b)
>
> dimOf :: Size -> Integer
> dimOf = \case
> Size k -> k
> a :+ b -> (dimOf a) + (dimOf b)
> a :* b -> (dimOf a) * (dimOf b)
Eventually, $s \in \mathbb{S}$ will represent the "size" of a tensor and $D(s) \in \mathbb{N}$ will be the vector space dimension of the space it comes from.
Indices
-------
This is well and good; we have a type, ``Size``, that will eventually represent the size of a multidimensional array, and we can extract the "shape" and "dimension" of a size. But we also need a reasonable understanding of how to refer to the entries of an array of a given size.
However we define indices, which indices make sense for a given size will depend on the structure of the size. For instance, a natural number size $k$ might be indexed by $k$ contiguous natural numbers, starting from 0 or 1 or whatever. A product-shaped size like $a \otimes b$ might be indexed by a pair $(u,v)$, where $u$ is an index of $a$ and $v$ an index of $b$. The sum size is a little stranger: to index $a \oplus b$, we need an index for either $a$ or $b$, and some way to distinguish which is which.
Putting this together, we will define an algebra of indices like so.
We denote by $\mathbb{I}$ the free algebra over $\mathbb{N}$ with two function symbols of arity 1 and one of arity 2, denoted $\mathsf{L}$, $\mathsf{R}$, and $\&$. Elements of $\mathbb{I}$ are called indices, and we'll sometimes refer to $\mathbb{I}$ as the algebra of indices.
Again, since $\mathbb{I}$ is a free algebra we can represent it as an algebraic type.
> data Index
> = Index Integer
> | L Index
> | R Index
> | Index :& Index
> deriving Eq
>
>
> instance Show Index where
> show = \case
> Index k -> show k
> L a -> "L(" ++ show a ++ ")"
> R b -> "R(" ++ show b ++ ")"
> a :& b -> concat ["(", show a, ",", show b, ")"]
>
>
> instance Num Index where
> fromInteger = Index
>
> (+) = error "Index Num instance: (+) does not make sense"
> (*) = error "Index Num instance: (*) does not make sense"
> negate = error "Index Num instance: negate does not make sense"
> abs = error "Index Num instance: abs does not make sense"
> signum = error "Index Num instance: signum does not make sense"
Now given an index and a size, it may or may not make sense to talk about an entry at the index in a structure of the given size -- like asking for the item at index 10 in an array of length 5. To capture this, we define a compatibility relation to detect when an index can be used on a given size.
> isIndexOf :: Index -> Size -> Bool
> (Index t) `isIndexOf` (Size k)
> = 0 <= t && t < k
> (L u) `isIndexOf` (a :+ _)
> = isIndexOf u a
> (R v) `isIndexOf` (_ :+ b)
> = isIndexOf v b
> (u :& v) `isIndexOf` (a :* b)
> = (u `isIndexOf` a) && (v `isIndexOf` b)
> _ `isIndexOf` _
> = False
From now on, if $s$ is a size, I'll also use $s$ to denote the set of indices compatible with $s$. So for example, if $s = 5$, we might say somthing like $$\sum_{i \in s} f(i)$$ without ambiguity.
We'd like to be able to construct $s$ as a list; this is what ``indicesOf`` does. I'm going to play a little fast and loose with the proof because laziness.
> indicesOf :: Size -> [Index]
> indicesOf = \case
> Size k -> map Index [0..(k-1)]
> a :+ b -> map L (indicesOf a) ++ map R (indicesOf b)
> a :* b -> [ u :& v | v <- indicesOf b, u <- indicesOf a ]
The number of different indices for a given size should be equal to the size's dimension. This suggests a simple test: the length of the index list is the dimension, and all entries of the index list are distinct.
> _test_index_count :: Test (Size -> Bool)
> _test_index_count =
> testName "dimOf s == length $ indicesOf s" $
> \s ->
> (dimOf s) == (fromIntegral $ length $ indicesOf s)
>
> _test_indices_distinct :: Test (Size -> Bool)
> _test_indices_distinct =
> testName "indicesOf s all distinct" $
> \s -> (indicesOf s) == (nub $ indicesOf s)
In later posts, $s \in \mathbb{S}$ will represent the size (and shape) of the elements in a vector space consisting of tensors, which itself has vector space dimension $D(s)$. But it will sometimes be convenient to think of these tensors canonically as $D(s)$-dimensional vectors. To do this, we'll set up a bijection between the indices of a given size $s$ and the natural numbers less than $D(s)$. I'll call the function from indices to numbers "flatten", since it turns a complicated thing into a one-dimensional thing, and call the inverse "buildup".
> flatten :: Size -> Index -> Integer
> flatten (Size k) (Index t)
> = if 0 <= t && t < k then t else error "index out of bounds"
> flatten (a :+ _) (L u)
> = flatten a u
> flatten (a :+ b) (R v)
> = (dimOf a) + (flatten b v)
> flatten (a :* b) (u :& v)
> = (flatten a u) + (flatten b v)*(dimOf a)
>
>
> buildup :: Size -> Integer -> Index
> buildup (Size k) t
> = if 0 <= t && t < k
> then Index t
> else error "integer index out of bounds"
> buildup (a :+ b) t
> = if t < dimOf a
> then L $ buildup a t
> else R $ buildup b (t - dimOf a)
> buildup (a :* b) t
> = (buildup a (t `rem` (dimOf a))) :& (buildup b (t `quot` (dimOf a)))
Now ``flatten`` and ``buildup`` should be inverses of each other, which we can test.
> _test_flatten_buildup :: Test (Size -> Bool)
> _test_flatten_buildup =
> testName "flatten s . buildup s == id" $
> \s ->
> let ks = [0..((dimOf s) - 1)]
> in ks == map (flatten s . buildup s) ks
>
> _test_buildup_flatten :: Test (Size -> Bool)
> _test_buildup_flatten =
> testName "buildup s . flatten s == id" $
> \s ->
> let ks = indicesOf s
> in ks == map (buildup s . flatten s) ks
To wrap up, in this post we defined two algebraic types, ``Size`` and ``Index``, to represent the sizes and indices of multidimensional arrays, and two functions, ``flatten`` and ``buildup``, that canonically map the indices of a given size to a 0-indexed list of natural numbers.
In the next post, we'll use ``Size`` and ``Index`` to define and manipulate multidimensional arrays.
Tests
-----
Math heavy code is well suited to automated tests, so we'll write some along the way using the ``QuickCheck`` library.
First off, we won't be needing the full complexity of QuickCheck, so here are some helper functions to make the tests a little simpler to write.
> type Test prop = (String, prop)
>
> testName :: String -> prop -> Test prop
> testName name prop = (name, prop)
>
> runTest, chattyTest, skipTest :: Testable prop => Args -> Test prop -> IO ()
> runTest args (name, prop) = do
> putStrLn ("\x1b[1;34m" ++ name ++ "\x1b[0;39;49m")
> result <- quickCheckWithResult args prop
> if isSuccess result
> then return ()
> else (putStrLn (show result)) >> exitFailure
>
> chattyTest args (name, prop) = do
> putStrLn ("\x1b[1;35m" ++ name ++ "\x1b[0;39;49m")
> result <- verboseCheckWithResult args prop
> if isSuccess result
> then return ()
> else (putStrLn (show result))
>
> -- when testing tests
> skipTest _ (name, _) =
> putStrLn ("\x1b[1;33mskipped: " ++ name ++ "\x1b[0;39;49m")
>
> testLabel :: String -> IO ()
> testLabel msg = putStrLn ("\n\x1b[1;32m" ++ msg ++ "\x1b[0;39;49m")
>
> class TypeName t where
> typeName :: t -> String
>
> instance TypeName Int where typeName _ = "Int"
> instance TypeName Integer where typeName _ = "Integer"
> instance TypeName Double where typeName _ = "Double"
>
> pairOf :: (Monad m) => m a -> m b -> m (a,b)
> pairOf ma mb = do
> x <- ma
> y <- mb
> return (x,y)
>
> forAll2 :: (Show a, Show b, Testable prop)
> => Gen a -> Gen b -> (a -> b -> prop) -> Property
> forAll2 ga gb f = forAll genPair (uncurry f)
> where
> genPair = do
> x <- ga
> y <- gb
> return (x,y)
>
> forAll3 :: (Show a, Show b, Show c, Testable prop)
> => Gen a -> Gen b -> Gen c -> (a -> b -> c -> prop) -> Property
> forAll3 ga gb gc f = forAll genTriple g
> where
> genTriple = do
> x <- ga
> y <- gb
> z <- gc
> return (x,y,z)
>
> g (x,y,z) = f x y z
To write QuickCheck tests for a given type it needs to be an instance of ``Arbitrary``, which provides two basic functions: ``arbitrary``, which generates a "random" element of the type, and ``shrink``, which takes an element and makes it "smaller" in some way. Defining these functions for a given type may be ugly, but only has to be done once.
> instance Arbitrary Size where
> arbitrary = sized arbSize
>
> shrink = \case
> Size k -> if k <= 0 then [] else [Size (k-1)]
> u :+ v -> [u, v]
> u :* v -> [u, v]
>
> arbSize :: Int -> Gen Size
> arbSize 0 = do
> k <- elements [0,1,1,2,2,2,2]
> return (Size k)
> arbSize n = do
> switch <- arbitrary :: Gen Int
> m <- choose (1,n)
> case switch `mod` 5 of
> 0 -> do
> u <- arbSize $ n-1
> v <- arbSize $ n-1
> return (u :* v)
> 1 -> do
> u <- arbSize $ n-1
> v <- arbSize $ n-1
> return (u :+ v)
> _ -> do
> k <- elements [0,1,2]
> return (Size k)
Now we can wrap up our tests in a little suite, ``_test_index``. The arguments for this function are (1) the number of test cases to generate and (2) how big they should be.
> -- run all tests for Size and Index
> _test_index :: Int -> Int -> IO ()
> _test_index num size = do
> testLabel "Size and Index"
>
> let
> args = stdArgs
> { maxSuccess = num
> , maxSize = size
> }
>
> runTest args _test_index_count
> runTest args _test_indices_distinct
> runTest args _test_flatten_buildup
> runTest args _test_buildup_flatten
>
> main_index :: IO ()
> main_index = _test_index 200 20