> 📢: **This document was used during early development of siuba. See the [siuba intro doc](https://siuba.readthedocs.io/en/latest/api_table_core/03_select.html).**


In [1]:
from siuba.siu import _, explain

In [2]:
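f = _['a'] + _['b']

(_ + _)(1)                 # calling a siu expression evaluates it: 1 + 1
d = {'a': 1, 'b': 2}

explain(_.somecol.min())   # prints a readable form of the expression

(_['a'] + _['b'])(d)       # d['a'] + d['b']

f = _['a'] + 4
f(d)                       # d['a'] + 4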


_.somecol.min()


5

# Rules

1. When `_()` represents a call, rather than executing one, it is called a symbolic call.
2. `_` always performs symbolic calls, except immediately after...
  * a binary operation, e.g. `_ + _`
  * a symbolic call, e.g. `_()`
  * an index operation, e.g. `_['a']`
3. You can explicitly tell `_` to do a normal call using `~~`, e.g. `~~_.func`.

**Rationale:**
It is much less common to call the result of a binary operation than to call a method on it.

For example,

* uncommon: `(_.a + _.b)()`
* common: `(_.a + _.b).sum()`

# Examples

## Common cases

In [3]:
data = ['a','b','c']

In [4]:
# Binary operation
list(map(_ * 2, data))

['aa', 'bb', 'cc']

In [5]:
# Method call
list(map(_.upper(), data))

['A', 'B', 'C']

In [6]:
# Index
get_ax = _['a']['x']

get_ax({'a': {'x': 1}, 'b': 2})

1
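For comparison, each of these examples has a plain-Python equivalent, written here with a lambda and the standard library's `operator` module.

```python
import operator

data = ['a', 'b', 'c']

# Binary operation: like _ * 2
print(list(map(lambda s: s * 2, data)))                 # ['aa', 'bb', 'cc']

# Method call: like _.upper()
print(list(map(operator.methodcaller('upper'), data)))  # ['A', 'B', 'C']

# Index: like _['a']['x'], composed by hand from two getters
get_a = operator.itemgetter('a')
get_x = operator.itemgetter('x')
print(get_x(get_a({'a': {'x': 1}, 'b': 2})))            # 1
```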

## Escaping

In [7]:
# Escaping
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])

points = [Point(x = 0, y = 1), Point(x = 1, y = 2)]

# doesn't work: _.x is not ready to call, so _.x(p) would be a
# symbolic call (like _.upper()) instead of returning p.x
#list(map(_.x, points))

# works via escaping
list(map(~~_.x, points))

[0, 1]

In [8]:
# needs no escaping, since binary op!

list(map(_.x + _.y, points))

[1, 3]

In [9]:
# contrived complex example of escaping
# access .imag attribute of x + y

list(map(~~(_.x + _.y).imag, points))

[0, 0]
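The escaped expressions above compute the same values as these plain-Python versions: `operator.attrgetter` for `~~_.x`, and lambdas for the other two.

```python
from collections import namedtuple
from operator import attrgetter

Point = namedtuple('Point', ['x', 'y'])
points = [Point(x=0, y=1), Point(x=1, y=2)]

# like ~~_.x
print(list(map(attrgetter('x'), points)))             # [0, 1]

# like _.x + _.y
print(list(map(lambda p: p.x + p.y, points)))         # [1, 3]

# like ~~(_.x + _.y).imag: ints have an .imag attribute, which is always 0
print(list(map(lambda p: (p.x + p.y).imag, points)))  # [0, 0]
```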

## Review of siu expressions...

Ready to call:

* `_.a + _.b`
* `(_.a + _.b) / 2`
* `_.sum()`
* `-_.sum()`
* `_.a.sum() + _.b.sum()`
* `(_.a + _.b).sum()`
* `_["a"].sum()`
* `_["a"] + _["b"]`
* `~~_.a`
* `~~-_.a`

Not ready to call:

* `_`
* `_.a`
* `-_.a`
* `(_.a + _.b).sum`

# Benefits

## Transparent

Lambdas lock your code away.
When you call one, you know it will do some work, but you can't see what that work is.
Siu expressions, on the other hand, can state what they want to do.

In [10]:
f = _.a + _.b / 2 + _.c**_.d >> _ & _

explain(f)

((_.a + (_.b / 2) + _.c**_.d) >> _) & _
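A minimal sketch of why this transparency is possible: instead of running each operation, a symbolic object records it as a node. The `Node` class and its string format below are hypothetical, loosely mimicking the flat form that `explain` prints, not siuba's own classes. The lambda computing the same thing only reveals that it is a function.

```python
class Node:
    """Hypothetical expression node that records an operation instead of running it."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __getattr__(self, name):
        # _.a records an attribute-access node
        return Node(".", self, name)

    def __add__(self, other):
        return Node("+", self, other)

    def __truediv__(self, other):
        return Node("/", self, other)

    def __str__(self):
        if self.op == "_":
            return "_"
        if self.op == ".":
            obj, name = self.args
            return f"{obj}.{name}"
        left, right = self.args
        return f"({left} {self.op} {right})"


_ = Node("_")
expr = (_.a + _.b) / 2
func = lambda row: (row["a"] + row["b"]) / 2

print(expr)        # ((_.a + _.b) / 2)
print(repr(func))  # something like <function <lambda> at 0x...>
```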


By default, siu expressions are displayed as a call tree...

In [11]:
(_.a + _.b) / 2

█─/
├─█─+
│ ├─█─.
│ │ ├─_
│ │ └─'a'
│ └─█─.
│   ├─_
│   └─'b'
└─2

While still rough, we can already run analyses on siu expressions, e.g. finding which variables an expression uses:

In [12]:
symbol = _.a[_.b + 1] + _['c']

# hacky way to go from symbol to call for now
call = symbol.source

call.op_vars()

{'a', 'b', 'c'}
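An analysis like `op_vars` boils down to a recursive walk over the call tree. Here is a hypothetical sketch using plain tuples as the tree (this encoding and the function below are my own, not siuba's `op_vars` implementation): it collects every name accessed directly on the placeholder.

```python
# Hypothetical tuple encoding of a siu call tree (not siuba's own classes):
#   ("_",)               the placeholder _
#   (".", obj, name)     attribute access, e.g. _.a
#   ("[]", obj, key)     index, e.g. _['c']
#   (op, left, right)    binary operation, e.g. _.b + 1
ROOT = ("_",)


def op_vars(node):
    """Collect names accessed directly on the placeholder, anywhere in the tree."""
    if not isinstance(node, tuple):
        return set()  # literals like 1 or 'a' contribute nothing on their own
    op, *args = node
    found = set()
    if op in (".", "[]") and args[0] == ROOT and isinstance(args[1], str):
        found.add(args[1])
    for arg in args:
        found |= op_vars(arg)
    return found


# _.a[_.b + 1] + _['c']
expr = ("+",
        ("[]", (".", ROOT, "a"), ("+", (".", ROOT, "b"), 1)),
        ("[]", ROOT, "c"))
print(sorted(op_vars(expr)))  # ['a', 'b', 'c']
```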

## Pandas, sql, and more

Down the road, we can use siu's transparency in execution engines.

People can say **what** they want to do, and we can optimize **how** to do it (e.g. in pandas, SQL, etc.).
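A hypothetical sketch of the what/how split: one recorded expression, handed to two different backends. The tuple encoding, `to_python`, and `to_sql` are my own illustrations (handling only column access, numeric literals, and arithmetic), not siuba's execution engines.

```python
import operator

ROOT = ("_",)  # hypothetical tuple encoding of the placeholder _
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def to_python(node, row):
    """Backend 1: evaluate the tree against a dict-like row."""
    if not isinstance(node, tuple):
        return node                   # numeric literal
    op, *args = node
    if op == ".":
        return row[args[1]]           # _.a -> row["a"]
    left, right = (to_python(arg, row) for arg in args)
    return OPS[op](left, right)


def to_sql(node):
    """Backend 2: render the same tree as a SQL expression (numeric literals only)."""
    if not isinstance(node, tuple):
        return str(node)
    op, *args = node
    if op == ".":
        return args[1]                # _.a -> column a
    left, right = (to_sql(arg) for arg in args)
    return f"({left} {op} {right})"


# (_.a + _.b) / 2
expr = ("/", ("+", (".", ROOT, "a"), (".", ROOT, "b")), 2)
print(to_python(expr, {"a": 1, "b": 3}))  # 2.0
print(to_sql(expr))                       # ((a + b) / 2)
```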

## metahooks

One kind of crazy thing I did was create a metahook: it automatically turns an imported function into one that creates siu expressions. (This feature is currently unused!)

In [13]:
import siuba.meta_hook
from siuba.meta_hook.operator import add, sub
from siuba.meta_hook.pandas import DataFrame

f = add(1, _['a'] + _['b'])
explain(f)

f({'a': 1, 'b': 2})

<built-in function add>(1,_('a') + _('b'))


4
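The core idea behind such a hook can be sketched in plain Python as a decorator: if any argument is symbolic, defer the call; otherwise call eagerly. This is a hypothetical sketch, not how `siuba.meta_hook` is implemented; `Deferred`, `lazy`, and `col` (a stand-in for `_['a']`) are my own names.

```python
import functools
import operator


class Deferred:
    """Hypothetical deferred computation: call it with data to evaluate it."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, data):
        return self.fn(data)


def resolve(arg, data):
    """Evaluate arg on data if it is deferred, otherwise return it as-is."""
    return arg(data) if isinstance(arg, Deferred) else arg


def lazy(func):
    """Wrap func so that a call with any Deferred argument is itself deferred."""
    @functools.wraps(func)
    def wrapper(*args):
        if any(isinstance(arg, Deferred) for arg in args):
            return Deferred(lambda data: func(*(resolve(arg, data) for arg in args)))
        return func(*args)  # no symbolic arguments: call eagerly
    return wrapper


def col(key):
    """Hypothetical stand-in for _[key]."""
    return Deferred(lambda data: data[key])


add = lazy(operator.add)
f = add(1, add(col("a"), col("b")))
print(f({"a": 1, "b": 2}))  # 4
print(add(1, 2))            # 3
```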

In [14]:
DataFrame({'a': [1,2,3]})

█─'__call__'
├─<class 'pandas.core.frame.DataFrame'>
└─{'a': [1, 2, 3]}

In [15]:
_.a + _.b

█─+
├─█─.
│ ├─_
│ └─'a'
└─█─.
  ├─_
  └─'b'

In [16]:
_.a() + _.b

█─+
├─█─'__call__'
│ └─█─.
│   ├─_
│   └─'a'
└─█─.
  ├─_
  └─'b'

# Is siu fast?

It depends on how many times you call it.
For many applications you only need to call an expression once (e.g. in pandas).
If you call it many times, as in the example below, it will be slower than using a lambda.

However, libraries that accept siu expressions know what they are being asked to do, which means they can actually speed up operations.

Below I just show the downside: out of the box, siu expressions are slower than lambdas :/

In [17]:
def lmap(*args, **kwargs):
    """Like map(), but returns a list."""
    return list(map(*args, **kwargs))

l = [dict(a = 1) for ii in range(10*6)]   # 10*6 == 60 dicts

In [18]:
%%timeit
# NBVAL_IGNORE_OUTPUT

x = lmap(_['a'], l)

212 µs ± 50.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [19]:
%%timeit
# NBVAL_IGNORE_OUTPUT

x = lmap(lambda x: x['a'], l)

7.29 µs ± 199 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
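One way a library could win back that overhead is to inspect the expression once and swap in a fast equivalent, rather than interpreting the tree on every call. This is a hypothetical sketch: the tuple encoding, `interpret`, and `compile_expr` are my own, and only plain item access like `_['a']` is handled. In CPython, `operator.itemgetter` is implemented in C, so the compiled version avoids per-row tree walking. The printed timings vary by machine, so no output is shown.

```python
import operator
import timeit

ROOT = ("_",)  # hypothetical tuple encoding of the placeholder _


def interpret(node, data):
    """Slow path: walk the tree on every call (only handles ("[]", obj, key) nodes)."""
    if node == ROOT:
        return data
    _op, obj, key = node
    return interpret(obj, data)[key]


def compile_expr(node):
    """Fast path: inspect the tree once and pick a specialized function if possible."""
    if node[0] == "[]" and node[1] == ROOT:
        return operator.itemgetter(node[2])
    return lambda data: interpret(node, data)


expr = ("[]", ROOT, "a")  # plays the role of _['a']
rows = [dict(a=1) for ii in range(60)]


def slow(row):
    return interpret(expr, row)


fast = compile_expr(expr)

print("interpreted:", timeit.timeit(lambda: list(map(slow, rows)), number=1000))
print("compiled:   ", timeit.timeit(lambda: list(map(fast, rows)), number=1000))
```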


# Limitations

When siu expressions contain list literals, they can't see any expressions inside those lists, so those are left unevaluated. E.g.

In [20]:
f = _ + [_, _, _]

f(['a'])

['a', _, _, _]

This can easily be worked around, though!

In [21]:
from siuba.meta_hook import lazy_func

@lazy_func
def List(*args):
    return list(args)

f = _ + List(_, _, _)

f(['a'])

['a', ['a'], ['a'], ['a']]

# Similar Projects

The projects below use similar symbolic objects, but they are driven by lambdas (so the **what** can't be inspected).

* [fn.py](https://github.com/kachayev/fn.py)
* [phi](https://github.com/cgarciae/phi)
* [pandas-ply](https://github.com/coursera/pandas-ply)

The projects below are symbolic computation engines, but lack a simple, generic symbolic object.

* [ibis](https://github.com/ibis-project/ibis)
* [blaze](https://github.com/blaze/blaze)