# Configuration

Loupe can be fine-tuned to match your requirements. As with any search engine, Loupe optimizes the documents you index
for efficient retrieval later on. This means that indexing takes considerably longer than searching.
Moreover, Loupe is typo tolerant, which is achieved using a State Set Index implementation. Loupe ships with
sane defaults, but you may want to tweak the parameters for your use case.

But let's start with the basic configuration. 

## Document ID

Every document has to have an identifier in Loupe. By default, Loupe expects every document you index to have an `id` 
key. But you can adjust that to your needs: 

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withPrimaryKey('uuid')
;
```

## Searchable attributes

By default, Loupe indexes all the attributes of your documents. This makes the search index considerably bigger
than it needs to be. So be sure to configure which attributes you want to search through later on:

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withSearchableAttributes(['firstname', 'lastname'])
;
```

Note that the order of searchable attributes has an influence on the [relevance ranking](./ranking.md) of search
results: attributes listed earlier carry more weight than attributes listed later.
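
For example, if a match in the last name should weigh more than a match in the first name, list `lastname` first. This is only a reordering of the example above, not an additional option:

```php
// `lastname` comes first, so it carries more weight than `firstname`.
$configuration = \Loupe\Loupe\Configuration::create()
    ->withSearchableAttributes(['lastname', 'firstname'])
;
```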

## Displayed attributes

By default, Loupe returns all the attributes of your documents in a search result.
If an attribute is not in the list of displayed attributes, it won't be added to the returned documents.
This can be useful if, for example, you need an attribute only for filtering but are never interested in its value when
receiving the document. It also allows Loupe to optimize the storage internally.

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withDisplayedAttributes(['firstname', 'lastname'])
;
```

## Filterable attributes

By default, no attribute can be filtered on in Loupe. Any attribute you want to filter on needs to be defined as
such before you start indexing. Note that the attributes can hold single (scalar) values as well as arrays; Loupe
handles both for you:

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withFilterableAttributes(['departments', 'age'])
;
```
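
To make the difference between scalar and array attributes more tangible, here is a small sketch. The documents and their values are made up for illustration. The `SearchParameters::withFilter()` call and the filter syntax are how we'd expect the search side to look, so double-check them against the search documentation of the Loupe version you use:

```php
// Hypothetical documents: `age` is a scalar, `departments` is an array.
$documents = [
    ['id' => 1, 'firstname' => 'Sandra', 'departments' => ['Development', 'Backoffice'], 'age' => 40],
    ['id' => 2, 'firstname' => 'Uta', 'departments' => ['Backoffice'], 'age' => 29],
];

// Both attributes have to be configured as filterable before indexing.
$configuration = \Loupe\Loupe\Configuration::create()
    ->withFilterableAttributes(['departments', 'age'])
;

// Assumed search-side usage: documents that list 'Backoffice' as one of
// their departments and have an age above 30.
$searchParameters = \Loupe\Loupe\SearchParameters::create()
    ->withFilter("departments = 'Backoffice' AND age > 30")
;
```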

## Sortable attributes

Loupe can order your results by any scalar attribute of your document:

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withSortableAttributes(['age', 'lastname'])
;
```
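
Configuring an attribute as sortable does not sort anything on its own; the order is requested per search. A minimal sketch, assuming the `SearchParameters::withSort()` method and its `attribute:direction` syntax as described in the search documentation:

```php
// Assumed search-side usage: oldest first, then alphabetically by last name.
$searchParameters = \Loupe\Loupe\SearchParameters::create()
    ->withSort(['age:desc', 'lastname:asc'])
;
```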

## Tokenization

In order to optimize tokenization for your use case, read [the "Tokenizer" section of the docs](tokenizer.md). These 
are the options:

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withMaxQueryTokens(12)
    ->withLanguages(['en', 'fr', 'de'])
;
```


## Minimum length for prefix search

Like MeiliSearch, Loupe follows the philosophy of prefix search.

Prefix search means that it's not necessary to type a word in its entirety to find documents containing that
word; you can just type the first few letters. So `huck` would also find `huckleberry`.

Prefix search is only performed on the last word in a search query. Prior words must be typed out fully to get
accurate results. For example, `my friend huck` would find documents containing `huckleberry`, whereas
`huck is my friend` would not.

Searching by prefix (rather than using complete words) has a significant impact on search time. 
The shorter the query term, the more possible matches in the dataset.

That's why you can also configure the minimum number of characters a term must contain before the prefix search
kicks in. By default, this is configured to `3`. So searching for `h` would not find `huckleberry` while `huc` would.

You can configure this behavior:

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withMinTokenLengthForPrefixSearch(1)
;
```
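
The rules above (only the last word is matched as a prefix, and only once it is long enough) can be sketched in plain PHP. This is not how Loupe implements it internally; the function name `matchesWithPrefixSearch()` and the simple token arrays are assumptions made purely for illustration. It requires PHP 8.0 or later for `str_starts_with()`:

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: checks whether all query tokens are found in a document
 * when only the last query token may match as a prefix.
 *
 * @param list<string> $queryTokens
 * @param list<string> $documentTokens
 */
function matchesWithPrefixSearch(array $queryTokens, array $documentTokens, int $minPrefixLength = 3): bool
{
    $lastIndex = count($queryTokens) - 1;

    foreach ($queryTokens as $index => $queryToken) {
        // Prefix matching applies to the last token only, and only if it is long enough.
        $prefixAllowed = $index === $lastIndex && strlen($queryToken) >= $minPrefixLength;
        $found = false;

        foreach ($documentTokens as $documentToken) {
            if ($documentToken === $queryToken
                || ($prefixAllowed && str_starts_with($documentToken, $queryToken))
            ) {
                $found = true;
                break;
            }
        }

        if (!$found) {
            return false;
        }
    }

    return true;
}

$document = ['my', 'friend', 'huckleberry'];

var_dump(matchesWithPrefixSearch(['my', 'friend', 'huck'], $document)); // bool(true)
var_dump(matchesWithPrefixSearch(['huck', 'is', 'my', 'friend'], $document)); // bool(false)
var_dump(matchesWithPrefixSearch(['h'], $document)); // bool(false)
var_dump(matchesWithPrefixSearch(['h'], $document, 1)); // bool(true)
```

The last call corresponds to lowering the minimum length to `1`, as in the configuration example above.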

## Typo tolerance

Loupe is typo tolerant! This is achieved by implementing the algorithm presented in the 2012 research paper "Efficient 
Similarity Search in Very Large String Sets" by Dandy Fenz, Dustin Lange, Astrid Rheinländer, Felix Naumann,
and Ulf Leser from the Hasso Plattner Institute, Potsdam, Germany and Humboldt-Universität zu Berlin, Department of
Computer Science, Berlin, Germany.

The algorithm makes it possible to search through huge datasets efficiently while tolerating typos (Levenshtein
distance) and keeping the index size small. [Download the paper and read all the details here][Paper].

Typo tolerance is configured as a sub-object of the `Configuration` class:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create();

$configuration = \Loupe\Loupe\Configuration::create()
    ->withTypoTolerance($typoTolerance)
;
```

In the following examples, we're thus only going to look at the `TypoTolerance` method calls.

### Disabling typo tolerance

Typo tolerance is enabled by default, but you can disable it entirely. It's as easy as this:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::disabled();
```

### Alphabet size and index length

These are the two major configuration values. They affect basically everything in Loupe:

- The index size
- The indexing performance
- The search performance

It's pretty hard to explain the State Set Index algorithm in a few short words, but I tried my very best to explain
some of it in the [Performance](performance.md) section. For all the details, it's best to read the academic paper
linked above. One thing to note, however: You **cannot** get wrong search results no matter what values you configure.
These values only affect the number of potential false positives that then have to be filtered out by
running the Damerau-Levenshtein algorithm on all results. The higher the values, the fewer false positives, but also
the more space the index requires.

By default, the alphabet size is `4` and the index length is `14`. You can change both:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withAlphabetSize(5)
    ->withIndexLength(18)
;
```

Note: The paper works using the Levenshtein algorithm. Loupe includes adjustments built on top of that paper to support
Damerau-Levenshtein.
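
To get a feeling for why these two values only influence the number of false positives, consider the following heavily simplified sketch. It is not the State Set Index and not Loupe's implementation: the `reduceTerm()` function and its modulo-based character mapping are assumptions made for illustration, and it uses PHP's byte-based `levenshtein()` instead of Damerau-Levenshtein:

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: maps a term to a reduced representation by replacing each
 * character with one of $alphabetSize labels and ignoring everything beyond
 * $indexLength characters. The modulo mapping is an assumption of this sketch.
 */
function reduceTerm(string $term, int $alphabetSize, int $indexLength): string
{
    $labels = [];
    $length = min(strlen($term), $indexLength);

    for ($i = 0; $i < $length; $i++) {
        $labels[] = (ord($term[$i]) % $alphabetSize) + 1;
    }

    return implode('-', $labels);
}

// A small alphabet makes different terms collide (potential false positives) ...
var_dump(reduceTerm('cat', 4, 14) === reduceTerm('gat', 4, 14)); // bool(true)
// ... while a larger alphabet tells them apart, at the cost of a bigger index.
var_dump(reduceTerm('cat', 5, 14) === reduceTerm('gat', 5, 14)); // bool(false)
// A short index length ignores differences that occur late in a term.
var_dump(reduceTerm('huckleberry', 4, 4) === reduceTerm('huckleberries', 4, 4)); // bool(true)

// Collisions never end up in the result: every candidate is verified with the
// real edit distance afterwards.
$query = 'cat';
$maxTypos = 0;
$candidates = ['cat', 'gat'];

$matches = array_values(array_filter(
    $candidates,
    static fn (string $candidate): bool => levenshtein($query, $candidate) <= $maxTypos
));

var_dump($matches); // only 'cat' remains
```

The more collisions the reduced representation produces, the more candidates have to go through the verification step, which is where search time is spent.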

### Typo thresholds

Usually, the longer the words, the more typos should be tolerated. It makes no sense to tolerate `6` typos for a word 
like `search` as it would mean that `engine` matches as well.

By default, Loupe tolerates `2` typos for words that are `9` or more characters long and `1` typo for words that are
`5` to `8` characters long. You can configure those thresholds. The array key is the minimum word length and the value
is the number of typos allowed from that length on:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withTypoThresholds([
        8 => 2, // 8 or more characters allow for 2 typos
        3 => 1, // 3 - 7 characters, allow one typo
    ])
;
```
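
How such a threshold map is read can be sketched in a few lines of plain PHP. The `allowedTypos()` function is hypothetical and only mirrors the description above; it is not part of Loupe's API. It uses `strlen()`, so the lengths are only correct for ASCII words:

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: looks up the number of tolerated typos for a word length.
 *
 * @param array<int, int> $thresholds minimum word length => allowed typos
 */
function allowedTypos(int $wordLength, array $thresholds): int
{
    // Check the highest threshold first so the most specific rule wins.
    krsort($thresholds);

    foreach ($thresholds as $minLength => $typos) {
        if ($wordLength >= $minLength) {
            return $typos;
        }
    }

    // Shorter than every threshold: no typos tolerated.
    return 0;
}

// The default thresholds described above.
$defaults = [9 => 2, 5 => 1];

var_dump(allowedTypos(strlen('huckleberry'), $defaults)); // int(2)
var_dump(allowedTypos(strlen('search'), $defaults)); // int(1)
var_dump(allowedTypos(strlen('huck'), $defaults)); // int(0)
```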

### Count a typo at the beginning of the word as two mistakes

Typos at the beginning of a word are not as likely as typos in the middle of a word. Thus, Loupe counts a
typo at the first character of a word as two typos by default. You can disable this behavior like so:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withFirstCharTypoCountsDouble(false)
;
```
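
The effect of this setting can be approximated as follows. This is a rough sketch and not Loupe's actual algorithm: the `countTypos()` function is hypothetical, and it relies on PHP's byte-based `levenshtein()` (plain Levenshtein, ASCII only) with a simple penalty when the first characters differ:

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: counts the typos between two terms and adds one extra
 * typo if their first characters differ.
 */
function countTypos(string $queryTerm, string $indexedTerm, bool $firstCharTypoCountsDouble = true): int
{
    $distance = levenshtein($queryTerm, $indexedTerm);

    if ($firstCharTypoCountsDouble
        && $queryTerm !== ''
        && $indexedTerm !== ''
        && $queryTerm[0] !== $indexedTerm[0]
    ) {
        ++$distance;
    }

    return $distance;
}

var_dump(countTypos('saarch', 'search')); // int(1): typo in the middle of the word
var_dump(countTypos('zearch', 'search')); // int(2): typo at the first character
var_dump(countTypos('zearch', 'search', false)); // int(1): penalty disabled
```

With the default thresholds, a six-character word like `search` tolerates one typo, so `zearch` would only match once the penalty is disabled.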

### Prefix search with typos

By default, Loupe will not allow typos on prefixes. So if you e.g. search for `Huckle`, it will find `Huckleberry`
but if you search for `Hukcle`, it won't. This is for performance reasons. However, you can enable typo tolerance on
prefix search. Just be aware that you probably shouldn't do this if you have tens of thousands of documents:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withEnabledForPrefixSearch(true)
;
```
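
Putting it all together, the individual typo tolerance settings from this section can be chained and then passed to the `Configuration`. The values are simply the ones used in the examples above, not recommendations:

```php
$typoTolerance = \Loupe\Loupe\Config\TypoTolerance::create()
    ->withAlphabetSize(5)
    ->withIndexLength(18)
    ->withTypoThresholds([
        8 => 2, // 8 or more characters allow for 2 typos
        3 => 1, // 3 - 7 characters, allow one typo
    ])
    ->withFirstCharTypoCountsDouble(false)
    ->withEnabledForPrefixSearch(true)
;

$configuration = \Loupe\Loupe\Configuration::create()
    ->withTypoTolerance($typoTolerance)
;
```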

## Total result limit

Loupe enforces a maximum number of 1,000 results across all pages to safeguard your index against malicious scraping.
Beyond this limit, no results will be calculated or returned, independent of pagination settings. The limit can be
configured if your application requires a different value.

```php
$configuration = \Loupe\Loupe\Configuration::create()
    ->withMaxTotalHits(100)
;
```

If you need to browse all the documents and ignore this limit, use [the browse API][Browse].
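
What the limit means for pagination can be illustrated with a bit of arithmetic. The `reachablePages()` function and its parameter names are hypothetical and only mirror the description above; they are not part of Loupe:

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: number of result pages a client can reach when the
 * total number of hits is capped.
 */
function reachablePages(int $matchingDocuments, int $hitsPerPage, int $maxTotalHits = 1000): int
{
    // Hits beyond the limit are never calculated, no matter how many documents match.
    $reachableHits = min($matchingDocuments, $maxTotalHits);

    return (int) ceil($reachableHits / $hitsPerPage);
}

var_dump(reachablePages(25000, 20)); // int(50)
var_dump(reachablePages(25000, 20, 100)); // int(5)
var_dump(reachablePages(42, 20)); // int(3)
```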

## Debugging

You may pass a PSR-3 logger to Loupe. For the sake of simplicity, Loupe also ships with a very simple
`InMemoryLogger` so you don't have to require a separate package just to quickly debug internals:

```php
$logger = new \Loupe\Loupe\Logger\InMemoryLogger();

$configuration = \Loupe\Loupe\Configuration::create()
    ->withLogger($logger)
;

print_r($logger->getRecords());
```

## Serializing and reconstructing configuration

If you need to serialize the configuration of Loupe, e.g. to allow configuration using a DSN, you can use the
`toString()` and `fromString()` helpers:

```php
$configuration = \Loupe\Loupe\Configuration::create();

$serialized = $configuration->toString();
$reconstructed = \Loupe\Loupe\Configuration::fromString($serialized);
```

Note that this does not work for object instances such as the logger.

[Browse]: browsing.md
[Paper]: https://hpi.de/oldsite/fileadmin/user_upload/fachgebiete/naumann/publications/PDFs/2012_fenz_efficient.pdf