---
layout: docu
redirect_from:
- /docs/guides/sql_features/full_text_search
title: Full-Text Search
---

DuckDB supports full-text search via the [`fts` extension]({% link docs/stable/extensions/full_text_search.md %}).
A full-text index allows for a query to quickly search for all occurrences of individual words within longer text strings.

## Example: Shakespeare Corpus

Here's an example of building a full-text index of Shakespeare's plays.

```sql
CREATE TABLE corpus AS
    SELECT * FROM 'https://blobs.duckdb.org/data/shakespeare.parquet';
```

```sql
DESCRIBE corpus;
```

<div class="monospace_table"></div>

| column_name | column_type | null | key  | default | extra |
|-------------|-------------|------|------|---------|-------|
| line_id     | VARCHAR     | YES  | NULL | NULL    | NULL  |
| play_name   | VARCHAR     | YES  | NULL | NULL    | NULL  |
| line_number | VARCHAR     | YES  | NULL | NULL    | NULL  |
| speaker     | VARCHAR     | YES  | NULL | NULL    | NULL  |
| text_entry  | VARCHAR     | YES  | NULL | NULL    | NULL  |

The text of each line is in `text_entry`, and a unique key for each line is in `line_id`.

## Creating a Full-Text Search Index

First, we create the index, specifying the table name, the unique id column, and the column(s) to index. We will just index the single column `text_entry`, which contains the text of the lines in the play.

```sql
PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');
```

The table is now ready to query using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function. Rows with no match return a `NULL` score.

What does Shakespeare say about butter?

```sql
SELECT
    fts_main_corpus.match_bm25(line_id, 'butter') AS score,
    line_id, play_name, speaker, text_entry
FROM corpus
WHERE score IS NOT NULL
ORDER BY score DESC;
```

|       score        |   line_id   |        play_name         |   speaker    |                     text_entry                     |
|-------------------:|-------------|--------------------------|--------------|----------------------------------------------------|
| 4.427313429798464  | H4/2.4.494  | Henry IV                 | Carrier      | As fat as butter.                                  |
| 3.836270302568675  | H4/1.2.21   | Henry IV                 | FALSTAFF     | prologue to an egg and butter.                     |
| 3.836270302568675  | H4/2.1.55   | Henry IV                 | Chamberlain  | They are up already, and call for eggs and butter; |
| 3.3844488405497115 | H4/4.2.21   | Henry IV                 | FALSTAFF     | toasts-and-butter, with hearts in their bellies no |
| 3.3844488405497115 | H4/4.2.62   | Henry IV                 | PRINCE HENRY | already made thee butter. But tell me, Jack, whose |
| 3.3844488405497115 | AWW/4.1.40  | Alls well that ends well | PAROLLES     | butter-womans mouth and buy myself another of      |
| 3.3844488405497115 | AYLI/3.2.93 | As you like it           | TOUCHSTONE   | right butter-womens rank to market.                |
| 3.3844488405497115 | KL/2.4.132  | King Lear                | Fool         | kindness to his horse, buttered his hay.           |
| 3.0278411214953107 | AWW/5.2.9   | Alls well that ends well | Clown        | henceforth eat no fish of fortunes buttering.      |
| 3.0278411214953107 | MWW/2.2.260 | Merry Wives of Windsor   | FALSTAFF     | Hang him, mechanical salt-butter rogue! I will     |
| 3.0278411214953107 | MWW/2.2.284 | Merry Wives of Windsor   | FORD         | rather trust a Fleming with my butter, Parson Hugh |
| 3.0278411214953107 | MWW/3.5.7   | Merry Wives of Windsor   | FALSTAFF     | Ill have my brains taen out and buttered, and give |
| 3.0278411214953107 | MWW/3.5.102 | Merry Wives of Windsor   | FALSTAFF     | to heat as butter; a man of continual dissolution  |
| 2.739219044070792  | H4/2.4.115  | Henry IV                 | PRINCE HENRY | Didst thou never see Titan kiss a dish of butter?  |

Unlike standard indexes, full-text indexes don't auto-update as the underlying data is changed, so you need to `PRAGMA drop_fts_index(my_fts_index)` and recreate it when appropriate.

## Note on Generating the Corpus Table

For more details, see the [“Generating a Shakespeare corpus for full-text searching from JSON” blog post](https://duckdb.blogspot.com/2023/04/generating-shakespeare-corpus-for-full.html).

* The Columns are: line_id, play_name, line_number, speaker, text_entry.
* We need a unique key for each row in order for full-text searching to work.
* The line_id `KL/2.4.132` means King Lear, Act 2, Scene 4, Line 132.