{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Deduplicating the febrl3 dataset\n", "\n", "See A.2 of [this paper](https://arxiv.org/pdf/2008.04443.pdf) and the [recordlinkage datasets reference](https://recordlinkage.readthedocs.io/en/latest/ref-datasets.html) for the source of this data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | rec_id | given_name | surname | street_number | address_1 | address_2 | suburb | postcode | state | date_of_birth | soc_sec_id | cluster |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | rec-1496-org | mitchell | green | 7.0 | wallaby place | delmar | cleveland | 2119 | sa | 19560409 | 1804974 | rec-1496 |
| 1 | rec-552-dup-3 | harley | mccarthy | 177.0 | pridhamstreet | milton | marsden | 3165 | nsw | 19080419 | 6089216 | rec-552 |