{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Linking banking transactions\n", "\n", "This example shows how to perform a one-to-one link on banking transactions.\n", "\n", "The data is synthetic and was generated with the following features:\n", "\n", "- Money shows up in the destination account with some time delay\n", "- The amount sent and the amount received are not always the same, because of hidden fees and foreign exchange effects\n", "- The memo is sometimes truncated and content is sometimes missing\n", "\n", "Since each origin payment should end up in the destination account, the `probability_two_random_records_match` of the model is known." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'There are 1,000 records to match'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use arrow to read in data to ensure date types are correct\n", "from pyarrow import parquet as pq\n", "from splink.duckdb.duckdb_linker import DuckDBLinker\n", "import altair as alt\n", "alt.renderers.enable('mimetype')\n", "\n", "df_origin = pq.read_table(\"./data/transactions_left.parquet\")\n", "df_origin = df_origin.slice(length=1_000)\n", "df_destination = pq.read_table(\"./data/transactions_right.parquet\")\n", "df_destination = df_destination.slice(length=1_000)\n", "f\"There are {df_origin.num_rows:,.0f} records to match\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | ground_truth | memo | transaction_date | amount | unique_id |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 0 | MATTHIAS C paym | 2022-03-28 | 36.36 | 0 |\n",
"| 1 | 1 | M CORVINUS dona | 2022-02-14 | 221.91 | 1 |\n", "