{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "27b86f5a-e9da-497c-aef8-01b24613962c", "metadata": {}, "outputs": [], "source": [ "Oracle AI Data Platform v1.0\n", "\n", "Copyright \u00a9 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "206813f7-5e2f-490e-8857-c282ae2b703f", "metadata": { "execution": { "iopub.status.busy": "2025-03-26T05:32:45.403Z" }, "type": "python" }, "source": [ "# Integrate with ALH\n", " **Loading Data in ALH**\n", " \n", " This notebook demonstrates writing data into ALH in AI Data Platform. It covers:\n", " \n", " 1. **Create ALH External Catalog**\n", " 2. **Inserting data into table in the ALH using pyspark insertInto**\n", " 3. **Inserting data into table in the ALH using SQL INSERT**\n", " \n", " **Parameters**\n", " - Workspace location for the wallet file - you must upload your ADB wallet\n", " - User name for your ADB\n", " - Password for your ADB\n", " - TNS name to use\n", " - Wallet password\n", "\n", " **Prerequisites**\n", "\n", "Before you begin, ensure you have:\n", " - An ALH accessible from your AI Data Platform workspace (if its private you need to create a workspace to use private connectivity)\n", " - A configured AI Data Platform environment with a compute cluster created.\n", "\n", " - Upload your ADB wallet into the workspace and set the parameter for the wallet file location\n", " - Create the table in your ALH using ALH OCI Console or your other favorite IDE\n", " - create table gold.patient_data(\"patient_id\" varchar2(30), \"heart_rate_bpm\" number, \"blood_pressure\" varchar2(30), \"oxygen_saturation\" number, \"room_number\" varchar2(30), \"status\" varchar2(30));\n", "\n", " **Next Steps**\n", "\n", "Follow the step-by-step instructions in this notebook to implement each approach. You can also refer to the following resources for additional guidance:\n", " - Creating external catalogs in AI Data Platform\n", "\n", "By the end of this notebook, you\u2019ll be able to load data into an ALH table from AI Data Platform." ] }, { "cell_type": "code", "execution_count": null, "id": "989b56ed-13a6-4ee8-b8b3-d65db157e515", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:52:07.900Z" }, "type": "python" }, "outputs": [], "source": [ "# Define parameters (can set parameters in a workflow job)\n", "user=oidlUtils.parameters.getParameter(\"USER\", \"replace_with_your_db_user\")\n", "passwd=oidlUtils.parameters.getParameter(\"PASSWD\", \"replace_with_your_db_password\")\n", "tns=oidlUtils.parameters.getParameter(\"TNS\", \"replace_with_your_tns\")\n", "wallet_passwd=oidlUtils.parameters.getParameter(\"WALLET_PASSWD\", \"replace_with_your_wallet_password\")\n", "wallet_path=oidlUtils.parameters.getParameter(\"WALLET_PATH\", \"replace_with_your_wallet_path\")\n", "#End of parameters that need to be set, you can run the rest of the cells\"" ] }, { "cell_type": "code", "execution_count": null, "id": "c5fa89dc-097e-41e2-bdc6-f9c93f12b060", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:52:21.012Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+-------+\n",
"|status |\n",
"+-------+\n",
"|CREATED|\n",
"+-------+\n",
"\n",
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import base64\n",
"\n",
"byte_array=[]\n",
"try:\n",
" with open(wallet_path, 'rb') as file:\n",
" byte_array = bytearray(file.read())\n",
"except FileNotFoundError:\n",
" print(f\"Error: File not found: {wallet_path}\")\n",
"except Exception as e:\n",
" print(f\"An error occurred: {e}\")\n",
"\n",
"wt = base64.b64encode(byte_array).decode('utf-8')\n",
"create_sql=f\"create external catalog if not exists catalog_ALH options ('wallet.content' = '{wt}', 'type' = 'ORACLE_ALH', 'user.name' = '{user}', 'tns' = '{tns}', 'password' = '{passwd}','wallet.password' = '{wallet_passwd}')\"\n",
"spark.sql(create_sql).show(1000,False)"
]
},
{
"cell_type": "markdown",
"id": "d2002de1",
"metadata": {},
"source": [
"### Alternate approach 1\n",
"#### Copy to clipboard\n",
"\n",
"#### cat wallet.zip | base64 | pbcopy\n",
"\n",
"### Alternate approach 1\n",
"#### Write the output to a text file and copy it manually\n",
"\n",
"#### cat wallet.zip | base64 > wallet.txt\n",
"\n",
"### Copying from a text file can help avoid any unintended characters that might get introduced when copying directly from the terminal."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5459a224-a2c5-4d72-bf00-03b5eac0ecb8",
"metadata": {
"execution": {
"iopub.status.busy": "2025-03-27T21:54:47.118Z"
},
"type": "python"
},
"outputs": [],
"source": [
"df = spark.createDataFrame(\n",
"[\n",
" (\"P001\", 72, \"120/80\", 98, \"302A\", \"stable\"),\n",
" (\"P002\", 95, \"145/95\", 90, \"215B\", \"critical\")\n",
"],[\"patient_id\", \"heart_rate_bpm\", \"blood_pressure\", \"oxygen_saturation\", \"room_number\", \"status\"])\n",
"df.write.insertInto(\"catalog_ALH.gold.patient_data\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7acc9c2c-77b8-49e7-ada6-7ac286c12c71",
"metadata": {
"execution": {
"iopub.status.busy": "2025-03-27T21:54:57.569Z"
},
"type": "python"
},
"outputs": [
{
"data": {
"text/html": [
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number| status|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"| P002| 95.0000000000| 145/95| 90.0000000000| 215B|critical|\n",
"| P001| 72.0000000000| 120/80| 98.0000000000| 302A| stable|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"\n",
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df=spark.sql(\"select * from catalog_ALH.gold.patient_data\")\n",
"df.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63b96546-fc40-458f-948c-4575bae418e0",
"metadata": {
"execution": {
"iopub.status.busy": "2025-03-27T21:56:47.013Z"
},
"type": "python"
},
"outputs": [
{
"data": {
"text/html": [
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number| status|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"| P002| 95.0000000000| 145/95| 90.0000000000| 215B|critical|\n",
"| P001| 72.0000000000| 120/80| 98.0000000000| 302A| stable|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"\n",
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Read using Spark.read.table\n",
"df = spark.read.table(\"catalog_ALH.gold.patient_data\")\n",
"df.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2127015-b154-42d8-a447-3bce7f990cd9",
"metadata": {
"execution": {
"iopub.status.busy": "2025-03-27T21:58:49.100Z"
},
"type": "python"
},
"outputs": [
{
"data": {
"text/html": [
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number| status|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"| P003| 88| 125/95| 92| 215B|critical|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"\n",
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Let's create a dataframe to use for SQL INSERT into ALH\n",
"df = spark.createDataFrame(\n",
"[\n",
" (\"P003\", 88, \"125/95\", 92, \"215B\", \"critical\")\n",
"],[\"patient_id\", \"heart_rate_bpm\", \"blood_pressure\", \"oxygen_saturation\", \"room_number\", \"status\"])\n",
"df.createOrReplaceTempView(\"src_data\")\n",
"spark.sql(\"select * from src_data\").show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0901f6f3-bc6d-490f-810f-e1695fb4775c",
"metadata": {
"execution": {
"iopub.status.busy": "2025-03-27T22:00:12.684Z"
},
"type": "sql"
},
"outputs": [
{
"data": {
"text/html": [
"OK" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%sql\n", "INSERT into catalog_ALH.gold.patient_data select * from src_data" ] }, { "cell_type": "code", "execution_count": null, "id": "ef40ccd0-81a4-4eac-811f-6a192364024b", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T22:00:22.447Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+----------+--------------+--------------+-----------------+-----------+--------+\n",
"|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number| status|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"| P002| 95.0000000000| 145/95| 90.0000000000| 215B|critical|\n",
"| P001| 72.0000000000| 120/80| 98.0000000000| 302A| stable|\n",
"| P003| 88.0000000000| 125/95| 92.0000000000| 215B|critical|\n",
"+----------+--------------+--------------+-----------------+-----------+--------+\n",
"\n",
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Read using Spark.read.table\n",
"df = spark.read.table(\"catalog_ALH.gold.patient_data\")\n",
"df.show()"
]
}
],
"metadata": {
"Last_Active_Cell_Index": 12,
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}