{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "27b86f5a-e9da-497c-aef8-01b24613962c", "metadata": {}, "outputs": [], "source": [ "Oracle AI Data Platform v1.0\n", "\n", "Copyright \u00a9 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "206813f7-5e2f-490e-8857-c282ae2b703f", "metadata": { "execution": { "iopub.status.busy": "2025-03-26T05:32:45.403Z" }, "type": "python" }, "source": [ "# Integrate with ALH\n", " **Loading Data in ALH**\n", " \n", " This notebook demonstrates writing data into ALH in AI Data Platform. It covers:\n", " \n", " 1. **Create ALH External Catalog**\n", " 2. **Inserting data into table in the ALH using pyspark insertInto**\n", " 3. **Inserting data into table in the ALH using SQL INSERT**\n", " \n", " **Parameters**\n", " - Workspace location for the wallet file - you must upload your ADB wallet\n", " - User name for your ADB\n", " - Password for your ADB\n", " - TNS name to use\n", " - Wallet password\n", "\n", " **Prerequisites**\n", "\n", "Before you begin, ensure you have:\n", " - An ALH accessible from your AI Data Platform workspace (if its private you need to create a workspace to use private connectivity)\n", " - A configured AI Data Platform environment with a compute cluster created.\n", "\n", " - Upload your ADB wallet into the workspace and set the parameter for the wallet file location\n", " - Create the table in your ALH using ALH OCI Console or your other favorite IDE\n", " - create table gold.patient_data(\"patient_id\" varchar2(30), \"heart_rate_bpm\" number, \"blood_pressure\" varchar2(30), \"oxygen_saturation\" number, \"room_number\" varchar2(30), \"status\" varchar2(30));\n", "\n", " **Next Steps**\n", "\n", "Follow the step-by-step instructions in this notebook to implement each approach. You can also refer to the following resources for additional guidance:\n", " - Creating external catalogs in AI Data Platform\n", "\n", "By the end of this notebook, you\u2019ll be able to load data into an ALH table from AI Data Platform." ] }, { "cell_type": "code", "execution_count": null, "id": "989b56ed-13a6-4ee8-b8b3-d65db157e515", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:52:07.900Z" }, "type": "python" }, "outputs": [], "source": [ "# Define parameters (can set parameters in a workflow job)\n", "user=oidlUtils.parameters.getParameter(\"USER\", \"replace_with_your_db_user\")\n", "passwd=oidlUtils.parameters.getParameter(\"PASSWD\", \"replace_with_your_db_password\")\n", "tns=oidlUtils.parameters.getParameter(\"TNS\", \"replace_with_your_tns\")\n", "wallet_passwd=oidlUtils.parameters.getParameter(\"WALLET_PASSWD\", \"replace_with_your_wallet_password\")\n", "wallet_path=oidlUtils.parameters.getParameter(\"WALLET_PATH\", \"replace_with_your_wallet_path\")\n", "#End of parameters that need to be set, you can run the rest of the cells\"" ] }, { "cell_type": "code", "execution_count": null, "id": "c5fa89dc-097e-41e2-bdc6-f9c93f12b060", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:52:21.012Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+-------+\n",
       "|status |\n",
       "+-------+\n",
       "|CREATED|\n",
       "+-------+\n",
       "\n",
       "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import base64\n", "\n", "byte_array=[]\n", "try:\n", " with open(wallet_path, 'rb') as file:\n", " byte_array = bytearray(file.read())\n", "except FileNotFoundError:\n", " print(f\"Error: File not found: {wallet_path}\")\n", "except Exception as e:\n", " print(f\"An error occurred: {e}\")\n", "\n", "wt = base64.b64encode(byte_array).decode('utf-8')\n", "create_sql=f\"create external catalog if not exists catalog_ALH options ('wallet.content' = '{wt}', 'type' = 'ORACLE_ALH', 'user.name' = '{user}', 'tns' = '{tns}', 'password' = '{passwd}','wallet.password' = '{wallet_passwd}')\"\n", "spark.sql(create_sql).show(1000,False)" ] }, { "cell_type": "markdown", "id": "d2002de1", "metadata": {}, "source": [ "### Alternate approach 1\n", "#### Copy to clipboard\n", "\n", "#### cat wallet.zip | base64 | pbcopy\n", "\n", "### Alternate approach 1\n", "#### Write the output to a text file and copy it manually\n", "\n", "#### cat wallet.zip | base64 > wallet.txt\n", "\n", "### Copying from a text file can help avoid any unintended characters that might get introduced when copying directly from the terminal." ] }, { "cell_type": "code", "execution_count": null, "id": "5459a224-a2c5-4d72-bf00-03b5eac0ecb8", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:54:47.118Z" }, "type": "python" }, "outputs": [], "source": [ "df = spark.createDataFrame(\n", "[\n", " (\"P001\", 72, \"120/80\", 98, \"302A\", \"stable\"),\n", " (\"P002\", 95, \"145/95\", 90, \"215B\", \"critical\")\n", "],[\"patient_id\", \"heart_rate_bpm\", \"blood_pressure\", \"oxygen_saturation\", \"room_number\", \"status\"])\n", "df.write.insertInto(\"catalog_ALH.gold.patient_data\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7acc9c2c-77b8-49e7-ada6-7ac286c12c71", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:54:57.569Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number|  status|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|      P002| 95.0000000000|        145/95|    90.0000000000|       215B|critical|\n",
       "|      P001| 72.0000000000|        120/80|    98.0000000000|       302A|  stable|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "\n",
       "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df=spark.sql(\"select * from catalog_ALH.gold.patient_data\")\n", "df.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "63b96546-fc40-458f-948c-4575bae418e0", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:56:47.013Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number|  status|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|      P002| 95.0000000000|        145/95|    90.0000000000|       215B|critical|\n",
       "|      P001| 72.0000000000|        120/80|    98.0000000000|       302A|  stable|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "\n",
       "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Read using Spark.read.table\n", "df = spark.read.table(\"catalog_ALH.gold.patient_data\")\n", "df.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "c2127015-b154-42d8-a447-3bce7f990cd9", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T21:58:49.100Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number|  status|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|      P003|            88|        125/95|               92|       215B|critical|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "\n",
       "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Let's create a dataframe to use for SQL INSERT into ALH\n", "df = spark.createDataFrame(\n", "[\n", " (\"P003\", 88, \"125/95\", 92, \"215B\", \"critical\")\n", "],[\"patient_id\", \"heart_rate_bpm\", \"blood_pressure\", \"oxygen_saturation\", \"room_number\", \"status\"])\n", "df.createOrReplaceTempView(\"src_data\")\n", "spark.sql(\"select * from src_data\").show()" ] }, { "cell_type": "code", "execution_count": null, "id": "0901f6f3-bc6d-490f-810f-e1695fb4775c", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T22:00:12.684Z" }, "type": "sql" }, "outputs": [ { "data": { "text/html": [ "
OK
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%sql\n", "INSERT into catalog_ALH.gold.patient_data select * from src_data" ] }, { "cell_type": "code", "execution_count": null, "id": "ef40ccd0-81a4-4eac-811f-6a192364024b", "metadata": { "execution": { "iopub.status.busy": "2025-03-27T22:00:22.447Z" }, "type": "python" }, "outputs": [ { "data": { "text/html": [ "
+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|patient_id|heart_rate_bpm|blood_pressure|oxygen_saturation|room_number|  status|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "|      P002| 95.0000000000|        145/95|    90.0000000000|       215B|critical|\n",
       "|      P001| 72.0000000000|        120/80|    98.0000000000|       302A|  stable|\n",
       "|      P003| 88.0000000000|        125/95|    92.0000000000|       215B|critical|\n",
       "+----------+--------------+--------------+-----------------+-----------+--------+\n",
       "\n",
       "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Read using Spark.read.table\n", "df = spark.read.table(\"catalog_ALH.gold.patient_data\")\n", "df.show()" ] } ], "metadata": { "Last_Active_Cell_Index": 12, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }