{ "cells": [ { "cell_type": "markdown", "id": "584fa1da-de62-4a0c-a333-f9d9d08e3838", "metadata": {}, "source": [ "## Oracle AI Data Platform v1.0\n", "\n", "Copyright © 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "24ecb8a4-69c6-4533-85df-fd8f0877f238", "metadata": { "type": "markdown" }, "source": [ "### Example: Read Pipe-Delimited File and Create External Table in Spark\n", "\n", "This notebook demonstrates how to read a pipe-delimited (`|`) file from OCI Object Storage using Spark, infer its schema, and create an external table in the Master Catalog.\n", "\n", "#### Assumptions:\n", "- The source data is stored in a CSV format in OCI Object Storage.\n", "- default is the catalog used to register the external table.\n", "- Header row is present and schema is inferred automatically." ] }, { "cell_type": "code", "execution_count": null, "id": "75e10a5c-84b5-40ff-8bcc-a2ee8dd0d8af", "metadata": { "type": "python" }, "outputs": [], "source": [ "catalog_name = \"default\"\n", "schema_name = \"default\"\n", "table_name = \"sample_pipe_table\"\n", "\n", "fq_table_name = f\"{catalog_name}.{schema_name}.{table_name}\"\n", "\n", "csv_data_folder_path = \"\" ## Example \"oci://bucket-name@namespace/foldername/\"\n", "\n", "csv_read_options = {\n", " \"header\": \"true\",\n", " \"inferSchema\": \"true\",\n", " \"delimiter\": \"|\"\n", "}\n", "\n", "csv_df = spark.read.options(**csv_read_options).format(\"csv\").load(csv_data_folder_path)\n", "\n", "#csv_df.show()\n", "\n", "columns_definitions = csv_df._jdf.schema().toDDL()\n", "#print(columns_definitions)\n", "\n", "if spark.catalog.tableExists(fq_table_name):\n", " spark.sql(f\"DROP TABLE IF EXISTS {fq_table_name}\").show()\n", " print(f\"Dropped the external table: {fq_table_name}\")\n", "else:\n", " print(f\"Didn't drop the external table `{fq_table_name}` as it doesn't exist!\")\n", " \n", "create_table_ddl = f\"\"\"CREATE EXTERNAL TABLE IF NOT EXISTS {fq_table_name}\n", " ({columns_definitions})\n", " USING CSV\n", " OPTIONS('header'='true', 'delimiter'='|')\n", " LOCATION '{csv_data_folder_path}'\"\"\"\n", "\n", "#print(f\"Would create table using below DDL:\\n{create_table_ddl}\")\n", "\n", "\n", "if not spark.catalog.tableExists(fq_table_name):\n", " spark.sql(create_table_ddl).show()\n", " print(f\"Created external table: {fq_table_name}\")\n", "else:\n", " print(f\"Didn't create the external table `{fq_table_name}` as it already exists\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1704f33e-6bb0-45ee-ba23-7cf9acfff14b", "metadata": { "type": "python" }, "outputs": [], "source": [ "df = spark.read.table(\"default.default.sample_pipe_table\")\n", "df.show()" ] } ], "metadata": { "Last_Active_Cell_Index": 2, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }