{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "d6244506", "metadata": { "id": "d6244506" }, "source": [ "
\n", "\n", "# PySpark miscellanea\n", " \n", "
<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n", "<div class=\"toc\"></div>
\n", "\n" ] }, { "cell_type": "markdown", "source": [ "### Imports" ], "metadata": { "id": "sQMqWsAziAhH" }, "id": "sQMqWsAziAhH" }, { "cell_type": "code", "source": [ "%%bash\n", "pip install pyspark" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jm-8qt32hxyG", "outputId": "0158d638-131e-40ca-d41f-d1b836cdfab5" }, "id": "jm-8qt32hxyG", "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Requirement already satisfied: pyspark in /usr/local/lib/python3.7/dist-packages (3.3.1)\n", "Requirement already satisfied: py4j==0.10.9.5 in /usr/local/lib/python3.7/dist-packages (from pyspark) (0.10.9.5)\n" ] } ] }, { "cell_type": "markdown", "id": "9dfe7c0a", "metadata": { "id": "9dfe7c0a" }, "source": [ "## How to get your application's id in pyspark\n", "\n", "\n", "See also: [How to extract application ID from the PySpark context](https://stackoverflow.com/questions/30983226/how-to-extract-application-id-from-the-pyspark-context)\n", "\n" ] }, { "cell_type": "markdown", "id": "02dbf961", "metadata": { "id": "02dbf961" }, "source": [ "### Starting from a Spark session\n", "\n", "What is a [Spark session](https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession)?" ] }, { "cell_type": "code", "execution_count": 2, "id": "0fd4ce7b", "metadata": { "id": "0fd4ce7b" }, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "spark = SparkSession \\\n", " .builder \\\n", " .appName(\"App ID\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "id": "d02813d3", "metadata": { "id": "d02813d3" }, "source": [ "Get the session's context ([what is a Spark context?](https://spark.apache.org/docs/latest/rdd-programming-guide.html#initializing-spark) and [detailed documentation](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.html))." ] }, { "cell_type": "code", "execution_count": 3, "id": "7d8b99e3", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "id": "7d8b99e3", "outputId": "3c3e5dd3-80b5-4df0-d5b6-004a9678bb5f" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": [ "\n", "
\n", "

SparkContext

\n", "\n", "

Spark UI

\n", "\n", "
\n", "
Version
\n", "
v3.3.1
\n", "
Master
\n", "
local[*]
\n", "
AppName
\n", "
App ID
\n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 3 } ], "source": [ "sc = spark.sparkContext\n", "sc" ] }, { "cell_type": "markdown", "id": "57df230b", "metadata": { "id": "57df230b" }, "source": [ "Get `applicationId` from the context." ] }, { "cell_type": "code", "execution_count": 4, "id": "bed1685a", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "bed1685a", "outputId": "b395a5f3-54d8-474c-9d79-2d99308e67fd" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'local-1668378738271'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 4 } ], "source": [ "sc.applicationId" ] }, { "cell_type": "markdown", "id": "de33f1ce", "metadata": { "id": "de33f1ce" }, "source": [ "All in one step:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f0023074", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "f0023074", "outputId": "a68d0e19-6af3-435b-ab92-6bacb35c10cb" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'local-1668378738271'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 5 } ], "source": [ "spark.sparkContext.applicationId" ] }, { "cell_type": "markdown", "id": "dc31a770", "metadata": { "id": "dc31a770" }, "source": [ "**Note:** if you're using the _pyspark shell_ (see [using the shell](https://spark.apache.org/docs/latest/rdd-programming-guide.html#initializing-spark)), `SparkContext` is created automatically and it can be accessed from the variable called `sc`." ] }, { "cell_type": "markdown", "id": "44ece7a4", "metadata": { "id": "44ece7a4" }, "source": [ "## How to get/set default parallelism in pyspark" ] }, { "cell_type": "markdown", "id": "880e6ada", "metadata": { "id": "880e6ada" }, "source": [ "Create a [Spark session](https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession)." ] }, { "cell_type": "code", "execution_count": 6, "id": "451653da", "metadata": { "id": "451653da" }, "outputs": [], "source": [ "spark = SparkSession \\\n", " .builder \\\n", " .appName(\"defaultParallelism\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "id": "d2fe39a2", "metadata": { "id": "d2fe39a2" }, "source": [ "Check the value of `defaultParallelism`:" ] }, { "cell_type": "code", "execution_count": 7, "id": "f221e3d1", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "f221e3d1", "outputId": "64b93c84-f06a-4a9d-f590-4172ffd0ff9f" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "spark.sparkContext.defaultParallelism" ] }, { "cell_type": "markdown", "id": "69b3994c", "metadata": { "id": "69b3994c" }, "source": [ "To change a property it's necessary to stop and start a new context/session." ] }, { "cell_type": "code", "execution_count": 8, "id": "4cf47f42", "metadata": { "id": "4cf47f42" }, "outputs": [], "source": [ "spark = SparkSession \\\n", " .builder \\\n", " .appName(\"Set parallelism\") \\\n", " .config(\"spark.default.parallelism\", 8) \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "id": "c6725687", "metadata": { "id": "c6725687" }, "source": [ "Default parallelism hasn't changed!" 
] }, { "cell_type": "code", "execution_count": 9, "id": "6b592d7e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6b592d7e", "outputId": "a18b5e4f-0307-4df6-aed6-c5203073eaea" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2" ] }, "metadata": {}, "execution_count": 9 } ], "source": [ "spark.sparkContext.defaultParallelism" ] }, { "cell_type": "markdown", "id": "49e168d0", "metadata": { "id": "49e168d0" }, "source": [ "Stop and start session anew." ] }, { "cell_type": "code", "execution_count": 10, "id": "da0512cd", "metadata": { "id": "da0512cd" }, "outputs": [], "source": [ "spark.stop()\n", "spark = SparkSession \\\n", " .builder \\\n", " .appName(\"Set parallelism\") \\\n", " .config(\"spark.default.parallelism\", 4) \\\n", " .getOrCreate()" ] }, { "cell_type": "code", "execution_count": 11, "id": "bba24e5e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bba24e5e", "outputId": "4d0a48f6-0436-42f1-d708-8fe5e7a6b459" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "4" ] }, "metadata": {}, "execution_count": 11 } ], "source": [ "spark.sparkContext.defaultParallelism" ] }, { "cell_type": "markdown", "id": "1ef9f91f", "metadata": { "id": "1ef9f91f" }, "source": [ "Great! Now the context has been changed (and also the applications's name has been updated)." ] }, { "cell_type": "code", "execution_count": 12, "id": "0e2fbc63", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "id": "0e2fbc63", "outputId": "5611a4dd-8e8f-4f1f-9789-4b764599df3d" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": [ "\n", "
\n", "

SparkContext

\n", "\n", "

Spark UI

\n", "\n", "
\n", "
Version
\n", "
v3.3.1
\n", "
Master
\n", "
local[*]
\n", "
AppName
\n", "
Set parallelism
\n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 12 } ], "source": [ "spark.sparkContext" ] }, { "cell_type": "markdown", "id": "238722de", "metadata": { "id": "238722de" }, "source": [ "### What is `spark.default.parallelism`?\n", "\n", "This property determines the default number of chunks in which an RDD ([Resilient Distributed Dataset](https://spark.apache.org/docs/latest/rdd-programming-guide.html#resilient-distributed-datasets-rdds)) is partitioned.\n", "\n", "Unless specified by the user, the value of is set based on the _cluster manager_:\n", " - in standalone mode it is equal to the number of (virtual) cores on the local machine\n", " - in Mesos: 8\n", " - for YARN, Kubernetes: total number of cores on all executor nodes or 2, whichever is larger\n", " \n", " (see [Spark configuration/Execution behavior](https://spark.apache.org/docs/latest/configuration.html#execution-behavior))" ] }, { "cell_type": "markdown", "id": "8cb775c1", "metadata": { "id": "8cb775c1" }, "source": [ "## About `spark-defaults.conf`" ] }, { "cell_type": "markdown", "id": "4cb5650c", "metadata": { "id": "4cb5650c" }, "source": [ "The file `spark-defaults.conf` contains the default Spark configuration properties and it is by default located in Spark's configuration directory `$SPARK_HOME/conf` (see [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html)). \n", "\n", "The format of `spark-defaults.conf` is whitespace-separated lines containing property name and value, for instance:\n", "```\n", "spark.master spark://5.6.7.8:7077\n", "spark.executor.memory 4g\n", "spark.eventLog.enabled true\n", "spark.serializer org.apache.spark.serializer.KryoSerializer\n", "```" ] }, { "cell_type": "markdown", "id": "81a5dae5", "metadata": { "id": "81a5dae5" }, "source": [ "### Where is my `spark-defaults.conf`?" ] }, { "cell_type": "markdown", "source": [ "Use `findspark` to find the location of `SPARK_HOME`." 
], "metadata": { "id": "zm4AzgMZmtjF" }, "id": "zm4AzgMZmtjF" }, { "cell_type": "code", "source": [ "!pip install findspark" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AzGcNtRWlObJ", "outputId": "8654c799-e4f9-44a5-a2da-fa1ec1055079" }, "id": "AzGcNtRWlObJ", "execution_count": 13, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Requirement already satisfied: findspark in /usr/local/lib/python3.7/dist-packages (2.0.1)\n" ] } ] }, { "cell_type": "code", "source": [ "import findspark\n", "import os\n", "findspark.init()\n", "os.environ[\"SPARK_HOME\"]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "dZghO2oqlREb", "outputId": "3ab3ce98-f1ed-4ce5-ab67-f4e310998586" }, "id": "dZghO2oqlREb", "execution_count": 14, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'/usr/local/lib/python3.7/dist-packages/pyspark'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 14 } ] }, { "cell_type": "markdown", "source": [ "Let's look for all files called `spark-defaults*` in Spark's configuration directory:" ], "metadata": { "id": "obTrz_VXmmVg" }, "id": "obTrz_VXmmVg" }, { "cell_type": "code", "execution_count": 15, "id": "6fede028", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6fede028", "outputId": "eebb70a6-ce61-4641-e317-ad7337b5895f" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['/usr/local/lib/python3.7/dist-packages/pyspark/conf/spark-defaults.conf']" ] }, "metadata": {}, "execution_count": 15 } ], "source": [ "import glob\n", "glob.glob(os.path.join(os.environ[\"SPARK_HOME\"], \"conf\", \"spark-defaults*\"))" ] }, { "cell_type": "markdown", "source": [ "If no file `spark-defaults.conf` is contained in Spark's configuration directory, you should find a _template_ configuration file `spark-defaults.conf.template`. You can rename this to `spark-defaults.conf` and use it as default configuration file.\n" ], "metadata": { "id": "pqWv5HTmG3V7" }, "id": "pqWv5HTmG3V7" }, { "cell_type": "markdown", "id": "077ff7c0", "metadata": { "id": "077ff7c0" }, "source": [ "If neither `spark-defaults.conf` nor a template file is found, create `spark-defaults.conf`. " ] }, { "cell_type": "code", "source": [ "if not os.path.exists(os.path.join(os.environ[\"SPARK_HOME\"], \"conf\")):\n", " os.makedirs(os.path.join(os.environ[\"SPARK_HOME\"], \"conf\"))" ], "metadata": { "id": "_tVY3uIiqg91" }, "id": "_tVY3uIiqg91", "execution_count": 16, "outputs": [] }, { "cell_type": "code", "execution_count": 17, "id": "4ab0f938", "metadata": { "id": "4ab0f938", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "0b21b797-7db0-40b5-ae5e-eb74a9f314ad" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Overwriting /usr/local/lib/python3.7/dist-packages/pyspark/conf/spark-defaults.conf\n" ] } ], "source": [ "%%writefile /usr/local/lib/python3.7/dist-packages/pyspark/conf/spark-defaults.conf\n", "#\n", "# Licensed to the Apache Software Foundation (ASF) under one or more\n", "# contributor license agreements. 
See the NOTICE file distributed with\n", "# this work for additional information regarding copyright ownership.\n", "# The ASF licenses this file to You under the Apache License, Version 2.0\n", "# (the \"License\"); you may not use this file except in compliance with\n", "# the License. You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "#\n", "\n", "# Default system properties included when running spark-submit.\n", "# This is useful for setting default environmental settings.\n", "\n", "# Example:\n", "# spark.master spark://master:7077\n", "# spark.eventLog.enabled true\n", "# spark.eventLog.dir hdfs://namenode:8021/directory\n", "# spark.serializer org.apache.spark.serializer.KryoSerializer\n", "# spark.driver.memory 5g\n", "# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers=\"one two three\"" ] }, { "cell_type": "markdown", "id": "21692085", "metadata": { "id": "21692085" }, "source": [ "As you see, everything is commented out in the template file. Just uncomment and edit the properties you want to set as defaults." ] }, { "cell_type": "markdown", "id": "2bc0e3e7", "metadata": { "id": "2bc0e3e7" }, "source": [ "### How to override the default configuration directory\n", "\n", "If you want to have your Spark configuration files in a directory other than `$SPARK_HOME/conf`, you can set the environment variable `SPARK_CONF_DIR`. \n", "\n", "Spark will then look in `$SPARK_CONF_DIR` for all of its configuration files: `spark-defaults.conf`, `spark-env.sh`, `log4j2.properties`, etc. (see https://spark.apache.org/docs/latest/configuration.html#overriding-configuration-directory).\n", "\n", "Here's the list of files in the default Spark configuration directory:" ] }, { "cell_type": "code", "execution_count": 18, "id": "ff9642c7", "metadata": { "id": "ff9642c7", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "1894c8b3-dfc6-4e46-ee12-bffb9514c635" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['spark-defaults.conf']" ] }, "metadata": {}, "execution_count": 18 } ], "source": [ "os.listdir(os.path.join(os.environ[\"SPARK_HOME\"],\"conf\"))" ] }, { "cell_type": "markdown", "id": "91a61ed7", "metadata": { "id": "91a61ed7" }, "source": [ "But now assume that you have no `spark-defaults.conf` or did not configure Spark anywhere else. Still, Spark has some _default values_ for several properties.\n", "\n", "Where are those properties defined and how to get their default values?" 
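, "\n", "\n", "The next section points to the documentation, which lists every property together with its default. Programmatically, `getConf().getAll()` only shows properties that were explicitly set, but for Spark SQL properties `spark.conf.get` returns the effective value even when it was never set. A small sketch (assuming an active `spark` session):\n", "\n", "```python\n", "# Sketch: read the effective value of a property we never set explicitly;\n", "# spark.sql.shuffle.partitions defaults to 200.\n", "spark.conf.get(\"spark.sql.shuffle.partitions\")  # expected: '200'\n", "```"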
] }, { "cell_type": "markdown", "id": "5fd5be6f", "metadata": { "id": "5fd5be6f" }, "source": [ "#### Spark configuration properties\n", "\n", "Spark's documentation provides the list of all [available properties](https://spark.apache.org/docs/latest/configuration.html#available-properties) grouped into several categories:\n", "\n", " - [application properties](https://spark.apache.org/docs/latest/configuration.html#application-properties)\n", " - [runtime environment](https://spark.apache.org/docs/latest/configuration.html#runtime-environment)\n", " - [shuffle behavior](https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior)\n", " - [Spark UI](https://spark.apache.org/docs/latest/configuration.html#spark-ui)\n", " - [compression and serialization](https://spark.apache.org/docs/latest/configuration.html#compression-and-serialization)\n", " - [memory management](https://spark.apache.org/docs/latest/configuration.html#memory-management)\n", " - [execution behavior](https://spark.apache.org/docs/latest/configuration.html#execution-behavior)\n", " - [executor metrics](https://spark.apache.org/docs/latest/configuration.html#executor-metrics)\n", " - [networking](https://spark.apache.org/docs/latest/configuration.html#networking)\n", " - [scheduling](https://spark.apache.org/docs/latest/configuration.html#scheduling)\n", " - [barrier execution mode](https://spark.apache.org/docs/latest/configuration.html#barrier-execution-mode)\n", " - [dynamic allocation](https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation)\n", " - [thread configurations](https://spark.apache.org/docs/latest/configuration.html#thread-configurations)\n", " - [security](https://spark.apache.org/docs/latest/security.html)\n", " - [Spark SQL](https://spark.apache.org/docs/latest/configuration.html#spark-sql)\n", " - [Spark streaming](https://spark.apache.org/docs/latest/configuration.html#spark-streaming)\n", " - [SparkR](https://spark.apache.org/docs/latest/configuration.html#sparkr)\n", " - [GraphX](https://spark.apache.org/docs/latest/configuration.html#graphx)\n", " - [Deploy](https://spark.apache.org/docs/latest/configuration.html#deploy)" ] }, { "cell_type": "markdown", "id": "98e1480f", "metadata": { "id": "98e1480f" }, "source": [ "All properties have a default value that should accommodate most situations.\n", "\n", "As a beginner you might want to give your application a name by configuring `spark.app.name` and perhaps change the default values of the following properties:\n", " - `spark.master` and `spark.submit.deployMode` to define where the application should be deployed\n", " - `spark.driver.memory` and `spark.driver.maxResultSize` to control the memory usage of the driver\n", " - `spark.executor.memory` and `spark.executor.cores` to control executors\n", " \n", " \n", "For instance let's create a new session" ] }, { "cell_type": "code", "execution_count": 19, "id": "844047f9", "metadata": { "id": "844047f9" }, "outputs": [], "source": [ "spark.stop()\n", "spark = SparkSession \\\n", " .builder \\\n", " .appName(\"my_app\") \\\n", " .config(\"spark.driver.memory\", \"2g\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "id": "f3c250e0", "metadata": { "id": "f3c250e0" }, "source": [ "Show properties included in the Spark context" ] }, { "cell_type": "code", "execution_count": 20, "id": "9f25abd1", "metadata": { "id": "9f25abd1", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "5ea17f92-6d6c-4b5b-da23-91fe9ce36384" }, "outputs": [ { "output_type": 
"execute_result", "data": { "text/plain": [ "[('spark.driver.extraJavaOptions',\n", " '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),\n", " ('spark.executor.id', 'driver'),\n", " ('spark.driver.host', 'c3218be40803'),\n", " ('spark.app.name', 'my_app'),\n", " ('spark.rdd.compress', 'True'),\n", " ('spark.executor.extraJavaOptions',\n", " '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),\n", " ('spark.driver.memory', '2g'),\n", " ('spark.serializer.objectStreamReset', '100'),\n", " ('spark.master', 'local[*]'),\n", " ('spark.submit.pyFiles', ''),\n", " ('spark.submit.deployMode', 'client'),\n", " ('spark.driver.port', '35255'),\n", " ('spark.app.startTime', '1668378747626'),\n", " ('spark.app.id', 'local-1668378747902'),\n", " ('spark.app.submitTime', '1668378734186'),\n", " ('spark.default.parallelism', '4'),\n", " ('spark.ui.showConsoleProgress', 'true')]" ] }, "metadata": {}, "execution_count": 20 } ], "source": [ "spark.sparkContext.getConf().getAll()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "colab": { "provenance": [], "collapsed_sections": [], "include_colab_link": true } }, "nbformat": 4, "nbformat_minor": 5 }