{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " \"ServiceX\"\n", " \"UT\n", " The new Python client library of ServiceX, the novel data delivery system\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

KyungEon Choi (UT Austin) for the ServiceX team (IRIS-HEP)

\n", "\n", "

PyHEP 2024 (July 1, 2024)

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

ServiceX

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "ServiceX is a scalable data extraction, transformation and delivery system deployed in a Kubernetes cluster. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\"ServiceX\"

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "- Event data\n", " - ServiceX delivers from grid or remote XRootD storage to the user. Or more precisely ServiceX writes into an object store (ServiceX internal storage) and users download files or URLs from the object store as soon as available.\n", " - Thickness of arrows reflect the amount of data over a wire. ServiceX is NOT designed to download full data from grids. Transformers effectively reduce data that will be delivered to user based on a query for selection and filtering.\n", " - ServiceX is often co-located with a grid site to maximize network bandwith. XCache is preferable to allow much faster read for frequently accessed datasets.\n", "- Transformer\n", " - ServiceX consists of multiple microservices that are deployed as static K8s pod (always \"running\" state) but transformers are dynamically created via HPA (Horizontal Pod Scaling)\n", " - A transformer pod runs on a file at a time and number of transformer pods are scaled up and down depending on the number of input files in the dataset and other criteria.\n", "- ServiceX Request\n", " - ServiceX request(s) is(are) made from the SerivceX client libary to ServiceX Web API via HTTP request\n", " - A ServiceX request takes one input dataset (or list of files) and ServiceX is happily scale transformer pods automatically. A dataset with a single file should work but it's much more desirable to utilize HPA.\n", " - Users can make ServiceX request anywhere only with Python ServiceX client library and servicex.yaml includes an access token. Thus it's perfectly fine to deliver data to a university cluster or a laptop for small tests.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "

ServiceX Webpage

\n", "\n", "- Download a ServiceX configuration file (servicex.yaml) from the ServiceX website and copy to your home or working directory \n", "- NOTE: the ServiceX endpoint servicex.af.uchicago.edu is limited to the ATLAS users as it provides an access to the ATLAS event data\n", "\n", "

\"ServiceX

\n", "\n", "
\n", "\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

ServiceX Client library

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "ServiceX Client library is a python library for users to communicate with ServiceX backend (or server) to make delivery requests and handling of outputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "The most fundamental compenents of a ServiceX request\n", "1. Dataset\n", "1. Query - describe what a user wants to run in transformers\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", "
\n", "\n", "Design goal of the new ServiceX Client library\n", "- Minimize boilerplates\n", "- Support YAML interface (integration of ServiceX DataBinder)\n", "- Strongly typed (pydantic)\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Installation
\n", "- pip install servicex==3.0.0.alpha.18" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "servicex 3.0.0a18\n" ] } ], "source": [ "# !pip install servicex==3.0.0.alpha.18\n", "!pip list | grep servicex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "I have downloaded my ServiceX configuration file (servicex.yaml) from the ServiceX webpage and installed servicex package\n", "
--> Ready to make a ServiceX request!\n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

First ServiceX request

\n", "\n", "\n", "Let's begin with the basic:
\n", "Deliver a branch (or column) from a dataset in the grid" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import servicex" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "spec = {\n", " \"Sample\":[{\n", " \"Name\": \"UprootRaw_PyHEP\",\n", " \"Dataset\": servicex.dataset.Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": servicex.query.UprootRaw({\"treename\": \"nominal\", \"filter_name\": \"el_pt\"})\n", " }]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", "- One sample named \"UprootRaw_PyHEP\" is defined in the spec object.\n", "- A Rucio dataset is specified\n", "- Defined a `Query`, sent to transformers and run on all files in the given Rucio dataset\n", "- `UprootRaw` query takes `\"treename\"` to set `TTree` in flat ROOT ntuples and `\"filter_name\"` to select branches in a given tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's deliver my ServiceX request" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1903f515405840e4b6c000ca3e681969", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "o = servicex.deliver(spec)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(o['UprootRaw_PyHEP'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Returns a dictionary" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sample.Name: dict_keys(['UprootRaw_PyHEP'])\n", "\n", "Fileset: \n", "\n", "First file: /Users/kc43627/Work/data/servicex_cache/c9a57bae-b2c3-4432-93cd-253763e42ead/root___192.170.240.145__root___fax.mwt2.org_1094__pnfs_uchicago.edu_atlaslocalgroupdisk_rucio_user_mgeyik_a0_3c_user.mgeyik.30183079._000006.out.root\n", "\n" ] } ], "source": [ "print(f\"Sample.Name: {o.keys()}\\n\")\n", "print(f\"Fileset: {type(o['UprootRaw_PyHEP'])}\\n\")\n", "print(f\"First file: {(o['UprootRaw_PyHEP'][0])}\\n\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[[3.86e+04, 3.6e+04],\n",
       " [3.44e+04, 1.91e+04],\n",
       " [],\n",
       " [5.98e+04, 5.76e+04],\n",
       " [6.84e+04, 2.4e+04],\n",
       " [3.5e+04],\n",
       " [],\n",
       " [],\n",
       " [1.34e+05, 4.36e+04],\n",
       " [],\n",
       " ...,\n",
       " [6.46e+04, 3.66e+04, 2.78e+04],\n",
       " [],\n",
       " [1.74e+04],\n",
       " [],\n",
       " [5.42e+04],\n",
       " [3.81e+04, 1.26e+04],\n",
       " [],\n",
       " [3.53e+04],\n",
       " [6.17e+04]]\n",
       "--------------------------------\n",
       "type: 11543 * var * float32
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import uproot\n", "\n", "with uproot.open(o['UprootRaw_PyHEP'][0]) as f:\n", " column = f['nominal']['el_pt']\n", "column.array()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Only few lines of a python script brings the data you want from the grid!\n", "\n", "

\n", "\n", "Let me go through what kinds of `Dataset` and `Query` are supported by ServiceX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Dataset

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "ServiceX supports Rucio, XRootD, and CERN OpenDataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "servicex.dataset.Rucio.__init__" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "servicex.dataset.FileList.__init__" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "servicex.dataset.CERNOpenData.__init__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Query

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
    \n", "
  • Query is a representation of what user wants from input dataset. e.g.
  • \n", "
      \n", "
    • UprootRaw({\"treename\": \"nominal\", \"filter_name\": \"el_pt\"})
    • \n", "
    \n", "
  • User provided query is translated into a code that runs on transformers
  • \n", "
  • Query is input data format dependent as a code for flat ROOT ntuple differs from the one for Apache parquet
  • \n", " \n", " \n", " \n", "
\n", "
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[EntryPoint(name='FuncADL_ATLASr21', value='servicex.func_adl.func_adl_dataset:FuncADLQuery_ATLASr21', group='servicex.query'),\n", " EntryPoint(name='FuncADL_ATLASr22', value='servicex.func_adl.func_adl_dataset:FuncADLQuery_ATLASr22', group='servicex.query'),\n", " EntryPoint(name='FuncADL_ATLASxAOD', value='servicex.func_adl.func_adl_dataset:FuncADLQuery_ATLASxAOD', group='servicex.query'),\n", " EntryPoint(name='FuncADL_CMS', value='servicex.func_adl.func_adl_dataset:FuncADLQuery_CMS', group='servicex.query'),\n", " EntryPoint(name='FuncADL_Uproot', value='servicex.func_adl.func_adl_dataset:FuncADLQuery_Uproot', group='servicex.query'),\n", " EntryPoint(name='PythonFunction', value='servicex.python_dataset:PythonQuery', group='servicex.query'),\n", " EntryPoint(name='UprootRaw', value='servicex.uproot_raw.uproot_raw:UprootRawQuery', group='servicex.query')]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "servicex.query.plugins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "Query classes for ROOT ntuples (via Uproot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "UprootRaw Query\n", "- This is a new query language, essentially calling `uproot.tree.arrays()` function\n", "- A UprootRaw query can be a dictionary or a list of dictionaries\n", "- There are two types of operations a user can put in a dictionary\n", " - query: contains a `treename` key\n", " - copy: contains a `copy_histograms` key" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "
\n",
    "        \n",
    "query = [\n",
    "         {\n",
    "          'treename': 'reco', \n",
    "          'filter_name': ['/mu.*/', 'runNumber', 'lbn', 'jet_pt_*'], \n",
    "          'cut':'(count_nonzero(jet_pt_NOSYS>40e3, axis=1)>=4)'\n",
    "         },\n",
    "         {\n",
    "          'copy_histograms': ['CutBookkeeper*', '/cflow.*/', 'metadata', 'listOfSystematics']\n",
    "         }\n",
    "        ]\n",
    "        \n",
    "    
\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "- More details on the grammar can be found [here](https://servicex-frontend.readthedocs.io/en/latest/transformer_matrix.html)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "query_UprootRaw = servicex.query.UprootRaw({\"treename\": \"nominal\", \"filter_name\": \"el_pt\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "\n", "FuncADL_Uproot Query\n", "- Functional Analysis Description Language is a powerful query language that has been supported by ServiceX\n", "- In addition to the basic operations like `Select()` for column selection or `Where()` for filtering, more sophisticated query can be built\n", "- One new addition `FromTree()` method to set a tree name in a query\n", "- More details can be found at the [talk](https://indico.cern.ch/event/1019958/timetable/#31-funcadl-functional-analysis) by M. Proffitt at PyHEP 2021" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "query_FuncADL = servicex.query.FuncADL_Uproot().FromTree('nominal').Select(lambda e: {'el_pt': e['el_pt']})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "\n", "PythonFunction Query\n", "- Python function can be passed as a query\n", "- `uproot`, `awkward`, `vector` can be imported (limited by the transformer image)\n", "- Primarily experimental purpose and likely to be discontinued" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def run_query(input_filenames=None):\n", " import uproot\n", " with uproot.open({input_filenames: \"nominal\"}) as o:\n", " br = o.arrays(\"el_pt\")\n", " return br\n", "\n", "query_PythonFunction = servicex.query.PythonFunction().with_uproot_function(run_query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "All three queries return the same output, ROOT files with selected branch el_pt_NOSYS!\n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Multiple samples

\n", "\n", "\n", "\n", "- HEP analysis often needs more than one sample" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "spec_multiple = {\n", " \"Sample\":[\n", " {\n", " \"Name\": \"UprootRaw_PyHEP\",\n", " \"Dataset\": servicex.dataset.Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": query_UprootRaw\n", " },\n", " {\n", " \"Name\": \"FuncADL_Uproot_PyHEP\",\n", " \"Dataset\": servicex.dataset.Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": query_FuncADL\n", " },\n", " {\n", " \"Name\": \"PythonFunction_PyHEP\",\n", " \"Dataset\": servicex.dataset.Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": query_PythonFunction\n", " }\n", " ]\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "- `Sample` block is a list of dictionaries, each with a `Dataset` - `Query` pair\n", "- Client library makes one ServiceX request per `Dataset` - `Query` pair\n", "- Again, it's preferred to have more files in a request to utilize K8s HPA than having multiple requests for the same query" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c5f7467d95024fe19a2cbb0595c6ebab", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "o_multiple = servicex.deliver(spec_multiple)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

YAML interface

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "- It's cool to deliver only interested columns from grid storages in a Jupyter notebook, but real analysis often becomes quite messy\n", "- The new client library brings `servicex-databinder` and significantly improve user interface to allow a seamless experience with YAML" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing config_UprootRaw.yaml\n" ] } ], "source": [ "%%writefile -a config_UprootRaw.yaml\n", "\n", "Sample:\n", " - Name: Uproot_UprootRaw_YAML\n", " Dataset: !Rucio user.kchoi.pyhep2024.test_dataset\n", " Query: !UprootRaw |\n", " {\"treename\":\"nominal\", \"filter_name\": \"el_pt\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Compare with the one in this notebook" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "from servicex.dataset import Rucio\n", "from servicex.query import UprootRaw\n", "from servicex import deliver\n", "\n", "spec = {\n", " \"Sample\":[{\n", " \"Name\": \"UprootRaw_PyHEP\",\n", " \"Dataset\": Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": UprootRaw({\"treename\": \"nominal\", \"filter_name\": \"el_pt\"})\n", " }]\n", "}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from servicex import deliver" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "95809b3b4f854cec937c5c7b18c21d7a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "o_yaml = deliver(\"config_UprootRaw.yaml\")\n", "# o_py = deliver(spec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "YAML syntax\n", "- The exclamation mark(!) to declare dataset type and query type (see detail on the [PyYAML constructor](https://matthewpburruss.com/post/yaml/))\n", " - Dataset tags: `!Rucio`, `!Rucio`, `!FileList`, `!CERNOpenData`\n", " - Query tags: `!UprootRaw`, `!FuncADL_Uproot`, `!PythonFunction`\n", "- The pipe (`|`) after query tag represents the literal operator and allows to properly interpret multi-line string\n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Optional configurations

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "Definition:\n", " - &DEF_ggH_input \"root://eospublic.cern.ch//eos/opendata/atlas/OutreachDatasets\\\n", " /2020-01-22/4lep/MC/mc_345060.ggH125_ZZ4lep.4lep.root\"\n", "\n", " - &DEF_query1 !PythonFunction |\n", " def run_query(input_filenames=None):\n", " import uproot\n", "\n", " with uproot.open({input_filenames:\"nominal\"}) as o:\n", " br = o.arrays(\"mu_pt\")\n", " return br\n", "\n", " - &DEF_query2 !FuncADL_Uproot |\n", " FromTree('mini').Select(lambda e: {'lep_pt': e['lep_pt']}).Where(lambda e: e['lep_pt'] > 1000)\n", "\n", "General:\n", " OutputFormat: parquet\n", " Delivery: SignedURLs\n", "\n", "Sample:\n", " - Name: ttH\n", " Dataset: !Rucio user.kchoi.fcnc_tHq_ML.ttH.v11\n", " Query: *DEF_query1\n", " NFiles: 5\n", " # IgnoreLocalCache: False\n", "\n", " - Name: ttZ\n", " Dataset: !Rucio user.kchoi.fcnc_tHq_ML.ttZ.v11 \n", " Query: *DEF_query1\n", " NFiles: 3\n", "\n", " - Name: ggH\n", " Dataset: !FileList *DEF_ggH_input\n", " Query: *DEF_query2\n", "```\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Failed transformation

" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "spec_typo = {\n", " \"Sample\":[{\n", " \"Name\": \"UprootRaw_PyHEP\",\n", " \"Dataset\": Rucio(\"user.kchoi.pyhep2024.test_dataset\"),\n", " \"Query\": UprootRaw({\"treename\": \"nominal\", \"filter_name\": \"el_pta\"})\n", " }]\n", "}" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "60b9c1ea32da4ec496ecb1785eb75074", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
[07/01/24 00:13:17] WARNING  Transform \"UprootRaw_PyHEP\" completed with failures: 3/3 files       query_core.py:215\n",
       "                             failed                                                                                \n",
       "
\n" ], "text/plain": [ "\u001b[2;36m[07/01/24 00:13:17]\u001b[0m\u001b[2;36m \u001b[0m\u001b[31mWARNING \u001b[0m Transform \u001b[32m\"UprootRaw_PyHEP\"\u001b[0m completed with failures: \u001b[1;36m3\u001b[0m/\u001b[1;36m3\u001b[0m files \u001b]8;id=567884;file:///opt/miniconda3/envs/pyhep/lib/python3.10/site-packages/servicex/query_core.py\u001b\\\u001b[2mquery_core.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=82617;file:///opt/miniconda3/envs/pyhep/lib/python3.10/site-packages/servicex/query_core.py#215\u001b\\\u001b[2m215\u001b[0m\u001b]8;;\u001b\\\n", "\u001b[2;36m \u001b[0m failed \u001b[2m \u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
                    WARNING  More information of 'UprootRaw_PyHEP' HERE                           query_core.py:226\n",
       "
\n" ], "text/plain": [ "\u001b[2;36m \u001b[0m\u001b[2;36m \u001b[0m\u001b[31mWARNING \u001b[0m More information of \u001b[32m'UprootRaw_PyHEP'\u001b[0m \u001b]8;id=479336;https://atlas-kibana.mwt2.org:5601/s/servicex/app/dashboards?auth_provider_hint=anonymous1#/view/6d069520-f34e-11ed-a6d8-9f6a16cd6d78?embed=true&_g=(time:(from:now-30d%2Fd,to:now))&_a=(filters:!((query:(match_phrase:(requestId:'ab736b6d-d3d9-439a-8734-ed3ba9276540'))),(query:(match_phrase:(level:'error')))))&show-time-filter=true&hide-filter-bar=true\u001b\\\u001b[1;31;47mHERE\u001b[0m\u001b]8;;\u001b\\ \u001b]8;id=178926;file:///opt/miniconda3/envs/pyhep/lib/python3.10/site-packages/servicex/query_core.py\u001b\\\u001b[2mquery_core.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=210571;file:///opt/miniconda3/envs/pyhep/lib/python3.10/site-packages/servicex/query_core.py#226\u001b\\\u001b[2m226\u001b[0m\u001b]8;;\u001b\\\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "o = deliver(spec_typo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Future plans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", "\n", "Client library\n", "- Improve robustness: progress bar (transform status/object store access) and local caching\n", "- Readthedoc of the new ServiceX cilent library is under construction! https://servicex-frontend.readthedocs.io/en/latest/index.html\n", "\n", "ServiceX\n", "- Improve stability and robustness of ServiceX especially what we learned during 200Gbps challenge (hundreds of ServiceX requests on hundreds of TB datasets)\n", "- Server-side caching\n", "- Other ServiceX transformers: ATLAS TopCPToolkit transformer, column-join transformer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }