{ "cells": [ { "cell_type": "markdown", "id": "343c7b20", "metadata": {}, "source": [ "## A JSON format compatible with the pandas data structure \n", "------- \n", "\n", "### NTV-pandas\n", "To have a simple, compact and reversible solution, the interface uses the [JSON-NTV format (Named and Typed Value)](https://github.com/loco-philippe/NTV#readme) - which integrates the notion of type - and its JSON-TAB variation for tabular data. \n", "This solution allows to include a large number of types (not necessarily Pandas dtype). \n", "JSON-TAB is also applicable for multidimensional data (e.g. Xarray).\n", "\n", "### Content\n", "This NoteBook uses examples to present some key points\n", "\n", "*(active link on jupyter Notebook or Nbviewer)*\n", "- [1 - Simple example](#1---Simple-example)\n", "- [2 - Series](#2---Series)\n", " - [Simple example](#Simple-example)\n", " - [Typed example](#Typed-example)\n", " - [Examples with a non-Pandas type](#Examples-with-a-non-Pandas-type)\n", " - [Categorical examples](#Categorical-examples)\n", "- [3 - DataFrame](#3---DataFrame)\n", " - [Initial example](#Initial-example)\n", " - [Complete example](#Complete-example)\n", " - [Json data can be annotated](#Json-data-can-be-annotated)\n", " - [Categorical data can be included](#Categorical-data-can-be-included)\n", " - [Multidimensional data](#Multidimensional-data)\n", "- [Appendix : Series tests](#Appendix-:-Series-tests) \n", " \n", "### References\n", "- [JSON-NTV specification](https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm)\n", "- [JSON-TAB specification](https://loco-philippe.github.io/ES/NTV%20tabular%20format%20(NTV-TAB).htm)\n", "- [JSON-NTV classes and methods](https://loco-philippe.github.io/NTV/json_ntv.html)\n", "\n", "This Notebook can also be viewed at [nbviewer](http://nbviewer.org/github/loco-philippe/ntv-pandas/tree/main/example)" ] }, { "cell_type": "code", "execution_count": 1, "id": "d00a0cb6", "metadata": {}, "outputs": [], "source": [ "import math\n", "import json\n", "from pprint import pprint\n", "\n", "import pandas as pd\n", "import ntv_pandas as npd\n", "from shapely.geometry import Point, Polygon\n", "from json_ntv import Ntv\n", "from datetime import date, datetime, time" ] }, { "cell_type": "markdown", "id": "6834381c", "metadata": {}, "source": [ "## 1 - Simple example" ] }, { "cell_type": "markdown", "id": "2ccd4147", "metadata": {}, "source": [ "- The example is a Dataframe with several dtype" ] }, { "cell_type": "code", "execution_count": 2, "id": "87e08572", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dates::datevalidvalue32::monthcoord::pointnamesunique::year
index
1001964-01-01True121POINT (1 2)john2021
2001985-02-05False122POINT (3 4)eric2021
3002022-01-21True221POINT (5 6)judith2021
4001964-01-01True221POINT (7 8)mila2021
5001985-02-05False322POINT (3 4)hector2021
6002022-01-21True321POINT (5 6)maria2021
\n", "
" ], "text/plain": [ " dates::date valid value32 ::month coord::point names unique::year\n", "index \n", "100 1964-01-01 True 12 1 POINT (1 2) john 2021\n", "200 1985-02-05 False 12 2 POINT (3 4) eric 2021\n", "300 2022-01-21 True 22 1 POINT (5 6) judith 2021\n", "400 1964-01-01 True 22 1 POINT (7 8) mila 2021\n", "500 1985-02-05 False 32 2 POINT (3 4) hector 2021\n", "600 2022-01-21 True 32 1 POINT (5 6) maria 2021" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tab_data = {'index': [100, 200, 300, 400, 500, 600],\n", " 'dates::date': pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), \n", " date(1985,2,5), date(2022,1,21)], dtype='category'), \n", " 'valid': [True, False, True, True, False, True],\n", " 'value32': pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),\n", " '::month': [1, 2, 1, 1, 2, 1],\n", " 'coord::point': pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),\n", " 'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),\n", " 'unique::year': 2021 }\n", "df = pd.DataFrame(tab_data).set_index('index')\n", "df" ] }, { "cell_type": "markdown", "id": "47d48540", "metadata": {}, "source": [ "- the example has a simple and compact JSON representation including dtype" ] }, { "cell_type": "code", "execution_count": 3, "id": "c37108e8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{':tab': {'::month': [1, 2, 1, 1, 2, 1],\n", " 'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n", " 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n", " 'index': [100, 200, 300, 400, 500, 600],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'unique:year': 2021,\n", " 'valid': [True, False, True, True, False, True],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n" ] } ], "source": [ "df_to_json = df.npd.to_json()\n", "pprint(df_to_json, width=120)" ] }, { "cell_type": "markdown", "id": "7deb4ffb", "metadata": {}, "source": [ "- The json conversion is reversible : df_from_json equals initial df" ] }, { "cell_type": "code", "execution_count": 4, "id": "0e9736c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "df created from JSON-NTV is equal to initial df ? True\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dates::datevalidvalue32::monthcoord::pointnamesunique::year
1001964-01-01True121POINT (1 2)john2021
2001985-02-05False122POINT (3 4)eric2021
3002022-01-21True221POINT (5 6)judith2021
4001964-01-01True221POINT (7 8)mila2021
5001985-02-05False322POINT (3 4)hector2021
6002022-01-21True321POINT (5 6)maria2021
\n", "
" ], "text/plain": [ " dates::date valid value32 ::month coord::point names unique::year\n", "100 1964-01-01 True 12 1 POINT (1 2) john 2021\n", "200 1985-02-05 False 12 2 POINT (3 4) eric 2021\n", "300 2022-01-21 True 22 1 POINT (5 6) judith 2021\n", "400 1964-01-01 True 22 1 POINT (7 8) mila 2021\n", "500 1985-02-05 False 32 2 POINT (3 4) hector 2021\n", "600 2022-01-21 True 32 1 POINT (5 6) maria 2021" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_from_json = npd.read_json(df_to_json)\n", "print('df created from JSON-NTV is equal to initial df ? ', df_from_json.equals(df))\n", "df_from_json" ] }, { "cell_type": "markdown", "id": "ca787e67", "metadata": {}, "source": [ "## 2 - Series" ] }, { "cell_type": "markdown", "id": "14204c2d", "metadata": {}, "source": [ "### Simple example" ] }, { "cell_type": "code", "execution_count": 5, "id": "62319579", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 1\n", "1 2\n", "2 3\n", "Name: value, dtype: int64\n", "\n", "Json representation : \n", " {':field': {'value': [1, 2, 3]}}\n", "\n", "is Json translation reversible ? True\n", "\n", "is pandas translation reversible ? True\n" ] } ], "source": [ "field_data = {'value': [1, 2, 3]}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n", "print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))" ] }, { "cell_type": "markdown", "id": "954c6e65", "metadata": {}, "source": [ "### Typed example" ] }, { "cell_type": "code", "execution_count": 6, "id": "a556868a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 1964-01-01\n", "1 1985-02-05\n", "2 2022-01-21\n", "Name: dates, dtype: datetime64[ns]\n", "\n", "Json representation : \n", " {':field': {'dates::datetime': ['1964-01-01T00:00:00.000', '1985-02-05T00:00:00.000', '2022-01-21T00:00:00.000']}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "field_data = {'dates::datetime': ['1964-01-01', '1985-02-05', '2022-01-21']}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))" ] }, { "cell_type": "markdown", "id": "bbef79a9", "metadata": {}, "source": [ "### Examples with a non-Pandas type" ] }, { "cell_type": "code", "execution_count": 7, "id": "765f0396", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 1964-01-01\n", "1 1985-02-05\n", "2 2022-01-21\n", "Name: dates::date, dtype: object\n", "\n", "Json representation : \n", " {':field': {'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21']}}\n", "\n", "is Json translation reversible ? True\n", "\n", "is pandas translation reversible ? True\n" ] } ], "source": [ "field_data = {'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21']}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n", "print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))" ] }, { "cell_type": "code", "execution_count": 8, "id": "327c67a9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 POINT (1 2)\n", "1 POINT (3 4)\n", "2 POINT (5 6)\n", "Name: coord::point, dtype: object\n", "\n", "Json representation : \n", " {':field': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "field_data = {'coord::point': [[1,2], [3,4], [5,6]]}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))" ] }, { "cell_type": "markdown", "id": "90a68b81", "metadata": {}, "source": [ "### Categorical examples\n", "- available only with hashable data" ] }, { "cell_type": "code", "execution_count": 9, "id": "6ba57f0a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 1\n", "1 2\n", "2 2\n", "3 1\n", "Name: integer, dtype: category\n", "Categories (2, Int64): [1, 2]\n", "\n", "Json representation : \n", " {':field': {'integer': [[1, 2], [0, 1, 1, 0]]}}\n", "\n", "is Json translation reversible ? True\n", "\n", "is pandas translation reversible ? True\n" ] } ], "source": [ "field_data = {\"integer\": [[1, 2], [0, 1, 1, 0]]}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n", "print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))" ] }, { "cell_type": "code", "execution_count": 10, "id": "7e45d94d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 1964-01-01\n", "1 1985-02-05\n", "2 1964-01-01\n", "3 2022-01-21\n", "Name: dates::date, dtype: category\n", "Categories (3, object): [1964-01-01, 1985-02-05, 2022-01-21]\n", "\n", "Json representation : \n", " {':field': {'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 0, 2]]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "field_data = {'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 0, 2]]}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))" ] }, { "cell_type": "code", "execution_count": 11, "id": "a4a756e7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas object :\n", "0 (1, 2)\n", "1 (3, 4)\n", "2 (1, 2)\n", "3 (5, 6)\n", "Name: test_array::array, dtype: category\n", "Categories (3, object): [(1, 2), (3, 4), (5, 6)]\n", "\n", "Json representation : \n", " {':field': {'test_array': [{'::array': [(1, 2), (3, 4), (5, 6)]}, [0, 1, 0, 2]]}}\n", "\n", "is Json translation reversible ? True\n", "\n", "is pandas translation reversible ? True\n" ] } ], "source": [ "field_data = {'test_array': [{'::array': [[1,2], [3,4], [5,6]]}, [0, 1, 0, 2]]}\n", "sr = npd.read_json({':field': field_data})\n", "# pandas dtype conform to Ntv type\n", "print('pandas object :\\n' + str(sr))\n", "print('\\nJson representation : \\n ', sr.npd.to_json())\n", "print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n", "print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))" ] }, { "cell_type": "markdown", "id": "38335df3", "metadata": {}, "source": [ "## 3 - DataFrame" ] }, { "cell_type": "markdown", "id": "fb4a7b88", "metadata": {}, "source": [ "### Initial example" ] }, { "cell_type": "code", "execution_count": 12, "id": "ac890bcb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas dtype :\n", "A object\n", "B object\n", "dtype: object\n", "\n", "pandas object :\n", " A B\n", "0 a b\n", "1 b c\n", "2 c c\n", "3 a d\n", "\n", "Json representation : \n", " {':tab': {'index': [0, 1, 2, 3], 'A': ['a', 'b', 'c', 'a'], 'B': ['b', 'c', 'c', 'd']}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "df = pd.DataFrame({\"A\": list(\"abca\"), \"B\": list(\"bccd\")})\n", "\n", "print('pandas dtype :\\n' + str(df.dtypes))\n", "print('\\npandas object :\\n' + str(df))\n", "print('\\nJson representation : \\n ', df.npd.to_json())\n", "print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))" ] }, { "cell_type": "markdown", "id": "c04430e6", "metadata": {}, "source": [ "### Complete example\n", "- index data\n", "- Pandas dtype (int32, bool, string)\n", "- NTV type (date, point) -> object dtype\n", "- data unique" ] }, { "cell_type": "code", "execution_count": 13, "id": "fe6c69cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas dtype :\n", "dates::date object\n", "value int64\n", "value32 int32\n", "res int64\n", "coord::point object\n", "names string[python]\n", "unique bool\n", "dtype: object\n", "\n", "pandas object :\n", " dates::date value value32 res coord::point names unique\n", "100 1964-01-01 10 12 10 POINT (1 2) john True\n", "200 1985-02-05 10 12 20 POINT (3 4) eric True\n", "300 2022-01-21 20 22 30 POINT (5 6) judith True\n", "400 1964-01-01 20 22 10 POINT (7 8) mila True\n", "500 1985-02-05 30 32 20 POINT (3 4) hector True\n", "600 2022-01-21 30 32 30 POINT (5 6) maria True\n", "\n", "Json representation :\n", "{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n", " 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],\n", " 'index': [100, 200, 300, 400, 500, 600],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'res': [10, 20, 30, 10, 20, 30],\n", " 'unique': True,\n", " 'value': [10, 10, 20, 20, 30, 30],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "tab_data = {'index': [100, 200, 300, 400, 500, 600],\n", " 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], \n", " 'value': [10, 10, 20, 20, 30, 30],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32],\n", " 'res': [10, 20, 30, 10, 20, 30],\n", " 'coord::point': [[1,2], [3,4], [5,6], [7,8], [3,4], [5,6]],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'unique': True }\n", "df = npd.read_json({':tab': tab_data})\n", "print('pandas dtype :\\n' + str(df.dtypes))\n", "print('\\npandas object :\\n' + str(df))\n", "print('\\nJson representation :')\n", "pprint(df.npd.to_json(), width=140)\n", "print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))" ] }, { "cell_type": "markdown", "id": "513dd337", "metadata": {}, "source": [ "### Json data can be annotated" ] }, { "cell_type": "code", "execution_count": 14, "id": "cdda0ce3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "is DataFrame identical ? True\n" ] } ], "source": [ "tab_data = {'index': [100, 200, 300, 400, 500, 600],\n", " 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], \n", " 'value': [10, 10, 20, 20, {'valid?': 30}, 30],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32],\n", " 'res': {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},\n", " 'coord::point': [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'unique': True }\n", "\n", "df2 = npd.read_json({':tab': tab_data}, annotated=True)\n", "print('is DataFrame identical ? ', df.equals(df2))" ] }, { "cell_type": "markdown", "id": "82017e47", "metadata": {}, "source": [ "### Categorical data can be included" ] }, { "cell_type": "code", "execution_count": 15, "id": "22b7a281", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas dtype :\n", "A category\n", "B category\n", "dtype: object\n", "\n", "pandas object :\n", " A B\n", "0 a b\n", "1 b c\n", "2 c c\n", "3 a d\n", "\n", "Json representation : \n", " {':tab': {'index': [0, 1, 2, 3], 'A': [['a', 'b', 'c'], [0, 1, 2, 0]], 'B': [['b', 'c', 'd'], [0, 1, 1, 2]]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "df = pd.DataFrame({\"A\": list(\"abca\"), \"B\": list(\"bccd\")}, dtype=\"category\")\n", "\n", "print('pandas dtype :\\n' + str(df.dtypes))\n", "print('\\npandas object :\\n' + str(df))\n", "print('\\nJson representation : \\n ', df.npd.to_json())\n", "print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))" ] }, { "cell_type": "code", "execution_count": 16, "id": "d7eb51ea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas dtype :\n", "dates::date category\n", "value category\n", "value32 int32\n", "res int64\n", "coord::point object\n", "names string[python]\n", "unique::boolean bool\n", "dtype: object\n", "\n", "pandas object :\n", " dates::date value value32 res coord::point names unique::boolean\n", "100 1964-01-01 10 12 10 POINT (1 2) john True\n", "200 1985-02-05 10 12 20 POINT (3 4) eric True\n", "300 2022-01-21 20 22 30 POINT (5 6) judith True\n", "400 1964-01-01 20 22 10 POINT (7 8) mila True\n", "500 1985-02-05 30 32 20 POINT (3 4) hector True\n", "600 2022-01-21 30 32 30 POINT (5 6) maria True\n", "\n", "Json representation :\n", "{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n", " 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n", " 'index': [100, 200, 300, 400, 500, 600],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'res': [10, 20, 30, 10, 20, 30],\n", " 'unique:boolean': True,\n", " 'value': [[10, 20, 30], [2]],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "tab_data = {'index': [100, 200, 300, 400, 500, 600],\n", " 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 2, 0, 1, 2]],\n", " 'value': [[10, 20, {'valid?': 30}], [0, 0, 1, 1, 2, 2]],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32],\n", " 'res': {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},\n", " 'coord::point': [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'unique:boolean': True }\n", "\n", "df = npd.read_json({':tab': tab_data}, annotated=True)\n", "print('pandas dtype :\\n' + str(df.dtypes))\n", "print('\\npandas object :\\n' + str(df))\n", "print('\\nJson representation :')\n", "pprint(df.npd.to_json(), width=140)\n", "print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))" ] }, { "cell_type": "code", "execution_count": 17, "id": "a3565bed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pandas dtype :\n", "dates::date category\n", "value category\n", "value32 int32\n", "coord::point object\n", "names string[python]\n", "unique bool\n", "dtype: object\n", "\n", "pandas object :\n", " dates::date value value32 coord::point names unique\n", "100 1964-01-01 10 12 POINT (1 2) john True\n", "200 1985-02-05 10 12 POINT (3 4) eric True\n", "300 2022-01-21 20 22 POINT (5 6) judith True\n", "400 1964-01-01 20 22 POINT (7 8) mila True\n", "500 1985-02-05 30 32 POINT (3 4) hector True\n", "600 2022-01-21 30 32 POINT (5 6) maria True\n", "\n", "Json representation :\n", "{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n", " 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n", " 'index': [100, 200, 300, 400, 500, 600],\n", " 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n", " 'unique': True,\n", " 'value': [[10, 20, 30], [2]],\n", " 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n", "\n", "is Json translation reversible ? True\n" ] } ], "source": [ "index = pd.Series([100, 200, 300, 400, 500, 600])\n", "dates = pd.Series(name='dates::date', data=[date(1964, 1, 1), date(1985, 2, 5), date(2022, 1, 21), date(1964, 1, 1),\n", " date(1985, 2, 5), date(2022, 1, 21)], dtype='object').astype('category')\n", "value = pd.Series(name='value', data=[10,10,20,20,30,30], dtype='Int64').astype('category') #alias mandatory \n", "value32 = pd.Series(name='value32', data=[12, 12, 22, 22, 32, 32], dtype='int32')\n", "coord = pd.Series(name='coord::point', data=[Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)])\n", "names = pd.Series(name='names', data=['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string')\n", "unique = pd.Series(name='unique', data=[True, True, True, True, True, True])\n", "\n", "df = pd.DataFrame({ser.name: ser for ser in [index, dates, value, value32, coord, names, unique]}).set_index(None)\n", "\n", "print('pandas dtype :\\n' + str(df.dtypes))\n", "print('\\npandas object :\\n' + str(df))\n", "print('\\nJson representation :')\n", "pprint(df.npd.to_json(), width=140)\n", "print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))" ] }, { "cell_type": "markdown", "id": "f23d9221", "metadata": {}, "source": [ "### Multidimensional data\n", "- JSON-TAB format is applicable for multidimensional data\n", "- JSON multi-dimensional data can be translated into a Pandas Dataframe or a DataArray Xarray" ] }, { "cell_type": "code", "execution_count": 18, "id": "270354b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quantityproductplantsprice
21 kgapplefruit1.0
01 kgbananafruit0.5
11 kgorangefruit2.0
31 kgpeppersvegetable1.5
610 kgapplefruit10.0
410 kgbananafruit5.0
510 kgorangefruit20.0
710 kgpeppersvegetable15.0
\n", "
" ], "text/plain": [ " quantity product plants price\n", "2 1 kg apple fruit 1.0\n", "0 1 kg banana fruit 0.5\n", "1 1 kg orange fruit 2.0\n", "3 1 kg peppers vegetable 1.5\n", "6 10 kg apple fruit 10.0\n", "4 10 kg banana fruit 5.0\n", "5 10 kg orange fruit 20.0\n", "7 10 kg peppers vegetable 15.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\"quantity\": [\"1 kg\", \"1 kg\", \"1 kg\", \"1 kg\", \"10 kg\", \"10 kg\", \"10 kg\", \"10 kg\"],\n", " \"product\": [\"banana\", \"orange\", \"apple\", \"peppers\", \"banana\", \"orange\", \"apple\", \"peppers\"], \n", " \"plants\": [\"fruit\", \"fruit\", \"fruit\", \"vegetable\", \"fruit\", \"fruit\", \"fruit\", \"vegetable\"], \n", " \"price\": [0.5, 2, 1, 1.5, 5, 20, 10, 15]}\n", "\n", "df = pd.DataFrame(data)\n", "df2 = pd.DataFrame(data, dtype='category').sort_values(by=['quantity', 'product'])\n", "df2" ] }, { "cell_type": "code", "execution_count": 19, "id": "40d718be", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "json_df is the JSON-TAB format with \"full\" mode\n", "\n", "{'index': [0, 1, 2, 3, 4, 5, 6, 7],\n", " 'plants': ['fruit', 'fruit', 'fruit', 'vegetable', 'fruit', 'fruit', 'fruit', 'vegetable'],\n", " 'price': [0.5, 2.0, 1.0, 1.5, 5.0, 20.0, 10.0, 15.0],\n", " 'product': ['banana', 'orange', 'apple', 'peppers', 'banana', 'orange', 'apple', 'peppers'],\n", " 'quantity': ['1 kg', '1 kg', '1 kg', '1 kg', '10 kg', '10 kg', '10 kg', '10 kg']}\n", "\n", "json_xa is the JSON-TAB format with \"optimize\" mode\n", "\n", "{'index': [2, 0, 1, 3, 6, 4, 5, 7],\n", " 'plants': [['fruit', 'vegetable'], [0, 0, 0, 1, 0, 0, 0, 1]],\n", " 'price': [[0.5, 1.0, 1.5, 2.0, 5.0, 10.0, 15.0, 20.0], [1, 0, 3, 2, 5, 4, 7, 6]],\n", " 'product': [['apple', 'banana', 'orange', 'peppers'], [1]],\n", " 'quantity': [['1 kg', '10 kg'], [4]]}\n", "\n", "DataFrame from the two JSON-TAB format are identical ? True\n", "\n", "The \"optimize\" JSON-TAB format is the image of the DataArray Xarray\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'price' (quantity: 2, product: 4)> Size: 64B\n",
       "array([[0.5, 2.0, 1.0, 1.5],\n",
       "       [5.0, 20.0, 10.0, 15.0]], dtype=object)\n",
       "Coordinates:\n",
       "  * quantity  (quantity) object 16B '1 kg' '10 kg'\n",
       "  * product   (product) object 32B 'banana' 'orange' 'apple' 'peppers'\n",
       "    plants    (product) object 32B 'fruit' 'fruit' 'fruit' 'vegetable'
" ], "text/plain": [ " Size: 64B\n", "array([[0.5, 2.0, 1.0, 1.5],\n", " [5.0, 20.0, 10.0, 15.0]], dtype=object)\n", "Coordinates:\n", " * quantity (quantity) object 16B '1 kg' '10 kg'\n", " * product (product) object 32B 'banana' 'orange' 'apple' 'peppers'\n", " plants (product) object 32B 'fruit' 'fruit' 'fruit' 'vegetable'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "json_df = Ntv.obj(df).to_obj()[':tab']\n", "print('json_df is the JSON-TAB format with \"full\" mode\\n')\n", "pprint(json_df, width=200)\n", "\n", "json_xar = Ntv.obj(df2).to_obj()[':tab']\n", "print('\\njson_xa is the JSON-TAB format with \"optimize\" mode\\n')\n", "pprint(json_xar, width=200)\n", "\n", "df_from_xar = Ntv.obj({':tab': json_xar}).to_obj(format='obj').sort_index()\n", "print('\\nDataFrame from the two JSON-TAB format are identical ? ', df.astype('object').equals(df_from_xar.astype('object')))\n", "\n", "print('\\nThe \"optimize\" JSON-TAB format is the image of the DataArray Xarray')\n", "from tab_dataset import Sdataset\n", "Sdataset.ntv(json_df).setcanonorder().to_xarray(varname='price')" ] }, { "cell_type": "markdown", "id": "57322f1d", "metadata": {}, "source": [ "## Appendix : Series tests" ] }, { "cell_type": "code", "execution_count": 20, "id": "5f07553d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True {':field': [{'a': 2, 'e': 4}, {'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}\n", "True {':field': [[1, 2], [3, 4], [5, 6]]}\n", "True {':field': [[1, 2], [3, 4], {'a': 3, 'e': 5}]}\n", "True {':field': [True, False, True]}\n", "True {':field': ['az', 'er', 'cd']}\n", "True {':field': ['az', 'az', 'az']}\n", "True {':field': [1, 2, 3]}\n", "True {':field': [1.1, 2.0, 3.0]}\n", "True {':field': [10, 20, 30]}\n", "True {':field': [True, False, True]}\n", "True {':field': [1.1, 2.0, 3.0]}\n", "True {':field': {'::datetime': [None, None, None]}}\n", "True {':field': {'::datetime': ['2022-01-01T00:00:00.000', '2022-01-02T00:00:00.000']}}\n", "True {':field': {'::duration': ['P1DT0H0M0S', 'P2DT0H0M0S']}}\n", "True {':field': {'::string': ['az', 'er', 'cd']}}\n", "True {':field': {'::int32': [1, 2, 3]}}\n", "True {':field': {'::uint64': [1, 2, 3]}}\n", "True {':field': {'::float32': [1.0, 2.0, 3.0]}}\n", "True {':field': {'::int64': [1, 2, 3]}}\n", "True {':field': {'::float64': [1.0, 2.0, 3.0]}}\n", "True {':field': {'::array': [[1, 2], [3, 4], [5, 6]]}}\n", "True {':field': {'::object': [{'a': 2, 'e': 4}, {'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}}\n", "True {':field': {'::null': [None, None, None]}}\n", "True {':field': {'::uri': ['geo:13.412 ,103.866', 'mailto:John.Doe@example.com']}}\n", "True {':field': {'::file': ['///path/to/file', '//host.example.com/path/to/file']}}\n", "True {':field': {'::date': ['2022-01-01', '2022-01-02']}}\n", "True {':field': {'::time': ['10:21:01', '08:01:02']}}\n", "True {':field': {'::day': [1, 2, 3]}}\n", "True {':field': {'::year': [2001, 2002, 2003]}}\n", "True {':field': {'::minute': [21, 10, 55]}}\n", "True {':field': {'::point': [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]}}\n" ] } ], "source": [ "# json interface ok\n", "srs = [\n", " # without ntv_type, without dtype\n", " pd.Series([{'a': 2, 'e':4}, {'a': 3, 'e':5}, {'a': 4, 'e':6}]), \n", " pd.Series([[1,2], [3,4], [5,6]]), \n", " pd.Series([[1,2], [3,4], {'a': 3, 'e':5}]), \n", " pd.Series([True, False, True]),\n", " pd.Series(['az', 'er', 'cd']),\n", " pd.Series(['az', 'az', 'az']),\n", " pd.Series([1,2,3]),\n", " pd.Series([1.1,2,3]),\n", " \n", " # without ntv_type, with dtype\n", " pd.Series([10,20,30], dtype='Int64'),\n", " pd.Series([True, False, True], dtype='boolean'),\n", " pd.Series([1.1, 2, 3], dtype='float64'), \n", "\n", " # with ntv_type only in json data (not numbers)\n", " pd.Series([pd.NaT, pd.NaT, pd.NaT]),\n", " pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2)], dtype='datetime64[ns]'),\n", " pd.Series(pd.to_timedelta(['1D', '2D'])),\n", " pd.Series(['az', 'er', 'cd'], dtype='string'), \n", "\n", " # with ntv_type only in json data (numbers)\n", " pd.Series([1,2,3], dtype='Int32'), \n", " pd.Series([1,2,3], dtype='UInt64'),\n", " pd.Series([1,2,3], dtype='float32'),\n", "\n", " # with ntv_type in Series name and in json data (numbers)\n", " pd.Series([1,2,3], name='::int64'),\n", " pd.Series([1,2,3], dtype='Float64', name='::float64'), # force dtype dans la conversion json\n", "\n", " # with ntv_type in Series name and in json data (not numbers)\n", " pd.Series([[1,2], [3,4], [5,6]], name='::array'), \n", " pd.Series([{'a': 2, 'e':4}, {'a': 3, 'e':5}, {'a': 4, 'e':6}], name='::object'), \n", " pd.Series([None, None, None], name='::null'), \n", " pd.Series([\"geo:13.412 ,103.866\", \"mailto:John.Doe@example.com\"], name='::uri', dtype='string'),\n", " pd.Series([\"///path/to/file\", \"//host.example.com/path/to/file\"], name='::file', dtype='string'),\n", "\n", " # with ntv_type converted in object dtype (not in datetime)\n", " pd.Series([date(2022, 1, 1), date(2022, 1, 2)], name='::date'),\n", " pd.Series([time(10, 21, 1), time(8, 1, 2)], name='::time'),\n", "\n", " # with ntv_type unknown in pandas and with pandas conversion \n", " pd.Series([1,2,3], dtype='int64', name='::day'),\n", " pd.Series([2001,2002,2003], dtype='int64', name='::year'),\n", " pd.Series([21,10,55], name='::minute'),\n", "\n", " # with ntv_type unknown in pandas and NTV conversion\n", " pd.Series([Point(1, 0), Point(1, 1), Point(1, 2)], name='::point'),\n", "]\n", "for sr in srs:\n", " print(sr.npd.equals(npd.read_json(sr.npd.to_json())), sr.npd.to_json()) " ] }, { "cell_type": "code", "execution_count": 21, "id": "090ad57b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True {':field': {'test::int32': [1, 2, 3]}}\n", "True {':field': {'test': [1, 2, 3]}}\n", "True {':field': [1.0, 2.1, 3.0]}\n", "True {':field': ['er', 'et', 'ez']}\n", "True {':field': [True, False, True]}\n", "True {':field': {'::boolean': [True, False, True]}}\n", "True {':field': {'::string': ['er', 'et', 'ez']}}\n", "True {':field': {'test::float32': [1.0, 2.5, 3.0]}}\n", "True {':field': {'::int64': [1, 2, 3]}}\n", "True {':field': {'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}}\n", "True {':field': {'::date': ['2021-12-31', '2022-01-01']}}\n", "True {':field': {'::time': ['23:00:00', '23:01:00']}}\n", "True {':field': {'::object': [{'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}}\n", "True {':field': {'::array': [[1, 2], [3, 4], [5, 6]]}}\n", "True {':field': True}\n", "True {':field': {':boolean': True}}\n" ] } ], "source": [ "# json interface ok\n", "for a in [{'test::int32': [1,2,3]},\n", " {'test': [1,2,3]},\n", " [1.0, 2.1, 3.0],\n", " ['er', 'et', 'ez'],\n", " [True, False, True],\n", " {'::boolean': [True, False, True]},\n", " {'::string': ['er', 'et', 'ez']},\n", " {'test::float32': [1.0, 2.5, 3.0]},\n", " {'::int64': [1,2,3]},\n", " {'::datetime': [\"2021-12-31T23:00:00.000\",\"2022-01-01T23:00:00.000\"] },\n", " {'::date': [\"2021-12-31\", \"2022-01-01\"] },\n", " {'::time': [\"23:00:00\", \"23:01:00\"] },\n", " {'::object': [{'a': 3, 'e':5}, {'a': 4, 'e':6}]},\n", " {'::array': [[1,2], [3,4], [5,6]]},\n", " True,\n", " {':boolean': True}\n", " ]:\n", " field = {':field': a}\n", " print(npd.read_json(field).npd.to_json() == field, field)" ] }, { "cell_type": "code", "execution_count": 22, "id": "a878011a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True {':field': {'test': [{'::int32': [1, 2, 3]}, [0, 1, 2, 0, 1]]}}\n", "True {':field': {'test': [[1, 2, 3], [0, 1, 2, 0, 1]]}}\n", "True {':field': [[1.0, 2.1, 3.0], [0, 1, 2, 0, 1]]}\n", "True {':field': [['er', 'et', 'ez'], [0, 1, 2, 0, 1]]}\n", "True {':field': [[True, False], [0, 1, 0, 1, 0]]}\n", "True {':field': [{'::string': ['er', 'et', 'ez']}, [0, 1, 2, 0, 1]]}\n", "True {':field': {'test': [{'::float32': [1.0, 2.5, 3.0]}, [0, 1, 2, 0, 1]]}}\n", "True {':field': [{'::int64': [1, 2, 3]}, [0, 1, 2, 0, 1]]}\n", "True {':field': [{'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}, [0, 1, 0, 1, 0]]}\n", "True {':field': [{'::date': ['2021-12-31', '2022-01-01']}, [0, 1, 0, 1, 0]]}\n", "True {':field': [{'::time': ['23:00:00', '23:01:00']}, [0, 1, 0, 1, 0]]}\n", "True {':field': {'test_date': [{'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}, [0, 1, 0, 1, 0]]}}\n", "True {':field': [{'::boolean': [True, False]}, [0, 1, 0, 1, 0]]}\n", "True {':field': [[True], [2]]}\n", "True {':field': {'quantity': [['1 kg', '10 kg'], [4]]}}\n" ] } ], "source": [ "# json interface ok (categorical data)\n", "for a in [{'test': [{'::int32': [1, 2, 3]}, [0,1,2,0,1]]},\n", " {'test': [[1, 2, 3], [0,1,2,0,1]]},\n", " [[1.0, 2.1, 3.0], [0,1,2,0,1]],\n", " [['er', 'et', 'ez'], [0,1,2,0,1]],\n", " [[True, False], [0,1,0,1,0]],\n", " [{'::string': ['er', 'et', 'ez']}, [0,1,2,0,1]],\n", " {'test':[{'::float32': [1.0, 2.5, 3.0]}, [0,1,2,0,1]]},\n", " [{'::int64': [1, 2, 3]}, [0,1,2,0,1]],\n", " [{'::datetime': [\"2021-12-31T23:00:00.000\", \"2022-01-01T23:00:00.000\"] }, [0,1,0,1,0]],\n", " [{'::date': [\"2021-12-31\", \"2022-01-01\"] }, [0,1,0,1,0]],\n", " [{'::time': [\"23:00:00\", \"23:01:00\"] }, [0,1,0,1,0]],\n", " {'test_date': [{'::datetime': [\"2021-12-31T23:00:00.000\", \"2022-01-01T23:00:00.000\"] }, [0,1,0,1,0]]},\n", " [{'::boolean': [True, False]}, [0,1,0,1,0]],\n", " [[True], [2]], # periodic Series\n", " {'quantity': [['1 kg', '10 kg'], [4]]}]: # periodic Series\n", " field = {':field': a}\n", " print(npd.read_json(field).npd.to_json() == field, field)" ] }, { "cell_type": "code", "execution_count": 23, "id": "fc5c1c1f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False True {':field': [None, None]}\n", "False True {':field': [None, None]}\n", "False False {':field': {'::uint64': [1, 2, 3]}}\n", "False True {':field': ['2022-01-01T00:00:00.000', '2022-01-02T00:00:00.000', '2022-01-03T00:00:00.000']}\n" ] } ], "source": [ "# json interface ko\n", "srs = [# without ntv_type\n", " pd.Series([math.nan, math.nan]), # bug pandas conversion json : datetime NaT\n", " \n", " # without ntv_type, with dtype\n", " pd.Series([math.nan, math.nan], dtype='float64'), # bug pandas conversion json : datetime NaT\n", " \n", " # with ntv_type in Series name and in json data\n", " pd.Series([1,2,3], dtype='UInt64', name='::uint64'), # name inutile\n", " \n", " # with ntv_type unknown in pandas\n", " pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2), datetime(2022, 1, 3)], dtype='datetime64[ns, UTC]'), #à traiter\n", "]\n", "for sr in srs:\n", " print(sr.npd.equals(npd.read_json(sr.npd.to_json())), \n", " npd.read_json(sr.npd.to_json()).name == sr.name, \n", " sr.npd.to_json(text=True)) " ] }, { "cell_type": "code", "execution_count": 24, "id": "0319c881", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False {':field': {'test_array': [{'::array': [[1, 2], [3, 4], [5, 6], [7, 8]]}, [0, 1, 0, 2, 3]]}}\n" ] } ], "source": [ "# json interface ko (categorical data)\n", "for a in [{'test_array': [{'::array': [[1,2], [3,4], [5,6], [7,8]]}, [0, 1, 0, 2, 3]]}]: # list -> tuple to be hashable\n", " field = {':field': a}\n", " print(npd.read_json(field).npd.to_json() == field, field)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 5 }