{
"cells": [
{
"cell_type": "markdown",
"id": "343c7b20",
"metadata": {},
"source": [
"## A JSON format compatible with the pandas data structure \n",
"------- \n",
"\n",
"### NTV-pandas\n",
"To have a simple, compact and reversible solution, the interface uses the [JSON-NTV format (Named and Typed Value)](https://github.com/loco-philippe/NTV#readme) - which integrates the notion of type - and its JSON-TAB variation for tabular data. \n",
"This solution allows to include a large number of types (not necessarily Pandas dtype). \n",
"JSON-TAB is also applicable for multidimensional data (e.g. Xarray).\n",
"\n",
"### Content\n",
"This NoteBook uses examples to present some key points\n",
"\n",
"*(active link on jupyter Notebook or Nbviewer)*\n",
"- [1 - Simple example](#1---Simple-example)\n",
"- [2 - Series](#2---Series)\n",
" - [Simple example](#Simple-example)\n",
" - [Typed example](#Typed-example)\n",
" - [Examples with a non-Pandas type](#Examples-with-a-non-Pandas-type)\n",
" - [Categorical examples](#Categorical-examples)\n",
"- [3 - DataFrame](#3---DataFrame)\n",
" - [Initial example](#Initial-example)\n",
" - [Complete example](#Complete-example)\n",
" - [Json data can be annotated](#Json-data-can-be-annotated)\n",
" - [Categorical data can be included](#Categorical-data-can-be-included)\n",
" - [Multidimensional data](#Multidimensional-data)\n",
"- [Appendix : Series tests](#Appendix-:-Series-tests) \n",
" \n",
"### References\n",
"- [JSON-NTV specification](https://loco-philippe.github.io/ES/JSON%20semantic%20format%20(JSON-NTV).htm)\n",
"- [JSON-TAB specification](https://loco-philippe.github.io/ES/NTV%20tabular%20format%20(NTV-TAB).htm)\n",
"- [JSON-NTV classes and methods](https://loco-philippe.github.io/NTV/json_ntv.html)\n",
"\n",
"This Notebook can also be viewed at [nbviewer](http://nbviewer.org/github/loco-philippe/ntv-pandas/tree/main/example)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d00a0cb6",
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import json\n",
"from pprint import pprint\n",
"\n",
"import pandas as pd\n",
"import ntv_pandas as npd\n",
"from shapely.geometry import Point, Polygon\n",
"from json_ntv import Ntv\n",
"from datetime import date, datetime, time"
]
},
{
"cell_type": "markdown",
"id": "6834381c",
"metadata": {},
"source": [
"## 1 - Simple example"
]
},
{
"cell_type": "markdown",
"id": "2ccd4147",
"metadata": {},
"source": [
"- The example is a Dataframe with several dtype"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "87e08572",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dates::date | \n",
" valid | \n",
" value32 | \n",
" ::month | \n",
" coord::point | \n",
" names | \n",
" unique::year | \n",
"
\n",
" \n",
" index | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 100 | \n",
" 1964-01-01 | \n",
" True | \n",
" 12 | \n",
" 1 | \n",
" POINT (1 2) | \n",
" john | \n",
" 2021 | \n",
"
\n",
" \n",
" 200 | \n",
" 1985-02-05 | \n",
" False | \n",
" 12 | \n",
" 2 | \n",
" POINT (3 4) | \n",
" eric | \n",
" 2021 | \n",
"
\n",
" \n",
" 300 | \n",
" 2022-01-21 | \n",
" True | \n",
" 22 | \n",
" 1 | \n",
" POINT (5 6) | \n",
" judith | \n",
" 2021 | \n",
"
\n",
" \n",
" 400 | \n",
" 1964-01-01 | \n",
" True | \n",
" 22 | \n",
" 1 | \n",
" POINT (7 8) | \n",
" mila | \n",
" 2021 | \n",
"
\n",
" \n",
" 500 | \n",
" 1985-02-05 | \n",
" False | \n",
" 32 | \n",
" 2 | \n",
" POINT (3 4) | \n",
" hector | \n",
" 2021 | \n",
"
\n",
" \n",
" 600 | \n",
" 2022-01-21 | \n",
" True | \n",
" 32 | \n",
" 1 | \n",
" POINT (5 6) | \n",
" maria | \n",
" 2021 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" dates::date valid value32 ::month coord::point names unique::year\n",
"index \n",
"100 1964-01-01 True 12 1 POINT (1 2) john 2021\n",
"200 1985-02-05 False 12 2 POINT (3 4) eric 2021\n",
"300 2022-01-21 True 22 1 POINT (5 6) judith 2021\n",
"400 1964-01-01 True 22 1 POINT (7 8) mila 2021\n",
"500 1985-02-05 False 32 2 POINT (3 4) hector 2021\n",
"600 2022-01-21 True 32 1 POINT (5 6) maria 2021"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tab_data = {'index': [100, 200, 300, 400, 500, 600],\n",
" 'dates::date': pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), \n",
" date(1985,2,5), date(2022,1,21)], dtype='category'), \n",
" 'valid': [True, False, True, True, False, True],\n",
" 'value32': pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),\n",
" '::month': [1, 2, 1, 1, 2, 1],\n",
" 'coord::point': pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),\n",
" 'names': pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),\n",
" 'unique::year': 2021 }\n",
"df = pd.DataFrame(tab_data).set_index('index')\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "47d48540",
"metadata": {},
"source": [
"- the example has a simple and compact JSON representation including dtype"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c37108e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{':tab': {'::month': [1, 2, 1, 1, 2, 1],\n",
" 'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n",
" 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n",
" 'index': [100, 200, 300, 400, 500, 600],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'unique:year': 2021,\n",
" 'valid': [True, False, True, True, False, True],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n"
]
}
],
"source": [
"df_to_json = df.npd.to_json()\n",
"pprint(df_to_json, width=120)"
]
},
{
"cell_type": "markdown",
"id": "7deb4ffb",
"metadata": {},
"source": [
"- The json conversion is reversible : df_from_json equals initial df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0e9736c8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df created from JSON-NTV is equal to initial df ? True\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" dates::date | \n",
" valid | \n",
" value32 | \n",
" ::month | \n",
" coord::point | \n",
" names | \n",
" unique::year | \n",
"
\n",
" \n",
" \n",
" \n",
" 100 | \n",
" 1964-01-01 | \n",
" True | \n",
" 12 | \n",
" 1 | \n",
" POINT (1 2) | \n",
" john | \n",
" 2021 | \n",
"
\n",
" \n",
" 200 | \n",
" 1985-02-05 | \n",
" False | \n",
" 12 | \n",
" 2 | \n",
" POINT (3 4) | \n",
" eric | \n",
" 2021 | \n",
"
\n",
" \n",
" 300 | \n",
" 2022-01-21 | \n",
" True | \n",
" 22 | \n",
" 1 | \n",
" POINT (5 6) | \n",
" judith | \n",
" 2021 | \n",
"
\n",
" \n",
" 400 | \n",
" 1964-01-01 | \n",
" True | \n",
" 22 | \n",
" 1 | \n",
" POINT (7 8) | \n",
" mila | \n",
" 2021 | \n",
"
\n",
" \n",
" 500 | \n",
" 1985-02-05 | \n",
" False | \n",
" 32 | \n",
" 2 | \n",
" POINT (3 4) | \n",
" hector | \n",
" 2021 | \n",
"
\n",
" \n",
" 600 | \n",
" 2022-01-21 | \n",
" True | \n",
" 32 | \n",
" 1 | \n",
" POINT (5 6) | \n",
" maria | \n",
" 2021 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" dates::date valid value32 ::month coord::point names unique::year\n",
"100 1964-01-01 True 12 1 POINT (1 2) john 2021\n",
"200 1985-02-05 False 12 2 POINT (3 4) eric 2021\n",
"300 2022-01-21 True 22 1 POINT (5 6) judith 2021\n",
"400 1964-01-01 True 22 1 POINT (7 8) mila 2021\n",
"500 1985-02-05 False 32 2 POINT (3 4) hector 2021\n",
"600 2022-01-21 True 32 1 POINT (5 6) maria 2021"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_from_json = npd.read_json(df_to_json)\n",
"print('df created from JSON-NTV is equal to initial df ? ', df_from_json.equals(df))\n",
"df_from_json"
]
},
{
"cell_type": "markdown",
"id": "ca787e67",
"metadata": {},
"source": [
"## 2 - Series"
]
},
{
"cell_type": "markdown",
"id": "14204c2d",
"metadata": {},
"source": [
"### Simple example"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "62319579",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 1\n",
"1 2\n",
"2 3\n",
"Name: value, dtype: int64\n",
"\n",
"Json representation : \n",
" {':field': {'value': [1, 2, 3]}}\n",
"\n",
"is Json translation reversible ? True\n",
"\n",
"is pandas translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'value': [1, 2, 3]}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n",
"print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))"
]
},
{
"cell_type": "markdown",
"id": "954c6e65",
"metadata": {},
"source": [
"### Typed example"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a556868a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 1964-01-01\n",
"1 1985-02-05\n",
"2 2022-01-21\n",
"Name: dates, dtype: datetime64[ns]\n",
"\n",
"Json representation : \n",
" {':field': {'dates::datetime': ['1964-01-01T00:00:00.000', '1985-02-05T00:00:00.000', '2022-01-21T00:00:00.000']}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'dates::datetime': ['1964-01-01', '1985-02-05', '2022-01-21']}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))"
]
},
{
"cell_type": "markdown",
"id": "bbef79a9",
"metadata": {},
"source": [
"### Examples with a non-Pandas type"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "765f0396",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 1964-01-01\n",
"1 1985-02-05\n",
"2 2022-01-21\n",
"Name: dates::date, dtype: object\n",
"\n",
"Json representation : \n",
" {':field': {'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21']}}\n",
"\n",
"is Json translation reversible ? True\n",
"\n",
"is pandas translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21']}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n",
"print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "327c67a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 POINT (1 2)\n",
"1 POINT (3 4)\n",
"2 POINT (5 6)\n",
"Name: coord::point, dtype: object\n",
"\n",
"Json representation : \n",
" {':field': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'coord::point': [[1,2], [3,4], [5,6]]}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))"
]
},
{
"cell_type": "markdown",
"id": "90a68b81",
"metadata": {},
"source": [
"### Categorical examples\n",
"- available only with hashable data"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6ba57f0a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 1\n",
"1 2\n",
"2 2\n",
"3 1\n",
"Name: integer, dtype: category\n",
"Categories (2, Int64): [1, 2]\n",
"\n",
"Json representation : \n",
" {':field': {'integer': [[1, 2], [0, 1, 1, 0]]}}\n",
"\n",
"is Json translation reversible ? True\n",
"\n",
"is pandas translation reversible ? True\n"
]
}
],
"source": [
"field_data = {\"integer\": [[1, 2], [0, 1, 1, 0]]}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n",
"print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7e45d94d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 1964-01-01\n",
"1 1985-02-05\n",
"2 1964-01-01\n",
"3 2022-01-21\n",
"Name: dates::date, dtype: category\n",
"Categories (3, object): [1964-01-01, 1985-02-05, 2022-01-21]\n",
"\n",
"Json representation : \n",
" {':field': {'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 0, 2]]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 0, 2]]}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a4a756e7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas object :\n",
"0 (1, 2)\n",
"1 (3, 4)\n",
"2 (1, 2)\n",
"3 (5, 6)\n",
"Name: test_array::array, dtype: category\n",
"Categories (3, object): [(1, 2), (3, 4), (5, 6)]\n",
"\n",
"Json representation : \n",
" {':field': {'test_array': [{'::array': [(1, 2), (3, 4), (5, 6)]}, [0, 1, 0, 2]]}}\n",
"\n",
"is Json translation reversible ? True\n",
"\n",
"is pandas translation reversible ? True\n"
]
}
],
"source": [
"field_data = {'test_array': [{'::array': [[1,2], [3,4], [5,6]]}, [0, 1, 0, 2]]}\n",
"sr = npd.read_json({':field': field_data})\n",
"# pandas dtype conform to Ntv type\n",
"print('pandas object :\\n' + str(sr))\n",
"print('\\nJson representation : \\n ', sr.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', sr.equals(npd.read_json(sr.npd.to_json())))\n",
"print('\\nis pandas translation reversible ? ', json.dumps(sr.npd.to_json()) == json.dumps({':field': field_data}))"
]
},
{
"cell_type": "markdown",
"id": "38335df3",
"metadata": {},
"source": [
"## 3 - DataFrame"
]
},
{
"cell_type": "markdown",
"id": "fb4a7b88",
"metadata": {},
"source": [
"### Initial example"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "ac890bcb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas dtype :\n",
"A object\n",
"B object\n",
"dtype: object\n",
"\n",
"pandas object :\n",
" A B\n",
"0 a b\n",
"1 b c\n",
"2 c c\n",
"3 a d\n",
"\n",
"Json representation : \n",
" {':tab': {'index': [0, 1, 2, 3], 'A': ['a', 'b', 'c', 'a'], 'B': ['b', 'c', 'c', 'd']}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"df = pd.DataFrame({\"A\": list(\"abca\"), \"B\": list(\"bccd\")})\n",
"\n",
"print('pandas dtype :\\n' + str(df.dtypes))\n",
"print('\\npandas object :\\n' + str(df))\n",
"print('\\nJson representation : \\n ', df.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))"
]
},
{
"cell_type": "markdown",
"id": "c04430e6",
"metadata": {},
"source": [
"### Complete example\n",
"- index data\n",
"- Pandas dtype (int32, bool, string)\n",
"- NTV type (date, point) -> object dtype\n",
"- data unique"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fe6c69cb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas dtype :\n",
"dates::date object\n",
"value int64\n",
"value32 int32\n",
"res int64\n",
"coord::point object\n",
"names string[python]\n",
"unique bool\n",
"dtype: object\n",
"\n",
"pandas object :\n",
" dates::date value value32 res coord::point names unique\n",
"100 1964-01-01 10 12 10 POINT (1 2) john True\n",
"200 1985-02-05 10 12 20 POINT (3 4) eric True\n",
"300 2022-01-21 20 22 30 POINT (5 6) judith True\n",
"400 1964-01-01 20 22 10 POINT (7 8) mila True\n",
"500 1985-02-05 30 32 20 POINT (3 4) hector True\n",
"600 2022-01-21 30 32 30 POINT (5 6) maria True\n",
"\n",
"Json representation :\n",
"{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n",
" 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],\n",
" 'index': [100, 200, 300, 400, 500, 600],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'res': [10, 20, 30, 10, 20, 30],\n",
" 'unique': True,\n",
" 'value': [10, 10, 20, 20, 30, 30],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"tab_data = {'index': [100, 200, 300, 400, 500, 600],\n",
" 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], \n",
" 'value': [10, 10, 20, 20, 30, 30],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32],\n",
" 'res': [10, 20, 30, 10, 20, 30],\n",
" 'coord::point': [[1,2], [3,4], [5,6], [7,8], [3,4], [5,6]],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'unique': True }\n",
"df = npd.read_json({':tab': tab_data})\n",
"print('pandas dtype :\\n' + str(df.dtypes))\n",
"print('\\npandas object :\\n' + str(df))\n",
"print('\\nJson representation :')\n",
"pprint(df.npd.to_json(), width=140)\n",
"print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))"
]
},
{
"cell_type": "markdown",
"id": "513dd337",
"metadata": {},
"source": [
"### Json data can be annotated"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "cdda0ce3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"is DataFrame identical ? True\n"
]
}
],
"source": [
"tab_data = {'index': [100, 200, 300, 400, 500, 600],\n",
" 'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], \n",
" 'value': [10, 10, 20, 20, {'valid?': 30}, 30],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32],\n",
" 'res': {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},\n",
" 'coord::point': [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'unique': True }\n",
"\n",
"df2 = npd.read_json({':tab': tab_data}, annotated=True)\n",
"print('is DataFrame identical ? ', df.equals(df2))"
]
},
{
"cell_type": "markdown",
"id": "82017e47",
"metadata": {},
"source": [
"### Categorical data can be included"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "22b7a281",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas dtype :\n",
"A category\n",
"B category\n",
"dtype: object\n",
"\n",
"pandas object :\n",
" A B\n",
"0 a b\n",
"1 b c\n",
"2 c c\n",
"3 a d\n",
"\n",
"Json representation : \n",
" {':tab': {'index': [0, 1, 2, 3], 'A': [['a', 'b', 'c'], [0, 1, 2, 0]], 'B': [['b', 'c', 'd'], [0, 1, 1, 2]]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"df = pd.DataFrame({\"A\": list(\"abca\"), \"B\": list(\"bccd\")}, dtype=\"category\")\n",
"\n",
"print('pandas dtype :\\n' + str(df.dtypes))\n",
"print('\\npandas object :\\n' + str(df))\n",
"print('\\nJson representation : \\n ', df.npd.to_json())\n",
"print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d7eb51ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas dtype :\n",
"dates::date category\n",
"value category\n",
"value32 int32\n",
"res int64\n",
"coord::point object\n",
"names string[python]\n",
"unique::boolean bool\n",
"dtype: object\n",
"\n",
"pandas object :\n",
" dates::date value value32 res coord::point names unique::boolean\n",
"100 1964-01-01 10 12 10 POINT (1 2) john True\n",
"200 1985-02-05 10 12 20 POINT (3 4) eric True\n",
"300 2022-01-21 20 22 30 POINT (5 6) judith True\n",
"400 1964-01-01 20 22 10 POINT (7 8) mila True\n",
"500 1985-02-05 30 32 20 POINT (3 4) hector True\n",
"600 2022-01-21 30 32 30 POINT (5 6) maria True\n",
"\n",
"Json representation :\n",
"{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n",
" 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n",
" 'index': [100, 200, 300, 400, 500, 600],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'res': [10, 20, 30, 10, 20, 30],\n",
" 'unique:boolean': True,\n",
" 'value': [[10, 20, 30], [2]],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"tab_data = {'index': [100, 200, 300, 400, 500, 600],\n",
" 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 2, 0, 1, 2]],\n",
" 'value': [[10, 20, {'valid?': 30}], [0, 0, 1, 1, 2, 2]],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32],\n",
" 'res': {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},\n",
" 'coord::point': [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'unique:boolean': True }\n",
"\n",
"df = npd.read_json({':tab': tab_data}, annotated=True)\n",
"print('pandas dtype :\\n' + str(df.dtypes))\n",
"print('\\npandas object :\\n' + str(df))\n",
"print('\\nJson representation :')\n",
"pprint(df.npd.to_json(), width=140)\n",
"print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "a3565bed",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pandas dtype :\n",
"dates::date category\n",
"value category\n",
"value32 int32\n",
"coord::point object\n",
"names string[python]\n",
"unique bool\n",
"dtype: object\n",
"\n",
"pandas object :\n",
" dates::date value value32 coord::point names unique\n",
"100 1964-01-01 10 12 POINT (1 2) john True\n",
"200 1985-02-05 10 12 POINT (3 4) eric True\n",
"300 2022-01-21 20 22 POINT (5 6) judith True\n",
"400 1964-01-01 20 22 POINT (7 8) mila True\n",
"500 1985-02-05 30 32 POINT (3 4) hector True\n",
"600 2022-01-21 30 32 POINT (5 6) maria True\n",
"\n",
"Json representation :\n",
"{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],\n",
" 'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [1]],\n",
" 'index': [100, 200, 300, 400, 500, 600],\n",
" 'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],\n",
" 'unique': True,\n",
" 'value': [[10, 20, 30], [2]],\n",
" 'value32::int32': [12, 12, 22, 22, 32, 32]}}\n",
"\n",
"is Json translation reversible ? True\n"
]
}
],
"source": [
"index = pd.Series([100, 200, 300, 400, 500, 600])\n",
"dates = pd.Series(name='dates::date', data=[date(1964, 1, 1), date(1985, 2, 5), date(2022, 1, 21), date(1964, 1, 1),\n",
" date(1985, 2, 5), date(2022, 1, 21)], dtype='object').astype('category')\n",
"value = pd.Series(name='value', data=[10,10,20,20,30,30], dtype='Int64').astype('category') #alias mandatory \n",
"value32 = pd.Series(name='value32', data=[12, 12, 22, 22, 32, 32], dtype='int32')\n",
"coord = pd.Series(name='coord::point', data=[Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)])\n",
"names = pd.Series(name='names', data=['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string')\n",
"unique = pd.Series(name='unique', data=[True, True, True, True, True, True])\n",
"\n",
"df = pd.DataFrame({ser.name: ser for ser in [index, dates, value, value32, coord, names, unique]}).set_index(None)\n",
"\n",
"print('pandas dtype :\\n' + str(df.dtypes))\n",
"print('\\npandas object :\\n' + str(df))\n",
"print('\\nJson representation :')\n",
"pprint(df.npd.to_json(), width=140)\n",
"print('\\nis Json translation reversible ? ', df.equals(npd.read_json(df.npd.to_json())))"
]
},
{
"cell_type": "markdown",
"id": "f23d9221",
"metadata": {},
"source": [
"### Multidimensional data\n",
"- JSON-TAB format is applicable for multidimensional data\n",
"- JSON multi-dimensional data can be translated into a Pandas Dataframe or a DataArray Xarray"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "270354b3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" quantity | \n",
" product | \n",
" plants | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" 1 kg | \n",
" apple | \n",
" fruit | \n",
" 1.0 | \n",
"
\n",
" \n",
" 0 | \n",
" 1 kg | \n",
" banana | \n",
" fruit | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 kg | \n",
" orange | \n",
" fruit | \n",
" 2.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 kg | \n",
" peppers | \n",
" vegetable | \n",
" 1.5 | \n",
"
\n",
" \n",
" 6 | \n",
" 10 kg | \n",
" apple | \n",
" fruit | \n",
" 10.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 10 kg | \n",
" banana | \n",
" fruit | \n",
" 5.0 | \n",
"
\n",
" \n",
" 5 | \n",
" 10 kg | \n",
" orange | \n",
" fruit | \n",
" 20.0 | \n",
"
\n",
" \n",
" 7 | \n",
" 10 kg | \n",
" peppers | \n",
" vegetable | \n",
" 15.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" quantity product plants price\n",
"2 1 kg apple fruit 1.0\n",
"0 1 kg banana fruit 0.5\n",
"1 1 kg orange fruit 2.0\n",
"3 1 kg peppers vegetable 1.5\n",
"6 10 kg apple fruit 10.0\n",
"4 10 kg banana fruit 5.0\n",
"5 10 kg orange fruit 20.0\n",
"7 10 kg peppers vegetable 15.0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = {\"quantity\": [\"1 kg\", \"1 kg\", \"1 kg\", \"1 kg\", \"10 kg\", \"10 kg\", \"10 kg\", \"10 kg\"],\n",
" \"product\": [\"banana\", \"orange\", \"apple\", \"peppers\", \"banana\", \"orange\", \"apple\", \"peppers\"], \n",
" \"plants\": [\"fruit\", \"fruit\", \"fruit\", \"vegetable\", \"fruit\", \"fruit\", \"fruit\", \"vegetable\"], \n",
" \"price\": [0.5, 2, 1, 1.5, 5, 20, 10, 15]}\n",
"\n",
"df = pd.DataFrame(data)\n",
"df2 = pd.DataFrame(data, dtype='category').sort_values(by=['quantity', 'product'])\n",
"df2"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "40d718be",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"json_df is the JSON-TAB format with \"full\" mode\n",
"\n",
"{'index': [0, 1, 2, 3, 4, 5, 6, 7],\n",
" 'plants': ['fruit', 'fruit', 'fruit', 'vegetable', 'fruit', 'fruit', 'fruit', 'vegetable'],\n",
" 'price': [0.5, 2.0, 1.0, 1.5, 5.0, 20.0, 10.0, 15.0],\n",
" 'product': ['banana', 'orange', 'apple', 'peppers', 'banana', 'orange', 'apple', 'peppers'],\n",
" 'quantity': ['1 kg', '1 kg', '1 kg', '1 kg', '10 kg', '10 kg', '10 kg', '10 kg']}\n",
"\n",
"json_xa is the JSON-TAB format with \"optimize\" mode\n",
"\n",
"{'index': [2, 0, 1, 3, 6, 4, 5, 7],\n",
" 'plants': [['fruit', 'vegetable'], [0, 0, 0, 1, 0, 0, 0, 1]],\n",
" 'price': [[0.5, 1.0, 1.5, 2.0, 5.0, 10.0, 15.0, 20.0], [1, 0, 3, 2, 5, 4, 7, 6]],\n",
" 'product': [['apple', 'banana', 'orange', 'peppers'], [1]],\n",
" 'quantity': [['1 kg', '10 kg'], [4]]}\n",
"\n",
"DataFrame from the two JSON-TAB format are identical ? True\n",
"\n",
"The \"optimize\" JSON-TAB format is the image of the DataArray Xarray\n"
]
},
{
"data": {
"text/html": [
"\n",
"
<xarray.DataArray 'price' (quantity: 2, product: 4)> Size: 64B\n",
"array([[0.5, 2.0, 1.0, 1.5],\n",
" [5.0, 20.0, 10.0, 15.0]], dtype=object)\n",
"Coordinates:\n",
" * quantity (quantity) object 16B '1 kg' '10 kg'\n",
" * product (product) object 32B 'banana' 'orange' 'apple' 'peppers'\n",
" plants (product) object 32B 'fruit' 'fruit' 'fruit' 'vegetable'
"
],
"text/plain": [
" Size: 64B\n",
"array([[0.5, 2.0, 1.0, 1.5],\n",
" [5.0, 20.0, 10.0, 15.0]], dtype=object)\n",
"Coordinates:\n",
" * quantity (quantity) object 16B '1 kg' '10 kg'\n",
" * product (product) object 32B 'banana' 'orange' 'apple' 'peppers'\n",
" plants (product) object 32B 'fruit' 'fruit' 'fruit' 'vegetable'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"json_df = Ntv.obj(df).to_obj()[':tab']\n",
"print('json_df is the JSON-TAB format with \"full\" mode\\n')\n",
"pprint(json_df, width=200)\n",
"\n",
"json_xar = Ntv.obj(df2).to_obj()[':tab']\n",
"print('\\njson_xa is the JSON-TAB format with \"optimize\" mode\\n')\n",
"pprint(json_xar, width=200)\n",
"\n",
"df_from_xar = Ntv.obj({':tab': json_xar}).to_obj(format='obj').sort_index()\n",
"print('\\nDataFrame from the two JSON-TAB format are identical ? ', df.astype('object').equals(df_from_xar.astype('object')))\n",
"\n",
"print('\\nThe \"optimize\" JSON-TAB format is the image of the DataArray Xarray')\n",
"from tab_dataset import Sdataset\n",
"Sdataset.ntv(json_df).setcanonorder().to_xarray(varname='price')"
]
},
{
"cell_type": "markdown",
"id": "57322f1d",
"metadata": {},
"source": [
"## Appendix : Series tests"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "5f07553d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True {':field': [{'a': 2, 'e': 4}, {'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}\n",
"True {':field': [[1, 2], [3, 4], [5, 6]]}\n",
"True {':field': [[1, 2], [3, 4], {'a': 3, 'e': 5}]}\n",
"True {':field': [True, False, True]}\n",
"True {':field': ['az', 'er', 'cd']}\n",
"True {':field': ['az', 'az', 'az']}\n",
"True {':field': [1, 2, 3]}\n",
"True {':field': [1.1, 2.0, 3.0]}\n",
"True {':field': [10, 20, 30]}\n",
"True {':field': [True, False, True]}\n",
"True {':field': [1.1, 2.0, 3.0]}\n",
"True {':field': {'::datetime': [None, None, None]}}\n",
"True {':field': {'::datetime': ['2022-01-01T00:00:00.000', '2022-01-02T00:00:00.000']}}\n",
"True {':field': {'::duration': ['P1DT0H0M0S', 'P2DT0H0M0S']}}\n",
"True {':field': {'::string': ['az', 'er', 'cd']}}\n",
"True {':field': {'::int32': [1, 2, 3]}}\n",
"True {':field': {'::uint64': [1, 2, 3]}}\n",
"True {':field': {'::float32': [1.0, 2.0, 3.0]}}\n",
"True {':field': {'::int64': [1, 2, 3]}}\n",
"True {':field': {'::float64': [1.0, 2.0, 3.0]}}\n",
"True {':field': {'::array': [[1, 2], [3, 4], [5, 6]]}}\n",
"True {':field': {'::object': [{'a': 2, 'e': 4}, {'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}}\n",
"True {':field': {'::null': [None, None, None]}}\n",
"True {':field': {'::uri': ['geo:13.412 ,103.866', 'mailto:John.Doe@example.com']}}\n",
"True {':field': {'::file': ['///path/to/file', '//host.example.com/path/to/file']}}\n",
"True {':field': {'::date': ['2022-01-01', '2022-01-02']}}\n",
"True {':field': {'::time': ['10:21:01', '08:01:02']}}\n",
"True {':field': {'::day': [1, 2, 3]}}\n",
"True {':field': {'::year': [2001, 2002, 2003]}}\n",
"True {':field': {'::minute': [21, 10, 55]}}\n",
"True {':field': {'::point': [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]}}\n"
]
}
],
"source": [
"# json interface ok\n",
"srs = [\n",
" # without ntv_type, without dtype\n",
" pd.Series([{'a': 2, 'e':4}, {'a': 3, 'e':5}, {'a': 4, 'e':6}]), \n",
" pd.Series([[1,2], [3,4], [5,6]]), \n",
" pd.Series([[1,2], [3,4], {'a': 3, 'e':5}]), \n",
" pd.Series([True, False, True]),\n",
" pd.Series(['az', 'er', 'cd']),\n",
" pd.Series(['az', 'az', 'az']),\n",
" pd.Series([1,2,3]),\n",
" pd.Series([1.1,2,3]),\n",
" \n",
" # without ntv_type, with dtype\n",
" pd.Series([10,20,30], dtype='Int64'),\n",
" pd.Series([True, False, True], dtype='boolean'),\n",
" pd.Series([1.1, 2, 3], dtype='float64'), \n",
"\n",
" # with ntv_type only in json data (not numbers)\n",
" pd.Series([pd.NaT, pd.NaT, pd.NaT]),\n",
" pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2)], dtype='datetime64[ns]'),\n",
" pd.Series(pd.to_timedelta(['1D', '2D'])),\n",
" pd.Series(['az', 'er', 'cd'], dtype='string'), \n",
"\n",
" # with ntv_type only in json data (numbers)\n",
" pd.Series([1,2,3], dtype='Int32'), \n",
" pd.Series([1,2,3], dtype='UInt64'),\n",
" pd.Series([1,2,3], dtype='float32'),\n",
"\n",
" # with ntv_type in Series name and in json data (numbers)\n",
" pd.Series([1,2,3], name='::int64'),\n",
" pd.Series([1,2,3], dtype='Float64', name='::float64'), # force dtype dans la conversion json\n",
"\n",
" # with ntv_type in Series name and in json data (not numbers)\n",
" pd.Series([[1,2], [3,4], [5,6]], name='::array'), \n",
" pd.Series([{'a': 2, 'e':4}, {'a': 3, 'e':5}, {'a': 4, 'e':6}], name='::object'), \n",
" pd.Series([None, None, None], name='::null'), \n",
" pd.Series([\"geo:13.412 ,103.866\", \"mailto:John.Doe@example.com\"], name='::uri', dtype='string'),\n",
" pd.Series([\"///path/to/file\", \"//host.example.com/path/to/file\"], name='::file', dtype='string'),\n",
"\n",
" # with ntv_type converted in object dtype (not in datetime)\n",
" pd.Series([date(2022, 1, 1), date(2022, 1, 2)], name='::date'),\n",
" pd.Series([time(10, 21, 1), time(8, 1, 2)], name='::time'),\n",
"\n",
" # with ntv_type unknown in pandas and with pandas conversion \n",
" pd.Series([1,2,3], dtype='int64', name='::day'),\n",
" pd.Series([2001,2002,2003], dtype='int64', name='::year'),\n",
" pd.Series([21,10,55], name='::minute'),\n",
"\n",
" # with ntv_type unknown in pandas and NTV conversion\n",
" pd.Series([Point(1, 0), Point(1, 1), Point(1, 2)], name='::point'),\n",
"]\n",
"for sr in srs:\n",
" print(sr.npd.equals(npd.read_json(sr.npd.to_json())), sr.npd.to_json()) "
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "090ad57b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True {':field': {'test::int32': [1, 2, 3]}}\n",
"True {':field': {'test': [1, 2, 3]}}\n",
"True {':field': [1.0, 2.1, 3.0]}\n",
"True {':field': ['er', 'et', 'ez']}\n",
"True {':field': [True, False, True]}\n",
"True {':field': {'::boolean': [True, False, True]}}\n",
"True {':field': {'::string': ['er', 'et', 'ez']}}\n",
"True {':field': {'test::float32': [1.0, 2.5, 3.0]}}\n",
"True {':field': {'::int64': [1, 2, 3]}}\n",
"True {':field': {'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}}\n",
"True {':field': {'::date': ['2021-12-31', '2022-01-01']}}\n",
"True {':field': {'::time': ['23:00:00', '23:01:00']}}\n",
"True {':field': {'::object': [{'a': 3, 'e': 5}, {'a': 4, 'e': 6}]}}\n",
"True {':field': {'::array': [[1, 2], [3, 4], [5, 6]]}}\n",
"True {':field': True}\n",
"True {':field': {':boolean': True}}\n"
]
}
],
"source": [
"# json interface ok\n",
"for a in [{'test::int32': [1,2,3]},\n",
" {'test': [1,2,3]},\n",
" [1.0, 2.1, 3.0],\n",
" ['er', 'et', 'ez'],\n",
" [True, False, True],\n",
" {'::boolean': [True, False, True]},\n",
" {'::string': ['er', 'et', 'ez']},\n",
" {'test::float32': [1.0, 2.5, 3.0]},\n",
" {'::int64': [1,2,3]},\n",
" {'::datetime': [\"2021-12-31T23:00:00.000\",\"2022-01-01T23:00:00.000\"] },\n",
" {'::date': [\"2021-12-31\", \"2022-01-01\"] },\n",
" {'::time': [\"23:00:00\", \"23:01:00\"] },\n",
" {'::object': [{'a': 3, 'e':5}, {'a': 4, 'e':6}]},\n",
" {'::array': [[1,2], [3,4], [5,6]]},\n",
" True,\n",
" {':boolean': True}\n",
" ]:\n",
" field = {':field': a}\n",
" print(npd.read_json(field).npd.to_json() == field, field)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "a878011a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True {':field': {'test': [{'::int32': [1, 2, 3]}, [0, 1, 2, 0, 1]]}}\n",
"True {':field': {'test': [[1, 2, 3], [0, 1, 2, 0, 1]]}}\n",
"True {':field': [[1.0, 2.1, 3.0], [0, 1, 2, 0, 1]]}\n",
"True {':field': [['er', 'et', 'ez'], [0, 1, 2, 0, 1]]}\n",
"True {':field': [[True, False], [0, 1, 0, 1, 0]]}\n",
"True {':field': [{'::string': ['er', 'et', 'ez']}, [0, 1, 2, 0, 1]]}\n",
"True {':field': {'test': [{'::float32': [1.0, 2.5, 3.0]}, [0, 1, 2, 0, 1]]}}\n",
"True {':field': [{'::int64': [1, 2, 3]}, [0, 1, 2, 0, 1]]}\n",
"True {':field': [{'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}, [0, 1, 0, 1, 0]]}\n",
"True {':field': [{'::date': ['2021-12-31', '2022-01-01']}, [0, 1, 0, 1, 0]]}\n",
"True {':field': [{'::time': ['23:00:00', '23:01:00']}, [0, 1, 0, 1, 0]]}\n",
"True {':field': {'test_date': [{'::datetime': ['2021-12-31T23:00:00.000', '2022-01-01T23:00:00.000']}, [0, 1, 0, 1, 0]]}}\n",
"True {':field': [{'::boolean': [True, False]}, [0, 1, 0, 1, 0]]}\n",
"True {':field': [[True], [2]]}\n",
"True {':field': {'quantity': [['1 kg', '10 kg'], [4]]}}\n"
]
}
],
"source": [
"# json interface ok (categorical data)\n",
"for a in [{'test': [{'::int32': [1, 2, 3]}, [0,1,2,0,1]]},\n",
" {'test': [[1, 2, 3], [0,1,2,0,1]]},\n",
" [[1.0, 2.1, 3.0], [0,1,2,0,1]],\n",
" [['er', 'et', 'ez'], [0,1,2,0,1]],\n",
" [[True, False], [0,1,0,1,0]],\n",
" [{'::string': ['er', 'et', 'ez']}, [0,1,2,0,1]],\n",
" {'test':[{'::float32': [1.0, 2.5, 3.0]}, [0,1,2,0,1]]},\n",
" [{'::int64': [1, 2, 3]}, [0,1,2,0,1]],\n",
" [{'::datetime': [\"2021-12-31T23:00:00.000\", \"2022-01-01T23:00:00.000\"] }, [0,1,0,1,0]],\n",
" [{'::date': [\"2021-12-31\", \"2022-01-01\"] }, [0,1,0,1,0]],\n",
" [{'::time': [\"23:00:00\", \"23:01:00\"] }, [0,1,0,1,0]],\n",
" {'test_date': [{'::datetime': [\"2021-12-31T23:00:00.000\", \"2022-01-01T23:00:00.000\"] }, [0,1,0,1,0]]},\n",
" [{'::boolean': [True, False]}, [0,1,0,1,0]],\n",
" [[True], [2]], # periodic Series\n",
" {'quantity': [['1 kg', '10 kg'], [4]]}]: # periodic Series\n",
" field = {':field': a}\n",
" print(npd.read_json(field).npd.to_json() == field, field)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "fc5c1c1f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False True {':field': [None, None]}\n",
"False True {':field': [None, None]}\n",
"False False {':field': {'::uint64': [1, 2, 3]}}\n",
"False True {':field': ['2022-01-01T00:00:00.000', '2022-01-02T00:00:00.000', '2022-01-03T00:00:00.000']}\n"
]
}
],
"source": [
"# json interface ko\n",
"srs = [# without ntv_type\n",
" pd.Series([math.nan, math.nan]), # bug pandas conversion json : datetime NaT\n",
" \n",
" # without ntv_type, with dtype\n",
" pd.Series([math.nan, math.nan], dtype='float64'), # bug pandas conversion json : datetime NaT\n",
" \n",
" # with ntv_type in Series name and in json data\n",
" pd.Series([1,2,3], dtype='UInt64', name='::uint64'), # name inutile\n",
" \n",
" # with ntv_type unknown in pandas\n",
" pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2), datetime(2022, 1, 3)], dtype='datetime64[ns, UTC]'), #à traiter\n",
"]\n",
"for sr in srs:\n",
" print(sr.npd.equals(npd.read_json(sr.npd.to_json())), \n",
" npd.read_json(sr.npd.to_json()).name == sr.name, \n",
" sr.npd.to_json(text=True)) "
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "0319c881",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False {':field': {'test_array': [{'::array': [[1, 2], [3, 4], [5, 6], [7, 8]]}, [0, 1, 0, 2, 3]]}}\n"
]
}
],
"source": [
"# json interface ko (categorical data)\n",
"for a in [{'test_array': [{'::array': [[1,2], [3,4], [5,6], [7,8]]}, [0, 1, 0, 2, 3]]}]: # list -> tuple to be hashable\n",
" field = {':field': a}\n",
" print(npd.read_json(field).npd.to_json() == field, field)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}