{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading and Writing Files\n",
    "\n",
    "## HDF5\n",
    "\n",
    "Scipp supports writing variables, data arrays, and dataset to [HDF5](https://portal.hdfgroup.org/documentation/index.html) files.\n",
    "Reading of HDF5 is supported *only* for these scipp-specific files.\n",
    "Other HDF5-based formats are not supported at this point.\n",
    "For reading the HDF5-based [NeXus](https://www.nexusformat.org/) files, see [scippneutron](https://scipp.github.io/scippneutron/).\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "**Warning**\n",
    "    \n",
    "We do not recommend to use Scipp HDF5 files for archiving or as the sole means of storing valuable data.\n",
    "The current Scipp HDF5 schema is not a standard and will likely be subject to change due to the early development status of scipp.\n",
    "**Future versions of Scipp may not be able to read older files.**\n",
    "    \n",
    "That being said, the file format is quite simple and based on the HDF5 standard so it would still be possible to recover data from such files in such a case.\n",
    "Note that the Scipp version is stored as an HDF5 attribute of the saved objects.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipp as sc\n",
    "\n",
    "x = sc.Variable(dims=['x'], values=np.arange(10))\n",
    "var = sc.Variable(dims=['x', 'y'], values=np.random.rand(9, 3))\n",
    "a = sc.DataArray(data=var, coords={'x': x})\n",
    "\n",
    "a.save_hdf5(filename='test.hdf5')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = sc.io.load_hdf5(filename='test.hdf5')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CSV\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "\n",
    "CSV support requires [pandas](https://pandas.pydata.org/) which must be installed separately.\n",
    "    \n",
    "</div>\n",
    "\n",
    "CSV files can be read into datasets with [scipp.io.load_csv](../generated/modules/scipp.io.csv.load_csv.rst).\n",
    "For example, given the following CSV-encoded data can be read into a dataset as shown:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "csv_content = '''a [m],b [s],c\n",
    "1,5,9\n",
    "2,6,10\n",
    "3,7,11\n",
    "4,8,12'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from io import StringIO\n",
    "\n",
    "ds = sc.io.load_csv(StringIO(csv_content), header_parser='bracket')\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example uses `StringIO` to load the data directly from a string.\n",
    "But `load_csv` can also load from a file on your hard drive or even from a remote server.\n",
    "Simply pass the path or URL of the file as the first argument.\n",
    "See also [pandas.read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).\n",
    "\n",
    "See [scipp.io.load_csv](../generated/modules/scipp.io.csv.load_csv.rst) for more options to customize how the data is structured in the dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using pandas\n",
    "\n",
    "The CSV reader shown above is a wrapper around `pandas.read_csv` and provides commonly used functionality.\n",
    "But pandas supports many more file readers for, among others, Excel, JSON, and XML files.\n",
    "See [pandas IO tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html) for a complete list.\n",
    "\n",
    "It is possible to use pandas manually to load these files and then convert the result to a Scipp dataset using [from_pandas](../generated/modules/scipp.compat.pandas_compat.from_pandas.rst).\n",
    "For example, JSON can be read as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "json = '''{\"A [m]\": {\"0\": 1, \"1\": 3, \"2\": 5},\n",
    "\"B [m/s]\": {\"0\": 2, \"1\": 4, \"2\": 6}}'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_json(json)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "sc.compat.from_pandas(df, header_parser='bracket')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NeXus\n",
    "\n",
    "Scipp has no built-in support for loading [NeXus](https://www.nexusformat.org/) files.\n",
    "However, the `scippneutron` package can internally use [Mantid](https://www.mantidproject.org) to load such files, or any other Mantid-supported file type, see [scippneutron](https://scipp.github.io/scippneutron/) and in particular [scippneutron.load_with_mantid](https://scipp.github.io/scippneutron/generated/functions/scippneutron.load_with_mantid.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}