{ "cells": [ { "cell_type": "raw", "id": "f286242d-0e20-4d99-bb82-0a53537cc914", "metadata": {}, "source": [ "---\n", "title: Gzipping or not Gzipping HEALPix maps\n", "date: 2023-03-06\n", "categories:\n", " - healpy\n", " - nersc\n", "---" ] }, { "cell_type": "markdown", "id": "e17e219f-4272-4e68-9718-74bb1b4cdb90", "metadata": {}, "source": [ "HEALPix maps are generally stored in FITS format and can be gzipped to save disk space.\n", "Reading gzipped maps is natively supported in `healpy`, but it takes longer because the maps are decompressed on the fly.\n", "\n", "As usual, the best way to assess the cost is to measure it.\n", "Here I test on JupyterHub@NERSC, running on a Cori shared CPU server; the maps have about 30% unobserved pixels and are stored in double precision.\n", "\n", "In this case the performance loss due to compression is significant: reading all columns takes nearly twice as long, and reading a single column takes more than three times as long, while the gzipped files are still 63% (map) and 42% (covariance) of the original size. Therefore it is hard to justify compression unless storage is really a limiting factor.\n", "\n", "This is heavily case-dependent, so the test should be repeated (possibly reusing this notebook) for any other dataset or machine." 
] }, { "cell_type": "code", "execution_count": 1, "id": "13015d3b-fa07-40b1-9009-4b7db7e8ff66", "metadata": {}, "outputs": [], "source": [ "import healpy as hp" ] }, { "cell_type": "code", "execution_count": 2, "id": "fff16708-4a06-4bd7-91ad-25b7752d72c5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.16.2'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hp.version.__version__" ] }, { "cell_type": "code", "execution_count": 3, "id": "455a6b33-d6ba-4157-8ce6-e388dc9282fa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/global/cfs/cdirs/cmbs4/dc/dc1\n" ] } ], "source": [ "cd /global/cfs/cdirs/cmbs4/dc/dc1/" ] }, { "cell_type": "code", "execution_count": 4, "id": "01e8a722-d0e1-4cb5-831d-9b00dfb15466", "metadata": {}, "outputs": [], "source": [ "!rm -r test_readgzip || true" ] }, { "cell_type": "code", "execution_count": 5, "id": "a3314a16-b6f7-4b2d-97bd-5291445296d2", "metadata": {}, "outputs": [], "source": [ "%mkdir test_readgzip" ] }, { "cell_type": "code", "execution_count": 6, "id": "d25078db-70d2-40a5-97aa-2813021c9c33", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/global/cfs/cdirs/cmbs4/dc/dc1/test_readgzip\n" ] } ], "source": [ "cd test_readgzip" ] }, { "cell_type": "markdown", "id": "7fb4384a-c5ca-4ccc-9469-3b3b2a3a866b", "metadata": {}, "source": [ "## Stage data" ] }, { "cell_type": "code", "execution_count": 7, "id": "18b16b65-fe8d-4d21-b0fd-cdaa6f406eef", "metadata": {}, "outputs": [], "source": [ "folders = [\"test1\", \"test2\", \"test1gz\", \"test2gz\"]\n", "folders_string = \" \".join(folders)" ] }, { "cell_type": "code", "execution_count": 8, "id": "42a37e9e-e859-441f-9a95-3124050201cf", "metadata": {}, "outputs": [], "source": [ "!mkdir -p $folders_string" ] }, { "cell_type": "code", "execution_count": 9, "id": "5b301bbe-6e4d-4c5e-b4f0-fed29d6236b9", "metadata": {}, "outputs": [], "source": [ "input_folder = 
\"../staging/noise_sim/outputs_rk/coadd/LAT0_CHLAT\"\n", "m = \"coadd_LAT0_CHLAT_f150_001of001_map.fits\"\n", "cov = \"coadd_LAT0_CHLAT_f150_001of001_cov.fits\"" ] }, { "cell_type": "code", "execution_count": 10, "id": "8960051c-fc2d-41b0-a1be-30e72c528e64", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test1\n", "test2\n", "test1gz\n", "test2gz\n" ] } ], "source": [ "for f in folders:\n", " print(f)\n", " !cp $input_folder/$m $input_folder/$cov $f" ] }, { "cell_type": "markdown", "id": "dbf28189-a8b8-4c0e-9c6a-a78ed6dcc7df", "metadata": {}, "source": [ "## Compression" ] }, { "cell_type": "code", "execution_count": 11, "id": "2d2840f0-69b1-4410-a3a1-7cc516027eb9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "real\t3m24.095s\n", "user\t3m13.681s\n", "sys\t0m4.717s\n", "\n", "real\t5m27.430s\n", "user\t5m13.233s\n", "sys\t0m7.657s\n", "\n", "real\t3m22.517s\n", "user\t3m12.237s\n", "sys\t0m4.510s\n", "\n", "real\t5m21.693s\n", "user\t5m7.241s\n", "sys\t0m7.664s\n" ] } ], "source": [ "for f in folders:\n", " if f.endswith(\"gz\"):\n", " !time gzip $f/$m\n", " !time gzip $f/$cov" ] }, { "cell_type": "code", "execution_count": 12, "id": "7170db61-eeae-425f-9b11-58f0e2867ab9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "coadd_LAT0_CHLAT_f150_001of001_cov.fits.gz\n", "coadd_LAT0_CHLAT_f150_001of001_map.fits.gz\n" ] } ], "source": [ "ls test1gz" ] }, { "cell_type": "code", "execution_count": 13, "id": "2b3908a0-822c-4127-8c2f-b713e64811b1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "coadd_LAT0_CHLAT_f150_001of001_map.fits.gz compression factor: 63%\n", "coadd_LAT0_CHLAT_f150_001of001_cov.fits.gz compression factor: 42%\n" ] } ], "source": [ "for each in [m, cov]:\n", " fits = !stat -c \"%s\" test1/$each\n", " each += \".gz\"\n", " gz = !stat -c \"%s\" test1gz/$each\n", " print(f\"{each} compression factor: 
{int(gz[0])/int(fits[0]):.0%}\")" ] }, { "cell_type": "markdown", "id": "85aeca3a-5433-4d1d-9e52-d9ed79165e55", "metadata": {}, "source": [ "## Benchmark map access" ] }, { "cell_type": "code", "execution_count": 14, "id": "a6ddce59-9fb3-4958-8178-6aa8226843ae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 33.2 s, sys: 26.4 s, total: 59.6 s\n", "Wall time: 1min 6s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test1/{m}\", (0,1,2))" ] }, { "cell_type": "code", "execution_count": 15, "id": "76b5cb18-6ebd-4325-b385-dec5d621af9d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 36s, sys: 24.7 s, total: 2min 1s\n", "Wall time: 2min 2s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test1gz/{m}.gz\", (0,1,2))" ] }, { "cell_type": "code", "execution_count": 16, "id": "68a4c82e-ef15-4c48-aa61-d5c696c4946a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 10.6 s, sys: 9.98 s, total: 20.6 s\n", "Wall time: 27.3 s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test2/{m}\", 0)" ] }, { "cell_type": "code", "execution_count": 17, "id": "6ac0a9e5-cf37-461a-9e63-b72e667a900b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 12s, sys: 11.8 s, total: 1min 24s\n", "Wall time: 1min 25s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test2gz/{m}.gz\", 0)" ] }, { "cell_type": "markdown", "id": "d44284e0-1b9e-4735-9e73-54d7b4147c6d", "metadata": {}, "source": [ "## Benchmark noise covariance access" ] }, { "cell_type": "code", "execution_count": 18, "id": "856aad1c-f704-4cf4-9284-1f1f562252fe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 13s, sys: 50 s, total: 2min 3s\n", "Wall time: 2min 23s\n" ] } ], "source": [ "%%time\n", "\n", "_ = 
hp.read_map(f\"test1/{cov}\", (0,1,2,3,4,5))" ] }, { "cell_type": "code", "execution_count": 19, "id": "fe0965c8-32c4-4b7e-94a8-2e01535e4181", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3min 10s, sys: 54.9 s, total: 4min 5s\n", "Wall time: 4min 6s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test1gz/{cov}.gz\", (0,1,2,3,4,5))" ] }, { "cell_type": "code", "execution_count": 20, "id": "37e36951-9bd9-4be0-a69b-19aff424cbf5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 10.9 s, sys: 13.5 s, total: 24.4 s\n", "Wall time: 44.2 s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test2/{cov}\", 0)" ] }, { "cell_type": "code", "execution_count": 21, "id": "d041ce4b-496d-4ff9-95da-f39fe1c23732", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2min 7s, sys: 19.5 s, total: 2min 27s\n", "Wall time: 2min 28s\n" ] } ], "source": [ "%%time\n", "\n", "_ = hp.read_map(f\"test2gz/{cov}.gz\", 0)" ] } ], "metadata": { "kernelspec": { "display_name": "pycmb", "language": "python", "name": "pycmb" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }