# PSA Utilities demo notebook

This notebook demonstrates some of the functions of the [github.com/msbentley/psa_utils](https://github.com/msbentley/psa_utils) package.

Please raise any bugs or criticisms as [GitHub issues](https://github.com/msbentley/psa_utils/issues).



In [1]:
from psa_utils import download, tap, pdap, packager, geogen, internal

# pdap

This module uses `requests` to make simple calls to a PDAP service. By default it uses the PSA unless another URL is given:

In [2]:
p = pdap.Pdap()

The first function uses the meta-data endpoint to list available datasets:

## `get_datasets`


In [3]:
dsets = p.get_datasets()

In [4]:
dsets.head()

Unnamed: 0,DATA_SET.DATA_SET_ID,DATA_SET.DATA_SET_NAME,DATA_SET.DATA_ACCESS_REFERENCE,DATA_SET.XML_DESCRIPTION,DATA_SET.PRODUCER.FULL_NAME,DATA_SET.PRODUCER.INSTITUTION_NAME,DATA_SET.PRODUCER.NODE_NAME,DATA_SET.START_TIME,DATA_SET.STOP_TIME,DATA_SET.NPRODUCTS,DATA_SET.MISSION_NAME,DATA_SET.INSTRUMENT_ID,DATA_SET.INSTRUMENT_NAME,DATA_SET.TARGET_NAME,RESOURCE_CLASS,DATA_SET.RELEASE_DATE
0,AIRUB-C-PHOTOCAM-2-EDR-HALLEY-1986-V1.0,AIRUB-HALLEY-PHOTOGRAPHIC-PROJECT-EDR-1986-V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='AIR...,,,,1986-02-17T06:44:00,1986-04-17T09:13:00,1833,EARTH,300,PENTACON-OPTICS-F4-300MM,1P/HALLEY,DATA_SET,2006-03-01
1,CH1ORB-L-C1XS-2-NPO-EDR-V1.0,CHANDRAYAAN-1-ORBITER MOON C1XS 2 NPO EDR V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='CH1...,,,,2008-10-22T03:37:35,2009-08-28T18:21:31,11738,CHANDRAYAAN-1,C1XS,C1XS,MOON,DATA_SET,2019-07-27
2,CH1ORB-L-C1XS-4-NPO-REFDR-V1.0,CHANDRAYAAN-1-ORBITER MOON C1XS 4 NPO REFDR V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='CH1...,,,,2008-11-20T18:32:11.358,2009-08-28T18:21:01.505,1675,CHANDRAYAAN-1,C1XS,C1XS,MOON,DATA_SET,2019-07-27
3,CH1ORB-L-SARA-2-NPO-EDR-CENA-V1.0,CHANDRAYAAN-1-ORBITER MOON SARA 2 NPO EDR CENA...,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='CH1...,,,,2008-12-08T11:54:53.374,2009-08-12T17:02:49.417,835,CHANDRAYAAN-1,SARA,SARA,MOON,DATA_SET,2019-11-26
4,CH1ORB-L-SARA-2-NPO-EDR-SWIM-V1.0,CHANDRAYAAN-1-ORBITER MOON SARA 2 NPO EDR SWIM...,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='CH1...,,,,2008-12-08T13:53:52.117,2009-08-12T17:02:49.319,1424,CHANDRAYAAN-1,SARA,SARA,MOON,DATA_SET,2019-12-05


We now have all of the datasets and their basic meta-data, which we can easily slice and dice.

For example, let's find those with Mars as a target:

In [5]:
dsets[dsets['DATA_SET.TARGET_NAME'].str.contains('mars', case=False)]

Unnamed: 0,DATA_SET.DATA_SET_ID,DATA_SET.DATA_SET_NAME,DATA_SET.DATA_ACCESS_REFERENCE,DATA_SET.XML_DESCRIPTION,DATA_SET.PRODUCER.FULL_NAME,DATA_SET.PRODUCER.INSTITUTION_NAME,DATA_SET.PRODUCER.NODE_NAME,DATA_SET.START_TIME,DATA_SET.STOP_TIME,DATA_SET.NPRODUCTS,DATA_SET.MISSION_NAME,DATA_SET.INSTRUMENT_ID,DATA_SET.INSTRUMENT_NAME,DATA_SET.TARGET_NAME,RESOURCE_CLASS,DATA_SET.RELEASE_DATE
34,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT1-V1.0,MARS EXPRESS ASPERA-3 RAW-CAL NTRL PARTICLE IM...,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='MEX...,,,,2006-01-01T22:42:28.096,2007-10-01T01:42:54.52,1532,MARS EXPRESS,ASPERA-3,ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (...,MARS,DATA_SET,2007-01-31
35,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT2-V1.0,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT2-V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='MEX...,,,,2007-10-01T04:39:48.77,2009-12-31T22:38:25.988,2302,MARS EXPRESS,ASPERA-3,ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (...,MARS,DATA_SET,2007-01-31
36,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT3-V1.0,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT3-V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='MEX...,,,,2010-01-01T15:49:39.192,2012-12-31T22:26:30.631,2026,MARS EXPRESS,ASPERA-3,ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (...,MARS,DATA_SET,2011-01-19
37,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT4-V1.0,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT4-V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='MEX...,,,,2013-01-01T01:34:28.707,2015-01-01T03:40:04.823,1948,MARS EXPRESS,ASPERA-3,ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (...,MARS,DATA_SET,2017-02-07
38,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT5-V1.0,MEX-M-ASPERA3-2/3-EDR/RDR-NPI-EXT5-V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='MEX...,,,,2015-01-01T04:00:26.611,2017-01-01T02:45:17.442,1465,MARS EXPRESS,ASPERA-3,ANALYZER OF SPACE PLASMA AND ENERGETIC ATOMS (...,MARS,DATA_SET,2017-02-07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7080,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='RO-...,,,,2006-07-29T00:00:27.741,2007-05-28T23:57:32.769,304,INTERNATIONAL ROSETTA MISSION,,STANDARD RADIATION ENVIROMENT MONITOR,MARS,DATA_SET,2020-12-08
7129,RO-X-SREM-5-MARS-V1.0,ROSETTA-ORBITER X SREM 5 MARS V1.0,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='RO-...,,,,2006-07-29T00:00:27.741,2007-05-28T23:57:32.769,303,INTERNATIONAL ROSETTA MISSION,,STANDARD RADIATION ENVIROMENT MONITOR,MARS,DATA_SET,2020-12-08
7177,urn:esa:psa:em16_tgo_acs,ACS,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='urn...,,,,2016-03-14T00:00:00,2021-03-07T20:06:09.614,2281756,ExoMars 2016,ACS,ACS,urn:nasa:pds:context:target:planet.mars,DATA_SET,2020-07-31
7178,urn:esa:psa:em16_tgo_cas,Instrument CASSIS,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,http://psa.esa.int/pdap/files?DATA_SET_ID='urn...,,,,2016-03-14T00:00:00,2021-03-07T21:44:08.425,4166552,ExoMars 2016,CaSSIS,CASSIS,urn:nasa:pds:context:target:planet.mars,DATA_SET,2020-04-30
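
As a further illustration of slicing the dataset table, the sketch below filters on the target and then totals the product counts per mission. It uses a tiny hand-built DataFrame as a stand-in for the output of `get_datasets` (three rows copied from the tables above), so it runs without network access. Note that the case-insensitive substring match is what catches both the PDS3 target name (`MARS`) and the PDS4 URN form.

```python
import pandas as pd

# Stand-in for the DataFrame returned by get_datasets(); rows copied from the output above
dsets = pd.DataFrame({
    'DATA_SET.DATA_SET_ID': ['RO-X-SREM-2-MARS-V1.0',
                             'urn:esa:psa:em16_tgo_acs',
                             'CH1ORB-L-C1XS-2-NPO-EDR-V1.0'],
    'DATA_SET.MISSION_NAME': ['INTERNATIONAL ROSETTA MISSION',
                              'ExoMars 2016',
                              'CHANDRAYAAN-1'],
    'DATA_SET.TARGET_NAME': ['MARS',
                             'urn:nasa:pds:context:target:planet.mars',
                             'MOON'],
    'DATA_SET.NPRODUCTS': [304, 2281756, 11738],
})

# Case-insensitive substring match; na=False treats missing targets as non-matches
is_mars = dsets['DATA_SET.TARGET_NAME'].str.contains('mars', case=False, na=False)
mars = dsets[is_mars]

# Total number of products per mission among the Mars datasets
per_mission = mars.groupby('DATA_SET.MISSION_NAME')['DATA_SET.NPRODUCTS'].sum()
print(per_mission)
```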


## `get_products`

In [6]:
p.get_products?

Signature: p.get_products(dataset_id)
Docstring:
Queries the meta-data endpoint for products in the dataset ID
given in the call
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/pdap.py
Type:      method


In [7]:
products = p.get_products(dataset_id='RO-X-SREM-2-MARS-V1.0')

In [8]:
products.head()

Unnamed: 0,PRODUCT.PRODUCT_ID,PRODUCT.DATA_ACCESS_REFERENCE,DATA_SET.DATA_SET_ID,DATA_SET.DATA_SET_NAME,DATA_SET.MISSION_NAME,DATA_SET.PRODUCER.FULL_NAME,DATA_SET.PRODUCER.INSTITUTION_NAME,DATA_SET.PRODUCER.NODE_NAME,PRODUCT.TARGET_NAME,PRODUCT.TARGET_TYPE,PRODUCT.INSTRUMENT_ID,PRODUCT.INSTRUMENT_NAME,PRODUCT.START_TIME,PRODUCT.STOP_TIME,PRODUCT.ICON_ACCESS_REFERENCE,RESOURCE_CLASS,VID
0,RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,INTERNATIONAL ROSETTA MISSION,,,,MARS,PLANET,,STANDARD RADIATION ENVIROMENT MONITOR,2007-05-28T00:03:03.755,2007-05-28T23:57:32.769,,PRODUCT,1.0
1,RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070527,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,INTERNATIONAL ROSETTA MISSION,,,,MARS,PLANET,,STANDARD RADIATION ENVIROMENT MONITOR,2007-05-27T00:00:04.24,2007-05-27T23:59:04.255,,PRODUCT,1.0
2,RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070526,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,INTERNATIONAL ROSETTA MISSION,,,,MARS,PLANET,,STANDARD RADIATION ENVIROMENT MONITOR,2007-05-26T00:00:28.225,2007-05-26T23:56:03.74,,PRODUCT,1.0
3,RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070525,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,INTERNATIONAL ROSETTA MISSION,,,,MARS,PLANET,,STANDARD RADIATION ENVIROMENT MONITOR,2007-05-25T00:00:00.211,2007-05-25T23:56:32.225,,PRODUCT,1.0
4,RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070524,http://psa.esa.int/pdap/download?RESOURCE_CLAS...,RO-X-SREM-2-MARS-V1.0,ROSETTA-ORBITER X SREM 2 MARS V1.0,INTERNATIONAL ROSETTA MISSION,,,,MARS,PLANET,,STANDARD RADIATION ENVIROMENT MONITOR,2007-05-24T00:02:32.696,2007-05-24T23:56:00.211,,PRODUCT,1.0


Let's look at what one entry contains:

In [9]:
products.iloc[0]

PRODUCT.PRODUCT_ID                          RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528
PRODUCT.DATA_ACCESS_REFERENCE         http://psa.esa.int/pdap/download?RESOURCE_CLAS...
DATA_SET.DATA_SET_ID                                              RO-X-SREM-2-MARS-V1.0
DATA_SET.DATA_SET_NAME                               ROSETTA-ORBITER X SREM 2 MARS V1.0
DATA_SET.MISSION_NAME                                     INTERNATIONAL ROSETTA MISSION
DATA_SET.PRODUCER.FULL_NAME                                                            
DATA_SET.PRODUCER.INSTITUTION_NAME                                                     
DATA_SET.PRODUCER.NODE_NAME                                                            
PRODUCT.TARGET_NAME                                                                MARS
PRODUCT.TARGET_TYPE                                                              PLANET
PRODUCT.INSTRUMENT_ID                                                               N/A
PRODUCT.INSTRUMENT_NAME         

## `get_product`

In [10]:
p.get_product?

Signature: p.get_product(product_id)
Docstring: <no docstring>
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/pdap.py
Type:      method


In [11]:
p.get_product('RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528')

PRODUCT.PRODUCT_ID                          RO-X-SREM-2-MARS-V1.0:DATA:SREM_L2_20070528
PRODUCT.DATA_ACCESS_REFERENCE         http://psa.esa.int/pdap/download?RESOURCE_CLAS...
DATA_SET.DATA_SET_ID                                              RO-X-SREM-2-MARS-V1.0
DATA_SET.DATA_SET_NAME                               ROSETTA-ORBITER X SREM 2 MARS V1.0
DATA_SET.MISSION_NAME                                     INTERNATIONAL ROSETTA MISSION
DATA_SET.PRODUCER.FULL_NAME                                                            
DATA_SET.PRODUCER.INSTITUTION_NAME                                                     
DATA_SET.PRODUCER.NODE_NAME                                                            
PRODUCT.TARGET_NAME                                                                MARS
PRODUCT.TARGET_TYPE                                                              PLANET
PRODUCT.INSTRUMENT_ID                                                               N/A
PRODUCT.INSTRUMENT_NAME         

## `get_files`

Uses the files endpoint to retrieve a list of files in a given dataset.

In [12]:
files = p.get_files(dataset_id='RO-X-SREM-2-MARS-V1.0')

In [13]:
files.head()

Unnamed: 0,Reference,DataSetId,ProductId,RELATIVE_DIRECTORY,Filename
0,http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...,RO-X-SREM-2-MARS-V1.0,SREM_L2_20060918,/DATA/,SREM_L2_20060918.TAB
1,http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...,RO-X-SREM-2-MARS-V1.0,SREM_L2_20061227,/DATA/,SREM_L2_20061227.TAB
2,http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...,RO-X-SREM-2-MARS-V1.0,SREM_L2_20060926,/DATA/,SREM_L2_20060926.TAB
3,http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...,RO-X-SREM-2-MARS-V1.0,,/EXTRAS/,SREM_ROSETTA_PACC_20070508.CDF
4,http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...,RO-X-SREM-2-MARS-V1.0,,/EXTRAS/,SREM_ROSETTA_PACC_20070419.CDF


In [14]:
files.iloc[0]

Reference             http://psa.esa.int/pdap/fileaccess?ID=INTERNAT...
DataSetId                                         RO-X-SREM-2-MARS-V1.0
ProductId                                              SREM_L2_20060918
RELATIVE_DIRECTORY                                               /DATA/
Filename                                           SREM_L2_20060918.TAB
Name: 0, dtype: object
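
The file list is an ordinary DataFrame, so it can be narrowed down before anything is downloaded. The sketch below uses a small hand-built stand-in with the column names shown above (the rows are taken from the output) to keep only files under `/DATA/` and to count files per extension.

```python
import pandas as pd

# Stand-in for the DataFrame returned by get_files(); rows copied from the output above
files = pd.DataFrame({
    'ProductId': ['SREM_L2_20060918', 'SREM_L2_20061227', None],
    'RELATIVE_DIRECTORY': ['/DATA/', '/DATA/', '/EXTRAS/'],
    'Filename': ['SREM_L2_20060918.TAB',
                 'SREM_L2_20061227.TAB',
                 'SREM_ROSETTA_PACC_20070508.CDF'],
})

# Keep only the files stored in the /DATA/ directory
data_files = files[files['RELATIVE_DIRECTORY'] == '/DATA/']

# Count files per extension (the text after the last dot in the filename)
by_ext = files['Filename'].str.rsplit('.', n=1).str[-1].value_counts()
print(by_ext)
```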

# tap

The `tap` module contains a single class `PsaTap` and various convenience functions that call this class. It is basically a very thin wrapper around `astroquery`'s TAP functionality.

In [15]:
psa = tap.PsaTap()

Currently `PsaTap` includes only a single method, `query`, which calls the `astroquery` TAP function and converts the returned data to a pandas DataFrame.

## `query`

In [16]:
psa.query?

Signature:
psa.query(
    q,
    sync=True,
    dropna=True,
    verbose=False,
    job_wait_cycles=10,
    job_wait_time=2,
)
Docstring: Make a simple query and return the data as a pandas DataFrame
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/tap.py
Type:      method


By default you can simply pass a query, which runs as a synchronous job and returns the result. For queries that may return more than 2000 results, set `sync=False`; you can then adjust the number of wait cycles, and the duration of each (in seconds), before the query aborts.
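
To make the role of the two wait parameters concrete, here is a minimal polling sketch. It is not the package's implementation: `wait_for_job` and its `get_phase` callable are hypothetical names, and the phase strings are assumed to follow the usual asynchronous TAP job states. Under that reading, the defaults in the signature (10 cycles of 2 seconds) would give up after roughly 20 seconds.

```python
import time

def wait_for_job(get_phase, job_wait_cycles=10, job_wait_time=2):
    """Poll get_phase() until the job completes or the cycles run out.

    get_phase is a hypothetical callable returning a status string such as
    'EXECUTING' or 'COMPLETED'. Returns True if the job completed in time.
    """
    for _ in range(job_wait_cycles):
        if get_phase() == 'COMPLETED':
            return True
        time.sleep(job_wait_time)  # wait before checking again
    return False

# Simulated job that reports completion on the third poll
phases = iter(['EXECUTING', 'EXECUTING', 'COMPLETED'])
finished = wait_for_job(lambda: next(phases), job_wait_cycles=5, job_wait_time=0)
print(finished)
```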

Note also the dropna boolean - if True, any columns which are *all* NaN will be dropped from the returned DataFrame. This is useful because the epn_core schema contains a lot of fields which are not _yet_ populated in the PSA database.
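
The effect of `dropna` can be reproduced with plain pandas. The sketch below assumes the behaviour is equivalent to `DataFrame.dropna(axis=1, how='all')`; the column name `unpopulated_field` is made up for illustration. A column with at least one value survives, while a column that is entirely NaN is removed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'instrument_name': ['C1XS', 'C1XS'],
    'target_name': ['Moon', None],           # partly populated: kept
    'unpopulated_field': [np.nan, np.nan],   # hypothetical all-NaN column: dropped
})

# Drop only those columns in which every value is missing
cleaned = df.dropna(axis=1, how='all')
print(list(cleaned.columns))
```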

In [17]:
top10 = psa.query('select top 10 * from epn_core')

In [18]:
top10.head()

Unnamed: 0,access_estsize,access_format,access_url,creation_date,dataproduct_type,granule_gid,granule_uid,instrument_host_name,instrument_name,measurement_type,...,processing_level,release_date,service_title,spatial_frame_type,s_region,target_class,target_name,thumbnail_url,time_max,time_min
0,24,application/x-pds-zip,https://archives.esac.esa.int/psa/pdap/downloa...,2021-03-08T14:04:30.978734,ci,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...,Chandrayaan-1,C1XS,,...,3,2019-07-27T00:00:00.0,psa,none,,satellite,Moon,,2008-11-20 18:55:51.000009472,2008-11-20 18:27:32.000011520
1,25,application/x-pds-zip,https://archives.esac.esa.int/psa/pdap/downloa...,2021-03-08T14:04:30.978734,ci,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...,Chandrayaan-1,C1XS,,...,3,2019-07-27T00:00:00.0,psa,none,,satellite,Moon,,2008-11-23 00:24:13.999999232,2008-11-22 23:38:54.999981568
2,24,application/x-pds-zip,https://archives.esac.esa.int/psa/pdap/downloa...,2021-03-08T14:04:30.978734,ci,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...,Chandrayaan-1,C1XS,,...,3,2019-07-27T00:00:00.0,psa,none,,satellite,Moon,,2008-11-24 02:31:26.000002816,2008-11-24 02:31:26.000002816
3,24,application/x-pds-zip,https://archives.esac.esa.int/psa/pdap/downloa...,2021-03-08T14:04:30.978734,ci,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...,Chandrayaan-1,C1XS,,...,3,2019-07-27T00:00:00.0,psa,none,,satellite,Moon,,2008-11-24 04:00:05.999994368,2008-11-24 04:00:05.999994368
4,24,application/x-pds-zip,https://archives.esac.esa.int/psa/pdap/downloa...,2021-03-08T14:04:30.978734,ci,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA,CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...,Chandrayaan-1,C1XS,,...,3,2019-07-27T00:00:00.0,psa,none,,satellite,Moon,,2008-11-28 14:45:38.000008448,2008-11-28 14:45:38.000008448


Let's look at one single entry:

In [19]:
top10.iloc[0]

access_estsize                                                         24
access_format                                       application/x-pds-zip
access_url              https://archives.esac.esa.int/psa/pdap/downloa...
creation_date                                  2021-03-08T14:04:30.978734
dataproduct_type                                                       ci
granule_gid                             CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA
granule_uid             CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...
instrument_host_name                                        Chandrayaan-1
instrument_name                                                      C1XS
measurement_type                                                         
modification_date                              2021-03-08T14:04:30.978734
obs_id                  CH1ORB-L-C1XS-2-NPO-EDR-V1.0:DATA:C1XS_NECAX_R...
processing_level                                                        3
release_date                          

Note also that `time_min` and `time_max` have been converted from Julian dates to standard date/times.
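
For reference, a Julian-date conversion of this kind can be done directly in pandas. This is only a sketch of one way to do it and may differ from what the package does internally; the input values are chosen so that the result is easy to check (Julian date 2440587.5 is midnight on 1 January 1970).

```python
import pandas as pd

# Julian dates for 1970-01-01 00:00 and 1970-01-02 00:00
jd = pd.Series([2440587.5, 2440588.5])

# origin='julian' requires unit='D' (days)
times = pd.to_datetime(jd, unit='D', origin='julian')
print(times)
```

Because Julian dates are floating-point day counts, a conversion like this can leave tiny sub-microsecond residuals, which is probably the origin of the trailing digits in the time stamps shown above.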

Now let's try exceeding that 2k limit, also increasing astroquery's verbosity:

In [20]:
size_test = psa.query("select * from epn_core where instrument_name='OSIRIS'", verbose=True)

Launched query: 'select  TOP 2000 * from epn_core where instrument_name='OSIRIS''
------>https
host = archives.esac.esa.int:443
context = /psa/epn-tap/tap//sync
Content-type = application/x-www-form-urlencoded
200 
[('Date', 'Mon, 08 Mar 2021 17:46:50 GMT'), ('Server', 'Apache/2.4.6 (Red Hat Enterprise Linux)'), ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'), ('Pragma', 'no-cache'), ('Expires', '0'), ('X-XSS-Protection', '1; mode=block'), ('X-Frame-Options', 'DENY'), ('X-Content-Type-Options', 'nosniff'), ('Content-Type', 'application/x-votable+xml'), ('Set-Cookie', 'JSESSIONID=1C2A96F3B34D2B9EAF6F85EDAD9FCA3A;path=/psa/epn-tap;HttpOnly'), ('Vary', 'Accept-Encoding'), ('Transfer-Encoding', 'chunked')]
Retrieving sync. results...
Query finished.


In [21]:
len(size_test)

2000

So here we see that astroquery actually inserts an extra `TOP 2000` clause into the query statement. Note that if you specify `TOP` in your query, you automatically override this:

In [22]:
size_test = psa.query("select top 3000 * from epn_core where instrument_name='OSIRIS'", verbose=True)

Launched query: 'select top 3000 * from epn_core where instrument_name='OSIRIS''
------>https
host = archives.esac.esa.int:443
context = /psa/epn-tap/tap//sync
Content-type = application/x-www-form-urlencoded
200 
[('Date', 'Mon, 08 Mar 2021 17:46:55 GMT'), ('Server', 'Apache/2.4.6 (Red Hat Enterprise Linux)'), ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'), ('Pragma', 'no-cache'), ('Expires', '0'), ('X-XSS-Protection', '1; mode=block'), ('X-Frame-Options', 'DENY'), ('X-Content-Type-Options', 'nosniff'), ('Content-Type', 'application/x-votable+xml'), ('Set-Cookie', 'JSESSIONID=BC585CFC3BAFDB3CE865FD0271FC609F;path=/psa/epn-tap;HttpOnly'), ('Vary', 'Accept-Encoding'), ('Transfer-Encoding', 'chunked')]
Retrieving sync. results...
Query finished.


In [23]:
len(size_test)

3000
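
Since an explicit `TOP` clause overrides the default limit, it can be convenient to set it programmatically. The helper below, `with_top`, is a hypothetical convenience function and not part of `psa_utils`; it only handles simple queries that start with `select` (not `select distinct`, for example).

```python
import re

def with_top(query, n):
    """Return query with an explicit TOP n clause, replacing any existing one."""
    # Remove an existing "TOP <number>" that directly follows SELECT
    stripped = re.sub(r'(?i)^\s*select\s+top\s+\d+\s+', 'select ', query)
    # Insert the requested limit directly after SELECT
    return re.sub(r'(?i)^\s*select\s+', f'select top {n} ', stripped, count=1)

limited = with_top("select * from epn_core where instrument_name='OSIRIS'", 3000)
print(limited)
```

The resulting string can then be passed to `psa.query` as in the cells above.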

Queries can also be run asynchronously, which is especially useful for larger queries:

In [24]:
asyn = psa.query("select top 10000 * from epn_core where instrument_name='ACS'", 
          sync=False, job_wait_cycles=2, job_wait_time=10)

In [25]:
len(asyn)

10000

You can of course also perform queries that return values other than a product list:

In [26]:
psa.query("select count(*) from epn_core where instrument_name='MCAM'")

Unnamed: 0,count_all
0,1715


Note that products have an access_url which will download the product - this is used in the `download` module.

In [27]:
psa.query("select count(*) from epn_core where instrument_name='CaSSIS'")

Unnamed: 0,count_all
0,4087604


In [28]:
psa.query("select count(*) from epn_core where instrument_name='CaSSIS' and granule_uid like '%sti%'")

Unnamed: 0,count_all
0,24435


In [29]:
cassis = psa.query("select top 10 * from epn_core where instrument_name='CaSSIS' and granule_uid like '%sti%'")

In [30]:
cassis.iloc[0]

access_estsize                                                     112442
access_format                                       application/x-pds-zip
access_url              https://archives.esac.esa.int/psa/pdap/downloa...
creation_date                                  2021-03-08T14:04:30.978734
dataproduct_type                                                       ci
granule_gid                      urn:esa:psa:em16_tgo_cas:data_calibrated
granule_uid             urn:esa:psa:em16_tgo_cas:data_calibrated:cas_c...
instrument_host_name                                         ExoMars 2016
instrument_name                                                    CaSSIS
measurement_type                                                         
modification_date                              2021-03-08T14:04:30.978734
obs_id                  urn:esa:psa:em16_tgo_cas:data_calibrated:cas_c...
processing_level                                                        3
release_date                          

So you can always use this `access_url` to download the data product - the `download` module has functions to do just this.

# download

## `download_by_query`
This is the key function in this module - it accepts an ADQL query string, uses the `tap` module to run the query, downloads the referenced files (if they are public) and unzips them to the location of your choice.

In [31]:
download.download_by_query?

Signature: download.download_by_query(query, output_dir='.', unzip=True, tidy=True)
Docstring:
Runs a query against the PSA's EPN-TAP interface. Any products which match,
and are public (have a download URL) will be downloaded and the zips placed
into output_dir. If unzip=True they will be unzipped into output_dir and
if tidy=True the zips will be removed after use
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/download.py
Type:      function


In [32]:
q = "select top 2 * from epn_core where instrument_name='OSIRIS' and granule_uid like '%.FIT%'"

In [33]:
filelist = download.download_by_query(q, output_dir='/tmp')

INFO 2021-03-08 18:47:36 (psa_utils.download): downloading product N20080814T023137516ID20F22.FIT
INFO 2021-03-08 18:47:39 (psa_utils.download): downloaded file Download-20210308184736.zip
INFO 2021-03-08 18:47:39 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:40 (psa_utils.download): downloaded file Download-20210308184739.zip


In [34]:
filelist

['/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080814T023137516ID20F22.LBL',
 '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080804T023124677ID20F22.LBL',
 '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080804T023124677ID20F22.FIT',
 '/tmp/inventory.txt',
 '/tmp/RO-A-OSINAC-2-AST1-STEINSFLYBY-V2.0/DATA/FIT/N20080814T023137516ID20F22.FIT']

Note that in most cases these default settings are what you want when retrieving products. If you prefer to keep the zips after extraction, set `tidy=False`; you will still get the list of individual files returned. If you just want to download the zips, set `unzip=False` and you will get the list of zips returned:

In [35]:
filelist = download.download_by_query(q, output_dir='/tmp', unzip=False)

INFO 2021-03-08 18:47:46 (psa_utils.download): downloading product N20080814T023137516ID20F22.FIT
INFO 2021-03-08 18:47:48 (psa_utils.download): downloaded file Download-20210308184746.zip
INFO 2021-03-08 18:47:48 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:49 (psa_utils.download): downloaded file Download-20210308184748.zip


In [36]:
filelist

['/tmp/Download-20210308184746.zip', '/tmp/Download-20210308184748.zip']
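
The `unzip` and `tidy` options described above amount to an extract-then-delete step. The sketch below shows that step using only the standard library; `unpack` is a hypothetical helper written for illustration, not the function used inside `psa_utils`, and the archive it operates on is a throwaway file created on the spot.

```python
import os
import tempfile
import zipfile

def unpack(zip_path, output_dir, tidy=True):
    """Extract zip_path into output_dir and return the extracted file paths."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(output_dir)
        # Skip directory entries, whose names end with '/'
        extracted = [os.path.join(output_dir, name)
                     for name in zf.namelist() if not name.endswith('/')]
    if tidy:
        os.remove(zip_path)  # drop the archive once its contents are on disk
    return extracted

# Demonstrate with a small archive in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, 'Download-example.zip')
    with zipfile.ZipFile(zip_path, 'w') as zf:
        zf.writestr('DATA/example.LBL', 'label text')
    unpacked = unpack(zip_path, tmp)
    zip_removed = not os.path.exists(zip_path)
    all_present = all(os.path.exists(p) for p in unpacked)

print(len(unpacked), zip_removed, all_present)
```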

There is a convenience function to download by the logical identifier, or product ID:

In [37]:
download.download_by_lid?

Signature: download.download_by_lid(lid, output_dir='.', unzip=True, tidy=True)
Docstring: <no docstring>
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/download.py
Type:      function


In [38]:
download.download_by_lid('N20080804T023124677ID20F22', output_dir='/tmp')

INFO 2021-03-08 18:47:54 (psa_utils.download): downloading product N20080804T023124677ID20F22.FIT
INFO 2021-03-08 18:47:55 (psa_utils.download): downloaded file Download-20210308184755.zip
INFO 2021-03-08 18:47:55 (psa_utils.download): downloading product N20080804T023124677ID20F22.IMG
INFO 2021-03-08 18:47:57 (psa_utils.download): downloaded file Download-20210308184755.zip


Note that all this really does is look for records with `granule_uid` matching the substring passed. In this case the same data were archived as both a PDS3 .IMG image and a FITS file, so both were matched.
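
If only one of the matching products is wanted, the substring can be made more specific. The sketch below builds such a query by hand; `lid_query` is a hypothetical helper, and it assumes that the `granule_uid` includes the file extension, as the log messages above suggest. It does no escaping, so it is only suitable for trusted identifiers.

```python
def lid_query(lid, extension=''):
    """Build an ADQL query matching granule_uid values that contain lid + extension."""
    return f"select * from epn_core where granule_uid like '%{lid}{extension}%'"

# Restrict the match to the FITS version of the product
q_fits = lid_query('N20080804T023124677ID20F22', extension='.FIT')
print(q_fits)
```

The resulting string could then be passed to `download.download_by_query` in the same way as the query used earlier.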

## `download_labels_by_query`

Sometimes you cannot narrow down a search using TAP or PDAP, but need to look into the custom meta-data in the products. In this case you do not want to download gigabytes of data, but only the labels. 

This is a bit of a "hack" using the PDAP `files` endpoint with EPN-TAP, but it works with the following caveats:
- large (especially PDS4) datasets currently time out on PDAP; this is a known issue
- PDS3 products with attached labels are skipped, since downloading these would defeat the purpose of retrieving only the small labels

In [39]:
download.download_labels_by_query("select top 10 * from epn_core where instrument_name='MCAM'", output_dir='/tmp')

INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170632_01_f__a0100.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170738_02_f__t0005.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170748_03_f__t0020.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170758_04_f__t0040.xml
INFO 2021-03-08 18:47:59 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170808_05_f__t0080.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181020t170818_06_f__t0200.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181026t103952_00_f__t0020.xml
INFO 2021-03-08 18:48:00 (psa_utils.download): downloaded file cam_raw_sc_cam1_image_20181027t060956_39_f__t0020.xml
INFO 2021-03-08 18:48:01 (psa_utils.download): downloaded file c
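
Once the labels are on disk, the custom meta-data can be inspected with any XML parser. The sketch below uses the standard library on a minimal, made-up PDS4 label held in a string; the logical identifier in it is invented for illustration, and a real label contains far more. For a downloaded file, `ET.parse(path).getroot()` would replace `ET.fromstring(label)`.

```python
import xml.etree.ElementTree as ET

# PDS4 core namespace, mapped to a prefix for use in search paths
NS = {'pds': 'http://pds.nasa.gov/pds4/pds/v1'}

# Minimal made-up label; a real PDS4 label has many more elements
label = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <Identification_Area>
    <logical_identifier>urn:esa:psa:example_bundle:data_raw:example_product</logical_identifier>
    <version_id>1.0</version_id>
  </Identification_Area>
</Product_Observational>"""

root = ET.fromstring(label)

# Look up individual fields by their namespaced path
lid = root.findtext('pds:Identification_Area/pds:logical_identifier', namespaces=NS)
vid = root.findtext('pds:Identification_Area/pds:version_id', namespaces=NS)
print(lid, vid)
```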

# packager

This is a module specifically designed to package PDS4 products into a delivery package recognised by the PSA. It is probably not useful unless you are an external data provider preparing PDS4 products (not bundles), or an internal user!

In [40]:
packager.Packager?

Init signature:
packager.Packager(
    products='*.xml',
    input_dir='.',
    recursive=True,
    output_dir='.',
    template=None,
    use_dir=False,
    clean=True,
)
Docstring:      <no docstring>
Init docstring:
Initialise the packager class. Accepts the following:

products - file pattern to match labels (*.xml default)
input_dir - the root directory for the labels (default=.)
use_dir - uses the product directory stru

Most inputs should be clear - you need to specify the location and wildcard to match the input files - note that a delivery can only be for a single bundle, so use these filters to ensure you select products belonging to a single bundle!

A template PDS4 label is needed to complete the product delivery label. A default template is bundled with the source code, but should you wish to specify a different one, this can be done with the `template` keyword.

The main additional option is to set `use_dir`. If true, the path from `input_dir` will be used to build the archive structure (e.g. the directory structure users will see when downloading the files). If false, then all files will be placed in the root of the collection.
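
The difference between the two `use_dir` settings can be pictured with a small path calculation. This is an illustration of the behaviour described above, not the packager's own code; `archive_name` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

def archive_name(label_path, input_dir, use_dir=False):
    """Return the path a file would get inside the delivery (illustrative only)."""
    label_path = PurePosixPath(label_path)
    if use_dir:
        # Keep the directory structure below input_dir
        return str(label_path.relative_to(input_dir))
    # Otherwise place the file in the root of the collection
    return label_path.name

label = '/data/bundle/data_raw/2018/product.xml'
nested = archive_name(label, '/data/bundle', use_dir=True)
flat = archive_name(label, '/data/bundle', use_dir=False)
print(nested)  # data_raw/2018/product.xml
print(flat)    # product.xml
```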

# geogen

This module contains some routines to help working with the PSA geometry generator (geogen). These are likely only of interest to internal users.

## `generate_plf`

In [41]:
geogen.generate_plf?

Signature:
geogen.generate_plf(
    config_file,
    files=None,
    directory='.',
    table=None,
    extras={},
)
Docstring:
Generates a GEOGEN plf input file.

pds4_utils.Database() is used to scrape meta-data according to the config_file.
files= specifies the label file pattern (defaults to *.xml)
directory= specified the root of the input files (and the output location)
table= specifies the table name in case the input file is configured to
    produce more than one. Default (None) assumes only one table.
extras = a dictionary which provides extra static key/value 

Note that for this to work, `pds4_utils` must be installed. The `config_file` listed here is a `pds4_utils.Database` configuration file which is used to scrape relevant meta-data from a collection of PDS4 products and output to a json file in the format expected by geogen.

# internal

## `Ingest_Test`

This uses a configuration file and a template to replicate a single PDS4 product according to the name, type, sub-instruments etc. given in the configuration file.

In [42]:
internal.Ingest_Test?

Init signature:
internal.Ingest_Test(
    config_file='ingestion_test.yml',
    template_label='test_product.xml',
    output_dir='.',
    package=False,
)
Docstring:
A class for generating test products from a label and data product
template and a configuration file specifying the instrument-specific
data
File:           ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:           type
Subclasses:


## `build_context_json`

This routine builds a `local_context_products.json` file as used by the PDS validate tool. It is designed for PSA internal use.

It requires `pds4_utils`.

In [43]:
internal.build_context_json?

Signature:
internal.build_context_json(
    config_file,
    input_dir='.',
    output_dir='.',
    json_name='local_context_products.json',
    table='context_bundle',
)
Docstring:
Generates a json file listing the name, type and LIDVID of all
context files in input_dir. Generates a local context json file
which can be used by the PDS validate tool and writes it to
output_dir

pds4_utils.Database() is used to scrape meta-data according to the config_file.
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:      function


## `collection_summary`

This routine generates an HTML table corresponding to specific meta-data scraped from instrument and mission context products. It is used to prototype the information needed to populate Google Dataset Search and a DOI landing page.

In [44]:
internal.collection_summary?

Signature:
internal.collection_summary(
    config_file,
    input_dir='.',
    output_dir='.',
    context_dir='.',
)
Docstring:
collection_summary accesses meta-data in a collection label
or referenced from it, to produce a set of summary information
needed to register a DOI and/or create a Google Dataset
Search landing page.
File:      ~/Dropbox/work/bepi/software/psa_utils/psa_utils/internal.py
Type:      function
