Reference

class cds_downloader.Downloader(cds_product, cds_filter, **kwargs)

The Downloader class provides common functionality for automated climate data download from the Copernicus Climate Data Store (cds.climate.copernicus.eu).

To use the downloader, create a file with your user credentials as described in the CDS api-how-to guide. Alternatively, define the two environment variables ‘CDSAPI_URL’ and ‘CDSAPI_KEY’ with your CDS user credentials.
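
As a minimal sketch of the environment-variable route, the two variables can be set from Python before a Downloader is created. The URL and key below are placeholders, not real credentials; the actual values come from your CDS account.

```python
import os

# Placeholder values: replace with the URL and key from your CDS account.
os.environ["CDSAPI_URL"] = "https://cds.climate.copernicus.eu/api"
os.environ["CDSAPI_KEY"] = "<your-api-key>"

# Check that both variables are present before creating a Downloader.
missing = [name for name in ("CDSAPI_URL", "CDSAPI_KEY") if not os.environ.get(name)]
print("missing credentials:", missing)
```

Setting the variables in the shell or in a process manager works equally well; the check simply reports which of the two names is still unset.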

classmethod from_cds(cds_product, cds_filter, **kwargs)

Create a Downloader from a cds product string and a cds filter dictionary

Parameters:

cds_product (string) – the cds product string
cds_filter (dict) – the cds filter dictionary

classmethod from_dict(dct_config)

Create Downloader from dictionary

Parameters:	dct_config (dict) – a dictionary with keys ‘cds_product’ and ‘cds_filter’

classmethod from_json(json_config_path)

Create Downloader from json file

Parameters:	json_config_path (string) – path to json config file
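
The sketch below writes a config file that could be passed to from_json. It assumes the JSON file uses the same two keys that from_dict expects (‘cds_product’ and ‘cds_filter’); the file name and the helper build_downloaders are hypothetical and only illustrate the two call styles.

```python
import json
import os
import tempfile

# Assumption: the JSON config mirrors the dictionary accepted by from_dict.
config = {
    "cds_product": "reanalysis-era5-single-levels",
    "cds_filter": {
        "product_type": "reanalysis",
        "format": "grib",
        "variable": ["total_precipitation"],
        "year": ["2020"],
        "month": ["09"],
        "day": ["01", "02", "03"],
    },
}

# Write the config to a temporary JSON file (hypothetical file name).
config_path = os.path.join(tempfile.mkdtemp(), "cds_config.json")
with open(config_path, "w") as fh:
    json.dump(config, fh, indent=2)

# Read it back to confirm the file round-trips unchanged.
with open(config_path) as fh:
    loaded = json.load(fh)


def build_downloaders(path, dct_config):
    """Hypothetical helper: create one Downloader per configuration source."""
    from cds_downloader import Downloader  # requires the package and credentials

    return Downloader.from_json(path), Downloader.from_dict(dct_config)
```

With the package installed and credentials configured, build_downloaders(config_path, loaded) would return two downloaders built from the same configuration.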

get_data(storage_path, split_keys=None, overwrite=False)

This method downloads the requested data from the Climate Data Store.

Parameters:

storage_path (string) – target storage path
split_keys (list-like, optional) – keys of the cds_filter used to split the request. The maximum size of a single data request depends on the Copernicus Climate Data Store and is extracted automatically from its metadata web API. If split_keys=None, the method automatically chunks the cds request into multiple smaller requests and spawns a separate process for each of them. To do so, it extracts all list-like objects from the cds_filter (e.g. “year”, “month”, …) and splits the data into single requests/files. Setting split_keys to a list of keys from the cds_filter controls the splitting manually (e.g. split_keys=[“year”, “month”, “day”]).
overwrite (boolean) – Default is False. Set to True to overwrite existing files. This implies new requests to the Climate Data Store.
Returns:	processes – List of download process objects
Return type:	list of multiprocessing.Process

Examples

Download a small data collection with manual split_keys:

>>> from cds_downloader import Downloader
>>> x = Downloader.from_cds(
...         "reanalysis-era5-single-levels",
...         {
...             "product_type": "reanalysis",
...             "format": "grib",
...             "variable": ["total_precipitation"],
...             "year": ["2020"],
...             "month": ["09"],
...             "day": ["01", "02", "03"],
...             "area": [50.7, 3.6, 42.9, 17.2]
...         },
...     )
...
>>> x.get_data("/tmp", ["year","month","day"])
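
The effect of split_keys can be sketched as a Cartesian product over the selected filter entries. This is an illustration of the idea only, not the library's implementation; split_filter is a hypothetical helper.

```python
from itertools import product


def split_filter(cds_filter, split_keys):
    """Yield one request dict per combination of the values under split_keys."""
    # Normalise each split value to a list so scalar entries also work.
    values = [
        v if isinstance(v, (list, tuple)) else [v]
        for v in (cds_filter[k] for k in split_keys)
    ]
    for combo in product(*values):
        request = dict(cds_filter)  # shallow copy; non-split keys stay unchanged
        for key, value in zip(split_keys, combo):
            request[key] = [value]
        yield request


cds_filter = {
    "variable": ["total_precipitation"],
    "year": ["2020"],
    "month": ["09"],
    "day": ["01", "02", "03"],
}
requests = list(split_filter(cds_filter, ["year", "month", "day"]))
print(len(requests))  # 3: one request per day
```

Splitting on fewer keys, for example only ["year", "month"], would yield a single request that still contains all three days.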

get_data_for_date(storage_path, eval_date=datetime.datetime(2022, 1, 24, 11, 16, 30, 118379), **kwargs)

This method uses temporal information from the web API and downloads data for a specified date.

Parameters:

storage_path (string) – storage path of the data collection
eval_date (datetime.datetime or str) – date of the data fields

get_latest_daily_data(storage_path, date_latency=None, **kwargs)

This method uses temporal information from the web API and downloads only the latest day of the data. A latency relative to the current datetime can be specified.

Parameters:

storage_path (string) – storage path of the data collection
date_latency (datetime.timedelta or str or int, optional) – latency with respect to the current UTC date and time. If an integer is passed, the latency is interpreted as days.
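
The latency rule can be sketched as follows. The helper latest_date is hypothetical and is not part of cds_downloader; it handles only int and timedelta values, because the accepted string format is not documented here.

```python
from datetime import datetime, timedelta, timezone


def latest_date(date_latency=None, now=None):
    """Return the date to request, treating an int latency as a number of days."""
    now = now or datetime.now(timezone.utc)
    if date_latency is None:
        date_latency = timedelta(0)
    elif isinstance(date_latency, int):
        date_latency = timedelta(days=date_latency)
    return (now - date_latency).date()


reference = datetime(2022, 1, 24, 11, 16, tzinfo=timezone.utc)
print(latest_date(5, now=reference))                    # 2022-01-19
print(latest_date(timedelta(hours=12), now=reference))  # 2022-01-23
```

Passing a fixed reference time keeps the example reproducible; in normal use the current UTC time would be taken instead.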

update_data(storage_path, split_keys, date_until=datetime.datetime(2022, 1, 24, 11, 16, 30, 118384), date_latency=None, start_from_files=False)

This method provides update functionality for climate data collections retrieved with cds_downloader.Downloader.get_data().

It uses temporal information from the cds metadata web API to determine which data are missing. The latest file is downloaded again in order to avoid missing data.

Under development: only temporal split_keys are allowed, i.e. split_keys in [“year”, “month”, “day”, “time”].

Parameters:

storage_path (string) – storage path of data collection as string
split_keys (list of strings) – list of keys in cds_filter
date_until (datetime.datetime, optional) – update data collection until this date
date_latency (datetime.timedelta or str, optional) – temporal latency in relation to date_until
start_from_files (boolean, optional) – use first file of sorted file list as start reference date
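
How date_until and date_latency bound an update can be sketched with plain date arithmetic. The helper missing_days is hypothetical and only illustrates the idea of a daily update window; it starts at the last available day so that the latest file is fetched again.

```python
from datetime import date, timedelta


def missing_days(last_available, date_until, date_latency=timedelta(0)):
    """List the days from last_available to date_until minus date_latency, inclusive."""
    end = date_until - date_latency
    n_days = (end - last_available).days
    # An end date before last_available gives an empty list.
    return [last_available + timedelta(days=i) for i in range(n_days + 1)]


days = missing_days(date(2022, 1, 20), date(2022, 1, 24), timedelta(days=2))
print([d.isoformat() for d in days])  # ['2022-01-20', '2022-01-21', '2022-01-22']
```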