GeoEco.DataProducts.NASA.Earthdata.CMRGranuleSearcher

class GeoEco.DataProducts.NASA.Earthdata.CMRGranuleSearcher(username, password, queryParams, linkTitleRegEx, queryableAttributes=None, timeout=60, maxRetryTime=300, cacheDirectory=None, metadataCacheLifetime=None)

Bases: DatasetCollection

A DatasetCollection that queries the NASA Earthdata Common Metadata Repository (CMR) for granules.

This is a base class that should not be instantiated directly. Instead, derive a new class from it. The derived class constructor should be sure to call the base class constructor. The derived class should also override two methods:

  • _GetQueryableAttributeValuesForUrl(self, url, title) - given a URL to a granule and its title, return a dictionary that maps queryable attribute names to their values. The values should be derived from the URL. In general, the title should only be used in log messages or for similar display purposes.

  • _ConstructFoundObjectForUrl(self, url, title, queryableAttributeValues) - given the URL to a granule, its title, and a dictionary of queryable attribute values, return a DatasetCollection instance that represents the granule. For example, if the granule is a netCDF file, return a NetCDFFile instance. The class you derived from CMRGranuleSearcher should be used as the parent collection for the returned instance. queryableAttributeValues should also be used to initialize the returned instance.

For an example of a derived class, see GHRSSTLevel4EarthdataGranules.
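
As a rough illustration of the first override, the sketch below derives queryable attribute values from a granule URL with a regular expression. The URL layout, the attribute names (DateTime, Year, Month, Day), and the helper name parse_granule_url are assumptions made for this example and are not part of GeoEco. A real derived class would do this work inside _GetQueryableAttributeValuesForUrl and would match the URL scheme of its own collection.

```python
import datetime
import re

# Hypothetical URL layout: the file name starts with a YYYYMMDDHHMMSS
# timestamp followed by a hyphen, and ends with ".nc".
_TIMESTAMP = re.compile(r'/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})-[^/]*\.nc$')


def parse_granule_url(url, title):
    """Return a dict of (hypothetical) queryable attribute values for url.

    The values are derived from the URL only. The title is used just for
    the error message, as the class documentation recommends.
    """
    match = _TIMESTAMP.search(url)
    if match is None:
        raise ValueError(
            'Cannot parse a timestamp from the URL of granule "%s": %s' % (title, url))
    year, month, day, hour, minute, second = (int(g) for g in match.groups())
    return {
        'DateTime': datetime.datetime(year, month, day, hour, minute, second),
        'Year': year,
        'Month': month,
        'Day': day,
    }


# Example with a made-up URL and title.
values = parse_granule_url(
    'https://example.com/data/2020/001/20200101090000-EXAMPLE-L4.nc',
    'Download 20200101090000-EXAMPLE-L4.nc')
```

Raising an exception for an unparseable URL is one possible design choice; whether a derived class should fail or skip such granules depends on the collection.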

Requires: Python requests module, Python netCDF4 module.

Parameters:
  • username (str) – NASA Earthdata account user name. Minimum length꞉ 1.

  • password (str) – NASA Earthdata account password. Minimum length꞉ 1.

  • queryParams (dict mapping str to str) – Dictionary of HTTP query parameters, as defined by the NASA Earthdata Common Metadata Repository (CMR) API.

  • linkTitleRegEx (str) – Regular expression to check against link titles, to determine if the link is to the desired type of resource. For example, to access OPeNDAP datasets, use ^opendap request url$. To access netCDF datasets, ^download.+\.nc$ will usually work. The regular expression is case-insensitive. Minimum length꞉ 1.

  • queryableAttributes (tuple of QueryableAttribute, optional) – Queryable attributes defined for this object.

  • timeout (int, optional) –

    Number of seconds to wait for the server to respond before failing with a timeout error.

    If you also provide a maximum retry time (maxRetryTime) and it is larger than the timeout value, the failed request will be retried automatically (with the same timeout value) until it succeeds or the maximum retry time has elapsed.

    If you receive a timeout error, you should investigate the server to determine whether it is malfunctioning or just slow. Check the Earthdata website to see if NASA has posted a notice about the problem, or contact NASA directly. If the server is just slow, increase the timeout value to give the server more time to respond.

    Minimum value꞉ 1.

  • maxRetryTime (int, optional) –

    Number of seconds to retry requests to the server before giving up.

    Use this parameter to cope with transient failures. For example, you may find that the server is rebooted nightly during a maintenance cycle. If you start a long-running operation and want it to run overnight without failing, set the maximum retry time to a duration that is longer than the time that the server is offline during the maintenance cycle.

    To maximize performance while minimizing load during failure situations, retries are scheduled with progressive delays:

    • The first retry is issued immediately.

    • Then, so long as fewer than 10 seconds have elapsed since the original request was issued, retries are issued every second.

    • After that, retries are issued every 30 seconds until the maximum retry time is reached or the request succeeds.

    Minimum value꞉ 1.

  • cacheDirectory (str, optional) –

    Directory for caching local copies of downloaded data. A cache directory is optional but highly recommended if you plan to repeatedly access data for the same range of dates.

    When data are requested, the cache directory will be checked for data that was downloaded and cached during prior requests. If cached data exists that can fulfill part of the current request, the request will be serviced by reading from cache files rather than downloading from the server. If the entire request can be serviced from the cache, the server will not be accessed at all and the request will be completed extremely quickly. Any parts of the request that cannot be serviced from the cache will be downloaded from the server and added to the cache, speeding up future requests for the same data.

    If you use a cache directory, be aware of these common pitfalls:

    • The caching algorithm permits the cache to grow without limit and never deletes any cached data. If you access a large amount of data (e.g. an entire 20 terabyte collection of satellite images) it will all be added to the cache. Be careful that you do not fill up your hard disk. To mitigate this, manually delete the entire cache or selected directories or files within it.

    • The caching algorithm stores data in uncompressed files, so that subsets of those files may be quickly accessed. To save space on your hard disk, you can enable compression of the cache directory using the operating system. On Windows, right click on the directory in Windows Explorer, select Properties, click Advanced, and enable “Compress contents to save disk space”.

    • The caching algorithm cannot detect when portions of a dataset have been replaced on the server, thereby making the cached data obsolete. Thus, if a data provider republishes a dataset with improved data values, the caching algorithm will continue to use the old, obsolete values. To mitigate this, you should monitor when data providers reprocess their datasets, and delete the cached files when they become obsolete.

    Minimum length꞉ 1.

  • metadataCacheLifetime (float, optional) –

    Maximum amount of time, in seconds, that granule metadata downloaded from the NASA Earthdata Common Metadata Repository (CMR) will be cached.

    Downloading metadata from the NASA Earthdata CMR can be slow. If this parameter and a cache directory are both provided, when the CMR is queried for all granule metadata for a given collection_concept_id, the downloaded metadata will be cached in the directory for this amount of time. During this period, the cached metadata will be accessed instead of the server, which can greatly speed up processing involving NASA Earthdata granules. However, if new datasets are stored in the CMR, they will not be discovered until the cached metadata has expired.

    If this parameter is not provided (the default), then granule metadata will not be cached.

    Minimum value꞉ 1.0.
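
The progressive retry delays described for maxRetryTime can be pictured as a timeline of retry attempts. The sketch below is not GeoEco's implementation. It is a simplified model that assumes every attempt fails instantly (so the time spent waiting for each timeout is ignored) and that no retry is issued once the elapsed time reaches the maximum retry time. The function name retry_offsets is made up for this example.

```python
def retry_offsets(max_retry_time):
    """Return the times, in seconds after the original request, at which
    retries would be issued under the simplified model described above."""
    offsets = []
    elapsed = 0  # the first retry is issued immediately
    while elapsed < max_retry_time:
        offsets.append(elapsed)
        # Retry every second while fewer than 10 seconds have elapsed
        # since the original request, then every 30 seconds.
        elapsed += 1 if elapsed < 10 else 30
    return offsets


print(retry_offsets(60))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]
```

With the default maxRetryTime of 300, this model yields 20 retries: 11 in the first 10 seconds and then one every 30 seconds, the last at 280 seconds.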

Returns:

CMRGranuleSearcher instance.

Return type:

CMRGranuleSearcher

Properties

property CacheDirectory

(str or None) Directory for caching local copies of remote datasets. Minimum length꞉ 1. If a cache directory is not provided, then after a remote dataset is downloaded it will be kept either only in memory or in a temporary directory on disk, depending on the type of data it is. The temporary directory will be automatically deleted when Close() is called.

If a cache directory is provided, remote datasets will be stored in it when they are downloaded. Before a download is attempted, the cache directory will be checked first for the relevant dataset, and if it is found, the download will be skipped, speeding up execution.

The datasets are organized in the cache directory in an undocumented format that is specific to the collection. Once a dataset is stored in the cache directory, it is never changed or deleted. If the original remote datasets are changed, these changes will not be detected and the cache will not be updated. If the disk fills up, cached datasets will not be automatically deleted to mitigate the problem.

If you determine that the cached datasets are obsolete or the disk is too full, delete the entire cache directory. You may also be able to delete a portion of it, if you can reverse engineer how datasets are stored within it, but the organizational structure is not documented.
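
Because the cache is never pruned automatically, one option is to check its size yourself and delete it when it grows too large. The following is a minimal sketch that uses only the Python standard library. The helper names and the size threshold are assumptions chosen for this example, and it is probably safest to run something like this only when no collection object is using the cache directory.

```python
import os
import shutil
import tempfile


def directory_size_bytes(path):
    """Return the total size, in bytes, of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def clear_cache_if_larger_than(cache_directory, max_bytes):
    """Delete cache_directory if its contents exceed max_bytes.

    Returns True if the directory was deleted, False otherwise.
    """
    if os.path.isdir(cache_directory) and directory_size_bytes(cache_directory) > max_bytes:
        shutil.rmtree(cache_directory)
        return True
    return False


# Demonstration on a throwaway directory rather than a real cache.
demo_cache = tempfile.mkdtemp()
with open(os.path.join(demo_cache, 'granule.bin'), 'wb') as f:
    f.write(b'\0' * 2048)
deleted = clear_cache_if_larger_than(demo_cache, max_bytes=1024)
```

Deleting the whole directory follows the recommendation above and avoids any dependence on the undocumented layout of the cache.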

property DisplayName

(str) Informal name of this object, suitable to be displayed to the user. Read only. Minimum length꞉ 1.

property ParentCollection

(DatasetCollection or None) Parent DatasetCollection that this object is part of (if any). Read only.

Methods

Close

Closes any open files or connections associated with this object and releases any other resources allocated to access it.

DeleteLazyPropertyValue

Deletes the lazy property with the specified name.

GetAllQueryableAttributes

Returns a list of all queryable attributes.

GetLazyPropertyValue

Returns the value of the lazy property with the specified name.

GetNewestDataset

Queries the collection and returns the newest Dataset that matches the search expression.

GetOldestDataset

Queries the collection and returns the oldest Dataset that matches the search expression.

GetQueryableAttribute

Returns the queryable attribute with the specified name.

GetQueryableAttributeValue

Returns the value of the queryable attribute with the specified name.

GetQueryableAttributesWithDataType

Returns a list of queryable attributes having the specified data type.

HasLazyPropertyValue

Returns True if the specified lazy property has a value.

ImportDatasets

Copies each Dataset in a list into this DatasetCollection.

QueryDatasets

Queries the collection and returns a list of Datasets that match a search expression.

SetLazyPropertyValue

Sets the lazy property with the specified name to the specified value.

TestCapability

Tests whether a capability is supported by this class or an instance of it.