GeoEco.DataManagement.Files.File.FindAndCreateTable

classmethod File.FindAndCreateTable(directory, database, table, fileField, wildcard='*', searchTree=False, minSize=None, maxSize=None, minDateCreated=None, maxDateCreated=None, minDateModified=None, maxDateModified=None, relativePathField=None, basePath=None, sizeField=None, dateCreatedField=None, dateModifiedField=None, parsedDateField=None, dateParsingExpression=None, unixTimeField=None, pathFieldsDataType='string', sizeFieldDataType='float64', dateFieldsDataType='datetime', unixTimeFieldDataType='int32', maxPathLength=None, overwriteExisting=False)

Finds files within a directory and creates a table that lists them.

On Windows, this function makes no distinction between hidden and visible directories. Hidden directories are traversed and handled just like visible directories.

Files are returned in an arbitrary order determined by the operating system and the search algorithm.
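
As a rough illustration of this search, the following minimal sketch approximates it with Python's standard glob and os.walk. It is not GeoEco's implementation. The helper name find_files is hypothetical, and the sketch assumes that when searchTree is True the wildcard is applied relative to every subdirectory. Because the order is arbitrary, the sketch sorts its results.

```python
import glob
import os


def find_files(directory, wildcard="*", search_tree=False):
    """Return sorted absolute paths of files matching `wildcard`.

    Hypothetical helper. The wildcard is applied relative to `directory`
    and, when search_tree is True, relative to each subdirectory too.
    """
    roots = [directory]
    if search_tree:
        # os.walk visits `directory` itself and every directory beneath it.
        roots = [root for root, _dirs, _files in os.walk(directory)]
    found = set()
    for root in roots:
        # glob.escape keeps special characters in the directory name literal.
        # Note: glob skips names that start with "." unless the pattern does.
        for path in glob.glob(os.path.join(glob.escape(root), wildcard)):
            if os.path.isfile(path):
                found.add(os.path.abspath(path))
    # glob and os.walk return entries in arbitrary order, so sort for stability.
    return sorted(found)
```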

Parameters:
  • directory (str) – Directory to search. Minimum length꞉ 1. Must exist.

  • database (Database) – Database that will receive the new table.

  • table (str) – Name of the table to create. Unless overwriteExisting is True, the table must not already exist. Minimum length꞉ 1.

  • fileField (str) – Name of the field to receive absolute paths to the files that were found. Minimum length꞉ 1.

  • wildcard (str, optional) –

    UNIX-style “glob” wildcard expression specifying the pathnames to find.

    The glob syntax supports the following patterns:

    • ? - matches any single character

    • * - matches zero or more characters

    • [seq] - matches any single character in seq

    • [!seq] - matches any single character not in seq

    seq is one or more characters, such as abc. You may specify character ranges using a dash. For example, a-z0-9 specifies the lowercase letters of the English alphabet and the decimal digits 0 through 9.

    You may specify subdirectories in the glob expression. For example, the expression cruise*/sst* will find all files whose names begin with sst that are contained in directories whose names begin with cruise.

    The operating system determines whether / or \ is used as the directory separator. On Windows, both will work. On Linux, / must be used.

    The operating system determines whether matching is case-sensitive. On Windows, matching is case-insensitive. On Linux, matching is case-sensitive.

    Minimum length꞉ 1.

  • searchTree (bool, optional) – If True, subdirectories will be searched.

  • minSize (int, optional) – Minimum size, in bytes, of files to find. If provided, only files that are this size or larger will be found. Minimum value꞉ 0.

  • maxSize (int, optional) – Maximum size, in bytes, of files to find. If provided, only files that are this size or smaller will be found. Minimum value꞉ 0.

  • minDateCreated (datetime, optional) – Minimum creation date, in the local time zone, of the files to find, as reported by the operating system. If provided, only files that were created on or after this date will be found. You may provide a date with or without a time. If you do not provide a time, it is assumed to be midnight.

  • maxDateCreated (datetime, optional) – Maximum creation date, in the local time zone, of the files to find, as reported by the operating system. If provided, only files that were created on or before this date will be found. You may provide a date with or without a time. If you do not provide a time, it is assumed to be midnight.

  • minDateModified (datetime, optional) – Minimum modification date, in the local time zone, of the files to find, as reported by the operating system. If provided, only files that were modified on or after this date will be found. You may provide a date with or without a time. If you do not provide a time, it is assumed to be midnight.

  • maxDateModified (datetime, optional) – Maximum modification date, in the local time zone, of the files to find, as reported by the operating system. If provided, only files that were modified on or before this date will be found. You may provide a date with or without a time. If you do not provide a time, it is assumed to be midnight.

  • relativePathField (str, optional) –

    Name of the field to receive paths of the files that were found, relative to basePath. For example, if basePath is:

    C:\Data\Files
    

    the relative paths for the files:

    C:\Data\Files\Group1\f1
    C:\Data\Files\f1
    C:\Data\f1
    C:\f1
    D:\f1
    \\MyServer\Data\f1
    

    would be:

    Group1\f1
    f1
    ..\f1
    ..\..\f1
    D:\f1
    \\MyServer\Data\f1
    

    Minimum length꞉ 1.

  • basePath (str, optional) – Absolute path from which the relative paths stored in relativePathField are calculated. See the description of relativePathField for details. Minimum length꞉ 1.

  • sizeField (str, optional) – Name of the field to receive the sizes of the files that were found. Minimum length꞉ 1.

  • dateCreatedField (str, optional) – Name of the field to receive the creation dates of the files that were found. Minimum length꞉ 1.

  • dateModifiedField (str, optional) – Name of the field to receive the modification dates of the files that were found. Minimum length꞉ 1.

  • parsedDateField (str, optional) – Name of the field to receive dates parsed from the paths of the files that were found. You must also specify a date parsing expression. Minimum length꞉ 1.

  • dateParsingExpression (str, optional) –

    Expression for parsing dates from the paths of the files that were found. The expression is ignored unless you also specify parsedDateField or unixTimeField to receive the parsed dates.

    The expression uses standard Python regular expression syntax, with the following additional codes for matching date fragments:

    %d - Day of the month as a decimal number (range: 01 to 31)

    %H - Hour (24-hour clock) as a decimal number (range: 00 to 23)

    %j - Day of the year as a decimal number (range: 001 to 366)

    %m - Month as a decimal number (range: 01 to 12)

    %M - Minute as a decimal number (range: 00 to 59)

    %S - Second as a decimal number (range: 00 to 61)

    %y - Year without century as a decimal number (range: 00 to 99)

    %Y - Year with century as a decimal number (range: 0001 to 9999)

    %% - A literal % character

    A date is parsed from a path as follows:

    1. The date fragment codes in your expression are replaced by regular expression groups to produce a true regular expression. For example, if your expression is %Y_%m_%d, it is converted to the regular expression (\d\d\d\d)_(\d\d)_(\d\d).

    2. re.search() is invoked to find the first occurrence of the regular expression in the path. The search proceeds from left to right.

    3. If an occurrence is found, the regular expression groups are extracted and time.strptime() is invoked to parse a date from the groups.

    Notes:

    • Your expression must include at least one date fragment code, but it need not include all of them. If a particular code is missing, the following default values will be used: year 1900, month 01, day 01, hour 00, minute 00, second 00.

    • You cannot specify a given date fragment code more than once.

    • You cannot specify date fragment codes that might conflict. For example, you cannot specify both %j and %d because this could result in conflicting values for the day.

    • For %y, values 00 to 68 are interpreted as years 2000 through 2068, while 69 through 99 are interpreted as years 1969 through 1999.

    • Remember that the entire path, not just the file name, is searched for your expression from left to right, so the first match may occur in the name of a parent directory.

    • The date fragment codes are case-sensitive.

    • If the underlying storage format can hold a date and a time in a single field, the time is stored along with the date. Otherwise, only the date is stored; this is the case, for example, with dBASE III and IV tables (.dbf files), which are often used by ArcGIS.

    • The timezone of the parsed date is assumed to be UTC.

    Examples:

    The expression:

    %Y%j
    

    will parse dates from many popular oceanographic satellite data products, such as:

    A2007006.L3b_DAY.main.bz2           MODIS Aqua from NASA OceanColor
    S1997247.L3b_DAY.main.bz2           SeaWiFS from NASA OceanColor
    1990182.s04d1pfv50-sst-16b.hdf      AVHRR Pathfinder version 5.0 SST from NOAA NODC
    QS_XWGRD3_2003033.20070991747.gz    QuikSCAT winds from NASA JPL PO.DAAC
    

    The expression:

    %Y_%j_%H
    

    will parse dates from the hourly and 3-hour GOES SST products offered by NASA JPL PO.DAAC:

    sst1_2005_033_17.gz                 An hourly GOES SST file
    sst3_2005_033_06.gz                 A 3-hour GOES SST file
    

    The expression:

    %Y_%j_%H%M
    

    will parse dates from the CoastWatch AVHRR SST product offered in HDF format by NOAA CLASS (the CW_REGION product). Note that this product includes the hour and minute of the satellite pass:

    2007_207_2214_n15_sr.hdf            A CoastWatch AVHRR file
    

    Minimum length꞉ 1.

  • unixTimeField (str, optional) –

    Name of the field to receive dates, in “UNIX time” format, parsed from the paths of the files that were found. You must also specify a date parsing expression.

    UNIX times are 32-bit signed integers that count the number of seconds elapsed since 1970-01-01 00:00:00 UTC. This function assumes that the parsed date is in UTC. The UNIX time values it produces do not include leap seconds: each regular year is counted as 31536000 seconds (365 × 86400) and each leap year as 31622400 seconds (366 × 86400).

    Minimum length꞉ 1.

  • pathFieldsDataType (str, optional) – Data type to use when creating the file path fields. This should be string unless you have a specific reason to choose something else. Minimum length꞉ 1.

  • sizeFieldDataType (str, optional) – Data type to use when creating the file size field. This should be a numeric type that supports large numbers, such as float64 or int64. Minimum length꞉ 1.

  • dateFieldsDataType (str, optional) – Data type to use when creating the file creation date, file modification date, and parsed date fields. This should be datetime if the underlying storage format supports dates with times, or date if only dates are supported. Minimum length꞉ 1.

  • unixTimeFieldDataType (str, optional) – Data type to use when creating the UNIX time field. Because UNIX times are 32-bit signed integers, this should be int32 or int64. Minimum length꞉ 1.

  • maxPathLength (int, optional) – Maximum length of a path for this operating system. This value sets the width of the path fields that are created. Provide a value only if the underlying database requires a width for string fields. If the value is too small to hold one of the paths that is found, this function will fail when it encounters that path. Minimum value꞉ 1.

  • overwriteExisting (bool, optional) – If True, the output table will be overwritten if it exists. If False, a ValueError will be raised if the output table exists.

Returns:

Name of the table that was created.

Return type:

str
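
The pattern syntax described for the wildcard parameter is the same syntax implemented by Python's fnmatch module. The sketch below assumes, without confirming it, that FindAndCreateTable follows fnmatch rules. It shows each pattern type against a few file names, with case-sensitive matching as on Linux and case-insensitive matching as on Windows. The helper name match_names is hypothetical.

```python
from fnmatch import fnmatchcase


def match_names(names, pattern, case_sensitive=True):
    """Return the names matching a glob pattern (hypothetical helper)."""
    if case_sensitive:
        # Linux-style matching: "sst*" does not match "SST3.hdf".
        return [n for n in names if fnmatchcase(n, pattern)]
    # Windows-style matching: fold both sides to lowercase before comparing.
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


names = ["sst1.hdf", "sst2.hdf", "sstA.hdf", "chl1.hdf", "SST3.hdf"]
print(match_names(names, "sst?.hdf"))       # ? matches exactly one character
print(match_names(names, "*.hdf"))          # * matches any run of characters
print(match_names(names, "sst[0-9].hdf"))   # [seq] with a character range
print(match_names(names, "sst[!0-9].hdf"))  # [!seq] excludes the range
print(match_names(names, "sst[0-9].hdf", case_sensitive=False))
```

The last call also matches SST3.hdf, which is the difference you would see between running the same wildcard on Windows and on Linux.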
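
The minSize, maxSize, and date parameters define inclusive bounds, and a date given without a time means midnight. One consequence is that passing a bare date as maxDateModified excludes files modified later on that same day. The minimal sketch below applies the size and modification-date bounds using os.stat. The helper passes_filters is hypothetical. Creation dates are left out because os.stat reports them differently across platforms: on Windows st_ctime has held the creation time, whereas on Unix it records the last metadata change.

```python
import os
from datetime import datetime, time


def as_datetime(value):
    """Promote a date without a time to midnight; pass datetimes through."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    return datetime.combine(value, time.min)


def passes_filters(path, min_size=None, max_size=None,
                   min_date_modified=None, max_date_modified=None):
    """Apply inclusive size and modification-date bounds (hypothetical helper)."""
    info = os.stat(path)
    if min_size is not None and info.st_size < min_size:
        return False
    if max_size is not None and info.st_size > max_size:
        return False
    # fromtimestamp converts the UNIX timestamp to naive local time.
    modified = datetime.fromtimestamp(info.st_mtime)
    if min_date_modified is not None and modified < as_datetime(min_date_modified):
        return False
    if max_date_modified is not None and modified > as_datetime(max_date_modified):
        return False
    return True
```

For instance, a file modified at noon on 2020-06-15 passes min_date_modified=date(2020, 6, 15) but fails max_date_modified=date(2020, 6, 15), because the latter means midnight at the start of that day.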
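
The relative paths in the relativePathField example follow the same rules as Python's ntpath.relpath, which raises ValueError when the path and basePath are on different drives or UNC shares. The sketch below keeps such paths absolute, as the example shows. relative_path is a hypothetical helper, and ntpath is used explicitly so that the Windows rules apply on any operating system.

```python
import ntpath


def relative_path(path, base_path):
    """Express `path` relative to `base_path` using Windows rules (hypothetical).

    ntpath.relpath raises ValueError when the two paths are on different
    drives or UNC shares; in that case the absolute path is kept unchanged.
    """
    try:
        return ntpath.relpath(path, base_path)
    except ValueError:
        return path


base = r"C:\Data\Files"
for p in [r"C:\Data\Files\Group1\f1", r"C:\Data\Files\f1", r"C:\Data\f1",
          r"C:\f1", r"D:\f1", r"\\MyServer\Data\f1"]:
    print(relative_path(p, base))
```

The loop prints the same six relative paths listed in the relativePathField description.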
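
The three parsing steps listed for dateParsingExpression can be sketched as follows. This is an illustration under stated assumptions, not GeoEco's code. The name parse_date_from_path is hypothetical. It uses named groups, rather than the plain groups shown in step 1, so that any groups in your own expression do not shift the numbering. Its rule that %y conflicts with %Y, and %j with %m, is an assumption extrapolated from the %j/%d example.

```python
import calendar
import re
import time
from datetime import datetime, timedelta, timezone

# Regular-expression pattern substituted for each date fragment code.
FRAGMENTS = {
    "d": r"\d\d", "H": r"\d\d", "j": r"\d\d\d", "m": r"\d\d",
    "M": r"\d\d", "S": r"\d\d", "y": r"\d\d", "Y": r"\d\d\d\d",
}
# Pairs of codes that could give conflicting values (partly assumed).
CONFLICTS = [("j", "d"), ("j", "m"), ("y", "Y")]
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_date_from_path(path, expression):
    """Return the first date found in `path` as a UTC datetime, or None."""
    # Step 1: replace each date fragment code with a named regex group.
    parts, codes = [], []
    i = 0
    while i < len(expression):
        if expression[i] == "%" and i + 1 < len(expression):
            code = expression[i + 1]
            if code == "%":
                parts.append("%")  # %% is a literal percent sign
            elif code in FRAGMENTS:
                if code in codes:
                    raise ValueError("date fragment code %%%s used twice" % code)
                parts.append("(?P<%s>%s)" % (code, FRAGMENTS[code]))
                codes.append(code)
            else:
                raise ValueError("unknown date fragment code %%%s" % code)
            i += 2
        else:
            parts.append(expression[i])  # ordinary regular-expression text
            i += 1
    if not codes:
        raise ValueError("expression must contain a date fragment code")
    for a, b in CONFLICTS:
        if a in codes and b in codes:
            raise ValueError("%%%s and %%%s conflict" % (a, b))

    # Step 2: leftmost match anywhere in the path, parent directories included.
    match = re.search("".join(parts), path)
    if match is None:
        return None
    # Step 3: strptime fills missing fields with 1900-01-01 00:00:00 defaults.
    parsed = time.strptime(" ".join(match.group(c) for c in codes),
                           " ".join("%" + c for c in codes))
    # timegm treats the fields as UTC and ignores leap seconds.
    return UNIX_EPOCH + timedelta(seconds=calendar.timegm(parsed))
```

With the example expressions above, parse_date_from_path("2007_207_2214_n15_sr.hdf", "%Y_%j_%H%M") yields 2007-07-26 22:14 UTC.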
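
The UNIX time arithmetic described for unixTimeField matches Python's calendar.timegm, which interprets a time tuple as UTC and ignores leap seconds. This sketch converts a parsed UTC date and checks whether the result fits an int32 field; unix_time and fits_int32 are hypothetical helpers. With int32, the last representable moment is 2038-01-19 03:14:07 UTC, so later dates need int64 for unixTimeFieldDataType.

```python
import calendar
from datetime import datetime

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def unix_time(dt):
    """Seconds since 1970-01-01 00:00:00 UTC, without leap seconds.

    Naive datetimes are treated as UTC, matching the assumption that
    parsed dates are in UTC; aware datetimes are converted to UTC first.
    """
    return calendar.timegm(dt.utctimetuple())


def fits_int32(seconds):
    """True if the value can be stored in an int32 UNIX time field."""
    return INT32_MIN <= seconds <= INT32_MAX


# A 365-day year and a 366-day year (1972 was a leap year).
print(unix_time(datetime(1971, 1, 1)) - unix_time(datetime(1970, 1, 1)))  # 31536000
print(unix_time(datetime(1973, 1, 1)) - unix_time(datetime(1972, 1, 1)))  # 31622400
print(fits_int32(unix_time(datetime(2038, 1, 19, 3, 14, 7))))  # True
print(fits_int32(unix_time(datetime(2038, 1, 19, 3, 14, 8))))  # False
```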