radproc.raw.radolan_binaries_to_dataframe
radproc.raw.radolan_binaries_to_dataframe(inFolder, idArr=None)
Import all RADOLAN binary files in a directory into a pandas DataFrame, optionally clipping the data to the extent of an investigation area specified by an ID array.
Parameters:
- inFolder : string
  - Path to the directory containing RADOLAN binary files. All files ending with ‘-bin’ or ‘-bin.gz’ are read in. The input folder does not need to follow any particular directory structure.
- idArr : one-dimensional numpy array (optional, default: None)
  - Array of ID values used to select the RADOLAN cells located in the investigation area. If no idArr is specified, the ID array is generated automatically from the RADOLAN metadata and the precipitation data are not clipped to any investigation area.
Returns:
- (df, metadata) : tuple with two elements:
- df : pandas DataFrame containing
  - RADOLAN data of the cells located in the investigation area
  - datetime row index with defined frequency depending on the RADOLAN product and time zone UTC
  - ID values as column names
- metadata : dictionary
  - containing metadata from the last imported RADOLAN binary file
Binary files that cannot be read because of processing errors are skipped, and the corresponding intervals are filled with NoData (NaN) values. A text file listing the file names and error messages for the respective monthly input data folder is written for information. For example, errors caused by obviously corrupted file formats are known for the RADOLAN RW dataset in July and August 2005 and in May 2007.
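As a rough illustration of how skipped intervals might be located afterwards, the sketch below builds a small synthetic DataFrame with the layout described here (UTC datetime index, cell IDs as columns) and lists the rows in which every cell is NaN. The data, the cell IDs and the assumption that an all-NaN row marks a skipped file are illustrative only; they are not produced by radproc.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the returned DataFrame (assumption, not real RADOLAN data):
# hourly UTC index and two hypothetical cell IDs as column names.
index = pd.date_range("2008-05-01 00:50", periods=4, freq="60min", tz="UTC")
df = pd.DataFrame(
    [[0.0, 0.1], [np.nan, np.nan], [0.3, 0.0], [np.nan, np.nan]],
    index=index,
    columns=[414773, 414774],
)

# Rows in which every cell is NaN are treated here as intervals without usable data.
missing = df.index[df.isna().all(axis=1)]
print(missing)
```

Whether an all-NaN row really stems from a skipped file should be checked against the error text file, since NaN values can also have other causes.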
Format description and examples:
Every row of the output DataFrame corresponds to a precipitation raster of the investigation area at a specific date and time. Every column corresponds to the precipitation time series of a specific raster cell.
Data can be accessed and sliced with the following syntax:
df.loc[row_index, column_name]
where the row index is a string in the date format ‘YYYY-MM-dd hh:mm’ and the column names are integer values.
Examples:
>>> df.loc['2008-05-01 00:50', 414773]  #--> returns single float value of specified date and cell
>>> df.loc['2008-05-01 00:50', :]       #--> returns entire row (= raster) of specified date as one-dimensional DataFrame
>>> df.loc['2008-05-01', :]             #--> returns DataFrame with all rows of specified day (because time of day is omitted)
>>> df.loc[:, 414773]                   #--> returns time series of the specified cell as Series
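The access patterns above can be tried without any RADOLAN files on a synthetic DataFrame that mimics the described layout. This is a minimal sketch: the hourly frequency, the three cell IDs and the values are assumptions made for illustration, not output of radproc.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (assumption): two days of hourly values in UTC for three cells.
index = pd.date_range("2008-05-01 00:50", periods=48, freq="60min", tz="UTC")
cell_ids = [414773, 414774, 414775]  # hypothetical cell IDs
values = np.arange(len(index) * len(cell_ids), dtype=float)
df = pd.DataFrame(values.reshape(len(index), len(cell_ids)), index=index, columns=cell_ids)

# Single value: one time step, one cell.
single_value = df.loc["2008-05-01 00:50", 414773]

# One raster: all cells at one time step.
raster = df.loc["2008-05-01 00:50", :]

# One day: the time of day is omitted, so all rows of that date are selected.
one_day = df.loc["2008-05-01", :]

# One time series: all time steps of a single cell.
cell_series = df.loc[:, 414773]

print(single_value, raster.size, one_day.shape, len(cell_series))
```

Because the row index is a datetime index, pandas interprets the date strings itself; omitting the time of day selects every row of that date rather than a single time step.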