Data Utilities

General utilities for common data file manipulations

Making Labels

Insert Realization Index

pygeostat.datautils.labels.insert_real_idx(data, num_real=0, bindex=True, real_column='Realization', bi_column='BlockIndex')

This will insert realization index columns. By default it will use the griddef associated with the file.

Parameters:
  • num_real (int) – If you do not have a griddef associated with the file you can tell it how many realizations there are
  • bindex (bool) – True or False for adding a block index
  • real_column (str) – Set the name of the column used for the Realizations Index
  • bi_column (str) – Set the name of the column used for the Block Index

Process

If the “real_column” or “bi_column” columns already exist, their values are overwritten. If they are not in the dataframe, they are inserted at the front.

Code author: Tyler Acorn - 2015-Sept-30
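
The indexing idea can be sketched in plain Python. This is only an illustration, not the pygeostat implementation: the helper name realization_and_block_index is hypothetical, and it assumes realizations are stacked one after another with 1-based indices.

```python
def realization_and_block_index(nrows, ncell):
    """Return 1-based realization and block indices for stacked realizations.

    Hypothetical helper: assumes the file holds whole realizations of
    `ncell` cells each, written one after another.
    """
    if ncell <= 0 or nrows % ncell != 0:
        raise ValueError("nrows must be a multiple of a positive ncell")
    # Every block of `ncell` consecutive rows belongs to one realization.
    realization = [row // ncell + 1 for row in range(nrows)]
    # The block index restarts at 1 at the top of each realization.
    block_index = [row % ncell + 1 for row in range(nrows)]
    return realization, block_index


real, bidx = realization_and_block_index(nrows=6, ncell=3)
print(real)  # [1, 1, 1, 2, 2, 2]
print(bidx)  # [1, 2, 3, 1, 2, 3]
```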

Make Labels

pygeostat.datautils.labels.make_labels(prefix, num, padding=0)

Returns a series of labels combining a prefix and a number with leading zeros

Parameters:
  • prefix (str) – any letter(s) that you want as the prefix (for example B for blockindex)
  • num (int) – The number of labels you want.
  • padding (int) – if given, the numeric part of each label is padded with leading zeros to this width (in the example below, padding=3 gives R001)
Returns:

This will return a series with “n” number of labels starting from 1

Return type:

Series

Examples

Creating an array of labels

>>> label = gs.datautils.make_labels('R', 3, padding=3)
>>> label
[R001, R002, R003]

Code author: Tyler Acorn 2015-09-21
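
The padding behaviour shown in the example can be reproduced with a minimal sketch. The function make_labels_sketch is a hypothetical stand-in that returns a plain list rather than a Series, and it assumes the padding applies to the numeric part only.

```python
def make_labels_sketch(prefix, num, padding=0):
    """Build `num` labels such as R001, R002, ... starting from 1.

    Assumption: `padding` is the zero-padded width of the number,
    which matches the documented example but is not confirmed
    against the library source.
    """
    return [f"{prefix}{str(i).zfill(padding)}" for i in range(1, num + 1)]


print(make_labels_sketch("R", 3, padding=3))  # ['R001', 'R002', 'R003']
print(make_labels_sketch("B", 2))             # ['B1', 'B2']
```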

Assorted Utility Functions

Check Len of Gridded Ascii File

pygeostat.datautils.utils.check_grid_file_size(gridfl, griddef, nreal=None)

Check the gridded data file to see if the number of lines in the file matches the number of cells specified in the griddef. Returns True if nreal * griddef.count() = nlines - (2 + nvar), where 2 + nvar is the number of header lines in a GSLIB-style file.

Relies on the GNU wc tool, which is believed to ship with Cygwin.

Parameters:
  • gridfl (str) – gridded data file to check
  • griddef (GridDef) – standard pygeostat griddef
  • nreal (int) – optional number of realizations in the file
Returns:

True if there is a match, False if there is a mismatch
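
The line-count test above can be sketched without wc. The helper grid_file_size_matches is hypothetical; it takes the cell count as a plain integer instead of a GridDef and assumes a GSLIB-style header (title line, variable count, one name per variable).

```python
import os
import tempfile


def grid_file_size_matches(path, ncell, nreal=1):
    """Return True if nreal * ncell == nlines - (2 + nvar)."""
    with open(path) as handle:
        handle.readline()                         # line 1: title
        nvar = int(handle.readline().split()[0])  # line 2: number of variables
        # Two lines are already consumed; count the rest lazily.
        nlines = 2 + sum(1 for _ in handle)
    return nreal * ncell == nlines - (2 + nvar)


# Demo: one variable, 8 data lines, i.e. 2 realizations of a 4-cell grid.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "grid.dat")
    with open(demo_path, "w") as handle:
        handle.write("Demo grid\n1\nvalue\n" + "\n".join(["0.5"] * 8) + "\n")
    matches = grid_file_size_matches(demo_path, ncell=4, nreal=2)
    mismatch = grid_file_size_matches(demo_path, ncell=4, nreal=3)

print(matches, mismatch)  # True False
```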

Round to Significant Figures

pygeostat.datautils.utils.round_sigfig(value, sigfigs)

Round a float or integer to a specified number of significant figures. Also handles effectively zero, infinity, and negative infinity values.

From: http://stackoverflow.com/questions/3410976/

Parameters:
  • value (int or float) – Value that requires rounding
  • sigfigs (int) – Number of significant figures to round the value to
Returns:

Rounded value

Return type:

new_value (int or float)

Example

>>> gs.round_sigfig(-0.00032161, 3)
-0.000322

Code author: Warren Black - 2015-10-13
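
One common way to round to significant figures, shown here as a sketch rather than the library's code, is to shift the rounding position by the order of magnitude of the value. The name round_sigfig_sketch is hypothetical, and this version only special-cases exact zero, infinities, and NaN.

```python
import math


def round_sigfig_sketch(value, sigfigs):
    """Round `value` to `sigfigs` significant figures."""
    # log10 is undefined for zero and meaningless for inf/nan,
    # so those values are returned unchanged.
    if value == 0 or math.isinf(value) or math.isnan(value):
        return value
    # Position of the leading digit, e.g. -4 for 0.00032161.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sigfigs - 1 - magnitude)


print(round_sigfig_sketch(-0.00032161, 3))  # -0.000322
print(round_sigfig_sketch(123456, 2))       # 120000
```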

Get Collocated Data from Grid

pygeostat.datautils.utils.getcollocated(data, secdatfl, seccols=None, concat=True)

Retrieve gridded exhaustive secondary data at the collocated sample locations.

If concat is True, the secondary data will be added to the gs.DataFile passed and nothing will be returned.

Warning

The Fortran code has not been rigorously tested; use at your own risk.

Parameters:
  • data (gs.DataFile) – A gs.DataFile class that must contain the appropriate coordinate information (i.e., x, y, and z attributes). The gs.DataFile must also have its griddef parameter specified, pointing to a gs.GridDef class.
  • secdatfl (str) – Location of the gridded secondary data
  • seccols (list) – List of the columns containing the secondary data to extract. Default is to extract all columns in the file
  • concat (bool) – Indicate if the secondary data should be concatenated onto the input gs.DataFile dataframe (i.e., data.data)
Returns:

Dataframe containing the secondary data. It is only returned when concat is False

Return type:

secdat (pd.DataFrame)

Example

A simple call:

>>> data = gs.DataFile(data, griddef=griddef, x='x', y='y', z='z')
>>> secdatfl = '../secdat.dat'
>>> gs.getcollocated(data, secdatfl)

Code author: Warren E. Black - 2016-02-15
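
The collocation step can be illustrated in pure Python. This is a sketch, not the Fortran routine: the helpers cell_index and collocated_values are hypothetical, and it assumes the GSLIB conventions that the grid origin is the centre of the first cell and that x cycles fastest, then y, then z.

```python
import math


def cell_index(coord, origin, size, ncell):
    """0-based cell index along one axis, or None if outside the grid."""
    idx = math.floor((coord - origin) / size + 0.5)
    return idx if 0 <= idx < ncell else None


def collocated_values(points, grid_values, shape, origin, size):
    """Look up the gridded value at each (x, y, z) sample location."""
    nx, ny, nz = shape
    result = []
    for x, y, z in points:
        ix = cell_index(x, origin[0], size[0], nx)
        iy = cell_index(y, origin[1], size[1], ny)
        iz = cell_index(z, origin[2], size[2], nz)
        if ix is None or iy is None or iz is None:
            result.append(None)  # sample falls outside the grid
        else:
            result.append(grid_values[ix + iy * nx + iz * nx * ny])
    return result


samples = [(0.4, 0.4, 0.5), (1.2, 1.7, 0.5), (5.0, 5.0, 0.5)]
secondary = collocated_values(
    samples, [10, 20, 30, 40], shape=(2, 2, 1),
    origin=(0.5, 0.5, 0.5), size=(1.0, 1.0, 1.0),
)
print(secondary)  # [10, 40, None]
```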

Get File Header

pygeostat.datautils.utils.fileheader(datafl, mute=False)

Read a GSLIB file from Python and return the header information. Useful for large files, since only the header is needed.

Code author: Warren E. Black - 2016-02-15
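
Reading only the header might look like the following sketch. The function read_gslib_header is hypothetical and assumes the usual GSLIB layout: a title line, a line starting with the number of variables, then one variable name per line.

```python
import io


def read_gslib_header(handle):
    """Return (title, variable_names) without reading any data lines."""
    title = handle.readline().strip()
    nvar = int(handle.readline().split()[0])
    names = [handle.readline().strip() for _ in range(nvar)]
    return title, names


# An in-memory file stands in for a large file on disk.
demo = io.StringIO("Demo data\n3\nx\ny\ngrade\n1.0 2.0 0.5\n")
title, names = read_gslib_header(demo)
print(title)  # Demo data
print(names)  # ['x', 'y', 'grade']
```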

Convert Corrmat to GSLIB string for USGSIM

pygeostat.datautils.utils.corrmatstr(corrmat, fmt)

Converts a correlation matrix, given as a numpy matrix or a pandas dataframe, into a space-delimited string. Correlation matrix strings are required in the parameter files of CCG programs such as USGSIM and supersec.

Currently, this function is hard coded to return two formats as specified by the fmt argument, one for 'usgsim' and one for 'supersec'. 'usgsim' returns the full correlation matrix while 'supersec' returns only the upper triangle of the matrix, without the diagonal values.

Parameters:
  • corrmat – Correlation matrix as either a pandas dataframe (pd.DataFrame) or numpy matrix (np.ndarray).
  • fmt (str) – Indicate which format to return. Accepts only one of ['usgsim', 'supersec']
Returns:

Correlation matrix as a space delimited string.

Return type:

corrstr (str)

Code author: Warren E. Black - 2016-03-15
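
The difference between the two formats can be sketched with nested lists. The function corrmat_to_string is hypothetical; the three-decimal number format and the one-row-per-line layout are assumptions, not the documented output of corrmatstr.

```python
def corrmat_to_string(corrmat, fmt):
    """Format a square correlation matrix as space-delimited text."""
    n = len(corrmat)
    if fmt == "usgsim":
        rows = [corrmat[i] for i in range(n)]               # full matrix
    elif fmt == "supersec":
        rows = [corrmat[i][i + 1:] for i in range(n - 1)]   # upper triangle, no diagonal
    else:
        raise ValueError("fmt must be 'usgsim' or 'supersec'")
    return "\n".join(" ".join(f"{v:.3f}" for v in row) for row in rows)


corr = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
full_text = corrmat_to_string(corr, "usgsim")
upper_text = corrmat_to_string(corr, "supersec")
print(upper_text)
# 0.500 0.200
# 0.300
```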

Get the 2D Slice of a 3D Grid

pygeostat.datautils.utils.slicegrid(data, griddef, orient, sliceno, slicethickness=None, nullv=None)

Slice a 3-D grid.

Parameters:
  • data – 1-D array, or a tidy long-form dataframe with a single column containing the variable in question, where each row is an observation
  • griddef (GridDef) – A pygeostat GridDef class created using gs.GridDef
  • orient (str) – Orientation to slice data. 'xy', 'xz', 'yz' are the only accepted values
  • sliceno (int) – Grid cell location along the axis not plotted to take the slice of data to plot
Returns:

1-D array of the sliced data

Return type:

view (np.ndarray)

Code author: Matthew Deutsch - 2014-04-19
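
How a slice is pulled out of a flattened grid can be sketched as follows. The function slice_grid_sketch is hypothetical; it assumes GSLIB ordering (x cycles fastest, then y, then z) and a 0-based sliceno, neither of which is confirmed for slicegrid itself.

```python
def slice_grid_sketch(values, shape, orient, sliceno):
    """Return a flat list holding one 2-D slice of a flattened 3-D grid."""
    nx, ny, nz = shape

    def at(ix, iy, iz):
        # GSLIB ordering: x varies fastest, z slowest.
        return values[ix + iy * nx + iz * nx * ny]

    if orient == "xy":   # fix z
        return [at(ix, iy, sliceno) for iy in range(ny) for ix in range(nx)]
    if orient == "xz":   # fix y
        return [at(ix, sliceno, iz) for iz in range(nz) for ix in range(nx)]
    if orient == "yz":   # fix x
        return [at(sliceno, iy, iz) for iz in range(nz) for iy in range(ny)]
    raise ValueError("orient must be 'xy', 'xz' or 'yz'")


grid = list(range(8))  # a 2 x 2 x 2 grid holding its own cell indices
print(slice_grid_sketch(grid, (2, 2, 2), "xy", 1))  # [4, 5, 6, 7]
print(slice_grid_sketch(grid, (2, 2, 2), "xz", 0))  # [0, 1, 4, 5]
print(slice_grid_sketch(grid, (2, 2, 2), "yz", 1))  # [1, 3, 5, 7]
```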

Get a Slice of a 3D Point Dataset

pygeostat.datautils.utils.slicescatter(data, orient, sliceno, slicetol, griddef=None, x=None, y=None, z=None)

Slice scattered data based on a GSLIB style grid definition.

Parameters:
  • data (pd.DataFrame or gs.DataFile) – Dataframe where each column is a variable and each row is an observation. Must contain the coordinate columns required depending on the value of orient. If a gs.DataFile class is passed, its griddef, x, y, and z attributes will be extracted.
  • var (str) – Column header of variable under investigation
  • orient (str) – Orientation to slice data. 'xy', 'xz', 'yz' are the only accepted values
  • sliceno (int) – Grid cell location along the axis not plotted to take the slice of data to plot
  • slicetol (float) – Slice tolerance to plot point data (i.e. plot +/- slicetol from the center of the slice). Any negative value plots all data. Default is to plot all data.
  • griddef (GridDef) – A pygeostat GridDef class created using gs.GridDef. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
  • x (str) – Column header of x-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
  • y (str) – Column header of y-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
  • z (str) – Column header of z-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
Returns:

pd.DataFrame of the sliced data

Return type:

pointview (pd.DataFrame)

Code author: Warren E. Black - 2016-04-11
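
The tolerance filter described by slicetol can be sketched with plain dictionaries. The function slice_scatter_sketch is hypothetical: it takes the origin and cell size as dictionaries instead of a GridDef, and assumes a 0-based sliceno with the origin at the centre of the first cell.

```python
def slice_scatter_sketch(points, orient, sliceno, slicetol, origin, size):
    """Keep the points within +/- slicetol of the centre of a grid slice."""
    # The axis that is not plotted is the one the slice is taken along.
    axis = {"xy": "z", "xz": "y", "yz": "x"}[orient]
    if slicetol < 0:
        return list(points)  # a negative tolerance keeps all data
    centre = origin[axis] + sliceno * size[axis]
    return [p for p in points if abs(p[axis] - centre) <= slicetol]


points = [
    {"x": 1.0, "y": 1.0, "z": 0.5},
    {"x": 2.0, "y": 1.0, "z": 1.4},
    {"x": 3.0, "y": 2.0, "z": 2.6},
]
origin = {"x": 0.5, "y": 0.5, "z": 0.5}
size = {"x": 1.0, "y": 1.0, "z": 1.0}

in_slice = slice_scatter_sketch(points, "xy", 1, 0.5, origin, size)
everything = slice_scatter_sketch(points, "xy", 1, -1, origin, size)
print(in_slice)         # [{'x': 2.0, 'y': 1.0, 'z': 1.4}]
print(len(everything))  # 3
```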

Get Absolute Filepath

pygeostat.datautils.utils.fixpath(path)

Convert a file path to an absolute path if required and make sure there are only forward slashes.

If the path is copied directly from Windows Explorer, or from any other source that produces backslash-separated paths, mark the string as raw by placing an r in front of it. For example:

>>> string = r"A string with backslashes \ \ \ in it"

Example

Place an r in front of the string so that the backslashes are not interpreted as escape sequences. A simple call:

>>> gs.fixpath(r"D:\Data\data.dat")

Code author: Warren E. Black - 2016-02-07
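
The same effect can be approximated with the standard library. This is a sketch only; fix_path_sketch is a hypothetical name, and the real function may handle edge cases differently.

```python
import os


def fix_path_sketch(path):
    """Return an absolute version of `path` using only forward slashes."""
    return os.path.abspath(path).replace("\\", "/")


fixed = fix_path_sketch(r"data\raw\file.dat")
print(fixed)  # an absolute path ending in data/raw/file.dat
```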

Test if Data is Numeric

pygeostat.datautils.utils.is_numeric(s)

Returns True if a value can be converted to a floating point number.
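
A typical way to write such a test, given here as a sketch under the hypothetical name is_numeric_sketch, is to attempt the conversion and catch the failure.

```python
def is_numeric_sketch(s):
    """Return True if `s` can be converted with float()."""
    try:
        float(s)
    except (TypeError, ValueError):
        return False
    return True


print(is_numeric_sketch("3.14"))  # True
print(is_numeric_sketch("1e-3"))  # True
print(is_numeric_sketch("abc"))   # False
print(is_numeric_sketch(None))    # False
```

Note that float() also accepts strings such as "nan" and "inf", so this sketch reports them as numeric.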

Ensure a Directory Exists

pygeostat.datautils.utils.ensure_dir(f)

Ensures that the given directory (or directories) exists, creating it if it does not.

Ensure a Path-to-Directory Exists

pygeostat.datautils.utils.ensure_path(path)

Function ensures that all folders in a given path or list of paths are created if they do not exist
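
Both helpers above amount to creating directories on demand. A minimal sketch with the standard library follows; ensure_paths_sketch is a hypothetical name and accepts either one path or a list of paths.

```python
import os
import tempfile


def ensure_paths_sketch(paths):
    """Create every directory in `paths` (a path or a list of paths)."""
    if isinstance(paths, (str, os.PathLike)):
        paths = [paths]
    for path in paths:
        # exist_ok=True makes the call a no-op for existing directories.
        os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    targets = [os.path.join(tmp, "a", "b"), os.path.join(tmp, "c")]
    ensure_paths_sketch(targets)
    ensure_paths_sketch(targets[0])  # calling again is harmless
    created = all(os.path.isdir(t) for t in targets)

print(created)  # True
```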

Get the Euclidean Distance to the Nearest Sample

pygeostat.datautils.utils.nearest_eucdist(x, y=None, z=None)

Calculate the Euclidean distance to the nearest sample for each sample.

Parameters:

x (np.array) – Array of the coordinate in the x direction

Keyword Arguments:
 
  • y (np.array) – Array of the coordinate in the y direction
  • z (np.array) – Array of the coordinate in the z direction
Returns:

Array of the Euclidean distance to the nearest sample for each sample

Return type:

dist (np.array)

Code author: Warren E. Black - 2016-07-28
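
The calculation can be illustrated with a brute-force sketch. The function nearest_distance_sketch is hypothetical, compares every pair of samples (so it scales quadratically), needs at least two samples, and relies on math.dist from Python 3.8 or later.

```python
import math


def nearest_distance_sketch(x, y=None, z=None):
    """Distance from each sample to its nearest neighbour."""
    # Use only the coordinate arrays that were actually supplied.
    columns = [c for c in (x, y, z) if c is not None]
    points = list(zip(*columns))
    distances = []
    for i, p in enumerate(points):
        # Skip the sample itself, otherwise every distance would be 0.
        others = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(min(others))
    return distances


dist = nearest_distance_sketch([0, 3, 10], y=[0, 4, 4])
print(dist)  # [5.0, 5.0, 7.0]
```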