Recent Pygeostat Changes

DataFile Changes

Notebook

Demo of the DataFile Functionality

The following notebook demos:

  1. New DataFile functionality as it relates to regular gridded data
  2. New DataFile functionality as it relates to irregular point data
  3. A modification to infergriddef, which is a class function of DataFile
  4. Extensions of pandas DataFrame functionality to the DataFile class
  5. A new data spacing calculation function
  6. Assignment of a null value (DataFile.null) in place of NaN's on write output
In [1]:
import pygeostat as gs
import matplotlib.pyplot as plt
import copy
%matplotlib inline
gs.gsParams['plotting.locmap.s'] = 3

1. DataFile with Gridded Data

The gsParams will be used for setting a default griddef

In [2]:
gs.gsParams['data.griddef'] = gs.GridDef(gridfl='../data/griddef.txt')
print(gs.gsParams['data.griddef'])
229 1.0 2.0 
155 1.0 2.0 
1 0.5 1.0
In [3]:
dat = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)

The griddef default matches the length of dat, so it is assigned

In [4]:
print(dat.griddef)
229 1.0 2.0 
155 1.0 2.0 
1 0.5 1.0

The DataFile also initializes as dftype='grid' since the griddef is assigned

In [5]:
print(dat.dftype)
grid

In the absence of other kwargs, the variables attribute is all non-specialized columns

Not very interesting in this case since variables are equal to dat.data.columns, but this is more useful with the irregular point dat that follows. Note that pixelplt no longer requires a griddef kwarg (separate update).

In [6]:
fig, axes = gs.subplots(2, 2, cbar_mode='single', 
                        axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
    gs.pixelplt(dat, var=var, ax=ax, vlim=(0, .5),
                cbar_label='standardized units')

2. DataFile with Irregular Data

This data file does not match the length of the gsParams griddef, so it is not associated with that grid definition by default.

In [7]:
dat = gs.DataFile(flname='../data/data.dat')
print(dat.griddef)
None

The variables attribute is the non-specialized columns

In [8]:
print('dat.x, dat.y, dat.z:', dat.x, dat.y, dat.z)
print('dat.variables:', dat.variables)
dat.x, dat.y, dat.z: X Y None
dat.variables: ['Au', 'Sulfides', 'Carbon', 'Organic Carbon', 'Keyout']

These can be further reduced with various kwargs

In [9]:
# A list of notvariables may be provided, leading to their exclusion from variables
dat = gs.DataFile(flname='../data/data.dat', notvariables='Keyout')
print('dat.variables:', dat.variables)
dat.variables: ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']
In [10]:
# Multiple notvariables may also be provided as a list, excluding them from variables
dat = gs.DataFile(flname='../data/data.dat',
                  notvariables=['Au', 'Sulfides'])
print('dat.variables:', dat.variables)
dat.variables: ['Carbon', 'Organic Carbon', 'Keyout']

Note that a cat attribute has also been added.

This is intended for use as the categorical modeling variable (e.g., rocktype), which is generally singular. Keyout is a quasi-rocktype variable that often facilitates simulation with usgsim. Specifying it on initialization leads to its exclusion from variables.

In [11]:
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')
print('dat.variables = ', dat.variables)
print('dat.cat = ', dat.cat)
dat.variables =  ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']
dat.cat =  Keyout

The addition of a variables attribute should allow for future wrapping convenience

For example, providing a data file object to an nscore routine, which then assumes that all variables should be normal scored in the absence of kwargs that say otherwise.
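
A minimal sketch of the sort of wrapper this enables is shown below; the nscore_all name and its internals are hypothetical and not part of pygeostat.

def nscore_all(datafile, variables=None):
    # Hypothetical wrapper: act on every variable of a DataFile unless a
    # subset is requested via the variables kwarg
    if variables is None:
        variables = datafile.variables  # defaults to all non-specialized columns
    for var in variables:
        print('would normal score:', var)
        # ... call the actual normal score transformation here ...

nscore_all(dat)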

For now, it provides marginal convenience through the initialization and storage of this information. Consider that no variable list needs to be constructed for plotting maps of each variable.

In [12]:
fig, axes = gs.subplots(2, 2, cbar_mode='single', 
                        axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
    gs.locmap(dat, var=var, ax=ax, vlim=(0, .5), 
              cbar_label='standardized units')

Also consider the added convenience of the list for functions such as DataFile.gscol

As well as the utility of the nvar attribute, which is calculated from the number of variables

In [13]:
print('columns of the variables in the data file = ', dat.gscol(dat.variables))
print('number of variables = ', dat.nvar)
columns of the variables in the data file =  3 4 5 6
number of variables =  4

Similarly, an xyz attribute has been added to the DataFile object, which reports the x, y and z attributes as a list

Variables and xyz columns are frequently used within a list for iterating loops, parameter specifications, etc. Their presence as data.variables and data.xyz simplifies things.

In [14]:
print('dat.xyz is a list:', dat.xyz)
print('this simplifies gscols, among other things:', dat.gscol(dat.xyz))
dat.xyz is a list: ['X', 'Y']
this simplifies gscols, among other things: 1 2

3. Infer Grid Definition

New functionality allows for the block sizes to be specified, before inferring the required number of blocks from the data extents

In [15]:
griddef = dat.infergriddef(blksize=(2, 2, None), databuffer=1.5)
print('A grid definition is output as a variable\n', griddef)
print('Though it is also added as an attribute of the data\n', dat.griddef)
A grid definition is output as a variable
 230 0.0 2.0 
157 0.0 2.0 
1 0.5 1.0
Though it is also added as an attribute of the data
 230 0.0 2.0 
157 0.0 2.0 
1 0.5 1.0

The old functionality remains

Specify the number of blocks before inferring the block size.

In [16]:
griddef = dat.infergriddef(nblk=(115, 78, None), databuffer=1.5)
print('A grid definition is output as a variable\n', griddef)
print('Though it is also added as an attribute of the data\n', dat.griddef)
A grid definition is output as a variable
 115 1.0 4.0 
78 1.005 4.01 
1 0.5 1.0
Though it is also added as an attribute of the data
 115 1.0 4.0 
78 1.005 4.01 
1 0.5 1.0

4. Extension of the Pandas DataFrame Functionality to the DataFile Class

Previously, creating/setting a new column would require the following syntax

Here, the underlying pandas DataFrame is accessed directly

In [17]:
dat.data['new column'] = 0

Now, you can simply use the same notation on the DataFile object

This is provided by the __setitem__ method of the DataFile.

In [18]:
dat['new column 2'] = 0

Similarly, the getitem functionality of a pandas DataFrame is extended to DataFile

In [19]:
dat['new column 2'].head()
Out[19]:
0    0
1    0
2    0
3    0
4    0
Name: new column 2, dtype: int64
In [20]:
list(dat.columns)
Out[20]:
['X',
 'Y',
 'Au',
 'Sulfides',
 'Carbon',
 'Organic Carbon',
 'Keyout',
 'new column',
 'new column 2']
In [21]:
## Clean for the next section
dat.drop(['new column', 'new column 2', 'Keyout'])
In [22]:
cdat = copy.deepcopy(dat)
In [23]:
dat = copy.deepcopy(cdat)

The pandas DataFrame columns functionality has also been extended to DataFile, including get and set

Unlike the Pandas columns, special attributes such as data.x, data.variables, etc. are updated if the previously set name is changed.

In [24]:
%load_ext autoreload
%autoreload 2
In [25]:
print('the columns:', list(dat.columns), '\n')
print('the x and variables attributes:', dat.x, dat.variables, '\n')
columns = copy.deepcopy(dat.columns.values)
columns[[0, 2]] = 'Easting', 'Gold'
dat.columns = columns
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
the columns: ['X', 'Y', 'Au', 'Sulfides', 'Carbon', 'Organic Carbon'] 

the x and variables attributes: X ['Au', 'Sulfides', 'Carbon', 'Organic Carbon'] 

the columns after altering: ['Easting', 'Y', 'Gold', 'Sulfides', 'Carbon', 'Organic Carbon'] 

the x and variables attributes after altering: Easting ['Gold', 'Sulfides', 'Carbon', 'Organic Carbon']

The Pandas rename functionality is applied similarly

Note that the x and variables attributes are adjusted back

In [26]:
dat.rename({'Easting': 'X', 'Gold': 'Au'})
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
the columns after altering: ['X', 'Y', 'Au', 'Sulfides', 'Carbon', 'Organic Carbon'] 

the x and variables attributes after altering: X ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']

As is the drop functionality

In [27]:
dat.drop(['X', 'Organic Carbon'])
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
the columns after altering: ['Y', 'Au', 'Sulfides', 'Carbon'] 

the x and variables attributes after altering: None ['Au', 'Sulfides', 'Carbon']

DataFrame.shape is now extended

In [28]:
dat.shape
Out[28]:
(2268, 4)
In [29]:
# Reset
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')

5. A 2-D Data Spacing Function Allows for its Fast Calculation

Data spacing is calculated at the data location only, and refers (for now) to the distance in the x/y plane (not downhole or 3-D spacing).

kwargs can override, but otherwise the function is based on DataFile properties (dh, x and y). If a dh is present, then the data spacing is based on the distance to the average x/y location of each drill hole.

The calculation is based on the average distance to the n_nearest locations. It is vector-based and very fast, but memory intensive. A compiled version will likely be required for extending this calculation to 3-D or larger data sets.

The output is concatenated as another column in the DataFile, unless the kwarg inplace is set to False.
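
For reference, a minimal sketch of the vectorized idea is shown below (not the actual pygeostat implementation); it illustrates why the approach is fast but memory intensive.

import numpy as np

def average_spacing(x, y, n_nearest=8):
    # Average 2-D distance from each location to its n_nearest neighbours.
    # The full pairwise distance matrix makes this fast but O(n**2) in memory.
    coords = np.column_stack((x, y))
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist = np.sort(dist, axis=1)
    # skip column 0, which is each point's zero distance to itself
    return dist[:, 1:n_nearest + 1].mean(axis=1)

spacing = average_spacing(dat['X'].values, dat['Y'].values)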

Data spacing using the DataFile attributes (no kwargs here)

Here, the output dspace array is the average distance to the nearest 8 data.

In [30]:
dat.spacing(8)

This is useful for plotting as a distribution

Informs declustering cell size, variogram lag distance, etc.

In [31]:
gs.histplt(dat['Data Spacing (m)'], icdf=True, stat_blk='all')
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d5874e8a58>

This may also be useful for plotting in map view

E.g., for determining modeling domains/strategies

In [32]:
gs.locmap(dat, var='Data Spacing (m)', vlim=(6, 10))
Out[32]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x1d587605320>

You can also perform the calculation on specific variables

Where the calculation ignores records that are NaN for that variable

In [33]:
fig, axes = gs.subplots(2, 2, cbar_mode='single', 
                        axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
    dat.spacing(8, var=var)
    gs.locmap(dat, var=var+' Data Spacing (m)', 
              ax=ax, vlim=(6, 10))
In [34]:
fig, axes = gs.subplots(2, 2, label_mode='all', aspect=False,
                        axes_pad=(0.8, 0.8), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
    gs.histplt(dat[var+' Data Spacing (m)'], ax=ax,
               icdf=True, stat_blk='all')

6. NaN's are Assigned a Valid GSLIB Null Value on Output

The null value may be specified in the function call; otherwise it is based on DataFile.null and gsParams['data.null'], in that order of priority.

In [35]:
# Inspect to see the -99's in place of NaN's
print(dat.null)
dat.writefile('test.dat')
-99.0
In [36]:
gs.rmfile('test.dat')

GridDef Changes

Notebook

Demo of Updates to GridDef

The following notebook demos:

  1. Deprecation of old functions
  2. index3d_to_index1d, which replaces and improves Indices_to_Index
  3. index1d_to_index3d, which replaces and improves Index_to_Indices
  4. coord_to_index1d, which replaces and improves gridIndexCoords
  5. coord_to_index3d, which replaces and improves gridIndicesCoords
  6. gridcoord, which replaces the semi-redundant gridcoords and gengridpoints
In [1]:
import pygeostat as gs
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
In [2]:
griddef = gs.GridDef(gridfl='../data/griddef.txt')
griddef
Out[2]:
Pygeostat GridDef:
229 1.0 2.0 
155 1.0 2.0 
1 0.5 1.0

1. Deprecation of old naming conventions

All class functions of GridDef are now lower case, aligning with the Python standard. Based on discussion with contributors, more intuitive naming conventions are now used as well. Merging of potentially redundant functions has been initiated, though more work is required in this regard.
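
A deprecated alias of this sort typically looks something like the following sketch (hypothetical names, not actual pygeostat code):

import warnings

def new_function(x):
    # Current, lower-case function name
    return x * 2

def OldFunction(x):
    # Hypothetical deprecated alias pointing to the renamed function
    warnings.warn("OldFunction is deprecated; use new_function instead",
                  DeprecationWarning, stacklevel=2)
    return new_function(x)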

The current function name returns no warning

In [3]:
vol = griddef.blockvolume()
print('block volume = {}'.format(vol))
block volume = 4.0

2. index3d_to_index1d

Highlights include:

  • index3d_to_index1d replaces Indices_to_Index, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
  • The previous functionality, operating on a single set of indices, may still be used as well
  • The old function name is deprecated, pointing to the current function with a warning (now removed!)
In [4]:
ix, iy, iz = 2, 4, 0

Execution with a single 3-D index

In [5]:
idx, ingrid = griddef.index3d_to_index1d(ix, iy, iz)
print('idx={}, ingrid={}'.format(idx, ingrid))
idx=918, ingrid=True

Execution with arrays of 3-D indices

In [6]:
ix, iy, iz = np.arange(0, 5), np.arange(0, 5), np.zeros(5)
idx, ingrid = griddef.index3d_to_index1d(ix, iy, iz)
print('idx={}, ingrid={}'.format(idx, ingrid))
idx=[  0 230 460 690 920], ingrid=[ True  True  True  True  True]

3. index1d_to_index3d

Highlights include:

  • index1d_to_index3d replaces Index_to_Indices, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
  • The previous functionality, operating on a single index, may still be used as well
  • The old function name is deprecated, pointing to the current function with a warning (now removed!)
In [7]:
idx = 918
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
ix=[0 1 2 3 4], iy=[0 1 2 3 4], iz=[ 0.  0.  0.  0.  0.], ingrid=[ True  True  True  True  True]

Execution with a single 1-D index

In [8]:
ix, iy, iz, ingrid = griddef.index1d_to_index3d(idx)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
ix=2, iy=4, iz=0, ingrid=True

Execution with an array of 1-D indices

In [9]:
idx = np.array([0, 230, 460, 690, 920])
ix, iy, iz, ingrid = griddef.index1d_to_index3d(idx)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
ix=[0 1 2 3 4], iy=[0 1 2 3 4], iz=[0 0 0 0 0], ingrid=[ True  True  True  True  True]

4. coord_to_index1d

Highlights include:

  • coord_to_index1d replaces gridIndexCoords, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
  • The previous functionality, operating on a single coordinate, may still be used as well
  • The old function name is deprecated, pointing to the current function with a warning (now removed!)

Execution with a single coordinate

In [10]:
x, y, z = 15, 30, .5
idx, ingrid = griddef.coord_to_index1d(x, y, z)
print('idx={}, ingrid={}'.format(idx, ingrid))
idx=3213, ingrid=True

Execution with arrays of coordinates

In [11]:
x, y = np.linspace(30.5, 100.5, 5), np.linspace(30.5, 100.5, 5)
z = np.zeros(x.shape)
idx, ingrid = griddef.coord_to_index1d(x, y, z)
print('idx={}, ingrid={}'.format(idx, ingrid))
idx=[ 3450  5290  7360  9430 11500], ingrid=[ True  True  True  True  True]

5. coord_to_index3d

Highlights include:

  • coord_to_index3d replaces gridIndicesCoords, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
  • The previous functionality, operating on a single coordinate, may still be used as well
  • The old function name is deprecated, pointing to the current function with a warning (now removed!)

Execution with a single coordinate

In [12]:
ix, iy, iz, ingrid = griddef.coord_to_index3d(x, y, z)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
ix=[15 23 32 41 50], iy=[15 23 32 41 50], iz=[0 0 0 0 0], ingrid=[ True  True  True  True  True]

Execution with arrays of coordinates

In [13]:
x, y = np.linspace(30.5, 100.5, 5), np.linspace(30.5, 100.5, 5)
z = np.zeros(x.shape)
ix, iy, iz, ingrid = griddef.coord_to_index3d(x, y, z)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
ix=[15 23 32 41 50], iy=[15 23 32 41 50], iz=[0 0 0 0 0], ingrid=[ True  True  True  True  True]

6. gridcoord

Highlights include:

  • gridcoord replaces gridcoords and gengridpoints, which previously provided the single execution (coordinates of one grid node) and global execution (coordinates of all grid nodes) respectively
  • The old function names are deprecated, pointing to the current function in the appropriate manner with a warning (now removed!)

gridcoord in the context of gridcoords

In [14]:
x, y, z = griddef.gridcoord(ix, iy, iz)
print('x={}, y={}, z={}'.format(x, y, z))
x=[  31.   47.   65.   83.  101.], y=[  31.   47.   65.   83.  101.], z=[ 0.5  0.5  0.5  0.5  0.5]

Deprecated gengridpoints

The old gengridpoints outputs a single array where each column corresponds with x, y and z. gridcoord is instead consistent with the execution above, outputting 3 separate arrays.

gridcoord in the context of gengridpoints

In [15]:
x, y, z = griddef.gridcoord()
print('x={}, y={}, z={}'.format(x[:5], y[:5], z[:5]))
x=[ 1.  3.  5.  7.  9.], y=[ 1.  1.  1.  1.  1.], z=[ 0.5  0.5  0.5  0.5  0.5]
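
If the old single-array output of gengridpoints is ever needed, it can be rebuilt from the three arrays returned by gridcoord, e.g.:

# Stack the three coordinate vectors into the old (n, 3) gengridpoints form
points = np.column_stack((x, y, z))
print(points[:3])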

GsParams Addition

Notebook

Demo of the gsParams Functionality

The following notebook demos how the gsParams object:

  1. May be inspected, described and used for setting defaults
  2. May be used for modifying default plotting label behaviour
  3. May be used for modifying default plotting style behaviour
  4. May be used for modifying default grid-related behaviour
  5. May be used for modifying default data-related behaviour
  6. May have its user settings saved/loaded within each notebook instance, providing consistency and convenience
In [1]:
import pygeostat as gs
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
dat = gs.DataFile(flname='../data/data.dat')

1. Introducing the gsParams Object

Stores project defaults, excluding matplotlib parameters that are handled by the set_style functionality or with native matplotlib functionality. This object mirrors the matplotlib rcParams object: it is a dictionary that validates inputs and is queried by pygeostat plotting functions. Keyword arguments passed to individual functions override these defaults.

As a dictionary object, all of the standard functionality applies, such as printing the keys and values

In [3]:
print(gs.gsParams)
config.autoload.gsparams: False
config.autoload.gsplotstyle: False
config.getpar: False
config.nprocess: 4
config.verbose: True
data.cat: None
data.catdict: None
data.dh: None
data.griddef: None
data.ifrom: None
data.ito: None
data.nreal: None
data.null: -99.0
data.tmin: -98.0
data.write_vtk.cdtype: float64
data.write_vtk.vdtype: float64
data.wts: None
data.x: None
data.y: None
data.z: None
plotting.assumecat: 11
plotting.axis_xy: False
plotting.axis_xy_spatial: False
plotting.cmap: viridis
plotting.cmap_cat: tab20
plotting.gammasize: 3.0
plotting.grid: False
plotting.histplt.cdfcolor: .5
plotting.histplt.edgecolor: k
plotting.histplt.facecolor: .9
plotting.histplt.histbins: 15
plotting.histplt.stat_blk: all
plotting.histplt.stat_xy: [0.95, 0.95]
plotting.histplt.stat_xy_cdf: [0.95, 0.05]
plotting.histpltsim.alpha: 0.5
plotting.histpltsim.refclr: C3
plotting.histpltsim.simclr: 0.2
plotting.lagname: Lag Distance
plotting.locmap.c: .4
plotting.locmap.s: None
plotting.rotateticks: [0, 0]
plotting.roundstats: True
plotting.scatplt.alpha: 1.0
plotting.scatplt.c: kde
plotting.scatplt.cmap: viridis
plotting.scatplt.s: None
plotting.scatplt.stat_blk: pearson
plotting.scatplt.stat_xy: [0.95, 0.05]
plotting.sigfigs: 2
plotting.stat_fontsize: None
plotting.stat_ha: right
plotting.unit: m
plotting.varplt.color: .5
plotting.varplt.ms: 2.0
plotting.varpltsim.alpha: 0.5
plotting.varpltsim.refclr: C3
plotting.varpltsim.simclr: 0.2
plotting.vplot.colors: ['C0', 'C1', 'C2']
plotting.xabbrev: E
plotting.xname: Easting
plotting.yabbrev: N
plotting.yname: Northing
plotting.zabbrev: Elev
plotting.zname: Elevation

Additionally, a describe function may be used for an explicit description of individual parameters

In [4]:
gs.gsParams.describe('data.griddef')
data.griddef:
When initializing a DataFile, this will be used as DataFile.GridDef if GridDef.count() matches DataFile.shape[0]. A pygeostat.GridDef object or valid gridstr/gridarr may be used for intitialization.

Using that function without a key leads to printing of the entire dictionary

In [5]:
gs.gsParams.describe()
config.autoload.gsparams:
If `True` the `gsParams` configurations found in
``%USER%/.gsParams`` are parsed and loaded when pygeostat loads

config.autoload.gsplotstyle:
If `True` the `gsPlotStyle` configurations found in
``%USER%/.gsParams`` are parsed and loaded when pygeostat loads

config.getpar:
If True, getpar=True when calling pygeostat.Program() with getpar=None

config.nprocess:
The number of parallel processes to run functions that provide that functionality

config.verbose:
If `true`, mentions when the `gsParams` are updated automatically when pygeostat is loaded. Useful for interactive sessions

data.cat:
When initializing a DataFile, this str will be used as DataFile.cat if it is in DataFile.columns, e.g., Facies

data.catdict:
When initializing a DataFile, this dictionary will be used as DataFile.catdict (if catdict.keys() matches DataFile.cat codes). This dictionary is formaatted as {catcodes: catnames}.

data.dh:
When initializing a DataFile, this str will be used as DataFile.dh if it is in DataFile.columns, e.g., WellID

data.griddef:
When initializing a DataFile, this will be used as DataFile.GridDef if GridDef.count() matches DataFile.shape[0]. A pygeostat.GridDef object or valid gridstr/gridarr may be used for intitialization.

data.ifrom:
When initializing a DataFile, this str will be used as DataFile.ifrom if it is in DataFile.columns, e.g., From

data.ito:
When initializing a DataFile, this str will be used as DataFile.ito if it is in DataFile.columns, e.g., To

data.nreal:
The number of realizations for modeling (once implemented) and model checking (Variogram.varsim, histpltsim, etc.) routines.

data.null:
NaN values are assigned this number (float or int) by io write functions, e.g., -99, -999, or -999.0. Using None/False disables the functionality, but will likely lead to issues in the case of subsequent GSLIB operations.

data.tmin:
Values less than this number (float or int) are assigned NaN on import by all io read functions, e.g., -98 or -998. Using -1.0e21 or None disables this functionality.

data.write_vtk.cdtype:
Precision of coordinates when writing to VTK. Must be a valid numpy specifier, such as "float32", "float64", etc.

data.write_vtk.vdtype:
Precision of variables when writing to VTK. Must be a valid numpy specifier, such as "float32", "float64", "int8", etc.

data.wts:
When initializing a DataFile, this str or list will be used as DataFile.wts if wts is in DataFile.columns, e.g., DeclusteringWeight

data.x:
When initializing a DataFile, this st will be used as DataFile.x if it is in DataFile.columns, e.g., Easting

data.y:
When initializing a DataFile, this str will be used as DataFile.y if it is in DataFile.columns, e.g., Northing

data.z:
When initializing a DataFile, this str will be used as DataFile.z if it is in DataFile.columns, e.g., Elevation

plotting.assumecat:
When executing plotting functions, data is assumed to be categorical for the purposes of selecting colormaps, if less than this number of unique values are found

plotting.axis_xy:
Converts plotting axes to GSLIB-style visibility (only left and bottom visible) if axis_xy is True. See plotting.axis_xy_spatial for the default setting of functions such as pixelplt and locmap.

plotting.axis_xy_spatial:
axis_xy setting that is specific to spatial plotting functions such as locmap and pixelplt. Provided since many users prefer axis_xy to be applied to all plots (via plotting.axis_xy) other than spatial plots. 

plotting.cmap:
Matplotlib colormap object or a registered Matplotlib colormap, which is used for continuous variables

plotting.cmap_cat:
Matplotlib colormap object or a registered Matplotlib colormap, which is used for categorical variables

plotting.gammasize:
The size of gamma symbols in variogram plots are Matplotlib.rcParams["font.size"] multiplied with this value.

plotting.grid:
If True, a grid is plotted in pygeostat ploting function unless, overridden by the associated kwarg

plotting.histplt.cdfcolor:
Color of the CDF for the histplt function.

plotting.histplt.edgecolor:
Color of the histogram edges for the histplt function.

plotting.histplt.facecolor:
Color of the histogram faces for the histplt function.

plotting.histplt.histbins:
Number of bins for a histogram (not CDF) calculation.

plotting.histplt.stat_blk:
Default stat_blk setting, which is either a string code, such as all or minimal, or a list of valid statistic names

plotting.histplt.stat_xy:
Location of the histogram stats location as a tuple of (xloc, yloc), where the locations should be between 0 and 1.

plotting.histplt.stat_xy_cdf:
Location of the CDF stats location as a tuple of  (xloc, yloc), where the locations should be 0 to 1.

plotting.histpltsim.alpha:
Transparency of the realizations CDFs

plotting.histpltsim.refclr:
Color of the reference CDF

plotting.histpltsim.simclr:
Color of the realization CDFs

plotting.lagname:
Name (str) to use on the x-axis of variogram plots, e.g., Lag or h

plotting.locmap.c:
Color of scatter in locmap (and pixelplt), which is a valid Matplotlib color specifier

plotting.locmap.s:
Size of scatter in locmap (and pixelplt), which is based on Matplotlib.rcParams if None

plotting.rotateticks:
If None, the pygeostat tickoverlap attempts to optimize tick, label angles to minimize overlap in the absence of a kwarg., If [xangle, yangle] (e.g., [0, 0], then these angles are used in the absence of a kwarg.

plotting.roundstats:
Display a set number of significant figures (False) or digits (True) for statistics in plots.

plotting.scatplt.alpha:
Alpha transparency of scatter in scatplt, which should be between 0 and 1.

plotting.scatplt.c:
Color of scatter in scatplt, which is either a valid Matplotlib color specifier, or the "KDE" string. KDE leads to calculation of the kernel density estimate at each scatter location

plotting.scatplt.cmap:
Matplotlib colormap object or a registered Matplotlib colormap which overrides plotting.cmap

plotting.scatplt.s:
Size of scatter in scatplt, which is based on Matplotlib.rcParams if None

plotting.scatplt.stat_blk:
Statistics that are plotted in scatplt, which should be either "all" or a list that may contains the strings: ["count", "pearson", "spearmanr"]

plotting.scatplt.stat_xy:
A 2-list that provides the x/y location of the statistics block

plotting.sigfigs:
Significant figures (roundstats=False) or digits (roundstats=True) for statistics in plots.

plotting.stat_fontsize:
Font size of statistics in pygeosat plots. If None, then the font size matches rcParams[font.size]. If a fraction, then the font size is rcParams[font.size]*stat_fontsize. If greater than or equal to 1, then stat_fontsize is the font size.

plotting.stat_ha:
The horizontal alignment of statistics in plots, which should be one of "right", "left" or "center"

plotting.unit:
The unit (str) that appears in spatial/map/variogram plots e.g., m or ft. Use None if no unit should be displayed.

plotting.varplt.color:
Color for varplt 

plotting.varplt.ms:
Marker size for varplt (e.g. experimental dot size) 

plotting.varpltsim.alpha:
Transparency of the realizations variograms

plotting.varpltsim.refclr:
Color of the reference variogram

plotting.varpltsim.simclr:
Color of the realization variograms

plotting.vplot.colors:
Colors of the major, minor and vertical direction variograms, which is used by the Variogram.plot function. May be a 3-list, or a color pallete in gs.avail_palettes

plotting.xabbrev:
Abbreviated name (str) that is used for the x coordinate label for map/cross-section plots, e.g., Easting or X

plotting.xname:
Name (str) that is used for the x coordinate label for map/cross-section plots, e.g., Easting or X

plotting.yabbrev:
Abbreviated name (str) that is used for the y coordinate label for map/cross-section plots, e.g., Northing or Y

plotting.yname:
Name (str) that is used for the y coordinate label for map/cross-section plots, e.g., Northing or Y

plotting.zabbrev:
Abbreviated name (str) that is used for the z coordinate label for map/cross-section plots, e.g., Elevation or Z

plotting.zname:
Name (str) that is used for the z coordinate label for map/cross-section plots, e.g., Elevation or Z

Error checking is performed when describing parameters

In [6]:
# Describe with an invalid key
try:
    gs.gsParams.describe('plotting.test')
except Exception as e:
    print(e)
'plotting.test is not a valid parameter. See gsParams.keys() for a list of valid parameters.'

Error checking of keys and values is also performed when setting parameters

In [7]:
# Set with an invalid key
try:
    gs.gsParams['plotting.test'] = 'something'
except Exception as e:
    print(e)
'plotting.test is not a valid parameter. See gsParams.keys() for a list of valid parameters.'
In [8]:
# Set with an invalid value
try:
    gs.gsParams['data.tmin'] = 'something'
except Exception as e:
    print(e)
Key data.tmin: Could not convert "something" to float

2. Plotting Label Functionality

Modify the plotting unit and marker size

The following change to gsParams will impact pixelplt, locmap, pitplt, drillplt, varplt, etc. Note that the plotting.unit setting generalizes to all functions, whereas the plotting.locmap.s setting is specific to that function.

In [9]:
gs.locmap(dat, title='Units are meter, based on the standard default')
gs.gsParams['plotting.unit'] = 'ft'
gs.gsParams['plotting.locmap.s'] = 3
gs.locmap(dat, title='Units are feet, based on the modified default')
Out[9]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb3d5da0>

Modify the default axis labels

The following change to gsParams will impact pixelplt, locmap, pitplt, drillplt, varplt, etc.

In [10]:
gs.gsParams['plotting.xname'] = 'X'
gs.gsParams['plotting.yname'] = 'Y'
gs.locmap(dat)
Out[10]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fb735f7b8>

Logical labels if unit == None

Entering None or an empty string for plotting.unit leads to its exclusion from labels (no empty brackets after the label)

In [11]:
gs.gsParams['plotting.unit'] = None
gs.locmap(dat)
Out[11]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb4da278>

Override the defaults with kwarg of each function

Providing keyword labels overrides the defaults for a particular plot, without altering the stored defaults.

In [12]:
gs.locmap(dat, xlabel='U', ylabel='V', title='Overriding Defaults')
gs.locmap(dat, title='Reverting to Defaults')
Out[12]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb5f4a58>

Restore the original pygeostat defaults

If wishing to restore the original defaults, a class function is provided

In [13]:
gs.gsParams.restore_defaults()
gs.locmap(dat)
Out[13]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb637710>

3. Plotting Style Functionality

gsParams may be used for altering the default plotting style in several ways. Note that the customization provided via gsParams is intended to complement matplotlib.rcParams, which provides basic defaults such as font families, font size, figure size, etc.

The background grid may be globally disabled in pygeostat plotting functions

In [14]:
gs.gsParams['plotting.locmap.s'] = 3
gs.locmap(dat, title='Without Grid Lines')
gs.gsParams['plotting.grid'] = True
gs.locmap(dat, title='With Grid Lines')
Out[14]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb7bdda0>

The face and edge color of histplt may be set globally

Also note that a GSLIB-style axis_xy may be set, removing the top and right borders.

In [15]:
gs.histplt(dat['Organic Carbon'], title='Default Histogram Color Now Mimics GSLIB')
gs.gsParams['plotting.axis_xy'] = True
gs.histplt(dat['Organic Carbon'], 
           title='Top and right borders are now hidden for non-spatial plots')
gs.gsParams['plotting.histplt.facecolor'] = 'C0'
gs.gsParams['plotting.histplt.edgecolor'] = 'C1'
gs.histplt(dat['Organic Carbon'], title='Any Face and Edge Color Can be Used')
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x26fbb929fd0>

The color of CDFs may also be set globally

In [16]:
gs.histplt(dat['Organic Carbon'], icdf=True, title='Default CDF Color')
gs.gsParams['plotting.histplt.cdfcolor'] = 'C2'
gs.histplt(dat['Organic Carbon'], icdf=True, title='Any CDF Color Can be Used')
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x26fbb5e9fd0>

Various statistic block parameters may be altered

In [17]:
gs.gsParams['plotting.histplt.stat_blk'] = 'minimal'
gs.histplt(dat['Organic Carbon'], icdf=True, 
           title=('Minimal stats with 2 significant digits'))
gs.gsParams['plotting.sigfigs'] = 4
gs.histplt(dat['Organic Carbon'], icdf=True, 
           title=('4 significant digits'))
gs.gsParams['plotting.sigfigs'] = 3
gs.gsParams['plotting.roundstats'] = False
gs.gsParams['plotting.histplt.stat_xy_cdf'] = (0.7, 0.05)
gs.gsParams['plotting.stat_ha'] = 'left'
gs.histplt(dat['Organic Carbon'], icdf=True, title=('3 sig figs and left alignment'))
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x26fbcc98748>
In [18]:
# Restore for future plots
gs.gsParams['plotting.stat_ha'] = 'right'
gs.gsParams['plotting.axis_xy'] = False

Note that the color is validated when set as a default

In [19]:
try:
    gs.gsParams['plotting.histplt.facecolor'] = 'this isnt a valid color'
except Exception as e:
    print(e)
Key plotting.histplt.facecolor: this isnt a valid color does not look like a color arg
In [20]:
# Return some defaults
gs.gsParams['plotting.histplt.facecolor'] = '.9'
gs.gsParams['plotting.histplt.edgecolor'] = 'k'

The 'axis_xy' style of GSLIB may be applied globally

In [21]:
def subplot_figure(title):
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    gs.histplt(dat['Organic Carbon'], ax=axes[0], stat_blk='all')
    gs.locmap(dat, var='Organic Carbon', ax=axes[1])
    fig.tight_layout()
    fig.suptitle(title, **{'y': 1.02})

    
subplot_figure('The default axis borders follow matplotlib')

gs.gsParams['plotting.axis_xy'] = True
subplot_figure('The axis borders now mimic GSLIB-style (only full borders for spatial plots)')

gs.gsParams['plotting.axis_xy_spatial'] = True
subplot_figure('The axis borders are now consistent')

4. Grid-Related Functionality

Grid definitions must be repeatedly specified for initializing DataFiles, using grid-related functionality, etc. gsParams allows for a default to be set and applied in the absence of kwargs

Observe the behaviour of DataFile with no griddef default

No griddef is associated with the DataFile when loaded, in the absence of kwarg

In [22]:
keyout = gs.DataFile(flname='../data/keyout.gsb')
print(keyout.griddef)
None

Now, initialize and set a default grid definition with one line

Here, the griddef is initialized through passing a file name with the new gridfl option in GridDef

In [23]:
# The grid definition works
gs.gsParams['data.griddef'] = gs.GridDef(gridfl='../data/griddef.txt')

The grid definition is now automatically associated with a DataFile when initialized, if its length matches

It will be assigned so long as DataFile.DataFrame.shape[0] matches gs.gsParams['data.griddef'].count()

In [24]:
keyout = gs.DataFile(flname='../data/keyout.gsb')
print(keyout.griddef)
229 1.0 2.0 
155 1.0 2.0 
1 0.5 1.0

5. Data-Related Functionality

NaN is the standard for missing values in Pandas, Numpy, Scipy, Paraview, Matplotlib and others, and is therefore adopted within Pygeostat.

Observe the default behaviour of the DataFile

Here, the gsParams['data.tmin'] default of -98.0 is trimming null values, leading to their assignment as NaN. The pandas describe then ignores them in the count, stats, etc., the pixelplt displays them as white, np.nanmean ignores them, etc.
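
Conceptually, the trimming amounts to something like the following sketch (not the actual io code):

import numpy as np
import pandas as pd

def apply_tmin(df, tmin=-98.0):
    # Sketch of the import-time trimming: values below tmin become NaN
    return df.where(df >= tmin, np.nan)

# e.g., -999 placeholders become NaN, valid values are untouched
print(apply_tmin(pd.DataFrame({'var': [-999.0, 0.25, 0.5]})))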

In [25]:
sgsim = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)
sgsim.describe()
Out[25]:
variable_001 variable_002 variable_003 variable_004
count 2.249300e+04 22493.000000 22493.000000 22493.000000
mean 5.640677e-02 0.213798 0.336117 0.154909
std 8.920696e-02 0.074965 0.119345 0.077681
min 1.799983e-15 0.069718 0.024641 0.015232
25% 1.300000e-02 0.162857 0.272894 0.100307
50% 2.563005e-02 0.194773 0.351056 0.126065
75% 5.949404e-02 0.241119 0.410054 0.187968
max 9.940173e-01 0.926000 0.997787 0.560000
In [26]:
# Note that matplotlib prints nans as white by default
gs.pixelplt(sgsim, vlim=(0, .2))
Out[26]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbb87cbe0>
In [27]:
# Note that this matches the pandas describe above
print(np.nanmean(sgsim.data['variable_001']))
0.0564067744178

Observe the behaviour of DataFile if tmin is altered (not used in this case)

No tmin is used, so that -999's in the data file are included in the pandas describe. This functionality may be more useful, however, if a differing tmin tolerance must be used (e.g., tmin = -998, tmin = -9998, etc.)

In [28]:
gs.gsParams['data.tmin'] = None
sgsim = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)
# Note that -999s are now included in the count and stats by pandas
sgsim.describe()
Out[28]:
variable_001 variable_002 variable_003 variable_004
count 35495.000000 35495.000000 35495.000000 35495.000000
mean -366.265702 -366.165964 -366.088451 -366.203282
std 481.823373 481.899203 481.958142 481.870830
min -999.989990 -999.989990 -999.989990 -999.989990
25% -999.989990 -999.989990 -999.989990 -999.989990
50% 0.011000 0.157402 0.258210 0.095869
75% 0.034786 0.211291 0.375364 0.140100
max 0.994017 0.926000 0.997787 0.560000
In [29]:
# Note that -999's are plotted as blue since matplotlib considers them valid, motivating the use of NaN 
gs.pixelplt(sgsim, vlim=(0, .2))
Out[29]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x26fbcb8d7b8>
In [30]:
# Note that there is no built-in functionality within numpy for ignoring 
# -999, which again motivates the use of NaN
print(np.mean(sgsim.data['variable_001']))
-366.26570208340155

Default NaN Replacement Behaviour with Output

When writing to external files, calling fortran wrappers, etc., the NaN's are replaced with gsParams['data.null'] by default. This may be overridden with function kwargs, which is recommended when writing to VTK, where NaN's are handled implicitly.

In [31]:
# Read in data to initialize NaNs
gs.gsParams['data.tmin'] = -98.0
dat = gs.DataFile(flname='../data/data.dat')
dat.describe()
Out[31]:
Au Sulfides Carbon Organic Carbon Keyout
count 2266.000000 1399.000000 1399.000000 1398.000000 2268.0
mean 0.047560 0.211255 0.286897 0.154391 1.0
std 0.090554 0.120603 0.145569 0.107061 0.0
min 0.000000 0.012000 0.000000 0.000000 1.0
25% 0.008000 0.135500 0.166000 0.077000 1.0
50% 0.017000 0.185000 0.305000 0.124000 1.0
75% 0.043000 0.248000 0.389000 0.213000 1.0
max 1.000000 1.012000 1.000000 1.000000 1.0
In [32]:
# Write this data to a file, which may be inspected (note the -99's)
# before removing
dat.writefile('test.dat')

Altered NaN Replacement Behaviour with Output

This default output behaviour may be modified via gsParams, as well as via function kwargs.

In [33]:
# First, note that -999s are now present in the output file (rather than -99s) due
# to the use of the kwarg below
dat.writefile('test.dat', null=-999.0)
In [34]:
# gsParams may be used for globally modifying null to anything, including NaN
gs.gsParams['data.null'] = -9
# The DataFile must be re-initialized for the global null to be applied
# as its attribute
dat = gs.DataFile(flname='../data/data.dat')
# -9s are now visible in this file
dat.writefile('test.dat')
In [35]:
gs.rmfile('test.dat')

6. Save and Load gsParams Settings

In [36]:
gs.gsParams.save('gsParams_user_defaults.txt')
In [37]:
gs.gsParams.load('gsParams_user_defaults.txt')

Plotting Styles Addition

Notebook

Demo of new default style performance and set_style functionality

The following notebook demos how:

  1. The appearance of pygeostat plots in the absence of style specifications now matches matplotlib defaults, or whatever the previously defined style of a user is (according to modifications in matplotlib.rcParams)
  2. The default style may be modified with custom dictionary settings, before being used in all future plots
  3. The default style may be modified with pre-defined style dictionaries, such as the 'ccgpaper' style that used to be the default pygeostat plot style
  4. The original matplotlib defaults may be restored if changes are made
  5. The use of style specifications in individual pygeostat plot functions is only applied to those individual plots, and does not impact the default settings
  6. The complementary (and not redundant) nature of pygeostat.set_style (via matplotlib.rcParams) and the pygeostat.gsParams settings.

Although these changes are demoed with locmap, they apply to all pygeostat plot functions. A paramdiff function is also defined in this notebook to explicitly display the changes to the plotting style from each step.

In [1]:
import pygeostat as gs
import matplotlib as mpl
%matplotlib inline
In [2]:
dat = gs.DataFile(flname='../data/data.dat')

Generate a function for displaying changes to Matplotlib Parameters

The altered parameters will be visible in the plots, but the paramdiff function explicitly reports the changes throughout this notebook.

In [3]:
# Snapshot the original rcParams for later comparison
origparams = mpl.rcParams.copy()

def paramdiff():
    """Return {param: [original, current]} for any rcParams that have changed."""
    return {k: [v, mpl.rcParams[k]]
            for k, v in origparams.items() if v != mpl.rcParams[k]}

1. Plot without any modifications to style

No mangling of style occurs in the absence of provided style specifications

In [4]:
gs.locmap(dat, title='This plot uses the matplotlib defaults')
paramdiff()
Out[4]:
{}

2. Modify the default plot style with a custom dictionary

Note that this amounts to using the gs.set_style one-liner functionality in place of:

import matplotlib as mpl
mpl.rcParams['font.size'] = 30

This may be easier for some users...

In [5]:
gs.set_style(custom={'font.size':20, 'figure.figsize':(10, 10)})
gs.locmap(dat, title='The default style is modified through custom specifications')
paramdiff()
Out[5]:
{'font.size': [10.0, 20.0], 'figure.figsize': [[6.0, 4.0], [10.0, 10.0]]}

3. Modify the default plot style with preset styles

This is the required command for pygeostat to plot in its old default, 'ccgpaper'

In [6]:
gs.set_style('ccgpaper')
gs.locmap(dat, figsize=(12, 12), title='The ccgpaper style is now the default')
paramdiff()
Out[6]:
{'lines.markeredgewidth': [1.0, 0.0],
 'lines.markersize': [6.0, 7.0],
 'lines.solid_capstyle': ['projecting', 'round'],
 'patch.linewidth': [1.0, 0.3],
 'font.family': [['sans-serif'], ['Calibri']],
 'font.weight': ['normal', '400'],
 'font.size': [10.0, 8.0],
 'text.color': ['k', 'black'],
 'axes.axisbelow': ['line', True],
 'axes.facecolor': ['w', 'white'],
 'axes.edgecolor': ['k', 'black'],
 'axes.linewidth': [0.8, 0.5],
 'axes.titlesize': ['large', 8.0],
 'axes.labelsize': ['medium', 8.0],
 'axes.labelcolor': ['k', 'black'],
 'legend.fontsize': ['medium', 8.0],
 'legend.frameon': [True, False],
 'xtick.major.size': [3.5, 0.0],
 'xtick.minor.size': [2.0, 0.0],
 'xtick.major.width': [0.8, 1.0],
 'xtick.minor.width': [0.6, 0.5],
 'xtick.major.pad': [3.5, 3.0],
 'xtick.color': ['k', 'black'],
 'xtick.labelsize': ['medium', 8.0],
 'ytick.major.size': [3.5, 0.0],
 'ytick.minor.size': [2.0, 0.0],
 'ytick.major.width': [0.8, 1.0],
 'ytick.minor.width': [0.6, 0.5],
 'ytick.major.pad': [3.5, 3.0],
 'ytick.color': ['k', 'black'],
 'ytick.labelsize': ['medium', 8.0],
 'grid.color': ['#b0b0b0', 'lightgray'],
 'grid.linewidth': [0.8, 0.5],
 'figure.figsize': [[6.0, 4.0], [3.0, 3.0]],
 'figure.facecolor': [(1, 1, 1, 0), 'white'],
 'figure.edgecolor': [(1, 1, 1, 0), 'black'],
 'ps.useafm': [False, True]}

4. Restore default styles

The original matplotlib defaults may be restored via restore_mpl_style()

In [7]:
gs.gsPlotStyle.restore_defaults()
gs.locmap(dat, title='The matplotlib defaults are restored')
paramdiff()
Out[7]:
{}
In [8]:
gs.set_style("pt3")
gs.locmap(dat, title='The pt3 style is now the default')
paramdiff()
Out[8]:
{'lines.markeredgewidth': [1.0, 0.0],
 'lines.markersize': [6.0, 7.0],
 'lines.solid_capstyle': ['projecting', 'round'],
 'patch.linewidth': [1.0, 0.3],
 'font.family': [['sans-serif'], ['Calibri']],
 'font.weight': ['normal', '400'],
 'font.size': [10.0, 3.0],
 'text.color': ['k', 'black'],
 'axes.axisbelow': ['line', True],
 'axes.facecolor': ['w', 'white'],
 'axes.edgecolor': ['k', 'black'],
 'axes.linewidth': [0.8, 0.5],
 'axes.titlesize': ['large', 3.0],
 'axes.labelsize': ['medium', 3.0],
 'axes.labelcolor': ['k', 'black'],
 'legend.fontsize': ['medium', 3.0],
 'legend.frameon': [True, False],
 'xtick.major.size': [3.5, 0.0],
 'xtick.minor.size': [2.0, 0.0],
 'xtick.major.width': [0.8, 1.0],
 'xtick.minor.width': [0.6, 0.5],
 'xtick.major.pad': [3.5, 3.0],
 'xtick.color': ['k', 'black'],
 'xtick.labelsize': ['medium', 3.0],
 'ytick.major.size': [3.5, 0.0],
 'ytick.minor.size': [2.0, 0.0],
 'ytick.major.width': [0.8, 1.0],
 'ytick.minor.width': [0.6, 0.5],
 'ytick.major.pad': [3.5, 3.0],
 'ytick.color': ['k', 'black'],
 'ytick.labelsize': ['medium', 3.0],
 'grid.color': ['#b0b0b0', 'lightgray'],
 'grid.linewidth': [0.8, 0.5],
 'figure.figsize': [[6.0, 4.0], [3.0, 3.0]],
 'figure.facecolor': [(1, 1, 1, 0), 'white'],
 'figure.edgecolor': [(1, 1, 1, 0), 'black'],
 'ps.useafm': [False, True]}
In [9]:
gs.restore_mpl_style()
gs.locmap(dat, title='The matplotlib defaults are restored')
paramdiff()
Out[9]:
{}

5. Styles set using the pltstyle kwarg are non-permanent

In [10]:
gs.locmap(dat, pltstyle="pt3", title='pt3 style due to pltstyle kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, pltstyle="presentation", 
          title='presentation style due to pltstyle kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, title='return to the default style, which was not impacted by pltstyle kwargs')
gs.plt.show()
print(paramdiff())
{}
{}
{}
In [11]:
gs.locmap(dat, cust_style={'font.family':'Times New Roman'},
          title='Times due to cust_style kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, title='Not Times since the default is used')
gs.plt.show()
print(paramdiff())
{}
{}

6. gsParams for complementary defaults.

pygeostat.gsParams provides many default style choices that are not accessible via matplotlib.rcParams (and by extension, pygeostat.set_style). Some examples are shown below.

Default histplt

In [12]:
gs.histplt(dat['Au'], title='Default histplt style')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x1da88968898>

Alter defaults to provide a GSLIB look

GSLIB-style plots remove the top and right axes. A grid is also added to all plots.

In [13]:
gs.gsParams['plotting.axis_xy'] = True
gs.gsParams['plotting.grid'] = True
gs.histplt(dat['Au'], title='GSLIB-style histplt')
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1da8898f208>

Alter the default content and style of the statistics

In [14]:
# Horizontal alignment of all pygeostat statistics
gs.gsParams['plotting.stat_ha'] = 'left'
# Use of a fraction leads to the statistics font being a fraction of the 
# regular font size
gs.gsParams['plotting.stat_fontsize'] = 0.8
# 'histplt' in the key means that these settings only pertain to that function
gs.gsParams['plotting.histplt.stat_blk'] = 'minimal'
gs.gsParams['plotting.histplt.stat_xy'] = (.8, .05)
gs.histplt(dat['Au'], title='Altered statistics content and style')
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x1da8898f4e0>

Specifics of spatial plots

GSLIB-style plots for spatial programs (locmap, pixelplt, etc.) displayed the top and right border. This is therefore a separate argument within gsParams that is not impacted by the axis_xy default above.

In [15]:
gs.locmap(dat, title='Spatial plots maintain a GSLIB-style axis')
gs.gsParams['plotting.axis_xy_spatial'] = True
gs.locmap(dat, title='Spatial plots are now consistent with all pygeostat Plots')
Out[15]:
<mpl_toolkits.axes_grid1.axes_divider.LocatableAxes at 0x1da88aac908>

Scatter Plot Addition

Notebook

Scatplt Demo

The following notebook demonstrates:

  1. Basic scatplt functionality
  2. Scatplt statistics
  3. Multiple scatterplots via scatplts
  4. Scatterplot comparisons via scatplts_lu

Import a few packages

In [1]:
import pygeostat as gs
import numpy as np
%matplotlib inline
In [2]:
gs.DataFile()
Out[2]:
<pygeostat.data.data.DataFile at 0x1ca5aec9518>

Load the data

In [3]:
dat = gs.DataFile(flname='../data/data.dat', notvariables=['Keyout'])
dat.variables
Out[3]:
['Au', 'Sulfides', 'Carbon', 'Organic Carbon']

Set some default plot parameters

In [4]:
gs.set_style(custom={'font.size':12, 'figure.figsize':(5, 5)})

1. Basic Scatplt Properties

Scatplt is heavily integrated with gsParams. All of the kwargs that are demonstrated in this section may be set as project defaults to avoid their repetition.

Basic defaults

Axis labels are drawn from the Pandas series if not provided. Coloring according to calculated KDE is the default.

In [5]:
gs.scatplt(dat['Au'], dat['Sulfides'])
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca603ff5f8>

Color bar functionality

Note that a specialized (and relatively clean) color bar labeling is provided for KDE.

In [6]:
gs.scatplt(dat['Au'], dat['Sulfides'], cbar=True)
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca604ecef0>

Coloring with any arbitrary array

Here, the colorbar label is drawn from the provided data in the absence of a kwarg

In [7]:
gs.scatplt(dat['Au'], dat['Sulfides'], c=dat['Carbon'], cbar=True,
           clim=(0, .5))
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca60492518>

Other color and opacity options

Commonly manipulated properties are found in the function kwargs, although any Matplotlib.scatter kwargs may be passed as well.

In [8]:
gs.scatplt(dat['Au'], dat['Sulfides'], c='k', alpha=.1)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca6092e208>

2. Scatplt Statistics

Available statistics

As with histplt, scatplt provides an 'all' argument for stat_blk, which, for now, displays the number of pairs, Pearson correlation and Spearman rank correlation

In [9]:
gs.scatplt(dat['Au'], dat['Sulfides'], figsize=(5, 5), stat_blk='all')
# Also accomplished as a default via: 
# gs.gsParams['plotting.scatplt.stat_blk'] = 'all'
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca60944fd0>

Custom statistics

A list of the desired statistics may be provided, as well as the stat location.

In [10]:
gs.scatplt(dat['Au'], dat['Sulfides'], 
           stat_blk=('count', 'pearson'), stat_xy=(.95, .95))
# Also accomplished as a default via:
# gs.gsParams['plotting.scatplt.stat_xy'] = (.95, .95)
# gs.gsParams['plotting.scatplt.stat_blk'] = ('count', 'pearson')
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca609b7400>
In [11]:
# Setting new defaults for the next section
gs.gsParams['plotting.scatplt.stat_blk'] = 'all'
gs.gsParams['plotting.scatplt.stat_xy'] = (0.95, 0.95)

Declustered Stats

Declustering weights may be used for the calculated statistics. In the future, this may also be used for functions such as KDE calculations.

In [12]:
# Generate some random weights for this demo
wt = np.random.rand(dat.shape[0])
gs.scatplt(dat['Au'], dat['Sulfides'], wt=wt)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ca609f4940>

3. Multiple Scatter Plots via scatplts

Multiple scatterplots may be plotted with the scatplts wrapper, which continues to provide all of the flexibility that was demonstrated with scatplt.

In [13]:
# Set a larger default figure size
gs.set_style(custom={'figure.figsize':(10, 10)})
# Note the variables attribute of the DataFile
print('variables:', dat.variables)
variables: ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']

Basic Functionality

The defaults of this wrapper function are largely drawn from the underlying scatplt defaults and its related gsParams. Note that heterotopic data may be passed, as scatplt automatically determines the pairs with no NaN values (observe the differing $n$ in each panel).

If a DataFile is passed, the function defaults to using the DataFile.variables attribute.

In [14]:
print(dat.columns)
print(dat.variables)
fig = gs.scatplts(dat, pad=(-5, -3.5))
Index(['X', 'Y', 'Au', 'Sulfides', 'Carbon', 'Organic Carbon', 'Keyout'], dtype='object')
['Au', 'Sulfides', 'Carbon', 'Organic Carbon']

Another example with Au coloring and specified variables

Here, a DataFrame is passed to control the variables that are plotted.

In [15]:
fig = gs.scatplts(dat[dat.variables], figsize=(10, 10), pad=(-5, -3.5), 
                  c=dat['Au'],)

Another example with differing plot parameters

Demonstrating plot flexibility with a few parameters.

In [16]:
fig = gs.scatplts(dat, figsize=(10, 10), pad=(-4.5, -3.2), c='k',
                  alpha=.1, s=6, stat_blk=False, axis_xy=True, 
                  grid=True)

4. Scatterplot Comparisons via scatplts_lu

The scatplts_lu function facilitates the comparison of multiple scatterplots, placing pairs in the upper and lower triangle. The orientation of the lower triangle plots may be aligned with the upper plots to further ease comparison.

Generate transformed data for comparison

Note that the following block of code is included as an example of program calling with the latest DataFile attributes, though it is commented out since the PPMT program is required.

In [17]:
# ppmt = gs.Program(program='ppmt', getpar=True)
ppmtpar = """                         Parameters for PPMT
                         *******************
START OF PARAMETERS:
{datafl}             -input data file
{nvar} {varcols} 0   -  number of variables, variable cols, and wt col
-5 1.0e7             -  trimming limits
25 50 50             -min/max iterations and targeted Gauss perc. (see Note 1)
1                    -spatial decorrelation? (0=no,1=yes) (see Note 2)
1 2 0                -  x, y, z columns (0=none for z)
50 25                -  lag distance, lag tolerance
nscore.out           -output data file with normal score transformed variables
{outfl}              -output data file with PPMT transformed variables
ppmt.trn             -output transformation table (binary)

Note 1: Optional stopping criteria, where the projection pursuit algorithm will terminate
after reaching the targetted Gaussian percentile. The input percentile range is 1 (very Gaussian)
to 99 (barely Gaussian); the percentiles are calculated using random Gaussian distributions.
The min/max iterations overrides the targetted Gaussian percentile.

Note 2: Option to apply min/max autocorrelation factors after the projection pursuit algorithm
to decorrelate the variables at the specified non-zero lag distance.
"""
ppmtfl = '../data/ppmt.out'
# ppmt.run(parstr=ppmtpar.format(datafl=dat.flname, nvar=dat.nvar, 
#                                varcols=dat.gscol(dat.variables), 
#                                outfl=ppmtfl))
gvariables = ['PPMT:'+a for a in dat.variables]
datg = gs.DataFile(ppmtfl, variables=gvariables)

Set some new defaults

All of these items could also be set via kwargs, though using gsParams allows for them to not be repeated.

In [18]:
gs.gsParams['plotting.scatplt.s'] = 3
gs.gsParams['plotting.scatplt.stat_blk'] = ['pearson', 'noweightflag']
gs.gsParams['plotting.scatplt.stat_xy'] = (1., 1.05)
gs.gsParams['plotting.sigfigs'] = 3
gs.gsParams['plotting.roundstats'] = True
In [19]:
fig = gs.scatplts_lu(dat, datg, pad=(-4, -3.2))

Orientation and titles

Here, the orientation of the axes in the lower and upper triangles is aligned to ease comparison. Also note that the lower and upper triangles may be titled.

In [20]:
dat['Weight'] = wt
fig = gs.scatplts_lu(dat, dat, lowwt='Weight', pad=(-4, -3.2), 
                     titles=('Data', 'Realizations'), titlesize=14,
                     align_orient=True)

Visualization Toolkit (VTK) Changes

Notebook

Write VTK Demo

The following notebook demonstrates how the write_vtk function (and its DataFile.writefile wrapper) may be used for outputting:

  1. Point data
  2. Regular grid data
  3. Structured surface data
  4. Structured grid data
In [1]:
import pygeostat as gs
import pandas as pd
import numpy as np

1. Point Data

Drill hole data is stored in a point VTK format, which is relatively inefficient in terms of storage. Users may consider the options displayed below for reducing file size (increasing load speed in Paraview).

Load the point data and inspect its attributes

Note that these attributes are detected automatically based on the column names (variables are the unassigned columns).

In [2]:
datadir = '../data/pycourse/'
dat = gs.DataFile(datadir+'data.dat')
# 
print('special attributes:', dat.dh, dat.xyz, dat.ifrom, dat.ito)
print('variables:', dat.variables)
print('dftype:', dat.dftype)
special attributes: HoleID ['X', 'Y', 'Elevation'] from to
variables: ['Prop1-Phi', 'Prop2-Vsh', 'Prop3-Kv', 'Prop4-Kh', 'Prop5-Sw']
dftype: point

Convenience with DataFile.writefile

Minimal options are provided with DataFile.writefile, but this convenience function works since x, y, z and dftype are registered as attributes.

In [3]:
dat.writefile('point.vtk')

Flexibility with write_vtk

A subset of variables may be selected for writing, reducing file size. Further, the precision of variables (vdtype) and coordinates (cdtype) may be lowered to reduce file sizes if the default 'float64' is not necessary.

In [4]:
gs.write_vtk(dat, 'point_float32_variables.vtk', variables=[dat.dh]+dat.variables,
             vdtype='float32', cdtype='float32')

Integration of gsParams for defaults

Note that options such as vdtype and cdtype may have their defaults altered, allowing writefile to be used without repeating those kwargs.

In [5]:
gs.gsParams['data.write_vtk.vdtype'] = 'float32'
gs.gsParams['data.write_vtk.cdtype'] = 'float32'
dat.writefile('point_float32.vtk')

2. Regular Grid Data

The vast majority of grid data generated by CCG programs is stored on regular or rectilinear grids. These are stored efficiently by the rectilinear VTK file (.vtr), although large file sizes can still result. Users should consider reducing the float and integer precision as appropriate to reduce file sizes (see the previous section).
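
As a hedged sketch (reusing the vdtype kwarg shown in the previous section), the grid DataFile created below (sim) could be written with reduced precision; the output file name here is hypothetical.

# Sketch only: write the grid with 32-bit variable precision to shrink the file
gs.write_vtk(sim, 'regular_grid_float32.vtk', vdtype='float32')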

Generate some grid data

Load the grid definition and create a variable that is a function of the x, y and z coordinates (completely arbitrary).

In [6]:
griddef = gs.GridDef(gridfl=datadir+'griddef.txt')
x, y, z = griddef.gridcoord()

sim = gs.DataFile(data=pd.DataFrame(np.multiply(np.multiply(x, y), z), 
                                    columns=['Multiplied Coordinates']), 
                  griddef=griddef, dftype='grid')
sim.dftype
Out[6]:
'grid'

Convenience with DataFile.writefile

Since the file has a grid dftype and a grid definition, writefile outputs it in the regular grid format.

In [7]:
sim.writefile('regular_grid.vtk')

3. Structured Surface Data

Structured grids differ from regular or rectilinear grids in that the coordinates of the centroids do not follow a simple definition. Rather, the coordinates of each grid location must be specified explicitly.

Like regular grids, however, the relative location/ordering of each grid node is maintained. Structured grids should iterate in the x, y and then z direction, as with GSLIB-style regular grids.
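
For reference, a sketch of the (0-based) flattened index implied by this ordering, where x cycles fastest, followed by y and then z; the gslib_index helper is illustrative only.

# Sketch of the GSLIB-style ordering: x cycles fastest, then y, then z
def gslib_index(ix, iy, iz, nx, ny):
    return ix + iy * nx + iz * nx * ny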

The first form of structured grid data is the 2-D surface, which is relatively common. In the example below, the x/y coordinates of the surface are regularly spaced, but the z coordinate of the grid (the surface elevation) is irregular.
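
As an illustrative sketch (not the file loaded below, and assuming gridcoord returns x, y and z arrays for the 2-D grid definition as it does for the 3-D grid), such a surface could be assembled from regularly spaced x/y centroids and a synthetic, irregular elevation.

# Sketch only: regularly spaced x/y from the 2-D grid definition, synthetic z
griddef2d = griddef.convert_to_2d()
x, y, _ = griddef2d.gridcoord()
z = 100.0 + 5.0 * np.sin(x / 50.0) * np.cos(y / 50.0)  # arbitrary elevations
surf = gs.DataFile(data=pd.DataFrame({'X': x, 'Y': y, 'Z': z}),
                   griddef=griddef2d, dftype='sgrid')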

Load the surface data

Note that x, y and z are included in this file, and are recognized on import.

In [8]:
sim = gs.DataFile(datadir+'surface.gsb', griddef=griddef.convert_to_2d())
print('Coordinate columns of the surface:', sim.x, sim.y, sim.z)
Coordinate columns of the surface: X Y Z

Output the VTK with two available methods

Two methods are presented below. The dftype must be specified, either as a DataFile attribute or as a kwarg in the low-level write function. The x, y and z columns must also be specified, either as attributes or kwargs.

In [9]:
gs.write_vtk(sim, 'structured_surface_method1.vtk', dftype='sgrid')
sim.dftype = 'sgrid'
sim.writefile('structured_surface_method2.vtk')

4. Structured Grid

A 3-D structured grid is a direct extension of the 2-D structured surface above.

Generate structured grid data

The 3-D grid will conform to the output surface above.

In [10]:
# Extract the surface elevations; the 3-D structured grid will conform to this surface
surf = sim['Z'].values.reshape(griddef.nx, griddef.ny, order='F')

# Generate the z coordinates
_, _, z = griddef.gridcoord()

# Add some variability to the z coordinates of the grid, making it structured
z = z.reshape(griddef.nx, griddef.ny, griddef.nz, order='F')
for iy in range(griddef.ny):
    for ix in range(griddef.nx):
        z[ix, iy, :] = z[ix, iy, :] + surf[ix, iy]
z = z.reshape(griddef.nx*griddef.ny*griddef.nz, order='F')
# Create some cell data for output - simply the grid index
idx = np.arange(griddef.count())
# Create a structured datafile
sim = gs.DataFile(data=pd.DataFrame(np.stack((z, idx), axis=1), 
                                    columns=['Z', 'Index']), 
                  griddef=griddef, dftype='sgrid')

Output the VTK

Note that since x and y are not attributes of the sgrid object, they are assumed to follow the regular grid definition.

In [11]:
sim.writefile('structured_grid.vtk')

Weight Changes

Notebook

Declustering Weights Demo

This notebook demonstrates how the declustering weights attribute of a DataFile (wts) and the weight kwarg of functions (wt) may be used.

Note that wts is plural, because data may have multiple declustering weight columns (one for each variable). wt kwargs are generally singular (excluding future functions such as nscore, which will accept a wts argument for multiple variables).

The notebook consists of 7 sections:

  1. Load and inspect the data
  2. Calculate data spacing in advance of declustering
  3. Calculate declustering weights for each variable
  4. Set the DataFile.wts attribute
  5. Use the wts attribute with histplt
  6. Use the wts attribute with scatplts
  7. Use the wts attribute with histplt (with one variable/weight)
In [1]:
import pygeostat as gs
import matplotlib.pyplot as plt
% matplotlib inline
gs.gsParams['plotting.locmap.s'] = 5

1. Load and Inspect the Data

Note that Au is more densely sampled, and will therefore have differing declustering weights relative to the secondary variables.

In [2]:
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')
print('variables = ', dat.variables)
variables =  ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']
In [3]:
fig, axes = gs.subplots(2, 2, cbar_mode='single', 
                        axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
    gs.locmap(dat, var=var, ax=ax, vlim=(0, .5),
              cbar_label='standardized units')
In [4]:
dat.columns
Out[4]:
Index(['X', 'Y', 'Au', 'Sulfides', 'Carbon', 'Organic Carbon', 'Keyout'], dtype='object')

2. Calculate Data Spacing

The data spacing is calculated to select an appropriate declustering cell size for each variable (corresponding to approximately the 95th percentile of the data spacing here).
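
As a hedged, non-executed sketch of that selection (using the data spacing columns created in the next cell, before they are dropped), the cell sizes could be pulled directly from the spacing distributions.

import numpy as np

# Sketch only: take roughly the 95th percentile of each data spacing column
cellsizes = [np.percentile(dat[var + ' Data Spacing (m)'].dropna(), 95)
             for var in dat.variables]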

In [5]:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
tnames = []
for ax, var in zip(axes, dat.variables):
    dat.spacing(3, var)
    tnames.append(var+' Data Spacing (m)')
    gs.histplt(dat, var=tnames[-1], ax=ax, icdf=True)   
# Don't need these variables anymore
tnames.append('Keyout')
dat.drop(tnames)
In [6]:
cellsizes = [8, 16, 16, 16]

3. Decluster Each Variable

In [7]:
# Note that users must have a declus program on their system path
# or in the folder where this is executed
declus = gs.Program(program='declus', getpar=True)
C:\python\pygeostat\dev_testing\demos_current\tmp4i0eu5yl/declus.par has been copied to the clipboard
In [8]:
parstr = """                  Parameters for DECLUS
                  *********************

START OF PARAMETERS:
../data/data.dat            -file with data
1   2   0   {varcol}        -  columns for X, Y, Z, and variable
-1.0e21     1.0e21          -  trimming limits
declus.sum                  -file for summary output
declus.out                  -file for output with data & weights
1.0   1.0                   -Y and Z cell anisotropy (Ysize=size*Yanis)
0                           -0=look for minimum declustered mean (1=max)
1  {cellsize} {cellsize}    -number of cell sizes, min size, max size
5                           -number of origin offsets
"""
tnames = []
for var, cellsize in zip(dat.variables, cellsizes):
    declus.run(parstr=parstr.format(varcol=dat.gscol(var),
                                    cellsize=cellsize),
               liveoutput=False)
    temp = gs.DataFile('declus.out')
    tnames.append(var+' Wt')
    dat[tnames[-1]] = temp['Declustering Weight']

gs.rmfile(['declus.out', 'declus.sum', 'temp'])
Calling:  ['declus', 'temp']
Calling:  ['declus', 'temp']
Calling:  ['declus', 'temp']
Calling:  ['declus', 'temp']

4. Setting DataFile.wts

The DataFile.wts functionality is now detailed.

Note that the wts attribute initializes as None

In [9]:
print('Data columns = ', list(dat.columns))
print('Data weights = ', dat.wts)
Data columns =  ['X', 'Y', 'Au', 'Sulfides', 'Carbon', 'Organic Carbon', 'Au Wt', 'Sulfides Wt', 'Carbon Wt', 'Organic Carbon Wt']
Data weights =  None

wts may be set using the setcol function

This is consistent with using setcol for dat.x, dat.y, etc.

In [10]:
dat.setcol('wts', tnames)
print('Data weights = ', dat.wts)
Data weights =  ['Au Wt', 'Sulfides Wt', 'Carbon Wt', 'Organic Carbon Wt']

wts may be specified on initialization

The data is written to a temporary file below to permit initialization. wts are specified on the re-initialization of the DataFile, which must be done in this case due to their non-standard naming convention. Note that these weight columns are not included in variables because they are registered as special attributes.

In [11]:
dat.writefile('declus.out')
dat = gs.DataFile('declus.out', wts=tnames)
gs.rmfile('declus.out')
print('Data weights = ', dat.wts)
print('Variables = ', dat.variables)
Data weights =  ['Au Wt', 'Sulfides Wt', 'Carbon Wt', 'Organic Carbon Wt']
Variables =  ['Au', 'Sulfides', 'Carbon', 'Organic Carbon']

5. Use of wts with histplt

The wts naming convention allows for natural iteration of wt in a loop.

In [12]:
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for var, wt, ax in zip(dat.variables, dat.wts, axes):
    gs.histplt(dat, var=var, wt=wt, ax=ax)

6. Use of wts with scatplts

Error catching

Scatplts currently requires a single wt specifier. This could be altered in the future (e.g., averaging of declustering weights for each pair).
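
In the meantime, a hypothetical, non-executed workaround could average two of the weight columns into a single column that scatplts accepts via wt; the 'Pair Wt' column name is made up for illustration.

# Hypothetical workaround: average two declustering weight columns into one
dat['Pair Wt'] = (dat['Au Wt'] + dat['Sulfides Wt']) / 2
fig = gs.scatplts(dat, wt='Pair Wt', figsize=(10, 10), pad=-3)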

In [13]:
try:
    fig = gs.scatplts(dat, wt=dat.wts)
except Exception as e:
    print(e)
invalid wt type!

Dropping wts columns alters the wts attribute

DataFile.drop updates the DataFile.wts attribute (if necessary). The call below leaves only a single wts column.

In [14]:
dat.drop(dat.wts[1:])
print('Data weights = ', dat.wts)
Data weights =  Au Wt

Scatplts now accepts dat.wts, since it is no longer a list

Note that an entry of dat.wts could previously be used as well.

In [15]:
fig = gs.scatplts(dat, wt=dat.wts, figsize=(10, 10), pad=-3,
                  stat_xy=(.95, .95))

Scatplts with a boolean wt

If dat.wts is a single column, a boolean may also be used for activating weights, adding further convenience.

In [16]:
fig = gs.scatplts(dat, wt=True, figsize=(10, 10), pad=-3,
                  stat_xy=(.95, .95))

7. Histplt with One Variable/Weight Column

Error catching

Histplt throws an error with the below function call since multiple variables are present within dat.

In [17]:
try:
    gs.histplt(dat, wt=True)
except Exception as e:
    print(e)
Could not coerce data (DataFile) into a 1D dataset!

Histplt with a boolean wt

After reducing the number of variables to 1, histplt operates on the DataFile. It also permits a boolean wt.

In [18]:
dat.drop(dat.variables[1:])
print('Data variables = ', dat.variables)
Data variables =  Au
In [19]:
gs.histplt(dat, wt=True)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x28e01fc5cf8>