Fortran in pygeostat

Various Fortran subroutines have been wrapped for Python using a compiling tool Fortran Builder developed to permit F2PY compilation with a wide variety of Fortran and C compiler combinations. Compiling Fortran extensions for python allows data to be passed directly from Python to Fortran. This allows users to take advantage of the speed of Fortran, the breadth of historical libraries and the seamless integration with the wide variety of flexible and powerful libraries accessible through Python.

When pygeostat is imported, these compiled extensions are not automatically imported. However, full functionality of pygeostat assumes that the compiled extensions are present. If they are not found then usage of the package is still possible; various errors and warnings are thrown if functions requiring the compiled extensions are used.

The simplest way to compile the extensions is to run the following commands from a python interpreter after installing pygeostat:

>>> import pygeostat as gs
>>> gs.build_pygeostat_fortran()

Alternatively, a compile script is provided that is command line callable. Starting a terminal in pygeostat/fortran/ and using the following command will result in the extensions built in the current directory:

> python compile.py -compiler=gnu all

NOTES

  1. If pygeostat was installed from a .whl file, the fortran extensions are already present and this step is not required.

  2. If installing using the setup.py, try python setup.py install -buildfortran to pre-build the fortran extensions and copy them over to the site-packages. The wheel installation is preferred over this step

  3. The Fortran Builder is compatible with gfortran and intel compilers and tested with the following configurations:

    • MinGW 4.7.1 installed with conda install mingw
    • MinGW 7.3.0 installed from sourceforge
    • Intel Fortran 12 and Visual Studio 2012
    • Intel Fortran 15+ and Visual Studio 2015

4. To compile custom fortran extensions (e.g. fortran code not contained in pygeostat), consider the custom fortran builder

Fortran Extesion Building Functions

Functions developed to permit compilation of pygeostat and arbitrary fortran modules. Using f2py and flexible Fortran and C compilers.

Note

compile.py is command line callable. Open a terminal in pygeostat/fortran/ and build the pygeostat fortran extensions with python compile.py -compiler=gnu all

Code author: Ryan Martin - 24-04-2018

Build Pygeostat Fortran

pygeostat.fortran.compile.build_pygeostat_fortran(pygeostatmodules='all', compiler='gnu', mode='release', exflags='', opt='O2', clean=True, verbose=False)

This function builds the f2py extensions with minimal interaction from the user. The goal is to make this function callable with gs.build_pygeostat_fortran(), and have all requirements sorted out automatically resulting in a set of compiled pygeostat fortran extensions.

Modules are compiled to pygeostat/fortran/, regardless of where the function is called from. If no gnu compiling tools are found on the path (gfortran -v, gcc -v returns nothing), then conda install mingw -y is run to install MinGW, and this compiler is used.

Parameters:
  • pygeostatmodules (str) – either "all" or one of the dictionary keys found in pygeostat/fortran/sourcedefs.py
  • compiler (str) – either "intel", "gnu" or a path to a local folder containing gfortran compilers, e.g., C:/mingw/bin/
  • mode (str) – “release” or “debug”
  • exflags (str) – compiler-specific permissable fortran compiler flags
  • opt (str) – optimization level, defaults to O2
  • clean (bool) – if True then the build directory is cleaned before compiling (recommended)
  • verbose (bool) – write all output to the terminal

Code author: Ryan Martin - 07-04-2018

Pygeostat Fortran Modules

backtr

backtr.f90

covasubs

covasubs.for

dgm

dgm.for

esri_io

esri_io.f90

fgslib_io

fgslib_io.f90

gausskde

gausskde.f90

getcollocated

subs/sortem.f90, getcollocated.f90

histcorrect

subs/random.f90, subs/normaldist.f90, subs/sortem.f90, histcorrect.f90

nscore

subs/quicksort.f90, subs/acorni.f90, subs/normaldist.f90, subs/nscore_module.f90, nscore.f90

pygsb

subs/fileprocessing.f90, subs/gslib_binary.f90, pygsb.f90

sgvertices

sgvertices.f90

spbusim

subs/random.f90, subs/sortem.f90, subs/normaldist.f90, linearinterp.f90, covasubs.for, gausskde.f90, spbusim.f90

subsample

subs/random.f90, subsample.f90

supersec

supersec.f90

varcalc

subs/sortem.f90, subs/random.f90, subs/varsubs.f90, varcalc.f90

varmodel

subs/random.f90, covasubs.for, varmodel.f90

varsim

subs/fileprocessing.f90, subs/gslib_binary.f90, subs/varsim.for, varsim_wrap.f90

Build Custom Fortran

pygeostat.fortran.compile.build_custom_fortran(sources, includes={}, wraponly={}, name=None, srcdir='./', outdir='./', tmpdir='./tmp/', compiler='gnu', mode='release', exflags='', opt='O2', clean=True, verbose=True)

This function is intended to allow arbitrary f2py extensions to be constructed using the tooling provided in this module. This function replaces FortranBuild as a more succinct methodology to compile the requred sources.

Parameters:
  • sources (list or dict) – either a list of fortran files where the last depends on the first and the last file in the list contains the f2py wrapping code, or a dictionary where keys in the dictionary indicate the name of the module to be built and the values are the lists of associated sources to generate that module. See pygeostat/fortran/sourcedefs.py for inspiration on the structure of these dictionaries
  • includes (list or dict) – a matching item to sources that contains a list or dict of extra items to include on the link step of compilation. See pygeostat/fortran/sourcedefs.py.
  • wraponly (list or dict) – matching item to sources and includes that contains a list or dictionary of functions that get wrapped for python, other functions in the f2py fortran code are ignored. See pygeostat/fortran/sourcedefs.py.
  • name (str) – if sources is a list, a name must be provided for the resulting <name>.pyd
  • tmpdir, outdir (srcdir,) – various locations to consider
  • compiler (str) – either "intel", "gnu" or a path to a compiler bin directory
  • mode (str) – "release" or "debug"
  • exflags (str) – compiler-specific permissable fortran compiler flags
  • opt (str) – optimization level, defaults to O2
  • clean (bool) – if True then the build directory is cleaned before compiling (recommended)
  • verbose (bool) – write all output to the terminal

Code author: Ryan Martin - 07-04-2018

Fortran Subroutines

Simulated Realizations Accuracy Plot

Used and wrapped within gs.accpltsim()

Covariance Calculation Subroutines

cova3

Used and wrapped within gs.VariogramModel.getcova()

sqdist

Used and wrapped within gs.VariogramModel.setcova()

setrot

Used and wrapped within gs.VariogramModel.setcova()

Discrete Gaussian Model

Used and wrapped within gs.dgm()

Fast GSLIB File I/O Subroutines

File Read

Used and wrapped within gs.read_gslib_f() and gs.DataFile()

File Write

Used and wrapped within gs.write_gslib_f() and gs.DataFile()

Gaussian KDE Subroutines

Not used in any functions, but can be called.

Get Collocated Exhaustive Secondary Data

Used and wrapped within gs.getcollocated()

Histogram Correction

Not used in any functions, but can be called.

GSLIB-Binary I/O Subroutines

File Read

Used and wrapped within gs.read_gsb() and gs.DataFile()

File Write

Used and wrapped within gs.write_gsb() and gs.DataFile()

Semiparametric Bayesian Updating

Not used in any functions, but can be called.

Subsample Dataset

pygeostat.fortran.wrappers.subsample(datafl, col, ncell, nsub, nreal, rseed)

When datasets become too large to efficiently use, a sub-sample maybe extracted and used in ieu. A Fisher-Yates shuffle is implemented.This subroutine was designed with the intent of wrapping it for usein python. Motivated from Jarred L. Deutsch’s use of Fisher-Yates shuffle in gslib program histpltsim.

Assumes that the data file being read is in GSLIB format. Outputs an 1D array in the case that only one realization is sub-sampled, or a2D array in the case where multiple realizations are sub-sampled.Each realizations sub-sample is within a unique column.

Parameters:
  • datafl (str) – A single input datafile with the variable being sampled
  • col (int) – Column containing data to sub-sample
  • ncell (int) – The number of cells within a single realization, can also be interpreted as the number of data to read, then sub-sample
  • nsub (int) – Number of sub-samples to output
  • nreal (int) – Number of realizations to sub-sample
  • rseed (int) – A seed value to pass to the random number generator
Returns:

Output array with a realization in each column

Return type:

subsamp (np.array)

Examples

First, the function needs to be loaded into Python:

>>> from pygeostat.fortran.subsample import subsample

A simple call:

>>> subsamp = subsample(datafl='./sgsim.out', col=1, ncell=griddef.count(), nsub=5000,
...                     nreal=100, rseed=gs.rseed())

Code author: Warren E. Black - 2015-09-22

Experimental Variogram Calculation

Used and wrapped within gs.varcalc()

Variogram Modeling and Fitting

Used and wrapped within gs.Variogram.fitmodel()

Gridded Variogram Calculation

Used and wrapped within gs.varsim()

F2PY

F2PY Intro, Installation, Troubleshooting Tips

F2PY compiles Fortran subroutines into .pyd files, allowing them to be called as Python functions.The module is included within numpy. Prerequisites for F2PY include (possibly not limited to, depending on your system, setup, and experience with the F2PY quirks):

  1. A C compiler from one of:
    1. MinGW (cygwin or other)
    2. MSVC compilers (probably the hardest option)
  2. A Fortran compiler:
    1. gfortran from MinGW
    2. Intel Fortran Compiler (see: resource/f2py_ifort_Makefile/)

3. The right path variables (this list may not be exhaustive, and is a common source of frustration in Windows)

  1. Anaconda path variables (should already be setup)
  2. Path to MinGW bin (<Anacondapath>/MinGW/bin)
  3. Path to Cygwin bin (<Cygwinpath>/bin)
  1. “Wrappable” Fortran code
    1. see here for more details

Getting F2PY to work directly and properly on windows can be an issue as the right combination of C and Fortran compilers are required based on your operating system and whether your machine has 32 or 64-bit architecture. Libraries also need to be in the correct location.

Warning

F2PY compilation with Cygwin does not presently work in Anaconda3 2.4.1 for Python 3.5 due to a change in Visual Studio Libraries (Bug Report: https://bugs.Python.org/issue25251). Recommendation is to use Anaconda3 2.3.0 in the meantime.

Installation instructures are outlined below, some examples can be found here. It is assumed that the Anaconda distribution is being used and is already installed

Windows 64-Bit

This procedure works for Windows 7 OS, it may work for others but has not been tested.

Install mingw to Anaconda by running the following in the command line or cygwin prompt:

>>> conda install mingw

Install cygwin with the following packages from the development category:

  • mingw64-x86_64-gcc-core
  • mingw64-x86_64-gcc-g++

F2PY Examples

Typical Fortran code allows for dynamically allocatable arrays depending on size constraints determined during run time. This can be accounted for in 2 ways with F2PY. Either write your Fortran subroutines to explicitly define all array sizes in the subroutine call. i.e.:

subroutine example(data,size)
integer, intent(in) :: size
real*8, intent(inout) :: data(size)
...
end subroutine example

If the above subroutine was contained in a file called ‘example.f90’ it could immediately be wrapped with F2PY using the command line F2PY functionality:

>>> f2py -c example.f90 -m example

There are commonly errors thrown at this stage, some of which are covered in the installation section, others can be sorted out by combing ‘the forumns’… The simplest method for troubleshooting is to use various combinations of --fcompiler= or --compiler= flags to determine the working combination for your machine. So the modified call may be something like this:

>>> f2py --fcompiler=gfortran --compiler=cygwin -c example.f90 -m example

The above procedure assumes that your functions are passed and return variables of known size. That is, sizes of the variables are known (there are no ‘allocatable’ arrays here). Additionally, all types are standard, derived types cannot be passed. If the code you are developing MUST account for unknown sizes, or needs to work with some kind of derived type, a working methodology is to use module allocatable arrays and private module variables, respectively:

module example
! things we can access from Python !
real*8,allocatable,public :: data(:,:)
public :: usedata
! things we cannot access from Python, but can use in Fortran !
type(customtype),private :: derived_variable
  contains
subroutine usedata()
 do
  !something with data and/or derived_variable
 enddo
end subroutine
end module example

In this case, you define the size of the allocatable arrays from Python by assigning data to the module variable after compiling the module with F2PY (more on this below):

>>> import numpy as np
>>> from example import example as ex
>>> ex.data = np.array([[1,1,1],[0,0,0]])

Using these simple rules, it is possible to wrap existing Fortran codes that are too ‘complex’ for F2PY, so they can be used in Python. Conider the example below. The kdtree2_module (from the CCG Knowledge Base) contains complex Fortran code including pointers, allocatable arrays, derived types, recursive subroutines. It also relies on allocatable arrays and pointers in order to be used from calling code, since you need to have dynamic storage for nearest neighbor searches, and must store pointers to the kdtree once it is constructed. The following wrapping module can be compiled by F2PY, to allow access to the complex code from Python:

module kd
    use kdtree2_module
    implicit none
    !things that we can access from Python!
    public :: createtree,nnearest,rnearest
    integer,allocatable,dimension(:),public :: inds
    real*8,allocatable,dimension(:),public :: dis
    real*8,allocatable,dimension(:,:),public :: dlocs
    integer, public :: nd
    !things we can't access from Python:
    type(kdtree2),pointer,private :: tree
    type(kdtree2_result),allocatable,target,private :: datainds(:)
contains
    subroutine createtree()
    real*8,allocatable,dimension(:,:) :: tlocs
        if(allocated(dlocs))then
            if(size(dlocs,1)>3)then
                write(*,*)'Data must be dim x nd dimensioned!'
                return
            endif
            if(associated(tree)) call kdtree2_destroy(tree)
            tree => kdtree2_create(dlocs,3,.true.)
            nd = size(dlocs,2)
        else
            write(*,*)'Data not defined'
        endif
    end subroutine createtree

    subroutine nnearest(n,pt)
    integer,intent(in) :: n
    real*8,intent(in),dimension(3) :: pt
        if(allocated(inds))deallocate(datainds,inds,dis)
        allocate(datainds(n),inds(n),dis(n))
        call kdtree2_n_nearest(tree,pt,n,datainds)
        inds=datainds%idx
        dis=sqrt(datainds%dis)
    end subroutine nnearest

    subroutine rnearest(r,pt)
    real*8,intent(in) :: r
    real*8,intent(in),dimension(3) :: pt
    integer :: n,nf
    real*8 :: r2
        r2 = r * r
        n = kdtree2_r_count(tree,pt,r2)
        if(allocated(inds))deallocate(datainds,inds,dis)
        allocate(datainds(n),inds(n),dis(n))
        call kdtree2_r_nearest(tree,pt,r2,nf,n,datainds)
        inds=datainds%idx
        dis=sqrt(datainds%dis)
    end subroutine
end module kd

In order to build the above module with F2PY, we need to first pre-compile the kdtree2.f90 codes into a linkable module for F2PY. Do this with:

>>> gfortran -c -O2 kdtree2.f90 -fstack-arrays

The -c flag instructs gfortran to build the codes without linking to an executable, which generates a .mod file for every module in the kdtree2.f90 codes, and links all .mod modules into the .o object file that is used in the next step. The -fstack-arrays flag is required in this case since intels compilers assign arrays to the stack by default, whereas gfortran does not. This flag may not be required for all modules. Additionally it may be necessary to define the stack size explicitly here with another flag (see website for more options). Pre-compiling the complex codes is the critical step for using complex Fortran codes in Python. The next step is to build the wrapping module kd, contained in the kd_wrap.f90 codes (shown above), with F2PY. Building the module with F2PY generates the C-wrappers that allows the Fortran code to be called from Python. This step may be a bit of trial and error with compiler flags and whatnot, but in this case, and from a machine with both mingw, cygwin and Python 3.4 installed, the following call worked:

>>> f2py --fcompiler=gfortran --compiler=cygwin --opt=-O2 --f90flags=-fstack-arrays -I.
... kdtree2.o -c kd_wrap.f90 -m kd_wrap

Various errors will likely be thrown at this step, varying from 64 bit compiler issues, Python issues, distutils, paths, etc. For some troubleshooting see the F2PY install section.

Once compiled into a .pyd file, it can be called within Python as a module, and any module variables can be assigned data, any module subroutines may also be called, for example:

from kd_wrap import kd
import pygeostat as gs
import numpy as np

#import some data:
dat = gs.DataFile(flname='GC_df.out',readfl=True)
#in the kdtree2 module, data is assumed to be dim x nd dimensioned
kd.dlocs = np.array(dat.data[[0,1,2]]).transpose()

kd.createtree()

# Find points around some arbitrary location
pt = np.array([10,10,10])
# Do a nearest neighbor search with 10 neighbors
kd.nnearest(10,pt)
# Now we can access the resulting distances and indices
print(kd.dis)
print(kd.inds)
# Do a radius search
kd.rnearest(150,pt)
# Now we can access the resulting distances and indices
print(kd.dis)
print(kd.inds)

Compiling and calling Fortran dll’s

The standard Python library contains a package called ctypes which provides interfaces to foreign functions that are found in shared libraries. With the Intel compiler, a Fortran function or subroutine can be exported to dll by ensuring that the !DEC$ is present to define the exported symbol:

! Note: this only works with the Intel Compiler
subroutine sqr(val)
    !DEC$ ATTRIBUTES DLLEXPORT, ALIAS:'sqr' :: sqr
    integer, intent(inout) :: val
    val = val ** 2
end subroutine sqr

If the above subroutine is contained in the example.f90 file, it is compiled with :

>>> ifort /dll example.f90

which produces the shared library example.dll. Inspecting this object with dumpbin:

>>> dumpbin /exports example.dll
1    0 00001000 sqr

The function can be imported to Python with the ctypes module and called with the following Python code:

import ctypes as ct

fortlib = ct.CDLL('example.dll')
val = 100
val = ct.pointer(ct.c_int(val)) # setup the pointer to the correct data structure
_ = fortlib.sqr(val)            # call the function
print(val[0])                   # index the memory address setup above

Considering the F2PY examples above, immediately it is clear that compiling Fortran in this way is easier, and calling the Fortran from Python in this way is more difficult. The advantages are that this is a C-independent (except for Python and the ctypes library) method to call a Fortran function from Python, and the same dll called from Python here could also be called from other languages (i.e. C# for petrel).

Maintaining GNU - Intel compatibility

Since many of the pygeostat functions can be compiled with either GNU or Intel tools using the FortranBuild class, a method to maintain a compatible Python calling sequence for the dll’s is required since one set of Python code should be able to call the Fortran function regardless of how it was compiled. This can be done by using the iso_c_binding module:

subroutine sqr2(val) BIND(C, NAME='sqr2')
    use iso_c_binding
    !DEC$ ATTRIBUTES DLLEXPORT :: sqr2
    integer, intent(inout) :: val
    val = val ** 2
end subroutine sqr2

The iso_c_binding in this case overwrites the name mangling for each compiler (gfortran and ifort), and ensures the call sequence from Python is constant. In this case the !DEC$ statement is ignored by gfortran and the function is available in the exported dll. Notably, the ALIAS is dropped from the !DEC$ export for Intel Fortran since the name in the binding overwrites this setting.

More Complicated Data Types

The main concern for calling Fortran functions in the dll’s from Python is how to setup the data structures. Consider the following function which take a 2-dimensional array of integers and computes some arbitrary value from the inputs by replacing the values in the array:

module example
    use iso_c_binding
    implicit none
contains
    subroutine sqr_2d_arr(nd, val) BIND(C, NAME='sqr_2d_arr')
        !DEC$ ATTRIBUTES DLLEXPORT :: sqr_2d_arr
        integer, intent(in) :: nd
        integer, intent(inout) :: val(nd, nd)
        integer :: i, j
        do j = 1, nd
        do i = 1, nd
            val(i, j) = (val(i, j) + val(j, i)) ** 2
        enddo
        enddo
    end subroutine sqr_2d_arr
end module example

The module is compiled with:

>>> ifort /dll example.f90

or:

>>> gfortran -shared example.f90 -o example.dll

and can be called from Python with:

import ctypes as ct
import numpy as np

# import the dll
fortlib = ct.CDLL('example.dll')

# setup the data
N = 10
nd = ct.pointer( ct.c_int(N) )          # setup the pointer
pyarr = np.arange(0, N, dtype=int) * 5  # setup the N-long
for i in range(1, N):                   # concatenate columns until it is N x N
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]

# call the function by passing the ctypes pointer using the numpy function:
_ = fortlib.sqr_2d_arr(nd, np.ctypeslib.as_ctypes(pyarr))

print(pyarr)

Important Considerations

In the above Python code some things are notable:

  1. All data is passed by pointer reference to the Fortran dll, the data is modified in place if the variable is intent(inout), or replaced if the variable is intent(out). All memory is controlled on the Python side of things

  2. The memory must be allocated in some form on the Python side using np.zeros(size, dtype=type) (or similar) even if the variable is intent(out)

  3. The types of all data initialized on the Python size must match those being called in the Fortran module
    1. This is very important. For example, use int for integer, float for real*8, etc. Python float is double precision by default. This may not be a correct list, and is definitely not exhaustive, but it has worked with the limited testing that has been done with this method of calling Fortran

    2. the function ctypes.c_pointer() can be used to create the necessary references to non-array elements, be sure to use ctypes.c_int() and ctypes.c_double() (and others) as required for the needed parameter

    3. initializing arrays on the python end is strange. For example, say a Fortran subroutine is defined with an integer, intent(out) array with size nd, to initialize this array on the python side, you can use:

      import ctypes as ct
      intarray = (ct.c_int * nd)()
      

    and to give the array values for input:

    import ctypes as ct
    pyarray = np.zeros(nd, dtype=np.int)
    valarray = (ct.c_int * nd)(*pyarray)
    
  4. Numpy provides some convenient methods np.ctypeslib.as_ctypes() to pass initialized arrays with the correct types to the Fortran functions (as demonstrated above)

Mixed Language Debugging

The use of dll’s and ctypes library for calling the Fortran code provides some exciting opportunities to speed up code development while leveraging the plotting and data management capabilities of pygeostat with the GSLIB Fortran geostatistical library. Visual Studio 2015 Community is currently free and has recently developed a set of Python tools that will utilize Python distributions found on the machine. Combined with the Intel Compiler, projects containing Fortran code compiled to dll, and called VIA the Python libraries and tools presented above, can be debugged after it is called from Python!