bmorph

bmorph is a package for streamflow bias correction, designed to work in conjunction with the mizuRoute streamflow routing model. bmorph provides methods for performing bias corrections that are spatially consistent, as well as methods that account for process-dependent biases. We discuss bmorph's methodological details on the bias correction page. For an overview of the structure of bmorph, see the package overview below.

Installation

We recommend installing bmorph with mamba using the command:

mamba create -n bmorph -c conda-forge bmorph

If you want to install from source, we provide an environment in environment.yml to manage the dependencies. You can build the environment by running:

mamba env create -f environment.yml

Then, to install bmorph, run:

conda activate bmorph
python setup.py develop
python -m ipykernel install --user --name bmorph

Getting started

You can run through our interactive tutorial here. A static version of the tutorial can be found here.

bmorph Overview

Flowchart describing bmorph package directory structure to key directories util, core, and evaluation. util describes mizuroute_utils as "Configuring output from mizuRoute into input for bmorph. Blending and Conditioning pre-processing occurs here." core describes workflows as "Automated workflows for applying bmorph across multiple sites and time intervals" and bmorph as "Where bias correction computations take place for individual site and time increments." evaluation describes plotting as "Plotting functions for analyzing and visualizing bias corrections relative to reference and uncorrected data" and simple_river_network as "Pseudo-physical river network visualization for depicting spatial variables at stream segments across the watershed."

core

The core directory is where the functions for performing bias correction are located. The bmorph.py module contains the functions for individual bias corrections. workflows.py contains the functions that define some helper workflows that make it easier to apply bias corrections across a stream network. More on the specifics of how bias correction is performed can be found in the Bias Correction page.

util

The util directory contains the mizuroute_utils.py module for organizing data exported by mizuRoute into an easily accessible form for bmorph. More on how data is handled can be found on the Data Overview page.

evaluation

The evaluation directory provides tools for plotting and analyzing results. More on plotting functions and implemented statistics can be found on the Evaluation of Bias Correction page. More on the Simple River Network tool can be found on the Simple River Network (SRN) page.


Bias Correction

This page describes the implementation of bmorph bias correction for streamflow.

bmorph Functionality

Flowchart describing bmorph bias correction process from initial routing to bias correction to secondary routing, outlining the steps that must occur for conditioning and spatial consistency to be utilized in bias correction.

The figure above shows the full workflow and options available in bmorph. The input, in the upper left, shows that bmorph takes in routed streamflows. Then, if necessary, we perform some pre-processing steps as implemented in the bmorph.util.mizuroute_utils.to_bmorph function. These pre-processing steps are necessary if spatially consistent bias correction (through blending) and/or process-aware bias correction (through conditioning) is selected. Following the pre-processing steps, bias correction is performed according to the selected options. If spatially consistent bias correction is performed, the bias corrected local flows are then rerouted through mizuRoute.

PresRat and EDCDFm

The basic underlying bias correction technique that bmorph implements is the PresRat bias correction from Pierce et al. (2015), which is an extension of the equidistant quantile matching (EDCDFm) technique of Li et al. (2010). bmorph uses the amended EDCDFm to compute multiplicative changes in the quantiles of a Cumulative Distribution Function (CDF). bmorph also modifies these underlying techniques by taking a rolling window over successive bias correction intervals, which eliminates statistical artifacts at the edges of these intervals.
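
As a rough illustration of the underlying idea (not the bmorph implementation, which adds rolling windows, smoothing, and trend preservation), a PresRat-style multiplicative quantile mapping can be sketched with numpy as follows; the function and variable names are hypothetical:

import numpy as np

def presrat_like_sketch(raw, train, ref, n_quantiles=100):
    """Illustrative PresRat/EDCDFm-style quantile mapping (hypothetical helper)."""
    q = np.linspace(0, 1, n_quantiles + 1)
    train_q = np.quantile(train, q)  # modeled flows over the training period
    ref_q = np.quantile(ref, q)      # reference flows over the training period
    # Locate each raw flow on the training-period CDF
    raw_quantile = np.interp(raw, train_q, q)
    # Multiplicative change in the quantiles: reference / modeled at the same quantile
    multiplier = np.interp(raw_quantile, q, ref_q) / np.interp(raw_quantile, q, train_q)
    return raw * multiplier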

Conditional Quantile Mapping (CQM)

Conditional Quantile Mapping (CQM) incorporates meteorologic data into the bmorph bias correction process to condition flow time series on other hydrologically relevant information. By creating a series of CDFs based on meteorologic data (such as minimum daily temperature), bmorph can select a CDF that will not only correct the time series, but also most closely match the simulated meteorologic conditions.

Multidimensional CDF functions are shown as heat maps, with raw on the left and reference in the center. A cumulative probability plot compares the raw model CDF and the reference flow CDF for one meteorologic slice of the heat maps to its left. Arrows demonstrate mapping the reference data in the heat maps to the raw model data for bias correction.
\[\tilde{x}_{mp} = x_{mp} + F^{-1}_{oc}(F_{mp}(x_{mp}|y_{mp})|y_{oc}) - F^{-1}_{mc}(F_{mp}(x_{mp}|y_{mp})|y_{mc})\]
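
Conceptually, conditioning amounts to building a separate flow CDF for each bin of the meteorologic variable and then selecting the CDF that matches the simulated conditions. A minimal sketch of building such a conditional CDF table is shown below; the function name and binning choices are illustrative assumptions (numpy arrays in, not bmorph's API):

import numpy as np

def conditional_cdf_table(flow, met, met_bin_edges, n_quantiles=100):
    """Build one flow CDF per meteorologic bin (illustrative only)."""
    q = np.linspace(0, 1, n_quantiles + 1)
    bin_idx = np.digitize(met, met_bin_edges)
    cdfs = np.full((len(met_bin_edges) + 1, q.size), np.nan)
    for b in np.unique(bin_idx):
        # Flow quantiles for timesteps whose meteorology falls in bin b
        cdfs[b] = np.quantile(flow[bin_idx == b], q)
    return q, cdfs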

Spatially Consistent Bias correction (SCBC)

To address the artifacts that traditional bias correction techniques can introduce into distributed streamflow predictions we have developed a regionalization technique which blends the target distribution between reference flow sites. This makes use of the topology of the river network by selecting target distributions which are nearby and interpolating between them as a function of distance. A schematic representation of this blending is shown below.

There are two edge cases which require special handling. In the case where there are no gauge sites to select either upstream or downstream, we choose the reference site according to some user-definable criteria. There are a few different options for doing this: leaving the sites empty (leave_null), using xarray's forward_fill, or selecting based on some measure of similarity (r2, kldiv, kge). When a site has multiple upstream gauge sites as tributaries, we simply choose the closest. When we make use of this blended bias correction we do not simply bias correct total flows to produce bias corrected total flows; rather, we use the ratio of the bias corrected total flow to the raw total flow to develop a correction factor, which is then applied to the local (incremental) flow along each river reach in the network. These corrected local flows are then re-routed through mizuRoute to produce spatially consistent bias corrected streamflow, hence the name SCBC.

In blending, attributes from one gauge site are mixed with another gauge site depending on how close the intermediate seg is to each gauge site, (depicted left by 5 circles translating from pink to purple to blue across the segs). As a result, intermediate CDFs can be produced by transitioning from one gauge site CDF to another, (depicted right by pink CDF curves transforming into purple then blue CDFs curves).

The blend factor describes how upstream and downstream flows should be interpolated, or “blended” together. It is used in the following way,

Let …
UM, DM = Upstream Measure, Downstream Measure (length, r2, Kullback-Leibler Divergence, or Kling-Gupta Efficiency)
BF = Blend Factor
UF, DF, TF = Upstream Corrected Flow, Downstream Corrected Flow, Total Corrected Flow
\[BF = \frac{UM}{UM+DM}\]
\[TF = (BF \times UF) + ((1-BF) \times DF)\]
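
A one-line sketch of this blending, using the symbols defined above (illustrative only, not the bmorph implementation):

def blend_flows(um, dm, uf, df):
    """Blend upstream/downstream corrected flows with the blend factor BF = UM / (UM + DM)."""
    bf = um / (um + dm)
    return bf * uf + (1 - bf) * df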

Selecting bias correction techniques

Below we show how to implement and combine the various options described above.

Independent Bias Correction: Univariate (IBC_U)

Univariate Independent Bias Correction (IBC_U) is the traditional bias correction method implemented here, as described in the EDCDFm section above. This method can only be performed at sites with reference data, which makes it useful at gauge sites where flows are measured, but it does not guarantee spatially consistent corrections amongst a series of gauge sites.

Workflow function : `bmorph.core.workflows.apply_bmorph`_

raw_ts = basin_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
train_ts = basin_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
obs_ts = basin_met_seg.sel(seg=seg)['up_ref_flow'].to_series()

ibc_u_flows[site], ibc_u_mults[site] = bmorph.workflows.apply_bmorph(
    raw_ts, train_ts, obs_ts,
    apply_window, train_window, reference_window,
    interval=interval, overlap=overlap)

Independent Bias Correction: Conditioned (IBC_C)

Similar to IBC_U, Conditioned Independent Bias Correction (IBC_C) can only apply corrections at gauge sites where there is reference flow data. IBC_C integrates meteorologic data into the bmorph bias correction process as described in bmorph.core.bmorph.cqm. Conditioning allows hydrologic-process-based knowledge to be included in the bias correction, which can help root the corrections in meteorologic trends.

Workflow function : `bmorph.core.workflows.apply_bmorph`_

raw_ts = basin_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
train_ts = basin_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
ref_ts = basin_met_seg.sel(seg=seg)['up_ref_flow'].to_series()
cond_var = basin_met_seg.sel(seg=seg)[f'up_{condition_var}'].to_series()

ibc_c_flows[site], ibc_c_mults[site] = bmorph.workflows.apply_bmorph(
    raw_ts, train_ts, ref_ts,
    apply_window, train_window, reference_window,
    condition_ts=cond_var,
    interval=interval, overlap=overlap)

Notice that in order to use conditioning, a conditioning time series (passed as condition_ts above) is needed to specify which meteorological time series to use in conditioning.

Spatially Consistent Bias Correction: Univariate (SCBC_U)

Univariate Spatially Consistent Bias Correction (SCBC_U) aims to address IBC's inability to correct flows at non-gauge sites, where reference time series do not exist. Spatial consistency is preserved by performing bias corrections at every river segment, or seg, and then rerouting the corrected flows through mizuRoute. For each seg that is not a gauge site, proxy reference data are created by blending flows from upstream and downstream gauge sites to estimate what the reference flow for that seg should look like, as described in Spatial Consistency: Reference Site Selection & CDF Blend Factor.

Workflow functions : `bmorph.core.workflows.apply_blendmorph`_, `bmorph.core.workflows.run_parallel_scbc`_

univariate_config = {
    'train_window': train_window,
    'apply_window': apply_window,
    'reference_window': reference_window,
    'interval': interval,
    'overlap': overlap,
}

unconditioned_seg_totals = bmorph.workflows.run_parallel_scbc(
    basin_met_seg, client, output_prefix, mizuroute_exe, univariate_config)

Spatially Consistent Bias Correction: Conditioned (SCBC_C)

Conditioned Spatially Consistent Bias Correction (SCBC_C) combines the meteorologic conditioning elements of IBC_C with the spatial consistency of SCBC_U. This implementation of SCBC factors the supplied meteorologic variables into the formulation of the reference flows that each seg is corrected to. Because meteorologic data are defined on the hydrologic response units, or hru's, that drain to each seg, they can be mapped onto every seg in the watershed topology. In IBC_C, only the data mapped to gauge sites is used in bias correction, whereas SCBC_C can utilize meteorologic data across the whole watershed because it incorporates all defined segs.

Workflow functions : `bmorph.core.workflows.apply_blendmorph`_, `bmorph.core.workflows.run_parallel_scbc`_

conditional_config = {
    'train_window': train_window,
    'apply_window': apply_window,
    'reference_window': reference_window,
    'interval': interval,
    'overlap': overlap,
    'condition_var': condition_var
}

conditioned_seg_totals = bmorph.workflows.run_parallel_scbc(
    basin_met_seg, client, output_prefix, mizuroute_exe, conditional_config)

Again, because we are conditioning our bias corrections, condition_var must be included in the configuration.

Citations

Pierce, D. W., Cayan, D. R., Maurer, E. P., Abatzoglou, J. T., & Hegewisch, K. C. (2015). Improved Bias Correction Techniques for Hydrological Simulations of Climate Change. *Journal of Hydrometeorology, 16*(6), 2421-2442. http://dx.doi.org/10.1175/JHM-D-14-0236.1

Li, H., Sheffield, J., & Wood, E. F. (2010). Bias correction of monthly precipitation and temperature fields from Intergovernmental Panel on Climate Change AR4 models using equidistant quantile matching. *Journal of Geophysical Research: Atmospheres, 115*(D10), 1-20. https://doi.org/10.1029/2009JD012882

bmorph Tutorial: Getting your first bias corrections

This notebook demonstrates how to set up data for bmorph and bias correct it; it contains the same information as the tutorial page. In this notebook, we will demonstrate how to perform four variations of bias correction. For more information about the details of each method, refer to the bias correction page. The rest of the documentation for bmorph can be found here.

  • Independent Bias Correction: Univariate (IBC_U) : IBC_U is the traditional bias correction method. This method can only be performed at sites with reference data.

  • Independent Bias Correction: Conditioned (IBC_C) : IBC_C allows for correcting for specific biases that are process-dependent. This method can only be performed at sites with reference data.

  • Spatially Consistent Bias Correction: Univariate (SCBC_U): SCBC_U corrects local flows at each river reach in the network, and then reroutes them to aggregate, producing bias corrected flows at locations of interest where reference data does and does not exist.

  • Spatially Consistent Bias Correction: Conditioned (SCBC_C): SCBC_C also corrects the local flows like SCBC_U, but allows for conditioning on dependent processes.

Expectations using this Tutorial

Before we begin, we are expecting that anyone using this notebook is already familiar with the following:

This tutorial also makes use of a number of other open source libraries & programs which, although no familiarity is required, may be of interest:

  • geopandas

  • tqdm

  • dask

  • networkx

  • scipy

  • scikit-learn

While proficiency with each item is not required to understand the notebook, the details of these items will not be included in the tutorial itself.

Learning objectives & goals

By the end of this tutorial you will learn

  • The data requirements to perform streamflow bias corrections with bmorph

  • Input and output formats that bmorph expects

  • The meaning of parameters for algorithmic control of bias corrections

  • How to perform independent bias correction at locations with reference flows

  • How to perform spatially consistent bias corrections across a river network, as well as rerouting the corrected flows with mizuRoute

  • How to use the analysis, evaluation, and visualization tools built into bmorph

We expect that this tutorial will take approximately half an hour for someone familiar with Python.

Import Packages and Load Data

We start by importing the necessary packages for the notebook. This tutorial mainly shows how to use bmorph.core.workflows and bmorph.core.mizuroute_utils to bias correct streamflow data in the Yakima River Basin.

%pylab inline
%load_ext autoreload
%autoreload 2
%reload_ext autoreload

import warnings
warnings.filterwarnings('ignore')
import os
import sys
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from dask.distributed import Client, progress

# Set a bigger default plot size
mpl.rcParams['figure.figsize'] = (10, 8)
mpl.rcParams['font.size'] = 22

# Import bmorph, and mizuroute utilities
import bmorph
from bmorph.util import mizuroute_utils as mizutil
from bmorph.evaluation.simple_river_network import SimpleRiverNetwork
from bmorph.evaluation import plotting as bplot

# Set the environment directory, this is a workaround for portability
envdir = os.path.dirname(sys.executable)

# Import pyproj and set the data directory, this is a workaround for portability
import pyproj
pyproj.datadir.set_data_dir(f'{envdir}/../share/proj')
Populating the interactive namespace from numpy and matplotlib

Getting the sample data

The following code cell has two commands preceded by !, which indicates that they are shell commands. They will download the sample data and unpackage it. The sample data can be viewed as a HydroShare resource here. This cell may take a few moments.

! wget https://www.hydroshare.org/resource/fd2a347d34f145b4bfa8b6bff39c782b/data/contents/bmorph_testdata.tar.gz
! tar xvf bmorph_testdata.tar.gz

yakima_workflow/
yakima_workflow/mizuroute_configs/
yakima_workflow/mizuroute_configs/.ipynb_checkpoints/
yakima_workflow/mizuroute_configs/.ipynb_checkpoints/reroute_deschutes_univariate-checkpoint.control
yakima_workflow/mizuroute_configs/.ipynb_checkpoints/reroute_deschutes_conditional-checkpoint.control
yakima_workflow/mizuroute_configs/__init__.py
yakima_workflow/input/
yakima_workflow/input/nrni_reference_flows.nc
yakima_workflow/input/yakima_raw_flows.nc
yakima_workflow/input/yakima_met.nc
yakima_workflow/output/
yakima_workflow/output/__init__.py
yakima_workflow/.ipynb_checkpoints/
yakima_workflow/topologies/
yakima_workflow/topologies/.ipynb_checkpoints/
yakima_workflow/topologies/.ipynb_checkpoints/param.nml-checkpoint.default
yakima_workflow/topologies/param.nml.default
yakima_workflow/topologies/yakima_huc12_topology.nc
yakima_workflow/topologies/yakima_huc12_topology_scaled_area.nc
yakima_workflow/gis_data/
yakima_workflow/gis_data/crcc_pointlist.txt
yakima_workflow/gis_data/yakima_hru.shp
yakima_workflow/gis_data/yakima_seg.prj
yakima_workflow/gis_data/yakima_seg.cpg
yakima_workflow/gis_data/yakima_hru.dbf
yakima_workflow/gis_data/yakima_seg.shx
yakima_workflow/gis_data/yakima_seg.shp
yakima_workflow/gis_data/yakima_seg.dbf
yakima_workflow/gis_data/yakima_hru.cpg
yakima_workflow/gis_data/yakima_hru.prj
yakima_workflow/gis_data/yakima_hru.shx
yakima_workflow/README.md

Test dataset: The Yakima River Basin

Before getting into how to run bmorph, let’s look at what is in the sample data. You will note that we now have a yakima_workflow directory downloaded in the same directory as this notebook (found by clicking on the tutorial/ tab left by Binder, or by navigating to this directory in the file explorer of your choice). This contains all of the data that you need to run the tutorial. There are a few subdirectories:

  • gis_data: contains shapefiles, this is mainly used for plotting, not for analysis

  • input: this is the input meteorologic data, simulated streamflow to be corrected, and the reference flow dataset

  • mizuroute_configs: this is an empty directory that will automatically be populated with mizuroute configuration files during the bias correction process

  • output: this is an empty directory that will be where the bias corrected flows will be written out to

  • topologies: this contains the stream network topologies that will be used for routing flows via mizuroute

The Yakima River is a tributary of the Columbia River in the Pacific Northwest of the United States. The western half of the basin is situated in the Cascade mountains and receives seasonal snowpack, while the eastern half is lower elevation and semi-arid. Let’s load up the shapefiles for the sub-basins and stream network and plot them. In this discretization we have 285 sub-basins (HRUs) and 143 stream segments.

Setting up some metadata

Next we set up the gauge site names and their respective river segment identification numbers, or sites and segs. These will be used throughout to ensure the data does not get mismatched. bmorph uses the convention: site_to_seg = {site_0_name: site_0_seg, ..., site_n_name: site_n_seg}. Since it is convenient to be able to access this data in different orders, we also set up some other useful forms of these gauge site mappings for later use. We will show you on the map where each of these sites is on the stream network in the next section.

site_to_seg = {'KEE' : 4175, 'KAC' : 4171, 'EASW': 4170,
               'CLE' : 4164, 'YUMW': 4162, 'BUM' : 5231,
               'AMRW': 5228, 'CLFW': 5224, 'RIM' : 5240,
               'NACW': 5222, 'UMTW': 4139, 'AUGW': 594,
               'PARW': 588,  'YGVW': 584,  'KIOW': 581}

seg_to_site = {seg: site for site, seg in site_to_seg.items()}
ref_sites = list(site_to_seg.keys())
ref_segs = list(site_to_seg.values())

Mapping the Yakima River Basin

With our necessary metadata defined, let’s make a couple of quick plots orienting you to the Yakima River Basin. To do so we will read in a network topology file and shapefiles for the region. We will make one plot of the Yakima River Basin with the stream network, sub-basins, and gauged sites labeled. We will also plot a network diagram which displays, in an abstract sense, how each stream segment is connected. For the latter we use the SimpleRiverNetwork that we’ve implemented in bmorph. To set up the SimpleRiverNetwork we need the topology of the watershed (yakima_topo). The river network and shapefiles that we use to draw the map and perform simulations on are from the Geospatial Fabric. In the Geospatial Fabric, rivers and streams are broken into segments, each with a unique identifier, as illustrated above in the site_to_seg definition. The locations of the gauged sites are shown in red, while the un-gauged stream segments are shown in darker grey. The sub-basins for each stream segment are shown in lighter grey.

yakima_topo = xr.open_dataset('yakima_workflow/topologies/yakima_huc12_topology.nc').load()
yakima_hru = gpd.read_file('./yakima_workflow/gis_data/yakima_hru.shp').to_crs("EPSG:4326")
yakima_seg = gpd.read_file('./yakima_workflow/gis_data/yakima_seg.shp').to_crs("EPSG:4326")

fig, axes = plt.subplots(1, 2, figsize=(14, 9), gridspec_kw={'width_ratios': [1.5, 1]})
axes[1].invert_xaxis() # flip makes nodes line up better with map

# Plot the subbasins and stream segments
ax = yakima_hru.plot(color='gainsboro', edgecolor='white', ax=axes[0])
yakima_seg.plot(ax=ax, color='grey')

# Plot the reference flow sites
ref_lats = yakima_seg[yakima_seg['seg_id'].isin(ref_segs)]['end_lat']
ref_lons = yakima_seg[yakima_seg['seg_id'].isin(ref_segs)]['end_lon']
ref_names = [seg_to_site[s] for s in yakima_seg[yakima_seg['seg_id'].isin(ref_segs)]['seg_id']]
ax.scatter(ref_lons, ref_lats, color='crimson', zorder=100, marker='s', label='Gauged locations')
for name, lat, lon in zip(ref_names, ref_lats, ref_lons):
    if name in ['AUGW', 'EASW']:
        # Set labels at a slightly different position so we don't have overlaps
        offset_x, offset_y = -0.16, -0.04
    else:
        offset_x, offset_y = 0.02, 0.02
    ax.text(lon+offset_x, lat+offset_y, name, fontsize=10, color='white', weight='bold',
            bbox=dict(boxstyle="round", ec='crimson', fc='crimson', ),)
ax.legend()
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Now plot the abstracted river network, with gauged sites highlighted
yakima_srn = SimpleRiverNetwork(yakima_topo)
yakima_srn.draw_network(color_measure=yakima_srn.generate_node_highlight_map(ref_segs),
                        cmap=mpl.cm.get_cmap('Set1_r'), ax=axes[1], node_size=60)
plt.tight_layout(pad=0)
_images/bmorph_tutorial_7_0.png

Loading in the streamflow data and associated meteorological data

Now we load in the meteorological data that will be used for conditional bias correction: daily minimum temperature (tmin), seasonal precipitation (prec), and daily maximum temperature (tmax). In principle, any type of data can be used for conditioning (e.g., snow water equivalent (SWE), groundwater storage, landscape slope angle, etc.). This data is initially arranged on the sub-basins, rather than stream segments. We will remap these onto the stream segments in a moment, so that they can be used in the bias correction process.

Finally, we load the simulated flows and reference flows. bmorph is designed to bias correct streamflow simulated with mizuroute. We denote the simulated flows as the “raw” flows when they are uncorrected, and the flows that will be used to correct the raw flows as the “reference” flows. During the bias correction process bmorph will map the raw flow values to the reference flow values by matching their quantiles. In our case the reference flows are estimated no-reservoir-no-irrigation (NRNI) flows taken from the River Management Joint Operating Committee (RMJOC).

All of the datasets discussed are in the xarray Dataset format, which contains the metadata associated with the original NetCDF files. You can inspect the data simply by printing it out. For instance, here you can see that both the reference flows and raw flows (named IRFroutedRunoff, for “Impulse Response Function routed runoff” from mizuRoute) are in cubic meters per second.

# Meteorologic data
yakima_met = xr.open_dataset('yakima_workflow/input/yakima_met.nc').load()
# Remove the 17* prefix, which was used to denote that the domain covers region 17 of the Hydrologic Unit maps
yakima_met['hru'] = (yakima_met['hru'] - 1.7e7).astype(np.int32)

# Raw streamflows
yakima_raw = xr.open_dataset('yakima_workflow/input/yakima_raw_flows.nc')[['IRFroutedRunoff', 'dlayRunoff', 'reachID']].load()
# Update some metadata
yakima_raw['seg'] = yakima_raw.isel(time=0)['reachID'].astype(np.int32)

# Reference streamflows - this contains sites from the entire Columbia river basin, but we will select out only the `ref_sites`
yakima_ref = xr.open_dataset('yakima_workflow/input/nrni_reference_flows.nc').rename({'outlet':'site'})[['seg', 'seg_id', 'reference_flow']]
# Pull out only the sites in the Yakima basin
yakima_ref = yakima_ref.sel(site=ref_sites).load()

print('Reference flow units: ', yakima_ref['reference_flow'].units)
print('Raw flow units: ', yakima_raw['IRFroutedRunoff'].units)
Reference flow units:  m3/s
Raw flow units:  m3/s

Convert from mizuroute output to bmorph format

mizuroute_utils is our utility module that will handle converting mizuroute outputs to the format that we need for bmorph. We will use the mizutil.to_bmorph function to merge together all of the data we previously loaded, and calculate some extra pieces of information to perform spatially consistent bias corrections (SCBC). For more information about how we perform SCBC see the SCBC page in the documentation. Now we pass our data in to to_bmorph, the primary utility function for automating bmorph pre-processing.

yakima_met_seg = mizutil.to_bmorph(yakima_topo, yakima_raw, yakima_ref, yakima_met,  fill_method='r2')

Setting up bmorph configuration and parameters

Before applying bias correction we need to specify some parameters and configuration for correction. Returning to these steps can help fine tune your bias corrections to the basin you are analyzing.

The train_window is what we will use to train the bias correction model. This is the time range that is representative of the basin’s expected behavior that bmorph should mirror.

The apply_window is when bmorph should be applied to the series for bias correction.

Lastly the reference_window is when the reference flows should be used to smooth the Cumulative Distribution function (CDF) of the bias corrected flows. This is recommended to be set as equivalent to the train_window.

train_window = pd.date_range('1981-01-01', '1990-12-30')[[0, -1]]
reference_window = train_window
apply_window = pd.date_range('1991-01-01', '2005-12-30')[[0, -1]]

interval is the length of bmorph‘s application intervals, typically a whole number of years to preserve hydrologic relationships. Note that for pandas.DateOffset, ’year’ and ‘years’ are different, and an ‘s’ should always be included here for bmorph to run properly, even for a single year.

overlap describes how many days the bias correction cumulative distribution function windows should overlap in total with each other. overlap is evenly distributed before and after this window. This is used to reduce discontinuities between application periods. Typical values are between 60 and 120 days.

The two “smoothing” parameters are used to smooth the timeseries before the CDFs are computed, and they have two different uses. The n_smooth_short parameter is used in the actual calculation of the CDFs that are used to perform the quantile mapping. Smoothing is used to ensure that the CDFs are smooth. Setting a very low value here may cause noisier bias corrected timeseries, while setting a very high value may cause the bias corrections to not match extreme flows. Typical values are 7-48 days.

n_smooth_long on the other hand is used to preserve long-term trends in mean flows from the raw flows. Typical values are 270 to 720 days. Using very low values may cause bias corrections to be degraded. This feature can be turned off by setting n_smooth_long to None.

condition_var names the variable to use in conditioning, such as maximum temperature (tmax), 90 day rolling total precipitation (seasonal_precip), or daily minimum temperature (tmin). At this time, only one conditioning meteorological variable can be used per bmorph execution. In this example, tmax and seasonal_precip have been commented out to select tmin as the conditioning variable. If you wish to change this, be sure to either change which variables are commented out or change the value of condition_var itself. For now we will just use tmin, which is the daily minimum temperature. Our hypothesis on choosing tmin is that it will be a good indicator for errors in snow processes, which should provide a good demonstration for how conditional bias correction can modify flow timing in desirable ways.

Further algorithmic controls can be used to tune the conditional bias correction as well. Here we use the histogram method for estimating the joint PDF, which is selected by passing hist as the method. We have also implemented a kernel density estimator, which will be used if you set the method to kde. While kde tends to make smoother PDFs, it comes with a larger computational cost. For both methods we specify the number of xbins and ybins, which control how finely the joint PDFs are discretized. Setting a very high number here can potentially cause jumpy artifacts in the bias corrected timeseries.
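
For intuition, the hist method is essentially a normalized 2D histogram of flow against the conditioning variable; a minimal sketch (assuming numpy arrays, not bmorph's internal code) looks like:

import numpy as np

def joint_pdf_hist(flow, condition, xbins=100, ybins=100):
    """Estimate a joint PDF of flow vs. the conditioning variable with a 2D histogram."""
    pdf, flow_edges, cond_edges = np.histogram2d(
        flow, condition, bins=[xbins, ybins], density=True)
    return pdf, flow_edges, cond_edges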

# bmorph parameter values
interval = pd.DateOffset(years=5)
overlap = 90
n_smooth_long = 365
n_smooth_short = 21

# Select from the various available meteorologic fields for conditioning
#condition_var = 'tmax'
#condition_var = 'seasonal_precip'
condition_var = 'tmin'

Here we name some configuration parameters for bmorph’s conditional and univariate bias correction methods, respectively.

output_prefix will be used to write and load files according to the basin’s name; make certain to update this with the actual name of the basin you are analyzing so you can track where different files are written.

conditional_config = {
    'data_path':  './yakima_workflow',
    'output_prefix': "yakima",
    'raw_train_window': train_window,
    'ref_train_window': reference_window,
    'apply_window': apply_window,
    'interval': interval,
    'overlap': overlap,
    'n_smooth_long': n_smooth_long,
    'n_smooth_short': n_smooth_short,
    'condition_var': condition_var,
    'method': 'hist',
    'xbins': 100,
    'ybins': 100,
}

univariate_config = {
    'data_path':  './yakima_workflow',
    'output_prefix': "yakima",
    'raw_train_window': train_window,
    'ref_train_window': reference_window,
    'apply_window': apply_window,
    'interval': interval,
    'overlap': overlap,
    'n_smooth_long': n_smooth_long,
    'n_smooth_short': n_smooth_short,
}

You made it! Now we can actually bias correct with bmorph!

First off we run the Independent Bias Corrections, which are completely contained in the cell below.

Here we run through each of the gauge sites and correct them individually. Since independent bias correction can only be performed at locations with reference data, corrections are only performed at the gauge sites here.

Independent bias correction

ibc_u_flows = {}
ibc_u_mults = {}
ibc_c_flows = {}
ibc_c_mults = {}
cond_vars = {}

raw_flows = {}
ref_flows = {}
for site, seg in tqdm(site_to_seg.items()):
    raw_ts =   yakima_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
    train_ts = yakima_met_seg.sel(seg=seg)['IRFroutedRunoff'].to_series()
    obs_ts =   yakima_met_seg.sel(seg=seg)['up_ref_flow'].to_series()
    cond_var = yakima_met_seg.sel(seg=seg)[f'up_{condition_var}'].to_series()
    ref_flows[site] = obs_ts
    raw_flows[site] = raw_ts
    cond_vars[site] = cond_var

    ## IBC_U (Independent Bias Correction: Univariate)
    ibc_u_flows[site], ibc_u_mults[site] = bmorph.workflows.apply_bmorph(
        raw_ts, train_ts, obs_ts, **univariate_config)

    ## IBC_C (Independent Bias Correction: Conditioned)
    ibc_c_flows[site], ibc_c_mults[site] = bmorph.workflows.apply_bmorph(
        raw_ts, train_ts, obs_ts, condition_ts=cond_var, **conditional_config)
0%|          | 0/15 [00:00<?, ?it/s]

Spatially consistent bias correction

Here we specify where the mizuroute executable is installed on your system.

mizuroute_exe = f'{envdir}/route_runoff.exe'

Now we use apply_scbc to do the rest. In the first cell we will run the spatially consistent bias correction without any conditioning, and the second cell will run it with conditioning. This produces bias corrected flows at all 143 stream segments in the Yakima River Basin. Then, we select out the corrected streamflows for both cases (with and without conditioning) at only the gauged sites. Selecting out only the gauged locations allows us to compare the spatially consistent methods with the independent bias corrections. Finally, we combine all the data into a single xarray Dataset to make analysis easier.

# SCBC without conditioning
unconditioned_seg_totals = bmorph.workflows.apply_scbc(yakima_met_seg, mizuroute_exe, univariate_config)
0%|          | 0/143 [00:00<?, ?it/s]
# SCBC with conditioning
conditioned_seg_totals = bmorph.workflows.apply_scbc(yakima_met_seg, mizuroute_exe, conditional_config)
0%|          | 0/143 [00:00<?, ?it/s]
# Here we select out our rerouted gauge site modeled flows.
unconditioned_site_totals = {}
conditioned_site_totals = {}
for site, seg in tqdm(site_to_seg.items()):
    unconditioned_site_totals[site] = unconditioned_seg_totals['IRFroutedRunoff'].sel(seg=seg).to_series()
    conditioned_site_totals[site] = conditioned_seg_totals['IRFroutedRunoff'].sel(seg=seg).to_series()
0%|          | 0/15 [00:00<?, ?it/s]

Merging together the results

# Merge everything together
yakima_analysis = xr.Dataset(coords={'site': list(site_to_seg.keys()), 'time': unconditioned_seg_totals['time']})
yakima_analysis['scbc_c'] = bmorph.workflows.bmorph_to_dataarray(conditioned_site_totals, 'scbc_c')
yakima_analysis['scbc_u'] = bmorph.workflows.bmorph_to_dataarray(unconditioned_site_totals, 'scbc_u')
yakima_analysis['ibc_u'] = bmorph.workflows.bmorph_to_dataarray(ibc_u_flows, 'ibc_u')
yakima_analysis['ibc_c'] = bmorph.workflows.bmorph_to_dataarray(ibc_c_flows, 'ibc_c')
yakima_analysis['raw'] = bmorph.workflows.bmorph_to_dataarray(raw_flows, 'raw')
yakima_analysis['ref'] = bmorph.workflows.bmorph_to_dataarray(ref_flows, 'ref')
yakima_analysis.to_netcdf(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed.nc')
# And also output it as some CSV files
yakima_analysis['scbc_c'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_scbc_c.csv')
yakima_analysis['scbc_u'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_scbc_u.csv')
yakima_analysis['ibc_u'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_ibc_u.csv')
yakima_analysis['ibc_c'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_ibc_c.csv')
yakima_analysis['raw'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_raw.csv')
yakima_analysis['ref'].to_pandas().to_csv(f'./yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed_ref.csv')

Now let’s take a look at our results

If you look closely, the following plots are the same ones included in Plotting! Because the plotting functions expect the variable seg, we will need to rename site to seg for them to properly run.

yakima_ds = xr.open_dataset(f'yakima_workflow/output/{univariate_config["output_prefix"]}_data_processed.nc')
yakima_ds = yakima_ds.rename({'site':'seg'})

Let’s pick a few sites and colors to plot for consistency. Here we will compare all four bias correction methods in the dataset we just created; the plotting functions expect the method names as a list, so we store them in bcs below.

select_sites = ['KIOW','YUMW','BUM']
select_sites_2 = ['KIOW','CLFW','BUM','UMTW']
bcs = ['scbc_c', 'scbc_u', 'ibc_c', 'ibc_u']
colors = ['grey', 'black', 'red', 'orange', 'purple', 'blue']

Time Series

Here we plot the mean weekly flows for some of the sites in the Yakima River Basin. You can change or add sites above, but we will start with a small number of sites to make the plots more tractable. In the following function call you can see how the sites, the averaging interval, the statistic, and the plot labels are specified.

As mentioned, these averages are computed on weekly intervals to simplify the figure, but can also be plotted on daily or monthly intervals for more or less granularity. You can also change the reduce_func to calculate any other statistic over the dataset (you might try np.median or np.var for instance). Don’t forget to change the statistic_label for other measures!

bplot.plot_reduced_flows(
    flow_dataset=yakima_ds,
    plot_sites=select_sites_2,
    interval='week',
    reduce_func=np.mean,
    statistic_label='Mean',
    raw_var='raw', raw_name="Uncorrected",
    ref_var='ref', ref_name="Reference",
    bc_vars=bcs, bc_names=[bc.upper() for bc in bcs],
    plot_colors=colors
)
(<Figure size 1440x864 with 4 Axes>, <AxesSubplot:title={'center':'UMTW'}>)
_images/bmorph_tutorial_37_1.png

From the plot above we can see that the conditional corrections (x_C methods) have more accurate flow timings, particularly during the falling limb of the hydrograph. This supports our hypothesis that conditioning on daily minimum temperature provides a good proxy for correcting snowmelt biases. We will explore this a little bit more later.

We also see that generally the SCBC_x and IBC_x methods are fairly similar in the mean, with an exception at CLFW. This indicates that the spatially consistent bias correction produces useful bias corrections. The advantage of the SCBC method is that we produce bias corrections on every river reach, as well as produce bias corrected incremental flows which are consistent across the network.

Scatter

This compares how absolute error changes through each bias correction, with Q being stream discharge. The 1-to-1 and -1-to-1 lines are plotted for reference: points falling vertically between the lines demonstrate a reduction in absolute error, while points falling horizontally between the lines demonstrate an increase in absolute error for each flow time.

bplot.compare_correction_scatter(
    flow_dataset= yakima_ds,
    plot_sites = select_sites,
    raw_var = 'raw',
    ref_var = 'ref',
    bc_vars = bcs,
    bc_names = [bc.upper() for bc in bcs],
    plot_colors = list(colors[2:]),
    pos_cone_guide = True,
    neg_cone_guide = True,
    symmetry = False,
    title = '',
    fontsize_legend = 120,
    alpha = 0.3
)
_images/bmorph_tutorial_39_0.png
Probability Distributions

Since probability distributions are used to predict extreme flow events and are what bmorph directly corrects, looking at them will give us greater insight into the changes we made.

bplot.compare_mean_grouped_CPD(
    flow_dataset= yakima_ds,
    plot_sites = select_sites,
    grouper_func = bplot.calc_water_year,
    figsize = (60,40),
    raw_var = 'raw', raw_name = 'Uncorrected',
    ref_var = 'ref', ref_name = 'Reference',
    bc_vars = bcs, bc_names = [bc.upper() for bc in bcs],
    plot_colors = colors,
    linestyles = 2 * ['-','-','-'],
    markers = ['o', 'X', 'o', 'o', 'o', 'o'],
    fontsize_legend = 90,
    legend_bbox_to_anchor = (1.9,1.0)
);
_images/bmorph_tutorial_42_0.png

This function is also capable of subsetting data by month should you want to compare only January flows for example. Because bmorph makes changes based on flow distributions, this plot is the closest to directly analyzing how the different methods correct flows.

Simple River Network

Finally, we can plot information of the SCBC across the simple river network. Let’s look at the difference in the average percent difference for both SCBC_U and SCBC_C. From the timeseries plots created earlier you might have noticed that the conditional bias corrections produced lower flows in the spring months. We will start by looking only at those months. You might try changing the season if you’re interested.

season = 'MAM' # Choose from DJF, MAM, JJA, SON
scbc_c = conditioned_seg_totals['IRFroutedRunoff']
scbc_u = unconditioned_seg_totals['IRFroutedRunoff']
raw = yakima_met_seg['IRFroutedRunoff']
scbc_c_percent_diff = 100 * ((scbc_c-raw)/raw).groupby(scbc_c['time'].dt.season).mean().sel(season=season)
scbc_u_percent_diff = 100 * ((scbc_u-raw)/raw).groupby(scbc_u['time'].dt.season).mean().sel(season=season)

mainstream_map = yakima_srn.generate_mainstream_map()
scbc_u_percent_diff = pd.Series(data=scbc_u_percent_diff.to_pandas().values, index=mainstream_map.index)
scbc_c_percent_diff = pd.Series(data=scbc_c_percent_diff.to_pandas().values, index=mainstream_map.index)

fig, axes = plt.subplots(1, 2, figsize=(14,10))
yakima_srn.draw_network(color_measure=scbc_u_percent_diff, cmap=mpl.cm.get_cmap('coolwarm_r'), node_size=40,
                        with_cbar=True, cbar_labelsize=20, ax=axes[0], cbar_title='')
axes[0].set_title('SCBC_U')
yakima_srn.draw_network(color_measure=scbc_c_percent_diff, cmap=mpl.cm.get_cmap('coolwarm_r'), node_size=40,
                        with_cbar=True, cbar_labelsize=20, ax=axes[1], cbar_title='Mean percent difference from raw flows(%)')
axes[1].set_title('SCBC_C')
Text(0.5, 1.0, 'SCBC_C')
_images/bmorph_tutorial_46_1.png

From the plot above we can see that the main differences between the two methods were in modifying the headwater flows, which are at higher elevations and receive more precipitation. This aligns with our hypothesis that the daily minimum temperature provides a good proxy for errors in snow processes.

Moving forward

In this tutorial you have learned how to set up bias corrections, perform them, and analyze them with bmorph. While this tutorial is meant to cover the essentials, there are quite a few diversions/alternatives that you could try out before moving on. For instance:

  • What happens if you conditionally bias correct on a different variable? Try seasonal_precip, or even implement a bias correction conditional on the month if you’re feeling adventurous!

  • How do the smoothing parameters affect the bias corrected flows? Try a wide range of n_smooth_short, or try setting n_smooth_long to None to turn off the correction of the mean trend.

  • Try removing half of the gauged sites to see how it affects the spatially consistent bias correction. You can do this by commenting out (or deleting) some of the entries in site_to_seg up at the top.

Data Overview

Directory Setup

The following directories and names are required to properly execute bmorph:

input
mizuroute_configs
notebooks
output
topologies

input contains uncorrected and reference flows and meteorological data. mizuroute_configs is where the configuration files for mizuRoute will be written. notebooks contains the Python notebooks that will execute bmorph. output is where the bmorph bias corrected flows will be written. topologies contains the river network topology data that connects the watershed's river segments.
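
A small convenience snippet for creating this layout (a sketch, assuming you are setting up a new basin directory from scratch; the directory name is hypothetical):

from pathlib import Path

base = Path('./my_basin_workflow')  # hypothetical basin directory
for subdir in ['input', 'mizuroute_configs', 'notebooks', 'output', 'topologies']:
    (base / subdir).mkdir(parents=True, exist_ok=True)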

Variable Naming Conventions

seg is an individual river segment containing a single reach
hru is a hydrologic response unit that feeds into a single seg; each seg can have multiple hru’s feeding into it
seg_id is the identification number for a seg
site is the gauge site name for river segments with gauge data; not all segments have one
raw refers to uncorrected routed streamflows
ref refers to reference streamflow data, such as that from a gauge site

Configuration Utilities

to mizuRoute

bmorph is designed to bias correct simulated streamflow as modeled by mizuRoute. bmorph.util.mizuroute_utils.write_mizuroute_config automates writing a valid mizuroute configuration file for the purposes of bmorph.

Running mizuRoute during bias correction is only necessary for spatially consistent methods, where local flows need to be routed. As a result, writing the configuration file is automated entirely through bmorph.core.workflows.run_parallel_scbc. The configuration script receives the name of the region, the type of bias correction, and the time window from run_parallel_scbc, and writes the rest of the configuration file assuming the Directory Setup described above.

to bmorph

bmorph.util.mizuroute_utils.to_bmorph handles formatting the output of mizuRoute for bmorph. While configuring streamflows can be performed without this function, it helps to speed up the whole workflow with a number of customizable options. As detailed in the tutorial, bmorph.util.mizuroute_utils.to_bmorph is the primary function for handling streamflow formatting:

from bmorph.util import mizuroute_utils as mizutil

basin_met_seg = mizutil.to_bmorph(
    topo = basin_topo,
    routed = watershed_raw.copy(),
    reference = basin_ref,
    met_hru = watershed_met,
    route_var = "IRFroutedRunoff",
    fill_method = 'r2',
).ffill(dim='seg')

Here topo is the topology file for the basin, routed is the uncorrected streamflow, reference is the reference streamflow from the gauge sites, met_hru contains the meteorologic variables used in conditioning, route_var is the name of the uncorrected flow variable in routed, and fill_method describes how reference sites are selected for spatial consistency, as described in Spatial Consistency.

Input Specifications

Input data need identical time indices as pandas.Series without null flow values. So long as the lengths of the pandas.Series are the same across all flow data, the length itself should not impact bias correction. Further information on non-flow parameters can be found in Implementation.
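
As a quick check of these requirements, the flow inputs can be validated before bias correction; this is a hedged sketch with hypothetical series names, not part of bmorph:

import pandas as pd

def check_flow_inputs(raw_ts: pd.Series, train_ts: pd.Series, ref_ts: pd.Series):
    """Verify identical time indices and no null flow values."""
    assert raw_ts.index.equals(train_ts.index) and raw_ts.index.equals(ref_ts.index), \
        "flow series must share identical time indices"
    for name, ts in [('raw', raw_ts), ('train', train_ts), ('ref', ref_ts)]:
        assert not ts.isnull().any(), f"{name} series contains null flow values"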

Output Specifications

bmorph outputs a pandas.Series time series with flows as values, indexed by the time entries of the provided data. The total length of the output is the number of flow values.

The tutorial stores each of these outputs in a dictionary with their site/seg as the corresponding keys. bmorph.workflows.bmorph_to_dataarray converts such a dictionary into an xarray.DataArray with coordinates site and time, corresponding to the dictionary keys and the time index of the pandas.Series they access. From there, uncorrected and reference flows can be combined with the corrected flows into a single xarray.Dataset and saved to a netCDF file if desired, storing it in the output directory if following the tutorial.

Evaluation of Bias Correction

Below are descriptions of statistics implemented in bmorph for evaluating bias correction performance. Let P be predicted values, such as the corrected flows, and O be the observed values, such as reference flows.

Mean Bias Error (MBE)

\[MBE = \frac{\Sigma_{i=0}^{n}(P_i - O_i)}{n}\]

Mean Bias Error is useful for preserving the directionality of bias, yet it may hide bias if both positive and negative biases exist in a dataset. MBE is therefore useful for determining how well biases balance out if you are only interested in the net or cumulative behavior of a system. It is not recommended for describing local changes unless the discretization is fine enough to minimize biases canceling each other out.

Root Mean Square Error (RMSE)

\[RMSE = \sqrt{\frac{\Sigma_{i=0}^{n}(P_i - O_i)^2}{n}}\]

Root Mean Square Error preserves magnitudes over directionality, unlike MBE.

Percent Bias (PB)

\[PB = 100\% * \frac{\Sigma_{i=0}^{n}(P_i - O_i)}{\Sigma_{i=0}^{n}O_i}\]

Percent Bias preserves direction like MBE, but aims to describe relative error instead of absolute error. PB is often used when analyzing performance across sites with different magnitudes. Because direction is preserved, the issue of positive and negative biases canceling out arises again here, as in MBE.
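
Sketches of these three metrics with numpy (P = predicted, O = observed); these are illustrative helpers, not bmorph's evaluation functions:

import numpy as np

def mbe(P, O):
    """Mean Bias Error: positive means over-prediction on average."""
    return np.mean(np.asarray(P) - np.asarray(O))

def rmse(P, O):
    """Root Mean Square Error: magnitude of error regardless of sign."""
    return np.sqrt(np.mean((np.asarray(P) - np.asarray(O)) ** 2))

def percent_bias(P, O):
    """Percent Bias: relative, signed error."""
    return 100.0 * np.sum(np.asarray(P) - np.asarray(O)) / np.sum(np.asarray(O))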

Kullback-Leibler Divergence (KL Divergence)

Kullback-Leibler Divergence, or relative entropy, is used to describe how similar predicted and observed distributions are. Taken from information theory, KL Divergence describes the error in using one probability distribution in place of another, making it a strong statistic for analyzing how bmorph corrects the probability distributions of streamflows. A KL Divergence value of 0 symbolizes a perfect match between the two probability distributions, or no error in assuming the one distribution in place of the other.
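
One common way to estimate KL Divergence for flows is from shared-bin histograms of the two series. The sketch below (using scipy, which the tutorial already depends on) is illustrative and not the bmorph implementation; the binning choices are assumptions:

import numpy as np
from scipy.stats import entropy

def kl_divergence(P, O, bins=50):
    """Estimate KL divergence of predicted (P) from observed (O) flow distributions."""
    edges = np.histogram_bin_edges(np.concatenate([np.asarray(P), np.asarray(O)]), bins=bins)
    p_hist, _ = np.histogram(P, bins=edges, density=True)
    o_hist, _ = np.histogram(O, bins=edges, density=True)
    # Small offset keeps empty bins from producing divisions by zero
    return entropy(o_hist + 1e-12, p_hist + 1e-12)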

Kling-Gupta Efficiency (KGE)

\[KGE = 1 - \sqrt{(r-1)^2 + (\frac{\sigma_{P}}{\sigma_{O}}-1)^2 + (\frac{\mu_{P}}{\mu_{O}}-1)^2}\]

The Kling-Gupta Efficiency compares predicted flows to observed flows by combining linear correlation, flow variability, and bias (Knoben et al. 2019). A KGE value of 1 represents predicted flows matching observed flows perfectly.
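
A direct transcription of the KGE formula above into numpy (illustrative, not bmorph's evaluation code):

import numpy as np

def kge(P, O):
    """Kling-Gupta Efficiency of predicted flows P against observed flows O."""
    P, O = np.asarray(P), np.asarray(O)
    r = np.corrcoef(P, O)[0, 1]            # linear correlation
    variability = np.std(P) / np.std(O)    # sigma_P / sigma_O
    bias = np.mean(P) / np.mean(O)         # mu_P / mu_O
    return 1 - np.sqrt((r - 1) ** 2 + (variability - 1) ** 2 + (bias - 1) ** 2)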

Plotting

The Evaluation package of bmorph comes with a number of plotting tools to analyze bias correction performance. Whether you want to compare corrected to uncorrected values, contrast different bias correction parameters, or simply compile your results into a tidy plot across multiple sites, there are plenty of plotting functions to choose from. You can find plotting functions here.

The following plots are created from the results produced by the tutorial, where yakima_ds is yakima_data_processed.nc from the sample data. Before using any of the code provided below, make certain to import plotting like so:

from bmorph.evaluation import plotting

Scatter

plotting.compare_correction_scatter(
    flow_dataset= yakima_ds,
    plot_sites = ['KIOW','YUMW','BUM'],
    raw_var = 'raw',
    ref_var = 'ref',
    bc_vars = ['scbc_c'],
    bc_names = ['SCBC_C'],
    plot_colors = ['blue'],
    pos_cone_guide = True,
    neg_cone_guide = True,
    symmetry = False,
    title = '',
    fontsize_legend = 120,
    alpha = 0.3
)
Three scatter plots labeled KIOW, YUMW, and BUM are shown, comparing differences in flow between reference data and the raw data on the horizontal axis while differences between the reference data and bias corrected data are plotted on the vertical axis. 1 to 1 and -1 to 1 lines are plotted for reference.

Scatter plots are most useful for comparing absolute error before and after bias correction. The above plot is produced from bmorph.evaluation.plotting.compare_correction_scatter to compare how absolute error changes with SCBC_C bias correction, with Q being stream discharge. The 1-to-1 and -1-to-1 lines are plotted for reference: points falling vertically between the lines demonstrate a reduction in absolute error, while points falling horizontally between the lines demonstrate an increase in absolute error for each flow time.

Time Series

plotting.plot_reduced_flows(
    flow_dataset= yakima_ds,
    plot_sites = ['KIOW','YUMW','BUM','KEE'],
    interval = 'week',
    raw_var = 'raw', raw_name = "Uncorrected",
    ref_var = 'ref', ref_name = "Reference",
    bc_vars = ['scbc_c'], bc_names = ['SCBC_C'],
    plot_colors = ['grey', 'black', 'red']
);
Four time series plots labeled KIOW, YUMW, BUM, and KEE compare mean week of year flows between raw, reference, and scbc_c bias correction.

The above plot is produced from bmorph.evaluation.plotting.plot_reduced_flows to compare a statistical representation of the flows at each site (the mean in this case) for raw, reference, and bias corrected flows according to SCBC_C. Here, averages are computed on weekly intervals to simplify the figure, but they can also be plotted on daily or monthly intervals for more or less granularity. Comparing this with median flows can describe how much the mean is impacted by extreme flows.

Probability Distributions

plotting.compare_mean_grouped_CPD(
    flow_dataset= yakima_ds,
    plot_sites = ['KIOW','YUMW','BUM'],
    grouper_func = plotting.calc_water_year,
    figsize = (60,40),
    raw_var = 'raw', raw_name = 'Uncorrected',
    ref_var = 'ref', ref_name = 'Reference',
    bc_vars = ['scbc_c'], bc_names = ['SCBC_C'],
    plot_colors = ['grey', 'black', 'red'],
    linestyles = ['-','-','-'],
    markers = ['o', 'X', 'o'],
    fontsize_legend = 90,
    legend_bbox_to_anchor = (1.9,1.0)
);
Three cumulative percentile plots labeled KIOW, YUMW, and BUM compare probabilities of mean annual flows at each site as described by raw, reference, and scbc_c bias correction.

The above plot is produced from bmorph.evaluation.plotting.compare_mean_grouped_CPD to compare cumulative percentile distributions of mean annual flow at each site for raw, reference, and bias corrected flows according to SCBC_C. This function is also capable of subsetting data by month should you want to compare only January flows for example. Because bmorph makes changes based on flow distributions, this plot is the closest to directly analyzing how the different methods correct flows.

Box & Whisker

plotting.kl_divergence_annual_compare(
    flow_dataset= yakima_ds,
    sites = ['KIOW','YUMW','BUM'],
    fontsize_legend = 60, title = '',
    raw_var = 'raw', raw_name = 'Uncorrected',
    ref_var = 'ref', ref_name = 'Reference',
    bc_vars = ['scbc_c'], bc_names = ['SCBC_C'],
    plot_colors = ['grey','red']
);
Three box and whisker plots labeled KIOW, YUMW, and BUM compare Annual KL Divergence between reference and 2 scenarios: Raw and SCBC_C.

The above plot is produced from bmorph.evaluation.plotting.kl_divergence_annual_compare to compare KL Divergence with respect to reference flows for raw and SCBC_C. Being able to view KL Divergence for different scenarios side-by-side helps to provide a better understanding of how well probability distributions are being fitted across the entire time provided.

Citations

Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores. Hydrology and Earth System Sciences, 23, 4323-4331. https://doi.org/10.5194/hess-23-4323-2019

Simple River Network (SRN)

Overview

The Simple River Network, or SRN, is a graphical, pseudo-physical diagnostic tool used to visualize watershed models. Utilizing NetworkX’s nodal network structure, the SRN represents each river segment, or seg, as a single SegNode and connects them according to the watershed’s topology. Each SRN can be color-coded by assigned data values, such as percent bias, so you can visualize where issues may appear in the watershed during bmorph bias correction and more easily understand spatial patterns of bias correction in the network.

Nodal network of the Columbia River Basin showing river segment connections and color-coded by Pfaffsetter basin.

SRN SegNodes contain identifying information that allows the network to be partitioned according to Pfaffstetter codes (Verdin & Verdin 1999, Arge et al. 2006). Pfaffstetter encoding not only allows the networks to be partitioned, but also to be “rolled up”, effectively reducing the granularity of the network to simplify large watersheds. Data can also be subsetted and split into new SRNs for simple manipulation.

SRN does not aim to supplant geographically accurate drawings of watershed networks. Instead it aims to provide a quicker, intermediate tool that allows for easy identification of spatial patterns within the network without having to configure spatial data.

Construction

from bmorph.evaluation import simple_river_network as srn

srn_basin = srn.SimpleRiverNetwork(
    topo=basin_topo, pfaf_seed='', outlet_index=0, max_level_pfaf=42)

All we need to set up the SRN is the topology of the basin, basin_topo, as an xarray.Dataset. If you wish to include the Pfaffstetter digits of the larger watershed that contains the basin, you can do so with pfaf_seed. While the constructor assumes that the outlet of the basin is the first index of the topology file provided, you can specify a different index to build the SRN from with outlet_index. Lastly, because the SRN is constructed recursively, a maximum number of Pfaffstetter digits can be specified with max_level_pfaf, which defaults to 42. Note that the larger the basin, the longer SRN construction may take.

Plotting on the SRN

Plotting the SRN is fully described in bmorph.evaluation.simple_river_network.SimpleRiverNetwork.draw_network, so let’s cover some of the basics.

Plotting data on the SRN is done through draw_network’s color_measure, which requires a pandas.Series with an index that contains the indices of basin_topo and corresponding values to be plotted on a linear colorbar. Several functions automate this process for highlighting different sections of the SRN, such as bmorph.evaluation.simple_river_network.SimpleRiverNetwork.generate_mainstream_map to plot the mainstream, bmorph.evaluation.simple_river_network.SimpleRiverNetwork.generate_pfaf_color_map to color code Pfaffstetter basins, and bmorph.evaluation.simple_river_network.SimpleRiverNetwork.generate_node_highlight_map to highlight specific river segments within the drawing.

mainstream_map = srn_basin.generate_mainstream_map()

import matplotlib as mpl
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(30, 40))
srn_basin.draw_network(color_measure=mainstream_map, cmap=mpl.cm.get_cmap('cividis'), ax=ax)
ax.invert_xaxis()  # inverting the axis can be applied to increase resemblance with the real watershed

For an example of how to construct your own color_measure, we can look at how the tutorial plots percent difference across the SRN:

import pandas as pd

scbc_c = conditioned_seg_totals['IRFroutedRunoff']
raw = yakima_met_seg['IRFroutedRunoff']
p_diff = ((scbc_c - raw) / raw).mean(dim='time') * 100

percent_diff = pd.Series(data=p_diff.to_pandas().values, index=mainstream_map.index)

fig, ax = plt.subplots(figsize=(30,40))
srn_basin.draw_network(color_measure=percent_diff, cmap=mpl.cm.get_cmap('coolwarm_r'),
                       with_cbar=True, cbar_labelsize=40, ax=ax, cbar_title='Percent Difference (%)')

The SRN is compatible with subplotting, but may require a large subplot space to spread out.

Citations

Arge, L., Danner, A., Haverkort, H., & Zeh, N. (2006). I/O-Efficient Hierarchical Watershed Decomposition of Grid Terrain Models. In A. Riedl, W. Kainz, & G. A. Elmes (Eds.), Progress in Spatial Data Handling (pp. 825-844). Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35589-8_51

Verdin, K.L., & Verdin, J. P. (1999). A topological system for delineation and codification of the Earth’s river basins. Elsevier Journal of Hydrology, 218, 1-12.

For Developers

This page contains helpful tips & tricks found in the process of developing this project.

Converting notebooks to restructured text

jupyter nbconvert mynotebook.ipynb --to rst

Any images in the notebook will be saved as png files within a newly created mynotebook_files directory and automatically referenced within the rst file.

For more information, see the nbconvert documentation.

Helper functions from bmorph

log10_1p

Similar to numpy’s log1p, bmorph.evaluation.plotting.log10_1p has been added to allow log10 computations on a dataset that contains zeros. It adds 1 to the data, element-wise, and then takes the log10. This is useful in scientific plots where a log10 scale is desired but zeros reside in the dataset.
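
A minimal usage sketch, assuming log10_1p accepts array-like input (the sample values below are illustrative only):

import pandas as pd
from bmorph.evaluation.plotting import log10_1p

flows = pd.Series([0.0, 9.0, 99.0])
# expected to behave like np.log10(1 + flows), i.e. roughly [0.0, 1.0, 2.0]
print(log10_1p(flows))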

determine_row_col

Tired of having to constantly reformat your subplots whenever you want to tack on one more plot or scratch off something you didn’t think you wanted? bmorph.evaluation.plotting.determine_row_col automates that process for you by calculating the tightest possible square/rectangular dimensions for your subplots. There may be some extra subplots left over (we therefore recommend turning off axes past the number you wish to plot), but there will be at least enough subplots to fit all that you ask for.
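
A minimal sketch of the intended usage, assuming determine_row_col returns a (rows, cols) pair; the figure sizing and number of plots are placeholders:

import matplotlib.pyplot as plt
from bmorph.evaluation.plotting import determine_row_col

n_plots = 7
rows, cols = determine_row_col(n_plots)   # assumed to return a (rows, cols) pair
fig, axes = plt.subplots(rows, cols, figsize=(4 * cols, 3 * rows))
for ax in axes.flatten()[n_plots:]:
    ax.axis('off')   # turn off the leftover axes, as recommended above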

Creating Documentation

Documentation for this project was written with Sphinx’s reStructuredText. Compiling the documentation into HTML was performed by Read the Docs. Thanks to this software, the documentation is easily updatable through GitHub version control without needing to develop a website from scratch in HTML itself.

The tutorial is made runnable by Binder, while data is stored on HydroShare for simple online access.

API Reference

This page provides an auto-generated summary of the bmorph API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Core

Workflows
bmorph.core.workflows.apply_blendmorph(raw_upstream_ts, raw_downstream_ts, train_upstream_ts, train_downstream_ts, ref_upstream_ts, ref_downstream_ts, apply_window, raw_train_window, ref_train_window, interval, overlap, blend_factor, raw_upstream_y=None, raw_downstream_y=None, train_upstream_y=None, train_downstream_y=None, ref_upstream_y=None, ref_downstream_y=None, n_smooth_long=None, n_smooth_short=5, bw=3, xbins=200, ybins=10, rtol=1e-06, atol=1e-08, method='hist', train_cdf_min=0.0001, **kwargs)[source]

Bias correction performed by blending bmorphed flows on user-defined intervals.

Blendmorph is used to perform spatially consistent bias correction; this function does so on a user-defined interval. This is done by performing bmorph bias correction for each site’s timeseries according to upstream and downstream gauge sites (or proxies) where true flows are known. The upstream and downstream corrected timeseries are then multiplied by fractional weights, blend_factor, that sum to 1 between them so the corrected flows can be combined, or “blended”, into one representative corrected flow series for the site. It is thereby important to specify upstream and downstream values so bias corrections are performed with values that most closely represent each site being corrected.

Parameters
  • raw_upstream_ts (pandas.Series) – Raw flow timeseries corresponding to the upstream flows.

  • raw_downstream_ts (pandas.Series) – Raw flow timeseries corresponding to the downstream flows.

  • train_upstream_ts (pandas.Series) – Flow timeseries to train the bias correction model with for the upstream flows.

  • train_downstream_ts (pandas.Series) – Flow timeseries to train the bias correction model with for the downstream flows.

  • ref_upstream_ts (pandas.Series) – Observed/reference flow timeseries corresponding to the upstream flows.

  • ref_downstream_ts (pandas.Series) – Observed/reference flow timeseries corresponding to the downstream flows.

  • raw_train_window (pandas.date_range) – Date range to train the bias correction model.

  • apply_window (pandas.date_range) – Date range to apply bmorph onto flow timeseries.

  • ref_train_window (pandas.date_range) – Date range to smooth elements in ‘raw_ts’ and ‘bmorph_ts’.

  • interval (pandas.DateOffset) – Difference between bmorph application intervals.

  • overlap (int) – Total overlap in number of days the CDF windows have with each other, distributed evenly before and after the application window.

  • blend_factor (numpy.array) – An array determining how upstream and downstream bmorphing is proportioned. This is determined by the fill_method used in mizuroute_utils. The blend_factor entries are the proportion of upstream multipliers and totals, added with 1-blend_factor of the downstream multipliers and totals.

  • n_smooth_long (int, optional) – Number of elements that will be smoothed in raw_ts and bmorph_ts. The nsmooth value in this case is typically much larger than the one used for the bmorph function itself. For example, 365 days.

  • n_smooth_short (int) – Number of elements that will be smoothed when determining CDFs used for the bmorph function itself.

  • raw_upstream_y (pandas.Series, optional) – Raw time series of the second time series variable for conditioning corresponding to upstream flows.

  • raw_downstream_y (pandas.Series, optional) – Raw time series of the second time series variable for conditioning corresponding to downstream flows.

  • train_upstream_y (pandas.Series, optional) – Training second time series variable for conditioning corresponding to upstream flows.

  • train_downstream_y (pandas.Series, optional) – Training second time series variable for conditioning corresponding to downstream flows.

  • ref_upstream_y (pandas.Series, optional) – Target second time series variable for conditioning corresponding to upstream flows.

  • ref_downstream_y (pandas.Series, optional) – Target second time series variable for conditioning corresponding to downstream flows.

  • bw (int, optional) – Bandwidth for KernelDensity. This should only be used if method=’kde’.

  • xbins (int, optional) – Bins for the first time series. This should only be used if method=’hist’.

  • ybins (int, optional) – Bins for the second time series. This should only be used if method=’hist’.

  • rtol (float, optional) – The desired relative tolerance of the result for KernelDensity. This should only be used if method=’kde’.

  • atol (float, optional) – The desired absolute tolerance of the result for KernelDensity. This should only be used if method=’kde’.

  • method (str, optional) – Method to use for conditioning. Currently ‘hist’ using hist2D and ‘kde’ using kde2D are the only supported methods.

  • **kwargs – Additional keyword arguments. Mainly implemented for cross-compatibility with other methods so that a unified configuration can be used

Returns

  • bc_totals (pandas.Series) – Returns a time series of length of an interval in the bmorph window with bmorphed values.

  • bc_multipliers (pandas.Series) – Returns a time series of equal length to bc_totals used to scale the raw flow values into the bmorphed values returned in bc_totals.
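
As a rough illustration (not drawn from the package documentation), a call might look like the sketch below, where raw_up, raw_down, train_up, train_down, ref_up, ref_down, and blend_factor are placeholder inputs prepared elsewhere (for example by mizuroute_utils), and the windows are placeholder date ranges:

import pandas as pd
from bmorph.core.workflows import apply_blendmorph

# placeholder windows; substitute the periods appropriate for your data
apply_window = pd.date_range('1991-10-01', '2005-09-30')
train_window = pd.date_range('1981-10-01', '1990-09-30')

bc_totals, bc_multipliers = apply_blendmorph(
    raw_up, raw_down,            # raw flows at the upstream/downstream reference segs (pandas.Series)
    train_up, train_down,        # training flows overlapping the reference record
    ref_up, ref_down,            # observed/reference flows
    apply_window=apply_window,
    raw_train_window=train_window,
    ref_train_window=train_window,
    interval=pd.DateOffset(years=1),
    overlap=60,
    blend_factor=blend_factor,   # fractional upstream weights from mizuroute_utils
)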

bmorph.core.workflows.apply_bmorph(raw_ts, train_ts, ref_ts, apply_window, raw_train_window, ref_train_window, condition_ts=None, raw_y=None, train_y=None, ref_y=None, interval=<DateOffset: years=1>, overlap=60, n_smooth_long=None, n_smooth_short=5, bw=3, xbins=200, ybins=10, rtol=1e-06, atol=1e-08, method='hist', train_cdf_min=0.0001, **kwargs)[source]

Bias correction is performed by bmorph on user-defined intervals.

Parameters
  • raw_ts (pandas.Series) – Raw flow timeseries.

  • train_ts (pandas.Series) – Flow timeseries to train the bias correction model with.

  • ref_ts (pandas.Series) – Observed/reference flow timeseries.

  • condition_ts (pandas.Series) – A timeseries with a variable to condition on. This will be used in place of raw_y, train_y, and ref_y. It is mainly added as a convenience over specifying the same timeseries for each of those variables.

  • apply_window (pandas.date_range) – Date range to apply bmorph onto flow timeseries.

  • raw_train_window (pandas.date_range) – Date range to train the bias correction model.

  • ref_train_window (pandas.date_range) – Date range to smooth elements in ‘raw_ts’ and ‘bmorph_ts’.

  • raw_y (pandas.Series, optional) – Raw time series of the second time series variable for conditioning.

  • train_y (pandas.Series, optional) – Training second time series.

  • ref_y (pandas.Series, optional) – Target second time series.

  • interval (pandas.DateOffset) – Difference between bmorph application intervals.

  • overlap (int) – Total number of days overlap CDF windows have with each other, distributed evenly before and after the application window.

  • n_smooth_long (int, optional) – Number of elements that will be smoothed in raw_ts and bmorph_ts. The nsmooth value in this case is typically much larger than the one used for the bmorph function itself. For example, 365 days.

  • n_smooth_short (int, optional) – Number of elements that will be smoothed when determining CDFs used for the bmorph function itself.

  • bw (int, optional) – Bandwidth for KernelDensity. This should only be used if method=’kde’.

  • xbins (int, optional) – Bins for the first time series. This should only be used if method=’hist’.

  • ybins (int, optional) – Bins for the second time series. This should only be used if method=’hist’.

  • rtol (float, optional) – The desired relative tolerance of the result for KernelDensity. This should only be used if method=’kde’.

  • atol (float, optional) – The desired absolute tolerance of the result for KernelDensity. This should only be used if method=’kde’.

  • method (str) – Method to use for conditioning. Currently ‘hist’ using hist2D and ‘kde’ using kde2D are the only supported methods.

  • **kwargs – Additional keyword arguments. Mainly implemented for cross-compatibility with other methods so that a unified configuration can be used

Returns

  • bmorph_corr_ts (pandas.Series) – Returns a time series of length of an interval in the bmorph window with bmorphed values.

  • bmorph_multipliers (pandas.Series) – Returns a time series of equal length to bmorph_corr_ts used to scale the raw flow values into the bmorphed values returned in bmorph_corr_ts.
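
As a rough, hedged sketch of a single-site call (raw_ts, train_ts, ref_ts, and met_ts are placeholder pandas.Series, and the windows are placeholder date ranges):

import pandas as pd
from bmorph.core.workflows import apply_bmorph

# placeholder windows; substitute the periods appropriate for your data
apply_window = pd.date_range('1991-10-01', '2005-09-30')
train_window = pd.date_range('1981-10-01', '1990-09-30')

bmorph_corr_ts, bmorph_multipliers = apply_bmorph(
    raw_ts, train_ts, ref_ts,        # placeholder raw, training, and reference flow series
    apply_window=apply_window,
    raw_train_window=train_window,
    ref_train_window=train_window,
    condition_ts=met_ts,             # optional placeholder meteorological series for conditioning
    interval=pd.DateOffset(years=1),
    overlap=60,
)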

bmorph.core.workflows.apply_scbc(ds, mizuroute_exe, bmorph_config, client=None, save_mults=False)[source]

Applies Spatially Consistent Bias Correction (SCBC) by bias correcting local flows and re-routing them through mizuroute. This method can be run in parallel by providing a dask client.

Parameters
  • ds (xr.Dataset) – An xarray dataset containing time and seg dimensions and variables to be bias corrected. This will most likely come from the provided preprocessing utility, mizuroute_utils.to_bmorph

  • mizuroute_exe (str) – The path to the mizuroute executable

  • bmorph_config (dict) – The configuration for the bias correction. See the documentation on input specifications and selecting bias correction techniques for descriptions of the options and their choices.

  • client (dask.Client (optional)) – A client object to manage parallel computation.

  • save_mults (boolean (optional)) – Whether to save multipliers from bmorph for diagnosis. If True, multipliers are saved in the same directory as local flows. Defaults as False to not save multipliers.

Returns

region_totals – The rerouted, total, bias corrected flows for the region

Return type

xr.Dataset
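
A hedged usage sketch: ds is the dataset returned by mizuroute_utils.to_bmorph, while the mizuRoute executable path and the bmorph_config dictionary (whose keys are described in the input specifications documentation) are placeholders here:

from dask.distributed import Client
from bmorph.core.workflows import apply_scbc

client = Client()   # optional; enables parallel execution

region_totals = apply_scbc(ds,
                           mizuroute_exe='/path/to/mizuroute',   # placeholder path
                           bmorph_config=bmorph_config,          # placeholder configuration dict
                           client=client)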

BMORPH
bmorph: modify a time series by removing elements of persistent differences

(aka bias correction)

Persistent differences are inferred by comparing a ‘ref’ sample with a ‘training’ sample. These differences are then used to correct a ‘raw’ sample that is presumed to have the same persistent differences as the ‘training’ sample. The resulting ‘bmorph’ sample should then be consistent with the ‘ref’ sample.

bmorph.core.bmorph.bmorph(raw_ts, train_ts, ref_ts, raw_apply_window, raw_train_window, ref_train_window, raw_cdf_window, raw_y=None, ref_y=None, train_y=None, nsmooth=12, bw=3, xbins=200, ybins=10, rtol=1e-07, atol=0, method='hist', smooth_multipliers=True, train_cdf_min=0.0001)[source]

Morph raw_ts based on differences between ref_ts and train_ts

bmorph is an adaptation of the PresRat bias correction procedure from Pierce et al. (2015; http://dx.doi.org/10.1175/JHM-D-14-0236.1), which is itself an extension of the Equidistant quantile matching (EDCDFm) technique of Li et al. (2010; http://dx.doi.org/10.1029/94JD00483). The method as implemented here uses a multiplicative change in the quantiles of a CDF, followed by a simple correction to preserve changes in the long-term mean. No further frequency-based corrections are applied.

The method differs from PresRat in that it is not applied for fixed periods (but uses a moving window) to prevent discontinuities in the corrected time series and it does not apply a frequency-based correction.

The method also allows changes to be made through an adapted version of the EDCDFm technique or through the multiDimensional ConDitional EquiDistant CDF matching function if a second timeseries variable is passed.

Parameters
  • raw_ts (pandas.Series) – Raw time series that will be bmorphed

  • raw_cdf_window (slice) – Slice used to determine the CDF for raw_ts

  • raw_bmorph_window (slice) – Slice of raw_ts that will be bmorphed

  • ref_ts (pandas.Series) – Target time series. This is the time series with ref values that overlaps with train_ts and is used to calculated ref_cdf

  • train_ts (pandas.Series) – Training time series. This time series is generated by the same process as raw_ts but overlaps with ref_ts. It is used to calculate train_cdf

  • training_window (slice) – Slice used to subset ref_ts and train_ts when the mapping between them is created

  • nsmooth (int) – Number of elements that will be smoothed when determining CDFs

  • raw_y (pandas.Series) – Raw time series of the second time series variable for cqm

  • ref_y (pandas.Series) – Target second time series

  • train_y (pandas.Series) – Training second time series

  • bw (int) – bandwidth for KernelDensity

  • xbins (int) – Bins for the flow time series

  • ybins (int) – Bins for the second time series

  • train_cdf_min (float (optional)) – Minimum percentile allowed for train cdf. Defaults as 1e-4 to help handle data spikes in corrections caused by multipliers being too large from the ratio between reference and training flows being large.

Returns

bmorph_ts – Returns a time series of length raw_bmorph_window with bmorphed values

Return type

pandas.Series

bmorph.core.bmorph.bmorph_correct(raw_ts, bmorph_ts, correction_window, ref_mean, train_mean, nsmooth)[source]

Correct bmorphed values to preserve the ratio of change

Apply a correction to bmorphed values to preserve the mean change over a correction_window. This is similar to the correction applied in the original PresRat algorithm (Pierce et al. 2015; http://dx.doi.org/10.1175/JHM-D-14-0236.1), except that we use a rolling mean to determine the correction to avoid discontinuities on the boundaries.

Parameters
  • raw_ts (pandas.series) – Series of raw values that have not been bmorphed

  • bmorph_ts (Bmorphed version of raw_ts) – Series of bmorphed values

  • correction_window (slice) – Slice of raw_ts and bmorph_ts over which the correction is applied

  • ref_mean (float) – Mean of target time series (ref_ts) for the base period

  • train_mean (float) – Mean of training time series (train_ts) for the base period

  • nsmooth (int) – Number of elements that will be smoothed in raw_ts and bmorph_ts. The nsmooth value in this case is typically much larger than the one used for the bmorph function itself. For example, 365 days.

Returns

bmorph_corrected_ts – Corrected series of length correction_window.

Return type

pandas.Series

bmorph.core.bmorph.cqm(raw_x: pandas.core.series.Series, train_x: pandas.core.series.Series, ref_x: pandas.core.series.Series, raw_y: pandas.core.series.Series, train_y: pandas.core.series.Series, ref_y: Optional[pandas.core.series.Series] = None, method='hist', xbins=200, ybins=10, bw=3, rtol=1e-07, atol=0, nsmooth=5, train_cdf_min=0.0001) pandas.core.series.Series[source]

Conditional Quantile Mapping

Multidimensional conditional equidistant CDF matching function:

\tilde{x}_{mp} = x_{mp} + F^{-1}_{oc}(F_{mp}(x_{mp}|y_{mp})|y_{oc}) - F^{-1}_{mc}(F_{mp}(x_{mp}|y_{mp})|y_{mc})

bmorph.core.bmorph.edcdfm(raw_x, raw_cdf, train_cdf, ref_cdf, train_cdf_min=0.0001)[source]

Calculate multipliers using an adapted version of the EDCDFm technique

This routine implements part of the PresRat bias correction method from Pierce et al. (2015; http://dx.doi.org/10.1175/JHM-D-14-0236.1), which is itself an extension of the Equidistant quantile matching (EDCDFm) technique of Li et al. (2010; http://dx.doi.org/10.1029/94JD00483). The part that is implemented here is the amended form of EDCDFm that determines multiplicative changes in the quantiles of a CDF.

In particular, if the value raw_x falls at quantile u_t (in raw_cdf), then the bias-corrected value is the value in ref_cdf at u_t (ref_x) multiplied by the model-predicted change at u_t evaluated as a ratio (i.e., model future (or raw_x) / model historical (or train_x)). Thus, the bias-corrected value is raw_x multiplied by ref_x/train_x. Here we only return the multiplier ref_x/train_x. This method preserves the model-predicted median (not mean) change evaluated multiplicatively. Additional corrections are required to preserve the mean change. Inclusion of these additional corrections constitutes the PresRat method.

Parameters
  • raw_x (pandas.Series) – Series of raw values that will be used to determine the quantile u_t

  • raw_cdf (pandas.Series) – Series of raw values that represents the CDF that is used to determine the non-parametric quantile of raw_x

  • train_cdf (pandas.Series) – Series of training values that represents the CDF based on the same process as raw_cdf, but overlapping in time with ref_cdf

  • ref_cdf (pandas.Series) – Series of ref values that represents the ref CDF and that overlaps in time with train_cdf

Returns

multiplier – Multipliers for raw_x. The pandas.Series has the same index as raw_x

Return type

pandas.Series
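
The multiplier idea described above can be sketched in a few lines of numpy; this is only an illustrative reconstruction, not the package’s exact implementation (which additionally enforces train_cdf_min and smoothing):

import numpy as np

def edcdfm_multiplier_sketch(raw_x, raw_cdf, train_cdf, ref_cdf):
    # non-parametric quantile u_t of each raw value within the raw CDF sample
    u_t = np.searchsorted(np.sort(raw_cdf), raw_x) / len(raw_cdf)
    # values at that quantile in the reference and training CDF samples
    ref_x = np.quantile(ref_cdf, u_t)
    train_x = np.quantile(train_cdf, u_t)
    # multiplier applied to raw_x
    return ref_x / train_x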

bmorph.core.bmorph.hist2D(x, y, xbins, ybins, **kwargs)[source]

Create a 2 dimensional pdf via numpy’s histogram2d

bmorph.core.bmorph.kde2D(x, y, xbins=200, ybins=10, **kwargs)[source]

Estimate a 2 dimensional pdf via kernel density estimation

bmorph.core.bmorph.marginalize_cdf(y_raw, z_raw, vals)[source]

Find the marginalized cdf by computing cumsum(P(x|y=val)) for each val

Local Flows
bmorph.core.local_flows.estimate_local_flow(ref_total_flow: pandas.core.series.Series, sim_total_flow: pandas.core.series.Series, sim_local_flow: pandas.core.series.Series, how: str = 'flow_fraction', method_kwargs: Dict[str, Any] = {}) pandas.core.series.Series[source]

Estimate the local flow for a single site.

Parameters
  • ref_total_flow – Reference total flow to calculate the reference_local_flow from

  • sim_total_flow – Simulated total flow

  • sim_local_flow – Simulated local flow

  • how – How to estimate the local reference flow. Available options are flow_fraction

  • method_kwargs – Additional arguments to pass to the estimator.

Returns

Return type

the estimated local reference flow

bmorph.core.local_flows.find_local_segment(lats, lons, target_latlon, n_return=10, metric='euclidean', gridsearch=False) Dict[str, numpy.ndarray][source]

Finds the closest coordinates to a given target. Can return multiple coordinates, in ascending order from closest to furthest.

Parameters
  • lats – Latitudes to search through

  • lons – Longitudes to search through

  • target_latlon – Tuple of (lat, lon) that is being searched for

  • n_return – Number of closest coordinates to return

  • metric – Distance metric. Can be any valid metric from scipy.spatial.distance.cdist

  • gridsearch – Whether to create a meshgrid from the given lats and lons

Returns

  • coords – Coordinates of the n_return nearest coordinates as (lat, lon)

  • distances – Distances according to metric from target_latlon

  • indices – Indices from lats and lons of the n_return nearest

Return type

dictionary containing

bmorph.core.local_flows.flow_fraction_multiplier(total_flow: pandas.core.series.Series, local_flow: pandas.core.series.Series, nsmooth: int = 30) pandas.core.series.Series[source]

Calculate the ratio of local flow to total flow from the timeseries after applying a rolling mean.

Parameters
  • total_flow – Total accumulated flow at the given location

  • local_flow – Portion of flow that is directly from the sub-basin (excludes upstream contributions)

  • nsmooth – Number of timesteps to use for the rolling mean
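
The calculation amounts to smoothing both series and taking their ratio; a minimal pandas sketch of that idea (not necessarily the package’s exact implementation) is:

import pandas as pd

def flow_fraction_sketch(total_flow: pd.Series, local_flow: pd.Series, nsmooth: int = 30) -> pd.Series:
    # smooth both series with a rolling mean, then take their ratio
    smooth_total = total_flow.rolling(nsmooth, min_periods=1).mean()
    smooth_local = local_flow.rolling(nsmooth, min_periods=1).mean()
    return smooth_local / smooth_total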

bmorph.core.local_flows.quantile_regression(total_flow: pandas.core.series.Series, local_flow: pandas.core.series.Series, nsmooth: int = 30)[source]

TODO: Implement

Utilities

mizuRoute Utilities
bmorph.util.mizuroute_utils.calculate_blend_vars(routed: xarray.core.dataset.Dataset, topology: xarray.core.dataset.Dataset, reference: xarray.core.dataset.Dataset, gauge_sites=None, route_var='IRFroutedRunoff', fill_method='kldiv', min_kge=- 0.41)[source]

Calculates a number of variables used in blendmorph and map_var_to_seg.

Parameters
  • routed (xr.Dataset) – The dataset that will be modified and returned ready for map_var_to_seg.

  • topology (xr.Dataset) – Contains the network topology with a “seg” dimension that identifies reaches, matching the routed dataset.

  • reference (xr.Dataset) – Contains reaches used for reference with dimension “site” and coordinate “seg”.

  • gauge_sites (list, optional) – Contains the gauge site names from the reference dataset to be used that are automatically pulled from reference if None are given.

  • route_var (str) – Variable name of flows used for fill_method purposes within routed. This is defaulted as ‘IRFroutedRunoff’.

  • fill_method (str) – While finding some upstream/downstream reference segs may be simple (segs with ‘is_gauge’ = True are their own reference segs, and others may be easy to find by looking directly up or downstream), some river networks have multiple options for selecting gauge sites and may fail to have upstream/downstream reference segs designated. ‘fill_method’ specifies how segs should be assigned upstream/downstream reference segs for bias correction if they are missed walking upstream or downstream.

    Currently supported methods:
    ‘leave_null’

    nothing is done to fill missing reference segs, np.nan values are replaced with a -1 seg designation and that’s it

    ‘forward_fill’

    xarray’s ffill method is used to fill in any np.nan values

    ‘r2’

    reference segs are selected based on which reference site that seg’s flows has the greatest r2 value with

    ‘kldiv’ (default)

    reference segs are selected based on which reference site that seg’s flows has the smallest KL Divergence value with

    ‘kge’

    reference segs are selected based on which reference site that seg’s flows has the greatest KGE value with

  • min_kge (float) – If not None, all upstream/downstream reference seg selections will be filtered according to the min_kge criteria, where seg selections that have a KGE with the current seg that is less than min_kge will be set to -1 and determined unsuitable for bias correction. This is defaulted as -0.41.

Returns

routed – with the following added: ‘is_headwaters’ ‘is_gauge’ ‘down_seg’ ‘distance_to_up_gauge’ ‘distance_to_down_gauge’ ‘cdf_blend_factor’ ‘up_seg’ ‘up_ref_seg’ ‘down_ref_seg’

Return type

xr.Dataset

bmorph.util.mizuroute_utils.calculate_cdf_blend_factor(routed: xarray.core.dataset.Dataset, gauge_reference: xarray.core.dataset.Dataset, gauge_sites=None, fill_method='kldiv', min_kge=- 0.41)[source]

Calculates the cumulative distribution function blend factor based on distance to a seg’s nearest up gauge site with respect to the total distance between the two closest gauge sites to the seg.

Parameters
  • routed (xr.Dataset) – Contains flow timeseries data.

  • gauge_reference (xr.Dataset) – Contains reference flow timeseries data for the same watershed as the routed dataset.

  • gauge_sites (list, optional) – If None, gauge_sites will be taken as all those listed in gauge_reference.

  • fill_method (str) – See map_ref_sites for full description of how fill_method works.

    Because each fill_method selects reference segs differently, calculate_blend_vars needs to know how they were selected to create blend factors. Note that ‘leave_null’ is not supported for this method because there is no filling for this method. Currently supported:

    ‘forward_fill’

    cdf_blend_factor = distance_to_upstream / (distance_to_upstream + distance_to_downstream)

    ‘kldiv’

    cdf_blend_factor = kldiv_upstream / (kldiv_upstream + kldiv_downstream)

    ‘r2’

    cdf_blend_factor = r2_upstream / (r2_upstream + r2_downstream)

Returns

routed – The original routed dataset updated with ‘cdf_blend_factors’ used to combine upstream and downstream relative bias corrections. Each fill_method will also add or use upstream and downstream statistical measures calculated in map_ref_sites.

Return type

xr.Dataset

bmorph.util.mizuroute_utils.find_max_kge(ds, curr_seg_flow)[source]

Searches through ds to find which seg has the largest Kling-Gupta Efficiency (KGE) value with respect to curr_seg_flow. If no seg is found, max_kge = -np.inf and max_kge_ref_seg = -1.

Parameters
  • ds (xr.Dataset) – Contains the variable ‘reference_flow’ to compare curr_seg_flow against and the coordinate ‘seg’.

  • curr_seg_flow (np.array) – A numpy array containing flow values that KGE is to be maximized with respect to.

Returns

  • max_kge (float) – Maximum KGE value found.

  • max_kge_ref_seg (int) – River segment designation corresponding to max_kge.

bmorph.util.mizuroute_utils.find_max_r2(ds, curr_seg_flow)[source]

Searches through ds to find which seg has the greatest r2 value with respect to curr_seg_flow. If no seg is found, max_r2 = 0 and max_r2_ref_seg = -1.

Parameters
  • ds (xr.Dataset) – Contains the variable ‘reference_flow’ to compare curr_seg_flow against and the coordinate ‘seg’.

  • curr_seg_flow (np.array) – A numpy array containing flow values that r2 is to be maximized with respect to.

Returns

  • max_r2 (float) – Magnitude of the maximum R squared value found.

  • max_r2_ref_seg (int) – River segment designation corresponding to the max_r2.

bmorph.util.mizuroute_utils.find_min_kldiv(ds, curr_seg_flow)[source]

Searches through ds to find which seg has the smallest Kullback-Leibler Divergence value with respect to curr_seg_flow. If no seg is found, min_kldiv = -1 and min_kldiv_ref_seg = -1. https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

Parameters
  • ds (xr.Dataset) – Contains the variable ‘reference_flow’ to compare curr_seg_flow against and the coordinate ‘seg’.

  • curr_seg_flow (np.array) – A numpy array containing flow values that KL Divergence is to be minimized with respect to.

Returns

  • min_kldiv (float) – Magnitude of the minimum KL Divergence found.

  • min_kldiv_ref_seg (int) – River segment designation corresponding to min_kldiv.

bmorph.util.mizuroute_utils.find_up(ds, seg, sel_method='first', sel_var='IRFroutedRunoff')[source]

Finds the segment directly upstream of seg, given that seg is not a headwater segment (in which case np.nan is returned).

Parameters
  • ds (xr.Dataset) – Dataset containing river segments as ‘seg’, headwater segments by ‘is_headwaters’, and what is downstream of each seg in ‘down_seg’.

  • seg (int) – River segment designation to search from.

  • sel_method (str) – Method to use when selecting among multiple upstream segments.

  • sel_var (str) – Variable used when comparing segments among multiple upstream segments. Can be ‘forward_fill’, ‘r2’, or ‘kge’.

Returns

up_seg – Upstream segment designation found, or np.nan if seg is a headwater segment.

Return type

int

bmorph.util.mizuroute_utils.kling_gupta_efficiency(sim, obs)[source]

Calculates the Kling-Gupta Efficiency (KGE) between two flow arrays. https://agrimetsoft.com/calculators/Kling-Gupta%20efficiency

Parameters
  • sim (array-like) – Simulated flow array.

  • obs (array-like) – Observed flow array.

Returns

kge – Kling-Gupta Efficiency calculated between the two arrays.

Return type

float

bmorph.util.mizuroute_utils.map_headwater_sites(routed: xarray.core.dataset.Dataset)[source]

Identifies whether each river segment is a headwater with the boolean ‘is_headwater’.

Parameters

routed (xr.Dataset) – Contains watershed river segments designated as the dimension ‘seg’. River segments are connected by referencing immediate downstream segments as ‘down_seg’ for each ‘seg’.

Returns

routed – The original routed dataset updated with which sites are headwaters.

Return type

xr.Dataset

bmorph.util.mizuroute_utils.map_met_hru_to_seg(met_hru, topo)[source]

Maps meteorological data from hru to seg.

Parameters
  • met_hru (xr.Dataset) – A dataset of meteorological data to be mapped onto the stream segments to facilitate conditioning. All variables in this dataset will automatically be mapped onto the stream segments and returned.

  • topo (xr.Dataset) – Topology dataset for running mizuRoute. We expect this to have seg and hru dimensions.

Returns

met_seg – A dataset of meteorological data mapped onto the stream segments to facilitate conditioning.

Return type

xr.Dataset

bmorph.util.mizuroute_utils.map_ref_sites(routed: xarray.core.dataset.Dataset, gauge_reference: xarray.core.dataset.Dataset, gauge_sites=None, route_var='IRFroutedRunoff', fill_method='r2', min_kge=- 0.41)[source]

Assigns each seg within routed a boolean ‘is_gauge’ identifier and determines each seg’s upstream and downstream reference seg designations.

Parameters
  • routed (xr.Dataset) – Contains the input flow timeseries data.

  • gauge_reference (xr.Dataset) – Contains reference flow timeseries data for the same watershed as the routed dataset.

  • gauge_sites (list, optional) – If None, gauge_sites will be taken as all those listed in gauge_reference.

  • route_var (str) – Variable name of flows used for fill_method purposes within routed. This is defaulted as ‘IRFroutedRunoff’.

  • fill_method (str) – While finding some upstream/downstream reference segs may be simple (segs with ‘is_gauge’ = True are their own reference segs, and others may be easy to find by looking directly up or downstream), some river networks have multiple options for selecting gauge sites and may fail to have upstream/downstream reference segs designated. ‘fill_method’ specifies how segs should be assigned upstream/downstream reference segs for bias correction if they are missed walking upstream or downstream.

    Currently supported methods:
    ‘leave_null’

    nothing is done to fill missing reference segs, np.nan values are replaced with a -1 seg designation and that’s it

    ‘forward_fill’

    xarray’s ffill method is used to fill in any np.nan values

    ‘r2’

    reference segs are selected based on which reference site that seg’s flows has the greatest r2 value with

    ‘kldiv’

    reference segs are selected based on which reference site that seg’s flows has the smallest KL Divergence value with

    ‘kge’

    reference segs are selected based on which reference site that seg’s flows has the greatest KGE value with

Returns

routed – Routed timeseries with reference gauge site river segments assigned to each river segment in the original routed.

Return type

xr.Dataset

bmorph.util.mizuroute_utils.map_segs_topology(routed: xarray.core.dataset.Dataset, topology: xarray.core.dataset.Dataset)[source]

Adds contributing_area, average elevation, length, and down_seg to routed from topology.

Parameters
  • routed (xr.Dataset) – Contains streamflow timeseries mapped to river segments denoted as ‘seg’.

  • topology (xr.Dataset) – Contains topological data of the watershed that routed’s streamflow timeseries describe. River segment designations, lengths, and immediate downstream segments are expected as ‘seg’, ‘Length’, and ‘Tosegment’.

Returns

routed – The input dataset routed updated with the topological data.

Return type

xr.Dataset

bmorph.util.mizuroute_utils.map_var_to_segs(routed: xarray.core.dataset.Dataset, map_var: xarray.core.dataarray.DataArray, var_label: str, var_key: str, gauge_segs=None)[source]

Splits the variable into its up and down components to be used in blendmorph.

Parameters
  • routed (xr.Dataset) – the dataset that will be modified and returned having been prepared by calculate_blend_vars with the dimension ‘seg’

  • map_var (xr.DataArray) – contains the variable to be split into up and down components and can be the same as routed, (must also contain the dimension ‘seg’)

  • var_label (str) – suffix of the up and down parts of the variable

  • var_key (str) – variable name to access the variable to be split in map_var

  • gauge_segs (list, optional) – List of the gauge segs that identify the reaches that are gauge sites, pulled from routed if None.


Returns

routed – with the following added: f’down_{var_label}’ f’up_{var_label}’

Return type

xr.Dataset

bmorph.util.mizuroute_utils.to_bmorph(topo: xarray.core.dataset.Dataset, routed: xarray.core.dataset.Dataset, reference: xarray.core.dataset.Dataset, met_hru: Optional[xarray.core.dataset.Dataset] = None, route_var: str = 'IRFroutedRunoff', fill_method='r2', min_kge=None)[source]

Prepare mizuroute output for bias correction via the blendmorph algorithm. This allows an optional dataset of hru meteorological data to be given for conditional bias correction.

Parameters
  • topo (xr.Dataset) – Topology dataset for running mizuRoute. We expect this to have seg and hru dimensions.

  • routed (xr.Dataset) – The initially routed dataset from mizuRoute.

  • reference (xr.Dataset) – A dataset containing reference flows for bias correction. We expect this to have site and time dimensions with flows being stored in reference_flow.

  • met_hru (xr.Dataset, optional) – A dataset of meteorological data to be mapped onto the stream segments to facilitate conditioning. All variables in this dataset will automatically be mapped onto the stream segments and returned.

  • route_var (str) – Name of the variable of the routed runoff in the routed dataset. Defaults to IRFroutedRunoff.

  • fill_method (str) – While finding some upstream/downstream reference segs may be simple (segs with ‘is_gauge’ = True are their own reference segs, and others may be easy to find by looking directly up or downstream), some river networks have multiple options for selecting gauge sites and may fail to have upstream/downstream reference segs designated. ‘fill_method’ specifies how segs should be assigned upstream/downstream reference segs for bias correction if they are missed walking upstream or downstream.

    Currently supported methods:
    ‘leave_null’

    nothing is done to fill missing reference segs, np.nan values are replaced with a -1 seg designation and that’s it

    ‘forward_fill’

    xarray’s ffill method is used to fill in any np.nan values

    ‘r2’

    reference segs are selected based on which reference site that seg’s flows has the greatest r2 value with

    ‘kldiv’ (default)

    reference segs are selected based on which reference site that seg’s flows has the smallest KL Divergence value with

    ‘kge’

    reference segs are selected based on which reference site that seg’s flows has the greatest KGE value with

  • min_kge (float, optional) – See calculate_blend_vars for more information; defaults to None unless fill_method = ‘kge’.

Returns

A dataset with the required data for applying the blendmorph routines. See the blendmorph documentation for further information.

Return type

met_seg
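
A hedged usage sketch, where topo, routed, reference, and met_hru are placeholder xarray.Datasets matching the descriptions above:

from bmorph.util import mizuroute_utils as mizutil

met_seg = mizutil.to_bmorph(topo=topo,
                            routed=routed,
                            reference=reference,
                            met_hru=met_hru,        # optional; only needed for conditioning
                            fill_method='kldiv')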

bmorph.util.mizuroute_utils.trim_time(dataset_list: list)[source]

Trims all times of the xr.Datasets in the list to the shortest timeseries.

Parameters

dataset_list (List[xr.Dataset]) – Contains a list of xr.Datasets

Returns

Contains a list in the same order as dataset_list except with all items in the list having the same start and end time.

Return type

list

bmorph.util.mizuroute_utils.walk_down(ds, start_seg)[source]

Finds the nearest downstream gauge site and returns the distance traveled to reach it from start_seg.

Parameters
  • ds (xr.Dataset) – Dataset containing river segments, downstream segs, the length of the river segments, and which segs are gauge sites as ‘seg’, ‘down_seg’, ‘length’, and ‘is_gauge’, respectively.

  • start_seg (int) – River segment designation to start walking from to a downstream gauge site.

Returns

  • tot_length (float) – Total length traveled during walk, (e.g. cumulative river distance from start_seg to the downstream gauge site).

  • cur_seg (int) – River segment designation of the gauge site reached.

bmorph.util.mizuroute_utils.walk_up(ds, start_seg)[source]

Finds the nearest upstream gauge site and returns the distance traveled to reach it from start_seg.

Parameters
  • ds (xr.Dataset) – Dataset containing river segments, upstream segs, the length of the river segments, and which segs are gauge sites as ‘seg’, ‘up_seg’, ‘length’, and ‘is_gauge’, respectively.

  • start_seg (int) – River segment designation to start walking from to an upstream gauge site.

Returns

  • tot_length (float) – Total length traveled during walk, (e.g. cumulative river distance from start_seg to the upstream gauge site).

  • cur_seg (int) – River segment designation of the gauge site reached.

Evaluation

Plotting
bmorph.evaluation.plotting.anomaly_scatter2D(computations: dict, baseline_key: str, vert_key: str, horz_key: str, sites=[], multi=True, colors=['Maroon', 'Navy', 'Black', 'Orange', 'Yellow', 'Blue', 'Grey', 'Pink', 'Lavender'], show_legend=True)[source]

Plots two correction models against each other after Raw is subtracted from each.

Parameters
  • computations (dict) – Expecting {“Correction Name”: correction pandas.DataFrame}.

  • baseline_key (str) – Dictionary key for the computations dictionary that accesses what baseline the corrections should be compared to. This is typically observations.

  • vert_key (str) – Dictionary key for the computations dictionary that accesses the model to be plotted on the vertical axis.

  • horz_key (str) – Dictionary key for the computations dictionary that accesses the model to be plotted on the horizontal axis.

  • sites (list) – Site(s) to be compared in the plot, can have a size of 1. If multi is set to False and this is not changed to a single site, then the first value in the list will be chosen.

  • multi (boolean, optional) – Whether the plot uses data from multiple sites or a single site.

  • colors (list, optional) – Colors as strings to be plotted from. Plotting colors are different for each correction DataFrame, but the same across sites for a singular correction. An error will be thrown if there are more cor_keys than colors.

  • show_legend (boolean, optional) – Whether or not to display the legend, defaults as True.

bmorph.evaluation.plotting.calc_water_year(df: pandas.core.frame.DataFrame)[source]

Calculates the water year.

Parameters

df (pandas.DataFrame) – Flow timeseries with a DatetimeIndex.

Returns

A pandas.DataFrame index grouped by water year.

Return type

pandas.DataFrame.index
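
As a rough sketch of the water year concept itself (assuming the October–September convention commonly used for US streamflow; this is an illustration, not the package’s exact implementation), the grouping amounts to something like:

import pandas as pd

def water_year_sketch(index: pd.DatetimeIndex) -> pd.Index:
    # dates in October-December are assigned to the following water year
    return index.year + (index.month >= 10).astype(int)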

bmorph.evaluation.plotting.color_code_nxgraph(graph: networkx.Graph, measure: pandas.core.series.Series, cmap=<matplotlib.colors.LinearSegmentedColormap object>, vmin=None, vmax=None) dict[source]

Creates a dictionary mapping of nodes to color values.

Parameters
  • graph (networkx.graph) – Graph to be color coded

  • measure (pandas.Series) – Contains river segment ID’s as the index and desired measures as values.

  • cmap (matplotlib.colors.LinearSegmentedColormap, optional) – Colormap to be used for coloring the SimpleRiverNetwork plot. This defaults as ‘coolwarm_r’.

  • vmin (float, optional) – Minimum value for coloring

  • vmax (float, optional) – Maximum value for coloring

Returns

Dictionary of {i:color} where i is the index of the river segment.

Return type

dict

bmorph.evaluation.plotting.compare_CDF(flow_dataset: xarray.core.dataset.Dataset, plot_sites: list, raw_var: str, raw_name: str, ref_var: str, ref_name: str, bc_vars: list, bc_names: list, plot_colors: list, logit_scale=True, logarithm_base='10', units='Q [$m^3/s$]', markers=['o', 'x', '*', '*'], figsize=(20, 20), sharex=False, sharey=True, fontsize_title=40, fontsize_labels=30, fontsize_tick=20, markersize=1, alpha=0.3)[source]

Compare cumulative distribution functions on a logit scale.

Plots the CDFs of the raw, reference, and bias corrected flows.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw, reference, and bias corrected flows.

  • plot_sites (list) – Gauge sites to be plotted, contained in flow_dataset.

  • raw_var (str) – Accesses the raw flows in flow_dataset for flows before bias correction.

  • raw_name (str) – Label for the raw flows in the legend, corresponding to raw_var.

  • ref_var (str) – Accesses the reference flows in flow_dataset for true flows.

  • ref_name (str) – Label for the reference flows in the legend, corresponding to ref_var.

  • bc_vars (list) – Accesses the bias corrected flows in flow_dataset, where each element accesses its own bias corrected flows. Can be a size of 1.

  • bc_names (list) – Label for the bias corrected flows in the legend for each entry in bc_var, assumed to be in the same order. Can be a size of 1.

  • logit_scale (boolean, optional) – Whether to plot the vertical scale on a logit axis (True) or not (False). Defaults as True.

  • logarithm_base (str, optional) – The logarithmic base to use for the horizontal scale. Only the following are currently supported:

    ‘10’ to use a log10 horizontal scale (default)

    ‘e’ to use a natural log horizontal scale

  • units (str, optional) – The horizontal axis’s label for units, defaults as r’Q [$m^3/s$]’.

  • plot_colors (list, optional) – Colors to be plotted for the flows corresponding to raw_var, ref_var, and bc_var, defaulting as [‘grey’, ‘black’, ‘blue’, ‘red’], assuming there are two entries in bc_var.

  • markers (list, optional) – Markers to be plotted for the flows corresponding to raw_var, ref_var, and bc_var, defaulting as [‘o’, ‘x’, ‘*’, ‘*’], assuming there are two entries in bc_var.

  • figsize (tuple, optional) – Figure size following matplotlib notation, defaults as (20, 20).

  • sharex (boolean or str, optional) – Whether horizontal axis should be shared amongst subplots, defaulting as False.

  • sharey (boolean or str, optional) – Whether vertical axis should be shared amongst subplots, defaulting as True.

  • fontsize_title (int, optional) – Font size of the title, defaults as 40.

  • fontsize_labels (int, optional) – Font size of the labels, defaults as 30.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 20.

  • markersize (float, optional) – Size of the markers plotted, defaults as 1.

  • alpha (float, optional) – Transparency of the markers plotted, defaults as 0.3 to help see where markers clump together.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.compare_CDF_all(flow_dataset: xarray.core.dataset.Dataset, plot_sites: list, raw_var: str, raw_name: str, ref_var: str, ref_name: str, bc_vars: list, bc_names: list, plot_colors: list, logit_scale=True, logarithm_base='10', units='Q [$m^3/s$]', figsize=(20, 20), fontsize_title=40, fontsize_labels=40, fontsize_tick=40, markersize=1, alpha=0.3)[source]

Compare probability distribution functions as a summary statistic.

Plots the CDF’s of the raw, reference, and bias corrected flows with data from all sites in plot_sites combined for a summary statistic.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw (uncorrected), reference (true), and bias corrected flows.

  • plot_sites (list) – Gauge sites to be plotted.

  • raw_var (str) – Accesses the raw (uncorrected) flows in flow_dataset.

  • raw_name (str) – Label for the raw flows in the legend, corresponding to raw_var.

  • ref_var (str) – Accesses the reference (true) flows in flow_dataset.

  • ref_name (str) – Label for the reference flows in the legend, corresponding to ref_var.

  • bc_vars (list) – Accesses the bias corrected flows in flow_dataset. Can be a size of 1.

  • bc_names (list) – Label for the bias corrected flows in the legend, corresponding to bc_var. Can be a size of 1.

  • plot_colors (list, optional) – Colors to be plotted for the flows corresponding to raw_var, ref_var, and bc_var, defaulting as [‘grey’, ‘black’, ‘blue’, ‘red’], assuming there are two entries in bc_var.

  • logit_scale (boolean, optional) – Whether to plot the vertical scale on a logit axis (True) or not (False). Defaults as True.

  • logarithm_base (str, optional) – The logarithmic base to use for the horizontal scale. Only the following are currently supported:

    ‘10’ to use a log10 horizontal scale (default)

    ‘e’ to use a natural log horizontal scale

  • units (str, optional) – The horizontal axis’s label for units, defaults as r’Q [$m^3/s$]’.

  • figsize (tuple, optional) – Figure size following matplotlib conventions, defaults as (20, 20).

  • fontsize_title (int, optional) – Font size of the title, defaults as 40.

  • fontsize_labels (int, optional) – Font size of the labels, defaults as 40.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 40.

  • markersize (float, optional) – Size of the markers plotted, defaults as 1. Linewidth is half of this value.

  • alpha (float, optional) – Transparency of the lines and markers, defaults as 0.3.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.compare_PDF(flow_dataset: xarray.core.dataset.Dataset, gauge_sites: list, raw_var='raw_flow', ref_var='reference_flow', bc_var='bias_corrected_total_flow', raw_name='Mizuroute Raw', ref_name='NRNI Reference', bc_name='BMORPH BC', fontsize_title=40, fontsize_labels=30, fontsize_tick=20)[source]

Compare probability distribution functions.

Plots the PDF’s of the raw, reference, and bias corrected flows for each gauge site.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw, reference, and bias corrected flows.

  • gauge_sites (list) – Gauge sites to be plotted as used in the flow_dataset.

  • raw_var (str, optional) – The string to access the raw flows in flow_dataset for flows before bias correction. Defaults as ‘raw_flow’.

  • ref_var (str, optional) – The string to access the reference flows in flow_dataset for true flows. Defaults as ‘reference_flow’.

  • bc_var (str, optional) – The string to access the bias corrected flows in flow_dataset for flows after bias correction. Defaults as ‘bias_corrected_total_flow’.

  • raw_name (str, optional) – Label for the raw flows before bias correction. Defaults as ‘Mizuroute Raw’.

  • ref_name (str, optional) – Label for the reference flows. Defaults as ‘NRNI Reference’.

  • bc_name (str, optional) – Label for the bias corrected flows after bias correction. Defaults as ‘BMORPH BC’.

  • fontsize_title (int, optional) – Fontsize of the title. Defaults as 40.

  • fontsize_labels (int, optional) – Fontsize of the labels. Defaults as 30.

  • fontsize_tick (int, optional) – Fontsize of the ticks. Defaults as 20.

bmorph.evaluation.plotting.compare_correction_scatter(flow_dataset: xarray.core.dataset.Dataset, plot_sites: list, raw_var='raw', raw_name='Mizuroute Raw', ref_var='ref', ref_name='Reference', bc_vars=[], bc_names=[], plot_colors=['blue', 'purple', 'orange', 'red'], title='Absolute Error in Flow$(m^3/s)$', fontsize_title=80, fontsize_legend=68, alpha=0.05, fontsize_subplot=60, fontsize_tick=45, fontcolor='black', pos_cone_guide=False, neg_cone_guide=False, symmetry=True)[source]

Difference from reference flows before and after correction.

Plots differences between the raw and reference flows on the horizontal and differences between the bias corrected and reference on the vertical. This compares corrections needed before and after the bias correction method is applied.

Parameters
  • flow_dataset (xarray.Dataset) – contains raw, reference, and bias corrected flows.

  • plot_sites (list) – Sites to be plotted, expected as the seg coordinate in flow_dataset.

  • raw_var (str, optional) – The string to access the raw flows in flow_dataset, defaults as raw.

  • raw_name (str, optional) – Label for the raw flows in the legend, defaults as ‘Mizuroute Raw’.

  • ref_var (str, optional) – The string to access the reference flows in flow_dataset, defaults as ‘ref’.

  • ref_name (str, optional) – Label for the reference flows in the legend, defaults as ‘Reference’.

  • bc_vars (list) – The strings to access the bias corrected flows in flow_dataset.

  • bc_names (list) – Labels for the bias corrected flows in the legend, expected in the same order as bc_vars.

  • plot_colors (list, optional) – Colors to be plotted for each site in plot_sites. Defaults as [‘blue’, ‘purple’, ‘orange’, ‘red’].

  • fontsize_title (int, optional) – Fontsize of the title, defaults as 80.

  • fontsize_legend (int, optional) – Fontsize of the legend, defaults as 68.

  • fontsize_subplot (int, optional) – Fontsize of the subplots, defaults as 60.

  • fontsize_tick (int, optional) – Fontsize of the ticks, defaults as 45.

  • fontcolor (str, optional) – Color of the font, defaults as ‘black’.

  • pos_cone_guide (boolean, optional) – If True, plots a positive 1:1 line through the origin for reference.

  • neg_cone_guide (boolean, optional) – If True, plots a negative 1:1 line through the origin for reference.

  • symmetry (boolean, optional) – If True, the plot axis are symmetrical about the origin (default). If False, plotting limits will minimize empty space while not losing any data.

bmorph.evaluation.plotting.compare_mean_grouped_CPD(flow_dataset: xarray.core.dataset.Dataset, plot_sites: list, grouper_func, raw_var: str, raw_name: str, ref_var: str, ref_name: str, bc_vars: list, bc_names: list, plot_colors: list, subset_month=None, units='Mean Annual Flow [$m^3/s$]', figsize=(20, 20), sharex=False, sharey=False, pp_kws={'postype': 'cunnane'}, fontsize_title=80, fontsize_legend=68, fontsize_subplot=60, fontsize_tick=45, fontsize_labels=80, linestyles=['-', '-', '-'], markers=['.', '.', '.'], markersize=30, alpha=1, legend_bbox_to_anchor=(1, 1), fig=None, axes=None, start_ax_index=0, tot_plots=None)[source]
Cumulative Probability Distributions

Plots the CPDs of the raw, reference, and bias corrected flows on a probability axis.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw, reference, and bias corrected flows.

  • plot_sites (list) – A list of sites to be plotted.

  • grouper_func – Function used to group a pandas.DataFrame index so the mean of the grouped values can be calculated.

  • raw_var (str) – The string to access the raw flows in flow_dataset.

  • raw_name (str) – Label for the raw flows in the legend.

  • ref_var (str) – The string to access the reference flows in flow_dataset.

  • ref_name (str) – Label for the reference flows in the legend.

  • bc_vars (list) – List of strings to access the bias corrected flows in flow_dataset.

  • bc_names (list) – List of labels for the bias corrected flows in the legend.

  • plot_colors (list) – Contains the colors to be plotted for raw_var, ref_var, and bc_vars, respectively.

  • subset_month (int, optional) – The integer month to subset out for plotting (for example, enter 1 to subset out January). Defaults as None to avoid subsetting and use all the data in the year.

  • units (str, optional) – Vertical axis label for units, defaults as r'Mean Annual Flow [$m^3/s$]'.

  • pp_kws (dict, optional) – Plotting position computation as specified by https://matplotlib.org/mpl-probscale/tutorial/closer_look_at_plot_pos.html. Defaults as dict(postype='cunnane') for Cunnane plotting positions.

  • fontsize_title (int, optional) – Font size for the plot title, defaults as 80.

  • fontsize_legend (int, optional) – Font size for the plot legend, defaults as 68.

  • fontsize_subplot (int, optional) – Font size for the subplot text, defaults as 60.

  • fontsize_tick (int, optional) – Font size for the plot ticks, defaults as 45.

  • fontsize_labels (int, optional) – Font size for the horizontal and vertical axis labels, defaults as 80.

  • linestyles (list, optional) – Linestyles for plotting raw_var, ref_var, and bc_vars, respectively. Defaults as ['-', '-', '-'], expecting one of each.

  • markers (list, optional) – Markers for plotting raw_var, ref_var, and bc_vars, respectively. Defaults as ['.', '.', '.'], expecting one of each.

  • markersize (int, optional) – Size of the markers for plotting, defaults as 30.

  • alpha (float, optional) – Alpha transparency value for plotting, where 1 is opaque and 0 is transparent.

  • legend_bbox_to_anchor (tuple, optional) – Box that is used to position the legend relative to the final axes. Defaults as (1, 1). Modify this if the legend does not plot where you desire it to be.

  • fig (matplotlib.figure, optional) – matplotlib figure object to plot on, defaulting as None and creating a new object unless otherwise specified.

  • axes (matplotlib.axes, optional) – Array-like of matplotlib axes objects to plot multiple plots on, defaulting as None and creating a new object unless otherwise specified.

  • start_ax_index (int, optional) – If the plots should not start at the first ax in axes, specify the index that plotting should begin on. Defaults as None, assuming plotting should begin from the first ax.

  • tot_plots (int, optional) – If more plots will be made than the data provided here covers, specify how many total plots there should be. Defaults as None.
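
A minimal usage sketch (the dataset flow_ds, the site names, and the bias corrected variable name 'scbc_flow' are hypothetical; 'IRFroutedRunoff' and 'upstream_ref_flow' follow the defaults documented for plot_reduced_flows below):

import xarray as xr
from bmorph.evaluation import plotting

flow_ds = xr.open_dataset('flows.nc')      # hypothetical dataset holding the variables below
plotting.compare_mean_grouped_CPD(
    flow_dataset=flow_ds,
    plot_sites=['site_A', 'site_B'],
    grouper_func=lambda ts: ts.year,        # assumed: groups the DataFrame index by year
    raw_var='IRFroutedRunoff', raw_name='Raw',
    ref_var='upstream_ref_flow', ref_name='Reference',
    bc_vars=['scbc_flow'], bc_names=['Bias corrected'],
    plot_colors=['grey', 'black', 'red'],
)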

bmorph.evaluation.plotting.correction_scatter(site_dict: dict, raw_flow: pandas.core.frame.DataFrame, ref_flow: pandas.core.frame.DataFrame, bc_flow: pandas.core.frame.DataFrame, colors: list, title='Flow Residuals', fontsize_title=80, fontsize_legend=68, fontsize_subplot=60, fontsize_tick=45, fontcolor='black', pos_cone_guide=False, neg_cone_guide=False)[source]

Difference from reference flows before and after correction.

Plots differences between the raw and reference flows on the horizontal axis and differences between the bias corrected and reference flows on the vertical axis. This compares the corrections needed before and after the bias correction method is applied.

Parameters
  • site_dict (dict) – Expects {subgroup name: list of segments in subgroup}, describing how sites are to be separated.

  • raw_flow (pandas.DataFrame) – Contains flows before correction.

  • ref_flow (pandas.DataFrame) – Contains the reference flows to compare raw_flow and bc_flow.

  • bc_flow (pandas.DataFrame) – Contains flows after correction.

  • colors (list) – Colors to be plotted for each site in site_dict.

  • title (str, optional) – Title label for the plot, defaults as ‘Flow Residuals’.

  • fontsize_title (int, optional) – Fontsize of the title, defaults as 80.

  • fontsize_legend (int, optional) – Fontsize of the legend, defaults as 68.

  • fontsize_subplot (int, optional) – Fontsize of the subplots, defaults as 60.

  • fontsize_tick (int, optional) – Fontsize of the ticks, defaults as 45.

  • fontcolor (str, optional) – Color of the font, defaults as ‘black’.

  • pos_cone_guide (boolean, optional) – If True, plots a positive 1:1 line through the origin for reference.

  • neg_cone_guide (boolean, optional) – If True, plots a negative 1:1 line through the origin for reference.
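
A minimal sketch, assuming raw_df, ref_df, and bc_df are hypothetical pandas.DataFrames with a time index and one column per site (the subgroup names and segment labels are also hypothetical):

from bmorph.evaluation import plotting

site_dict = {'upper basin': ['seg_101', 'seg_102'], 'lower basin': ['seg_201']}
plotting.correction_scatter(
    site_dict=site_dict,
    raw_flow=raw_df, ref_flow=ref_df, bc_flow=bc_df,
    colors=['maroon', 'navy'],
    pos_cone_guide=True, neg_cone_guide=True,   # draw 1:1 reference lines through the origin
)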

bmorph.evaluation.plotting.create_adj_mat(topo: xarray.core.dataset.Dataset) numpy.ndarray[source]

Forms the adjacency matrix for the graph of the topography.

Note that this is independent of whatever the segments are called; it is purely a map of the relative object locations.

Parameters

topo (xarray.Dataset) – Describes the topography of the river network.

Returns

An adjacency matrix describing the river network.

Return type

numpy.ndarray

bmorph.evaluation.plotting.create_nxgraph(adj_mat: numpy.ndarray) networkx.classes.graph.Graph[source]

Creates a NetworkX Graph object given an adjacency matrix.

Parameters

adj_mat (numpy.ndarray) – Adjacency matrix describing the river network.

Returns

NetworkX Graph of respective nodes.

Return type

networkx.graph

bmorph.evaluation.plotting.custom_legend(names: List, colors=['Maroon', 'Navy', 'Black', 'Orange', 'Yellow', 'Blue', 'Grey', 'Pink', 'Lavender'])[source]

Creates a list of patches to be passed in as handles for the plt.legend function.

Parameters
  • names (list) – Legend names.

  • colors (list) – A list of the colors corresponding to names.

Returns

Handle parameter for matplotlib.legend.

Return type

handles
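
For example, the returned handles can be passed directly to matplotlib (the color and label names here are illustrative):

import matplotlib.pyplot as plt
from bmorph.evaluation import plotting

handles = plotting.custom_legend(names=['Raw', 'Reference', 'Bias corrected'],
                                 colors=['grey', 'black', 'red'])
plt.legend(handles=handles)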

bmorph.evaluation.plotting.determine_row_col(n: int, pref_rows=True)[source]

Determines rows and columns for rectangular subplots

Calculates a rectangular subplot layout that contains at least n subplots, some may need to be turned off in plotting. If a square configuration is possible, then a square configuration will be proposed. This helps automate the process of plotting a variable number of subplots.

Parameters
  • n (int) – Total number of plots.

  • pref_rows (boolean) – If True, and only a rectangular arrangement is possible, then the longer dimension is put in n_rows. If False, it is placed in n_columns.

Returns

  • n_rows (int) – Number of rows for matplotlib.subplot.

  • n_columns (int) – Number of columns for matplotlib.subplot.
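
A small sketch of how determine_row_col is typically paired with matplotlib subplots:

import matplotlib.pyplot as plt
from bmorph.evaluation import plotting

n_rows, n_cols = plotting.determine_row_col(7)               # layout with at least 7 subplots
fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 8))
for ax in axes.ravel()[7:]:
    ax.set_visible(False)                                    # turn off any unused axes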

bmorph.evaluation.plotting.diff_maxflow_plotter(observed: pandas.core.frame.DataFrame, names: list, colors: list, *models: pandas.core.frame.DataFrame)[source]

Plots box plots of numerous models grouped by site.

Parameters
  • observed (pandas.DataFrame) – A dataframe containing observations.

  • names (list) – List of the model names.

  • colors (list) – List of colors to be plotted.

  • *models (List[pandas.DataFrame]) – Any number of pandas.DataFrame objects to be evaluated.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.diff_maxflow_sites(observed: pandas.core.frame.DataFrame, predicted: pandas.core.frame.DataFrame)[source]

Calculates difference in maximum flows on a hydrologic year and site-by-site basis.

Parameters
  • observed (pandas.DataFrame) – Dataframe containing all observations.

  • predicted (pandas.DataFrame) – Dataframe containing all predictions.

Returns

DataFrame containing the difference in maximum flows.

Return type

pandas.DataFrame
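
A minimal sketch, assuming obs_df and sim_df are hypothetical pandas.DataFrames of observed and simulated flows with a DateTime index and one column per site:

from bmorph.evaluation import plotting

max_diff = plotting.diff_maxflow_sites(observed=obs_df, predicted=sim_df)
fig, axes = plotting.diff_maxflow_plotter(obs_df, ['Model A'], ['navy'], sim_df)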

bmorph.evaluation.plotting.draw_dataset(topo: xarray.core.dataset.Dataset, color_measure: pandas.core.series.Series, cmap=<matplotlib.colors.LinearSegmentedColormap object>)[source]

Plots the river network through networkx.

Draws a networkx graph from a topological xarray.Dataset and color codes it based on a pandas.Series.

Parameters
  • topo (xarray.Dataset) – Contains river segment identifications and relationships.

  • color_measure (pandas.Series) – Indices correspond to the segments in topo. Typically this contains statistical information about the flows that will be color coded from least to greatest value.

  • cmap (matplotlib.colors.LinearSegmentedColormap, optional) – Colormap to be used for coloring the SimpleRiverNetwork plot. This defaults as ‘coolwarm_r’.
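
These graph helpers are typically chained together; a minimal sketch, assuming topology.nc is a hypothetical mizuRoute topology file and stat_series is a hypothetical pandas.Series with one value per segment:

import xarray as xr
from bmorph.evaluation import plotting

topo = xr.open_dataset('topology.nc')           # hypothetical topology file
adj_mat = plotting.create_adj_mat(topo)         # adjacency matrix of the river network
graph = plotting.create_nxgraph(adj_mat)        # NetworkX graph of the segments
positions = plotting.organize_nxgraph(graph)    # hierarchical 'dot' layout positions
plotting.draw_dataset(topo, color_measure=stat_series)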

bmorph.evaluation.plotting.find_all_upstream(topo: xarray.core.dataset.Dataset, segID: int, return_segs: list = []) numpy.ndarray[source]

Finds all upstream river segments for a given river segment from the xarray.Dataset.

Parameters
  • topo (xarray.Dataset)

  • segID (int)

  • return_segs (list)

Returns

Return type

numpy.ndarray

bmorph.evaluation.plotting.find_index_water_year(data: pandas.core.frame.DataFrame) int[source]

Finds the index of the first hydrologic year.

Parameters

data (pd.DataFrame) – Flow timeseries with a DateTime index.

Returns

Index of the first hydrologic year.

Return type

int

bmorph.evaluation.plotting.find_upstream(topo: xarray.core.dataset.Dataset, segID: int, return_segs: list = [])[source]

Finds which river segment is directly upstream of a given segment in the xarray.Dataset.

Parameters
  • topo (xarray.Dataset) – Contains river network topography. Each river segment’s immediate downstream river segment is expected to be designated by ‘Tosegment’.

  • segID (int) – Current river segment identification number.

  • return_segs (list) – River segment identification numbers upstream from segID. This defaults as an empty list to be filled by the method.

bmorph.evaluation.plotting.kl_divergence_annual_compare(flow_dataset: xarray.core.dataset.Dataset, sites: list, raw_var: str, raw_name: str, ref_var: str, ref_name: str, bc_vars: list, bc_names: list, plot_colors: list, title='Annual KL Diveregence Before/After Bias Correction', fontsize_title=40, fontsize_tick=30, fontsize_labels=40, fontsize_legend=30, showfliers=False, sharex=True, sharey='row', TINY_VAL=1e-06, figsize=(30, 20), show_y_grid=True)[source]

Kullback-Leibler divergence compared before and after bias correction as boxplots.

Plots the KL divergence for each year per site as KL(P_{ref} || P_{raw}) and KL( P_{ref} || P_{bc}).

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw (uncorrected), reference (true), and bias corrected flows.

  • sites (list) – Contains all the sites to be plotted as included in flow_dataset, (note that if the number of sites to be plotted is square or rectangular, the last site will not be plotted to save room for the legend).

  • raw_var (str) – Accesses the raw flows in flow_dataset.

  • raw_name (str) – Label for the raw flows in the legend and horizontal labels corresponding to raw_var.

  • ref_var (str) – Accesses the reference flows in flow_dataset.

  • ref_name (str) – Label the reference flows in the legend and horizontal labels corresponding to ref_var.

  • bc_vars (list) – String(s) to access the bias corrected flows in flow_dataset.

  • bc_names (list) – Label(s) for the bias corrected flows in the legend and horizontal labels, corresponding to each element in bc_vars.

  • plot_colors (list) – Colors to be plotted for the raw and the bias corrected flows, respectively.

  • title (str, optional) – Title to be plotted, defaults as “Annual KL Diveregence Before/After Bias Correction”.

  • fontsize_title (int, optional) – Font size of the title, defaults as 40.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 30.

  • fontsize_labels (int, optional) – Font size of the labels, defaults as 40.

  • fontsize_legend (int, optional) – Font size of the legend text, defaults as 30.

  • showfliers (boolean, optional) – Whether to include fliers in the boxplots, defaults as False.

  • sharex (boolean or str, optional) – Whether the horizontal axis is shared, defaults as True.

  • sharey (boolean or str, optional) – Whether the vertical axis is shared, defaults as ‘row’ to share the vertical axis in the same row.

  • TINY_VAL (float, optional) – Used to ensure there are no zero values in the data because zero values cause unusual behavior in calculating the KL divergence. Defaults as 1E-6.

  • figsize (tuple, optional) – Figure size following matplotlib conventions, defaults as (30, 20).

  • show_y_grid (boolean, optional) – Whether to plot y grid lines, defaults as True.

Returns

Return type

matplotlib.figure, matplotlib.axes
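
A minimal usage sketch (flow_ds, the site names, and the 'scbc_flow' variable are hypothetical):

from bmorph.evaluation import plotting

fig, axes = plotting.kl_divergence_annual_compare(
    flow_dataset=flow_ds,
    sites=['site_A', 'site_B', 'site_C'],
    raw_var='IRFroutedRunoff', raw_name='Raw',
    ref_var='upstream_ref_flow', ref_name='Reference',
    bc_vars=['scbc_flow'], bc_names=['Bias corrected'],
    plot_colors=['grey', 'red'],
)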

bmorph.evaluation.plotting.log10_1p(x: numpy.ndarray)[source]

Return the log10 of one plus the input array, element-wise.

Parameters

x (numpy.ndarray) – An array of values greater than -1. If values are less than or equal to -1, then a domain error will occur in computing the logarithm.

Returns

y – Array of the values having log10(element + 1) computed.

Return type

numpy.ndarray
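
For example:

import numpy as np
from bmorph.evaluation import plotting

flows = np.array([0.0, 9.0, 99.0])
plotting.log10_1p(flows)    # array([0., 1., 2.])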

bmorph.evaluation.plotting.norm_change_annual_flow(sites: list, before_bc: pandas.core.frame.DataFrame, after_bc: pandas.core.frame.DataFrame, colors=<class 'list'>, fontsize_title=60, fontsize_labels=40, fontsize_tick=30)[source]

Normalized change in annual flow volume.

Plots a series of subplots containing bar charts that depict the difference in normalized annual flow volume due to bias correction.

Parameters
  • sites (list) – String names of all the sites to be plotted, matching sites contained in the DataFrames before_bc and after_bc.

  • before_bc (pandas.DataFrame) – Contains flows, (not aggregated), before bias correction is applied.

  • after_bc (pandas.DataFrame) – Contains flows, (not aggregated), after bias correction is applied.

  • colors (list) – Colors to be used for each site’s subplot in the same order as sites, (does not have to be unique).

  • fontsize_title (int, optional) – Font size of the title. Defaults as 60.

  • fontsize_labels (int, optional) – Font size of the labels. Defaults as 40.

  • fontsize_tick (int, optional) – Font size of the ticks. Defaults as 30.

bmorph.evaluation.plotting.organize_nxgraph(topo: networkx.classes.graph.Graph)[source]

Orders the node positions into a hierarchical structure.

Based on the “dot” layout and given topography.

Parameters

topo (networkx.Graph) – Graph of the river network containing river segment identifications and relationships.

Returns

Return type

networkx.positions

bmorph.evaluation.plotting.pbias_compare_hist(sites: list, raw_flow: pandas.core.frame.DataFrame, ref_flow: pandas.core.frame.DataFrame, bc_flow: pandas.core.frame.DataFrame, grouper=TimeGrouper(freq=<YearEnd: month=12>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), total_bins=None, title_freq='Yearly', fontsize_title=90, fontsize_subplot_title=60, fontsize_tick=40, fontsize_labels=84, x_extreme=150)[source]

Histograms comparing percent bias before/after bias correction.

Creates a number of histogram subplots by each given sites that plot percent bias both before and after bias correction.

Parameters
  • sites (list) – Sites corresponding to the columns of the flow DataFrames, raw_flow, ref_flow, and bc_flow, to be plotted.

  • raw_flow (pandas.DataFrame) – Flows before bias correction.

  • ref_flow (pandas.DataFrame) – Reference flows for comparison as true values.

  • bc_flow (pandas.DataFrame) – Flows after bias correction.

  • grouper (pandas.TimeGrouper) – How flows should be grouped for bias correction, defaults as yearly.

  • total_bins (int, optional) – Number of bins to use in the histogram plots. If none is specified, defaults to the floored square root of the number of pbias difference values.

  • title_freq (str, optional) – An adjective description of the frequency with which the flows are grouped corresponding to grouper. Defaults as ‘Yearly’.

  • fontsize_title (int, optional) – Font size of the title, defaulting as 90.

  • fontsize_subplot_title (int, optional) – Font size of the subplot title, defaulting as 60.

  • fontsize_tick (int, optional) – Font size of the ticks, defaulting as 40.

  • fontsize_labels (int, optional) – Font size of the labels, defaulting as 84.

  • x_extreme (float, optional) – Greatest magnitude on the horizontal axis to specify the range, defaulting as 150, which results in a range of (-150, 150). This is useful if you want to zoom in closer to the origin and exclude outlying percent biases.

bmorph.evaluation.plotting.pbias_diff_hist(sites: list, colors: list, raw_flow: pandas.core.frame.DataFrame, ref_flow: pandas.core.frame.DataFrame, bc_flow: pandas.core.frame.DataFrame, grouper=TimeGrouper(freq=<MonthEnd>, axis=0, sort=True, closed='right', label='right', how='mean', convention='e', origin='start_day'), total_bins=None, title_freq='Monthly', fontsize_title=90, fontsize_subplot_title=60, fontsize_tick=40, fontsize_labels=84)[source]

Histograms of differences in percent bias before/after bias correction.

Creates a number of histogram subplots by each given site that plot the difference in percent bias before and after bias correction.

Parameters
  • sites (list) – Sites that are the columns of the flow DataFrames, raw_flow, ref_flow, and bc_flow.

  • colors (list) – Colors to plot the sites with (they do not have to be unique), used in the same order as the list of sites.

  • raw_flow (pandas.DataFrame) – Contains flows before bias correction.

  • ref_flow (pandas.DataFrame) – Contains reference flows for comparison as true values.

  • bc_flow (pandas.DataFrame) – Contains flows after bias correction.

  • grouper (pandas.TimeGrouper, optional) – How flows should be grouped for bias correction. This defaults as monthly.

  • total_bins (int, optional) – Number of bins to use in the histogram plots. If none specified, defaults to the floored square root of the number of pbias difference values.

  • title_freq (str, optional) – An adjective description of the frequency with which the flows are grouped, should align with grouper, although there is no check to verify this. This defaults as ‘Monthly’.

  • fontsize_title (int, optional) – Font size of the title. Defaults as 90.

  • fontsize_subplot_title (int, optional) – Font size of the subplots. Defaults as 60.

  • fontsize_tick (int, optional) – Font size of the ticks. Defaults as 40.

  • fontsize_labels (int, optional) – Font size of the labels. Defaults as 84.

bmorph.evaluation.plotting.pbias_plotter(observed: pandas.core.frame.DataFrame, names: list, colors: list, *models: pandas.core.frame.DataFrame)[source]

Plots box plots of numerous models grouped by site.

Parameters
  • observed (pandas.DataFrame) – Dataframe containing observations.

  • names (list) – List of the model names.

  • colors (list) – List of colors to be plotted.

  • *models (List[pandas.DataFrame]) – Any number of pandas.DataFrame objects to be evaluated.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.pbias_sites(observed: pandas.core.frame.DataFrame, predicted: pandas.core.frame.DataFrame)[source]

Calculates percent bias on a hydrologic year and site-by-site basis.

Parameters
  • observed (pandas.DataFrame) – Dataframe containing all observations.

  • predicted (pandas.DataFrame) – Dataframe containing all predictions.

Returns

DataFrame containing the computed percent bias.

Return type

pandas.DataFrame
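
A minimal sketch, assuming obs_df and sim_df are hypothetical pandas.DataFrames with a DateTime index and one column per site:

from bmorph.evaluation import plotting

pbias = plotting.pbias_sites(observed=obs_df, predicted=sim_df)
fig, axes = plotting.pbias_plotter(obs_df, ['Model A'], ['navy'], sim_df)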

bmorph.evaluation.plotting.plot_reduced_flows(flow_dataset: xarray.core.dataset.Dataset, plot_sites: list, reduce_func=<function mean>, interval='day', statistic_label='Mean', units_label='$(m^3/s)$', title_label='Annual Mean Flows', raw_var='IRFroutedRunoff', raw_name='Mizuroute Raw', ref_var='upstream_ref_flow', ref_name='upstream_ref_flow', bc_vars=[], bc_names=[], fontsize_title=24, fontsize_legend=20, fontsize_subplot=20, fontsize_tick=20, fontcolor='black', figsize_width=20, figsize_height=12, plot_colors=['grey', 'black', 'blue', 'red'], return_reduced_flows=False)[source]

Creates a series of subplots plotting statistical day of year flows per gauge site.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw, reference, and bias corrected flows.

  • plot_sites (list) – Sites to be plotted.

  • reduce_func (function, optional) – A function to apply to flows grouped by interval, defaults as np.mean.

  • interval (str, optional) – The time interval that reduce_func is computed over each year. Currently supported are ‘day’ for day of year (default), ‘week’ for week of year, and ‘month’ for monthly.

  • statistic_label (str, optional) – Label for the statistic representing the reduce_func, defaults as ‘Mean’ to fit reduce_func as np.mean.

  • units_label (str, optional) – Label for the units of flow, defaults as r`$(m^3/s)$`.

  • title_label (str) – Label for the figure title representing the reduce_func, defaults as ‘Annual Mean Flows’ to fit reduce_func as np.mean.

  • raw_var (str, optional) – The string to access the raw flows in flow_dataset, defaults as ‘IRFroutedRunoff’.

  • raw_name (str, optional) – Label for the raw flows in the legend, defaults as ‘Mizuroute Raw’.

  • ref_var (str, optional) – The string to access the reference flows in flow_dataset, defaults as ‘upstream_ref_flow’.

  • ref_name (str, optional) – Label for the reference flows in the legend, defaults as ‘upstream_ref_flow’.

  • bc_vars (list) – The strings to access the bias corrected flows in flow_dataset.

  • bc_names (list) – Labels for the bias corrected flows in the legend, expected in the same order as bc_vars.

  • plot_colors (list, optional) – Colors to be plotted for raw_var, ref_var, bc_vars respectively. Defaults as [‘grey’, ‘black’, ‘blue’, ‘red’].

  • return_reduced_flows (boolean, optional) – If True, returns the reduced flows as calculated for plotting, defaults as False. This is typically used for debugging purposes.

  • fontsize_title (int, optional) – Font size of the plot title, defaults as 24.

  • fontsize_legend (int, optional) – Font size of the plot legend, defaults as 20.

  • fontsize_subplot (int, optional) – Font size for the subplots, defaults as 20.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 20.

  • fontcolor (str, optional) – Color of the font, defaults as ‘black’.

  • figsize_width (int, optional) – Width of the figure, defaults as 20.

  • figsize_height (int, optional) – Height of the figure, defaults as 12.

Returns

If return_reduced_flows is False, matplotlib.figure and matplotlib.axes, otherwise the reduced_flows are returned as the xarray.Dataset.

Return type

xarray.Dataset or (matplotlib.figure, matplotlib.axes)
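
A minimal sketch computing median instead of mean flows (flow_ds and 'scbc_flow' are hypothetical; the raw and reference variables use the documented defaults):

import numpy as np
from bmorph.evaluation import plotting

fig, axes = plotting.plot_reduced_flows(
    flow_dataset=flow_ds,
    plot_sites=['site_A', 'site_B'],
    reduce_func=np.median, interval='week',
    statistic_label='Median', title_label='Annual Median Flows',
    bc_vars=['scbc_flow'], bc_names=['Bias corrected'],
)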

bmorph.evaluation.plotting.plot_residual_overlay(flows: pandas.core.frame.DataFrame, upstream_sites: list, downstream_site: str, start_year: int, end_year: int, ax=None, fontsize_title=40, fontsize_labels=60, fontsize_tick=30, linecolor='k', alpha=0.3)[source]

Plots flow upstream/downstream residuals overlayed across one year span.

Plots residuals from each hydrologic year on top of each other with a reference line at zero flow. Residuals are calculated as downstream flows - sum(upstream flows).

Parameters
  • flows (pandas.DataFrame) – All flows to be used in plotting.

  • upstream_sites (list) – Strings of the site names stored in flows to aggregate.

  • downstream_site (str) – Name of the downstream site stored in flows that will have the upstream_sites subtracted from it.

  • start_year (int) – The starting year to plot.

  • end_year (int) – The year to conclude on.

  • ax (matplotlib.axes, optional) – Axes to plot on. If none is specified, a new one is created.

  • fontsize_title (int, optional) – Font size of the title.

  • fontsize_labels (int, optional) – Font size of the labels.

  • fontsize_tick (int, optional) – Font size of the ticks.

  • linecolor (str, optional) – Color of the lines plotted. Defaults as ‘k’ for black.

  • alpha (float, optional) – Transparency of the lines plotted. Defaults as 0.3 to help see how residuals line up across many years.

Returns

Return type

matplotlib.axes

bmorph.evaluation.plotting.plot_spearman_rank_difference(flow_dataset: xarray.core.dataset.Dataset, gauge_sites: list, start_year: str, end_year: str, relative_locations_triu: pandas.core.frame.DataFrame, basin_map_png, cmap=<matplotlib.colors.LinearSegmentedColormap object>, blank_plot_color='w', fontcolor='black', fontsize_title=60, fontsize_tick=30, fontsize_label=45)[source]

Creates a site-to-site rank correlation difference comparison plot with a map of the basin.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw as ‘raw_flow’ and bias corrected as ‘bias_corrected_total_flow’ flow values, times of the flow values, and the names of the sites where those flow values exist

  • gauge_sites (list) – Gauge sites to be plotted.

  • start_year (str) – String formatted as ‘yyyy-mm-dd’ to start rank correlation window.

  • end_year (str) – String formatted as ‘yyyy-mm-dd’ to end rank correlation window.

  • relative_locations_triu (pandas.DataFrame) – Denotes which sites are connected with a ‘1’ and has the lower triangle set to ‘0’.

  • basin_map_png (png file) – The basin map with site values marked.

  • cmap (matplotlib.colors.LinearSegmentedColormap, optional) – Colormap to be used for coloring the SimpleRiverNetwork plot. This defaults as ‘coolwarm_r’.

  • blank_plot_color (str, optional) – Color used for the lower extremes of cmap to keep extreme values from skewing the color map and hiding values. This is defaulted as ‘w’ for white; it should match the background plot color so that it appears as if no value is plotted there.

  • fontcolor (str, optional) – Color of the font, defaulted as ‘black’.

  • fontsize_title (int, optional) – Font size of the title, defaults as 60.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 30.

  • fontsize_label (int, optional) – Font size of the labels, defaults as 45.

bmorph.evaluation.plotting.rmseFracPlot(data_dict: dict, obs_key: str, sim_keys: list, sites=[], multi=True, colors=['Maroon', 'Navy', 'Black', 'Orange', 'Yellow', 'Blue', 'Grey', 'Pink', 'Lavender'])[source]

Root mean square values calculated by including descending values one-by-one.

Parameters
  • data_dict (dict) – Expecting {“Data Name”: data pandas.DataFrame}.

  • obs_key (str) – Dictionary key for the computations dictionary that accesses the observations to be used as true in calculating root mean squares.

  • sim_keys (list) – Dictionary keys accessing the simulated DataFrames in computations, used in predictions in calculating root mean squares.

  • sites (list) – Site(s) to be compared in the plot, can have a size of 1. If multi is set to False and this is not changed to a single site, then the first value in the list will be chosen.

  • multi (boolean, optional) – Whether the plot uses data from multiple sites or a single site.

  • colors (list, optional) – Colors as strings to be plotted from. Plotting colors are different for each correction DataFrame, but the same across sites for a single correction. An error will be thrown if there are more sim_keys than colors.

bmorph.evaluation.plotting.scatter_series_axes(data_x, data_y, label: str, color: str, alpha: float, ax=None) matplotlib.pyplot.axes[source]

Creates a scatter axis for plotting.

Parameters
  • data_x (array-like) – Data for the x series.

  • data_y (array-like) – Data for the y series.

  • label (str) – Name for the axes.

  • color (str) – Color for the markers.

  • alpha (float) – Transparency for the markers.

Returns

Return type

matplotlib.axes

bmorph.evaluation.plotting.site_diff_scatter(predictions: dict, raw_key: str, model_keys: list, compare: dict, compare_key: str, site: str, colors=['Maroon', 'Navy', 'Black', 'Orange', 'Yellow', 'Blue', 'Grey', 'Pink', 'Lavender'])[source]

Creates a scatter plot of Raw-BC versus some measure.

Parameters
  • predictions (dict) – Expects {‘Prediction Names’ : Prediction pandas.DataFrame}. ‘Prediction Names’ will be printed in the legend.

  • raw_key (str) – The key for the predictions dictionary that directs to the raw data that each model will be subtracting.

  • model_keys (list) – A list of dictionary keys pertaining to the correction models that are to be plotted.

  • compare (dict) – Expecting {‘Measure name’ : measure pandas.DataFrame}. These are what is being plotted against on the horizontal-axis.

  • compare_key (str) – The dictionary key for the measure desired in the compare dictionary. ‘compare_key’ will be printed on the horizontal axis.

  • site (str) – A single site designation to be examined in the plot. This will be listed as the title of the plot.

  • colors (List[str], optional) – Colors as strings to be plotted from.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.spearman_diff_boxplots_annual(raw_flows: pandas.core.frame.DataFrame, bc_flows: pandas.core.frame.DataFrame, site_pairings, fontsize_title=40, fontsize_tick=30, fontsize_labels=40, subtitle=None, median_plot_color='red')[source]

Annual difference in spearman rank as boxplots.

Creates box plots for each site pairing, determining the difference in spearman rank for each year between the raw and bias corrected data.

Parameters
  • raw_flows (pandas.DataFrame) – Raw flows before bias correction with sites in the columns and time in the index.

  • bc_flows (pandas.DataFrame) – Bias corrected flows with sites in the columns and time in the index.

  • site_pairings (List[List[str]]) – List of list of string site pairs e.g. [[‘downstream_name’,’upstream_name’],…]. This is used to organize which sites should be paired together for computing the spearman rank difference.

  • fontsize_title (int, optional) – Font size of the title, defaults as 40.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 30.

  • fontsize_labels (int, optional) – Font size of the labels, defaults as 40.

  • subtitle (str, optional) – Subtitle to include after “Annual Chnage in Speraman Rank: ”. If no subtitle is specified, none is included and only the title is plotted.

  • median_plot_color (str) – Color to plot the boxplot’s median as, defaults as ‘red’.

bmorph.evaluation.plotting.spearman_diff_boxplots_annual_compare(flow_dataset: xarray.core.dataset.Dataset, site_pairings, raw_var: str, bc_vars: list, bc_names: list, plot_colors: list, showfliers=True, fontsize_title=40, fontsize_tick=25, fontsize_labels=30, figsize=(20, 20), sharey='row')[source]

Annual difference in spearman rank as boxplots.

Creates box plots for each site pairing determining the difference in spearman rank for each year between the raw and the bias corrected data.

Parameters
  • flow_dataset (xarray.Dataset) – Contains raw (uncorrected), reference (true), and bias corrected flows.

  • site_pairings (List[List[str]]) – List of list of string site pairs e.g. [[‘downstream_name’,’upstream_name’],…]. This is used to organize which sites should be paired together for computing the spearman rank difference.

  • raw_var (str) – Accesses the raw (uncorrected) flows in flow_dataset.

  • bc_vars (list) – Strings to access the bias corrected flows in flow_dataset.

  • bc_names (list) – Labels for the bias corrected flows from flow_dataset, corresponding to each element in bc_vars in the same order.

  • plot_colors (list) – Colors that are in the same order as the bc_vars and bc_names to be used in plotting.

  • showfliers (boolean, optional) – Whether to show fliers on the boxplots, defaults as True.

  • fontsize_title (int, optional) – Font size of the title, defaults as 40.

  • fontsize_tick (int, optional) – Font size of the ticks, defaults as 25.

  • fontsize_labels (int, optional) – Font size of the labels, defaults as 30.

  • figsize (tuple) – Figure size following matplotlib conventions, defaults as (20, 20).

  • sharey (boolean or str, optional) – Whether or how the vertical axes are to be shared, defaults as ‘row’ to have vertical axes in the same row shared.

Returns

Return type

matplotlib.figure, matplotlib.axes

bmorph.evaluation.plotting.stat_corrections_scatter2D(computations: dict, baseline_key: str, cor_keys: list, uncor_key: str, sites=[], multi=True, colors=['Maroon', 'Navy', 'Black', 'Orange', 'Yellow', 'Blue', 'Grey', 'Pink', 'Lavender'])[source]

Creates a scatter plot of the flow before/after corrections relative to observations.

Parameters
  • computations (dict) – Expecting {“Correction Name”: correction pandas.DataFrame}.

  • baseline_key (str) – Contains the dictionary key for the computations dictionary that accesses what baseline the corrections should be compared to. This is typically observations.

  • cor_keys (list) – Dictionary keys accessing the correction DataFrames in computations. These will be printed in the legend.

  • uncor_key (str) – The dictionary key that accesses the uncorrected data in computations.

  • sites (list) – Site(s) to be compared in the plot, can have a size of 1. If multi is set to False and this is not changed to a single site, then the first value in the list will be chosen.

  • multi (boolean, optional) – Determines whether the plot uses data from multiple sites or a single site, defaults as True.

  • colors (List[str], optional) – Colors as strings to be plotted from. Plotting colors are different for each correction DataFrame, but the same across sites for a single correction. An error will be thrown if there are more cor_keys than colors.

Returns

Return type

matplotlib.figure, matplotlib.axes

Simple River Network
class bmorph.evaluation.simple_river_network.SegNode(seg_id, pfaf_code)[source]

River segment node used in SimpleRiverNetwork.

Creates a node of a segment to be used in the simple river network.

Variables
  • pfaf_code (int) – Pfafstetter code for this river segment.

  • seg_id (int) – Identification for this river segment.

  • upstream (List[SegNode]) – Containing what is directly upstream from this SegNode.

  • basin_area (float) – Summative Basin Area for this seg.

  • _end_marker (boolean) – TRUE if this SegNode marks the end of an interbasin during simple_river_network.encode_pfaf. This variable is only used during the encoding of Pfafstetter codes for the SimpleRiverNetwork and is set to FALSE when not in use. Changing this variable could interfere with SimpleRiverNetwork operations.

  • encoded (boolean) – TRUE if this SegNode has been fully given a unique pfaf_code within the SimpleRiverNetwork, otherwise it is FALSE.

class bmorph.evaluation.simple_river_network.SimpleRiverNetwork(topo: xarray.core.dataset.Dataset, pfaf_seed=<class 'int'>, outlet_index=0, max_pfaf_level=42)[source]

Pseudo-physical visualization of watershed models.

The SimpleRiverNetwork maps nodes within a given topography to visualize their arrangements and simplify different parts of the network so statistics propagating through the network can be tracked. This tree network has the outlet as its root and parses upstream for all operations, opposite of the direction of flow.

Variables
  • topo (xarray.Dataset) – Dataset describing the topography of the river network.

  • seg_id_values (list) – All seg_id’s being used in the network that identify river segments in the watershed.

  • outlet (SegNode) – The end of the river network and start of the SimpleRiverNetwork, aka “root” of the network.

  • adj_mat (numpy.array) – A square adjacency matrix size N, where N is the length of seg_id_values, that can be used to graph the SimpleRiverNetwork, where the row/column index i corresponds to seg_id i in seg_id_values.

  • network_graph (networkx.graph) – A networkx graph part of the visual component of the network. More information about NetworkX can be found at https://networkx.org/

  • network_positions (dictionary) – A dictionary with networkx nodes as keys and positions as values, used in the plotting of the network.
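
A minimal construction sketch (topology.nc is a hypothetical mizuRoute topology file):

import xarray as xr
from bmorph.evaluation.simple_river_network import SimpleRiverNetwork

topo = xr.open_dataset('topology.nc')
srn = SimpleRiverNetwork(topo, outlet_index=0)    # build the network from the outlet upstream
pfaf_map = srn.generate_pfaf_map()                # Pfafstetter code for each seg_id
srn.draw_network(label_map=pfaf_map, with_labels=True)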

draw_network(label_map=[], color_measure=None)[source]

Plots the river network through networkx.

draw_multi_measure(color_dict, label_map=[])[source]

Overlays multiple network plots to compare multiple measures at once.

find_node(target_id, node: SegNode)[source]

Linear search of SimpleRiverNetwork for a specific SegNode.

find_like_pfaf(node: SegNode, target_pfaf_digits: list, degree: int)[source]

Finds nodes based on pfaffstetter codes.

collect_upstream_nodes(node: SegNode)[source]

Finds all upstream SegNode’s.

generate_pfaf_map()[source]

List of pfaffstetter codes corresponding to seg_id_values.

generate_weight_map()[source]

Creates a list of proportional upstream area ratios for each of the seg_id_values.

generate_mainstream_map()[source]

Highlights the mainstream for plotting in draw_network.

generate_pfaf_color_map()[source]

Extracts the first pfaffstetter digit of each code for colorcoding.

generate_node_highlight_map(seg_ids: list)[source]

Highlight specific SegNode’s in a SimpleRiverNetwork.

pfaf_aggregate()[source]

Aggregates the flow network by one pfafstetter level.

spawn_srn(spawn_outlet)[source]

Creates a new SimpleRiverNetwork from spawn_outlet and upstream of it.

aggregate_measure(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray, aggregation_function) pandas.core.series.Series[source]

This is a preliminary function.

Aggregates the measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

  • aggregation_function (numpy function) – A function to be passed in on how the aggregated segs should have this variable combined, recommended as a numpy function like np.sum, np.median, np.mean …

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series
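
Continuing the construction sketch above, where srn is a SimpleRiverNetwork; topo_seg_ids and seg_bias are hypothetical arrays aligned with the original (unaggregated) topology:

import numpy as np

agg_bias = srn.aggregate_measure(dataset_seg_ids=topo_seg_ids,
                                 variable=seg_bias,
                                 aggregation_function=np.mean)
srn.draw_network(color_measure=agg_bias, with_cbar=True)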

aggregate_measure_max(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray) pandas.core.series.Series[source]

This is a preliminary function.

Determines the maximum measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series

aggregate_measure_mean(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray) pandas.core.series.Series[source]

This is a preliminary function.

Determines the mean measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series

aggregate_measure_median(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray) pandas.core.series.Series[source]

This is a preliminary function.

Determines the median measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series

aggregate_measure_min(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray) pandas.core.series.Series[source]

Determines the minimum measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series

aggregate_measure_sum(dataset_seg_ids: numpy.ndarray, variable: numpy.ndarray) pandas.core.series.Series[source]

This is a preliminary function.

Determines the sum measure value for the given variable based on how SimpleRiverNetwork has been aggregated and provides a pandas.Series to plot on the SimpleRiverNetwork.

Parameters
  • dataset_seg_ids (numpy.ndarray) – Contains all the seg_id values according to the original topology. This should be in the same seg order as variable.

  • variable (numpy.ndarray) – Contains all the variable values according to the original topology. This should be in the same seg order as dataset_seg_ids.

Returns

A pandas.Series formatted as: (seg_id_values_index, aggregated measure).

Return type

pandas.Series

append_pfaf(node: bmorph.evaluation.simple_river_network.SegNode, pfaf_digit: int)[source]

Adds a pfaffstetter code digit to all upstream nodes.

Parameters
  • node (SegNode) – A SegNode to designate the root of the flow tree.

  • pfaf_digit (int) – The digit to be added to the pfaffstetter codes.

append_sequential(sequence, base='')[source]

Adds odd digits for unbranching stream segments.

Adds odd digits to the pfaf_codes of SegNode’s in a row, or in sequence. This ensures all SegNode’s within the SimpleRiverNetwork have a unique code.

Parameters
  • sequence (list) – A list of SegNode’s in a sequence, typically aggregated from find_branch.

  • base (str) – A base string prepended to the Pfafstetter code being appended, used when the pfaf_code needs a prefix.

check_upstream_end_marking(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Checks if any directly upstream nodes are marked by _end_marker.

Checks if any nodes directly upstream are _end_marker’s and returns True if so.

Parameters

node (SegNode) – A SegNode to check directly upstream from.

Returns

If any directly upstream nodes are marked by _end_marker.

Return type

boolean

clear_end_markers(node)[source]

Sets all upstream _end_marker’s to False.

Sets _end_marker in nodes at and upstream of node to False.

Parameters

node (SegNode) – SegNode to start setting _end_marker to False and moving upstream from.

clear_network()[source]

Sets the network to only the outlet SegNode.

Sets the adjacency matrix to an empty array and sets the upstream designation of the outlet to an empty list, clearing the shape of the network upstream of the outlet. This does not reset the topography or seg_id_values, so the original shape can be rebuilt.

collect_upstream_nodes(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Finds all upstream SegNode’s.

Finds all nodes upstream of a node and returns a list of them.

Parameters

node (SegNode) – A SegNode to collect upstream nodes from.

Returns

All upstream nodes of node.

Return type

list

color_network_graph(measure, cmap, vmax=None, vmin=None)[source]

Creates a dictionary and colorbar depicting measure.

Parameters
  • measure (pandas.Series, optional) – Describes how colors for each SegNode should be allocated relative to a linear colormap. The index is expected to match the indices of seg_id_values as a 0:len(seg_id_values)-1 array. If no measure is specified, then colors will be assigned sequentially in order of seg_id_values.

  • cmap (matplotlib.colors.LinearSegmentedColormap) – Colormap to be used for coloring the SimpleRiverNetwork plot.

  • vmin (float, optional) – Minimum value for coloring

  • vmax (float, optional) – Maximum value for coloring

Returns

  • color_dict (dict) – Dictionary of {i:color} where i is the index of the SegNode’s seg_id in seg_id_values.

  • color_bar (ScalarMappable) – A color bar used to plot color values determined by measure for plotting in draw_network.
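
Continuing the construction sketch above, a minimal example coloring each node by its fractional upstream area:

import pandas as pd
import matplotlib.pyplot as plt

weights = pd.Series(srn.generate_weight_map())
color_dict, color_bar = srn.color_network_graph(weights, plt.get_cmap('coolwarm'))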

count_net_upstream(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Inclusively counts number of upstream nodes.

Counts the number of SegNode’s upstream of node, including the original node.

Parameters

node (SegNode) – A SegNode to begin counting from.

Returns

Number of SegNode’s upstream.

Return type

int

draw_multi_measure(color_dict, label_map=[], node_size=200, font_size=8, font_weight='bold', node_shape='s', linewidths=2, font_color='w', with_labels=False, with_cbar=False, with_background=True)[source]

Overlays multiple network plots to compare multiple measures at once.

Plots several networkx plots of user specified transparency for a single SimpleRiverNetwork to compare multiple measures at once. For example, if dataset_1 is “Blues” and dataset_2 is “Reds”, then a bivariate colormap can be used where shades of purple would represent the combinations of dataset_1 and dataset_2.

Parameters
  • color_dict (dict) – Expected as {name: [pandas.Series, cmap, alpha]} to organize which colormap applies to which data.

  • label_map (list, optional) – Text to be plotted on top of each node in the same order as seg_id_values. There must be a value for each seg_id in seg_id_values and the values must be unique, otherwise an error will arise in plotting.

  • node_size (float, optional) – Plotting size of the nodes, defaulting at 200.

  • font_size (float, optional) – Font size of the text from label_map on top of each node, defaulted at 8.

  • font_weight (str, optional) – Font weight of the text from label_map on top of each node, defaulted as bold.

  • node_shape (str, optional) – Shape of the plotted nodes, defaults as ‘s’ for square. NetworkX users can use any one of ‘so^>v<dph8’.

  • linewidths (float, optional) – Width of the connecting lines between nodes, defaults as 2.

  • font_color (str, optional) – Font color of the text from label_map on top of each node, defaulted as w for white.

  • with_labels (boolean, optional) – Whether labels should be plotted on top of each node, True, or not, False. This is defaulted as False.

  • with_cbar (boolean, optional) – Whether a colorbar should be plotted right of the network plot, True, or not, False. This is defaulted as False.

  • with_background (boolean, optional) – Whether a background should be plotted with the network figure, True, or not, False. This is defaulted as True. If desiring to download the image with a transparent background, such as a PNG, then set this to False.

draw_network(label_map=[], color_measure=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, node_size=200, font_size=8, font_weight='bold', node_shape='s', linewidths=2, font_color='w', node_color=None, with_labels=False, with_cbar=False, with_background=True, cbar_labelsize=10, vmin=None, vmax=None, edge_color='k', alpha=1, cbar_title='', cbar_label_pad=40, ax=None)[source]

Plots the river network through networkx.

Plots the visual component of the SimpleRiverNetwork where spatial connections between river segments can be seen. This graphical tool may not match the topographical shape of the actual river network, but it should be similar. Visualizing how the river segments are connected virtually can help find errors in the construction of large models or locate where analysis associated only with the seg_id of a river segment corresponds to the pseudo-physical network. Plotting the network with labels, highlighting specific nodes, color coding by Pfafstetter basin, and other coloring can help visually connect this plot with a topographical plot.

Parameters
  • label_map (list, optional) – Text to be plotted on top of each node in the same order as seg_id_values. There must be a value for each seg_id in seg_id_values and the values must be unique, otherwise an error will arise in plotting.

  • color_measure (pandas.Series, optional) – Describes how colors for each SegNode should be allocated relative to a linear colormap. The index is expected to match the indices of seg_id_values as a 0:len(seg_id_values)-1 array.

  • cmap (matplotlib.colors.LinearSegmentedColormap, optional) – Colormap to be used for coloring the SimpleRiverNetwork plot. This is defaulted as matplotlib.cm.get_cmap(‘hsv’), a vibrant set of colors to alert that a more specific colormap has not been specified.

  • node_size (float, optional) – Plotting size of the nodes, defaulting at 200.

  • font_size (float, optional) – Font size of the text from label_map on top of each node, defaulted at 8.

  • font_weight (str, optional) – Font weight of the text from label_map on top of each node, defaulted as bold.

  • node_shape (str, optional) – Shape of the plotted nodes, defaults as ‘s’ for square. NetworkX users can use any one of ‘so^>v<dph8’.

  • linewidths (float, optional) – Width of the connecting lines between nodes, defaults as 2.

  • font_color (str, optional) – Font color of the text from label_map on top of each node, defaulted as w for white.

  • with_labels (boolean, optional) – Whether labels should be plotted on top of each node, True, or not, False. This is defaulted as False.

  • with_cbar (boolean, optional) – Whether a colorbar should be plotted right of the network plot, True, or not, False. This is defaulted as False.

  • with_background (boolean, optional) – Whether a background should be plotted with the network figure, True, or not, False. This is defaulted as True. If desiring to download the image with a transparent background, such as a PNG, then set this to False.

  • cbar_labelsize (float, optional) – Font size of the labels on the colorbar that can be attached in with_cbar being set to True, defaulted as 10.

  • vmin (float, optional) – Minimum value for coloring

  • vmax (float, optional) – Maximum value for coloring

  • edge_color (str, optional) – Node outline color of each node, defaulted as ‘k’ for black.

  • alpha (float) – Transparency of each node, where 1 is perfectly opaque and 0 is perfectly transparent. This is primarily useful in draw_multi_measure, where plots are overlayed on top of each other.

  • cbar_title (str, optional) – Title of the colorbar that can be attached in with_cbar being set to True. This is defaulted as ‘’ to exclude a title.

  • cbar_label_pad (float, optional) – Padding for the colorbar labels, defaulted as 40.

  • ax (Matplotlib Axes object, optional) – Draw the network in the specified Matplotlib axes.
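
Continuing the construction sketch above, a minimal example highlighting two segments of interest (the seg_ids are hypothetical):

import pandas as pd
import matplotlib.pyplot as plt

highlight = pd.Series(srn.generate_node_highlight_map(seg_ids=[101, 205]))
srn.draw_network(color_measure=highlight, cmap=plt.get_cmap('Reds'))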

encode_pfaf(root_node=<class 'bmorph.evaluation.simple_river_network.SegNode'>, level=0, max_level=42)[source]

Recursively encodes pfafstetter codes on a SimpleRiverNetwork.

Parameters
  • root_node (SegNode) – A SegNode from which to start encoding.

  • level (int) – How many levels deep into recursion the method already is. By default, this starts at 0.

  • max_level (int) – The maximum number of levels encode_pfaf will run for before raising a RuntimeError. By default, this is set to the arbitrary number 42 as a safety mechanism.

find_branch(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Locates the nearest upstream branch.

Locates a node that branches into 2+ nodes, returning the node that branches and any nodes prior to the branch that were in a row.

Parameters

node (SegNode) – A SegNode to start searching from.

Returns

  • branch (SegNode) – The SegNode found with an upstream branch.

  • sequential_nodes (list) – The list of nodes preceding and including branch.

find_like_pfaf(node: bmorph.evaluation.simple_river_network.SegNode, target_pfaf_digits: list, degree: int)[source]

Finds nodes based on pfaffstetter codes.

Finds all nodes with the matching digit at the exact same location in pfaf_code and returns all of them in a list.

Parameters
  • node (SegNode) – A SegNode to start searching from.

  • target_pfaf_digits (list) – A list of pfaf_digit to search for in the SimpleRiverNetwork. This can be a list of one element.

  • degree (int) – How many Pfafstetter levels deep the function should look, i.e. with degree 2 and a pfaf_code of 1234, it will examine “3” to check if it is a match.

Returns

A list of nodes with like Pfafstetter codes that match the target_pfaf_digits at the input degree.

Return type

list

find_node(target_id, node: bmorph.evaluation.simple_river_network.SegNode)[source]

Linear search of SimpleRiverNetwork for a specific SegNode.

Searches for and returns a node with the desired seg_id upstream of node.

Parameters
  • target_id (int) – A seg_id to search for within the SimpleRiverNetwork.

  • node (SegNode) – A SegNode to start searching from. The seg_id of this SegNode is checked against target_id.

Returns

SegNode with seg_id matching target_id. If the desired SegNode could not be found, None is returned.

Return type

SegNode
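
Continuing the construction sketch above (the target seg_id is hypothetical):

node = srn.find_node(target_id=101, node=srn.outlet)       # search upstream from the outlet
upstream_nodes = srn.collect_upstream_nodes(node) if node is not None else []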

find_tributary_basins(tributaries)[source]

Finds the four tributaries with the largest drainage areas.

Parameters

tributaries (list) – A list of tributary SegNode’s to be searched.

Returns

A list of the largest_tributaries found in the list of tributaries given.

Return type

list

force_upstream_area(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Aggregates upstream basin area regardless of _end_marker.

Calculates the basin area upstream of node of interest, ignoring _end_marker. This does include the area of the node of interest.

Parameters

node (SegNode) – A SegNode to start from and calculate both its and all upstream SegNode’s aggregate area.

Returns

Aggregate upstream area.

Return type

float

generate_mainstream_map()[source]

Highlights the mainstream for plotting in draw_network.

Creates a list of which nodes are part of the mainstream in order of the seg_id_values.

Returns

int booleans denoting whether a river segment is part of the mainstream, 1, or off the mainstream, 0, corresponding to each seg_id in seg_id_values.

Return type

list

generate_node_highlight_map(seg_ids: list)[source]

Highlight specific SegNode’s in a SimpleRiverNetwork.

Takes a list of seg_id’s and creates a pandas.Series that highlights the nodes in the list. This is best used as a diagnostic tool, finding where a specific river segment is located on a network map. Using a colormap that has notably different colors on the extremes, such as matplotlib’s “Reds”, is recommended to make highlighted nodes stand out.

Parameters

seg_ids (list) – A list of seg_id values to mark specific SegNode’s apart from other SegNode’s.

Returns

A list that will identify these highlighted nodes for draw_network by int booleans, 1 is to be highlighted while 0 is not.

Return type

list

generate_pfaf_codes()[source]

Creates a list of pfaf_code values corresponding to seg_id_values.

Returns

pfaf_code values mapped to seg_id_values.

Return type

list

generate_pfaf_color_map()[source]

Extracts the first pfaffstetter digit of each code for colorcoding.

This prepares a color_measure for draw_network, where each unique first level Pfafstetter basin can be assigned its own color by a colormap. If a SegNode has the pfaf_code “1234”, then it will have the value “1” in the returned map. Using a qualitative colormap of 10 distinct colors, such as matplotlib’s “tab10”, is recommended since there are 10 unique Pfafstetter digits (0 to 9).

Returns

Map of color codes for the pfaf_code of each SegNode to the indices of each SegNode corresponding to its seg_id in seg_id_values. The index should not be reassigned as it is used to match the correct nodes together in draw_network.

Return type

pandas.Series

generate_pfaf_map()[source]

List of pfaffstetter codes corresponding to seg_id_values.

Creates a list of pfaf_code values in the order of seg_id_values, including the seg_id_values themselves. This is a little more cluttered and is recommended only for debugging purposes.

Returns

pfaf_code values for each corresponding seg_id in seg_id_values.

Return type

list

generate_weight_map()[source]

Creates a list of proportional upstream area ratios for each of the seg_id_values.

Creates a list of fractional weights equivalent to the node’s upstream area divided by the overall basin_area of the whole SimpleRiverNetwork. These are in the order of seg_id_values.

Returns

(river segment’s cumulative upstream area)/(total basin area) for each river segment corresponding to the seg_id’s in seg_id_values.

Return type

list

net_upstream_area(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Aggregates upstream basin area.

Calculates the basin area upstream of node. This does include the area of the node of interest.

Parameters

node (SegNode) – A SegNode to start from and calculate both its and all upstream nodes aggregate area.

Returns

Aggregate upstream area.

Return type

float

parse_upstream(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Constructs and connects SegNodes according to the network.

Recursively constructs network by searching what SegNodes are upstream of the current SegNode and updates the current SegNode’s upstream list while also building the adjacency matrix.

Parameters

node (SegNode) – A SegNode to start building the network from.

pfaf_aggregate()[source]

Aggregates the flow network by one pfafstetter level.

This “rolls up” a SimpleRiverNetwork to simplify the overall map, similar to decreasing the number of lines in a contour plot to make it more legible. If the longest pfaf_code in the network is four digits long, such as “1234” or “5678”, then all of the SegNodes sharing the first three digits will be replaced by a singular SegNode with the pfaf_code of those first three digits. For example: if you have pfaf_code’s “1231”, “1232”, “1233”, and “1234”, then they become a SegNode with the pfaf_code “123”. Basin area for each SegNode is summed to create the basin area of the new SegNode. Aggregating other properties is still in progress.

reconstruct_adj_mat(node, adj_mat: numpy.ndarray)[source]

Rebuilds the adjacency matrix from an existing flow tree.

Parameters
  • node (SegNode) – A SegNode to construct the adjacency matrix from

  • adj_mat (numpy.ndarray) – A square numpy ndarray of zeros originally, the size equal to len(seg_id_values) by len(seg_id_values), to be filled.

Returns

The reconstructed adjacency matrix that needs to be set as the network’s adjacency matrix to actually alter the flow tree.

Return type

numpy.ndarray

size_network_graph(measure)[source]

TODO: Implement

sort_by_pfaf(nodes: list, degree=<class 'int'>)[source]

Sorts a list of SegNodes by pfaf_code.

Sorts a list of SegNode’s in decreasing order of a pfaffstetter digit at the given degree. For example, if you have degree 2, and a pfaf_code of 1234, it will use “3” in sorting.

Parameters
  • nodes (list) – A list of SegNode’s to sort.

  • degree (int) – Which index in a pfaf_code the nodes are to be sorted by.

Returns

The list of sorted nodes.

Return type

list

sort_streams(node=<class 'bmorph.evaluation.simple_river_network.SegNode'>)[source]

Sorts the mainstream and tributary branches from each other.

Returns which branches are part of the mainstream and which are part of the tributaries based on aggregate upstream area. This is typically used to determine even and odd Pfafstetter basins for encoding.

Parameters

node (SegNode) – A SegNode to start tracing the mainstream from. This is the “root” of the flow tree.

Returns

  • mainstreams (list) – A list of mainstream SegNode’s determined by having the greatest upstream area.

  • tributaries (list) – A list of tributaries having upstream_area less than the mainstream but still encountered along the way.

spawn_srn(spawn_outlet)[source]

Creates a new SimpleRiverNetwork from spawn_outlet and upstream of it.

A new SimpleRiverNetwork structure is generated from the current network. This is useful when modeling a large watershed and you want to focus on a specific part of it without having to reselect all the nodes, for example: the Snake River Basin within the Columbia River Basin dataset.

Parameters

spawn_outlet (int) – The seg_id of a SegNode in the current SimpleRiverNetwork to generate from. This creates an outlet that the new tree is to be spawned from.

Returns

A new SimpleRiverNetwork with the outlet set to spawn_outlet. Properties are transferred from the previous SimpleRiverNetwork to this one, but any SegNodes not upstream from spawn_outlet are not included in this new one.

Return type

SimpleRiverNetwork

update_node_area(node: bmorph.evaluation.simple_river_network.SegNode)[source]

Updates the desired node with basin area information.

Parameters

node (SegNode) – A SegNode whose basin_area, and only its basin_area, is updated.