Predicting tides in R

Skip to TL/DR….

Water movement in estuaries is affected by many processes acting across space and time. Tidal exchange with the ocean is an important hydrodynamic process that can define several characteristics of an estuary. Physical flushing rates and water circulation are often controlled by tidal advection, whereas chemical and biological components are affected by the flux of dissolved or particulate components with changes in the tide. As such, describing patterns of tidal variation is a common objective of coastal researchers and environmental managers.

Tidal predictions are nothing new. A clever analog approach has been around since the late 1800s. The tide-predicting machine represents the tide as the summation of waves with different periods and amplitudes. Think of a continuous line plot where the repeating pattern is linked to a rotating circle. Tracing the line in two dimensions from the rotating circle creates a sine wave with the amplitude equal to the radius of the circle. A more complex plot can be created by adding the output of two or more rotating disks, where each disk varies in radius and rate of rotation. The tide-predicting machine is nothing more than a set of rotating disks linked to a single graph as the sum of the rotations from all disks. Here’s a fantastic digital representation of the tide-predicting machine:

[Animation: a digital recreation of a tide-predicting machine, with rotating disks summed to trace a tidal curve]

Tides are caused primarily by the gravitational pull of the sun and moon on the earth’s surface. The elliptical orbits of both the moon around the earth and the earth around the sun produce periodic but unequal forces that influence water movement. These forces combined with local surface topography and large-scale circulation patterns from uneven heating of the earth’s surface lead to the variation of tidal patterns across the globe. Although complex, these periodic patterns can be characterized as the summation of sine waves, where one wave represents the effect of a single physical process (e.g., diurnal pull of the moon). Describing these forces was the objective of the earlier tide-predicting machines. Fortunately for us, modern software (i.e., R) provides us with a simpler and less expensive approach based on harmonic regression.
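
The idea behind harmonic regression is worth a quick preview before we dig in. Because a sine wave with unknown phase can be rewritten as a weighted sum of a sine and a cosine with the same period, the amplitude and phase of a constituent with known period can be estimated with ordinary lm. A minimal sketch with simulated data (the 12.42 hour period is the well-known M2 period; everything else here is made up for illustration):

# simulate two weeks of hourly 'tide' data with an M2-like signal
set.seed(123)
hrs <- seq(0, 24 * 14, by = 1)
obs <- 1.5 + 0.6 * sin(2 * pi * hrs / 12.42 + 1) + rnorm(length(hrs), sd = 0.1)

# harmonic regression: a sine wave with unknown phase is a linear
# combination of sine and cosine terms with the same period
mod <- lm(obs ~ sin(2 * pi * hrs / 12.42) + cos(2 * pi * hrs / 12.42))

# recover the amplitude from the two coefficients, should be close to 0.6
sqrt(sum(coef(mod)[-1] ^ 2))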

Approach

We’ll create and sum our own sine waves to demonstrate complexity from addition. All sine waves follow the general form y as a function of x:

$$ y = \alpha + \beta \sin\left(2 \pi f x + \phi\right) $$

where the amplitude of the wave is beta and the frequency (or 1 / period) is f. The parameters alpha and phi represent scalar shifts in the curve up/down and left/right, respectively. We can easily create a function in R to simulate sine waves with different characteristics. This function takes the parameters from the above equation as arguments and returns a sine wave (y) equal in length to the input time series (x). The alpha and beta arguments are in units of wave height (e.g., meters), and freq and phi are in hours. Note that freq is supplied as the period (the inverse of f in the equation), so freq = 24 gives one full wave per day.

# function for creating a sine wave from a regular time series
waves <- function(time_in, alpha = 0, beta = 1, freq = 24, phi = 0){

  # timesteps per hour
  time_step <- 60 / as.numeric(unique(diff(time_in)), units = 'mins')

  # set phi as difference in hours from start of time_in
  phi <- min(time_in) + phi * 3600
  phi <- as.numeric(difftime(phi, min(time_in), units = 'hours'))
  phi <- phi / time_step

  # get input values to sin func
  in_vals <- seq(0, length(time_in), length = length(time_in))
  in_vals <- in_vals / time_step
  in_vals <- 2 * pi * in_vals * 1 / freq

  # wave
  y <- alpha + beta * sin(in_vals + phi)

  return(y)

}

The default arguments will return a sine wave with an amplitude of one meter and frequency of one wave per 24 hours. Two additional time series are created that vary these two parameters.

# input time series for two weeks, 15 minute time step
x <- as.POSIXct(c('2017-04-01', '2017-04-15'))
x <- seq(x[1], x[2], by = 60 * 15)

# get three sine waves
# a: default
# b: amplitude 0.5, 48 hour period
# c: amplitude 2, 12 hour period
a <- waves(x)
b <- waves(x, beta = 0.5, freq = 48)
c <- waves(x, beta = 2, freq = 12)

We can combine all three waves in the same data object, take the summation, and plot to see how it looks.

# for data munging and plotting
library(tidyverse)

# get sum of all y values, combine to single object
yall <- rowSums(cbind(a, b, c))
dat <- data.frame(x, a, b, c, yall) %>% 
  gather('var', 'val', -x)

# plot
ggplot(dat, aes(x = x, y = val)) + 
  geom_line() + 
  facet_wrap(~var, ncol = 1) + 
  theme_bw()

The important piece of information we get from the plot is that adding simple sine waves can create complex patterns. As a general rule, about 83% of the variation in tides can be described by seven harmonic components that, when combined, produce the complex patterns we observe in monitoring data. These components are characterized by their lunar or solar origin and by periods of approximately once or twice daily. For example, the so-called ‘M2’ component, the twice-daily wave caused by the moon, is typically the dominant tidal component. The periods of the tidal components are constant across locations, but their relative strengths (amplitudes) vary considerably.

[Table: the seven major tidal constituents]

Constituent  Origin                              Period (hours)
M2           principal lunar, semidiurnal        12.42
S2           principal solar, semidiurnal        12.00
N2           larger lunar elliptic, semidiurnal  12.66
K2           lunisolar, semidiurnal              11.97
K1           lunisolar, diurnal                  23.93
O1           principal lunar, diurnal            25.82
P1           principal solar, diurnal            24.07

The oce package in R has a nifty function for predicting up to 69 different tidal constituents. You’ll typically only care about the main components above, but it’s useful to appreciate the variety of components included in a tidal signal. We’ll apply the tidem function from oce to predict the tidal components on a subset of SWMP data. A year of continuous depth data from the Apalachicola Bay Dry Bar station is used.

library(SWMPr)
library(oce)

# clean, one hour time step, subset, fill gaps
dat <- qaqc(apadbwq) %>% 
  setstep(timestep = 60) %>% 
  subset(subset = c('2013-01-01 0:0', '2013-12-31 0:0'), select = 'depth') %>% 
  na.approx(maxgap = 1e6)

The tidem function from oce requires a ‘sealevel’ object as input. Plotting the sealevel object using the plot method from oce shows three panels: the first is the complete time series, the second is the first month in the record, and the third is a spectral decomposition of the tidal components in cycles per hour (cph, the inverse of period).

datsl <- as.sealevel(elevation = dat$depth, time = dat$datetimestamp)
plot(datsl)

We can create a model to estimate the components from the table above using tidem. Here, we estimate each component separately to extract predictions for each, which we then sum to estimate the complete time series.

# tidal components to estimate
constituents <- c('M2', 'S2', 'N2', 'K2', 'K1', 'O1', 'P1')

# loop through tidal components, predict each with tidem
preds <- sapply(constituents, function(x){
  
    mod <- tidem(t = datsl, constituent = x)
    pred <- predict(mod)
    pred - mean(pred)
    
  }) 

# combine prediction, sum, add time data
predall <- rowSums(preds) + mean(datsl[['elevation']])
preds <- data.frame(time = datsl[['time']], preds, Estimated = predall) 

head(preds)
##                  time           M2           S2          N2            K2
## 1 2013-01-01 00:00:00 -0.111578526 -0.020833606 0.000215982 -0.0048417234
## 2 2013-01-01 01:00:00 -0.118544835 -0.008940681 0.006428260 -0.0093752262
## 3 2013-01-01 02:00:00 -0.095806627  0.005348532 0.011088593 -0.0113830570
## 4 2013-01-01 03:00:00 -0.049059634  0.018205248 0.013072149 -0.0103243372
## 5 2013-01-01 04:00:00  0.009986414  0.026184523 0.011900172 -0.0064842694
## 6 2013-01-01 05:00:00  0.066540974  0.027148314 0.007855534 -0.0008973087
##              K1         O1            P1 Estimated
## 1  0.0911501572 0.01312209  0.0381700294  1.463683
## 2  0.0646689921 0.03909021  0.0340807303  1.465686
## 3  0.0337560517 0.06274939  0.0276811946  1.491713
## 4  0.0005294868 0.08270543  0.0194051690  1.532812
## 5 -0.0327340223 0.09778235  0.0098135843  1.574727
## 6 -0.0637552642 0.10709170 -0.0004434629  1.601819

Plotting one month of the estimated data shows the results. Note the variation in amplitude between the components. The M2, K1, and O1 components are the largest at this location. Also note the clear spring/neap variation in range every two weeks for the combined time series. This complex fortnightly variation is created simply by adding the separate sine waves.

# prep for plot
toplo <- preds %>% 
  gather('component', 'estimate', -time) %>% 
  mutate(component = factor(component, levels = c('Estimated', constituents)))

# plot one month
ggplot(toplo, aes(x = time, y = estimate, group = component)) + 
  geom_line() + 
  scale_x_datetime(limits = as.POSIXct(c('2013-07-01', '2013-07-31'))) + 
  facet_wrap(~component, ncol = 1, scales = 'free_y') + 
  theme_bw() 

All tidal components can of course be estimated together. By default, the tidem function estimates all 69 tidal components. Looking at our components of interest shows the same estimated amplitudes as in the plot above.

# estimate all components together
mod <- tidem(t = datsl)

# get components of interest
amps <- data.frame(mod@data[c('name', 'amplitude')]) %>% 
  filter(name %in% constituents) %>% 
  arrange(amplitude)
amps
## name amplitude
## 1 K2 0.01091190
## 2 N2 0.01342395
## 3 S2 0.02904518
## 4 P1 0.04100388
## 5 O1 0.11142455
## 6 M2 0.12005114
## 7 K1 0.12865764

And of course comparing the model predictions with the observed data is always a good idea.

# add predictions to observed data
dat$Estimated <- predict(mod)

# plot one month
ggplot(dat, aes(x = datetimestamp, y = depth)) + 
  geom_point() + 
  geom_line(aes(y = Estimated), colour = 'blue') + 
  scale_x_datetime(limits = as.POSIXct(c('2013-07-01', '2013-07-31'))) + 
  scale_y_continuous(limits = c(0.9, 2)) +
  theme_bw() 

The fit is not perfect, but the discrepancies could arise for several reasons, none of which are directly related to the method – instrument drift, fouling, water movement from non-tidal sources, etc. The real value of the model is that we can use it to fill missing observations in tidal time series or to predict future observations. We also get reasonable estimates of the main tidal components, i.e., which physical forces are really driving the tide and how large their contributions are. For example, our data from Apalachicola Bay showed that the tide is driven primarily by the M2, K1, and O1 components, each with relative amplitudes of about 0.1 meter. This is consistent with general patterns of micro-tidal systems in the Gulf of Mexico. Comparing tidal components in other geographic locations would produce very different results, both in the estimated amplitudes and the dominant components.
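
The prediction part is straightforward to sketch. My understanding is that the predict method for tidem objects accepts a newdata argument of POSIXct times, so estimates are not limited to the observed record; treat this as a sketch rather than guaranteed syntax:

# forecast two weeks past the end of the observed record, hourly
# assumes predict.tidem accepts a newdata vector of POSIXct times
newtimes <- seq(max(datsl[['time']]), by = 60 * 60, length.out = 24 * 14)
future <- predict(mod, newdata = newtimes)
plot(newtimes, future, type = 'l')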

TL/DR

Here’s how to estimate the tide from an observed time series. The data are from SWMPr and the tidem model is from oce.

library(SWMPr)
library(oce)
library(tidyverse)

# clean input data, one hour time step, subset, fill gaps
dat <- qaqc(apadbwq) %>% 
  setstep(timestep = 60) %>% 
  subset(subset = c('2013-01-01 0:0', '2013-12-31 0:0'), select = 'depth') %>% 
  na.approx(maxgap = 1e6)

# get model
datsl <- as.sealevel(elevation = dat$depth, time = dat$datetimestamp)
mod <- tidem(t = datsl)

# add predictions to observed data
dat$Estimated <- predict(mod)

# plot
ggplot(dat, aes(x = datetimestamp, y = Estimated)) + 
  geom_line() +
  theme_bw() 


SWMPr 2.1.0 on CRAN

I’ve just released an updated version of my package for estuary monitoring data, SWMPr, available on CRAN. I’ve made several additions to the package since its initial release – nothing too crazy, but enough to warrant another push to CRAN and a blog post. I’ve been pretty bad about regular updates, but I’ve added a few features to make some of the functions easier to use, in addition to some new functions for plotting SWMP data. I’ll start with a brief overview of the package, then describe some of the major changes since the last release (2.0.0). As always, please keep a close watch on the GitHub repository for progress on the development version of the package.

What is SWMPr? SWMPr is an R package for estuary monitoring data from the National Estuarine Research Reserve System (NERRS). NERRS is a collection of reserve programs located at 28 estuaries in the United States. The System-Wide Monitoring Program (SWMP) was established by NERRS in 1995 as a long-term monitoring program to collect water quality, nutrient, and weather data at over 140 stations (more info here). To date, over 58 million records have been collected and are available online through the Centralized Data Management Office (CDMO). The SWMPr package provides a bridge between R and the data provided by SWMP (which explains the super clever name). The package is meant to augment existing CDMO services and to provide more generic features for working with water quality time series. The initial release included functions to import SWMP data from the CDMO directly into R, functions for data organization, and some basic analysis functions. The original release also included functions for estimating rates of ecosystem primary production using the open-water method.

# installing and loading the package
install.packages('SWMPr')
library(SWMPr)

What’s new in 2.1? A full list of everything that’s changed can be viewed here. Not all these changes are interesting (bugs mostly), but they are worth viewing if you care about the nitty gritty. The most noteworthy changes include the following.

  • The overplot function can be used to plot multiple variables with identical scaling on the y-axis. I think this is generally discouraged under sound plotting theory (see the rants here), but overplotting is an often-requested feature regardless of popular opinion. I had to use the base graphics to write this function since it’s not possible with ggplot. I actually borrowed most of the code from a colleague at NERRS, shouts to the Grand Bay office. To illustrate ease of use…
# import data and do some initial clean up
data(apacpwq)
dat <- qaqc(apacpwq)

# a truly heinous plot
overplot(dat, select = c('depth', 'do_mgl', 'ph', 'turb'),
  subset = c('2013-01-01 0:0', '2013-02-01 0:0'), lwd = 2)

  • The qaqc function now has more flexible filtering of QAQC data flags by using regular expression matching, rather than searching by integer flags as in the previous version. What this means is that observations can be filtered with greater control over which flags and errors are removed. The examples below show filtering using the old method with integer flags and the new method with a comment code. The second example keeps all observations annotated with the ‘CSM’ comment code (meaning check the metadata). The value of this approach is that not all integer flags are coded the same, i.e., QAQC flags with the same integer may not always have the same error code. The user may not want to remove all flags of a single type if only certain error codes are important.
# import data
data(apadbwq)
dat <- apadbwq

# retain only '0' and '-1' flags, as in the older version
newdat <- qaqc(dat, qaqc_keep = c('0', '-1'))

# retain observations with the 'CSM' error code
newdat <- qaqc(dat, qaqc_keep = 'CSM')
  • Several of the data import functions were limited in the total number of records that could be requested from the CDMO. I made some dirty looping hacks so that most of these rate limitations, although technically still imposed, can be ignored when making large data requests to the CDMO. Previously, the single_param, all_params, and all_params_dtrng functions were limited to 100 records in a single request – not very useful for time series taken every 15 minutes. The new version lets you download any number of records using these functions, although be warned that the data request can take a long time for larger requests. As before, your computer’s IP address must be registered with the CDMO to use these functions.
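
As a hedged example, a larger request might look like the following. The station name matches the earlier examples, but the dtrng argument name and the 'mm/dd/yyyy' date convention are my reading of the documentation, so check the help files before relying on this:

# retrieve five months of records for all parameters at a station
# assumes the CDMO 'mm/dd/yyyy' date convention, see ?all_params_dtrng
dat <- all_params_dtrng('apadbwq', dtrng = c('01/01/2013', '06/01/2013'))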

  • Although it’s now theoretically possible to retrieve all the SWMP data with the above functions, using the import_local function is still much, much easier. The main advantage of this function is that large amounts of data from a single request can be imported into R from local files. The new release of SWMPr makes this process even easier by allowing data to be imported directly from the compressed, zipped data folder returned from the CDMO data request. The syntax is the same, but the full path including the .zip file extension must be included. As before, this function is designed to be used with data from the zip downloads feature of the CDMO.

# this is the path for the downloaded data files, zipped folder
path <- 'C:/this/is/my/data/path.zip'

# import the data
dat <- import_local(path, 'apaebmet')
  • A nice feature in R documentation that I recently discovered is the ability to search for functions by ‘concept’ or ‘alias’ tags. I’ve described the functions in SWMPr as being in one of three categories based on their intended use in the data workflow: retrieve, organize, and analyze. The new version of SWMPr uses these categories as search terms for finding the help files for each function. The package includes additional functions not in these categories but they are mostly intended as helpers for the primary functions. As always, consult the manual for full documentation.
help.search(package = 'SWMPr', 'retrieve')
help.search(package = 'SWMPr', 'organize')
help.search(package = 'SWMPr', 'analyze')
  • Finally, I’ve added several default methods to existing SWMPr functions to make them easier to use outside of the normal SWMPr workflow. For example, combining time series with different time steps is a common challenge prior to data analysis. The comb function achieves this task for SWMP data, although using the previous release of the package on generic data was rather clunky. The new default method makes it easier to combine data objects with a generic format (data frames), provided a few additional arguments are supplied so the function knows how to handle the information. A sketch follows below. Default methods were also added for the setstep, decomp, and decomp_cj functions.
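
Here’s a rough sketch of the default method with two generic data frames. The date_col, timestep, and method arguments reflect my reading of the new documentation, so treat the names as assumptions and consult the comb help file:

# two hypothetical data frames with different time steps
dat1 <- data.frame(
  datetimestamp = seq(as.POSIXct('2013-01-01'), by = 60 * 15, length.out = 96),
  depth = rnorm(96)
)
dat2 <- data.frame(
  datetimestamp = seq(as.POSIXct('2013-01-01'), by = 60 * 60, length.out = 24),
  temp = rnorm(24)
)

# combine to an hourly step, keeping all dates in either object
# argument names are assumptions, see ?comb
comb(dat1, dat2, date_col = 'datetimestamp', timestep = 60, method = 'union')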

I guarantee there are some bugs in this new release and I gladly welcome bug reports on the issues tab of the development repo. Ideas for additional features can also be posted. Please check out our SWMPrats web page for other SWMP-related analysis tools.

Cheers,

Marcus

Reinventing the wheel for ordination biplots with ggplot2

I’ll be the first to admit that the topic of plotting ordination results using ggplot2 has been visited many times over. As is my typical fashion, I started creating a package for this purpose without completely searching for existing solutions. Specifically, the ggbiplot and factoextra packages already provide almost complete coverage of plotting results from multivariate and ordination analyses in R. Being a stubborn individual, I couldn’t give up on my own package, so I started exploring ways to improve some of the functionality of biplot methods in these existing packages. For example, ggbiplot and factoextra work almost exclusively with results from principal components analysis, whereas numerous other multivariate analyses can be visualized using the biplot approach. I started to write methods to create biplots for some of the more common ordination techniques, in addition to all of the functions I could find in R that conduct PCA. This exercise became very boring very quickly, so I stopped adding methods after the first eight or so. That being said, I present this blog post as a sinking ship that was doomed from the beginning, but I’m also hopeful that these functions can be built on by others more ambitious than myself.

The process of adding methods to a default biplot function in ggplot2 was pretty simple and not the least bit interesting. The default ggord biplot function (see here) is very similar to the default biplot function from the stats base package. Only two inputs are used: the first is a two-column matrix of the observation scores for each axis in the biplot, and the second is a two-column matrix of the variable scores for each axis. Adding S3 methods to the generic function required extracting the relevant elements from each model object and then passing them to the default function. Easy as pie but boring as hell.
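
For the curious, the pattern looks roughly like this, using prcomp as an example. The observation scores live in the x element and the loadings in rotation (both facts of prcomp); the signature of ggord.default is simplified here, so treat this as a sketch of the pattern rather than the actual package code:

# sketch of an S3 method for ggord, prcomp objects
ggord.prcomp <- function(ord_in, grp_in = NULL, ...){

  # observation scores on the first two axes
  obs <- data.frame(ord_in$x[, 1:2])

  # variable scores (loadings) on the first two axes
  vecs <- data.frame(ord_in$rotation[, 1:2])

  # pass to the default method
  ggord.default(obs, vecs, grp_in = grp_in, ...)

}

A new method for another ordination class follows the same recipe: find where the object stores its scores and loadings, extract the first two axes, and hand them off.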

I’ll repeat myself again. This package adds nothing new to the functionality already provided by ggbiplot and factoextra. However, I like to think that I contributed at least a little bit by adding more methods to the biplot function. On top of that, I’m also naively hopeful that others will be inspired to fork my package and add methods. Here you can view the raw code for the ggord default function and all methods added to that function. Adding more methods is straightforward, but I personally don’t have any interest in doing this myself. So who wants to help??

Visit the package repo here or install the package as follows.

library(devtools)
install_github('fawda123/ggord')
library(ggord)

Available methods and examples for each are shown below. These plots can also be reproduced from the examples in the ggord help file.

##  [1] ggord.acm      ggord.ca       ggord.coa      ggord.default 
##  [5] ggord.lda      ggord.mca      ggord.MCA      ggord.metaMDS 
##  [9] ggord.pca      ggord.PCA      ggord.prcomp   ggord.princomp
# principal components analysis with the iris data set
# prcomp
ord <- prcomp(iris[, 1:4])

p <- ggord(ord, iris$Species)
p

p + scale_colour_manual('Species', values = c('purple', 'orange', 'blue'))

p + theme_classic()

p + theme(legend.position = 'top')

p + scale_x_continuous(limits = c(-2, 2))

# principal components analysis with the iris dataset
# princomp
ord <- princomp(iris[, 1:4])

ggord(ord, iris$Species)

# principal components analysis with the iris dataset
# PCA
library(FactoMineR)

ord <- PCA(iris[, 1:4], graph = FALSE)

ggord(ord, iris$Species)

# principal components analysis with the iris dataset
# dudi.pca
library(ade4)

ord <- dudi.pca(iris[, 1:4], scannf = FALSE, nf = 4)

ggord(ord, iris$Species)

# multiple correspondence analysis with the tea dataset
# MCA
data(tea)
tea <- tea[, c('Tea', 'sugar', 'price', 'age_Q', 'sex')]

ord <- MCA(tea[, -1], graph = FALSE)

ggord(ord, tea$Tea)

# multiple correspondence analysis with the tea dataset
# mca
library(MASS)

ord <- mca(tea[, -1])

ggord(ord, tea$Tea)

# multiple correspondence analysis with the tea dataset
# acm
ord <- dudi.acm(tea[, -1], scannf = FALSE)

ggord(ord, tea$Tea)

# nonmetric multidimensional scaling with the iris dataset
# metaMDS
library(vegan)
ord <- metaMDS(iris[, 1:4])

ggord(ord, iris$Species)

# linear discriminant analysis
# example from lda in MASS package
ord <- lda(Species ~ ., iris, prior = rep(1, 3)/3)

ggord(ord, iris$Species)

# correspondence analysis
# dudi.coa
ord <- dudi.coa(iris[, 1:4], scannf = FALSE, nf = 4)

ggord(ord, iris$Species)

# correspondence analysis
# ca
library(ca)
ord <- ca(iris[, 1:4])

ggord(ord, iris$Species)

Cheers,

Marcus

SWMPr 2.0.0 now on CRAN

I’m pleased to announce that my second R package, SWMPr, has been posted on CRAN. I developed this package to work with water quality time series data from the System Wide Monitoring Program (SWMP) of the National Estuarine Research Reserve System (NERRS). SWMP was established in 1995 to provide continuous environmental data at over 300 fixed stations in 28 estuaries of the United States (more info here). SWMP data are collected with the general objective of describing dynamics of estuarine ecosystems to better inform coastal management. However, the lack of simple tools for processing and evaluating the increasing quantity of data provided by the monitoring network has complicated broad-scale comparisons between systems and, in some cases, simple trend analysis of water quality parameters at individual sites. The SWMPr package was developed to address common challenges of working with SWMP data by providing functions to retrieve, organize, and analyze environmental time series data.

The development version of this package lives on GitHub. It can be installed from CRAN and loaded in R as follows:

install.packages('SWMPr')
library(SWMPr)

SWMP data are maintained online by the Centralized Data Management Office (CDMO). Time series data describing water quality, nutrient, or weather observations can be downloaded for any of the 28 estuaries. Several functions are provided by the SWMPr package that allow import of data from the CDMO into R, either by direct download through the existing web services or from local (import_local) or remote (import_remote) sources. Imported data are loaded as swmpr objects with several attributes, following standard S3 object methods. The remaining functions in the package are used to organize and analyze the data using a mix of standard methods for time series and more specific approaches developed for SWMP. This blog post concludes with a brief summary of the available functions, organized by category. As always, be sure to consult the help documentation for more detailed information.

I’ve also created two shiny applications to illustrate the functionality provided by the package. The first shiny app evaluates trends in SWMP data within and between sites using an interactive map. Trends between reserves can be viewed using the map, whereas trends at individual sites can be viewed by clicking on a map location. The data presented on the map were imported and processed using the import_local, qaqc, and aggregate functions. The second app provides graphical summaries of water quality, weather, or nutrient data at individual stations using the plot_summary function. Data were also processed with the import_local, qaqc, and aggregate functions.
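
For those curious about the processing behind the apps, here is a condensed sketch using the datasets bundled with the package; the by argument to aggreswmp reflects my reading of the help files:

# retrieve, organize, analyze with the bundled example data
library(SWMPr)
data(apadbwq)

# remove observations that fail QAQC checks
dat <- qaqc(apadbwq)

# aggregate the water quality record to monthly means
# the by argument is an assumption, see ?aggreswmp
aggreswmp(dat, by = 'months')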

SWMP trends map (click to access):

[Screenshot: SWMP trends map]

SWMP summary map (click to access):

[Screenshot: SWMP summary map]

SWMPr functions

Below is a brief description of each function in the SWMPr package. I’m currently working on a manuscript to describe use of the package in much greater detail. For now, please visit our website that introduced version 1.0.0 of the SWMPr package (check the modules tab).

Retrieve

all_params Retrieve up to 100 records starting with the most recent at a given station, all parameters. Wrapper to exportAllParamsXMLNew function on web services.
all_params_dtrng Retrieve records of all parameters within a given date range for a station. Optional argument for a single parameter. Maximum of 1000 records. Wrapper to exportAllParamsDateRangeXMLNew.
import_local Import files from a local path. The files must be in a specific format, specifically those returned from the CDMO using the zip downloads option for a reserve.
import_remote Import SWMP site data from a remote independent server. These files have been downloaded from CDMO, processed using functions in this package, and uploaded to an Amazon server for quicker import into R.
single_param Retrieve up to 100 records for a single parameter starting with the most recent at a given station. Wrapper to exportSingleParamXMLNew function on web services.

Organize

comb.swmpr Combines swmpr objects to a common time series using setstep, such as combining the weather, nutrients, and water quality data for a single station. Only different data types can be combined.
qaqc.swmpr Remove QAQC columns and remove data based on QAQC flag values for a swmpr object. Only applies if QAQC columns are present.
qaqcchk.swmpr View a summary of the number of observations in a swmpr object that are assigned to different QAQC flags used by CDMO. The output is used to inform further processing but is not used explicitly.
rem_reps.swmpr Remove replicate nutrient data that occur on the same day. The default is to average replicates.
setstep.swmpr Format data from a swmpr object to a continuous time series at a given timestep. The function is used in comb.swmpr and can also be used with individual stations.
subset.swmpr Subset by dates and/or columns for a swmpr object. This is a method passed to the generic ‘subset’ function provided in the base package.

Analyze

aggreswmp.swmpr Aggregate swmpr objects for different time periods – years, quarters, months, weeks, days, or hours. Aggregation function is user-supplied but defaults to mean.
aggremetab.swmpr Aggregate metabolism data from a swmpr object. This is primarily used within plot_metab but may be useful for simple summaries of raw daily data.
ecometab.swmpr Estimate ecosystem metabolism for a combined water quality and weather dataset using the open-water method.
decomp.swmpr Decompose a swmpr time series into trend, seasonal, and residual components. This is a simple wrapper to decompose. Decomposition of monthly or daily trends is possible.
decomp_cj.swmpr Decompose a swmpr time series into grandmean, annual, seasonal, and events components. This is a simple wrapper to decompTs in the wq package. Only monthly decomposition is possible.
hist.swmpr Plot a histogram for a swmpr object.
lines.swmpr Add lines to an existing swmpr plot.
na.approx.swmpr Linearly interpolate missing data (NA values) in a swmpr object. The maximum gap size that is interpolated is defined as a maximum number of records with missing data.
plot.swmpr Plot a univariate time series for a swmpr object. The parameter name must be specified.
plot_metab Plot ecosystem metabolism estimates after running ecometab on a swmpr object.
plot_summary Create summary plots of seasonal/annual trends and anomalies for a water quality or weather parameter.
smoother.swmpr Smooth swmpr objects with a moving window average. Window size and sides can be specified, passed to filter.

Miscellaneous

calcKL Estimate the reaeration coefficient for air-sea gas exchange. This is only used within the ecometab function.
map_reserve Create a map of all stations in a reserve using the ggmap package.
metab_day Identify the metabolic day for each approximate 24-hour period in an hourly time series. This is only used within the ecometab function.
param_names Returns column names as a list for the parameter type(s) (nutrients, weather, or water quality). Includes QAQC columns with ‘f_’ prefix. Used internally in other functions.
parser Parses html returned from CDMO web services, used internally in retrieval functions.
site_codes Metadata for all stations, wrapper to exportStationCodesXMLNew function on web services.
site_codes_ind Metadata for all stations at a single site, wrapper to NERRFilterStationCodesXMLNew function on web services.
swmpr Creates object of swmpr class, used internally in retrieval functions.
time_vec Converts time vectors to POSIX objects with correct time zone for a site/station, used internally in retrieval functions.

Cheers,

Marcus

NeuralNetTools 1.0.0 now on CRAN

After successfully navigating the perilous path of CRAN submission, I’m pleased to announce that NeuralNetTools is now available!  From the description file, the package provides visualization and analysis tools to aid in the interpretation of neural networks, including functions for plotting, variable importance, and sensitivity analyses. I’ve written at length about each of these functions (see here, here, and here), so I’ll only provide an overview in this post. Most of these functions have remained unchanged since I initially described them, with one important change for the Garson function. Rather than reporting variable importance as -1 to 1 for each variable, I’ve returned to the original method that reports importance as 0 to 1. I was getting inconsistent results after toying around with some additional examples and decided the original method was a safer approach for the package. The modified version can still be installed from my GitHub gist. The development version of the package is also available on GitHub. Please use the development page to report issues.

The package is fairly small but I think the functions that have been included can help immensely in evaluating neural network results. The main functions include:

  • plotnet: Plot a neural interpretation diagram for a neural network object, original blog post here
    # install, load package
    install.packages('NeuralNetTools')
    library(NeuralNetTools)
    
    # create model
    library(neuralnet)
    AND <- c(rep(0, 7), 1)
    OR <- c(0, rep(1, 7))
    binary_data <- data.frame(expand.grid(c(0, 1), c(0, 1), c(0, 1)), AND, OR)
    mod <- neuralnet(AND + OR ~ Var1 + Var2 + Var3, binary_data,
                     hidden = c(6, 12, 8), rep = 10, err.fct = 'ce', linear.output = FALSE)
    
    # plotnet
    par(mar = numeric(4), family = 'serif')
    plotnet(mod, alpha = 0.6)
    
    Fig: Using the plotnet function.

  • garson: Relative importance of input variables in neural networks using Garson’s algorithm, original blog post here
    # create model
    library(RSNNS)
    data(neuraldat)
    x <- neuraldat[, c('X1', 'X2', 'X3')]
    y <- neuraldat[, 'Y1']
    mod <- mlp(x, y, size = 5)
    
    # garson
    garson(mod, 'Y1')
    
    Fig: Using the garson function.

  • lekprofile: Conduct a sensitivity analysis of model responses in a neural network to input variables using Lek’s profile method, original blog post here
    # create model
    library(nnet)
    data(neuraldat)
    mod <- nnet(Y1 ~ X1 + X2 + X3, data = neuraldat, size = 5)
    
    # lekprofile
    lekprofile(mod)
    
    Fig: Using the lekprofile function.

A few other functions are available that are helpers to the main functions. See the documentation for a full list.

All the functions have S3 methods for most of the neural network classes available in R, making them quite flexible. This includes methods for nnet models from the nnet package, mlp models from the RSNNS package, nn models from the neuralnet package, and train models from the caret package. The functions also have methods for numeric vectors if the user prefers inputting raw weight vectors for each function, as for neural network models created outside of R.
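
The numeric methods deserve a quick illustration since a bare weight vector carries no information about network architecture. My understanding is that the structure must be supplied separately through a struct argument; that argument name is my reading of the documentation, so treat this as a sketch:

# plot a network from a raw weight vector, e.g., a model fit outside of R
# struct gives the number of nodes per layer: 2 inputs, 2 hidden, 1 output
# a c(2, 2, 1) network has (2 + 1) * 2 + (2 + 1) * 1 = 9 weights including biases
wts <- runif(9, min = -1, max = 1)
plotnet(wts, struct = c(2, 2, 1))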

Huge thanks to Hadley Wickham for his packages that have helped immensely with this process, namely devtools and roxygen2. I also relied extensively on his new web book for package development. Any feedback regarding NeuralNetTools or its further development is appreciated!

Cheers,

Marcus

Back to square one – R and RStudio installation

I remember my first experience installing R. Basic installation can be humbling for someone not familiar with mirror networks or file binaries. I remember not knowing the difference between base and contrib… which one to select? The concept of CRAN and mirrors was also new to me. Which location do I choose and are they all the same? What the hell is a tar ball?? Simple challenges like these can be discouraging to first-time users who have never experienced the world of open-source software. Although these challenges seem silly now, they were very real at the time. Additionally, help documentation is not readily accessible to the novice. This month I decided to step back and present a simple guide to installing R and RStudio. Surprisingly, a quick Google search was unable to locate comparable guides. I realize that most people don’t have any problem installing R, but I can remember a time when step-by-step installation instructions would have been very appreciated. Also, I made this guide for a workshop and I’m presenting it here so I don’t have to create a different blog post for this month… I am lazy. Files for creating the guide are available here.


Cheers,

Marcus

Some love for ggplot2

With all the recent buzz about ggvis (this, this, and this), it’s often easy to forget all that ggplot2 offers as a graphics package. True, ggplot2 is a static approach to graphing, unlike ggvis, but it has fundamentally changed the way we think about plots in R. I recently spent some time thinking about some of the more useful features of ggplot2 to answer the question ‘what does ggplot2 offer that one can’t do with the base graphics functions?’ First-time users of ggplot2 are often confused by the syntax, yet it is precisely this syntax, built on the philosophy of the grammar of graphics, that makes ggplot2 so powerful. Adding content layers to mapped objects is central to this idea, which allows linking of map aesthetics through a logical framework. Additionally, several packages have been developed around this philosophy to extend the functionality of ggplot2 in alternative applications (e.g., ggmap, GGally, ggthemes).
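
The layering idea is easiest to see in code. A quick example with the built-in mtcars data, where the aesthetics are mapped once and each additional layer builds on them:

library(ggplot2)

# map aesthetics once, then add layers, facets, and a theme
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~ gear) +
  theme_bw()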

I recently gave a presentation to describe some of my favorite features of ggplot2 and other packages building on its core concepts. I describe the use of facets for multi-panel plots, default and custom themes, ggmap for spatial mapping with ggplot2, and GGally for generalized pairs plots. Although this is certainly a subjective and incomplete list, my workflows have become much more efficient (and enjoyable) by using these tools. Below is a link to the presentation. Note that this will not load using Internet Explorer, and you may have to reload in Chrome to get the complete slide deck. This is my first time hosting a Slidify presentation on RPubs, so please bear with me. The presentation materials are also available on GitHub.

ggplot2 redux.

What are some of your favorite features of ggplot2??

Cheers,

Marcus