The fluxfinder
package provides functions to parse
static-chamber greenhouse gas measurement files generated by a variety
of instruments; compute flux rates using multi-observation metadata; and
generate diagnostic metrics and plots. It’s designed to be easy to
integrate into scientific workflows.
library(fluxfinder)
# Data from a LI-7810
f <- system.file("extdata/TG10-01087.data", package = "fluxfinder")
dat <- ffi_read_LI7810(f)
#> TG10-01087.data: read 507 rows of TG10-01087 data, 2022-10-27 10:35:42 to 2022-10-27 10:44:08 EST
# Note that the fluxfinder read functions print some info after reading
# Set "options(fluxfinder.quiet = TRUE)" to suppress such messages
# Look at a subset of the data; the full data frame has 500+ rows and 25 columns
dat[1:6, 1:9]
#> DATAH SECONDS NANOSECONDS NDX DIAG REMARK H2O CO2 CH4
#> 1 DATA 1666884942 313442945 4509 0 NA 12500.35 458.8612 2068.000
#> 2 DATA 1666884943 313442945 4513 0 NA 12449.87 458.1066 2069.830
#> 3 DATA 1666884944 313442945 4517 0 NA 12418.81 458.7320 2071.540
#> 4 DATA 1666884945 313442945 4521 0 NA 12429.65 458.8037 2071.960
#> 5 DATA 1666884946 313442945 4525 0 NA 12439.89 458.2824 2069.660
#> 6 DATA 1666884947 313442945 4529 0 NA 12429.37 456.9340 2066.544
The data frame returned by ffi_read_LI7810
is all data
from the raw LI-7810
file, except that TIMESTAMP
, TZ
(time zone of
the timestamps), SN
(serial number), and MODEL
columns have been added.
The analyzer data is basically a stream of measured greenhouse gas concentrations:
For these data to be useful, we need to associate them with metadata about the measurements: when they were started, how long they lasted, plot/treatment/collar information, etc.
# Accompanying metadata
md <- system.file("extdata/TG10-01087-metadata.csv", package = "fluxfinder")
metadat <- read.csv(md)
print(metadat)
#> Date Start_time Plot Obs_length
#> 1 2022-10-27 10:35:30 A 60
#> 2 2022-10-27 10:37:15 B 60
#> 3 2022-10-27 10:39:00 C 60
#> 4 2022-10-27 10:40:30 D 60
#> 5 2022-10-27 10:42:00 E 60
#> 6 2022-10-27 10:43:30 F 60
#> 7 2022-10-27 11:00:00 G 60
Important note: in this sample metadata, our
measurement identified is labeled Plot
, but this could be
named, and refer to, anything: bottle, sample, collar, etc. It’s simply
an identifier for this measurement, i.e. this row.
The ffi_metadata_match
function matches up the data with
metadata, using the TIMESTAMP
column that
ffi_read_LI7810
helpfully created when it read the data
file.
dat$metadat_row <- ffi_metadata_match(
data_timestamps = dat$TIMESTAMP,
start_dates = metadat$Date,
start_times = metadat$Start_time,
obs_lengths = metadat$Obs_length + 10) # 10 is expected dead band length
#> 1 entry had no timestamp matches!
# Note that ffi_metadata_match() warns us that one metadata row didn't match any data
# Based on the row match information, add a "Plot" column to the data
dat$Plot <- metadat$Plot[dat$metadat_row]
metadat$metadat_row <- seq_len(nrow(metadat))
# ...and plot
p <- ggplot(dat, aes(TIMESTAMP, CO2, color = Plot)) + geom_point()
print(p)
Some of these are clearly not correct–the measurement time seems to be shorter then 60 seconds for the C, D, and E plots:
In real life we’d want to correct the faulty metadata at its source. Here, we’ll just change the values programmatically and re-match:
metadat$Obs_length[3:5] <- c(30, 45, 45)
dat$metadat_row <- ffi_metadata_match(
data_timestamps = dat$TIMESTAMP,
start_dates = metadat$Date,
start_times = metadat$Start_time,
obs_lengths = metadat$Obs_length + 10)
#> 1 entry had no timestamp matches!
dat$Plot <- metadat$Plot[dat$metadat_row]
p %+% dat
That looks better!
We’d like our final units to be in µmol/m2/s, and so need to do some
unit conversion. (This can happen either before or after flux
computation, below.) The package provides ffi_ppm_to_umol()
and ffi_ppb_to_nmol()
functions that perform this
conversion using the Ideal Gas
Law.
dat$CO2_umol <- ffi_ppm_to_umol(dat$CO2,
volume = 0.1, # m3
temp = 24) # degrees C
#> Assuming atm = 101325 Pa
#> Using R = 8.31446261815324 m3 Pa K-1 mol-1
# See the message: because we didn't provide the 'atm' parameter,
# ffi_ppm_to_umol assumed standard pressure.
# Also normalize by ground area (0.16 m2 in this example)
dat$CO2_umol_m2 <- dat$CO2_umol / 0.16
Note that in the example above we’re using a constant system
volume and measurement ground area. If that’s not the case,
there should be a column in the metadata providing the changing values
(e.g. giving volume in m3) for each measurement. Then after
calling ffi_metadata_match()
, merge the data and metadata
and pass the appropriate column to ffi_ppm_to_umol()
.
Here’s an example:
# Let's say volume varies by measurement; this can happen if the chamber
# height changes depending on the ground vegetation in each plot
metadat$Volume <- c(0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.1)
# Merge the data and metadata
dat_changing_vol <- merge(dat, metadat[c("Plot", "Volume")], by = "Plot", all.x = TRUE)
# Unit conversion as above, but using the changing volume information:
dat_changing_vol$CO2_umol <- ffi_ppm_to_umol(dat_changing_vol$CO2,
volume = dat_changing_vol$Volume,
temp = 24)
#> Assuming atm = 101325 Pa
#> Using R = 8.31446261815324 m3 Pa K-1 mol-1
# We still have constant ground area in this example
dat_changing_vol$CO2_umol_m2 <- dat_changing_vol$CO2_umol / 0.16
# Relative to the previous constant-volume example, our area-normalized
# amounts (µmol) have now increased for plots B and E because
# of their larger volumes:
aggregate(CO2_umol_m2 ~ Plot, data = dat, FUN = mean)
#> Plot CO2_umol_m2
#> 1 A 11855.81
#> 2 B 11908.15
#> 3 C 11787.88
#> 4 D 11776.09
#> 5 E 11974.30
#> 6 F 11976.25
aggregate(CO2_umol_m2 ~ Plot, data = dat_changing_vol, FUN = mean)
#> Plot CO2_umol_m2
#> 1 A 11855.81
#> 2 B 23816.30
#> 3 C 11787.88
#> 4 D 11776.09
#> 5 E 35922.89
#> 6 F 11976.25
The ffi_compute_fluxes
function provides a
general-purpose tool for computing fluxes from concentration time
series, as well as associated QA/QC information. It returns statistics
for four types of models: linear, robust linear,
polynomial, and HM81, an exponential model derived
from diffusion theory, following Hutchinson
and Mosier (1981).
Model statistics include Akaike information criterion
(AIC
), R squared (r.squared
), standard error
of the residuals (sigma
), and model p-value
(p.value
). For the robust linear regression only, a logical
value converged
is included; see the documentation for
MASS::rlm()
.
Flux (slope) statistics estimate and std.error;
For the robust linear regression model only, a logical value converged.
fluxes <- ffi_compute_fluxes(dat,
group_column = "Plot",
time_column = "TIMESTAMP",
gas_column = "CO2_umol_m2",
dead_band = 10)
#> NOTE: HM81_flux.estimate is not NA, implying nonlinear data
#> NOTE: HM81_flux.estimate is not NA, implying nonlinear data
#> NOTE: HM81_flux.estimate is not NA, implying nonlinear data
# By default, ffi_compute_fluxes returns a data.frame with one row per
# grouping variable value (i.e., per measurement). The first column is the
# group label; the second is the average value of the `time_column`;
# and the rest of the columns are fit statistics for a linear fit of
# concentration as a function of time, along with information about polynomial
# and robust-linear fits. See ?ffi_compute_fluxes for more details.
names(fluxes)
#> [1] "Plot" "TIMESTAMP" "HM81_AIC"
#> [4] "HM81_RMSE" "HM81_flux.estimate" "HM81_p.value"
#> [7] "HM81_r.squared" "lin_AIC" "lin_RMSE"
#> [10] "lin_flux.estimate" "lin_flux.std.error" "lin_int.estimate"
#> [13] "lin_int.std.error" "lin_p.value" "lin_r.squared"
#> [16] "poly_AIC" "poly_RMSE" "poly_r.squared"
#> [19] "rob_AIC" "rob_RMSE" "rob_converged"
#> [22] "rob_flux.estimate" "rob_flux.std.error" "TIMESTAMP_min"
#> [25] "TIMESTAMP_max"
# For clarity, print out only a subset of the columns
fluxes[c("Plot", "TIMESTAMP", "lin_r.squared", "lin_flux.estimate", "HM81_flux.estimate")]
#> Plot TIMESTAMP lin_r.squared lin_flux.estimate HM81_flux.estimate
#> 1 A 2022-10-27 10:36:15 0.9486012 4.862042 4.794940
#> 2 B 2022-10-27 10:37:54 0.9467074 4.024164 NA
#> 3 C 2022-10-27 10:39:24 0.6002940 3.967315 4.483758
#> 4 D 2022-10-27 10:41:02 0.9560099 6.463719 NA
#> 5 E 2022-10-27 10:42:32 0.9825172 6.680752 NA
#> 6 F 2022-10-27 10:43:54 0.9512708 7.231148 6.261085
Note that the fluxes
extract printed above has one row
per Plot
, the grouping variable; the mean
TIMESTAMP
of the group; model statistics such as
lin_r.squared
; and the flux estimate. The final column,
HM81_flux.estimate
is only numeric (i.e., not
NA
) when the data show evidence of a saturating curvature.
So in this case we might want to examine more carefully the data from
plots A, C, and F.
Plotting our computed fluxes:
ggplot(fluxes, aes(Plot, lin_flux.estimate, color = lin_r.squared)) +
geom_point() +
geom_linerange(aes(ymin = lin_flux.estimate - lin_flux.std.error,
ymax = lin_flux.estimate + lin_flux.std.error)) +
ylab("CO2 flux (µmol/m2/s)")
We might want to check whether the robust-linear slope diverges from the linear fit slope, suggesting influential outliers, or whether the polynomial R2 is much larger, potentially indicating curvature of the observations due to e.g. diffusion limitations.
ggplot(fluxes, aes(lin_flux.estimate, rob_flux.estimate, color = Plot)) +
geom_point() + geom_abline() + theme(legend.position = "none")
ggplot(fluxes, aes(lin_r.squared, poly_r.squared, color = Plot)) +
geom_point() + geom_abline() + theme(legend.position="none")
The plot C (green) data have more scatter, and thus lower
R2 values and higher uncertainty on the computed flux, but
there’s no strong evidence of nonlinearity or outlier problems (although
see note above about the HM81_estimate
field).
In our experience, static-chamber fluxes are frequently linear (Forbrich et al. 2010) even though molecular diffusion means that chamber feedbacks will inevitably lead to curvilinear (saturating) behavior over time (Pedersen et al. 2010).
Still, how does one use fluxfinder
to diagnose and work
with nonlinear data?
Let’s use R’s built-in Puromycin
dataset (simply because
it exhibits saturating behavior; it’s unrelated to greenhouse gases) as
an example:
ggplot(Puromycin, aes(conc, rate)) + geom_point() + geom_smooth(method = "lm")
#> `geom_smooth()` using formula = 'y ~ x'
Visually, we can see that a linear model will not be appropriate in this case.
ffi_compute_fluxes(Puromycin,
group_column = NULL,
time_column = "conc",
gas_column = "rate")
#> NOTE: HM81_flux.estimate is not NA, implying nonlinear data
#> conc HM81_AIC HM81_RMSE HM81_flux.estimate HM81_p.value HM81_r.squared
#> 1 0.3121739 200.9997 17.56084 246.1399 9.406055e-11 0.8696062
#> lin_AIC lin_RMSE lin_flux.estimate lin_flux.std.error lin_int.estimate
#> 1 223.7834 28.81711 105.398 16.9191 96.03154
#> lin_int.std.error lin_p.value lin_r.squared poly_AIC poly_RMSE
#> 1 7.780867 3.525876e-06 0.6488706 205.746 18.76394
#> poly_r.squared rob_AIC rob_RMSE rob_converged rob_flux.estimate
#> 1 0.8653059 223.8119 24.79301 TRUE 106.2137
#> rob_flux.std.error conc_min conc_max
#> 1 20.65072 0.02 1.1
From the diagnostics returned by ffi_compute_fluxes
:
HM81_flux.estimate
is not NA
, which only
occurs with saturating behavior;lin_AIC
(223.783) and rob_AIC
(223.812) Akaike
information criterion values are similar, so no indication of
influential outliers;lin_r.squared
(0.649) and
poly_r.squared
(0.865) values are very different,
suggesting a failure of the linear model;HM81_r.squared
(0.87) and HM81_AIC
(201) are considerably higher and lower, respectively, than the linear
model.All of these metrics point to a common conclusion: a linear model is
not appropriate for these data. If these were real data, we
should use the HM81_flux.estimate
value as our flux
estimate.
This vignette covered fluxfinder
basics: loading data
and metadata, matching the two, performing unit conversion, computing
fluxes, and some basic QA/QC. The test data we worked above could be fit
well by linear model, but for many reasons this might not always be
true; see the vignette on integrating with the gasfluxes
package for guidance on using more sophisticated model-fitting
routines.