Increasingly there are high frequency sensor data available for water quality data. There’s a need to join the sensor and discrete data by the closest time.
This article will discuss how to do that using the data.table
package.
Let’s look at site “01646500”, and a nearby site with a real-time nitrate-plus-nitrite sensor. Our goal is to get the discharge and nitrate-plus-nitrite sensor data aligned with the discrete water quality samples.
library(dataRetrieval)
site_uv <- "01646500"
site_qw <- "USGS-01646580"
pcode_uv <- "99133"
pcode_qw <- "00631"
start_date <- as.Date("2018-01-01")
end_date <- as.Date("2020-01-01")
qw_data <- readWQPqw(site_qw, pcode_qw,
startDate = start_date,
endDate = end_date)
uv_data <- readNWISuv(siteNumbers = site_uv,
parameterCd = c(pcode_uv, "00060"),
startDate = start_date,
endDate = end_date)
The sensor data (“uv” data) at this particular site has 2 columns of data that are important. The first task is to combine those columns. This is rather unique to this particular site and probably won’t need to be done generally.
library(dplyr)
uv_trim <- uv_data |>
select(uv_date = dateTime,
val1 = X_SUNA...Discontinued._99133_00000,
val2 = X_SUNA_99133_00000,
flow = X_00060_00000) |>
mutate(val_uv = if_else(is.na(val1), val2, val1)) |>
select(-val1, -val2)
uv_date | flow | val_uv |
---|---|---|
2018-01-01 05:00:00 | 1940 | 1.43 |
2018-01-01 05:15:00 | 1940 | 1.43 |
2018-01-01 05:30:00 | 1850 | 1.43 |
2018-01-01 05:45:00 | 1850 | 1.43 |
2018-01-01 06:00:00 | 1850 | 1.43 |
2018-01-01 06:15:00 | 1810 | 1.43 |
Next we’ll clean up the discrete water quality “qw” data to make it easy to follow in this tutorial.
qw_trim <- qw_data |>
filter(ActivityTypeCode == "Sample-Routine",
!is.na(ActivityStartDateTime)) |>
select(qw_date = ActivityStartDateTime,
val_qw = ResultMeasureValue,
det_txt = ResultDetectionConditionText)
qw_date | val_qw | det_txt |
---|---|---|
2018-01-04 15:30:00 | 1.39 | NA |
2018-01-15 15:15:00 | 1.73 | NA |
2018-02-01 15:30:00 | 1.66 | NA |
2018-02-11 19:00:00 | 1.33 | NA |
2018-02-13 17:30:00 | 1.53 | NA |
2018-02-15 16:00:00 | 1.65 | NA |
Finally, we’ll use the data.table
package to do a join to the nearest date. The code to do that is here:
library(data.table)
setDT(qw_trim)[, join_date := qw_date]
setDT(uv_trim)[, join_date := uv_date]
closest_dt <- uv_trim[qw_trim, on = .(join_date), roll = "nearest"]
closest_dt
is a data.table
object. It similar to a data.frame, but not identical. We can convert it to a data.frame and then use dplyr
commands. Note: the whole analysis can be done via data.table
, but most examples in dataRetrieval
have used dplyr
, which is why we bring it back to data.frame. dplyr
also has a join_by(closest())
option, but it is more complicated because you can only specify the closeness in either the forward or backwards direction (and we want either direction).
We can calculate “delta_time” - the difference in time between the uv and qw data. We’ll probably want to add a threshold that we don’t join values if they are too far apart in time. In this example, if the difference is greater than 24 hours, we’ll substitute NA
.
qw_closest <- data.frame(closest_dt) |>
mutate(delta_time = difftime(qw_date, uv_date,
units = "hours"),
val_uv = if_else(abs(as.numeric(delta_time)) >= 24, NA, val_uv)) |>
select(-join_date)
uv_date | flow | val_uv | qw_date | val_qw | det_txt | delta_time |
---|---|---|---|---|---|---|
2018-01-04 15:30:00 | 1850 | 1.43 | 2018-01-04 15:30:00 | 1.39 | NA | 0 hours |
2018-01-15 15:15:00 | 8960 | 1.84 | 2018-01-15 15:15:00 | 1.73 | NA | 0 hours |
2018-02-01 15:30:00 | 6000 | 1.67 | 2018-02-01 15:30:00 | 1.66 | NA | 0 hours |
2018-02-11 19:00:00 | 20800 | 1.29 | 2018-02-11 19:00:00 | 1.33 | NA | 0 hours |
2018-02-13 17:30:00 | 30700 | 1.63 | 2018-02-13 17:30:00 | 1.53 | NA | 0 hours |
2018-02-15 16:00:00 | 25300 | 1.68 | 2018-02-15 16:00:00 | 1.65 | NA | 0 hours |
Here are a few plots to show the applications of these joins:
library(ggplot2)
ggplot(data = qw_closest) +
geom_point(aes(x = val_uv, y = val_qw)) +
theme_bw() +
xlab("Sensor") +
ylab("Discrete")
ggplot(data = qw_closest) +
geom_point(aes(x = flow, y = val_qw)) +
theme_bw() +
xlab("Discharge") +
ylab("Concentration") +
scale_x_log10() +
scale_y_log10()