
Introducing read_waterdata_statistics_*
Source:vignettes/daily_data_statistics.Rmd
daily_data_statistics.RmdThe read_waterdata_stats_por and
read_waterdata_stats_daterange functions replace the legacy
readNWISstat function. This replacement is necessary
because the legacy API service that readNWISstat will be
decommissioned and replaced with a modernized
API. This new API has two available endpoints,
observationNormals and observationIntervals,
that appear similar at first yet have important differences we want to
highlight here.
library(dataRetrieval)
site1 <- "USGS-05428500"Fetching day-of-year and month-of-year statistics
Day-of-year and month-of-year statistics aggregate observations for
the same calendar day or month across multiple years to describe typical
seasonal conditions (e.g., all Januarys or all January 1sts). Consider
the output below, where we request day-of-year discharge averages for
January 1 and January 2. Note that the start_date and
end_date are set in month-year format to
describe the day-of-year range.
jan_por_mean <-
read_waterdata_stats_por(
monitoring_location_id = site1,
parameter_code = "00060",
computation = "arithmetic_mean",
start_date = "01-01",
end_date = "01-02"
)
jan_por_mean
#> Simple feature collection with 3 features and 18 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> parameter_code unit_of_measure parent_time_series_id time_of_year
#> 1 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 01-01
#> 2 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 01-02
#> 3 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 01
#> time_of_year_type value sample_count approval_status
#> 1 day_of_year 159.527 22 approved
#> 2 day_of_year 156.714 22 approved
#> 3 month_of_year 151.700 22 approved
#> computation_id computation percentile
#> 1 11df8572-3584-4e4a-8c6d-33cca710b7ae arithmetic_mean NA
#> 2 7708f0da-b51e-4d7e-8aa8-40ff0c84447b arithmetic_mean NA
#> 3 d5ba7ec8-a41c-4d05-907c-5369cf900e79 arithmetic_mean NA
#> monitoring_location_id monitoring_location_name
#> 1 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> site_type site_type_code country_code state_code county_code
#> 1 Stream ST US 55 025
#> 2 Stream ST US 55 025
#> 3 Stream ST US 55 025
#> geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)The first two rows show the average discharge values, aggregated
across all January 1sts and 2nds in the site’s Period of Record (POR).
But wait: what’s in that third row? Looking at the
time_of_year and time_of_year_type columns, we
see the third row represents the average discharge value aggregated
across all Januarys. This illustrates one quirk of the modern
statistics API: any time the start_date to
end_date range overlaps with the first day of a month
(e.g., "01-01"), we will get month-of-year as well as the
day-of-year summary statistics. You can filter these rows out of the
data if you don’t want them in downstream analyses:
jan_por_mean[jan_por_mean$time_of_year_type != "month_of_year",]
#> Simple feature collection with 2 features and 18 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> parameter_code unit_of_measure parent_time_series_id time_of_year
#> 1 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 01-01
#> 2 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 01-02
#> time_of_year_type value sample_count approval_status
#> 1 day_of_year 159.527 22 approved
#> 2 day_of_year 156.714 22 approved
#> computation_id computation percentile
#> 1 11df8572-3584-4e4a-8c6d-33cca710b7ae arithmetic_mean NA
#> 2 7708f0da-b51e-4d7e-8aa8-40ff0c84447b arithmetic_mean NA
#> monitoring_location_id monitoring_location_name
#> 1 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> site_type site_type_code country_code state_code county_code
#> 1 Stream ST US 55 025
#> 2 Stream ST US 55 025
#> geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)Before we go, let’s look at an example that illustrates the benefits of the statistics API. In the example below, we pull all day-of-year discharge percentiles for our site. Keep in mind that doing so without the statistics API would require us to download the entire daily period of record for this site and hand-compute these percentiles ourselves, a time- and resource-intensive process indeed.
For demonstration, we filter to the output to the January 1 day-of-year percentiles, which include a set of percentiles commonly used on WDFN webpages (e.g., Wisconsin water conditions).
library(dplyr)
library(tidyr)
library(ggplot2)
full_por_percentiles <-
read_waterdata_stats_por(
monitoring_location_id = site1,
parameter_code = "00060",
computation = c("minimum", "maximum", "median", "percentile"),
start_date = "01-01",
end_date = "12-31"
)
full_por_percentiles |>
filter(time_of_year == "01-01" & time_of_year_type == "day_of_year") |>
arrange(percentile) |>
select(
time_of_year, computation, percentile, value
)
#> Simple feature collection with 10 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> time_of_year computation percentile value geometry
#> 1 01-01 minimum 0 55.50 POINT (-89.36083 43.08944)
#> 2 01-01 percentile 5 56.46 POINT (-89.36083 43.08944)
#> 3 01-01 percentile 10 69.43 POINT (-89.36083 43.08944)
#> 4 01-01 percentile 25 104.75 POINT (-89.36083 43.08944)
#> 5 01-01 median 50 142.00 POINT (-89.36083 43.08944)
#> 6 01-01 percentile 50 142.00 POINT (-89.36083 43.08944)
#> 7 01-01 percentile 75 238.00 POINT (-89.36083 43.08944)
#> 8 01-01 percentile 90 268.90 POINT (-89.36083 43.08944)
#> 9 01-01 percentile 95 272.70 POINT (-89.36083 43.08944)
#> 10 01-01 maximum 100 273.00 POINT (-89.36083 43.08944)After a bit of data manipulation, we can then visualize the percentiles as “ribbons” on a plot. The final visual shows the percentile bands as progressively darker ribbons, where the minima and maxima are shown as thin dashed curves and the median values as a solid gray curve.
doy_perc_bands_plt <-
full_por_percentiles |>
sf::st_drop_geometry() |>
dplyr::filter(time_of_year_type == "day_of_year") |>
select(time_of_year, percentile, value) |>
distinct(time_of_year, percentile, .keep_all = TRUE) |>
mutate(time_of_year = as.Date(time_of_year, format = "%m-%d")) |>
pivot_wider(names_from = percentile, values_from = value) |>
ggplot(aes(x = time_of_year)) +
geom_ribbon(aes(ymin = `95`, ymax = `100`), fill = "#292f6b") +
geom_ribbon(aes(ymin = `90`, ymax = `95`), fill = "#5699c0") +
geom_ribbon(aes(ymin = `75`, ymax = `90`), fill = "#aacee0") +
geom_ribbon(aes(ymin = `25`, ymax = `75`), fill = "#e9e9e9") +
geom_ribbon(aes(ymin = `10`, ymax = `25`), fill = "#ebd6ab") +
geom_ribbon(aes(ymin = `5`, ymax = `10`), fill = "#dcb668") +
geom_ribbon(aes(ymin = `0`, ymax = `5`), fill = "#8f4f1f") +
geom_line(aes(y = `50`), linewidth = .2, color = "black") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
labs(
x = "Month–day",
y = "Value",
title = "Day-of-year percentile bands"
) +
theme_minimal()
doy_perc_bands_plt
Finally, let’s overlay a specific year’s daily mean data onto the plot:
daily_2024 <- read_waterdata_daily(
monitoring_location_id = site1,
parameter_code = "00060",
statistic_id = "00003",
time = "2024-01-01/2024-12-31") |>
mutate(time_of_year = as.Date(format(time, "%m-%d"), format = "%m-%d"))
doy_perc_bands_plt +
geom_line(data = daily_2024, aes(y = value), linewidth = 1) +
labs(
title = "2024 daily discharge + percentile bands"
)
Fetching monthly and annual statistics within a date range
Unlike day- or month-of-year statistics, which summarize typical
seasonal conditions across many years, the
read_waterdata_stats_daterange() function returns summaries
for specific months and years within a requested date range. Consider
the example below where we fetch the average discharge value for the
month of January, 2024. Notice that the start_date and
end_date arguments are given in YYYY-MM-DD
format.
jan_daterange_mean <-
read_waterdata_stats_daterange(
monitoring_location_id = site1,
parameter_code = "00060",
computation = "arithmetic_mean",
start_date = "2024-01-01",
end_date = "2024-01-31"
)
jan_daterange_mean
#> Simple feature collection with 3 features and 19 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> parameter_code unit_of_measure parent_time_series_id start_date
#> 1 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 2 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 3 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#> end_date interval_type value sample_count approval_status
#> 1 2024-01-31 month 112.323 31 approved
#> 2 2024-12-31 calendar_year 215.382 366 approved
#> 3 2024-09-30 water_year 197.312 366 approved
#> computation_id computation percentile
#> 1 38fdd032-9573-4c2e-b28e-385943c12834 arithmetic_mean NA
#> 2 11df75ea-2902-4791-875e-270ea20c0445 arithmetic_mean NA
#> 3 49b24e8c-5880-403f-92b5-92d26c7de734 arithmetic_mean NA
#> monitoring_location_id monitoring_location_name
#> 1 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> site_type site_type_code country_code state_code county_code
#> 1 Stream ST US 55 025
#> 2 Stream ST US 55 025
#> 3 Stream ST US 55 025
#> geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)Instead of time_of_year and
time_of_year_type columns, this output contains
start_date, end_date, and
interval_type columns representing the daterange over which
the average was calculated. The first row shows the average January,
2025 discharge was about 219 cubic feet per second. We again have extra
rows: the second row contains the calendar year 2025
average and the third contains the water year 2025
average.
Annual statistics will be returned for any calendar/water years than
intersect with the specified date range. Consider the example below,
where the start_date to end_date range is only
93 days yet happens to intersect with calendar and
water years 2023 and 2024. We see this reflected in output containing
monthly averages for September, 2023 through January, 2024 as well as
2023 and 2024 calendar and water year averages.
multiyear_daterange_mean <-
read_waterdata_stats_daterange(
monitoring_location_id = site1,
parameter_code = "00060",
computation = "arithmetic_mean",
start_date = "2023-09-30",
end_date = "2024-01-01"
)
multiyear_daterange_mean
#> Simple feature collection with 9 features and 19 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> parameter_code unit_of_measure parent_time_series_id start_date
#> 1 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-09-01
#> 2 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#> 3 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-11-01
#> 4 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-12-01
#> 5 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 6 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-01-01
#> 7 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 8 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2022-10-01
#> 9 00060 ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#> end_date interval_type value sample_count approval_status
#> 1 2023-09-30 month 61.873 30 approved
#> 2 2023-10-31 month 86.339 31 approved
#> 3 2023-11-30 month 210.767 30 approved
#> 4 2023-12-31 month 222.806 31 approved
#> 5 2024-01-31 month 112.323 31 approved
#> 6 2023-12-31 calendar_year 147.649 365 approved
#> 7 2024-12-31 calendar_year 215.382 366 approved
#> 8 2023-09-30 water_year 156.417 365 approved
#> 9 2024-09-30 water_year 197.312 366 approved
#> computation_id computation percentile
#> 1 63491354-f0d6-4c1a-a5bb-50973ab0003d arithmetic_mean NA
#> 2 bb45e0c9-34a1-44ea-85f6-34f2132b8311 arithmetic_mean NA
#> 3 0f17b134-6bf2-40be-bba7-3479b48f8b15 arithmetic_mean NA
#> 4 1bfc9ae0-036c-45f0-b8ea-b5f7203a8121 arithmetic_mean NA
#> 5 38fdd032-9573-4c2e-b28e-385943c12834 arithmetic_mean NA
#> 6 b14cfc65-635e-4eff-bba1-9145ccd8e67b arithmetic_mean NA
#> 7 11df75ea-2902-4791-875e-270ea20c0445 arithmetic_mean NA
#> 8 d42c25dc-bb81-4752-98ed-0381d633da9d arithmetic_mean NA
#> 9 49b24e8c-5880-403f-92b5-92d26c7de734 arithmetic_mean NA
#> monitoring_location_id monitoring_location_name
#> 1 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 4 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 5 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 6 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 7 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 8 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 9 USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> site_type site_type_code country_code state_code county_code
#> 1 Stream ST US 55 025
#> 2 Stream ST US 55 025
#> 3 Stream ST US 55 025
#> 4 Stream ST US 55 025
#> 5 Stream ST US 55 025
#> 6 Stream ST US 55 025
#> 7 Stream ST US 55 025
#> 8 Stream ST US 55 025
#> 9 Stream ST US 55 025
#> geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)
#> 4 POINT (-89.36083 43.08944)
#> 5 POINT (-89.36083 43.08944)
#> 6 POINT (-89.36083 43.08944)
#> 7 POINT (-89.36083 43.08944)
#> 8 POINT (-89.36083 43.08944)
#> 9 POINT (-89.36083 43.08944)Before we move on, consider the following example where we create a Monthly mean statistics table similar to what you’d find in the Water Year Summaries.
monthly_means <-
read_waterdata_stats_daterange(
monitoring_location_id = site1,
parameter_code = "00060",
computation = "arithmetic_mean"
) |>
dplyr::filter(interval_type == "month") |>
sf::st_drop_geometry()
monthly_means |>
# filter(start_date >= "2004-10-01" & start_date < "2025-09-01") |>
mutate(
Month = lubridate::month(start_date, label = TRUE),
# reorder based on WY
Month = factor(Month, month.abb[c(10:12, 1:9)]),
water_year = lubridate::year(as.Date(start_date) + months(3))
) |>
arrange(Month) |>
group_by(Month) |>
summarize(
`Mean` = round(mean(value), 0),
`Max (WY)` = paste0(
round(max(value), 0),
" (",
water_year[which.max(value)],
")"
),
`Min (WY)` = paste0(
round(min(value), 0),
" (",
water_year[which.min(value)],
")"
),
.groups = "drop"
) |>
t() |>
as.data.frame() |>
knitr::kable(
col.names = NULL
)| Month | Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep |
| Mean | 171 | 263 | 213 | 152 | 138 | 151 | 205 | 210 | 161 | 161 | 128 | 120 |
| Max (WY) | 444 (2020) | 467 (2020) | 327 (2020) | 257 (2020) | 254 (2019) | 390 (2019) | 393 (2019) | 383 (2008) | 449 (2008) | 457 (2024) | 304 (2024) | 377 (2018) |
| Min (WY) | 8 (2006) | 171 (2013) | 116 (2004) | 47 (2006) | 33 (2006) | 39 (2022) | 42 (2015) | 113 (2015) | 19 (2023) | 19 (2005) | 21 (2005) | 13 (2005) |
Statistics API quirks
The sample_count column indicates that there were 22
observations used to compute these averages, suggesting the site’s
period of record is (at least) 22 years long. We can verify this using
the timeseries-metadata API endpoint, passing in the “parent” timeseries
ID used to compute the mean:
read_waterdata_ts_meta(time_series_id = unique(jan_por_mean$parent_time_series_id))
#> Requesting:
#> https://api.waterdata.usgs.gov/ogcapi/v0/collections/time-series-metadata/items?f=json&lang=en-US&limit=50000&id=d499cef948c547a58bec047dcf3e330b
#> Remaining requests this hour:2791
#> Simple feature collection with 1 feature and 21 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS: WGS 84
#> # A tibble: 1 × 22
#> unit_of_measure parameter_name parameter_code statistic_id
#> <chr> <chr> <chr> <chr>
#> 1 ft^3/s Discharge 00060 00003
#> # ℹ 18 more variables: hydrologic_unit_code <chr>, state_name <chr>,
#> # last_modified <dttm>, begin <dttm>, end <dttm>, begin_utc <dttm>,
#> # end_utc <dttm>, computation_period_identifier <chr>,
#> # computation_identifier <chr>, thresholds <chr>,
#> # sublocation_identifier <chr>, primary <chr>, monitoring_location_id <chr>,
#> # web_description <chr>, parameter_description <chr>,
#> # parent_time_series_id <chr>, geometry <POINT [°]>, time_series_id <chr>From this output, we see the begin and end dates of the POR at indeed at least 22 years apart.