header_tag.html

Skip to contents

The read_waterdata_stats_por and read_waterdata_stats_daterange functions replace the legacy readNWISstat function. This replacement is necessary because the legacy API service that readNWISstat will be decommissioned and replaced with a modernized API. This new API has two available endpoints, observationNormals and observationIntervals, that appear similar at first yet have important differences we want to highlight here.

library(dataRetrieval)

site1 <- "USGS-05428500"

Fetching day-of-year and month-of-year statistics

Day-of-year and month-of-year statistics aggregate observations for the same calendar day or month across multiple years to describe typical seasonal conditions (e.g., all Januarys or all January 1sts). Consider the output below, where we request day-of-year discharge averages for January 1 and January 2. Note that the start_date and end_date are set in month-year format to describe the day-of-year range.

jan_por_mean <-
  read_waterdata_stats_por(
  monitoring_location_id = site1,
  parameter_code = "00060",
  computation = "arithmetic_mean",
  start_date = "01-01",
  end_date = "01-02"
)

jan_por_mean
#> Simple feature collection with 3 features and 18 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#>   parameter_code unit_of_measure            parent_time_series_id time_of_year
#> 1          00060          ft^3/s d499cef948c547a58bec047dcf3e330b        01-01
#> 2          00060          ft^3/s d499cef948c547a58bec047dcf3e330b        01-02
#> 3          00060          ft^3/s d499cef948c547a58bec047dcf3e330b           01
#>   time_of_year_type   value sample_count approval_status
#> 1       day_of_year 159.527           22        approved
#> 2       day_of_year 156.714           22        approved
#> 3     month_of_year 151.700           22        approved
#>                         computation_id     computation percentile
#> 1 11df8572-3584-4e4a-8c6d-33cca710b7ae arithmetic_mean         NA
#> 2 7708f0da-b51e-4d7e-8aa8-40ff0c84447b arithmetic_mean         NA
#> 3 d5ba7ec8-a41c-4d05-907c-5369cf900e79 arithmetic_mean         NA
#>   monitoring_location_id                        monitoring_location_name
#> 1          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#>   site_type site_type_code country_code state_code county_code
#> 1    Stream             ST           US         55         025
#> 2    Stream             ST           US         55         025
#> 3    Stream             ST           US         55         025
#>                     geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)

The first two rows show the average discharge values, aggregated across all January 1sts and 2nds in the site’s Period of Record (POR). But wait: what’s in that third row? Looking at the time_of_year and time_of_year_type columns, we see the third row represents the average discharge value aggregated across all Januarys. This illustrates one quirk of the modern statistics API: any time the start_date to end_date range overlaps with the first day of a month (e.g., "01-01"), we will get month-of-year as well as the day-of-year summary statistics. You can filter these rows out of the data if you don’t want them in downstream analyses:

jan_por_mean[jan_por_mean$time_of_year_type != "month_of_year",]
#> Simple feature collection with 2 features and 18 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#>   parameter_code unit_of_measure            parent_time_series_id time_of_year
#> 1          00060          ft^3/s d499cef948c547a58bec047dcf3e330b        01-01
#> 2          00060          ft^3/s d499cef948c547a58bec047dcf3e330b        01-02
#>   time_of_year_type   value sample_count approval_status
#> 1       day_of_year 159.527           22        approved
#> 2       day_of_year 156.714           22        approved
#>                         computation_id     computation percentile
#> 1 11df8572-3584-4e4a-8c6d-33cca710b7ae arithmetic_mean         NA
#> 2 7708f0da-b51e-4d7e-8aa8-40ff0c84447b arithmetic_mean         NA
#>   monitoring_location_id                        monitoring_location_name
#> 1          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#>   site_type site_type_code country_code state_code county_code
#> 1    Stream             ST           US         55         025
#> 2    Stream             ST           US         55         025
#>                     geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)

Before we go, let’s look at an example that illustrates the benefits of the statistics API. In the example below, we pull all day-of-year discharge percentiles for our site. Keep in mind that doing so without the statistics API would require us to download the entire daily period of record for this site and hand-compute these percentiles ourselves, a time- and resource-intensive process indeed.

For demonstration, we filter to the output to the January 1 day-of-year percentiles, which include a set of percentiles commonly used on WDFN webpages (e.g., Wisconsin water conditions).

library(dplyr)
library(tidyr)
library(ggplot2)

full_por_percentiles <-
  read_waterdata_stats_por(
    monitoring_location_id = site1,
    parameter_code = "00060",
    computation = c("minimum", "maximum", "median", "percentile"),
    start_date = "01-01",
    end_date = "12-31"
  )

full_por_percentiles |>
  filter(time_of_year == "01-01" & time_of_year_type == "day_of_year") |>
  arrange(percentile) |>
  select(
    time_of_year, computation, percentile, value
  )
#> Simple feature collection with 10 features and 4 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#>    time_of_year computation percentile  value                   geometry
#> 1         01-01     minimum          0  55.50 POINT (-89.36083 43.08944)
#> 2         01-01  percentile          5  56.46 POINT (-89.36083 43.08944)
#> 3         01-01  percentile         10  69.43 POINT (-89.36083 43.08944)
#> 4         01-01  percentile         25 104.75 POINT (-89.36083 43.08944)
#> 5         01-01      median         50 142.00 POINT (-89.36083 43.08944)
#> 6         01-01  percentile         50 142.00 POINT (-89.36083 43.08944)
#> 7         01-01  percentile         75 238.00 POINT (-89.36083 43.08944)
#> 8         01-01  percentile         90 268.90 POINT (-89.36083 43.08944)
#> 9         01-01  percentile         95 272.70 POINT (-89.36083 43.08944)
#> 10        01-01     maximum        100 273.00 POINT (-89.36083 43.08944)

After a bit of data manipulation, we can then visualize the percentiles as “ribbons” on a plot. The final visual shows the percentile bands as progressively darker ribbons, where the minima and maxima are shown as thin dashed curves and the median values as a solid gray curve.

doy_perc_bands_plt <-
  full_por_percentiles |>
  sf::st_drop_geometry() |>
  dplyr::filter(time_of_year_type == "day_of_year") |>
  select(time_of_year, percentile, value) |>
  distinct(time_of_year, percentile, .keep_all = TRUE) |>
  mutate(time_of_year = as.Date(time_of_year, format = "%m-%d")) |>
  pivot_wider(names_from = percentile, values_from = value) |>
  ggplot(aes(x = time_of_year)) +
  geom_ribbon(aes(ymin = `95`, ymax = `100`), fill = "#292f6b") +
  geom_ribbon(aes(ymin = `90`, ymax = `95`), fill = "#5699c0") +
  geom_ribbon(aes(ymin = `75`, ymax = `90`), fill = "#aacee0") +
  geom_ribbon(aes(ymin = `25`, ymax = `75`), fill = "#e9e9e9") +
  geom_ribbon(aes(ymin = `10`, ymax = `25`), fill = "#ebd6ab") +
  geom_ribbon(aes(ymin = `5`, ymax = `10`), fill = "#dcb668") +
  geom_ribbon(aes(ymin = `0`, ymax = `5`), fill = "#8f4f1f") +
  geom_line(aes(y = `50`), linewidth = .2, color = "black") +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  labs(
    x = "Month–day",
    y = "Value",
    title = "Day-of-year percentile bands"
  ) +
  theme_minimal()

doy_perc_bands_plt

Finally, let’s overlay a specific year’s daily mean data onto the plot:

daily_2024 <- read_waterdata_daily(
  monitoring_location_id = site1,
  parameter_code = "00060",
  statistic_id = "00003", 
  time = "2024-01-01/2024-12-31") |>
  mutate(time_of_year = as.Date(format(time, "%m-%d"), format = "%m-%d"))

doy_perc_bands_plt +
  geom_line(data = daily_2024, aes(y = value), linewidth = 1) +
  labs(
    title = "2024 daily discharge + percentile bands"
  )

Fetching monthly and annual statistics within a date range

Unlike day- or month-of-year statistics, which summarize typical seasonal conditions across many years, the read_waterdata_stats_daterange() function returns summaries for specific months and years within a requested date range. Consider the example below where we fetch the average discharge value for the month of January, 2024. Notice that the start_date and end_date arguments are given in YYYY-MM-DD format.

jan_daterange_mean <-
  read_waterdata_stats_daterange(
  monitoring_location_id = site1,
  parameter_code = "00060",
  computation = "arithmetic_mean",
  start_date = "2024-01-01",
  end_date = "2024-01-31"
)

jan_daterange_mean
#> Simple feature collection with 3 features and 19 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#>   parameter_code unit_of_measure            parent_time_series_id start_date
#> 1          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 2          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 3          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#>     end_date interval_type   value sample_count approval_status
#> 1 2024-01-31         month 112.323           31        approved
#> 2 2024-12-31 calendar_year 215.382          366        approved
#> 3 2024-09-30    water_year 197.312          366        approved
#>                         computation_id     computation percentile
#> 1 38fdd032-9573-4c2e-b28e-385943c12834 arithmetic_mean         NA
#> 2 11df75ea-2902-4791-875e-270ea20c0445 arithmetic_mean         NA
#> 3 49b24e8c-5880-403f-92b5-92d26c7de734 arithmetic_mean         NA
#>   monitoring_location_id                        monitoring_location_name
#> 1          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#>   site_type site_type_code country_code state_code county_code
#> 1    Stream             ST           US         55         025
#> 2    Stream             ST           US         55         025
#> 3    Stream             ST           US         55         025
#>                     geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)

Instead of time_of_year and time_of_year_type columns, this output contains start_date, end_date, and interval_type columns representing the daterange over which the average was calculated. The first row shows the average January, 2025 discharge was about 219 cubic feet per second. We again have extra rows: the second row contains the calendar year 2025 average and the third contains the water year 2025 average.

Annual statistics will be returned for any calendar/water years than intersect with the specified date range. Consider the example below, where the start_date to end_date range is only 93 days yet happens to intersect with calendar and water years 2023 and 2024. We see this reflected in output containing monthly averages for September, 2023 through January, 2024 as well as 2023 and 2024 calendar and water year averages.

multiyear_daterange_mean <-
  read_waterdata_stats_daterange(
  monitoring_location_id = site1,
  parameter_code = "00060",
  computation = "arithmetic_mean",
  start_date = "2023-09-30",
  end_date = "2024-01-01"
)

multiyear_daterange_mean
#> Simple feature collection with 9 features and 19 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#>   parameter_code unit_of_measure            parent_time_series_id start_date
#> 1          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-09-01
#> 2          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#> 3          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-11-01
#> 4          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-12-01
#> 5          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 6          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-01-01
#> 7          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2024-01-01
#> 8          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2022-10-01
#> 9          00060          ft^3/s d499cef948c547a58bec047dcf3e330b 2023-10-01
#>     end_date interval_type   value sample_count approval_status
#> 1 2023-09-30         month  61.873           30        approved
#> 2 2023-10-31         month  86.339           31        approved
#> 3 2023-11-30         month 210.767           30        approved
#> 4 2023-12-31         month 222.806           31        approved
#> 5 2024-01-31         month 112.323           31        approved
#> 6 2023-12-31 calendar_year 147.649          365        approved
#> 7 2024-12-31 calendar_year 215.382          366        approved
#> 8 2023-09-30    water_year 156.417          365        approved
#> 9 2024-09-30    water_year 197.312          366        approved
#>                         computation_id     computation percentile
#> 1 63491354-f0d6-4c1a-a5bb-50973ab0003d arithmetic_mean         NA
#> 2 bb45e0c9-34a1-44ea-85f6-34f2132b8311 arithmetic_mean         NA
#> 3 0f17b134-6bf2-40be-bba7-3479b48f8b15 arithmetic_mean         NA
#> 4 1bfc9ae0-036c-45f0-b8ea-b5f7203a8121 arithmetic_mean         NA
#> 5 38fdd032-9573-4c2e-b28e-385943c12834 arithmetic_mean         NA
#> 6 b14cfc65-635e-4eff-bba1-9145ccd8e67b arithmetic_mean         NA
#> 7 11df75ea-2902-4791-875e-270ea20c0445 arithmetic_mean         NA
#> 8 d42c25dc-bb81-4752-98ed-0381d633da9d arithmetic_mean         NA
#> 9 49b24e8c-5880-403f-92b5-92d26c7de734 arithmetic_mean         NA
#>   monitoring_location_id                        monitoring_location_name
#> 1          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 2          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 3          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 4          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 5          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 6          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 7          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 8          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#> 9          USGS-05428500 YAHARA RIVER AT EAST MAIN STREET AT MADISON, WI
#>   site_type site_type_code country_code state_code county_code
#> 1    Stream             ST           US         55         025
#> 2    Stream             ST           US         55         025
#> 3    Stream             ST           US         55         025
#> 4    Stream             ST           US         55         025
#> 5    Stream             ST           US         55         025
#> 6    Stream             ST           US         55         025
#> 7    Stream             ST           US         55         025
#> 8    Stream             ST           US         55         025
#> 9    Stream             ST           US         55         025
#>                     geometry
#> 1 POINT (-89.36083 43.08944)
#> 2 POINT (-89.36083 43.08944)
#> 3 POINT (-89.36083 43.08944)
#> 4 POINT (-89.36083 43.08944)
#> 5 POINT (-89.36083 43.08944)
#> 6 POINT (-89.36083 43.08944)
#> 7 POINT (-89.36083 43.08944)
#> 8 POINT (-89.36083 43.08944)
#> 9 POINT (-89.36083 43.08944)

Before we move on, consider the following example where we create a Monthly mean statistics table similar to what you’d find in the Water Year Summaries.

monthly_means <-
  read_waterdata_stats_daterange(
    monitoring_location_id = site1,
    parameter_code = "00060",
    computation = "arithmetic_mean"
  ) |>
  dplyr::filter(interval_type == "month") |>
  sf::st_drop_geometry()

monthly_means |>
  # filter(start_date >= "2004-10-01" & start_date < "2025-09-01") |>
  mutate(
    Month = lubridate::month(start_date, label = TRUE),
    # reorder based on WY
    Month = factor(Month, month.abb[c(10:12, 1:9)]),
    water_year = lubridate::year(as.Date(start_date) + months(3))
  ) |>
  arrange(Month) |>
  group_by(Month) |>
  summarize(
    `Mean` = round(mean(value), 0),
    `Max (WY)` = paste0(
      round(max(value), 0),
      " (",
      water_year[which.max(value)],
      ")"
    ),
    `Min (WY)` = paste0(
      round(min(value), 0),
      " (",
      water_year[which.min(value)],
      ")"
    ),
    .groups = "drop"
  ) |>
  t() |>
  as.data.frame() |>
  knitr::kable(
    col.names = NULL
  )
Month Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep
Mean 171 263 213 152 138 151 205 210 161 161 128 120
Max (WY) 444 (2020) 467 (2020) 327 (2020) 257 (2020) 254 (2019) 390 (2019) 393 (2019) 383 (2008) 449 (2008) 457 (2024) 304 (2024) 377 (2018)
Min (WY) 8 (2006) 171 (2013) 116 (2004) 47 (2006) 33 (2006) 39 (2022) 42 (2015) 113 (2015) 19 (2023) 19 (2005) 21 (2005) 13 (2005)

Statistics API quirks

The sample_count column indicates that there were 22 observations used to compute these averages, suggesting the site’s period of record is (at least) 22 years long. We can verify this using the timeseries-metadata API endpoint, passing in the “parent” timeseries ID used to compute the mean:

read_waterdata_ts_meta(time_series_id =  unique(jan_por_mean$parent_time_series_id))
#> Requesting:
#> https://api.waterdata.usgs.gov/ogcapi/v0/collections/time-series-metadata/items?f=json&lang=en-US&limit=50000&id=d499cef948c547a58bec047dcf3e330b
#> Remaining requests this hour:2791
#> Simple feature collection with 1 feature and 21 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -89.36083 ymin: 43.08944 xmax: -89.36083 ymax: 43.08944
#> Geodetic CRS:  WGS 84
#> # A tibble: 1 × 22
#>   unit_of_measure parameter_name parameter_code statistic_id
#>   <chr>           <chr>          <chr>          <chr>       
#> 1 ft^3/s          Discharge      00060          00003       
#> # ℹ 18 more variables: hydrologic_unit_code <chr>, state_name <chr>,
#> #   last_modified <dttm>, begin <dttm>, end <dttm>, begin_utc <dttm>,
#> #   end_utc <dttm>, computation_period_identifier <chr>,
#> #   computation_identifier <chr>, thresholds <chr>,
#> #   sublocation_identifier <chr>, primary <chr>, monitoring_location_id <chr>,
#> #   web_description <chr>, parameter_description <chr>,
#> #   parent_time_series_id <chr>, geometry <POINT [°]>, time_series_id <chr>

From this output, we see the begin and end dates of the POR at indeed at least 22 years apart.