
As we bid adieu to the NWIS discrete water quality services, we welcome a new web service offering, the U.S. Geological Survey (USGS) Water Data for the Nation samples data service.

In this tutorial, we’ll walk through how to use the new services in the following ways:

  • Retrieving data from a known USGS site.

  • Retrieving data using different geographical filters.

  • Discovering available data.

For more information on the new data formats, see: https://waterdata.usgs.gov/blog/qw-to-wqx3-mapping

New USGS data access

This is a modern access point for USGS discrete water quality data. The USGS is planning to modernize all web services in the near future. For each of these updates, dataRetrieval will create a new function to access the new services. To access these services on a web browser, go to https://waterdata.usgs.gov/download-samples/.

New Features

Style

New functions use snake case, such as “read_USGS_samples”. Older functions use camel case, such as “readNWISdv”. The visible difference is the underscore between words, which is a handy way to tell the newer, modern data access functions from the older, traditional ones.

Structure

Historically, we allowed users to customize their queries via the ... argument structure. With ..., users needed to know the exact names of query parameters before using the function. Now, the new functions will include ALL possible arguments that the web service APIs support. This will allow users to use tab autocompletion (available in RStudio and other IDEs). Users should understand that it is not advisable to specify all of these parameters, because the systems can get bogged down with redundant query parameters. We expect this will be easier for users, but it might take some time to smooth out the documentation and test usability. There may be additional consequences, such as users not being able to build up argument lists to pass into the function.
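To illustrate one practical difference between ... and named arguments, here is a minimal sketch using two hypothetical stand-in functions (query_dots and query_named are not part of dataRetrieval). It assumes nothing about the package internals; it only shows how R treats a misspelled argument name in each style.

```r
# Hypothetical stand-ins for illustration only; not dataRetrieval functions.

# Old style: anything passed through ... is accepted, typos included.
query_dots <- function(...) {
  args <- list(...)
  names(args)
}

# New style: every supported argument is named in the signature.
query_named <- function(stateFips = NA, countyFips = NA) {
  supplied <- c(stateFips = stateFips, countyFips = countyFips)
  names(supplied)[!is.na(supplied)]
}

# A misspelled name ("statFips") slips through the ... version,
dots_result <- query_dots(statFips = "US:55")

# but R rejects it when the arguments are spelled out in the signature.
named_result <- tryCatch(
  query_named(statFips = "US:55"),
  error = function(e) "unused argument"
)

print(dots_result)
print(named_result)
```

With named arguments the typo is caught by R itself before any request would be sent, which is the same property that makes tab autocompletion possible.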

Dependencies

Under the hood, dataRetrieval changed the dependency from httr to httr2. httr2 is the modern R package for web requests that is actively developed/maintained. As we develop functions for the modern USGS web services, we’ll continue to explore updating package dependencies.

Developmental workflow

CRAN-stable documentation will be available on the GitHub pages: https://doi-usgs.github.io/dataRetrieval/

Development documentation will be available on GitLab pages: https://water.code-pages.usgs.gov/dataRetrieval

Development of dataRetrieval will happen on a git branch called “develop” on GitHub. The “develop” branch will only move to the “main” branch when we submit to CRAN, unless there are bug fixes that pertain to the CRAN release. The “develop” branch WILL change frequently, and there are no promises of future behavior. Users must accept that they are using those functions at their own risk. If you are willing to accept this risk, the installation instructions are:

library(remotes)

install_github("DOI-USGS/dataRetrieval",
               ref = "develop")

Discrete USGS data workflow

Alright, let’s test out this new service! Here is a link to the user interface: https://waterdata.usgs.gov/download-samples/.

And here is a link to the web service documentation: https://api.waterdata.usgs.gov/samples-data/docs

Retrieving data from a known site

Let’s say we have a USGS site. We can check the data available at that site using summarize_USGS_samples like this:

library(dataRetrieval)
site <- "USGS-04183500"

data_at_site <- summarize_USGS_samples(monitoringLocationIdentifier = site)
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/summary/USGS-04183500?mimeType=text%2Fcsv

Explore the results:

We see there are 1174 unfiltered phosphorus values available. Note that if we asked for a simple characteristic = “Phosphorus”, we’d get back both filtered and unfiltered values, which might not be appropriate to mix together in an analysis. “characteristicUserSupplied” allows us to query for a very specific set of data. It is similar to a long-form USGS parameter code.

To get that data, use the read_USGS_samples function:

user_char <- "Phosphorus as phosphorus, water, unfiltered"
phos_data <- read_USGS_samples(monitoringLocationIdentifier = site,
                        characteristicUserSupplied = user_char)
## Function in development, use at your own risk.
## No profile specified, defaulting to 'fullphyschem'
## Possible values are: 
## fullphyschem, basicphyschem, fullbio, basicbio, narrow, resultdetectionquantitationlimit, labsampleprep, count
## GET:https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?mimeType=text%2Fcsv&monitoringLocationIdentifier=USGS-04183500&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered

Inspecting phos_data, there are 185 columns (!). That is all of the possible fields because the default dataProfile is “Full physical chemical”.

Instead of using the “Full physical chemical” profile, we could ask for the “Narrow” profile:

phos_narrow <- read_USGS_samples(monitoringLocationIdentifier = site,
                        characteristicUserSupplied = user_char,
                        dataProfile = "narrow")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/results/narrow?mimeType=text%2Fcsv&monitoringLocationIdentifier=USGS-04183500&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered

We can do a simple plot to check the data:

library(ggplot2)
wrap_text <- function(x, width = 30, collapse = "\n"){
  new_text <- paste(strwrap(x, 
                width = width),
                collapse = collapse)
  return(new_text)
}

ggplot(data = phos_narrow) +
  geom_point(aes(x = Activity_StartDateTime,
                 y = Result_Measure)) +
  theme_bw() +
  ggtitle(unique(phos_narrow$Location_Name)) +
  xlab("Date") +
  ylab(wrap_text(unique(phos_narrow$Result_CharacteristicUserSupplied)))
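The wrap_text helper above exists only to keep a long axis label from running off the plot. As a quick check that needs no downloaded data, it can be applied to the characteristic name directly. The function is repeated here in a slightly condensed form so the snippet runs on its own.

```r
# Same behavior as the wrap_text helper used for the plot label
wrap_text <- function(x, width = 30, collapse = "\n"){
  paste(strwrap(x, width = width), collapse = collapse)
}

label <- wrap_text("Phosphorus as phosphorus, water, unfiltered")

# cat() shows the line break that will appear in the axis label
cat(label, "\n")
```

A smaller width produces more, shorter lines; the words themselves are never changed.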

Return data types

There are 2 arguments that dictate what kind of data is returned: dataType and dataProfile. The “dataType” argument defines what kind of data comes back, and the “dataProfile” defines what columns from that type come back.

The possibilities, which match the web service documentation, are:

  • dataType = “results”
    • dataProfile: “fullphyschem”, “basicphyschem”, “fullbio”, “basicbio”, “narrow”, “resultdetectionquantitationlimit”, “labsampleprep”, “count”
  • dataType = “locations”
    • dataProfile: “site”, “count”
  • dataType = “activities”
    • dataProfile: “sampact”, “actmetric”, “actgroup”, “count”
  • dataType = “projects”
    • dataProfile: “project”, “projectmonitoringlocationweight”
  • dataType = “organizations”
    • dataProfile: “organization”, “count”
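The pairing of dataType and dataProfile listed above can be written down as a small lookup, which is handy for checking a combination before sending a request. This is only a sketch: valid_profiles and is_valid_combo are hypothetical names, not part of dataRetrieval, and the values are copied from the list above using lowercase names throughout.

```r
# Valid dataProfile values for each dataType, copied from the list above.
valid_profiles <- list(
  results = c("fullphyschem", "basicphyschem", "fullbio", "basicbio",
              "narrow", "resultdetectionquantitationlimit",
              "labsampleprep", "count"),
  locations = c("site", "count"),
  activities = c("sampact", "actmetric", "actgroup", "count"),
  projects = c("project", "projectmonitoringlocationweight"),
  organizations = c("organization", "count")
)

# Hypothetical helper: TRUE when the pair is a listed combination.
is_valid_combo <- function(dataType, dataProfile) {
  dataType %in% names(valid_profiles) &&
    dataProfile %in% valid_profiles[[dataType]]
}

is_valid_combo("locations", "site")    # a listed pair
is_valid_combo("locations", "narrow")  # "narrow" belongs to "results"
```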

Geographical filters

Let’s say we don’t know a USGS site number, but we do have an area of interest. Here are the different geographic filters available. We’ll use characteristicUserSupplied = “Phosphorus as phosphorus, water, unfiltered” to limit the sites returned, and dataType = “locations” to limit the data returned. That means we’ll just be asking which sites measured “Phosphorus as phosphorus, water, unfiltered”, without actually getting those result values.

Bounding Box

North and south are latitude values; east and west are longitude values. A vector of 4 (west, south, east, north) is expected.

bbox <- c(-90.8, 44.2, -89.9, 45.0)
user_char <- "Phosphorus as phosphorus, water, unfiltered"

bbox_sites <- read_USGS_samples(boundingBox = bbox,
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&boundingBox=-90.8,44.2,-89.9,45.0
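Because the order (west, south, east, north) is easy to get wrong, it can help to sanity-check the vector before using it. The helper below is a hypothetical sketch, not a dataRetrieval function. It assumes decimal degrees and, for simplicity, a box that does not cross the 180-degree meridian.

```r
# Hypothetical helper: checks the (west, south, east, north) ordering.
check_bbox <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  west <- bbox[1]; south <- bbox[2]; east <- bbox[3]; north <- bbox[4]
  stopifnot(
    "longitudes must be between -180 and 180" = all(abs(c(west, east)) <= 180),
    "latitudes must be between -90 and 90" = all(abs(c(south, north)) <= 90),
    "west must be less than east" = west < east,
    "south must be less than north" = south < north
  )
  invisible(TRUE)
}

# The box used above passes the check
check_bbox(c(-90.8, 44.2, -89.9, 45.0))

# Swapping latitude and longitude is caught, because -90.8 is not a latitude
swapped <- tryCatch(
  check_bbox(c(44.2, -90.8, 45.0, -89.9)),
  error = function(e) conditionMessage(e)
)
print(swapped)
```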

Hydrologic Unit Codes (HUCs)

Hydrologic Unit Codes (HUCs) identify physical areas within the US that drain to a certain portion of the stream network. This filter accepts values containing 2, 4, 6, 8, 10 or 12 digits.

huc_sites <- read_USGS_samples(hydrologicUnit = "070700",
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&hydrologicUnit=070700&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered
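HUCs often start with a zero, so they should be kept as character strings rather than numbers. The following sketch checks the digit-count rule described above; is_valid_huc is a hypothetical helper, not part of dataRetrieval, and it only checks the format, not whether the code exists.

```r
# Hypothetical helper: TRUE for strings of 2, 4, 6, 8, 10 or 12 digits.
is_valid_huc <- function(huc) {
  is.character(huc) &
    grepl("^[0-9]+$", huc) &
    nchar(huc) %in% c(2, 4, 6, 8, 10, 12)
}

is_valid_huc("070700")   # six digits, leading zero kept
is_valid_huc("0707001")  # seven digits, not an accepted length
is_valid_huc(70700)      # numeric input has lost the leading zero
```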

Distance from a point

Location latitude (pointLocationLatitude) and longitude (pointLocationLongitude), and the radius (pointLocationWithinMiles) are required for this geographic filter:

point_sites <- read_USGS_samples(pointLocationLatitude = 43.074680,
                                 pointLocationLongitude = -89.428054,
                                 pointLocationWithinMiles = 20,
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&pointLocationLatitude=43.07468&pointLocationLongitude=-89.42805&pointLocationWithinMiles=20
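To get a feel for what a 20-mile radius covers, a great-circle distance can be computed locally. This is a sketch under stated assumptions: the source does not say how the service measures distance, so the haversine formula and the 3958.8-mile mean Earth radius are assumptions, and haversine_miles is a hypothetical helper.

```r
# Great-circle (haversine) distance in miles between two points.
# Assumption: a spherical Earth with a mean radius of 3958.8 miles.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

center_lat <- 43.074680
center_lon <- -89.428054

# A made-up test point 0.2 degrees of latitude north of the center
d <- haversine_miles(center_lat, center_lon, center_lat + 0.2, center_lon)

# TRUE when the test point falls inside a 20-mile radius
d <= 20
```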

countyFips

County query parameter. To get a list of available counties, run check_param("counties"). The FIPS code can be created using the function countyCdLookup.

dane_county <- countyCdLookup("WI", "Dane", 
                              outputType = "fips")

county_sites <- read_USGS_samples(countyFips = dane_county,
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&countyFips=US%3A55%3A025

stateFips

State query parameter. To get a list of available state FIPS codes, run check_param("states"). The FIPS code can be created using the function stateCdLookup.

state_fip <- stateCdLookup("WI", outputType = "fips")

state_sites <- read_USGS_samples(stateFips = state_fip,
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&stateFips=US%3A55
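The request URLs above show the FIPS values in percent-encoded form. Decoding them reveals the layout the service receives. The two make_* helpers below are hypothetical and only rebuild that same pattern from numeric codes; the layout is inferred from the logged URLs, so the lookup functions remain the safer way to produce these values.

```r
# Decode the percent-encoded values seen in the request URLs above.
county_decoded <- utils::URLdecode("US%3A55%3A025")
state_decoded <- utils::URLdecode("US%3A55")
print(county_decoded)
print(state_decoded)

# Hypothetical helpers that rebuild the same pattern from numeric codes.
# The "US:SS" and "US:SS:CCC" layout is inferred from the decoded URLs.
make_state_fips <- function(state) {
  sprintf("US:%02d", as.integer(state))
}
make_county_fips <- function(state, county) {
  sprintf("US:%02d:%03d", as.integer(state), as.integer(county))
}

make_state_fips(55)
make_county_fips(55, 25)
```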

Additional Query Parameters

Additional parameters can be included to limit the results coming back from a request.

siteTypeCode

Site type code query parameter.

site_type_info <- check_param("sitetype")
site_type_info$typeCode
##  [1] "GL"      "WE"      "LA"      "LA-EX"  
##  [5] "LA-OU"   "LA-PLY"  "LA-SNK"  "LA-SH"  
##  [9] "LA-SR"   "LA-VOL"  "AT"      "ES"     
## [13] "OC"      "OC-CO"   "LK"      "ST"     
## [17] "ST-CA"   "ST-DCH"  "ST-TS"   "SP"     
## [21] "GW"      "GW-CR"   "GW-EX"   "GW-HZ"  
## [25] "GW-IW"   "GW-MW"   "GW-TH"   "SB"     
## [29] "SB-CV"   "SB-GWD"  "SB-TSM"  "SB-UZ"  
## [33] "FA"      "FA-AWL"  "FA-CI"   "FA-CS"  
## [37] "FA-DV"   "FA-FON"  "FA-GC"   "FA-HP"  
## [41] "FA-QC"   "FA-LF"   "FA-OF"   "FA-PV"  
## [45] "FA-SPS"  "FA-STS"  "FA-TEP"  "FA-WIW" 
## [49] "FA-SEW"  "FA-WWD"  "FA-WWTP" "FA-WDS" 
## [53] "FA-WTP"  "FA-WU"   "AG"      "AS"     
## [57] "AW"      "SS"

siteTypeName

Site type name query parameter.

site_type_info$typeLongName
##  [1] "Glacier"                              
##  [2] "Wetland"                              
##  [3] "Land"                                 
##  [4] "Excavation"                           
##  [5] "Outcrop"                              
##  [6] "Playa"                                
##  [7] "Sinkhole"                             
##  [8] "Soil hole"                            
##  [9] "Shore"                                
## [10] "Volcanic vent"                        
## [11] "Atmosphere"                           
## [12] "Estuary"                              
## [13] "Ocean"                                
## [14] "Coastal"                              
## [15] "Lake, Reservoir, Impoundment"         
## [16] "Stream"                               
## [17] "Canal"                                
## [18] "Ditch"                                
## [19] "Tidal stream"                         
## [20] "Spring"                               
## [21] "Well"                                 
## [22] "Collector or Ranney type well"        
## [23] "Extensometer well"                    
## [24] "Hyporheic-zone well"                  
## [25] "Interconnected wells"                 
## [26] "Multiple wells"                       
## [27] "Test hole not completed as a well"    
## [28] "Subsurface"                           
## [29] "Cave"                                 
## [30] "Groundwater drain"                    
## [31] "Tunnel, shaft, or mine"               
## [32] "Unsaturated zone"                     
## [33] "Facility"                             
## [34] "Animal waste lagoon"                  
## [35] "Cistern"                              
## [36] "Combined sewer"                       
## [37] "Diversion"                            
## [38] "Field, Pasture, Orchard, or Nursery"  
## [39] "Golf course"                          
## [40] "Hydroelectric plant"                  
## [41] "Laboratory or sample-preparation area"
## [42] "Landfill"                             
## [43] "Outfall"                              
## [44] "Pavement"                             
## [45] "Septic system"                        
## [46] "Storm sewer"                          
## [47] "Thermoelectric plant"                 
## [48] "Waste injection well"                 
## [49] "Wastewater sewer"                     
## [50] "Wastewater land application"          
## [51] "Wastewater-treatment plant"           
## [52] "Water-distribution system"            
## [53] "Water-supply treatment plant"         
## [54] "Water-use establishment"              
## [55] "Aggregate groundwater use"            
## [56] "Aggregate surface-water-use"          
## [57] "Aggregate water-use establishment"    
## [58] "Specific Source"

activityMediaName

Sample media refers to the environmental medium that was sampled or analyzed.

media_info <- check_param("samplemedia")
media_info$activityMedia
## [1] "Air"               "Biological tissue"
## [3] "Other"             "Sediment"         
## [5] "Soil"              "Water"            
## [7] NA

characteristicGroup

Characteristic group is a broad category describing the sample. The options for this parameter generally follow the values described in the Water Quality Portal User Guide, but not always.

group_info <- check_param("characteristicgroup")
group_info$characteristicGroup
##  [1] "Biological"                   
##  [2] "Information"                  
##  [3] "Inorganics, Major, Metals"    
##  [4] "Inorganics, Major, Non-metals"
##  [5] "Inorganics, Minor, Metals"    
##  [6] "Inorganics, Minor, Non-metals"
##  [7] "Microbiological"              
##  [8] "Nutrient"                     
##  [9] "Organics, Other"              
## [10] "Organics, PCBs"               
## [11] "Organics, Pesticide"          
## [12] "Organics, PFAS"               
## [13] "Physical"                     
## [14] "Population/Community"         
## [15] "Radiochemical"                
## [16] "Sediment"                     
## [17] "Stable Isotopes"              
## [18] "Toxicity"

characteristic

Characteristic is a specific category describing the sample. See check_param("characteristics") for a full list; below is a small sample:

characteristic_info <- check_param("characteristics")
head(unique(characteristic_info$characteristicName))
## [1] "Hydroxy-amitriptyline, 10-" 
## [2] "1,1,1,2-Tetrachloroethane"  
## [3] "1,1,1-Trichloro-2-propanone"
## [4] "1,1,1-Trichloroethane"      
## [5] "1,1,2,2-Tetrachloroethane"  
## [6] "CFC-113"

characteristicUserSupplied

Observed property is the USGS term for the constituent sampled, and the property name gives a detailed description of what was sampled. Observed property is mapped to characteristicUserSupplied and replaces the parameter name and pcode that USGS previously used to describe discrete sample data. See check_param("observedproperty") for a full list; below is a small sample:

char_us <- check_param("observedproperty")
head(char_us$observedProperty)
## [1] "10-Hydroxy-amitriptyline, water, filtered, recoverable"               
## [2] "1,1,1,2-Tetrachloroethane, bed sediment (dry mass basis), recoverable"
## [3] "1,1,1,2-Tetrachloroethane, soil (dry mass basis), recoverable"        
## [4] "1,1,1,2-Tetrachloroethane, water, unfiltered, recoverable"            
## [5] "1,1,1-Trichloro-2-propanone, water, filtered, recoverable"            
## [6] "1,1,1-Trichloro-2-propanone, water, unfiltered, EPA method 551.1"

usgsPCode

USGS parameter code. See check_param("characteristics") for a full list; below is a small sample:

characteristic_info <- check_param("characteristics")
head(unique(characteristic_info$parameterCode))
## [1] "67995" "52417" "62235" "30089" "77562"
## [6] "51330"

projectIdentifier

Project identifier query parameter. You would need to know this identifier from prior project information.

recordIdentifierUserSupplied

Record identifier supplied by the user. You would need to get this identifier from the data supplier.

activityStartDate: Lower and Upper

Specify one or both of these fields to filter on the activity start date. The service will return records with dates earlier than the value entered for activityStartDateUpper and/or later than the value entered for activityStartDateLower. The logic is inclusive, i.e. it will also return records that match either date. Can be an R Date object, or a string with format YYYY-MM-DD.

For instance, let’s grab Wisconsin sites that measured phosphorus between October 1 and December 1, 2024 (both dates inclusive):

state_sites_recent <- read_USGS_samples(stateFips = state_fip,
                                characteristicUserSupplied = user_char,
                                dataType = "locations",
                                activityStartDateLower = "2024-10-01",
                                activityStartDateUpper = "2024-12-01",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&stateFips=US%3A55&activityStartDateLower=2024-10-01&activityStartDateUpper=2024-12-01

This returns far fewer sites than the original Wisconsin query, which had no date filter.
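The inclusive date logic described above can be mimicked locally to see which dates a pair of bounds would keep. The dates below are made up for illustration and do not come from the service.

```r
# Made-up activity dates, used only to illustrate the inclusive bounds.
activity_dates <- as.Date(c("2024-09-30", "2024-10-01", "2024-11-15",
                            "2024-12-01", "2024-12-02"))

lower <- as.Date("2024-10-01")
upper <- as.Date("2024-12-01")

# Inclusive on both ends, so dates equal to either bound are kept.
keep <- activity_dates >= lower & activity_dates <= upper
activity_dates[keep]
```

Note that an upper bound of 2024-12-01 keeps samples taken on December 1 itself; to stop at the end of November, the upper bound would be 2024-11-30.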

Data Discovery

The above examples showed how to find sites within a geographic filter. We can use a few additional query parameters. As an example, let’s look for the same phosphorus, in Dane County, WI, but limited to lakes:

dane_county <- countyCdLookup("WI", "Dane")

county_lake_sites <- read_USGS_samples(countyFips = dane_county,
                                characteristicUserSupplied = user_char,
                                siteTypeName = "Lake, Reservoir, Impoundment",
                                dataType = "locations",
                                dataProfile = "site")
## Function in development, use at your own risk.
## GET:https://api.waterdata.usgs.gov/samples-data/locations/site?mimeType=text%2Fcsv&characteristicUserSupplied=Phosphorus%20as%20phosphorus%2C%20water%2C%20unfiltered&siteTypeName=Lake%2C%20Reservoir%2C%20Impoundment&countyFips=US%3A55%3A025

There are only 18 lake sites measuring phosphorus in Dane County, WI. We can get a summary of the data at each site using the summarize_USGS_samples function. This function accepts only one site at a time, so we loop over the sites:

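library(dplyr)

# Accumulate the per-site summaries in one data frame
all_data <- data.frame()

for(i in county_lake_sites$Location_Identifier){
  # summarize_USGS_samples accepts one site per call
  avail_i <- summarize_USGS_samples(monitoringLocationIdentifier = i)
  # Keep only the phosphorus rows, then append the rows collected so far
  all_data <- avail_i |> 
    filter(characteristicUserSupplied == user_char) |> 
    bind_rows(all_data)
}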

Let’s see what’s available:

This table can help narrow down which specific sites to ask for the data. Maybe you need sites with recent data, maybe you need sites with lots of data, maybe 1 measurement is enough.
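The site-by-site loop above can also be written with lapply, binding the pieces once at the end. The sketch below runs without a network connection because it uses fake_summary, a hypothetical stand-in for summarize_USGS_samples that returns made-up rows. The site identifiers are placeholders, and the monitoringLocationIdentifier column name is an assumption about the summary output.

```r
# Hypothetical stand-in for summarize_USGS_samples(); returns made-up rows
# so the pattern can be run offline.
fake_summary <- function(site) {
  data.frame(
    monitoringLocationIdentifier = site,
    characteristicUserSupplied = c(
      "Phosphorus as phosphorus, water, unfiltered",
      "Temperature, water"
    ),
    stringsAsFactors = FALSE
  )
}

user_char <- "Phosphorus as phosphorus, water, unfiltered"
sites <- c("USGS-A", "USGS-B", "USGS-C")  # placeholder identifiers

# One data frame per site, filtered to the characteristic of interest
per_site <- lapply(sites, function(s) {
  avail <- fake_summary(s)
  avail[avail$characteristicUserSupplied == user_char, , drop = FALSE]
})

# Bind once at the end instead of growing a data frame inside a loop
all_data <- do.call(rbind, per_site)
nrow(all_data)
```

To use this pattern with real data, fake_summary would be swapped for summarize_USGS_samples and sites for the Location_Identifier column returned earlier.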