

Fills missing values in a time series using linear interpolation, a spline fit, linear interpolation in log space, fixed-interval smoothing, or a structural time series model. The Daily functions call this function when fill = TRUE. As a standalone function, it accepts data downloaded directly from `dataRetrieval::read_waterdata_daily`.

Usage

fill_missing_daily(df, fill_type, maxgap = 21, value_col = "value",
  qualifier_col = "qualifier")

Arguments

df

Data frame with at least value and qualifier columns. The names of those columns are defined by value_col and qualifier_col. The data frame is expected to be a complete and uniform time series: one row per day, or one row per fixed interval. Values in value_col are filled on the assumption that the series is uniformly spaced and that missing values are marked as `NA`.
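If the raw data simply omits rows for missing days, it first has to be expanded to a complete daily series. The following is a minimal sketch using base R only. The `obs` data frame is hypothetical, and setting the qualifier of added rows to an empty string is an assumption, not documented behavior of this function.

```r
# Hypothetical input: daily records with two dates missing entirely.
obs <- data.frame(
  time = as.Date(c("2001-01-01", "2001-01-02", "2001-01-05")),
  value = c(2.0, 2.1, 2.4),
  qualifier = c("", "", "")
)

# Build the full daily sequence and left-join the observations onto it.
full_days <- data.frame(time = seq(from = min(obs$time),
                                   to = max(obs$time),
                                   by = "day"))
complete <- merge(full_days, obs, by = "time", all.x = TRUE)

# Rows added by the join have NA in value and qualifier.
# Assumption: an empty qualifier is acceptable for the added rows.
complete$qualifier[is.na(complete$qualifier)] <- ""
complete
```

The result has one row per day, with `NA` in the value column for the two dates that were absent from the input.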

fill_type

Character defining the method used to fill missing data. Options are "interpolation", "spline", "tsSmooth", "tsStruct", or "log_interp". "interpolation" is linear interpolation using `zoo::na.approx`. "spline" is a spline fit using `zoo::na.spline`. "tsSmooth" uses `stats::tsSmooth`, which is fixed-interval smoothing on time series. "tsStruct" uses a structural time series model. "log_interp" is linear interpolation in log space. Only used if fill is set to TRUE.
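The difference between "interpolation" and "log_interp" can be illustrated with a small sketch. This is not the package's implementation: it uses base R `stats::approx` in place of `zoo::na.approx`, and the input vector `x` is made up for illustration.

```r
x <- c(1, NA, NA, 8)          # hypothetical series with a two-value gap
idx <- seq_along(x)
known <- !is.na(x)

# Linear interpolation between the known neighbours.
lin <- approx(idx[known], x[known], xout = idx)$y

# Linear interpolation in log space, then back-transform.
# This requires strictly positive values.
log_lin <- exp(approx(idx[known], log(x[known]), xout = idx)$y)

round(lin, 3)      # 1.000 3.333 5.667 8.000
round(log_lin, 3)  # 1 2 4 8
```

Interpolating in log space fills the gap along a geometric rather than an arithmetic progression, which may be the better choice for quantities such as streamflow that vary over orders of magnitude.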

maxgap

Maximum number of consecutive NA days in a gap that will be filled. Default is 21. Only used if fill is set to TRUE.
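A gap-length limit of this kind can be sketched as follows. `fill_short_gaps` is a hypothetical helper written for illustration, not a function from this package, and it assumes that gaps up to and including maxgap are filled while longer gaps stay `NA`.

```r
# Hypothetical helper: linear fill limited by gap length.
fill_short_gaps <- function(x, maxgap) {
  idx <- seq_along(x)
  known <- !is.na(x)
  filled <- approx(idx[known], x[known], xout = idx)$y

  # Length of the NA run each position belongs to (0 for observed values).
  runs <- rle(is.na(x))
  run_len <- rep(ifelse(runs$values, runs$lengths, 0), runs$lengths)

  # Restore NA wherever the gap is longer than maxgap.
  filled[run_len > maxgap] <- NA
  filled
}

x <- c(1, NA, 3, NA, NA, NA, 7)
fill_short_gaps(x, maxgap = 2)
# 1 2 3 NA NA NA 7
```

The single missing value is filled, while the run of three stays `NA` because it exceeds `maxgap = 2`.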

value_col

Character, name of value column.

qualifier_col

Character, name of qualifier column.

Examples

Date <- seq(from = as.Date("2001/1/1"),
            to = as.Date("2002/1/2"),
            by = "day")
Qualifier <- rep("",367)
Q <- 2+sin(seq(from = 0, to = 2*pi, length.out = 367))
Q <- jitter(Q, factor = 500)
plot(Q, ylim = c(0, 3.2))

dataInput <- data.frame(time = Date,
                        value = Q,
                        qualifier = Qualifier)
# Remove some rows to test missing:
dataInput$value[4:5] <- NA
dataInput$value[10:20] <- NA

# Linear interpolation:
interp1 <- fill_missing_daily(df = dataInput,
                              fill_type = "interpolation")
plot(interp1$time[1:30],
     interp1$value[1:30],
     col = as.factor(interp1$qualifier[1:30]),
     type = "b", pch = 16, ylim = c(0, 3.2),
     main = "Linear Interpolation")


# Spline fit:
splin1 <- fill_missing_daily(dataInput,
                             fill_type = "spline")
plot(splin1$time[1:30], splin1$value[1:30],
     col = as.factor(splin1$qualifier[1:30]),
     main = "Spline Fit",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Fixed-Interval Smoothing on Time Series:
df_smooth <- fill_missing_daily(dataInput,
                                fill_type = "tsSmooth")
plot(df_smooth$time[1:30],
     df_smooth$value[1:30],
     col = as.factor(df_smooth$qualifier[1:30]),
     main = "Fixed-interval smoothing on time series",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Structural time series model:
df_struct <- fill_missing_daily(dataInput,
                                fill_type = "tsStruct")
plot(df_struct$time[1:30],
     df_struct$value[1:30],
     col = as.factor(df_struct$qualifier[1:30]),
     main = "Structural time series model",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Add a gap that is too big to deal with:
dataInput$value[200:255] <- NA

df_interp <- fill_missing_daily(dataInput,
                                fill_type = "interpolation")
plot(df_interp$time, df_interp$value,
     col = as.factor(df_interp$qualifier),
     main = "Linear Interpolation",
     type = "b", pch = 16, ylim = c(0, 3.2))

plot(df_interp$time[1:50], df_interp$value[1:50],
     col = as.factor(df_interp$qualifier[1:50]),
     main = "Linear Interpolation",
     type = "b", pch = 16, ylim = c(0, 3.2))


df_log_interp <- fill_missing_daily(dataInput,
                                    fill_type = "log_interp")
plot(df_log_interp$time, df_log_interp$value,
     col = as.factor(df_log_interp$qualifier),
     main = "Linear Interpolation in Log Scale",
     type = "b", pch = 16, ylim = c(0, 3.2))

plot(df_log_interp$time[1:50], df_log_interp$value[1:50],
     col = as.factor(df_log_interp$qualifier[1:50]),
     main = "Linear Interpolation in Log Scale",
     type = "b", pch = 16, ylim = c(0, 3.2))


df_spline <- fill_missing_daily(dataInput,
                                fill_type = "spline")
plot(df_spline$time[1:50], df_spline$value[1:50],
     col = as.factor(df_spline$qualifier[1:50]),
     main = "Spline Fit",
     type = "b", pch = 16, ylim = c(0, 3.2))


df_tsSmooth <- fill_missing_daily(dataInput,
                                  fill_type = "tsSmooth")
plot(df_tsSmooth$time[1:50], df_tsSmooth$value[1:50],
     col = as.factor(df_tsSmooth$qualifier[1:50]),
     main = "tsSmooth",
     type = "b", pch = 16, ylim = c(0, 3.2))


df_tsStruct <- fill_missing_daily(dataInput,
                                  fill_type = "tsStruct")
plot(df_tsStruct$time[1:50], df_tsStruct$value[1:50],
     col = as.factor(df_tsStruct$qualifier[1:50]),
     main = "tsStruct",
     type = "b", pch = 16, ylim = c(0, 3.2))