License

Copyright (c) 2020 Geosyntec Consultants, Inc.  Mozilla Public License Version 2.0

This software is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

Description

This notebook uses spatial predictors and outfall monitoring data to develop predictive linear regression relationships. This represents the first step in the watershed regression for the stormwater heatmap. The second step is a censored bayesian model, which adds some capabilities that simple linear models do not have. This first step is necessary to select the most predictive parameters in an efficient way.

Here are the steps in the code below:

  1. Code set up
  2. Get and prepare monitoring data
  3. Merge spatial data with monitoring data
  4. Remove mulitcolinear predictors
  5. Peform model selection
  6. Select the best model (these results are then used in the Bayesian model in step 2)

Setup

clear workspace and load libraries. (code not shown in html file )

Get and prepare monitoring data

The stormwater outfall data is available from the Department of Ecology at https://data.wa.gov/Natural-Resources-Environment/Municipal-Stormwater-Permit-Outfall-Data/d958-q2ci.

A .csv file is saved in WatershedRegression/data/S8_data.csv

all.S8.data <- read.csv("data/S8_data.csv",
                        stringsAsFactors = FALSE )

#filter out rejected data
all.S8.data <- (filter(all.S8.data,!result_data_qualifier %in% 'REJ'))

#filter out replicates 
all.S8.data <- (filter(all.S8.data,!sample_replicate_flag %in% 'Y'))

#change nondetect warnings to detects
warnings <- all.S8.data$nondetect_flag == "WARNING"
all.S8.data$nondetect_flag[warnings] <- FALSE 

#Change NA to detect
all.S8.data$nondetect_flag[is.na(all.S8.data$nondetect_flag)] <- FALSE

#Change season to factor 
all.S8.data$season <- as.factor(all.S8.data$season)

The chunk below makes a list of parameters we are interested in.

#Select Parameters
params <- c('Zinc - Water - Total',
 'Copper - Water - Total',
 'Nitrite-Nitrate - Water - Dissolved',
    'Lead - Water - Total',
 'Total Phosphorus - Water - Total',
 'Total Suspended Solids - Water - Total',
 'Total Phthalate - Water - Total',
'Total PAH - Water - Total',
#'Chrysene - Water - Total',
'CPAH - Water - Total',
'HPAH - Water - Total'
#'Total Kjeldahl Nitrogen - Water - Total',
#'Total PCB - Water - Total'
)

#save a list of all the parameters in case we want to use mor. 
params.all <- data.frame(unique(all.S8.data$parameter))
s8data <- all.S8.data 
kable(params,col.names = "Constituents of concern")
Constituents of concern
Zinc - Water - Total
Copper - Water - Total
Nitrite-Nitrate - Water - Dissolved
Lead - Water - Total
Total Phosphorus - Water - Total
Total Suspended Solids - Water - Total
Total Phthalate - Water - Total
Total PAH - Water - Total
CPAH - Water - Total
HPAH - Water - Total
#

Clean up extracted data and rename columns:

s8data <- all.S8.data %>% 

  dplyr::select(
    study_name,
    location_id,parameter,
    type,
    season,
    new_result_value,
    nondetect_flag,
    study_id,
    access_id,
    field_collection_end_date,
    field_collection_start_date,
    type)


#rename some columns
colnames(s8data)[colnames(s8data) == "location_id"] <- "Location"
colnames(s8data)[colnames(s8data) == "new_result_value"] <-
  "concentration"
s8data$nondetect_flag <- as.logical(s8data$nondetect_flag)
s8data$concentration <- as.numeric(s8data$concentration)

Check for outliers

Set up a jitter plot of all the data to look for outliers:

#make a function for scatter plots 
scatter_cocs <- function(df.coc,title) {
 p <- ggplot(df.coc, aes(1, concentration)) + geom_jitter() + labs(
  title = title,
  subtitle = "Data collected 2009-2013",
  caption =
    " Data source: Ecology, 2015",
  x = "Observations"
  #y = y.title
)  
p + facet_wrap( ~ parameter, scales = 'free')+theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) 
}

scatter_cocs(s8data[which(s8data$parameter %in% params),],'All Observations')

outliers are apprent for TSS, TP, and no2-no3. Remove these.

#remove and replot 

outlierParams <- c("Total Suspended Solids - Water - Total", "Total Phosphorus - Water - Total", "Nitrite-Nitrate - Water - Dissolved")

#This removes the highest values 
outlierVals <-
  top_n(group_by(s8data[which(s8data$parameter %in% outlierParams), ], parameter), 1, concentration)$concentration

s8data <- s8data %>%
  group_by(parameter) %>%
  slice(which(!(
    parameter %in% outlierParams & concentration %in% outlierVals
  )))

scatter_cocs(s8data[which(s8data$parameter %in% params),],'All Observations - Outliers Removed')

Looks better. Move on to the next Chunk.

Get Spatial data and merge

#Spatial predcitors have been extracted and saved as a csv file. 
#spatial_data<-read_csv("data/spatialPredictors_4_13.csv", col_types = cols(X1 = col_skip()))
spatial_data <- read_csv("C:/Users/cnilsen/Google Drive/repos/spatialPredictors.csv")
#RES and COM are compositional data. Change to a ratio
spatial_data$LU_ratio = spatial_data$COM/spatial_data$RES 
spatial_data <- dplyr::select(spatial_data, -c(RES,COM,.geo))
#merge spatial predictors with monitoring data 
s8data.wPredictors <<- merge(s8data, spatial_data)%>% 
  dplyr::select(-c(depSplusN))

kable(head(s8data.wPredictors),caption = "S8 Monitoring Data")
S8 Monitoring Data
Location study_name parameter type season concentration nondetect_flag study_id access_id field_collection_end_date field_collection_start_date system:index impervious logPopulation nighttime_lights pm25 rev_logTraffic roadDensity LU_ratio
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit Copper - Water - Total COM 4 22.800 FALSE WAR044501_S8D 36149 11/2/2011 11/2/2011 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit LPAH - Water - Total COM 1 0.060 FALSE WAR044501_S8D 102225 NA 2/12/2011 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit Dichlobenil - Water - Total COM 4 0.047 TRUE WAR044501_S8D 35675 11/1/2010 11/1/2010 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit Cadmium - Water - Total COM 4 0.065 FALSE WAR044501_S8D 33689 10/19/2012 10/18/2012 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit Fecal coliform - Water - Total COM 4 530.000 FALSE WAR044501_S8D 39755 12/8/2010 12/8/2010 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
KICCOMS8D_OUT King County Phase I Municipal Stormwater Permit Bis(2-ethylhexyl) phthalate - Water - Total COM 4 1.270 TRUE WAR044501_S8D 38768 11/23/2012 11/23/2012 0000000000000000000e 0.127 -0.624 -1.97 -0.847 -1.34 0.154 -0.653
getBaseFormula <- function(df.coc) {
  #Function to make a formula for prediction
  predictors <- df.coc %>%
    select_if(is.numeric) %>%
    dplyr::select(-c(concentration, access_id)) %>%
    colnames()
  return(as.formula(paste(
    "(concentration) ~",  (paste((predictors),  collapse = " + ")), " + (1|Location)"
  )))
  
}

Perform Linear Regression

Below, we perform linear regression one parameter at a time.

Functions for regression

We set up a series of functions to make the analysis easier.

Data dictionary for these functions

coc: constituent of concern
df.coc: filtered dataframe
model: basic linear mixed model
log_model: linear-mixed model with log-transformed concentrations
model_with_seasonality: linear-mixed model with seasonality as an added factor
model_with_seasonality_log: linear-mixed model with seasonality as an added factor with log-transformed concentrations

Plotting function

Basic function to plot results by location

plot_s8 <- function(coc) {
    df.coc <- (base::subset(s8data.wPredictors,
                parameter == coc))
  #plot the data to inspect 
  plot <- ggplot(data=(df.coc))+geom_point(aes(x=Location,y=concentration),alpha = 0.5, color = "#0088a3")+scale_y_log10()+labs(
      y = "Concentration (µg/L)",
      x = "Location",
      title =  "Measured Concentrations",
      subtitle = coc ) + theme( axis.text.x = element_text(angle=90, size = 9))
  return(plot)
}

Function to remove mulitcolinear predictors in base model

To address multicolinearity, we calculate the variance inflation factor (VIF) and iteratively remove paramters with the highest VIF. We keep removing parameters one at a time until all VIF values are below 5.0.

# #Calculate variance-inflation factors. 
# 
# calc_vif_base <- function(coc) {
#   df.coc <- (base::subset(s8data.wPredictors,
#                 parameter == coc))
#   base_formula <- getBaseFormula(df.coc)#returns a formula with all predictors
#   model.1 <<- lmer(base_formula, data = df.coc, na.action = na.omit)
#   v <- sort(vif(model.1),decreasing=TRUE)
#   return(v)
#   #kable(v,caption="Variance Inflation Factors - not filtered") #display the vif of the dataset
# }

Function to iteratively remove mulitcolinear predictors

Same as above, expect multicolinear predictors are removed.

## *Check VIF
 
check_vif <- function(coc) {
  df.coc <- (base::subset(s8data.wPredictors,
                parameter == coc))
  base_formula <- getBaseFormula(df.coc)#returns a formula with all predictors
  model.1 <- lmer(base_formula, data = df.coc, na.action = na.omit) #make into a lmer object 
  v <- sort(vif(model.1),decreasing=TRUE)
  
  #if the VIF of the highest ranked predictor is >10 then iteratively remove
  model_object <- model.1 #start with model object as the base model (all predictors included)
  
  
  for (i in 1:20) {
    interim_v <- sort(vif(model_object), decreasing = TRUE)
    if (max(interim_v) < 10) {
      break
    }
    predictor_to_drop = as.name(names(interim_v)[which(interim_v == max(interim_v))])
    model_object <-
      stats::update(model_object, paste(".~ . -", predictor_to_drop))
    }
  
  m1Terms <- (labels(terms(model.1)))
  m2Terms <- labels(terms(model_object))
  
  #compare the terms to get a list of the dropped terms
  droppedTerms <- setdiff(m1Terms, m2Terms)
  
  #make a list of selected predictors
  predictors <- m2Terms#colnames(model.frame(model_object)) 

  
  #filter df.coc to remove dropped terms.
  df.coc = dplyr::select(df.coc, -(droppedTerms))
  return(list("vif" = interim_v,"dropped" = droppedTerms,"predictors" = predictors))
  #kable(droppedTerms,caption = "These terms were dropped")
  
  #kable(interim_v,caption="Variance Inflation Factors - multicolinear factors dropped")
}

Function to perform stepwise selection and return a series of best models

## Stepwise Selection

#forward_selection(TRUE,coc,model_info$predictors)

# Extract the model that step found:

#perform forward selection on model parameters. First for non-transformed data, then for log-transformed data
forward_selection <- function(seasonal.bin, coc,predictors) {
  #seasonal.bin = binary (T/F) if seasonal model should be used
  library(lmerTest)
  #make this a lmer object 
  df.coc <- (base::subset(s8data.wPredictors,
                parameter == coc)) 
  model_object_formula <- as.formula(paste(
    "concentration ~",  (paste((predictors),  collapse = " + ")), " + (1|Location)"))
  
  model_object <- lmer(model_object_formula,data=df.coc)
  
  step.2 <-  lmerTest::step(model_object,reduce.random=FALSE,data=df.coc)
  step.2.log <- lmerTest::step(stats::update(model_object, log(concentration)~.))
  
  #extract the models 
  model.3 <- get_model(step.2)
  model.3.log <- get_model(step.2.log)
  
  #perform forward selection on model parameters, this time add seasonality . First for non-transformed data, then for log-transformed data
  step.4 <- lmerTest::step(stats::update(model_object,.~.+season))
  step.4.log <- lmerTest::step(stats::update(model_object,log(concentration) ~.+season))
  model.4 <- get_model(step.4)
  model.4.log <- get_model(step.4.log)
  
  #get formulas 
  model.3.formula <- as.formula(model.3@call$formula)
  model.3.log.formula <- as.formula(model.3.log@call$formula)
  model.4.formula <- as.formula(model.4@call$formula)
  model.4.log.formula <- as.formula(model.4.log@call$formula)
  
  #detach lmer test and remove the models. Keep the formulas. 
  detach("package:lmerTest", unload=TRUE)
  rm(model.3,model.3.log,model.4,model.4.log)
  
  #use lmer for performing modeling
  #calculate base model 
  df.coc <- (base::subset(s8data.wPredictors,
                parameter == coc)) 
  model.base <- lmer(model_object_formula,data=df.coc)
  model <- lmer(model.3.formula,data=df.coc)
  log_model<-  lmer(model.3.log.formula,data=df.coc)
  
  if(seasonal.bin) {
      #if seasonal model switch is on, calc seasonal models
    model_with_seasonality <- lmer(model.4.formula,data=df.coc)
    model_with_seasonality_log <- lmer(model.4.log.formula,data=df.coc)
      #add to list 
    modelList <- c(model,log_model,model_with_seasonality,model_with_seasonality_log)
    
    }
    else {
    modelList <-c(model,log_model)
    modelLables <-c('linear','log-linear') 
    }
   

 
 return(modelList)#,show.aic = TRUE)# = FALSE, title=coc, dv.labels = modelLables) 
  # #make a table of coefficients 
  # tab_model(modelList,
  #   model_log,
  #   model_with_seasonality,
  #   model.with_seasonality_log,
  #   show.aic = TRUE,auto.label = FALSE, title=coc, dv.labels = c('linear','log-linear','linear seasonal','log-linear seasonal'))#file=paste0("results/",coc,".html"))
}

Run through model selection functions

Now we use the functions from above to return model tables. We wrap a helper function to call the others.

results = c()
plots = c()
vifs = c()
tabs = c()
for (i in 1:length(params)){
coc = params[i]
df.coc <- (base::subset(s8data.wPredictors,
                parameter == coc)) 
plots[[coc]] <- plot_s8(coc)
model_info <- check_vif(coc)
vifs[[coc]] <- model_info$vif

results[[coc]] = forward_selection(TRUE,coc,model_info$predictors)}
modelLabels <-c('linear','log-linear','linear seasonal','log-linear seasonal') 

Wrapper function for displaying results for individual cocs.

show_results <- function(j){
#get parameter label 
lab = names(results)[j]

#plot observed 
print(plots[[j]])

#show variance inflation factors 
print(kable(vifs[[j]],caption = paste("Variance inflation factors",lab),col.names = c("vif")))

#show raw summary of results 
models <- results[[j]]
for (k in 1:2){
  print((summary(models[[k]])))
  #plot(models[[k]],,main=paste(lab,"\n","Resididuals"))
  print(qqmath(models[[k]],main=paste(lab,"\n",modelLabels[k],"\n","QQ plot of resididuals")))
}
  

#formatted table of results 
tabs = (tab_model(models,title=lab,show.aic = TRUE,dv.labels =modelLabels,viewer=FALSE ))
 return(tabs)

}

Now we can specify one parameters at a time and display the results. Generally, the model with the lowest AIC is used in Step 2 (Bayesian modeling).

Zinc

(show_results(1))

## 
## 
## Table: Variance inflation factors Zinc - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.50
## impervious          4.24
## rev_logTraffic      1.91
## logPopulation       1.89
## LU_ratio            1.29
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ impervious + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 5088
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.846 -0.265 -0.083  0.055  9.197 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept)  1810     42.5   
##  Residual             12497    111.8   
## Number of obs: 414, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    83.80      12.82    6.54
## impervious     48.05       9.32    5.16
## 
## Correlation of Fixed Effects:
##            (Intr)
## impervious -0.071

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ impervious + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 914
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.859 -0.524 -0.038  0.470  3.300 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.494    0.703   
##  Residual             0.472    0.687   
## Number of obs: 414, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    3.639      0.192   18.99
## impervious     0.716      0.139    5.15
## 
## Correlation of Fixed Effects:
##            (Intr)
## impervious -0.040
Zinc - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 83.80 58.68 – 108.93 <0.001 3.64 3.26 – 4.01 <0.001 68.71 40.42 – 96.99 <0.001 3.62 3.24 – 3.99 <0.001
impervious 48.05 29.80 – 66.31 <0.001 0.72 0.44 – 0.99 <0.001 46.99 29.18 – 64.79 <0.001 0.71 0.44 – 0.98 <0.001
season [2] 4.22 -25.91 – 34.34 0.784 -0.07 -0.26 – 0.13 0.500
season [3] 114.53 78.88 – 150.17 <0.001 0.39 0.16 – 0.62 0.001
season [4] 7.93 -16.74 – 32.60 0.529 0.00 -0.15 – 0.16 0.976
Random Effects
σ2 12496.95 0.47 11366.26 0.46
τ00 1809.55 Location 0.49 Location 1739.85 Location 0.48 Location
ICC 0.13 0.51 0.13 0.51
N 14 Location 14 Location 14 Location 14 Location
Observations 414 414 414 414
Marginal R2 / Conditional R2 0.229 / 0.326 0.494 / 0.753 0.290 / 0.384 0.503 / 0.756
AIC 5095.699 921.495 5038.647 921.976

Copper

show_results(2)

## 
## 
## Table: Variance inflation factors Copper - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.45
## impervious          4.24
## rev_logTraffic      1.92
## logPopulation       1.88
## LU_ratio            1.30
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 3139
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.131 -0.470 -0.116  0.176  5.632 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 44.7     6.69    
##  Residual             70.5     8.40    
## Number of obs: 438, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)       67.25       9.63    6.99
## rev_logTraffic    34.36       5.99    5.74
## 
## Correlation of Fixed Effects:
##             (Intr)
## rev_lgTrffc 0.982

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ impervious + rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 931
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -5.392 -0.535  0.001  0.552  3.057 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.273    0.523   
##  Residual             0.444    0.667   
## Number of obs: 438, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)       5.408      0.803    6.74
## impervious        0.297      0.111    2.67
## rev_logTraffic    2.201      0.499    4.41
## 
## Correlation of Fixed Effects:
##             (Intr) imprvs
## impervious  -0.344       
## rev_lgTrffc  0.984 -0.344
Copper - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 67.25 48.38 – 86.13 <0.001 5.41 3.84 – 6.98 <0.001 67.29 48.41 – 86.17 <0.001 5.41 3.84 – 6.98 <0.001
rev_logTraffic 34.36 22.63 – 46.10 <0.001 2.20 1.22 – 3.18 <0.001 34.58 22.86 – 46.31 <0.001 2.20 1.22 – 3.18 <0.001
impervious 0.30 0.08 – 0.51 0.007 0.30 0.08 – 0.51 0.007
season [2] -0.71 -2.93 – 1.50 0.528
season [3] 6.38 3.77 – 9.00 <0.001
season [4] -0.68 -2.53 – 1.17 0.470
Random Effects
σ2 70.54 0.44 66.16 0.44
τ00 44.71 Location 0.27 Location 44.74 Location 0.27 Location
ICC 0.39 0.38 0.40 0.38
N 14 Location 14 Location 14 Location 14 Location
Observations 438 438 438 438
Marginal R2 / Conditional R2 0.494 / 0.690 0.525 / 0.706 0.512 / 0.709 0.525 / 0.706
AIC 3146.801 940.604 3116.708 940.604

Nitrite-Nitrate

show_results(3)

## 
## 
## Table: Variance inflation factors Nitrite-Nitrate - Water - Dissolved
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.45
## impervious          4.23
## rev_logTraffic      1.92
## logPopulation       1.88
## LU_ratio            1.30
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ LU_ratio + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 6094
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -2.926 -0.400 -0.157  0.235  6.046 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 107632   328     
##  Residual             185073   430     
## Number of obs: 406, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    496.1       90.9    5.46
## LU_ratio        79.9       35.2    2.27
## 
## Correlation of Fixed Effects:
##          (Intr)
## LU_ratio 0.074

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ LU_ratio + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 962
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.200 -0.620 -0.005  0.590  3.236 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.428    0.654   
##  Residual             0.558    0.747   
## Number of obs: 406, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   5.7185     0.1799   31.79
## LU_ratio      0.1592     0.0698    2.28
## 
## Correlation of Fixed Effects:
##          (Intr)
## LU_ratio 0.073
Nitrite-Nitrate - Water - Dissolved
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 496.14 317.88 – 674.39 <0.001 5.72 5.37 – 6.07 <0.001 575.78 389.75 – 761.81 <0.001 5.81 5.45 – 6.16 <0.001
LU_ratio 79.94 10.88 – 149.01 0.023 0.16 0.02 – 0.30 0.023 80.83 12.29 – 149.37 0.021 0.16 0.03 – 0.29 0.019
season [2] -109.12 -231.14 – 12.89 0.080 -0.01 -0.22 – 0.19 0.885
season [3] 16.24 -130.58 – 163.05 0.828 0.48 0.24 – 0.73 <0.001
season [4] -165.25 -264.46 – -66.03 0.001 -0.37 -0.53 – -0.21 <0.001
Random Effects
σ2 185073.43 0.56 180488.75 0.50
τ00 107631.70 Location 0.43 Location 105950.38 Location 0.41 Location
ICC 0.37 0.43 0.37 0.45
N 14 Location 14 Location 14 Location 14 Location
Observations 406 406 406 406
Marginal R2 / Conditional R2 0.128 / 0.449 0.147 / 0.517 0.146 / 0.462 0.206 / 0.567
AIC 6102.003 969.788 6065.056 933.650

Lead

show_results(4)

## 
## 
## Table: Variance inflation factors Lead - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.49
## impervious          4.24
## rev_logTraffic      1.91
## logPopulation       1.88
## LU_ratio            1.29
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ impervious + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 3242
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.403 -0.437 -0.096  0.024  9.825 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept)  30.5     5.53   
##  Residual             132.6    11.51   
## Number of obs: 417, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)     9.10       1.60    5.69
## impervious      3.52       1.16    3.03
## 
## Correlation of Fixed Effects:
##            (Intr)
## impervious -0.061

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ impervious + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1083
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.777 -0.633 -0.018  0.581  3.307 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 1.21     1.10    
##  Residual             0.69     0.83    
## Number of obs: 417, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    1.198      0.297    4.03
## impervious     0.649      0.216    3.01
## 
## Correlation of Fixed Effects:
##            (Intr)
## impervious -0.037
Lead - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 9.10 5.97 – 12.24 <0.001 1.20 0.62 – 1.78 <0.001 9.10 5.97 – 12.24 <0.001 1.20 0.62 – 1.78 <0.001
impervious 3.52 1.24 – 5.79 0.002 0.65 0.23 – 1.07 0.003 3.52 1.24 – 5.79 0.002 0.65 0.23 – 1.07 0.003
Random Effects
σ2 132.56 0.69 132.56 0.69
τ00 30.53 Location 1.21 Location 30.53 Location 1.21 Location
ICC 0.19 0.64 0.19 0.64
N 14 Location 14 Location 14 Location 14 Location
Observations 417 417 417 417
Marginal R2 / Conditional R2 0.122 / 0.287 0.290 / 0.742 0.122 / 0.287 0.290 / 0.742
AIC 3250.500 1090.813 3250.500 1090.813

Total Phosphorus

show_results(5)

## 
## 
## Table: Variance inflation factors Total Phosphorus - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.43
## impervious          4.23
## rev_logTraffic      1.92
## logPopulation       1.87
## LU_ratio            1.30
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 5114
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.913 -0.530 -0.212  0.286  7.281 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 3158     56.2    
##  Residual             9811     99.0    
## Number of obs: 424, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)       478.1       82.9    5.77
## rev_logTraffic    214.0       51.6    4.15
## 
## Correlation of Fixed Effects:
##             (Intr)
## rev_lgTrffc 0.982

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 902
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.614 -0.661  0.002  0.682  3.335 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.221    0.47    
##  Residual             0.449    0.67    
## Number of obs: 424, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)       7.207      0.682   10.57
## rev_logTraffic    1.694      0.424    3.99
## 
## Correlation of Fixed Effects:
##             (Intr)
## rev_lgTrffc 0.982
Total Phosphorus - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 478.14 315.58 – 640.69 <0.001 7.21 5.87 – 8.54 <0.001 468.29 302.21 – 634.37 <0.001 7.08 5.74 – 8.42 <0.001
rev_logTraffic 214.01 112.86 – 315.16 <0.001 1.69 0.86 – 2.53 <0.001 214.28 111.18 – 317.38 <0.001 1.68 0.85 – 2.51 <0.001
season [2] -12.04 -38.11 – 14.02 0.365 -0.11 -0.28 – 0.07 0.225
season [3] 95.37 64.16 – 126.59 <0.001 0.77 0.56 – 0.98 <0.001
season [4] 8.70 -13.09 – 30.49 0.434 0.15 0.00 – 0.29 0.048
Random Effects
σ2 9810.78 0.45 8919.81 0.39
τ00 3158.19 Location 0.22 Location 3317.80 Location 0.22 Location
ICC 0.24 0.33 0.27 0.36
N 14 Location 14 Location 14 Location 14 Location
Observations 424 424 424 424
Marginal R2 / Conditional R2 0.251 / 0.434 0.289 / 0.524 0.298 / 0.488 0.349 / 0.586
AIC 5122.122 909.735 5065.815 863.509

Total Suspended Solids

show_results(6)

## 
## 
## Table: Variance inflation factors Total Suspended Solids - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.45
## impervious          4.23
## rev_logTraffic      1.93
## logPopulation       1.89
## LU_ratio            1.30
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 10252
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -2.073 -0.444 -0.265  0.046  6.520 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 6.83e+08 26128   
##  Residual             3.26e+09 57092   
## Number of obs: 415, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)      152053      39520    3.85
## rev_logTraffic    63942      24614    2.60
## 
## Correlation of Fixed Effects:
##             (Intr)
## rev_lgTrffc 0.982

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1233
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.152 -0.587  0.013  0.557  3.272 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.28     0.53    
##  Residual             1.06     1.03    
## Number of obs: 415, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)      13.198      0.790   16.72
## rev_logTraffic    1.946      0.492    3.96
## 
## Correlation of Fixed Effects:
##             (Intr)
## rev_lgTrffc 0.982
Total Suspended Solids - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 152052.47 74594.11 – 229510.83 <0.001 13.20 11.65 – 14.75 <0.001 152052.47 74594.11 – 229510.83 <0.001 13.20 11.65 – 14.75 <0.001
rev_logTraffic 63941.48 15699.51 – 112183.45 0.009 1.95 0.98 – 2.91 <0.001 63941.48 15699.51 – 112183.45 0.009 1.95 0.98 – 2.91 <0.001
Random Effects
σ2 3259447964.46 1.06 3259447964.46 1.06
τ00 682696191.57 Location 0.28 Location 682696191.57 Location 0.28 Location
ICC 0.17 0.21 0.17 0.21
N 14 Location 14 Location 14 Location 14 Location
Observations 415 415 415 415
Marginal R2 / Conditional R2 0.090 / 0.248 0.213 / 0.377 0.090 / 0.248 0.213 / 0.377
AIC 10259.857 1240.983 10259.857 1240.983

Total Phthalate

show_results(7)

## 
## 
## Table: Variance inflation factors Total Phthalate - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.47
## impervious          4.23
## rev_logTraffic      1.91
## logPopulation       1.88
## LU_ratio            1.30
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 2343
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.512 -0.337 -0.160  0.005  9.109 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept)  3.88    1.97    
##  Residual             16.20    4.02    
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    2.896      0.568    5.09

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1008
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.095 -0.717 -0.114  0.485  4.008 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.580    0.762   
##  Residual             0.604    0.777   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    0.418      0.208    2.01
Total Phthalate - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 2.90 1.78 – 4.01 <0.001 0.42 0.01 – 0.83 0.044 2.90 1.78 – 4.01 <0.001 0.42 0.01 – 0.83 0.044
Random Effects
σ2 16.20 0.60 16.20 0.60
τ00 3.88 Location 0.58 Location 3.88 Location 0.58 Location
ICC 0.19 0.49 0.19 0.49
N 14 Location 14 Location 14 Location 14 Location
Observations 412 412 412 412
Marginal R2 / Conditional R2 0.000 / 0.193 0.000 / 0.490 0.000 / 0.193 0.000 / 0.490
AIC 2349.218 1013.961 2349.218 1013.961

Total PAH

show_results(8)

## 
## 
## Table: Variance inflation factors Total PAH - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.48
## impervious          4.23
## rev_logTraffic      1.91
## logPopulation       1.88
## LU_ratio            1.29
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1267
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.648 -0.311 -0.159  0.016  9.223 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.271    0.521   
##  Residual             1.182    1.087   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    0.592      0.151    3.93

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ LU_ratio + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1191
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.710 -0.453 -0.186  0.551  4.235 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.414    0.644   
##  Residual             0.960    0.980   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  -1.4436     0.1804   -8.00
## LU_ratio     -0.1607     0.0696   -2.31
## 
## Correlation of Fixed Effects:
##          (Intr)
## LU_ratio 0.072
Total PAH - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 0.59 0.30 – 0.89 <0.001 -1.44 -1.80 – -1.09 <0.001 0.59 0.30 – 0.89 <0.001 -1.36 -1.73 – -0.99 <0.001
LU_ratio -0.16 -0.30 – -0.02 0.021 -0.16 -0.29 – -0.02 0.022
season [2] -0.32 -0.59 – -0.04 0.025
season [3] -0.31 -0.63 – 0.02 0.066
season [4] 0.04 -0.19 – 0.26 0.757
Random Effects
σ2 1.18 0.96 1.18 0.95
τ00 0.27 Location 0.41 Location 0.27 Location 0.40 Location
ICC 0.19 0.30 0.19 0.30
N 14 Location 14 Location 14 Location 14 Location
Observations 412 412 412 412
Marginal R2 / Conditional R2 0.000 / 0.187 0.122 / 0.387 0.000 / 0.187 0.136 / 0.392
AIC 1273.036 1199.303 1273.036 1202.326

CPAH

show_results(9)

## 
## 
## Table: Variance inflation factors CPAH - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.52
## impervious          4.24
## rev_logTraffic      1.89
## logPopulation       1.88
## LU_ratio            1.29
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ logPopulation + rev_logTraffic + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 755
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.510 -0.244 -0.125  0.050 11.592 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.0289   0.170   
##  Residual             0.3455   0.588   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)     -0.6946     0.3332   -2.08
## logPopulation    0.1494     0.0472    3.16
## rev_logTraffic  -0.6404     0.2147   -2.98
## 
## Correlation of Fixed Effects:
##             (Intr) lgPplt
## logPopulatn -0.521       
## rev_lgTrffc  0.986 -0.555

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ LU_ratio + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1157
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.483 -0.355 -0.159  0.273  4.166 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.266    0.516   
##  Residual             0.892    0.944   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  -2.1727     0.1471  -14.77
## LU_ratio     -0.1884     0.0566   -3.33
## 
## Correlation of Fixed Effects:
##          (Intr)
## LU_ratio 0.073
CPAH - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) -0.69 -1.35 – -0.04 0.037 -2.17 -2.46 – -1.88 <0.001 -0.69 -1.35 – -0.04 0.037 -2.17 -2.46 – -1.88 <0.001
logPopulation 0.15 0.06 – 0.24 0.002 0.15 0.06 – 0.24 0.002
rev_logTraffic -0.64 -1.06 – -0.22 0.003 -0.64 -1.06 – -0.22 0.003
LU_ratio -0.19 -0.30 – -0.08 0.001 -0.19 -0.30 – -0.08 0.001
Random Effects
σ2 0.35 0.89 0.35 0.89
τ00 0.03 Location 0.27 Location 0.03 Location 0.27 Location
ICC 0.08 0.23 0.08 0.23
N 14 Location 14 Location 14 Location 14 Location
Observations 412 412 412 412
Marginal R2 / Conditional R2 0.096 / 0.166 0.185 / 0.372 0.096 / 0.166 0.185 / 0.372
AIC 765.133 1165.104 765.133 1165.104

##HPAH

show_results(10)

## 
## 
## Table: Variance inflation factors HPAH - Water - Total
## 
##                      vif
## -----------------  -----
## nighttime_lights    4.48
## impervious          4.23
## rev_logTraffic      1.91
## logPopulation       1.88
## LU_ratio            1.29
## Linear mixed model fit by REML ['lmerMod']
## Formula: concentration ~ (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1194
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -1.673 -0.316 -0.142  0.001  9.555 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.234    0.484   
##  Residual             0.990    0.995   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    0.509      0.140    3.65

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(concentration) ~ LU_ratio + (1 | Location)
##    Data: df.coc
## 
## REML criterion at convergence: 1199
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.192 -0.410 -0.167  0.539  4.187 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Location (Intercept) 0.361    0.601   
##  Residual             0.982    0.991   
## Number of obs: 412, groups:  Location, 14
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  -1.6083     0.1697   -9.48
## LU_ratio     -0.1736     0.0653   -2.66
## 
## Correlation of Fixed Effects:
##          (Intr)
## LU_ratio 0.072
HPAH - Water - Total
  linear log-linear linear seasonal log-linear seasonal
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p
(Intercept) 0.51 0.24 – 0.78 <0.001 -1.61 -1.94 – -1.28 <0.001 0.51 0.24 – 0.78 <0.001 -1.61 -1.94 – -1.28 <0.001
LU_ratio -0.17 -0.30 – -0.05 0.008 -0.17 -0.30 – -0.05 0.008
Random Effects
σ2 0.99 0.98 0.99 0.98
τ00 0.23 Location 0.36 Location 0.23 Location 0.36 Location
ICC 0.19 0.27 0.19 0.27
N 14 Location 14 Location 14 Location 14 Location
Observations 412 412 412 412
Marginal R2 / Conditional R2 0.000 / 0.191 0.143 / 0.373 0.000 / 0.191 0.143 / 0.373
AIC 1200.321 1206.656 1200.321 1206.656