Chapter 4 Water Quality Statistics

We developed a spatial regression model to estimate concentrations for constituents of concern (COCs) in Puget Sound urban stormwater. We first used a linear mixed model to select spatial regression parameters. We then used a censored Markov chain Monte Carlo simulation to perform final regressions.

4.1 Data Sources

4.1.1 Outfall Data

The primary source of measured stormwater data is the S8.D Municipal Stormwater Permit Outfall Data (referred to as the S8 Data in this document) provided by the Washington Department of Ecology (William Hobbs et al. 2015). Special Condition S8.D of the 2007-2012 Phase I Municipal Stormwater Permit required permittees to collect and analyze data to evaluate pollutant loadings of stormwater discharged from different land uses: high density (HD) residential, low density (LD) residential, commercial, and industrial. Phase I Permittees⁵ collected water quality and flow data, sediment data, and toxicity information from stormwater discharges during storm events.

The stormwater outfall data is available from Ecology via an open-data api at: https://data.wa.gov/Natural-Resources-Environment/Municipal-Stormwater-Permit-Outfall-Data/d958-q2ci.

COCs analyzed in this study are:

Zinc - Total
Copper - Total
Nitrite-Nitrate - Dissolved
Lead - Total
Total Phosphorus - Total
Total Suspended Solids
Total Phthalate
Total Polycyclic Aromatic Hydrocarbons (PAH)
Total Carcinogenic PAH (CPAH)
Total High molecular weight PAH (HPAH)

We extracted data for these COCs, and performed minimal data cleaning. We filtered out rejected data (values with a R or REJ flag), removed replicates, and removed three data points that were obvious outliers. While our analysis is not overly sensitive to outliers, three parameters had reported data that were orders of magnitude higher than the rest of the data. One high-outlier value was removed for each of the following COCs: Total Suspended Solids, Nitrite-Nitrate, and Total Phosphorus.

Outliers were removed for Figure 4.1 shows data before outliers were removed. Figure 4.3 shows data after outliers were removed.

Figure 4.1: All observations - outliers in place

Quantile-quantile plots of COCs analyzed are shown in 4.2

Figure 4.2: Quantile-quantile plots of COCs, log scale

Figure 4.3: All observations - outliers removed

4.2 Spatial data

For this study, we did not rely on the permittee’s self-reported land use type to run regression models predicting pollution loading from land use. A visual scan of our land cover data layer versus self-reported land use types revealed little agreement among permittee definitions of the four land use types (high density residential, low density residential, commercial, industrial). Therefore, we compiled a suite of continuous and categorical landscape datasets from which to run prediction loading models. We divide these into land use and landscape data.

4.2.1 Land use

In order to employ a consistent analysis across different monitored watersheds we extracted land use data from Ecology’s 2010 Statewide Land use data set⁶. Ecology generated the coverage from digital county tax parcel layers using Department of Revenue (DOR) two digit land use codes (see; WAC 458-53-030, Stratification of assessment rolls - real property).

4.2.2 Landscape data

For each watershed contained in the S8 dataset, potentially relevant landscape data was extracted from the following sources below:

Layer	ID	Source
Nighttime Lights	nighttime_lights	Global Radiance Calibrated Nighttime Lights
Particulate Matter 2.5μm	pm25	van Donkelaar et al. 2018.
Rooftop Density	roofs	Microsoft AI US Building Footprints
Imperviousness	impervious	TNC Puget Sound land cover
Age of Impervious Surface	change_year_index	Tsinghua University FROM-GLC year of change to impervious surface
Logarithm of Population Density	logPopulation	CIESIN - Columbia University Gridded Population of the World, Version 4
Logarithm of Average Daily Traffic Volume	logTraffic	INRIX Traffic

4.3 Methods

4.3.1 Pre-processing of spatial data

In order to use the landscape data at an appropriate scale across the study area, spatial predictors were stacked and then convolved with a 100-meter gaussian kernel. This resulted in a “fuzzy” set of predictors that could apply across dataset boundaries. These values were then extracted for each monitored watershed. Values were scaled and centered for regression purposes.

4.3.2 Controlling for multicolinearity

To address multicolinearity, we calculate the variance inflation factor (VIF) and iteratively remove parameters with the highest VIF. We keep removing parameters one at a time until all VIF values are below 10.0. Table ?? shows final VIF factors for Zinc. Similar values were found for all other COCs and are not reported here.

Table 4.1: Variance inflation factors Zinc - Water - Total

	vif
change_year_index	3.41
nighttime_lights	3.27
logPopulation	2.00
logtraffic	1.92
LU_ratio	1.11

4.3.3 Censored Data

All COCs had non-detect (left-censored) data present. Ecology flagged non-detect data and provided the reporting limit for each non-detect value. For purposes of model selection, non-detect values were substituted with the reporting limit. For regression, concentration was modeled as a two-parameter response variable according to the approaches detailed by Hadfield (2010) and Helsel (2012). We set the first parameter equal to the measured concentration and the second parameter equal to the reporting limit. We then specified a censored-gaussian prior distribution or censored-log-gaussian prior distribution.

4.3.4 Model Selection

Four potential models were evaluated for each COC:
1. linear - Non-transformed concentrations with location random effects
2. log-linear - Log-transformed concentrations with location random effects
3. linear seasonal - Non-transformed concentrations with location random effects with the addition of a seasonal fixed-effect
4. log-linear seasonal - Log-transformed concentrations with location random effects with the addition of a seasonal fixed-effect

Seasonal factors were included in the model selection by designating an integer corresponding to the season in which data were collected: (1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn)

Model selection was performed through Backward elimination of random-effect terms followed by backward elimination of fixed-effect terms. Denominator degrees of freedom and F-statistics were calculated using Satterthwaite’s method.

For each model we calculated the Akaike information criterion (AIC) estimator. The model with the lowest AIC was selected for regression.

Linear model results for each COC are shown in the following tables. In general, the model with the lowest AIC was selected to move on to the Bayesian analysis. Only predictors that are statistically significant are shown. In some cases, no predictors were significant. In those cases, only the model intercept is reported.

4.3.4.1 Zinc

Model with lowest AIC: log-linear
Predictors Selected: change_year_index

Zinc - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	108.82	73.63 – 144.01	<0.001	4.01	3.62 – 4.41	<0.001
nighttime_lights	57.10	31.06 – 83.14	<0.001
season2	4.19	-26.00 – 34.37	0.786	-0.06	-0.26 – 0.13	0.521
season3	115.23	79.56 – 150.90	<0.001	0.39	0.16 – 0.62	0.001
season4	8.81	-15.87 – 33.49	0.484	0.00	-0.16 – 0.16	0.981
change_year_index				1.00	0.64 – 1.37	<0.001
Random Effects
σ²	11365.18			0.46
τ₀₀	2327.86 _Location			0.46 _Location
ICC	0.17			0.50
N	14 _Location			14 _Location
Observations	414			414
Marginal R² / Conditional R²	0.267 / 0.392			0.491 / 0.745
AIC	5040.945			920.701

4.3.4.2 Copper

Model with lowest AIC: log-linear
Predictors Selected: logtraffic,change_year_index

Copper - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	67.29	48.41 – 86.17	<0.001	5.07	3.46 – 6.67	<0.001
logtraffic	34.58	22.86 – 46.31	<0.001	1.86	0.83 – 2.90	<0.001
season2	-0.71	-2.93 – 1.50	0.528
season3	6.38	3.77 – 9.00	<0.001
season4	-0.68	-2.53 – 1.17	0.470
change_year_index				0.48	0.15 – 0.80	0.004
Random Effects
σ²	66.16			0.44
τ₀₀	44.74 _Location			0.26 _Location
ICC	0.40			0.37
N	14 _Location			14 _Location
Observations	438			438
Marginal R² / Conditional R²	0.512 / 0.709			0.532 / 0.703
AIC	3116.708			938.998

4.3.4.3 Nitrite-Nitrate

Model with lowest AIC: log-linear seasonal Predictors Selected: LU_ratio

Nitrite-Nitrate - Water - Dissolved - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	575.78	389.75 – 761.81	<0.001	5.81	5.45 – 6.16	<0.001
LU_ratio	80.83	12.29 – 149.37	0.021	0.16	0.03 – 0.29	0.019
season2	-109.12	-231.14 – 12.89	0.080	-0.01	-0.22 – 0.19	0.885
season3	16.24	-130.58 – 163.05	0.828	0.48	0.24 – 0.73	<0.001
season4	-165.25	-264.46 – -66.03	0.001	-0.37	-0.53 – -0.21	<0.001
Random Effects
σ²	180488.75			0.50
τ₀₀	105950.38 _Location			0.41 _Location
ICC	0.37			0.45
N	14 _Location			14 _Location
Observations	406			406
Marginal R² / Conditional R²	0.146 / 0.462			0.206 / 0.567
AIC	6065.056			933.650

4.3.4.4 Lead

Model with lowest AIC: log-linear (seasonal effects not significant at model selection)
Predictors Selected: change_year_index,

Lead - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	30.48	14.45 – 46.51	<0.001	1.60	1.05 – 2.16	<0.001
change_year_index	4.26	1.39 – 7.13	0.004	1.02	0.49 – 1.54	<0.001
logPopulation	-3.01	-5.49 – -0.53	0.018
logtraffic	13.08	2.51 – 23.66	0.015
Random Effects
σ²	132.50			0.69
τ₀₀	16.29 _Location			0.96 _Location
ICC	0.11			0.58
N	14 _Location			14 _Location
Observations	417			417
Marginal R² / Conditional R²	0.201 / 0.288			0.349 / 0.729
AIC	3238.277			1087.496

4.3.4.5 Total Phosphorus

Model with lowest AIC: log-linear seasonal
Predictors Selected: logtraffic

Total Phosphorus - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	468.29	302.21 – 634.37	<0.001	7.08	5.74 – 8.42	<0.001
logtraffic	214.28	111.18 – 317.38	<0.001	1.68	0.85 – 2.51	<0.001
season2	-12.04	-38.11 – 14.02	0.365	-0.11	-0.28 – 0.07	0.225
season3	95.37	64.16 – 126.59	<0.001	0.77	0.56 – 0.98	<0.001
season4	8.70	-13.09 – 30.49	0.434	0.15	0.00 – 0.29	0.048
Random Effects
σ²	8919.81			0.39
τ₀₀	3317.80 _Location			0.22 _Location
ICC	0.27			0.36
N	14 _Location			14 _Location
Observations	424			424
Marginal R² / Conditional R²	0.298 / 0.488			0.349 / 0.586
AIC	5065.815			863.509

4.3.4.6 Total Suspended Solids

Model with lowest AIC: log-linear (seasonal effects not significant at model selection)
Predictors Selected: logtraffic

Total Suspended Solids - Water - Total - No seasonal efects
	linear			log-linear
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	152052.47	74594.12 – 229510.82	<0.001	13.20	11.65 – 14.75	<0.001
logtraffic	63941.48	15699.52 – 112183.44	0.009	1.95	0.98 – 2.91	<0.001
Random Effects
σ²	3259447978.93			1.06
τ₀₀	682696072.38 _Location			0.28 _Location
ICC	0.17			0.21
N	14 _Location			14 _Location
Observations	415			415
Marginal R² / Conditional R²	0.090 / 0.248			0.213 / 0.377
AIC	10259.857			1240.983

4.3.4.7 Total Phthalate

Model with lowest AIC: log-linear (seasonal effects not significant at model selection)
Predictors Selected: none

Total Phthalate - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	2.90	1.78 – 4.01	<0.001	0.42	0.01 – 0.83	0.044
Random Effects
σ²	16.20			0.60
τ₀₀	3.88 _Location			0.58 _Location
ICC	0.19			0.49
N	14 _Location			14 _Location
Observations	412			412
Marginal R² / Conditional R²	0.000 / 0.193			0.000 / 0.490
AIC	2349.218			1013.961

4.3.4.8 Total PAH

Model with lowest AIC: log-linear
Predictors Selected: LU_ratio

Total PAH - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	0.59	0.30 – 0.89	<0.001	-1.36	-1.73 – -0.99	<0.001
LU_ratio				-0.16	-0.29 – -0.02	0.022
season2				-0.32	-0.59 – -0.04	0.025
season3				-0.31	-0.63 – 0.02	0.066
season4				0.04	-0.19 – 0.26	0.757
Random Effects
σ²	1.18			0.95
τ₀₀	0.27 _Location			0.40 _Location
ICC	0.19			0.30
N	14 _Location			14 _Location
Observations	412			412
Marginal R² / Conditional R²	0.000 / 0.187			0.136 / 0.392
AIC	1273.036			1202.326

4.3.4.9 Total CPAH (seasonal effects not significant at model selection)

Model with lowest AIC: linear Predictors Selected: logPopulation, logtraffic

CPAH - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	-0.69	-1.34 – -0.04	0.038	-2.17	-2.46 – -1.88	<0.001
logPopulation	0.18	0.07 – 0.29	0.002
logtraffic	-0.64	-1.06 – -0.22	0.003
LU_ratio				-0.19	-0.30 – -0.08	0.001
Random Effects
σ²	0.35			0.89
τ₀₀	0.03 _Location			0.27 _Location
ICC	0.08			0.23
N	14 _Location			14 _Location
Observations	412			412
Marginal R² / Conditional R²	0.096 / 0.166			0.185 / 0.372
AIC	764.803			1165.104

4.3.4.10 Total HPAH

Model with lowest AIC: linear(seasonal effects not significant at model selection)
Predictors Selected: none

HPAH - Water - Total - With seasonal efects
	linear seasonal			log-linear seasonal
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	0.51	0.24 – 0.78	<0.001	-1.61	-1.94 – -1.28	<0.001
LU_ratio				-0.17	-0.30 – -0.05	0.008
Random Effects
σ²	0.99			0.98
τ₀₀	0.23 _Location			0.36 _Location
ICC	0.19			0.27
N	14 _Location			14 _Location
Observations	412			412
Marginal R² / Conditional R²	0.000 / 0.191			0.143 / 0.373
AIC	1200.321			1206.656

4.3.5 Bayesian Regression

We performed censored bayesian regression using a Markov chain Monte Carlo approach. We use the the MCMCglmm package in R (Helsel 2012) for analysis.

Models were run for 60,000 iterations with a burn-in interval of 10,000 and a thinning interval of 13. For each regression model, we specified either a censored Gaussian or a censored log-Gaussian prior distribution, depending on the model selection results. We specified a simple prior co-variance matrix where covariances between predictors was fixed at 1. We explored other prior variances were not sensitive to these changes.

4.4 Results

4.4.1 Zinc

4.4.1.1 Regression Coefficients

Regression results for Total Zinc are summarized in Table 4.2.

Table 4.2: Zinc - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	4.039	3.596	4.48	0	fixed
change_year_index	1.011	0.613	1.44	0	fixed
Location	0.551	0.157	1.10	NA	random
units	1.000	1.000	1.00	NA	residual

4.4.1.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.1.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.2 Copper

4.4.2.1 Regression Coefficients

Regression results for Total Zinc are summarized in Table 4.3.

Table 4.3: Copper - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	5.051	3.195	6.805	0.000	fixed
logtraffic	1.860	0.729	3.082	0.004	fixed
change_year_index	0.490	0.122	0.864	0.012	fixed
Location	0.309	0.082	0.660	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.2.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.2.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.3 Nitrite-Nitrate

4.4.3.1 Regression Coefficients

Regression results for Nitrite-Nitrate are summarized in Table 4.4.

Table 4.4: Nitrite-Nitrate - Water - Dissolved - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	5.715	5.301	6.087	0.000	fixed
LU_ratio	0.158	-0.006	0.302	0.049	fixed
Location	0.509	0.157	1.040	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.3.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.3.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.4 Lead

4.4.4.1 Regression Coefficients

Regression results for Total Lead are summarized in Table 4.5.

Table 4.5: Lead - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	1.314	0.718	1.98	0.002	fixed
impervious	0.873	0.257	1.50	0.009	fixed
Location	1.441	0.483	2.84	NA	random
units	1.000	1.000	1.00	NA	residual

4.4.4.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

#### Model Diagnostics Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.5 Total Phosphorus

4.4.5.1 Regression Coefficients

Regression results for Total Phosphorus are summarized in Table 4.6.

Table 4.6: Total Phosphorus - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	7.231	5.799	8.734	0.000	fixed
logtraffic	1.714	0.806	2.649	0.002	fixed
Location	0.251	0.065	0.517	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.5.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.5.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.
### Total Suspended Solids #### Regression Coefficients Regression results for Total Suspended Solids are summarized in Table 4.7.

Table 4.7: Total Suspended Solids - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	13.217	11.457	14.976	0.000	fixed
logtraffic	1.961	0.885	3.058	0.002	fixed
Location	0.349	0.091	0.707	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.5.4 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.5.5 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.6 Total PAH

4.4.6.1 Regression Coefficients

Regression results for Total PAH are summarized in Table 4.8.

Table 4.8: Total PAH - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	-1.815	-2.430	-1.243	0.00	fixed
LU_ratio	-0.164	-0.396	0.061	0.15	fixed
Location	1.226	0.400	2.480	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.6.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.6.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.7 Total CPAH

4.4.7.1 Regression Coefficients

Regression results for Total CPAH are summarized in Table 4.9.

Table 4.9: CPAH - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	-3.916	-8.059	-0.059	0.060	fixed
logPopulation	0.539	-0.149	1.178	0.095	fixed
logtraffic	-0.702	-3.287	1.957	0.521	fixed
Location	1.271	0.298	2.833	NA	random
units	1.000	1.000	1.000	NA	residual

4.4.7.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.7.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.4.8 Total HPAH

4.4.8.1 Regression Coefficients

Table 4.10: HPAH - Water - Total - Bayseian Regression Results
Predictor	Posterior Mean	lower 95% CI	upper 95% CI	MCMC p value	effect
(Intercept)	-2.2172457	-2.9392675	-1.4083642	0.0002599	fixed
LU_ratio	-0.1939536	-0.4814508	0.0923844	0.1715623	fixed
Location	1.9186908	0.4360824	4.1605803	NA	random
units	1.0000000	1.0000000	1.0000000	NA	residual

4.4.8.2 Posterior Uncertainty Levels

Estimated posterior density curves with 80% confidence intervals are shown in Figure ??.

## [[1]]

4.4.8.3 Model Diagnostics

Figure ?? shows the diagnostic trace plot of MCMC draws. This was used to verify prior distribution assumptions.

4.5 Predictions

We used the models developed above to generate predictions of pollutant concentrations to verify regression parameters. We first evaluated the posterior means of the predictors to evaluate data fit. We then simulated 100 draws from the full posterior distributions to evaluate the range of predictions. Results are show below.

## [[1]]

## [[1]]

## [[1]]

## [[1]]

## [[1]]

## [[1]]

## [[1]]

## [[1]]

## [[1]]

4.6 Discussion

Our models appear to fit both the mean concentrations within each location and capture the full spread of the data within the bounds of the 95% confidence intervals. Model fit appears to be the strongest for metals (Zinc, Copper, Lead) and not as strong for organics (PAH, CPAH and HPAH). These may be improved through specifying a better prior distribution.

To our knowledge, this work represents the first effort to predict stormwater pollution loading from landscape characteristics, rather than from simple land use categories. In the stormwaterheatmap tool, we use these predictive models to spatially output predicted levels of commonly reported stormwater pollutants.

4.7 Generation of heatmap layers

To produce heatmap layers of pollutant concentration, we use the original spatial predictor layers along with a linear relationship between response and predictors. To avoid extrapolating beyond the available data, we first clamp the predictors to the high and low values observed with monitored watersheds. We then reduce predictor layers into an array-valued image, where each pixel in the model domain contains an array of predictor values. We then multiply this layer by an array that contains the intercept of the regression relationship, along with coefficients.

For example for each COC, we have a linear regression in the form of:

\[ y_{i,j\ }=\ \beta_{0\ }+\beta_1x_1+\ ...\ +\beta_{n\ }+\epsilon_{i,j} \]

Where β₀ is the intercept and B_n are the regression coefficients. We express this as a 1-D array in the form of \[ \begin{bmatrix} \beta_0 \\ \beta_1 \\ ...\\\ \beta_n \end{bmatrix} \]

The predictor layer is comprised of pixels with 1-D arrays of predictor values in the form of: \[ \begin{bmatrix} 1 \\ x_1 \\ ...\\ x_n \\ \end{bmatrix} \]

We then calculate the dot product of the two arrays and reduce the values to a single image. This is expressed in the same manner as the linear regression equation:

\[ y = \beta_0+\beta_1x_1+...\beta_nx_n \]

Cities of Tacoma and Seattle; King, Snohomish, Pierce and Clark counties; Ports of Tacoma and Seattle↩︎
See: https://fortress.wa.gov/ecy/gispublic/DataDownload/ECY_CAD_Landuse2010.htm for more information↩︎