As a preface, I will define here some functions that will be useful later on in the analysis.

```
## Functions used in Time Series Project -- Lake Erie Water Levelss
library(RColorBrewer)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
sse<-function(pred, data){
true<-as.vector(data)
sum((pred-true)^2)
}
colors2<-col_vector
addplot<-function(pred,range=1:length(lkerie),color1){
lines(x=c(time(lkerie))[range],y=c(lkerietrain,pred)[range],col=color1,lty="dashed")
}
get.best.arima <- function(x_ts, maxord = c(1,1,1,1,1,1))
{
best.aic <- 1e8
n <- length(x_ts)
for (p in 0:maxord[1]) for(d in 0:maxord[2]) for(q in 0:maxord[3])
for (P in 0:maxord[4]) for(D in 0:maxord[5]) for(Q in 0:maxord[6])
{
fit <- arima(x_ts, order = c(p,d,q),
seas = list(order = c(P,D,Q),
frequency(x_ts)), method = "CSS")
fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
if (fit.aic < best.aic)
{
best.aic <- fit.aic
best.fit <- fit
best.model <- c(p,d,q,P,D,Q)
}
}
list(best.aic, best.fit, best.model)
}
```

```
library(forecast)
library(prophet)
library(tidyr)
library(TSA)
library(lubridate)
```

The dataset I have chosen is a monthly recording of water levels in Lake Erie from the period between 1921 and 1970. The dataset is available at the following link: https://datamarket.com/data/set/22pw/monthly-lake-erie-levels-1921-1970#!ds=22pw&display=line&numberformat=n1. Below is a plot of the time series over time:

# Importing the data¶

```
#lkerie<-dmseries("https://datamarket.com/data/set/22pw/monthly-lake-erie-levels-1921-1970#!ds=22pw&display=line")
#lkerie <- as.ts(lkerie,frequency=12)
#Note: update as of 2019, datamarket is no longer availiable. As an alternative, I am using data from a different source
# that has the lake erie water levels as well.
lkerie = read.csv('http://lre-wm.usace.army.mil/ForecastData/GLHYD_data_english.csv', skip=12)
mask = (lkerie$year <= 1970) & (lkerie$year >= 1921)
lkerie = lkerie[mask, 'Erie']
lkerie <- ts(lkerie,frequency=12, start=c(1921, 1))
```

```
plot(lkerie)
```

We can see that the water levels seem to jump somewhat cyclically over time. This is typical of physical processes, and thus it is an interesting dataset to analyze. We see that over time the water levels decreased initially, and then began to increase again somewhat steadily from the 1940s onwards. While there is clearly a seasonal component at work here, it is not so clear how this seasonal component operates initially from the plot above.

I first begin by fitting a seasonal means with linear trend model to summarize and get an idea of the seasonal component of the data. I then built four models โ the Holt Winters model, an SARIMA model, a model decomposing the seasonal component with spectral analysis, and a model from the prophet package. I then compare the models using the sum squared errors and plotting them against the true data values.

I have separated the time series into two sets, as shown below. These are the training and test sets I will be using to evaluate how effective each of the models I fit are at estimating the true time series. The following code initializes the dataset as a time series, and creates a training and test dataset to perform analysis on. The test set is the last 12 observations, or one year, of the time series.

```
lkerietrain<-ts(lkerie[1:(length(lkerie)-12)], frequency=12)
lkerietest<-lkerie[(length(lkerie)-11):length(lkerie)]
tail(lkerietrain)
```

After this initialization, the dataset is in good condition and doesnโt need to be cleaned any further. The first 6 values of the time series are shown below:

# Seasonal Means Model¶

The first model I fit was a seasonal means model, to get a sense of the seasonality of the dataset. Although this is a simplistic model that will not be used to model the data itself, it provides good insight into the seasonal nature of the time series. Below is a plot of the time series with the months added to the plot:

```
season_<-season(lkerie)
time_<-time(lkerie)
```

We can see that the time series tends to peak around the summer time, and has troughs in the colder months. The seasonal means model gives us more insight into how this is occurring:

```
help(plot)
```

```
seasonmean<-lm(lkerie~season_+time_)
summary(seasonmean)
plot(lkerie)
points(y=lkerie,x=time(lkerie),pch=as.vector(season(lkerie)))
```

Indeed, we can see that the seasonal means model has quite a bit of significant components, suggesting that there is a good amount of seasonality in the data. However, this is not consistent, with some months having insignificant values. In general, we can see that the summer months have positive coefficients, while the colder months (which are not nearly as significant) sit somewhere close to zero. This agrees with the earlier findings from the time series plot.

Next, I perform a decomposition of the time series, to see how this package can break up the time series into seasonal, trend and random components.

```
madecomp<-decompose(lkerie)
plot(madecomp)
madecomp$seasonal[1:12]
```

We can see again that there is clearly a seasonal component to the time series. We can also see the general trend of decreasing and increasing. When these components are removed, we see a relatively stationary random process. This cyclical nature of the seasonality lead me to consider spectral analysis. However, first we will look at two other models which can deal with seasonality โ the Holt Wintersโ Model and a seasonal ARIMA model.

# Holt Winters' Model¶

I use two Holt-Wintersโ Models (an additive and multiplicative model) to model the data. Below, I fit the two models, and plot the resulting fits. The Holt-Winters models both consider seasonal components, although the multiplicative model and additive model come up with different quantities for the coefficients (see the appendix for the summaries). However, the smoothing parameters are relatively the same in character, with alpha=0.8973698 beta=0.003048736 and gamma: 1. This indicates that the level and seasonality are smoothing based on observations in the distant past (with gamma = 1 meaning it is considering only the previous observations), while a small beta indicates that it is using the trend of only the nearest values. I have plotted one of the models below, but the other is very similar in nature. However, as we will see later, the prediction error is quite high โ (and the prediction error for the multiplicative model is, predictably, much higher).

```
hw1<-HoltWinters(lkerietrain)
hw1
plot(hw1)
```

```
hw2<-HoltWinters(lkerietrain,beta=NULL,seasonal="multiplicative")
hw2
plot(hw2)
hw1pred<-predict(hw1,n.ahead=12,prediction.interval=TRUE)
hw2pred<-predict(hw2,n.ahead=12,prediction.interval=TRUE)
plot(forecast(hw1,12))
plot(forecast(hw2,12))
ssehw1<-sse(hw1pred[,1], lkerietest)
ssehw2<-sse(hw2pred[,1], lkerietest)
```