---
title: "AR simulated example Chapter 4"
author: "Kathleen M. Gates"
output:
word_document: default
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Set up the environment:
```{r}
library(pracma)
library(signal)
library(fUnitRoots)
library(tseries)
library(portes)
```
## Look at trends and order
First let's generate some data with an AR(1) coefficient of 0.20 and no trend:
$$
\begin{equation}
y(t) = 0.20*y(t-1) + \zeta(t)
\end{equation}
$$
```{r}
# set the random seed so results are reproducible
set.seed(123456)
noise <- rnorm(200, 0, 5)
# AR(1) series
yAR <- rnorm(1, 0, 1)
for (p in 2:200) {
  yAR[p] <- 0.20*yAR[p-1] + noise[p]
}
plot(ts(yAR))
```
Examine the ACF and PACF, then check the estimated lag order and coefficients.
```{r}
par(mfrow = c(1,2))
plot(acf(yAR, plot = F), main = "ACF for yAR")
plot(pacf(yAR, plot = F), main = "PACF for yAR")
fitAR <- ar(yAR)
# examine order
fitAR$order
# examine coefficient estimates
fitAR$ar
```
The lag order and estimates look good.
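As an optional cross-check (not part of the workflow above), the same model can be fit with base R's `arima()`, which also reports standard errors for the AR coefficient; `ar()` returns point estimates only.

```r
# optional cross-check: fit the AR(1) by maximum likelihood with arima()
fitARml <- arima(yAR, order = c(1, 0, 0))
fitARml$coef                   # ar1 estimate and intercept
sqrt(diag(fitARml$var.coef))   # standard errors of the estimates
```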
Let's examine a linear trend without any autoregressive relations and see what happens.
$$
\begin{equation}
y(t) = 0.20*time(t) + \zeta(t)
\end{equation}
$$
```{r}
# trend series
time <- 1:200
ytrend <- rnorm(1, 0, 1)
for (p in 2:200) {
  ytrend[p] <- 0.2*time[p] + noise[p]
}
plot(ts(ytrend))
par(mfrow = c(1,2))
plot(acf(ytrend, plot = F), main = "ACF for ytrend")
plot(pacf(ytrend, plot = F), main = "PACF for ytrend")
fitTrend <- ar(ytrend)
# examine order
fitTrend$order
# examine coefficient estimates
fitTrend$ar
```
We see that a linear trend alone can show up as spurious AR effects.
Now let's put them together - a linear trend and an AR(1) relation.
$$
\begin{equation}
y(t) = 0.20*y(t-1) + 0.20*time(t) + \zeta(t)
\end{equation}
$$
```{r}
# AR series + trend
yARtrend <- rnorm(1, 0, 1)
for (p in 2:200) {
  yARtrend[p] <- 0.20*yARtrend[p-1] + 0.2*time[p] + noise[p]
}
plot(ts(yARtrend))
par(mfrow = c(1,2))
plot(acf(yARtrend, plot = F), main = "ACF for yARtrend")
plot(pacf(yARtrend, plot = F), main = "PACF for yARtrend")
fitARtrend <- ar(yARtrend)
# examine order
fitARtrend$order
# examine coefficient estimates
fitARtrend$ar
```
Again, we see inflated AR coefficients.
Fortunately, trends are easy to remove. We can either include a trend term in the model (much like in our data-generating equation above) or detrend the series prior to analysis. Let's detrend.
```{r}
## estimating trends ##
fitc <- lm(yARtrend ~ time) # regress yARtrend on time to estimate the linear trend
summary(fitc)
```
We can see here that a significant linear trend exists for $yARtrend$. You can check on your own that the coefficient for the trend variable $time$ is not significant for the $yAR$ data.
Now that we have identified the presence of a linear trend we can remove it.
```{r}
##### detrend the yARtrend data - the easy way ####
ydetrend1 <- detrend(yARtrend, tt = 'linear')
# The steps below show exactly what detrend() is doing.
# Recall that fitc is the regression of yARtrend on the time vector.
ypredict <- fitc$coefficients[1] + fitc$coefficients[2]*time # predict Y using coefficients
ydetrend2 <- yARtrend - ypredict # subtract the predicted values from observed to obtain the residuals. This is the detrended data.
# This is what is done in the detrend function, and is perfectly correlated with ydetrend1
# now look at results:
par(mfrow = c(1,2))
plot(acf(ydetrend2, plot = F), main = "ACF for ydetrend2")
plot(pacf(ydetrend2, plot = F), main = "PACF for ydetrend2")
ar(ydetrend2) # AR(1) selected
```
Now we obtain the AR(1) coefficient used to generate the data.
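The other option mentioned above, including the trend term in the model itself, can be sketched with base R's `arima()` and an external regressor. Note that `arima()` parameterizes this as a regression on time with AR(1) errors, which is slightly different from the generating equation, so the trend slope is not directly comparable; the AR coefficient, however, should be recovered.

```r
# model the AR(1) structure and the linear trend jointly;
# time was defined above as 1:200
fitJoint <- arima(yARtrend, order = c(1, 0, 0), xreg = time)
fitJoint$coef  # estimated AR coefficient, intercept, and trend slope
```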
## Testing for stability and stationarity
It is common to also test for stability and stationarity. The former identifies whether the series would "blow up" (diverge) if taken to T = infinity; the latter asks whether properties such as the mean and variance stay constant across time.
```{r}
adf.test(yARtrend, alternative = "stationary")
```
The ADF test rejects the null hypothesis of a unit root (nonstationarity); the alternative hypothesis, that the series is stationary, is supported. Note that `adf.test` includes a linear trend in its test regression by default, so a trend-stationary series such as this one can pass; it is still important to examine and remove linear trends before interpreting AR estimates.
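As a complementary check (an addition beyond the original workflow), the `tseries` package also provides `kpss.test`, which reverses the hypotheses: its null is stationarity, here stationarity around a deterministic trend.

```r
# KPSS test with trend-stationarity as the null hypothesis;
# failing to reject agrees with the ADF result above
kpss.test(yARtrend, null = "Trend")
```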
## Testing residuals
Now let's simulate a new series with greater lags: an AR(3) process. We'll fit an AR model, let `ar()` select the lag order, and test whether the residuals are white noise.
```{r}
# AR(3) series; the first two generated values use only the lags available so far
yAR3 <- rnorm(1, 0, 1)
yAR3[2] <- 0.3*yAR3[1] + noise[2]
yAR3[3] <- 0.3*yAR3[2] + 0.3*yAR3[1] + noise[3]
for (p in 4:200) {
  yAR3[p] <- 0.30*yAR3[p-1] + 0.30*yAR3[p-2] - 0.40*yAR3[p-3] + noise[p]
}
plot(ts(yAR3))
acf(yAR3)
pacf(yAR3)
fitAR3 <- ar(yAR3)
# fitdf should equal the number of estimated AR parameters
Box.test(fitAR3$resid, lag = 5, type = "Box-Pierce", fitdf = fitAR3$order)
```
When the correct lag order is used in the model, the residuals pass the Box-Pierce test: they are not significantly correlated across time.
Let's see what happens when we only estimate an AR(2).
```{r}
fitAR2 <- ar(yAR3, order.max = 2)
Box.test(fitAR2$resid, lag = 5, type = "Box-Pierce", fitdf = 2)
Box.test(fitAR2$resid, lag = 20, type = "Ljung-Box") # portmanteau test
```
We see that the tests are significant: the residuals remain correlated across time, indicating that the AR(2) model is underspecified.
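A complementary way to see the underspecification uses a field `ar()` already returns: `$aic` holds the AIC for each candidate order, expressed as a difference from the minimum, so the selected order is the one with the value 0.

```r
# AIC differences across candidate orders (0, 1, 2, ...), relative to the
# best-fitting order; the selected order has the value 0
fitAR3$aic
```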