Title: | Sequential Change Point Detection for High-Dimensional VAR Models |
---|---|
Description: | Implements the algorithm introduced in Tian, Y., and Safikhani, A. (2024) <doi:10.5705/ss.202024.0182>, "Sequential Change Point Detection in High-dimensional Vector Auto-regressive Models". This package provides tools for detecting change points in the transition matrices of Vector Auto-Regressive (VAR) models, effectively identifying shifts in temporal and cross-correlations within high-dimensional time series data. The package includes functions to generate synthetic VAR data, detect change points in high-dimensional time series, and analyze real-world data. It also demonstrates an application to financial data: the daily log returns of 186 S&P 500 stocks from 2004-02-06 to 2016-03-02. |
Authors: | Yuhan Tian [aut, cre], Abolfazl Safikhani [aut] |
Maintainer: | Yuhan Tian <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-01-10 05:20:09 UTC |
Source: | https://github.com/helloworld9293/varcpdetectonline |
This function generates Vector Auto-Regressive (VAR) data based on the provided parameters.
generateVAR(n, As, Sig, h, isOldProvided = FALSE, oldxs = NULL)
generateVAR(n, As, Sig, h, isOldProvided = FALSE, oldxs = NULL)
n |
Integer. The length of the VAR data to be generated. |
As |
List. A list containing the transition matrices for the VAR process. |
Sig |
Matrix. The covariance matrix of errors. |
h |
Integer. The order of the VAR process. |
isOldProvided |
Logical. If TRUE, the VAR data will be generated based on the last observations from the previous segment of data. Defaults to FALSE. |
oldxs |
Matrix. A |
A data matrix of dimensions p
by n
.
# Example usage As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2)) Sig <- diag(2) data <- generateVAR(n = 100, As = As, Sig = Sig, h = 1)
# Example usage As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2)) Sig <- diag(2) data <- generateVAR(n = 100, As = As, Sig = Sig, h = 1)
This function clusters alarms into groups and identifies the starting points of the alarm clusters.
If the next alarm occurs within a specified window size (w
) from the current alarm,
it will be considered part of the current cluster. Otherwise, a new cluster will be formed.
get_cps(alarms, w)
get_cps(alarms, w)
alarms |
A numeric vector. The alarms raised during the monitoring process. |
w |
An integer. The window size used to group alarms into clusters. |
A numeric vector containing the starting points of the alarm clusters.
If the next alarm is within w
observations of the current alarm,
the next alarm will be considered part of the current alarm cluster.
Otherwise, a new cluster is formed and the next alarm is considered the beginning
of a new alarm cluster.
# Example usage: alarms <- c(10, 15, 30, 35, 60) change_points <- get_cps(alarms, w = 10)
# Example usage: alarms <- c(10, 15, 30, 35, 60) change_points <- get_cps(alarms, w = 10)
This dataset contains daily log returns for 186 stocks in the S&P 500 index from February 6, 2004, to March 2, 2016. The daily log returns are calculated using the adjusted daily closing prices. The dataset also contains the corresponding dates for each log return.
data(sp500)
data(sp500)
A list with two elements:
sp500$log_daily_return
A matrix with dimensions 3037 (rows, trading days) by 186 (columns, stocks).
sp500$date
A vector of length 3037, containing the dates for each trading day.
The dataset is provided as an .RData
file containing:
sp500$log_daily_return
: A matrix of daily log returns with 3037 rows (trading days) and 186 columns (stocks).
sp500$date
: A vector of length 3037 containing the dates for each daily log return.
Data from the S&P 500 stock index (2004-2016).
# Example Usage: Applying Change Point Detection to S&P 500 Data # This is an example of how to apply the change point detection method # (using the VAR_cpDetect_Online function) on the daily log return # dataset from the S&P 500 (stored in the sp500 dataset). The code # below calculates the average return volatility for all stocks, applies # the change point detection algorithm, and plots the results with detected # change points shown as vertical red and black lines. ## Not run: # Load the dataset data(sp500) # Set parameters library(ggplot2) set.seed(2024) n_sp <- nrow(sp500$log_daily_return) p_sp <- ncol(sp500$log_daily_return) # Calculate average return volatility for all data points volatility_sum <- rep(0, (n_sp - 21)) for(col in 1:p_sp){ temp <- as.numeric(sp500$log_daily_return[, col]) temp1 <- rep(0, (n_sp - 21)) for(row in 1:(n_sp - 21)){ temp1[row] <- sd(temp[(row):(row + 21)]) } volatility_sum <- volatility_sum + temp1 } volatility_ave <- volatility_sum / p_sp # Apply change point detection method n0 <- 200 w <- 22 alpha <- 1 / 5000 res <- VAR_cpDetect_Online(t(sp500$log_daily_return), n0, w, alpha, 1, FALSE, TRUE, 5 / w, TRUE) res_sp <- res$alarm_locations + n0 res_sp_cps <- res$cp_locations + n0 # Get the estimated starting points of each alarm cluster cps_est_sp <- unique(res_sp_cps[which(res_sp %in% get_cps(res_sp, w))]) # Prepare data for plotting y_values <- c(volatility_ave) x_values <- sp500$date[1:(n_sp - 21)] df <- data.frame(y_values, x_values) plot_sp <- ggplot(df, aes(y = y_values, x = x_values)) + geom_line() + theme(legend.position = "none") + labs(title = "", x = "", y = "") + scale_x_date(date_breaks = "1 year", date_labels = "%Y") + geom_vline(xintercept = sp500$date[res_sp], linetype = "solid", color = "red", alpha = .1) + geom_vline(xintercept = sp500$date[cps_est_sp], linetype = "solid", color = "black") # Print the detected change points sp500$date[cps_est_sp] # The dates for the starting of the alarm clusters plot_sp ## End(Not run)
# Example Usage: Applying Change Point Detection to S&P 500 Data # This is an example of how to apply the change point detection method # (using the VAR_cpDetect_Online function) on the daily log return # dataset from the S&P 500 (stored in the sp500 dataset). The code # below calculates the average return volatility for all stocks, applies # the change point detection algorithm, and plots the results with detected # change points shown as vertical red and black lines. ## Not run: # Load the dataset data(sp500) # Set parameters library(ggplot2) set.seed(2024) n_sp <- nrow(sp500$log_daily_return) p_sp <- ncol(sp500$log_daily_return) # Calculate average return volatility for all data points volatility_sum <- rep(0, (n_sp - 21)) for(col in 1:p_sp){ temp <- as.numeric(sp500$log_daily_return[, col]) temp1 <- rep(0, (n_sp - 21)) for(row in 1:(n_sp - 21)){ temp1[row] <- sd(temp[(row):(row + 21)]) } volatility_sum <- volatility_sum + temp1 } volatility_ave <- volatility_sum / p_sp # Apply change point detection method n0 <- 200 w <- 22 alpha <- 1 / 5000 res <- VAR_cpDetect_Online(t(sp500$log_daily_return), n0, w, alpha, 1, FALSE, TRUE, 5 / w, TRUE) res_sp <- res$alarm_locations + n0 res_sp_cps <- res$cp_locations + n0 # Get the estimated starting points of each alarm cluster cps_est_sp <- unique(res_sp_cps[which(res_sp %in% get_cps(res_sp, w))]) # Prepare data for plotting y_values <- c(volatility_ave) x_values <- sp500$date[1:(n_sp - 21)] df <- data.frame(y_values, x_values) plot_sp <- ggplot(df, aes(y = y_values, x = x_values)) + geom_line() + theme(legend.position = "none") + labs(title = "", x = "", y = "") + scale_x_date(date_breaks = "1 year", date_labels = "%Y") + geom_vline(xintercept = sp500$date[res_sp], linetype = "solid", color = "red", alpha = .1) + geom_vline(xintercept = sp500$date[cps_est_sp], linetype = "solid", color = "black") # Print the detected change points sp500$date[cps_est_sp] # The dates for the starting of the alarm clusters plot_sp ## End(Not run)
This function performs sequential change point detection in high-dimensional time series data modeled as a Vector Auto-Regressive (VAR) process, targeting changes in the transition matrices that encode temporal and cross-correlations.
VAR_cpDetect_Online( data, n0, w, alpha, h, RLmode = TRUE, needRefine = TRUE, refineSize = 1/5, needRefineCorrection = TRUE )
VAR_cpDetect_Online( data, n0, w, alpha, h, RLmode = TRUE, needRefine = TRUE, refineSize = 1/5, needRefineCorrection = TRUE )
data |
A matrix where rows represent different dimensions (features) and columns represent observations. The first |
n0 |
Integer. The size of the historical data (number of columns in |
w |
Integer. The size of the sliding window used for calculating test statistics; referred to as the pre-specified detection delay. |
alpha |
Numeric. The desired false alarm rate, where 1/alpha represents the targeted average run length (ARL), which should exceed the length of the data to be monitored. |
h |
Integer. The order of the VAR process. |
RLmode |
Logical. If |
needRefine |
Logical. If |
refineSize |
Numeric. The proportion of the new window size to the original window size, used during refinement. |
needRefineCorrection |
Logical. If |
This function fits a VAR model to the historical data using the l1 penalty and calculates test statistics for the sliding window to detect change points. If refinement is enabled, a second step narrows down the change point location. Optionally, a correction step can verify the detected change points.
A list containing:
RL
The index (ignoring historical data) of the last observation read by the algorithm when the first alarm is issued. This is returned only if RLmode = TRUE
.
cp_refined
The refined estimate for the location (ignoring historical data) of the change point. This is returned only if RLmode = TRUE
and needRefine = TRUE
.
alarm_locations
A vector of indices (ignoring historical data) where alarms were raised. This is returned only if RLmode = FALSE
.
cp_locations
A vector of refined change point locations (ignoring historical data), corresponding 1-to-1 with the alarm_locations
. This is returned only if RLmode = FALSE
and needRefine = TRUE
.
library(sparsevar) library(MASS) set.seed(2024) As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2)) As_new <- list(matrix(c(-0.5, 0.2, 0.1, -0.4), 2, 2)) Sig <- diag(2) data_IC <- generateVAR(n = 400, As = As, Sig = Sig, h = 1) data_OC <- generateVAR(n = 100, As = As_new, Sig = Sig, h = 1, isOldProvided = TRUE, oldxs = data_IC[, ncol(data_IC)]) data <- cbind(data_IC, data_OC) result <- VAR_cpDetect_Online(data, n0 = 300, w = 20, alpha = 1/200, h = 1, RLmode = TRUE, needRefine = TRUE, refineSize = 1/5, needRefineCorrection = TRUE) print(result)
library(sparsevar) library(MASS) set.seed(2024) As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2)) As_new <- list(matrix(c(-0.5, 0.2, 0.1, -0.4), 2, 2)) Sig <- diag(2) data_IC <- generateVAR(n = 400, As = As, Sig = Sig, h = 1) data_OC <- generateVAR(n = 100, As = As_new, Sig = Sig, h = 1, isOldProvided = TRUE, oldxs = data_IC[, ncol(data_IC)]) data <- cbind(data_IC, data_OC) result <- VAR_cpDetect_Online(data, n0 = 300, w = 20, alpha = 1/200, h = 1, RLmode = TRUE, needRefine = TRUE, refineSize = 1/5, needRefineCorrection = TRUE) print(result)