Assessing Significance in a Continuous Time Series: Some Functions and R Code

November 4, 2023

Analyzing Time-Series Data

In this blog post, I demonstrate a method for assessing significance in a time series of continuous ratings, developed by Dr. Emery Schubert.

Dr. Schubert is a professor of music at the University of New South Wales and among the first to apply continuous response methodology to perceptions of emotion in music. His second-order standard deviation threshold (SSDT) method is useful when analyzing continuous responses to a temporally-changing stimulus like music. I draw from two of his publications (Schubert, 2007; 2013) and share some reproducible R code I wrote to calculate significance thresholds.

Identifying Significant Moments in a Time Series

Imagine you are a researcher interested in understanding how emotional responses to a piece of music change as it progresses. You set up an experiment where participants continuously rate emotions by moving their mouse along their computer screen as time passes. You record a participant’s mouse position as many times as the experiment computer’s screen refreshes.

For most computers, the screen refreshes 60 times a second. Just one participant’s responses to a five-minute piece would produce 18,000 unique data points. With as few as six participants, you’ll have over 100,000 data points to analyze! How do you assess which mouse movements are actually meaningful?
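The arithmetic behind those counts, assuming a 60 Hz refresh rate and a 300-second (five-minute) piece:

```r
refresh_rate <- 60               # samples per second at a 60 Hz refresh rate
duration <- 5 * 60               # five-minute piece, in seconds
per_participant <- refresh_rate * duration
per_participant                  # 18000 data points per participant
6 * per_participant              # 108000 data points across six participants
```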

Second-Order Standard Deviation Threshold

It can be difficult to establish whether sudden changes in a time series represent important differences in responses to a stimulus or merely erratic movements. To help distinguish important changes in ratings from unimportant ones, Schubert’s method estimates a significance threshold along an ensemble time series (e.g., one representing data from multiple raters).
The method uses intuitive summary statistics like means and standard deviations to assess significance.

First, we calculate a forward-moving central tendency (e.g., mean, median). We’ll divide the time series into multiple epochs using a consistent time interval (e.g., every three seconds) and average responses in each epoch. We’ll also measure the vertical standard deviation at each point along the time series, representing variability across all raters’ responses (SD1). The code examples below use these calculations.

Measuring changes in the variability of rater responses can help us understand where ratings along the ensemble time series are most consistent. However, we need a threshold to establish how much variability in raters’ responses should be considered acceptable. To do this, we calculate the second-order standard deviation, SD2, which measures the variability across the SD1 values. Finally, we can create a significance threshold using SD2 and the sample mean of SD1:

tau_n = SD1_bar ± n × SD2 (Eq. 1; Schubert, 2013)

where n is an adjustable parameter changing the threshold width and SD1_bar denotes the sample mean of the SD1 values.
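As a minimal sketch of the threshold calculation (variable names are my own, and the SD1 values are hypothetical), the upper and lower bounds take only a few lines:

```r
sd1 <- c(0.20, 0.32, 0.17, 0.20, 0.66)  # hypothetical per-epoch SD1 values
sd1_bar <- mean(sd1)  # sample mean of SD1
sd2 <- sd(sd1)        # second-order standard deviation
n <- 1                # adjustable threshold-width parameter
upper <- sd1_bar + n * sd2
lower <- sd1_bar - n * sd2
c(lower = lower, upper = upper)
```

SD1 values falling outside this band mark epochs where rater variability departs notably from its typical level.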

To demonstrate the SSDT method, I’ve prepared some reproducible code and accompanying functions to calculate it in R. I wrote this code as part of a PhD module project with McMaster’s LiveLab. This means the functions programmed below, and any potential mistakes within them, are my own work, based on my readings of Schubert. I plan to put these together into a package later, but in the meantime, you can copy-paste each function and modify it to suit your own needs.

R Example and Functions

Data Preparation

Let’s imagine we have time series data from three participants. In R, the following code creates a fake data set of (fake) participants’ time series responses:

do.call('rbind',
        list(
          # participant "a"
          # t: (time) a continuous sequence of 105 numbers between 0 and 30
          # y: 105 random samples from an exponential distribution
          # name: participant ID
          data.frame(t = seq(from = 0, to = 30, length.out = 105), 
                     y = rexp(105), 
                     name = 'a', stringsAsFactors = T), 
          # participant "b"
          # t: (time) a continuous sequence of 100 numbers between 0 and 30
          # y: 100 random samples from an exponential distribution
          # name: participant ID
          data.frame(t = seq(from = 0, to = 30, length.out = 100), 
                     y = rexp(100), 
                     name = 'b', stringsAsFactors = T), 
          # participant "c"
          # t: (time) a continuous sequence of 115 numbers between 0 and 30
          # y: 115 random samples from an exponential distribution
          # name: participant ID
          data.frame(t = seq(from = 0, to = 30, length.out = 115), 
                     y = rexp(115), 
                     name = 'c', stringsAsFactors = T) 
        )
) -> sim1

summary(sim1)
       t                y            name   
 Min.   : 0.000   Min.   :0.000736   a:105  
 1st Qu.: 7.467   1st Qu.:0.295772   b:100  
 Median :15.000   Median :0.658338   c:115  
 Mean   :15.000   Mean   :0.981667          
 3rd Qu.:22.533   3rd Qu.:1.385930          
 Max.   :30.000   Max.   :6.484484          

t represents the time at which a participant’s mouse position is recorded.
y represents the participant’s rating of the given dimension.
name represents the unique participant. We have three: a, b, and c. 
We have 105 responses from participant A, 100 responses from participant B, and 115 responses from participant C. In a real experiment, we might see small differences in the number of samples for each participant (or large ones if the participants have different screen refresh rates).

Now we can look at the raw responses prior to any analysis. We’ll use the ggplot2 library to do this. It is part of a larger collection of packages called the tidyverse, which will come in handy later.

# install.packages('tidyverse') # if not already installed.  
library(tidyverse)  

# pass ggplot our dataset. We'll plot the time on x axis, response on y
# group and colour points by participant
ggplot(sim1, aes(x = t, y = y, group = name, colour = name)) +   
  geom_point() +   # add points
  geom_smooth() +  # add LOESS curve
  labs(colour = 'participant') +   
  theme_classic()

Fig 1

This scatter plot depicts time periods on the x axis and responses on the y axis. LOESS (Locally Estimated Scatterplot Smoothing) curves show local changes in each participant’s responses.

Functions for calculating moving averages

I wrote two separate functions to compute moving averages. The first function computes moving averages according to a user-defined argument called time_window. I set the default window (epoch) to 3 seconds.

In a second function, we’ll apply the first function to each rater’s data separately. These are both internal functions, meaning we’ll use them inside a larger function defined later. The period in front of their names marks them as internal helpers and keeps them out of R’s default ls() listing of the global environment.
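A quick illustration of that naming convention (the helper name below is just an example):

```r
.internal_helper <- function(x) x + 1  # leading dot keeps it out of ls()

".internal_helper" %in% ls()                  # FALSE: hidden by default
".internal_helper" %in% ls(all.names = TRUE)  # TRUE: visible when asked for
```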

Function 1:

.define_epochs = function(dat, time_window = 3) # user supplies dataframe and time epoch
{
  # define an array to place epoch values in:
  epoch_array = c()
  # flag timestamps whose rounded value is divisible by the time window:
  round(dat$t) %% time_window == 0 -> time_floors 
  counter = 1
  
  # loop through indices of floor values:
  
  for(i in 1:length(time_floors))
  {
    # if first index, simply add current value of counter:
    
    if(i == 1)
    {
      epoch_array = append(epoch_array, counter)
    }
    # if the current flag differs from the previous one AND is TRUE,
    # increment the counter and add it
    # (start a new epoch when we hit a time_window boundary):
    else if (time_floors[i] != time_floors[i-1] & time_floors[i] == TRUE)
    {
      # update the counter, then add its value to the array
      counter = counter + 1
      epoch_array = append(epoch_array, counter) 
      
    }
    # otherwise, stay in the current epoch and
    # keep the current counter value:
    else epoch_array = append(epoch_array, counter) 

    
  }
  dat$e = epoch_array # add epochs as new column in data
  return(dat)
}

Details: This was surprisingly tricky to code. The function creates a series of epochs according to the user-supplied value time_window. Then, for each time stamp in the column t, it rounds the value to a whole number and checks the remainder (using the modulo operator %%) when dividing by time_window. Each time the remainder first becomes 0 at a new timestamp, the function starts a new epoch. For example, 6 %% 3 = 0 and 9 %% 3 = 0, whereas 7 %% 3 = 1; the function starts a new epoch in the former cases, but not the latter. Grouping responses into epochs allows us to average across responses in each epoch.
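As an aside, if you only need epoch indices, a vectorized shortcut is possible (my own simplification, not identical to .define_epochs at the boundaries because it uses floor() rather than round()):

```r
# assign each timestamp an epoch index directly:
# epoch 1 covers t in [0, 3), epoch 2 covers [3, 6), and so on
t <- c(0, 1.5, 2.9, 3.0, 4.2, 6.1, 8.9, 9.0)
e <- floor(t / 3) + 1
e  # 1 1 1 2 2 3 3 4
```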

Next, we’ll apply the .define_epochs function separately to each participant’s data:

# this will split participant ratings into separate 3-second epochs
# loops through each participant, applying .define_epochs()

.make_participant_epochs = function(dat, time_window = 3) # input data
{
  
  # first, create empty data frame. we'll put the information back here later,
  # after exiting loop:
  summaryDataframe = data.frame()
  
  # subset out current group (imagine this is a participant)
  # we're going to take forward moving average across their time series
  
  for(thisSS in unique(dat$name)) {
    #   cat('looking at subset ', thisSS, '\n')
    subsetDat = subset(dat, name == thisSS)
    subsetDat = .define_epochs(subsetDat, time_window = time_window)
    
    # the data will be largely the same, but with 3-second epochs included.
    summaryDataframe = rbind(summaryDataframe, subsetDat)
    # LOOP ENDS, now we have a grouping variable with 3-second epochs.
  }
  return(summaryDataframe)
  
}

Finally, we can use these functions to apply the SSDT method to our time series responses.

SSDT Function

Now that we’re able to calculate moving averages for each participant, we can calculate SSDTs and create a useful dataframe with several summary statistics. To do so, we’ll make a new function that calls .make_participant_epochs along with an internally defined helper, .thresholdLvl, which calculates the thresholds. You can ask the function to calculate multiple thresholds by passing a vector of n values to the n_values argument; the function will create separate columns for each n you specify.

summarize_time_series = function(dat, y, n_values = c(0.5, 1, 2)) 
{
  dat$y = y
  dat <- .make_participant_epochs(dat)
  # for calculating the threshold later (used at the end)
  # Equation 4 from Schubert (2013, p. 355, Psychology of Music):
  .thresholdLvl = function(dat, n = 1, bound = 'u') 
  {
    # define upper and lower thresholds:
    
    #tau_n_upper bound
    tau_n_u = dat$sd1_bar + (n*dat$sd2)
    #tau_n_lower bound
    tau_n_l = dat$sd1_bar - (n*dat$sd2)
    # conditionally return the lower bound based on the 'bound' argument:
    if(bound == 'l') return(tau_n_l)
    else return(tau_n_u)
  }
  
  
  dat %>% 
    group_by(name, e) %>%  # group by participant, then epoch:
    summarize(t = max(t), # within each epoch, summarize the current time
              # as the latest time observed in that epoch
              y = mean(y)) -> dat # take the mean of the ratings
                                  # within this epoch
  
  # now pass the same data forward and start summarizing it:
  dat %>%  
    group_by(e) %>%  # now group by epoch (across participants)
    mutate(m = mean(y), # take mean of time series
           mdn = median(y), # median of time series
           sd1 = sd(y) # sd of time series
    ) -> dat
  
  
  # now we'll start calculating equations from Schubert 2007 (pp. 354-55)
  
  dat %>% ungroup %>%  # now ungroup.
    mutate(sd1_bar = mean(sd1), # mean of SD1 (Equation 1)
           sd2 = sd(sd1)) -> dat # second order SD (Equation 2)
  
  
  # finally, we'll calculate significant thresholds.
  for(thisThreshold in n_values) # loop through tau values:
  {
    threshold_name = paste0('n', thisThreshold) # make column name
    # calculate threshold for this level of 'n'
    dat[,paste0(threshold_name,"_u")] = .thresholdLvl(dat, 
                                                      n = thisThreshold)
    dat[,paste0(threshold_name,"_l")] = .thresholdLvl(dat, 
                                                      n = thisThreshold, 
                                                      bound = 'l')
    # flag epochs whose SD1 falls strictly inside the threshold band:
    threshold_name_sig = paste0(threshold_name, '_sig')
    dat[,threshold_name_sig] = !dat$sd1 >= dat[,paste0(threshold_name, '_u')] &
      !dat$sd1 <= dat[,paste0(threshold_name, '_l')]
  } 
  return(dat)
}

Code Example

Now that we’ve defined and compiled the functions we need to calculate SSDTs, let’s put them to use!

# calculate a summary with forward moving averages
ssdt <- summarize_time_series(sim1, y=sim1$y)

We can take a look at the output with the following code:

str(ssdt)
tibble [33 × 18] (S3: tbl_df/tbl/data.frame)
 $ name    : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
 $ e       : num [1:33] 1 2 3 4 5 6 7 8 9 10 ...
 $ t       : num [1:33] 2.31 5.48 8.37 11.25 14.42 ...
 $ y       : num [1:33] 1.248 1.066 1.085 0.59 0.984 ...
 $ m       : num [1:33] 1.268 0.912 0.9 0.816 1.146 ...
 $ mdn     : num [1:33] 1.248 1.066 0.867 0.927 0.984 ...
 $ sd1     : num [1:33] 0.197 0.323 0.17 0.196 0.661 ...
 $ sd1_bar : num [1:33] 0.351 0.351 0.351 0.351 0.351 ...
 $ sd2     : num [1:33] 0.204 0.204 0.204 0.204 0.204 ...
 $ n0.5_u  : num [1:33] 0.453 0.453 0.453 0.453 0.453 ...
 $ n0.5_l  : num [1:33] 0.249 0.249 0.249 0.249 0.249 ...
 $ n0.5_sig: logi [1:33] FALSE TRUE FALSE FALSE FALSE TRUE ...
 $ n1_u    : num [1:33] 0.556 0.556 0.556 0.556 0.556 ...
 $ n1_l    : num [1:33] 0.147 0.147 0.147 0.147 0.147 ...
 $ n1_sig  : logi [1:33] TRUE TRUE TRUE TRUE FALSE TRUE ...
 $ n2_u    : num [1:33] 0.76 0.76 0.76 0.76 0.76 ...
 $ n2_l    : num [1:33] -0.0577 -0.0577 -0.0577 -0.0577 -0.0577 ...
 $ n2_sig  : logi [1:33] TRUE TRUE TRUE TRUE TRUE TRUE ...

There’s a lot of information packed in here, so let’s break it down:

name: data corresponding to the individual participant.
e: a number representing the current time epoch (default: every 3 seconds).
t: point in time series (in seconds).
y: participant’s continuous rating at this time point.
m: mean of all participants’ continuous rating in current epoch.
mdn: median of all participants’ continuous rating in current epoch.
sd1: standard deviation of all participants’ continuous rating in current epoch.
sd1_bar: mean of SD1.
sd2: standard deviation of SD1 (i.e., second-order standard deviation).
n[value]_u: upper bound of SSDT threshold for user-specified n.
n[value]_l: lower bound of SSDT threshold for user-specified n.
n[value]_sig: TRUE/FALSE values representing regions of significance according to SSDT method.
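With the significance flags in hand, pulling out only the significant epochs is a one-liner. Here’s a sketch on a tiny mock of the output (the column names match the real ssdt object; the values are made up):

```r
# tiny mock of the summarize_time_series() output, keeping only two columns
ssdt_demo <- data.frame(e = 1:4, n1_sig = c(TRUE, FALSE, TRUE, TRUE))
subset(ssdt_demo, n1_sig)$e  # epochs flagged as significant at n = 1: 1 3 4
```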

Plotting SSDTs

Now we can plot changes in the time series of all participants and assess significance.

n = 0.5

ggplot(ssdt, aes(x = t, y = sd1))+ # plot series
  geom_point(aes(color = n0.5_sig))+ # colour by significance threshold
  geom_smooth(se=F)+ # add LOESS curve for ensemble time series
  ggtitle(expression(n == 0.5)) + # add title
  scale_colour_manual(values = c('red', 'blue'),  # change colours
                      labels = c('Nonsignificant', 'Significant'))+ # add meaningful labels
  theme_classic()+ # theme
  geom_hline(yintercept = unique(ssdt$n0.5_l), # add threshold lower line
             linetype = 2)+
  geom_hline(yintercept = unique(ssdt$n0.5_u), # add threshold upper line
             linetype = 2) +
  labs(colour = 'Significance')

Fig 2 Plot depicting changes in standard deviation of ratings (y axis) over time (x axis). Dashed horizontal lines depict the upper and lower bounds of the significance threshold.

n = 1

Notice that increasing n widens the threshold interval, so more observations are treated as significant.

ggplot(ssdt, aes(x = t, y = sd1))+
  geom_point(aes(color = n1_sig))+
  geom_smooth(se=F)+
  ggtitle(expression(n == 1)) +
  scale_colour_manual(values = c('red', 'blue'))+
  theme_classic()+
  geom_hline(yintercept = unique(ssdt$n1_l),
             linetype = 2)+
  geom_hline(yintercept = unique(ssdt$n1_u),
             linetype = 2) +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.y = element_blank(),
        legend.position = 'none')

Fig 3
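To see numerically why wider bands flag more points, here is a small self-contained sketch with made-up SD1 values (the inside() helper is my own, mirroring the _sig logic in summarize_time_series):

```r
sd1 <- c(0.05, 0.30, 0.45, 0.60, 1.00)  # made-up per-epoch SD1 values
sd1_bar <- mean(sd1)
sd2 <- sd(sd1)
# TRUE when SD1 falls strictly inside the n-level threshold band:
inside <- function(n) sd1 < sd1_bar + n * sd2 & sd1 > sd1_bar - n * sd2
sum(inside(0.5))  # 2
sum(inside(1))    # 3
sum(inside(2))    # 5 -- widening the band never removes flagged points
```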

Conclusion

This blog post has discussed Dr. Schubert’s method for evaluating significance in a time series, with some accompanying R code from a research project I did as part of my PhD coursework. The full details of the method are outlined in Dr. Schubert’s papers referenced below. I encourage you to read and engage with them and would be grateful for any feedback.

Thanks for reading!

References

Schubert, Emery. 2007. “When Is an Event in a Time-Series Significant.” In Proceedings of the Inaugural International Conference on Music Communication Science, 135–38.

———. 2013. “Reliability Issues Regarding the Beginning, Middle and End of Continuous Emotion Ratings to Music.” Psychology of Music 41 (3): 350–71.