GENEAclassifyDemo

Activinsights Ltd

17 June 2020

GENEAclassify

Overview

GENEActiv is the original wrist-worn, raw data accelerometer for objective behavioural measurement. It is well suited to analysing free-living human behaviour, studying the impact of physical activity on health and understanding lifestyle. The device is an ergonomic, body-worn instrument.

The package GENEAread provides data import functionality, giving researchers access to cutting edge analytical tools from the R environment. Imported data can be summarised by a segmentation process, which cuts the dataset into time periods of characteristically similar behaviour and calculates a wide range of features for each segment. The activities in each segment can then be evaluated by an rpart GENEA classification tree. A sample rpart GENEA classification tree, trainingFit, is provided with GENEAclassify.

This package provides classification tools, allowing researchers to segment training data and create custom classification trees. For best results, collect training data for the activities that you expect your users to perform, label the appropriate segments, and create a new classification tree. Training data is data captured by the GENEActiv accelerometer during the expected behaviours of your study participants, such as sleeping, sitting or running. To train the classification tree, ask a sample of your participants to wear the accelerometer and perform specific activities. The resulting classification trees can be used to classify field data into behaviours of interest and to automatically process raw output into complete diary histories.

Summary

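There are multiple ways in which GENEAclassify can be used to understand your GENEActiv data. The analysis flow is typically to import the raw data, segment it into events, and then classify each segment with a classification model.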

Contents

  1. Introduction and Installation
    i. Preface
    ii. Installing R
    iii. Using GENEAclassifyDemonstration.R
    iv. Installing and loading required libraries
    v. Installing GENEAclassify
    vi. Development of GENEAclassify on GitHub
  2. Segmentation
    i. Introduction
    ii. Loading Data
    iii. Segmenting Data
    iv. Segmentation Variables, Functions and Features
    v. Varying Step Counting Algorithms
  3. Applying a Classification Model
    i. Introduction
    ii. Creating a classification model from Training Data
    iii. Classifying a file
    iv. Classifying a directory
  4. Creating a Classification Model
    i. Introduction
    ii. Manually Classifying files
    iii. Creating a Training Data set

1. Introduction and Installation

i. Preface

This PDF gives an introduction to using the programming language R with the package GENEAclassify, which has been provided in a zip folder. The following steps give the user the tools needed to use the package before running through the script. Please ensure that the folder has been decompressed. The folder found from the Dropbox link should contain the following:

ii. Installing R

To begin, install R from https://www.r-project.org. An introduction to the R environment that will familiarise a new user is available at https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf. We also recommend downloading the IDE (integrated development environment) RStudio from https://rstudio.com/products/rstudio/ after you have installed R. RStudio gives the user more than the console to work with: a script, the console, a view of the R environment and the file locations are all available in one window. A list of tips on using RStudio can be found at https://rstudio.com/resources/cheatsheets/.

Ctrl-R or Cmd-Enter runs the line that the cursor is on. Alternatively, you can copy and paste the line of code into the console.

Note: to run on macOS you will also need to install X11 forwarding (XQuartz) from https://www.xquartz.org/.

iii. Using GENEAclassifyDemonstration.R

Throughout this tutorial, the commands to be entered into the console are shown and briefly explained. If you open the script GENEAclassifyDemonstration.R (which is in the zip folder) you will find a detailed and commented script that you can work through, running one line at a time and making appropriate changes to get the results desired. This PDF runs through that script and gives further explanation. Please remember that R is a case-sensitive language.

The script provided will run through these steps:
  1. Installing and loading required libraries
  2. Installing GENEAclassify
  3. Loading in a data file/directory to segment
  4. Loading a training data set
  5. Creating the classification model from the Training Data
  6. Classifying a file
  7. Classifying a directory
  8. Setting up the step counting algorithm
  9. Varying Step Counting algorithms
  10. Manually Classifying files
  11. Creating a Training data set

The code shown in this PDF can also be copied and pasted into the console.

iv. Installing and loading required libraries
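The package list used by the script is not reproduced in this section. As a minimal sketch, assuming the required libraries are the ones named elsewhere in this guide (GENEAread, changepoint, signal and rpart), the following checks which of them are missing and attaches the ones that are available. The list itself is an assumption, and the script may require other packages.

```r
# Assumed package list, taken from the packages named in this guide;
# the script itself may require others.
required <- c("GENEAread", "changepoint", "signal", "rpart")

# requireNamespace() returns FALSE (without an error) when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  # Printed rather than executed, so nothing is installed without review.
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}

# Attach only the packages that are available.
for (pkg in required[installed]) {
  library(pkg, character.only = TRUE)
}
```

Printing the install command instead of running it keeps the check safe to repeat at the top of any session.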

v. Installing GENEAclassify

Whilst GENEAclassify is in development, the easiest way to install the package is to use the .tar.gz file inside the zip folder. Running the code below installs GENEAclassify:

Once the package has been installed, load the library:
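The install and load steps might look like the sketch below. The archive name is a placeholder (the actual file name and version in the zip folder are not stated here), so replace it with the name of the .tar.gz file you received.

```r
# Hypothetical file name: use the actual .tar.gz file from the zip folder.
tarball <- "GENEAclassify.tar.gz"

if (file.exists(tarball)) {
  # repos = NULL with type = "source" installs from a local source archive.
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("Archive not found: set the working directory with setwd() first.")
}

# Load the package once it is installed.
if (requireNamespace("GENEAclassify", quietly = TRUE)) {
  library(GENEAclassify)
}
```

Note that R uses forward slashes in paths on every platform, including Windows.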

vi. Development of GENEAclassify on GitHub

If you intend to work on the development of the package, we suggest setting up an account on GitHub at https://github.com/. RStudio can link directly to the repository: create a new project from the top right-hand corner, select version control and clone the GitHub repository.

This guide on using RStudio with GitHub is helpful https://www.r-bloggers.com/2015/07/rstudio-and-github/.

Once GitHub has been set up, we recommend creating a personal branch for contributions, which can be assessed and discussed by Activinsights before any changes are added to the master repository.

To use GitHub for development on Windows, Rtools and a LaTeX compiler will have to be downloaded.

For macOS, the Xcode developer tools and a LaTeX compiler will have to be downloaded.

For more information go to https://www.activinsights.com/.

The package can also be installed from GitHub using an authentication key, which is passed as a character string (inside the "") to the auth_token argument. The key will be provided on request. The package devtools is also required to install from GitHub:

Load the package into the workspace again:

This vignette can be viewed from inside R by running the following code:

The PDF will appear on the right of RStudio, or as a pop-up if called from R.

2. Segmentation

i. Introduction

The segmentation process outputs event-based data from a change point analysis. The function determines when the statistical properties of the data have changed and hence when the observed behaviour has changed. The following section demonstrates how this works given GENEActiv .bin input data.

ii. Loading Data

Now that the libraries required to segment and classify files and directories are loaded, the data needs to be imported. To import a single file, run the following lines of code:

The start and end times can be set using values between 0 and 1 or using a 24-hour character string (with the time inside ""). The former selects the specified proportion of the file, which is useful if, for example, you have 10 days of data. A 24-hour character string takes the form start = "1 3:00", end = "2 3:00". The 1 represents the day and the time uses a 24-hour format. Ensure you leave a space between the day and the time.
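The two ways of specifying a window can be sketched as follows. The file name is hypothetical, and the argument name downsample is an assumption (only start and end are named in the text above); the import itself only runs if GENEAclassify is installed and the file exists.

```r
# Hypothetical file name; replace with the path to your own .bin file.
DataFile <- "example_recording.bin"

# Two ways of selecting a window, as described above.
first_tenth <- list(start = 0, end = 0.1)              # proportion of the file
one_day     <- list(start = "1 3:00", end = "2 3:00")  # day and 24-hour time

if (requireNamespace("GENEAclassify", quietly = TRUE) && file.exists(DataFile)) {
  # Argument names other than start and end are assumptions.
  ImportData <- do.call(
    GENEAclassify::dataImport,
    c(list(DataFile, downsample = 100), first_tenth)
  )
  print(head(ImportData))
}
```

Swapping first_tenth for one_day in the call would request the 24 hours from 03:00 on day 1 to 03:00 on day 2 instead.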

The parameter ‘Use.Timestamps’ can be set to TRUE to use timestamps as the start and end times within dataImport. This parameter can be used in getGENEAsegments and classifyGENEA.

The output from the command head(ImportData) shows the variables calculated from importing the data.

The variable Downsample gives the user the option to compress the data to make the process less computationally heavy. This has a default value of 100 but can be made smaller to allow a higher resolution, although this will take longer to run.

iii. Segmenting Data

The segmentation function in GENEAclassify works by finding changepoints in one or two selected streams of data, identifying differences in mean or variance across varying segment durations.

After loading the data, the segmentation can be applied. There are a number of methods for change point analysis within the package, and the variable changepoint controls which analysis to perform. Some of these options combine two analyses: the functions cpt.mean, cpt.var and cpt.meanvar from the package changepoint are applied to both data streams and the two sets of changepoints are then merged.
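To illustrate what a change-in-mean analysis does, the sketch below runs cpt.mean from the changepoint package on a synthetic signal. The signal, the PELT method and the default penalty are choices made for this illustration only; they are not necessarily the settings GENEAclassify uses internally.

```r
set.seed(42)
# Synthetic signal whose mean shifts from 0 to 5 at sample 100.
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

if (requireNamespace("changepoint", quietly = TRUE)) {
  # Detect changes in mean; cpts() extracts the changepoint locations.
  fit <- changepoint::cpt.mean(x, method = "PELT")
  change_points <- changepoint::cpts(fit)
  print(change_points)
}
```

Each detected location marks the last sample of one segment, which is how a continuous recording is cut into events.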

iv. Segmentation Variables, Functions and Features

Once a segment has been identified, the variables can be summarised using different functions to create features. For example, the mean of the UpDown variable can be found and reported as a single numeric to become a feature of the segment. Within segmentation, the dataCols variable is a character vector that specifies which summary features are output for each segment. Each element of this vector must be the variable name followed by the function applied to that variable. For example, “UpDown.mean” will output the mean of the UpDown variable for each segment.
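The naming convention can be illustrated in base R. The sketch below shows one way a "Variable.function" name could be interpreted and applied to the samples of a single segment; the helper split_feature and the sample values are hypothetical, and this is not the package's internal implementation.

```r
# Split "Variable.function" at the last dot, so that variable names which
# themselves contain a dot (e.g. Principal.Frequency) are handled.
split_feature <- function(feature) {
  list(variable = sub("\\.[^.]*$", "", feature),
       fun      = sub("^.*\\.", "", feature))
}

# Hypothetical samples belonging to one segment.
segment <- data.frame(UpDown = c(-10, 0, 10, 20),
                      Principal.Frequency = c(1.5, 2.0, 2.5, 2.0))

dataCols <- c("UpDown.mean", "UpDown.sd", "Principal.Frequency.max")

features <- vapply(dataCols, function(feature) {
  parts <- split_feature(feature)
  # Look the function up by name and apply it to the named column.
  match.fun(parts$fun)(segment[[parts$variable]])
}, numeric(1))

print(features)
```

Each element of the result is a single number, which is what allows one row of features to describe one segment.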

Variables that can be assessed with functions include:
  - UpDown (arm elevation)
  - Degrees (wrist rotation)
  - Magnitude (vector magnitude of acceleration)
  - Principal.Frequency (frequency domain analysis of acceleration)
  - Light (light meter)
  - Temp (temperature sensor)
  - Step (step internal counter)
  - Radians (if Radians = TRUE is selected)

These variables can be assessed with a range of standard R functions, and typical examples include:
  - mean
  - var
  - sd
  - max

However, any function that is loaded into the environment of R when using GENEAclassify can be used if it accepts a vector input and returns a single numeric. Functions returning other objects will cause an error.
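As a minimal sketch of the vector-in, single-number-out requirement, the example below defines a custom summary function and a quick suitability check. Both spread80 and returns_single_numeric are hypothetical names introduced for illustration; whether a name such as "UpDown.spread80" is accepted in dataCols is an assumption based on the naming convention described above.

```r
# Hypothetical custom summary: the range between the 10th and 90th percentiles.
spread80 <- function(x) {
  unname(diff(quantile(x, probs = c(0.1, 0.9), na.rm = TRUE)))
}

# A quick check that a function is suitable: one vector in, one number out.
returns_single_numeric <- function(f, x = c(1, 2, 3, 4, 5)) {
  out <- f(x)
  is.numeric(out) && length(out) == 1
}

print(returns_single_numeric(spread80))  # suitable
print(returns_single_numeric(range))     # not suitable: range() returns two values
```

Running a check like this before segmenting a large file avoids discovering an unsuitable function part-way through a long run.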

Inside GENEAclassify there is also a range of custom functions:
  - GENEAratio (calculates the ratio of signal energies around a defined frequency)
  - GENEAskew (skewness, a measure of centredness)
  - sumdiff (finds the sum of the differences between samples)
  - meandiff (finds the mean of the differences between samples)
  - abssumdiff (finds the absolute sum of the differences between samples)
  - sddiff (finds the standard deviation of the differences between samples)
  - MeanDir (circular mean direction for radians)
  - CirVar (circular variance for radians)
  - CirSD (circular sd for radians)
  - CirDisp (circular dispersion for radians)
  - CirSkew (circular skewness for radians)
  - CirKurt (circular kurtosis for radians)
  - impact (calculates the proportion of samples above a defined absolute magnitude)

To find more information on these functions use the ? before the function in question. For example ?GENEAskew will provide details on that function in the help window of RStudio or as a pop-up.

The output of the function is created by taking raw data and returning calculated variables. These variables can be viewed using the function head:

getGENEAsegments combines the functions dataImport and segmentation:
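A call to getGENEAsegments might look like the sketch below. The file name is hypothetical, and the call only runs if GENEAclassify is installed and the file exists; consult ?getGENEAsegments for the full argument list.

```r
DataFile <- "example_recording.bin"  # hypothetical file name

if (requireNamespace("GENEAclassify", quietly = TRUE) && file.exists(DataFile)) {
  # Import and segment the first tenth of the file in one step.
  SegData <- GENEAclassify::getGENEAsegments(
    DataFile,
    start = 0, end = 0.1,
    Use.Timestamps = FALSE
  )
  print(head(SegData))
}
```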

v. Varying Step Counting Algorithms

The segmentation function also applies a default step counting algorithm when no step counting arguments are passed to it. The algorithm takes the y axis, filters the signal with a Chebyshev filter and applies a hysteresis threshold, counting the zero crossings over a given window.

Four variables, shown with their defaults, can be changed in the stepCounter function:
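The idea of counting crossings with a hysteresis threshold can be sketched in base R. This is an illustration of the technique only, not the package's stepCounter implementation: the Chebyshev filtering and the windowing are omitted, and the function name, signal and threshold are invented for the example.

```r
# Count upward transitions through a hysteresis band of half-width h.
count_steps_hysteresis <- function(x, h) {
  state <- 0L   # 0 = unknown, 1 = above +h, -1 = below -h
  steps <- 0L
  for (value in x) {
    if (value > h && state != 1L) {
      steps <- steps + 1L   # entered the upper region: count one step
      state <- 1L
    } else if (value < -h && state != -1L) {
      state <- -1L          # entered the lower region: re-arm the counter
    }
  }
  steps
}

# Synthetic signal: a 2 Hz oscillation for 10 s at 100 Hz, plus a small
# 37 Hz ripple that a plain zero-crossing count could pick up.
time_s <- (0:999) / 100
y <- sin(2 * pi * 2 * time_s) + 0.05 * sin(2 * pi * 37 * time_s)

step_count <- count_steps_hysteresis(y, h = 0.1)
print(step_count)
```

Because the signal must travel from below -h to above +h before another step is counted, ripple smaller than the band cannot generate extra counts around zero.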

The filter order, boundaries and Rp are arguments of the cheby1 function from the package signal. The filter is applied to the acceleration signal before the hysteresis threshold is applied.
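The filtering stage can be sketched with signal::cheby1 as follows. All numeric values and variable names below are illustrative assumptions rather than the package defaults, and the synthetic signal stands in for the y axis of a recording.

```r
samplefreq  <- 100          # Hz, assumed sampling rate
filterorder <- 2            # illustrative value
boundaries  <- c(0.5, 5)    # Hz, illustrative pass band
Rp          <- 3            # dB of pass-band ripple, illustrative value

time_s <- (0:999) / samplefreq
# Slow drift plus a 2 Hz component standing in for stepping.
y <- 0.5 * time_s / max(time_s) + sin(2 * pi * 2 * time_s)

if (requireNamespace("signal", quietly = TRUE)) {
  # cheby1() expects band edges as fractions of the Nyquist frequency.
  band <- signal::cheby1(filterorder, Rp, boundaries / (samplefreq / 2),
                         type = "pass")
  y_filtered <- as.numeric(signal::filter(band, y))
}
```

A band-pass filter of this kind removes slow drift and high-frequency noise so that the threshold stage sees mainly the stepping rhythm.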

To view all of the arguments that can be passed to the function stepCounter inside getGENEAsegments run the line ?stepCounter.

The following commands give examples from the training data provided:

3. Applying a Classification Model

i. Introduction

Once the data has been segmented a classification model can be used to classify each segment as an activity.

A classification model takes a set of training data that has been classified previously to form a decision tree using the rpart package and function, given the features from the segmentation function. This model can then be applied to the segmented data to classify individual behaviours/activities provided by the training data set.
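The principle can be illustrated with rpart directly. The sketch below fits a classification tree to synthetic, invented feature values and then uses it to classify segments; it shows the general technique, not the model-creation function provided by GENEAclassify.

```r
library(rpart)

set.seed(1)
n <- 50  # segments per activity

# Synthetic training data: feature names follow the Variable.function
# convention, but the values are invented for illustration.
training <- data.frame(
  Activity = factor(rep(c("Sleep", "Walk", "Run"), each = n)),
  Magnitude.mean = c(rnorm(n, 0.01, 0.005), rnorm(n, 0.30, 0.05),
                     rnorm(n, 1.00, 0.10)),
  UpDown.sd = c(rnorm(n, 2, 0.5), rnorm(n, 15, 3), rnorm(n, 25, 5))
)

# Fit a classification tree that predicts Activity from the features.
fit <- rpart(Activity ~ Magnitude.mean + UpDown.sd,
             data = training, method = "class")

# Classify segments (here the training segments themselves).
predicted <- predict(fit, newdata = training, type = "class")
print(table(actual = training$Activity, predicted = predicted))
```

With real data, classifying segments that were not used for training gives a fairer picture of how the tree will perform in the field.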

ii. Creating a classification model from Training Data

A training data set is located inside the zip folder in a .csv file called TrainingData.csv. This file contains a large amount of classified data, which can be used to create a classification model. To load the data, use the following lines:

Now the Training Data can be used to create a classification model. All of the features have been listed here but some can be removed to refine the model:

By removing the features Segment.Duration, Light.mean, Temp.mean and Step.Count an improved model can be created. These features have been removed because of ambiguity when making decisions on what activity a segment is:
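Removing features from a list can be sketched in base R. Only the four removed names come from the text above; the remaining feature names and the response column name Activity are hypothetical.

```r
# Hypothetical full feature list; only the four removed names come from the text.
all_features <- c("Segment.Duration", "Principal.Frequency.mean",
                  "UpDown.mean", "UpDown.sd", "Degrees.var",
                  "Magnitude.mean", "Light.mean", "Temp.mean", "Step.Count")

removed <- c("Segment.Duration", "Light.mean", "Temp.mean", "Step.Count")

# setdiff() keeps the order of the first argument.
refined_features <- setdiff(all_features, removed)
print(refined_features)

# Build a model formula from the refined list, e.g. for use with rpart.
model_formula <- reformulate(refined_features, response = "Activity")
print(model_formula)
```

Keeping the removed names in their own vector makes it easy to try different subsets and compare the resulting trees.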

Once the model has been created files can be classified using the function classifyGENEA.

iii. Classifying a file

The function classifyGENEA segments a file or directory and uses the classification model provided to classify each segment as an activity. Select a .bin file to classify and run the following lines. The start and end times work in the same way as in the function getGENEAsegments:
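A call to classifyGENEA might look like the sketch below, using the sample tree trainingFit supplied with the package. The file name is hypothetical and the argument name trainingfit is an assumption; check ?classifyGENEA for the exact signature. The call only runs if GENEAclassify is installed and the file exists.

```r
DataFile <- "example_recording.bin"  # hypothetical file name

if (requireNamespace("GENEAclassify", quietly = TRUE) && file.exists(DataFile)) {
  library(GENEAclassify)
  # trainingFit is the sample tree supplied with the package; the argument
  # name trainingfit is an assumption.
  ClassifiedFile <- classifyGENEA(DataFile, start = 0, end = 0.1,
                                  trainingfit = trainingFit)
  print(head(ClassifiedFile))
}
```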

iv. Classifying a directory

To classify a directory, the DataDirectory has to be selected. The following classifies one day for every data file in the data directory:

4. Creating a Classification Model

i. Introduction

There are two ways to classify files: automatically, using a classification model, or manually. To manually classify a file in R, a list of activities can be created with one entry for each segment and then added to the data in the environment. This section uses the run walk file provided in the zip folder, which contains raw data of someone running and then walking.

ii. Manually Classifying files

Use the default step counting parameters to segment the data, then view the output variables using the function head:

Listing the activities chronologically with respect to the segments shown gives:

Or by classifying each row individually:
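Both approaches can be sketched in base R. The data frame below is a stand-in for the segmentation output, with invented column values and activity labels.

```r
# Stand-in for segmentation output: four segments of a run-then-walk file.
# Column names and values are invented for illustration.
SegData <- data.frame(Segment.Duration = c(120, 95, 200, 150),
                      Step.Count = c(330, 260, 370, 280))

# Option 1: one label per segment, in chronological order.
Activity <- c("Running", "Running", "Walking", "Walking")
SegData_listed <- SegData
SegData_listed$Activity <- Activity

# Option 2: label individual rows.
SegData_rows <- SegData
SegData_rows$Activity <- NA_character_
SegData_rows$Activity[1:2] <- "Running"
SegData_rows$Activity[3:4] <- "Walking"

# Both approaches produce the same labels.
print(identical(SegData_listed$Activity, SegData_rows$Activity))
```

The chronological list is quicker for short files; labelling by row is easier to audit when a file contains many segments.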

iii. Creating a Training Data set

A Training Data set that has been manually classified can be used to create a Training model, which can then classify files automatically.

To do this, the activities that are going to be identified must feature in the training model. Below is a demonstration of how to create a classification model by using the sample training data provided in the zip file.

Running the following lines of code segments each of the .bin files in the sample training data. The second line manually classifies each of the activities which can be used to create the training model. The sample training data has been organised so that the .bin files in each sub folder only contain the activity named:

This provides the data required for the classification model. Combine all of these files using the function rbind to form the training data:

Create the classification model from this data using the commands from section 3.ii:
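Combining labelled segments with rbind can be sketched as follows. The three small data frames are stand-ins for the labelled output of each activity folder, with invented values and activity names.

```r
# Stand-ins for the labelled segments of each activity folder.
running <- data.frame(Magnitude.mean = c(1.1, 0.9), Activity = "Running")
walking <- data.frame(Magnitude.mean = c(0.3, 0.4, 0.35), Activity = "Walking")
sitting <- data.frame(Magnitude.mean = c(0.02), Activity = "Sitting")

# rbind() requires every data frame to have the same column names.
TrainingData <- rbind(running, walking, sitting)

# Store the labels as a factor so a classification tree treats them as classes.
TrainingData$Activity <- factor(TrainingData$Activity)
print(table(TrainingData$Activity))
```

Checking the table of labels before fitting a model is a quick way to spot an activity with too few segments.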