GGIR is an R-package to process multi-day raw accelerometer data for physical activity and sleep research. GGIR will write all output files into two sub-directories of ./meta and ./results. GGIR is increasingly being used by a number of academic institutes across the world.
postGGIR is an R-package to data processing after running GGIR for accelerometer data. In detail, all necessary R/Rmd/shell files were generated for data processing after running GGIR for accelerometer data. Then in part 1, all csv files in the GGIR output directory were read, transformed and then merged. In part 2, the GGIR output files were checked and summarized in one excel sheet. In part 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. In part 4, the cleaned activity data was imputed by the average ENMO over all the valid days for each subject. Finally, a comprehensive report of data processing was created using Rmarkdown, and the report includes few explortatory plots and multiple commonly used features extracted from minute level actigraphy data in part 5-7. This vignette provides a general introduction to postGGIR.
The R package postGGIR has been released with an open-source GPL-3 license on CRAN, and postGGIR can run on Windows and Linux. Parallel computing in Linux is recommended due to the memory requirements associated with reading in multiple of the large data files. The package contains one primary function for users which, when run, generates all necessary R/R Markdown/shell executable files for data processing after running GGIR for accelerometer data; load, read, transform and merge long activity data; examine and summarize GGIR outputs; clean the merged activity data according to the number of valid hours per night and the number of valid days per subject; activity data imputation by taking the average across the valid days for each subject; build a comprehensive report of data processing and exploratory plots; extract multiple commonly used features and study feature structure by the covariance decomposition. Figure 1 presents a flowchart for each step in this process which is described in greater detail below. The procedure, R functions, inputs, and outputs are all described in this package vignette. In addition, more documentation and example data could be found in postGGIR repository on GitHub (URL: https://github.com/dora201888/postGGIR).
Mirroring the GGIR structure of processing individual data files in multiple parts, the postGGIR package is split into seven parts, grouping functionalities in logical processing order. The parts are numbered from 1 to 7. Parts 1 to 4 are dedicated to data processing. Parts 5 to 7 are dedicated to producing R Markdown reports of data cleaning, feature extraction, and unsupervised covariance decomposition via the joint and individual variance explained (JIVE) method, respectively. These seven parts are carried out sequentially with milestone data automatically being saved locally. To use postGGIR, the first step for users is to install and load the postGGIR package. Then, users run the create.postGGIR() function which creates a single R script. The newly created R script, Studyname_part0.maincall.R, is then edited by users, allowing for the specification of arguments relevant for each of the seven parts. All optional arguments and their defaults are described in the package vignette. In addition, for users with access to a cluster for parallel processing, a shell function, named as part9_swarm.sh is created which can parallelizes the processing of individual files with minor modifications by the user. These modifications are described in the package vignette. Computationally, part 1 is the most time-consuming task, taking up at least 60% of the processing time, which the activity data in .csv format was transformed and merged. Generally, part 1 takes about 10~30 minutes to process a file with 14 days of data recorded at 30 Hz on a GeneActiv device in processor cores of 36 x 2.3 GHz (Intel Gold 6140). All output created for each part is described in the package vignette. Briefly, part 1 and part 2 output are saved using a directory structure with a depth of two, containing output data and summary for all participants. The reports for parts 5 to 7 are saved in .html format and are generated using R Markdown (.Rmd) files. These .Rmd files are included in the output, users the flexibility to adapt the source code to their research purpose.
Figure 1: Overview of main steps and output in postGGIR workflow.
All postGGIR code is written in R and reports generated in R Markdown. The R packages ActFrag and ActCR are used for the calculation of certain physical activity and circadian rhythmicity features. The R package r.jive is used to perform the feature interaction analysis and to study the joint and individual variation structure by JIVE.
Download and install RStudio (optional, but recommended)
Download GGIR with its dependencies, you can do this with one command from the console command line:
install.packages("postGGIR", dependencies = TRUE)
library(postGGIR)
create.postGGIR()
The function will create a template shell script of postGGIR in the current directory, names as STUDYNAME_part0.maincall.R.
cat STUDYNAME_part0.maincall.R
options(width=2000)
= commandArgs(TRUE);
argv print(argv)
print(paste("length=",length(argv),sep=""))
<-as.numeric(argv[1])
modeprint(c("mode =", mode))
# (Note) Please remove the above lines if you are running this within R console
# instead of submitting jobs to a cluster.
#########################################################################
# (user-define 1) you need to redefine this according different study!!!!
#########################################################################
# example 1
.1<-function(x) unlist(strsplit(y1,"\\."))[1]
filename2id
# example 2 (use csv file =c("filename","ggirID"))
.2<-function(x) {
filename2id<-read.csv("./postGGIR/inst/example/filename2id.csv",head=1,stringsAsFactors=F)
d<-which(d[,"filename"]==x)
y1if (length(y1)==0) stop(paste("Missing ",x," in filename2id.csv file",sep=""))
if (length(y1)>=1) y2<-d[y1[1],"newID"]
return(as.character(y2))
}
#########################################################################
# main call
#########################################################################
<-function(mode,filename2id=filename2id.1){
call.afterggir
library(postGGIR)
#########################################################################
# (user-define 2) Fill in parameters of your ggir output
##########################################################################
=
currentdir =
studyname =
bindir =
outputdir setwd(currentdir)
=FALSE # keep all subjects in postGGIR
rmDup=c(50,100,400)
PA.threshold="WW_L50M125V500_T5A5"
part5FN= 5
epochIn = 5
epochOut = 60
flag.epochOut = FALSE
use.cluster = 9250
log.multiplier = 7
QCdays.alpha = 16
QChours.alpha <-NULL
useIDs.FN="R"
Rversion="US/Eastern"
desiredtz=FALSE
RemoveDaySleeper=part5FN,
part5FN=20
NfileEachBundle=FALSE
trace#########################################################################
# remove duplicate sample IDs for plotting and feature extraction
#########################################################################
if (mode==3 & rmDup){
# step 1: read ./summary/*remove_temp.csv file (output of mode=2)
<-TRUE #keep the latest visit for each sample
keep.last<-paste(currentdir,"/summary",sep="")
sumdirsetwd(sumdir)
<-paste(studyname,"_samples_remove_temp.csv",sep="")
inFN<-paste(sumdir,"/",studyname,"_samples_remove.csv",sep="")
useIDs.FN
#########################################################################
# (user-define 3 as rmDup=TRUE) create useIDs.FN file
#########################################################################
# step 2: create the ./summary/*remove.csv file manually or by R commands
<-read.csv(inFN,head=1,stringsAsFactors=F)
d<-d[order(d[,"Date"]),]
d<-d[order(d[,"newID"]),]
dwhich(is.na(d[,"newID"])),]
d[<-duplicated(d[,"newID"],fromLast=keep.last) #keep the last copy for nccr
S"duplicate"]<-"remove"
d[S,write.csv(d,file=useIDs.FN,row.names=F)
}
#########################################################################
# call afterggir
#########################################################################
setwd(currentdir)
afterggir(mode=mode,
useIDs.FN=useIDs.FN,
currentdir=currentdir,
studyname=studyname,
bindir=bindir,
outputdir=outputdir,
epochIn=epochIn,
epochOut=epochOut,
flag.epochOut=flag.epochOut,
log.multiplier=log.multiplier,
use.cluster=use.cluster,
QCdays.alpha=QCdays.alpha,
QChours.alpha=QChours.alpha,
QCnights.feature.alpha=QCnights.feature.alpha,
Rversion=Rversion,
filename2id=filename2id,
PA.threshold=PA.threshold,
desiredtz=desiredtz,
RemoveDaySleeper=RemoveDaySleeper,
part5FN=part5FN,
NfileEachBundle=NfileEachBundle,
trace=trace)
} #########################################################################
call.afterggir(mode)
#########################################################################
# Note: call.afterggir(mode)
# mode = 0 : creat sw/Rmd file
# mode = 1 : data transform using cluster or not
# mode = 2 : summary
# mode = 3 : clean
# mode = 4 : impu
Three places were marked as “user-define” and need to be edited by user in the STUDYNAME_part0.maincall.R file. Please rename the file by replacing your real studyname after the edition.
This user-defined function will change the filename of the raw accelerometer file to the short ID. For example, the first example change “0002__026907_2016-03-11 13-05-59.bin” to new ID of “0002”. If you prefer to define new ID by other way, you could create a .CSV file including “filename” and “newID” at least and then defined this function as the second example. The new variable of “newID”, included in the output files, could be used as the key ID in the summary report of postGGIR and be used to define the duplicate samples as well.
User needs to define the following parameters as follows,
Variables | Description |
---|---|
rmDup | Set rmDup = TRUE if user want to remove some samples such as duplicates. Set rmDup = FALSE if user want to keep all samples. |
mode | Specify which of the five parts need to be run, e.g. mode = 0 makes that all R/Rmd/sh files are generated for other parts. When mode = 1, all csv files in the GGIR output directory were read, transformed and then merged. When mode = 2, the GGIR output files were checked and summarized in one excel sheet. When mode = 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. When mode = 4, the cleaned data was imputed. |
useIDs.FN | Filename with or without directory for sample information in CSV format, which including “filename” and “duplicate” in the headlines at least. If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part 5-7. Defaut is NULL, which makes all accelerometer files will be used in part 5-7. |
currentdir | Directory where the output needs to be stored. Note that this directory must exist. |
studyname | Specify the study name that used in the output file names |
bindir | Directory where the accelerometer files are stored or list |
outputdir | Directory where the GGIR output was stored. |
epochIn | Epoch size to which acceleration was averaged (seconds) in GGIR output. Defaut is 5 seconds. |
epochOut | Epoch size to which acceleration was averaged (seconds) in part1. Defaut is 5 seconds. |
flag.epochOut | Epoch size to which acceleration was averaged (seconds) in part 3. Defaut is 60 seconds. |
log.multiplier | The coefficient used in the log transformation of the ENMO data, i.e. log( log.multiplier * ENMO + 1), which have been used in part 5-7. Defaut is 9250. |
use.cluster | Specify if part1 will be done by parallel computing. Default is TRUE, and the CSV file in GGIR output will be merged for every 20 files first, and then combined for all. |
QCdays.alpha | Minimum required number of valid days in subject specific analysis as a quality control step in part2. Default is 7 days. |
QChours.alpha | Minimum required number of valid hours in day specific analysis as a quality control step in part2. Default is 16 hours. |
QCnights.feature.alpha | Minimum required number of valid nights in day specific mean and SD analysis as a quality control step in the JIVE analysis. Default is c(0,0), i.e. no additional data cleaning in this step. |
Rversion | R version, eg. “R/3.6.3”. Default is “R”. |
filename2id | User defined function for converting filename to sample IDs. Default is NULL. |
PA.threshold | Threshold for light, moderate and vigorous physical activity. Default is c(50,100,400). |
desiredtz | desired timezone: see also https://en.wikipedia.org/wiki/Zone.tab. Used in g.inspectfile(). Default is “US/Eastern”. |
RemoveDaySleeper | Specify if the daysleeper nights are removed from the calculation of number of valid days for each subject. Default is FALSE. |
part5FN | Specify which output is used in the GGIR part5 results. Defaut is “WW_L50M125V500_T5A5”, which means that part5_daysummary_WW_L50M125V500_T5A5.csv and part5_personsummary_WW_L50M125V500_T5A5.csv are used in the analysis. |
NfileEachBundle | Number of files in each bundle when the csv data were read and processed in a cluster. Default is 20. |
trace | Specify if the intermediate results is printed when the function was executed. Default is FALSE. |
The postGGIR package not only simply transform/merge the activity and sleep data, but it also can do some prelimary data analysis such as principle componet analysis and feature extraction. Therefore, the basic data clean will be processed first as follows,
If you prefer to use all samples, just skip this part and use rmDup=FALSE
as the default. Otherwise, if you want to remove some samples such as duplicates, there are two ways as follows,
call.afterggir(mode,filename2id)
Variables | Description |
---|---|
mode | Specify which of the five parts need to be run, e.g. mode = 0 makes that all R/Rmd/sh files are generated for other parts. When mode = 1, all csv files in the GGIR output directory were read, transformed and then merged. When mode = 2, the GGIR output files were checked and summarized in one excel sheet. When mode = 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. When mode = 4, the cleaned data was imputed. |
filename2id | This user-defined function will change the filename of the raw accelerometer file to the short ID for the purpose of identifying duplicate IDs. |
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
~/.bash_profile
source
/postGGIR/inst/example/afterGGIR;
cd
module load R ; --no-save --no-restore --args < studyname_ggir9s_postGGIR.pipeline.maincall.R 0
R --no-save --no-restore --args < studyname_ggir9s_postGGIR.pipeline.maincall.R 1
R --no-save --no-restore --args < studyname_ggir9s_postGGIR.pipeline.maincall.R 2
R --no-save --no-restore --args < studyname_ggir9s_postGGIR.pipeline.maincall.R 3
R --no-save --no-restore --args < studyname_ggir9s_postGGIR.pipeline.maincall.R 4
R
-e "rmarkdown::render('part5_studyname_postGGIR.report.Rmd' )"
R -e "rmarkdown::render('part6_studyname_postGGIR.nonwear.report.Rmd' )"
R -e "rmarkdown::render('part7a_studyname_postGGIR_JIVE_1_somefeatures.Rmd' )"
R -e "rmarkdown::render('part7b_studyname_postGGIR_JIVE_2_allfeatures.Rmd' )"
R -e "rmarkdown::render('part7c_studyname_postGGIR_JIVE_3_excelReport.Rmd' )" R
The functions of part 1 read the activity data measured by Euclidian norm minus one (ENMO), a rotationally invariant measure of volume of acceleration, in long csv-spreadsheets and then the data was merged and transformed into a square matrix for all subjects. Depending on the time zone on which the devices were initialized, a day may have either 23 or 25 due to daylights savings time. On daylight savings crossovers, ENMO is averaged for duplicate timestamps between 1:00 AM and 2:00 AM, allowing for straightforward comparisons to standard 24-hour days. In addition to recording ENMO at each epoch, the angle of the z-axis (ANGLEZ) relative to the horizontal plane (degrees), used for estimating sleep periods, is merged and saved into an excel file. Additional data recorded by certain device, including the light mean, light peak, temperature mean, clipping score, and the Euclidian norm metric (EN) if available are merged with the ENMO data and saved under the ./data directory for part 1.
The activity data in the GGIR output is formatted in long csv-spreadsheets as follows,
timestamp | ENMO | anglez |
2017-11-30T00:00:00+0100 | 8e-04 | -32.5758 |
2017-11-30T00:00:05+0100 | 0.0198 | -25.5726 |
2017-11-30T00:00:10+0100 | 0.0177 | 3.7972 |
2017-11-30T00:00:15+0100 | 0.0118 | 6.7154 |
2017-11-30T00:00:20+0100 | 0.0106 | 10.0357 |
2017-11-30T00:00:25+0100 | 0.0341 | 21.0143 |
2017-11-30T00:00:30+0100 | 0.1708 | 19.5008 |
…… | …… | …… |
2017-11-30T23:59:55+0100 | 0.1504 | -0.596 |
Each row represents the corresponding ENMO and ANGLEZ values at a timestamp per 5 seconds epoch, which is specified by GGIR parameter (windowsizes) when running GGIR. After running part 1, the ENMO and ANGLEZ data are transformed into wide matrix in which each row represents 24 hours data for a day. For example, the ENMO data is formated as follows,
Date | 0:00:00 | 0:00:05 | 0:00:10 | 0:00:15 | 0:00:20 | 0:00:25 | 0:00:30 | …… | 23:59:55 |
11/30/2017 | 8.00E-04 | 0.0198 | 0.0177 | 0.0118 | 0.0106 | 0.0341 | 0.1708 | …… | 0.1504 |
Finally, the data was merged for all days and all subjects.
The functions of part 2 introduce descriptive variables of all accelerometer files in the GGIR output. In part 2, an excel file was output under the ./summary directory, which includes nine pages as follows, 1) List of files in the GGIR output (2) Summary of numbers of output files (3) List of duplicate IDs (4) ID errors (5) Number of valid days (6) Table of number of valid/missing days (7) Missing pattern (8) Frequency of the missing pattern (9) Description of all accelerometer files. Multiple plots were generated in a pdf file including the number of valid hours, days, missing pattern, etc. Technically, the raw accelerometer data was visited to obtain a complete input list for the purpose of examining missingness of GGIR output when user specified the path. Additionally, the summary results from GGIR output were also visited here to form a comprehensive data quality report in the part 2 of postGGIR.
The functions of part 3 introduce ‘flag’ variables for data cleaning of the merged ENMO and ANGLEZ data. By default, days with more than 16 hours were marked as valid days and subjects with more than 7 valid days were marked as valid samples. User can set these two parameters in the call to the main function (QCdays.alpha = 7, QChours.alpha = 16). Further, the data can be aggregated into minute-level or hour-level data as required by users. In part 4, ENMO data during estimated non-wear periods were imputed by taking the subject-level mean over all the valid days for each subject at that time. All output is saved under the ./data directory. In the output file, the following description variables are included: (1) number of valid hours; (2) missing pattern for each subject; (3) non-wear time in minutes; (4) an indicator variable to indicate if the visit should be removed for having multiple visits for some subjects which might lead to invalidity of independent and identically distribution in statistics and (5) the number of missing values after imputation. When the number of missing values after imputation is not zero, it means the activity data is missing on same timestamps among all days and therefore could not be imputed, therefore, such samples would be removed when systematic missingness was observed.
The functions of part 5 generate a comprehensive report in .html format. The report includes data quality checks and an exploratory data analysis using valid days of data. First, the numbers and missingness of GGIR output files are summarized for all GGIR parts. Second, duplicate samples were checked and marked, where the duplication might be caused by having multiple visits for some samples but only one visit will be kept in the data analysis such as functional principal component analysis (FPCA). Third, as shown in Figure 1, data quality is presented visually using the number of valid days, non-wear time, and missing data pattern. As an exploratory analysis, the data correlation, and the output of FPCA analysis was plotted in the report.
Estimated non-wear periods are loaded from the \(M\$metalong\$nonwearscore\) variable of the R data that was stored in the folder of /meta/basic of the GGIR output, which generate the matrix to clarify when data was imputed for each long epoch time window and the reason for imputation. This function will generate a non-wear matrix at minute level, coded as 0/1 for wear/non-wear time. This function will generate a non-wear matrix at minute level, and it could be skipped if user chose to use the imputation data in the JIVE application as default.
In part 7, 88 features were extracted from minute level activity data of three domains of sleep, physical activity, and circadian rhythmicity, which were based on outputs from GGIR v2.4.0 and calculated by R ActFrag and ActCR packages. The standard deviation across days on each subject was also created for each feature. The weekday and weekend specific features were extracted as well since most features in the sleep and physical activity showed significant difference between weekdays and weekends. In brief, sleep domain referred to sleep duration, midpoint, efficiency, etc. Physical activity referred to daily motor activity such as sedentary behavior, light, and moderate-to-vigorous physical activity (MVPA). Circadian rhythms were natural rhythms that regulates the sleep-wake cycle within every 24 hours. For example, the cosinor curve and FPCA analysis were used in modeling of biological rhythms. A comprehensive list of all features could be found in the supplementary table and the detailed definition could be found in the GGIR manual and Di et al.’s publication in 2019.
Output | Description |
---|---|
part1_data.transform.R (use.cluster=TRUE, optional) | R code for data transformation and merge for every 20 files in each partition. When the number of .bin files is large ( > 1000), the data merge could take long time, user could split the job and submit the job to a cluster for parallel computing. |
part1_data.transform.sw (use.cluster=TRUE, optional) | Submit the job to a cluster for parallel computing |
part1_data.transform.merge.sw (use.cluster=TRUE, optional) | Merge all partitions for the ENMO and ANGLEZ data |
part5_studyname_postGGIR.report.Rmd | R markdown file for generate a comprehensive report of data processing and explortatory plots. |
part6_studyname_postGGIR.nonwear.report.Rmd | R markdown file for generate a report of nonwear score. |
part7a_studyname_postGGIR_JIVE_1_somefeatures.Rmd | Extract some features from the actigraphy data using R |
part7b_studyname_postGGIR_JIVE_2_allfeatures.Rmd | Extract other features from the GGIR output and merge all features together |
part7c_studyname_postGGIR_JIVE_3_runJIVE.Rmd | Perform JIVE Decomposition for All Features using r.jive |
part7d_studyname_postGGIR_JIVE_4_somefeatures_weekday.Rmd | Extract some weekday/weekend specific features from the actigraphy data using R |
part9_swarm.sh | shell script to submit all jobs to the cluster
Output | Description |
---|---|
studyname_filesummary_csvlist.csv | File list in the ./csv folder of GGIR |
studyname_filesummary_Rdatalist.csv | File list in the ./basic folder of GGIR |
All_studyname_ANGLEZ.data.csv | Raw data of ANGLEZ after merge |
All_studyname_ENMO.data.csv | Raw data of ENMO after merge |
nonwearscore_studyname_f0_f1_Xs.csv | Data matrix of nonwearscore |
nonwearscore_studyname_f0_f1_Xs.pdf | Plots for nonwearscore |
plot.nonwearVSnvalidhours.csv | Nonwear data for plot |
plot.nonwearVSnvalidhours.pdf | Nonwear plots |
lightmean_studyname_f0_f1_Xs.csv | Data matrix of lightmean |
lightpeak_studyname_f0_f1_Xs.csv | Data matrix of lightpeak |
temperaturemean_studyname_f0_f1_Xs.csv | Data matrix of temperaturemean |
clippingscore_studyname_f0_f1_Xs.csv | Raw data of clippingscore |
EN_studyname_f0_f1_Xs.csv | Data matrix of EN |
f0 and f1 are the file index to start and finish with
Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output
Main input files
+ ./data/All_studyname_ENMO.data.csv
+ GGIR results: part2, part4 and part5 (please specify \(part5FN\) in the main function)
+ GGIR raw data when bindir was specified
Command = call.afterggir(mode=2)
Output folder = ./summary
Output | Description |
---|---|
studyname_ggir_output_summary.xlsx | Description of all accelerometer files in the GGIR output. This excel file includs 9 pages as follows, (1) List of files in the GGIR output (2) Summary of files (3) List of duplicate IDs (4) ID errors (5) Number of valid days (6) Table of number of valid/missing days (7) Missing patten (8) Frequency of the missing pattern (9) Description of all accelerometer files. |
part2daysummary.info.csv | Intermediate results for description of each accelerometer file. |
studyname_ggir_output_summary_plot.pdf | Some plots such as the number of valid days, which were included in the part2a_studyname_postGGIR.report.html file as well. |
studyname_samples_remove_temp.csv | Create studyname_samples_remove.csv file by filling “remove” in the “duplicate” column in this template. If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part5. |
Output | Description |
---|---|
flag_All_studyname_ANGLEZ.data.Xs.csv | Adding flags for data cleaning of the raw ANGLEZ data |
flag_All_studyname_ENMO.data.Xs.csv | Adding flags for data cleaning of the raw ENMO data |
IDMatrix.flag_All_studyname_ENMO.data.60s.csv | ID matrix |
*Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output
Output | Description |
---|---|
impu.flag_All_studyname_ENMO.data.60s.csv | Imputation data for the merged ENMO data, and the missing values were imputated by the average ENMO over all the valid days for each subject. |
Variable | Description |
---|---|
filename | accelerometer file name |
Date | date recored from the GGIR part2.summary file |
id | IDs recored from the GGIR part2.summary file |
calender_date | date in the format of yyyy-mm-dd |
N.valid.hours | number of hours with valid data recored from the part2_daysummary.csv file in the GGIR output |
N.hours | number of hours of measurement recored from the part2_daysummary.csv file in the GGIR output |
weekday | day of the week-Day of the week |
measurementday | day of measurement-Day number relative to start of the measurement |
newID | new IDs defined as the user-defined function of filename2id(), e.g. substrings of the filename |
Nmiss_c9_c31 | number of NAs from the 9th to 31th column in the part2_daysummary.csv file in the GGIR output |
missing | “M” indicates missing for an invalid day, and “C” indicates completeness for a valid day |
Ndays | number of days of measurement |
ith_day | rank of the measurementday, for example, the value is 1,2,3,4,-3,-2,-1 for measurementday = 1,…,7 |
Nmiss | number of missing (invalid) days |
Nnonmiss | number of non-missing (valid) days |
misspattern | indicators of missing/nonmissing for all measurement days at the subject level |
RowNonWear | number of columnns in the non-wearing matrix |
NonWearMin | number of minutes of non-wearing |
remove16h7day | indicator of a key qulity control output. If remove16h7day=1, the day need to be removed. If remove16h7day=0, the day need to be kept. |
duplicate | If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part5. |
ImpuMiss.b | number of missing values on the ENMO data before imputation |
ImpuMiss.a | number of missing values on the ENMO data after imputation |
KEEP | The value is “keep”/“remove”, e.g. KEEP=“remove” if remove16h7day=1 or duplicate=“remove” or ImpuMiss.a>0 |
Output | Description |
---|---|
part5_studyname_postGGIR.report.html | A comprehensive report of data processing and explortatory plots. |
Folder | Output | Description |
---|---|---|
./ | part6_studyname_postGGIR.nonwear.report.html | A report of nonwear score. |
./data | JIVEraw_nonwearscore_studyname_f0_f1_Xs.csv | Imputation data matrix of nonwearscore (1/0) |
./data | JIVEimpu_nonwearscore_studyname_f0_f1_Xs.csv | Data matrix of nonwearscore (1/0/NA) |
f0 and f1 are the file index to start and finish with
Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output
Output | Description |
---|---|
part7_studyname_all_features_dictionary.xlsx | Description of features |
part7a_studyname_postGGIR_JIVE_1_somefeatures.html | Extract some features from the actigraphy data using R |
part7a_studyname_some_features_page1_features.csv | List of some features |
part7a_studyname_some_features_page2_face_day_PCs.csv | Function PCA at the day level using fpca.face( ) |
part7a_studyname_some_features_page3_face_subject_PCs.csv | Function PCA at the subject level using fpca.face( ) |
part7a_studyname_some_features_page4_denseFLMM_day_PCs.csv | Function PCA at the day level using denseFLMM( ) |
part7a_studyname_some_features_page5_denseFLMM_subject_PCs.csv | Function PCA at the subject level using denseFLMM( ) |
Output | Description |
---|---|
part7b_studyname_postGGIR_JIVE_2_allfeatures.html | Extract other features from the GGIR output and merge all features together |
part7b_studyname_all_features_1.csv | Raw data of all features |
part7b_studyname_all_features_2.csv | Keep sample with valid ENMO inputs |
part7b_studyname_all_features_2.csv.log | Log file of each variable of part5b_studyname_all_features_2.csv |
plot_part7b_studyname_all_features_2.csv.pdf | Plot of each variable of part5b_studyname_all_features_2.csv |
part7b_studyname_all_features_3.csv | Average variable at the subject level |
part7b_studyname_all_features_3.csv.log | Log file of each variable of part5b_studyname_all_features_3.csv |
plot_part7b_studyname_all_features_3.csv.pdf | Plot of each variable of part5b_studyname_all_features_3.csv |
part7b_studyname_all_features_4.csv | subject level SD of each feature |
Output | Description |
---|---|
part7c_studyname_postGGIR_JIVE_4_outputReport.html | Perform JIVE Decomposition for All Features using r.jive |
part7c_studyname_jive_Decomposition.csv | Joint and individual structure estimates |
part7c_studyname_jive_predScore.csv | PCA scores of JIVE ( missing when jive.predict failes) |
part7c_studyname_jive_predScore.csv | PCA scores of JIVE ( missing when jive.predict failes) |
Output | Description |
---|---|
part7d_studyname_some_features_page1.csv | Perform JIVE Decomposition for All Features using r.jive |
part7d_weekday_studyname_all_features_3.csv | subject level mean of each feature on weekday |
part7d_weekday_studyname_some_features_page4_denseFLMM_day_PCs.csv | Function PCA at the day level using denseFLMM( ) on weekday |
part7d_weekday_studyname_some_features_page5_denseFLMM_subject_PCs.csv | Function PCA at the subject level using denseFLMM( ) on weekday |
part7d_weekend_studyname_all_features_3.csv | subject level mean of each feature on weekend |
part7d_weekend_studyname_some_features_page4_denseFLMM_day_PCs.csv | Function PCA at the day level using denseFLMM( ) on weekend |
part7d_weekend_studyname_some_features_page5_denseFLMM_subject_PCs.csv | Function PCA at the subject level using denseFLMM( ) on weekend |
Sleep Domain
Variable | Description |
---|---|
sleeponset | Detected onset of sleep expressed as hours since the midnight of the previous night. |
wakeup | Detected waking time (after sleep period) expressed as hours since the midnight of the previous night. |
number_sib_wakinghours | Number of sustained inactivity bouts during the day, with day referring to the time outside the Sleep Period Time window. |
SleepDurationInSpt | Total sleep duration, which equals the accumulated nocturnal sustained inactivity bouts within the Sleep Period Time |
sleep_Midpoint | Sleep Midpoint = (sleeponset + wakeup)/2. |
sleep_efficiency | Sleep Efficiency = SleepDurationInSpt / (wakeup - sleeponset), which was the fraction of time spent asleep in the Sleep Period Time window. |
N_atleast5minwakenight | Number of times awake during the night for at least 5 minutes |
ACC_spt_sleep_mg | Average acceleration during sleep (mg) |
Nblocks_spt_sleep | Number of blocks of night sleep within the Sleep period time window. |
Physical Activity Domain
Variable | Description |
---|---|
TAC | Total volume of physical activity in the daytime |
TLAC | Total volume of log-transformed physical activity in the daytime |
TAC_24h | Total volume of physical activity within 24 hours |
TLAC_24h | Total volume of log-transformed physical activity within 24 hours |
sed_dur | Total daytime duration in minutes of sedentary activity |
light_dur | Total daytime duration in minutes of light activity |
mod_dur | Total daytime duration in minutes of moderate activity |
vig_dur | Total daytime duration in minutes of vigorous activity |
MVPA_dur | Total daytime duration in minutes of moderate to vigorous activity |
sed_dur_M10 | Total duration in minutes of sedentary activity within M10 window |
light_dur_M10 | Total duration in minutes of light activity within M10 window |
mod_dur_M10 | Total duration in minutes of moderate activity within M10 window |
vig_dur_M10 | Total duration in minutes of vigorous activity within M10 window |
MVPA_dur_M10 | Total duration in minutes of moderate to vigorous activity within M10 window |
sed_dur_24h | Total duration in minutes of sedentary activity within 24 hours |
light_dur_24h | Total duration in minutes of light activity within 24 hours |
mod_dur_24h | Total duration in minutes of moderate activity within 24 hours |
vig_dur_24h | Total duration in minutes of vigorous activity within 24 hours |
MVPA_dur_24h | Total duration in minutes of moderate to vigorous activity within 24 hours |
mean_r | Mean sedentary bout duration. |
mean_a | Mean active bout duration. |
SATP | Sedentary to active transition probabilities. |
ASTP | Active to sedentary transition probabilities. |
Gini_r | Gini index for active bout, absolute variability normalized to the average bout duration (resting). |
Gini_a | Gini index for sedentary bout, absolute variability normalized to the average bout duration (active). |
alpha_r | Power law parameter for sedentary bout. |
alpha_a | Power law parameter for active bout. |
h_r | Hazard function for sedentary bout. |
h_a | Hazard function for active bout. |
dur_day_total_IN_min | Total duration of day in minutes spent in total inactivity during the day |
dur_day_total_LIG_min | Total duration of day in minutes of light activity during the day |
dur_day_total_MOD_min | Total duration of day in minutes of moderate activity during the day |
dur_day_total_VIG_min | Total duration of day in minutes of vigorous activity during the day |
dur_day_MVPA_bts_10_min | Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 10 minutes or more |
dur_day_MVPA_bts_5_10_min | Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 5 to 10 minutes |
dur_day_MVPA_bts_1_5_min | Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 1 to 5 minutes |
Nbouts_day_IN_bts_30 | Number of bouts of inactivity for bouts 30 minutes or more |
Nbouts_day_IN_bts_20_30 | Number of bouts of inactivity for bouts 20 to 30 minutes |
Nbouts_day_IN_bts_10_20 | Number of bouts of inactivity for bouts 10 to 20 minutes |
Nbouts_day_LIG_bts_10 | Number of bouts of light for bouts 10 minutes or more |
Nbouts_day_LIG_bts_5_10 | Number of bouts of light for bouts 5 to 10 minutes |
Nbouts_day_LIG_bts_1_5 | Number of bouts of light for bouts 1 to 5 minutes |
Nbouts_day_MVPA_bts_10 | Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 10 minutes or more |
Nbouts_day_MVPA_bts_5_10 | Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 5 to 10 minutes |
Nbouts_day_MVPA_bts_1_5 | Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 1 to 5 minutes |
Nblocks_day_total_IN | Number of blocks of total inactivity during day |
Nblocks_day_total_LIG | Number of blocks of total light activity during day |
Nblocks_day_total_MOD | Number of blocks of total moderate activity during day |
Nblocks_day_total_VIG | Number of blocks of total vigorous activity during day |
Circadian Rhythmicity Domain
Variable | Description |
---|---|
L5TIME_num | Timing of least active 5 hours |
M10TIME_num | Timing of most active 10 hours |
L5VALUE | Average acceleration value (mg) of least active 5 hours |
M10VALUE | Average acceleration value (mg) of most active 10 hours |
RA_ggir | Relative amplitude reflects the difference between M10 activity and L5 activity and RA_ggir = (M10VALUE-L5VALUE)/(M10VALUE+L5VALUE). |
L5 | Average acceleration value (mg) of least active 5 hours |
M10 | Average acceleration value (mg) of most active 10 hours |
L5TIME | Timing of least active 5 hours |
M10TIME | Timing of most active 10 hours |
RA | Relative amplitude: (M10-L5)/(M10+L5) where M10= most active 10 hrs, L5 = least active 5 hour. |
IV | Intra-daily variability measures fragmentation in the rest/activity rhythms. |
IS | Inter-daily stability measures fragmentation in the rest/activity rhythms between different days. |
IV_intradailyvariability | Intra-daily variability measures fragmentation in the rest/activity rhythms. |
IS_interdailystability | Inter-daily stability measures fragmentation in the rest/activity rhythms between different days. |
mesor | MESOR which is short for midline statistics of rhythm, which is a rhythm adjusted mean. This represents mean activity level. |
amp | Amplitude, a measure of half the extend of predictable variation within a cycle. This represents the highest activity one can achieve. |
acro | Acrophase, a measure of the time of the overall high values recurring in each cycle. Here it has a unit of radian. This represents time to reach the peak. |
mesor_L | Mesor in the cosinor model by using log-transformed data. |
amp_L | Amplitude log-transformed in the cosinor model by using log-transformed data. |
acro_L | Acrophase log-transformed in the cosinor model by using log-transformed data. |
PC1 | 1st principal component score from functional principal component analysis (FPCA). |
PC2 | 2nd principal component score from functional principal component analysis (FPCA). |
PC3 | 3rd principal component score from functional principal component analysis (FPCA). |
PC4 | 4th principal component score from functional principal component analysis (FPCA). |
PC5 | 5th principal component score from functional principal component analysis (FPCA). |
PC6 | 6th principal component score from functional principal component analysis (FPCA). |
PC7 | 7th principal component score from functional principal component analysis (FPCA). |
PC8 | 8th principal component score from functional principal component analysis (FPCA). |
PC9 | 9th principal component score from functional principal component analysis (FPCA). |
PC10 | 10th principal component score from functional principal component analysis (FPCA). |