Getting Started with xportr

This demo uses a small ADSL data set that is part of the {admiral} package. The script that generates this ADSL data set can be created with the command admiral::use_ad_template("adsl").

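The ADSL has the following features: 306 observations of 48 variables, data types other than character and numeric, missing variable labels, a variable order that does not follow the specification file, and missing formats.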

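To create a fully compliant v5 xpt ADSL data set that was developed using R, we will need to apply the 6 main functions within the xportr package: xportr_type(), xportr_length(), xportr_order(), xportr_format(), xportr_label() and xportr_write().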

# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(admiral)

# Loading in our example data
adsl <- admiral::admiral_adsl



NOTE: The script that generates this data set can be created with the command admiral::use_ad_template("adsl").

Preparing your Specification Files


To use the functions within xportr, you will need an R data frame that contains your specification file. After loading your spec sheets, you will most likely need to pre-process them so that they work with the xportr functions. Please see our example spec sheet, located at system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"), to see the layout xportr expects.


var_spec <- readxl::read_xlsx(
  system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  dplyr::rename(type = "Data Type") %>%
  rlang::set_names(tolower)


Below is a quick snapshot of the specification file pertaining to the ADSL data set, which we will make use of in the 6 xportr function calls below. Take note of the order, label, type, length and format columns.
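The rendered snapshot is not reproduced here. As a rough stand-in, the sketch below builds a toy three-row spec by hand and applies the same rename-and-lowercase idea in base R. Only the "Data Type" header comes from the code above; the other header names are an assumption about the spreadsheet layout, and the cell values are illustrative, loosely echoing the outputs shown elsewhere in this document.

```r
# Toy spec with spreadsheet-style headers (hypothetical stand-in for the "Variables" sheet)
toy_spec <- data.frame(
  Order = c(1, 2, 3),                          # made-up positions
  Dataset = "ADSL",
  Variable = c("STUDYID", "AGE", "TRTSDT"),
  Label = c("Study Identifier", "Age", "Date of First Exposure to Treatment"),
  `Data Type` = c("text", "integer", "date"),  # made-up type values
  Length = c(21, 8, 8),                        # illustrative lengths
  Format = c(NA, NA, "DATE9."),
  check.names = FALSE,                         # keep the space in "Data Type"
  stringsAsFactors = FALSE
)

# Same pre-processing idea as above: rename one column, then lowercase all names
names(toy_spec)[names(toy_spec) == "Data Type"] <- "type"
names(toy_spec) <- tolower(names(toy_spec))

# The columns to take note of
toy_spec[, c("order", "variable", "label", "type", "length", "format")]
```

After this step every column can be referred to by a lowercase name, which is the shape the var_spec object above ends up in.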



xportr_type()


To be compliant with the transport v5 specification, an xpt file can only have two data types: character and numeric (dbl). Currently the ADSL data set has chr, dbl, dttm (date-time), date and fct (factor) columns.

look_for(adsl, details = TRUE)
   pos variable label                      col_type values                    
   1   STUDYID  Study Identifier           chr      range: CDISCPILOT01 - CDI~
   2   USUBJID  Unique Subject Identifier  chr      range: 01-701-1015 - 01-7~
   3   SUBJID   Subject Identifier for th~ chr      range: 1001 - 1448        
   4   RFSTDTC  Subject Reference Start D~ chr      range: 2012-07-09 - 2014-~
   5   RFENDTC  Subject Reference End Dat~ chr      range: 2012-09-01 - 2015-~
   6   RFXSTDTC Date/Time of First Study ~ chr      range: 2012-07-09 - 2014-~
   7   RFXENDTC Date/Time of Last Study T~ chr      range: 2012-08-28 - 2015-~
   8   RFICDTC  Date/Time of Informed Con~ chr      range:                    
   9   RFPENDTC Date/Time of End of Parti~ chr      range: 2012-08-13 - 2015-~
   10  DTHDTC   Date/Time of Death         chr      range: 2013-01-14 - 2014-~
   11  DTHFL    Subject Death Flag         chr      range: Y - Y              
   12  SITEID   Study Site Identifier      chr      range: 701 - 718          
   13  AGE      Age                        dbl      range: 50 - 89            
   14  AGEU     Age Units                  chr      range: YEARS - YEARS      
   15  SEX      Sex                        chr      range: F - M              
   16  RACE     Race                       chr      range: AMERICAN INDIAN OR~
   17  ETHNIC   Ethnicity                  chr      range: HISPANIC OR LATINO~
   18  ARMCD    Planned Arm Code           chr      range: Pbo - Xan_Lo       
   19  ARM      Description of Planned Arm chr      range: Placebo - Xanomeli~
   20  ACTARMCD Actual Arm Code            chr      range: Pbo - Xan_Lo       
   21  ACTARM   Description of Actual Arm  chr      range: Placebo - Xanomeli~
   22  COUNTRY  Country                    chr      range: USA - USA          
   23  DMDTC    Date/Time of Collection    chr      range: 2012-07-06 - 2014-~
   24  DMDY     Study Day of Collection    dbl      range: -37 - -2           
   25  TRT01P   Description of Planned Arm chr      range: Placebo - Xanomeli~
   26  TRT01A   Description of Actual Arm  chr      range: Placebo - Xanomeli~
   27  TRTSDTM  —                          dttm     range: 2012-07-09 - 2014-~
   28  TRTEDTM  —                          dttm     range: 2012-08-28 23:59:5~
   29  TRTSDT   —                          date     range: 2012-07-09 - 2014-~
   30  TRTEDT   —                          date     range: 2012-08-28 - 2015-~
   31  TRTDURD  —                          dbl      range: 1 - 212            
   32  SCRFDT   —                          date     range: 2012-08-13 - 2014-~
   33  EOSDT    —                          date     range: 2012-09-01 - 2015-~
   34  EOSSTT   —                          chr      range: COMPLETED - DISCON~
   35  FRVDT    —                          date     range: 2013-02-18 - 2014-~
   36  DTHDT    —                          date     range: 2013-01-14 - 2014-~
   37  DTHDTF   —                          chr      range:                    
   38  DTHADY   —                          dbl      range: 12 - 175           
   39  LDDTHELD —                          dbl      range: 0 - 2              
   40  LSTALVDT —                          date     range: 2012-09-01 - 2015-~
   41  AGEGR1   —                          fct      <18                       
                                                    18-64                     
                                                    >=65                      
   42  SAFFL    —                          chr      range: Y - Y              
   43  RACEGR1  —                          chr      range: Non-white - White  
   44  REGION1  —                          chr      range: NA - NA            
   45  LDDTHGR1 —                          chr      range: <= 30 - <= 30      
   46  DTH30FL  —                          chr      range: Y - Y              
   47  DTHA30FL —                          chr      range:                    
   48  DTHB30FL —                          chr      range: Y - Y


Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL data set to be either numeric or character.


adsl_type <- xportr_type(adsl, var_spec, domain = "ADSL", verbose = "message") 


Now all appropriate types have been applied to the dataset as seen below.

look_for(adsl_type, details = TRUE)
   pos variable label col_type values                                          
   1   STUDYID  —     chr      range: CDISCPILOT01 - CDISCPILOT01              
   2   USUBJID  —     chr      range: 01-701-1015 - 01-718-1427                
   3   SUBJID   —     chr      range: 1001 - 1448                              
   4   RFSTDTC  —     chr      range: 2012-07-09 - 2014-09-02                  
   5   RFENDTC  —     chr      range: 2012-09-01 - 2015-03-05                  
   6   RFXSTDTC —     chr      range: 2012-07-09 - 2014-09-02                  
   7   RFXENDTC —     chr      range: 2012-08-28 - 2015-03-05                  
   8   RFICDTC  —     chr      range:                                          
   9   RFPENDTC —     chr      range: 2012-08-13 - 2015-03-05T14:40            
   10  DTHDTC   —     chr      range: 2013-01-14 - 2014-11-01                  
   11  DTHFL    —     chr      range: Y - Y                                    
   12  SITEID   —     chr      range: 701 - 718                                
   13  AGE      —     dbl      range: 50 - 89                                  
   14  AGEU     —     chr      range: YEARS - YEARS                            
   15  SEX      —     chr      range: F - M                                    
   16  RACE     —     chr      range: AMERICAN INDIAN OR ALASKA NATIVE - WHITE 
   17  ETHNIC   —     chr      range: HISPANIC OR LATINO - NOT HISPANIC OR LAT~
   18  ARMCD    —     chr      range: Pbo - Xan_Lo                             
   19  ARM      —     chr      range: Placebo - Xanomeline Low Dose            
   20  ACTARMCD —     chr      range: Pbo - Xan_Lo                             
   21  ACTARM   —     chr      range: Placebo - Xanomeline Low Dose            
   22  COUNTRY  —     chr      range: USA - USA                                
   23  DMDTC    —     chr      range: 2012-07-06 - 2014-08-29                  
   24  DMDY     —     dbl      range: -37 - -2                                 
   25  TRT01P   —     chr      range: Placebo - Xanomeline Low Dose            
   26  TRT01A   —     chr      range: Placebo - Xanomeline Low Dose            
   27  TRTSDTM  —     dbl      range: 1341792000 - 1409616000                  
   28  TRTEDTM  —     dbl      range: 1346198399 - 1425599999                  
   29  TRTSDT   —     dbl      range: 15530 - 16315                            
   30  TRTEDT   —     dbl      range: 15580 - 16499                            
   31  TRTDURD  —     dbl      range: 1 - 212                                  
   32  SCRFDT   —     dbl      range: 15565 - 16181                            
   33  EOSDT    —     dbl      range: 15584 - 16499                            
   34  EOSSTT   —     chr      range: COMPLETED - DISCONTINUED                 
   35  FRVDT    —     dbl      range: 15754 - 16389                            
   36  DTHDT    —     dbl      range: 15719 - 16375                            
   37  DTHDTF   —     chr      range:                                          
   38  DTHADY   —     dbl      range: 12 - 175                                 
   39  LDDTHELD —     dbl      range: 0 - 2                                    
   40  LSTALVDT —     dbl      range: 15584 - 16499                            
   41  AGEGR1   —     chr      range: 18-64 - >=65                             
   42  SAFFL    —     chr      range: Y - Y                                    
   43  RACEGR1  —     chr      range: Non-white - White                        
   44  REGION1  —     chr      range: NA - NA                                  
   45  LDDTHGR1 —     chr      range: <= 30 - <= 30                            
   46  DTH30FL  —     chr      range: Y - Y                                    
   47  DTHA30FL —     chr      range:                                          
   48  DTHB30FL —     chr      range: Y - Y
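To see where the new numeric values come from, here is a small base R sketch of the same kinds of conversion. It mirrors the effect visible in the output above and is not a description of xportr's internal code; the toy values are the lower bounds of the TRTSDT and TRTSDTM ranges and one AGEGR1 level.

```r
# Toy values taken from the ranges printed above
trtsdt  <- as.Date("2012-07-09")
trtsdtm <- as.POSIXct("2012-07-09 00:00:00", tz = "UTC")
agegr1  <- factor("18-64", levels = c("<18", "18-64", ">=65"))

as.numeric(trtsdt)    # days since 1970-01-01: 15530
as.numeric(trtsdtm)   # seconds since 1970-01-01 00:00:00 UTC: 1341792000
as.character(agegr1)  # the level text "18-64", not the integer code
```

The two numbers match the lower bounds reported for TRTSDT and TRTSDTM after the type step.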

xportr_length()


Next we can apply the lengths from a variable-level specification file to the data frame. xportr_length() will identify variables that are missing from your specification file, and will also tell you how many lengths have been applied successfully. Before we apply the lengths, let’s verify that no lengths have been applied to the original data frame.


str(adsl)
  tibble [306 × 48] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ TRTSDTM : iso_dtm[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDTM : iso_dtm[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHDTF  : chr [1:306] NA NA NA NA ...
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
   $ DTH30FL : chr [1:306] NA NA NA NA ...
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
   $ DTHB30FL: chr [1:306] NA NA NA NA ...


No lengths have been applied to the variables, as seen in the printout: the lengths would appear in the attr part of each variable. Let’s now use xportr_length() to apply the lengths from the specification file.

adsl_length <- adsl %>% xportr_length(var_spec, domain = "ADSL", verbose = "message")


str(adsl_length)
  tibble [306 × 48] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
    ..- attr(*, "width")= num 21
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
    ..- attr(*, "width")= num 30
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
    ..- attr(*, "width")= num 8
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
    ..- attr(*, "width")= num 19
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
    ..- attr(*, "width")= num 19
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
    ..- attr(*, "width")= num 19
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
    ..- attr(*, "width")= num 19
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
    ..- attr(*, "width")= num 19
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
    ..- attr(*, "width")= num 2
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
    ..- attr(*, "width")= num 5
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
    ..- attr(*, "width")= num 8
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
    ..- attr(*, "width")= num 10
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
    ..- attr(*, "width")= num 1
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
    ..- attr(*, "width")= num 60
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
    ..- attr(*, "width")= num 100
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
    ..- attr(*, "width")= num 20
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 200
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
    ..- attr(*, "width")= num 20
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 200
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
    ..- attr(*, "width")= num 3
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
    ..- attr(*, "width")= num 19
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
    ..- attr(*, "width")= num 8
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 40
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 40
   $ TRTSDTM : iso_dtm[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDTM : iso_dtm[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
    ..- attr(*, "width")= num 8
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
    ..- attr(*, "width")= num 200
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHDTF  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 2
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
    ..- attr(*, "width")= num 20
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
    ..- attr(*, "width")= num 2
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
    ..- attr(*, "width")= num 200
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
    ..- attr(*, "width")= num 80
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTH30FL : chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHB30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   - attr(*, "_xportr.df_arg_")= chr "ADSL"

Note the additional attr(*, "width")= line under the variables, which holds the width. These widths have been applied directly from the specification file that we loaded above!
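The width shown in the printout is an ordinary R attribute attached to the column. The base R sketch below sets one by hand on a toy vector and adds a hypothetical sanity check (not an xportr feature) that the longest value fits the declared width.

```r
# Toy column; attributes are plain metadata attached to the vector
studyid <- c("CDISCPILOT01", "CDISCPILOT01")
attr(studyid, "width") <- 21   # 21 is the STUDYID width shown above

# Hypothetical check: does the longest value fit within the declared width?
fits <- max(nchar(studyid), na.rm = TRUE) <= attr(studyid, "width")

str(studyid)  # the width now appears as an attr line
fits
```

A check like this can be useful when a spec length is shorter than the data it is meant to hold.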

xportr_order()

Please note that the order of the ADSL variables, shown above, does not match the order column of the specification file. We can quickly remedy this with a call to xportr_order(). After the call, SITEID and many other variables have been moved to match the specification file's order column.

adsl_order <- xportr_order(adsl, var_spec, domain = "ADSL", verbose = "message")
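The idea of reordering columns from a spec can be sketched in base R as follows. The toy data, the spec rows and their order values are made up, and placing columns that the spec does not mention at the end is a choice made for this sketch rather than a statement about xportr_order().

```r
# Toy data whose column order differs from the spec
toy <- data.frame(AGE = 63, SITEID = "701", STUDYID = "CDISCPILOT01",
                  stringsAsFactors = FALSE)

# Hypothetical spec rows; the order values are made up
toy_spec <- data.frame(
  variable = c("STUDYID", "SITEID", "AGE"),
  order    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Spec variables in spec order first, then any columns the spec does not mention
in_spec     <- toy_spec$variable[order(toy_spec$order)]
in_spec     <- in_spec[in_spec %in% names(toy)]
leftovers   <- setdiff(names(toy), in_spec)
toy_ordered <- toy[, c(in_spec, leftovers), drop = FALSE]

names(toy_ordered)
```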

xportr_format()

Now we apply formats to the data set. These will typically be DATE9., DATETIME20. or TIME5., but many others can be used. Notice that 8 Date/Time variables are missing a format in our ADSL data set. Here we just take a peek at a few TRT variables, which have a NULL format.

attr(adsl$TRTSDT, "format.sas")
  NULL
attr(adsl$TRTEDT, "format.sas")
  NULL
attr(adsl$TRTSDTM, "format.sas")
  NULL
attr(adsl$TRTEDTM, "format.sas")
  NULL

Using our xportr_format() we apply our formats.

adsl_fmt <- adsl %>% xportr_format(var_spec, domain = "ADSL", verbose = "message")
attr(adsl_fmt$TRTSDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTEDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTSDTM, "format.sas")
  [1] "DATETIME20."
attr(adsl_fmt$TRTEDTM, "format.sas")
  [1] "DATETIME20."

xportr_label()


Please observe that our ADSL data set is missing many variable labels. Labels can sometimes be lost when a data set passes through R functions. However, a CDISC-compliant data set needs a label on every variable.

look_for(adsl, details = FALSE)
   pos variable label                             
    1  STUDYID  Study Identifier                  
    2  USUBJID  Unique Subject Identifier         
    3  SUBJID   Subject Identifier for the Study  
    4  RFSTDTC  Subject Reference Start Date/Time 
    5  RFENDTC  Subject Reference End Date/Time   
    6  RFXSTDTC Date/Time of First Study Treatment
    7  RFXENDTC Date/Time of Last Study Treatment 
    8  RFICDTC  Date/Time of Informed Consent     
    9  RFPENDTC Date/Time of End of Participation 
   10  DTHDTC   Date/Time of Death                
   11  DTHFL    Subject Death Flag                
   12  SITEID   Study Site Identifier             
   13  AGE      Age                               
   14  AGEU     Age Units                         
   15  SEX      Sex                               
   16  RACE     Race                              
   17  ETHNIC   Ethnicity                         
   18  ARMCD    Planned Arm Code                  
   19  ARM      Description of Planned Arm        
   20  ACTARMCD Actual Arm Code                   
   21  ACTARM   Description of Actual Arm         
   22  COUNTRY  Country                           
   23  DMDTC    Date/Time of Collection           
   24  DMDY     Study Day of Collection           
   25  TRT01P   Description of Planned Arm        
   26  TRT01A   Description of Actual Arm         
   27  TRTSDTM  —                                 
   28  TRTEDTM  —                                 
   29  TRTSDT   —                                 
   30  TRTEDT   —                                 
   31  TRTDURD  —                                 
   32  SCRFDT   —                                 
   33  EOSDT    —                                 
   34  EOSSTT   —                                 
   35  FRVDT    —                                 
   36  DTHDT    —                                 
   37  DTHDTF   —                                 
   38  DTHADY   —                                 
   39  LDDTHELD —                                 
   40  LSTALVDT —                                 
   41  AGEGR1   —                                 
   42  SAFFL    —                                 
   43  RACEGR1  —                                 
   44  REGION1  —                                 
   45  LDDTHGR1 —                                 
   46  DTH30FL  —                                 
   47  DTHA30FL —                                 
   48  DTHB30FL —


Using the xportr_label() function, we can take the specification file and label all the variables available. xportr_label() will produce a warning message if a variable in the data set is not in the specification file.


adsl_update <- adsl %>% xportr_label(var_spec, domain = "ADSL", verbose = "message")
look_for(adsl_update, details = FALSE)
   pos variable label                                  
    1  STUDYID  Study Identifier                       
    2  USUBJID  Unique Subject Identifier              
    3  SUBJID   Subject Identifier for the Study       
    4  RFSTDTC  Subject Reference Start Date/Time      
    5  RFENDTC  Subject Reference End Date/Time        
    6  RFXSTDTC Date/Time of First Study Treatment     
    7  RFXENDTC Date/Time of Last Study Treatment      
    8  RFICDTC  Date/Time of Informed Consent          
    9  RFPENDTC Date/Time of End of Participation      
   10  DTHDTC   Date / Time of Death                   
   11  DTHFL    Subject Death Flag                     
   12  SITEID   Study Site Identifier                  
   13  AGE      Age                                    
   14  AGEU     Age Units                              
   15  SEX      Sex                                    
   16  RACE     Race                                   
   17  ETHNIC   Ethnicity                              
   18  ARMCD    Planned Arm Code                       
   19  ARM      Description of Planned Arm             
   20  ACTARMCD Actual Arm Code                        
   21  ACTARM   Description of Actual Arm              
   22  COUNTRY  Country                                
   23  DMDTC    Date/Time of Collection                
   24  DMDY     Study Day of Collection                
   25  TRT01P   Planned Treatment for Period 01        
   26  TRT01A   Actual Treatment for Period 01         
   27  TRTSDTM  Datetime of First Exposure to Treatment
   28  TRTEDTM  Datetime of Last Exposure to Treatment 
   29  TRTSDT   Date of First Exposure to Treatment    
   30  TRTEDT   Date of Last Exposure to Treatment     
   31  TRTDURD  Total Duration of Trt  (days)          
   32  SCRFDT   Screen Failure Date                    
   33  EOSDT    End of Study Date                      
   34  EOSSTT   End of Study Status                    
   35  FRVDT    Final Retrievel Visit Date             
   36  DTHDT    Death Date                             
   37  DTHDTF   Date of Death Imputation Flag          
   38  DTHADY   Relative Day of Death                  
   39  LDDTHELD Elapsed Days from Last Dose to Death   
   40  LSTALVDT Date Last Known Alive                  
   41  AGEGR1   Pooled Age Group 1                     
   42  SAFFL    Safety Population Flag                 
   43  RACEGR1  Pooled Race Group 1                    
   44  REGION1  Geographic Region 1                    
   45  LDDTHGR1 Last Does to Death Group               
   46  DTH30FL  Under 30  Group                        
   47  DTHA30FL Over 30  Group                         
   48  DTHB30FL Over 30 plus 30 days Group
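As a rough illustration of labelling from a spec while flagging variables the spec does not cover, here is a base R sketch. The EXTRA column and the one-entry lookup are hypothetical, and the message text is not xportr's.

```r
# Toy data: one variable covered by the spec, one that is not
toy <- data.frame(TRTSDT = as.Date("2012-07-09"), EXTRA = 1)

# Hypothetical label lookup, using a label from the output above
label_spec <- c(TRTSDT = "Date of First Exposure to Treatment")

# Report variables that have no entry in the lookup
missing_vars <- setdiff(names(toy), names(label_spec))
if (length(missing_vars) > 0) {
  message("Variables not in spec: ", paste(missing_vars, collapse = ", "))
}

# Attach labels to the variables that do have an entry
for (v in intersect(names(toy), names(label_spec))) {
  attr(toy[[v]], "label") <- label_spec[[v]]
}

attr(toy$TRTSDT, "label")
```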

xportr_write()


Finally, we arrive at exporting the R data frame object as an xpt file with the function xportr_write(). The xpt file will be written directly to your current working directory. To make it more interesting, we have put together all six functions with the magrittr pipe, %>%. A user can now apply types, lengths, variable labels, ordering, formats and a data set label, and write out the final xpt file, in one pipe! Appropriate warnings and messages about potential issues are printed to the console before the file is sent off to a standard clinical data set validator application or to data reviewers.

adsl %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>% 
  xportr_format(var_spec, "ADSL", "message") %>% 
  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")

That’s it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file.
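Before handing a file to a validator, it can help to run a quick check of your own. The sketch below is a hypothetical base R helper, check_v5(), that flags a few common transport v5 constraints: variable names of at most 8 characters, labels present and at most 40 characters, and only character or numeric columns. It is not part of xportr and does not replace its messages or a validator.

```r
# Hypothetical helper: list columns that break a few common XPT v5 limits
check_v5 <- function(df) {
  issues <- character(0)
  for (v in names(df)) {
    x <- df[[v]]
    lab <- attr(x, "label")
    if (nchar(v) > 8) {
      issues <- c(issues, paste0(v, ": name longer than 8 characters"))
    }
    if (is.null(lab)) {
      issues <- c(issues, paste0(v, ": no label"))
    } else if (nchar(lab) > 40) {
      issues <- c(issues, paste0(v, ": label longer than 40 characters"))
    }
    # Dates and factors are neither character nor numeric for this purpose
    if (!is.character(x) && !is.numeric(x)) {
      issues <- c(issues, paste0(v, ": not character or numeric"))
    }
  }
  issues
}

# Toy data: STUDYID is labelled and character, TRTSDT is an unlabelled Date
toy <- data.frame(STUDYID = "CDISCPILOT01", TRTSDT = as.Date("2012-07-09"),
                  stringsAsFactors = FALSE)
attr(toy$STUDYID, "label") <- "Study Identifier"

check_v5(toy)
```

Here only TRTSDT is flagged, once for the missing label and once for its Date type.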

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, please submit an issue on xportr’s GitHub page.