library(pureseqtmr)
library(testthat)
library(knitr)

pureseqtmr is a package to call PureseqTM from R. PureseqTM predicts the topology of a membrane protein, where the topology can be either inside, or not inside the membrane.

To be able to call PureseqTM, it needs to be installed:

pureseqtmr::install_pureseqtm()

Note that this code is not actually run, to comply with CRAN guidelines.

PureseqTM supplies some example files. Use get_example_filenames to get the path to all these files:

if (is_pureseqtm_installed()) {
  get_example_filenames()
}
#> [1] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.fasta"        
#> [2] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.pdbtm"        
#> [3] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.tgt"          
#> [4] "/home/richel/.local/share/PureseqTM_Package/example/4j7cK.fasta"        
#> [5] "/home/richel/.local/share/PureseqTM_Package/example/4j7cK.top"          
#> [6] "/home/richel/.local/share/PureseqTM_Package/example/Q8TCT8.fasta"       
#> [7] "/home/richel/.local/share/PureseqTM_Package/example/human_20416.fasta"  
#> [8] "/home/richel/.local/share/PureseqTM_Package/example/pipeline_v0.20.png" 
#> [9] "/home/richel/.local/share/PureseqTM_Package/example/test_proteome.fasta"

In this example, 1bhaA.fasta will be used. To obtain the full path, use get_example_filename. get_example_filename will give an error if the file is not found.

if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  head(readLines(fasta_filename))
}
#> [1] ">1bhaA"                                                                 
#> [2] "QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF"

Getting the topology of this protein:

if (is_pureseqtm_installed()) {
  topology <- predict_topology(fasta_filename)
  kable(topology)
}
name topology
1bhaA 00000000001111111111111111111000000000000001111111111111111111100000000

Or show the topology as a plot:

if (is_pureseqtm_installed()) {
  plot_topology(topology)
}

plot of chunk unnamed-chunk-5

One needs the exact same code for a full proteome. Here we use a pureseqtmr example file, which is the COVID-19 reference proteome, as downloaded from https://www.uniprot.org/proteomes/UP000464024.

fasta_filename <- system.file(
  "extdata",
  "UP000464024.fasta",
  package = "pureseqtmr"
)
expect_true(file.exists(fasta_filename))

Show the (top of the) proteome:

head(readLines(fasta_filename))
#> [1] ">sp|P0DTC7|NS7A_SARS2 Protein 7a OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=7a PE=3 SV=1"                
#> [2] "MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFS"                                                                  
#> [3] "TQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKT"                                                                  
#> [4] "E"                                                                                                                             
#> [5] ">sp|P0DTD1|R1AB_SARS2 Replicase polyprotein 1ab OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=rep PE=1 SV=1"
#> [6] "MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV"

Getting the topology of this protein:

if (is_pureseqtm_installed()) {
  topology <- predict_topology(fasta_filename)
}

Instead of directly showing the raw data, the protein names are shortened first:

if (is_pureseqtm_installed()) {
  topology$name <- stringr::str_match(
    string = topology$name,
    pattern = "..\\|.*\\|(.*)_SARS2"
  )[, 2]
}

Show the topology as a plot:

if (is_pureseqtm_installed()) {
  plot_topology(topology)
}

plot of chunk unnamed-chunk-10

And tally the number of transmembrane helices per protein:

if (is_pureseqtm_installed()) {
  kable(tally_tmhs(topology))
}
name n_tmhs
NS7A 1
R1AB 17
NS6 0
R1A 17
VME1 3
A0A663DJA2 0
SPIKE 1
ORF9B 0
NS8 0
NCAP 0
AP3A 3
Y14 1
VEMP 1
NS7B 1