Olink® Analyze is an R package that provides a versatile toolbox to
enable fast and easy handling of Olink® NPX data for your proteomics
research. Olink® Analyze provides functions for using Olink data,
including functions for importing Olink® NPX datasets exported from the
NPX Manager, as well as quality control (QC) plot functions and
functions for various statistical tests. This package is meant to
provide a convenient pipeline for your Olink NPX data analysis.
Preprocessing
Read NPX data (read_NPX)
The read_NPX function imports an NPX file in wide format that has
been exported from Olink® NPX Manager and converts the data into the
long format that is preferred in R. The wide format is the most common
way Olink® delivers data for Olink® Target 96; for data analysis,
however, a long format is preferred. The output of the NPX Manager must
not be altered beforehand for this function to work as expected.
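The wide-to-long conversion can be pictured with a minimal base R sketch. The toy table and its assay columns (IL6, TNF) are made up for illustration and do not reflect the actual NPX Manager file layout, which read_NPX parses for you.

```r
# Toy wide table: one row per sample, one column per assay (hypothetical data)
wide <- data.frame(
  SampleID = c("S1", "S2"),
  IL6 = c(3.2, 4.1),
  TNF = c(1.5, 2.0)
)

assays <- setdiff(names(wide), "SampleID")

# Long format: one row per sample/assay combination
long <- data.frame(
  SampleID = rep(wide$SampleID, times = length(assays)),
  Assay    = rep(assays, each = nrow(wide)),
  NPX      = unlist(wide[assays], use.names = FALSE)
)
print(long)
```

In long format every measurement is a row, so grouping by assay or filtering by sample needs no reshaping.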
Function arguments
- filename: Path to the NPX Manager output file.
data <- read_NPX("~/NPX_file_location.xlsx")
Function output
A tibble in long format containing:
- SampleID: Sample names or IDs.
- Index: Unique number for each SampleID. It is used to make up for
non-unique sample IDs.
- OlinkID: Unique ID for each assay assigned by Olink. If the
assay is included in more than one panel, it has a different
OlinkID in each one.
- UniProt: UniProt ID.
- Assay: Common gene name for the assay.
- MissingFreq: Missing frequency for the OlinkID, i.e. frequency of
samples with NPX value below limit of detection (LOD).
- Panel: Olink Panel that samples ran on. Read more about Olink Panels
here: https://www.olink.com/products-services/.
- Panel_Version: Version of the panel. A new panel version might
include some different or improved assays.
- PlateID: Name of the plate.
- QC_Warning: Indication whether the sample passed Olink QC. Read more
here: https://www.olink.com/faq/how-is-quality-control-of-the-data-performed/.
- LOD: Limit of detection (LOD) is the minimum level of an individual
protein that can be measured. LOD is defined as 3 times the standard
deviation over background.
- NPX: Normalized Protein eXpression, Olink’s unit of protein
expression level on a log2 scale. The majority of the
functions of this package use NPX values for calculations. Read more
about NPX here: https://www.olink.com/faq/what-is-npx/.
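Two of the columns above lend themselves to a small worked example: MissingFreq as the fraction of samples below LOD, and the log2 nature of NPX. The data and OlinkIDs below are invented for illustration only.

```r
# Hypothetical long-format NPX data for two assays and four samples
npx <- data.frame(
  SampleID = rep(c("S1", "S2", "S3", "S4"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 4),
  NPX      = c(2.0, 3.0, 0.4, 4.1,  1.2, 0.3, 0.5, 2.8),
  LOD      = rep(c(0.8, 0.9), each = 4)
)

# MissingFreq: share of samples with NPX below LOD, per assay
missing_freq <- tapply(npx$NPX < npx$LOD, npx$OlinkID, mean)

# NPX is on a log2 scale, so a difference of 1 NPX is a doubling
fold_change <- 2^(3.0 - 2.0)
```

Here one of four samples is below LOD for the first assay and two of four for the second, and the 1 NPX difference between S2 and S1 corresponds to a two-fold difference on the linear scale.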
Randomize samples on plate (olink_plate_randomizer)
The olink_plate_randomizer function randomly assigns samples to
plate wells, with the option to keep samples from the same individual
on the same plate. Olink® does not recommend forcing balance based on
other clinical variables.
Function arguments
- Manifest: tibble/data frame in long format containing all sample
IDs. The sample ID column should be named SampleID.
- PlateSize: Integer, either 96 or 48. 96 is default and should be
used for Olink® Target 96 and Olink® Explore projects. For Olink® Target
48 projects, use 48.
- SubjectColumn: (Optional) Column name of the subject ID column.
Cannot contain missing values. If provided, subjects are kept on the
same plate.
- iterations: Number of iterations for fitting subjects on the same
plate.
- available.spots: (Optional) Integer. Number of wells available on
each plate. Maximum 40 for Olink® Target 48 and 88 for Olink® Target
96/Explore. Can also take a vector equal to the number of plates to be
used indicating the number of wells available on each plate.
- seed: Seed to set. Setting it is highly recommended for
reproducibility.
olink_plate_randomizer(manifest,
                       SubjectColumn = "SampleID",
                       seed = 111)
Function output
A tibble including SampleID, SubjectID etc. assigned to well
positions.
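The idea of keeping subjects together while randomizing can be sketched in base R. This is not the package's algorithm: assign_plates is a hypothetical helper, the 6-well plate is deliberately tiny, and the greedy "first plate with room" rule is an assumption made for illustration.

```r
# Hypothetical helper: keep each subject on one plate, then shuffle wells
assign_plates <- function(manifest, wells_per_plate, seed) {
  set.seed(seed)                                   # reproducible randomization
  subjects <- sample(unique(manifest$SubjectID))   # random subject order
  free <- integer(0)                               # free wells per opened plate
  manifest$Plate <- NA_integer_
  for (s in subjects) {
    rows <- which(manifest$SubjectID == s)
    stopifnot(length(rows) <= wells_per_plate)
    p <- which(free >= length(rows))[1]            # first plate with room
    if (is.na(p)) {                                # otherwise open a new plate
      free <- c(free, wells_per_plate)
      p <- length(free)
    }
    manifest$Plate[rows] <- p
    free[p] <- free[p] - length(rows)
  }
  # Random well order within each plate
  manifest$Well <- ave(seq_len(nrow(manifest)), manifest$Plate,
                       FUN = function(i) sample(length(i)))
  manifest
}

manifest <- data.frame(
  SampleID  = sprintf("S%02d", 1:12),
  SubjectID = rep(paste0("P", 1:4), each = 3)
)
randomized <- assign_plates(manifest, wells_per_plate = 6, seed = 111)
```

Calling the helper twice with the same seed returns the same layout, which is the reason for setting a seed in the first place.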
Select bridge samples (olink_bridgeselector)
The bridge selection function selects a number of bridge samples
based on the input data. Bridge samples are used to normalize two
dataframes/projects that have been run at different time points and
between which a batch effect is therefore expected. It selects samples
that fulfill certain criteria, including good detectability, passing
quality control and covering a wide range of data points. When possible,
the function recommends 8-16 bridge samples.
Bridge sample selection strategy: Start by choosing samples with at
most 10% missingness (sampleMissingFreq = 0.1), and in case there are
not enough samples to output, increase the threshold to 20%
(sampleMissingFreq = 0.2).
Function arguments
- df: tibble/data frame in long format such as produced by the
read_NPX function.
- sampleMissingFreq: The threshold for sample wise missingness.
- n: Number of bridge samples to be selected.
# Select bridge samples
olink_bridgeselector(df = npx_data1,
                     sampleMissingFreq = 0.1,
                     n = 8)
Function output
Tibble with sample IDs and mean NPX for the pre-defined number of
bridging samples.
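The selection strategy described above (start at 10% missingness, relax to 20% if too few samples qualify) can be sketched as follows. select_bridges and the per-sample summary table are hypothetical, and picking samples evenly across the ordered mean NPX is an assumption about what "covering a wide range" could mean, not the package's actual rule.

```r
# Hypothetical helper: relax the missingness threshold until n samples qualify
select_bridges <- function(sample_stats, n, thresholds = c(0.1, 0.2)) {
  for (thr in thresholds) {
    ok <- sample_stats[sample_stats$MissingFreq <= thr &
                         sample_stats$QC_Warning == "Pass", ]
    if (nrow(ok) >= n) {
      ok <- ok[order(ok$MeanNPX), ]
      # Spread the picks over the range of mean NPX (illustrative choice)
      pick <- round(seq(1, nrow(ok), length.out = n))
      return(ok[pick, c("SampleID", "MeanNPX")])
    }
  }
  stop("Fewer than n eligible samples, even at the most relaxed threshold")
}

# Invented per-sample summary
sample_stats <- data.frame(
  SampleID    = paste0("S", 1:8),
  MissingFreq = c(0.02, 0.05, 0.15, 0.08, 0.30, 0.12, 0.04, 0.09),
  QC_Warning  = c("Pass", "Pass", "Pass", "Warning", "Pass", "Pass", "Pass", "Pass"),
  MeanNPX     = c(4.1, 5.3, 6.0, 4.8, 5.5, 3.2, 6.7, 5.0)
)

bridges <- select_bridges(sample_stats, n = 5)
```

With these numbers only four samples qualify at the 10% threshold, so the five bridges are drawn from the six samples that qualify at 20%.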
Normalizing NPX data (olink_normalization)
The olink_normalization function normalizes NPX values between two
dataframes/projects that have been run at different times. Commonly,
there is a shift in (mean) NPX values between runs while the spread of
the data remains the same, which is why normalization between
dataframes/projects is required. When normalization is performed, one
of the two provided dataframes/projects is used as a reference. When two
dataframes/projects are normalized to one another, Olink® by default
uses the chronologically older one as reference. The function handles
three different types of normalization:
- Bridging normalization: One of the dataframes is
adjusted to another using overlapping samples (bridge samples). The
overlapping samples should have the same IDs between dataframes, and
adjustment is made using the median of the paired differences between
the bridge samples. The two dataframes are provided as the inputs df1
and df2, while the one being used as reference is specified by the
reference_project and the overlapping samples are specified by the
overlapping_samples_df1. Only overlapping_samples_df1 should be provided
regardless of which dataframe is used as reference_project.
- Subset normalization: A subset of samples is used
to normalize two dataframes, one of which is used as a
reference_project. Adjustment is made using the differences of medians
between the sample subsets from the two dataframes. Both
overlapping_samples_df1 and overlapping_samples_df2 should be provided
as input. The sample IDs do not need to overlap. A special case of
subset normalization is where all samples (except control samples and
samples with QC warning) from df1 are used as input in
overlapping_samples_df1 and all samples from df2 are used as input in
overlapping_samples_df2. This is useful if no bridge samples were
included and one can assume that the distribution of the two datasets
should be very similar.
- Reference median normalization: Works only on one
dataframe. This is effectively subset normalization, but using
difference of medians to pre-recorded median values. df1,
overlapping_samples_df1 and reference_medians need to be specified.
Adjustment of df1 is made using the differences in median between the
overlapping samples and the reference medians.
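The first two adjustment rules can be worked through by hand on a toy example. The data are invented, a single assay is used, and the column names Adj_factor and NPX_normalized are chosen for this sketch; olink_normalization itself handles projects, panels and multiple assays.

```r
# Invented data: reference project and the project to be adjusted
ref_df <- data.frame(SampleID = c("A", "B", "C"), OlinkID = "OID00001",
                     NPX = c(5.0, 6.0, 7.0))
new_df <- data.frame(SampleID = c("A", "B", "D"), OlinkID = "OID00001",
                     NPX = c(4.5, 5.4, 3.0))
bridge_ids <- c("A", "B")

# Bridging: median of the paired differences between bridge samples, per assay
paired <- merge(ref_df[ref_df$SampleID %in% bridge_ids, ],
                new_df[new_df$SampleID %in% bridge_ids, ],
                by = c("SampleID", "OlinkID"), suffixes = c("_ref", "_new"))
bridge_adj <- tapply(paired$NPX_ref - paired$NPX_new, paired$OlinkID, median)

# Subset: difference of the medians of the two sample subsets, per assay
subset_adj <- tapply(ref_df$NPX, ref_df$OlinkID, median) -
  tapply(new_df$NPX, new_df$OlinkID, median)

# Apply the bridging adjustment factor to the non-reference project
new_df$Adj_factor <- as.numeric(bridge_adj[new_df$OlinkID])
new_df$NPX_normalized <- new_df$NPX + new_df$Adj_factor
```

The bridging factor is the median of 0.5 and 0.6, while the subset factor compares the medians 6.0 and 4.5. The two rules can therefore give quite different adjustments when the sample sets are not comparable.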
Function arguments
- df1: First dataframe to be used in normalization (required).
- df2: Second dataframe to be used in normalization.
- overlapping_samples_df1: Samples to be used for adjustment factor
calculation in df1 (required).
- overlapping_samples_df2: Samples to be used for adjustment factor
calculation in df2.
- df1_project_nr: Project name of first dataset.
- df2_project_nr: Project name of second dataset.
- reference_project: Project name of the reference project, i.e. the
project to which the other project is adjusted. Needs to be the same as
either df1_project_nr or df2_project_nr.
- reference_medians: Dataframe which needs to contain columns
“OlinkID”, and “Reference_NPX”. Used for reference median
normalization.
library(dplyr)
library(stringr)
# Find overlapping samples
overlap_samples <- intersect(npx_data1$SampleID, npx_data2$SampleID)
# Remove control samples
overlap_samples <- overlap_samples[!str_detect(overlap_samples, 'CONTROL_SAMPLE')]
# Perform bridging normalization
olink_normalization(df1 = npx_data1,
                    df2 = npx_data2,
                    overlapping_samples_df1 = overlap_samples,
                    df1_project_nr = '20200001',
                    df2_project_nr = '20200002',
                    reference_project = '20200001')
# Example of using all samples for normalization
subset_df1 <- npx_data1 %>%
  filter(QC_Warning == 'Pass') %>%
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  pull(SampleID) %>%
  unique()
subset_df2 <- npx_data2 %>%
  filter(QC_Warning == 'Pass') %>%
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  pull(SampleID) %>%
  unique()
olink_normalization(df1 = npx_data1,
                    df2 = npx_data2,
                    overlapping_samples_df1 = subset_df1,
                    overlapping_samples_df2 = subset_df2,
                    df1_project_nr = '20200001',
                    df2_project_nr = '20200002',
                    reference_project = '20200001')
Function output
A tibble of NPX data in long format containing normalized NPX values,
including adjustment factors:
- SampleID: Sample names or IDs.
- Index: Unique number for each SampleID. It is used to make up for
non-unique sample IDs.
- OlinkID: Unique ID for each assay assigned by Olink®. If the
assay is included in more than one panel, it has a different
OlinkID in each one.
- UniProt: UniProt ID.
- Assay: Common gene name for the assay.
- MissingFreq: Missing frequency for the OlinkID, i.e. frequency of
samples with NPX value below limit of detection (LOD).
- Panel: Olink Panel that samples ran on. Read more about Olink Panels
here: https://www.olink.com/products-services/.
- Panel_Version: Version of the panel. A new panel version might
include some different or improved assays.
- PlateID: Name of the plate.
- QC_Warning: Indication whether the sample passed Olink QC. Read more
here: https://www.olink.com/faq/how-is-quality-control-of-the-data-performed/.
- LOD: Limit of detection (LOD) is the minimum level of an individual
protein that can be measured. LOD is defined as 3 times the standard
deviation over background.
- NPX: Normalized Protein eXpression, Olink®’s unit of protein
expression level on a log2 scale. The majority of the
functions of this package use NPX values for calculations. Read more
about NPX here: https://www.olink.com/faq/what-is-npx/.
- Project: Name given from the dataframe of origin.
- Adj_factor: Adjustment factor, i.e. how much was added to or
subtracted from the original NPX value.
Statistical analysis
T-test analysis (olink_ttest)
The olink_ttest function performs a Welch 2-sample t-test or paired
t-test at confidence level 0.95 for every protein (by OlinkID) for a
given grouping variable, using the function t.test from the R
library stats. It corrects for multiple testing with the
Benjamini-Hochberg method (“fdr”) using the function p.adjust
from the R library stats. An assay is flagged as significant when its
adjusted p-value is below 0.05. The resulting t-test table
is arranged by ascending p-values.
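The per-assay workflow described above can be reproduced in base R on toy data. This is a minimal sketch, not the package's implementation: the data and OlinkIDs are invented, and the "Non-significant" label is an assumed counterpart to the 'Significant' label used elsewhere in this document.

```r
# Invented data: two assays, two treatment groups of 10 samples each
pattern <- (0:9 - 4.5) / 10
npx <- data.frame(
  OlinkID   = rep(c("OID00001", "OID00002"), each = 20),
  Treatment = rep(rep(c("Treated", "Untreated"), each = 10), times = 2),
  NPX = c(5 + pattern, 7 + pattern,   # OID00001: groups differ by 2 NPX
          4 + pattern, 4 + pattern)   # OID00002: identical groups
)

# One Welch t-test per assay (t.test uses var.equal = FALSE by default)
res <- do.call(rbind, lapply(split(npx, npx$OlinkID), function(d) {
  tt <- t.test(NPX ~ Treatment, data = d, conf.level = 0.95)
  data.frame(OlinkID   = d$OlinkID[1],
             estimate  = unname(tt$estimate[1] - tt$estimate[2]),
             statistic = unname(tt$statistic),
             p.value   = tt$p.value)
}))
rownames(res) <- NULL

# Benjamini-Hochberg correction across assays, then threshold and sort
res$Adjusted_pval <- p.adjust(res$p.value, method = "fdr")
res$Threshold <- ifelse(res$Adjusted_pval < 0.05, "Significant", "Non-significant")
res <- res[order(res$p.value), ]
```

The correction is applied across assays, so the adjusted p-value of one protein depends on how many proteins were tested alongside it.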
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with 2
levels.
- variable: Character value that should represent a column in the df
to be used as a grouping variable. Needs to have exactly 2 levels.
- pair_id: Character value indicating which column contains the paired
sample identifier. Only used for paired t-tests.
olink_ttest(df = npx_data1,
variable = 'Treatment')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink® ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink® Panel.
- estimate <dbl>: Difference in mean NPX between
groups.
- statistic <dbl>: Value of the t-statistic.
- p.value <dbl>: P-value for the test.
- parameter <dbl>: Degrees of freedom for the
t-statistic.
- conf.low <dbl>: Low bound of the confidence interval
for the difference in means.
- conf.high <dbl>: High bound of the confidence
interval for the difference in means.
- method <chr>: Method that was used.
- alternative <chr>: Description of the alternative
hypothesis.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Mann-Whitney U Test analysis (olink_wilcox)
The olink_wilcox function performs a 2-sample Mann-Whitney U test or,
for paired samples, a Wilcoxon signed-rank test at confidence level
0.95 for every protein (by OlinkID) for a given grouping variable, using
the function wilcox.test from the R library stats. It corrects for
multiple testing with the Benjamini-Hochberg method (“fdr”) using
the function p.adjust from the R library stats. An assay is flagged as
significant when its adjusted p-value is below 0.05. The resulting
Mann-Whitney U table is arranged by ascending p-values.
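The difference between the unpaired and the paired variant is easiest to see by calling wilcox.test directly on a single toy assay. The before/after values below are invented, and every subject increases by a small amount.

```r
# Invented NPX values for six subjects measured twice
before <- c(1.1, 2.3, 3.2, 4.8, 5.5, 6.9)
after  <- c(1.6, 3.0, 3.6, 5.7, 6.1, 7.7)

# Unpaired: Mann-Whitney U (Wilcoxon rank sum) test, ignores the pairing
unpaired <- wilcox.test(before, after)

# Paired: Wilcoxon signed-rank test on the within-subject differences
paired <- wilcox.test(before, after, paired = TRUE)

c(unpaired = unpaired$p.value, paired = paired$p.value)
```

Because the between-subject spread is much larger than the within-subject shift, only the paired test picks up the consistent increase. This is why pair_id matters for repeated measurements.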
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with 2
levels.
- variable: Character value that should represent a column in the df
to be used as a grouping variable. Needs to have exactly 2 levels.
- pair_id: Character value indicating which column contains the paired
sample identifier. Only used for paired Mann-Whitney U tests.
olink_wilcox(df = npx_data1,
variable = 'Treatment')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink® ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink® Panel.
- statistic <dbl>: Value of the Mann-Whitney U
statistic.
- p.value <dbl>: P-value for the test.
- method <chr>: Method that was used.
- alternative <chr>: Description of the alternative
hypothesis.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Analysis of variance (ANOVA) (olink_anova)
The olink_anova function is a wrapper that performs an ANOVA F-test
for each assay using the function Anova from the R library
car and Type III sum of squares. The function handles both
factor and numerical variables, and/or confounding factors.
Samples with missing variable information or factor levels are
excluded from the analysis. Character columns in the input data frame
are converted to factors. The automatic handling of the data from above
is announced by a message if the flag verbose=TRUE.
Crossed/interaction analysis, i.e. A*B formula notation, is inferred
from the variable argument in the following cases:
- c('A', 'B')
- c('A:B')
- c('A:B', 'B') or c('A:B', 'A')
Inference is specified in a message if verbose=TRUE.
For covariates, crossed analyses need to be specified explicitly,
i.e. two main effects will not be expanded with a c('A', 'B') notation.
Main effects present in the variable take precedence. The formula
notation of the final model is specified in a message if
verbose=TRUE.
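The inference rule can be illustrated with a small helper that turns the variable and covariates arguments into a model formula. build_formula is hypothetical and only mimics the behaviour described above; it is not the package's internal code.

```r
# Hypothetical helper mimicking the described formula inference
build_formula <- function(variable, covariates = NULL, outcome = "NPX") {
  # 'A:B' and 'A*B' are split into their main effects, which are then crossed
  mains <- unique(unlist(strsplit(variable, "[:*]")))
  rhs <- paste(mains, collapse = "*")
  # Covariates are appended as given; they are never crossed automatically
  if (!is.null(covariates)) rhs <- paste(c(rhs, covariates), collapse = "+")
  as.formula(paste(outcome, "~", rhs))
}

build_formula(c("A", "B"))                         # NPX ~ A * B
build_formula("A:B")                               # NPX ~ A * B
build_formula(c("A:B", "B"))                       # NPX ~ A * B
build_formula("Site", covariates = c("C", "D"))    # NPX ~ Site + C + D
```

All three variable notations collapse to the same crossed model, whereas two covariates stay additive unless an interaction is written out explicitly.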
Adjusted p-values are calculated using the function p.adjust
from the R library stats with the Benjamini & Hochberg
(1995) method (“fdr”). The threshold is determined by logic evaluation
of Adjusted_pval < 0.05. Covariates are not included in the p-value
adjustment.
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- variable: Single character value or character array. A single
character value should represent a column in the df.
Otherwise, if length > 1, the included variable names will be used in
crossed analyses. It can also accept the notations ':' or '*'.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should represent a column in the df. It can also
accept the notations ':' or '*', while crossed analysis will not be
inferred from main effects.
- return.covariates: Logical. Default: FALSE. Returns F-test results
for the covariates. Note: Adjusted p-values will be NA for
covariates.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# One-way ANOVA, no covariates
anova_results_oneway <- olink_anova(df = npx_data1,
                                    variable = 'Site')
# Two-way ANOVA, no covariates
anova_results_twoway <- olink_anova(df = npx_data1,
                                    variable = c('Site', 'Time'))
# One-way ANOVA, Treatment as covariate
anova_results_covariate <- olink_anova(df = npx_data1,
                                       variable = 'Site',
                                       covariates = 'Treatment')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- df <dbl>: Numerator degrees of freedom.
- sumsq <dbl>: Sum of squares.
- meansq <dbl>: Mean square.
- statistic <dbl>: Value of F-statistic.
- p.value <dbl>: P-value for the test.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
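A one-way version of this output table can be rebuilt in base R to see where each column comes from. The sketch swaps car::Anova for stats::anova, which for a model with a single factor gives the same F-test. The data, OlinkIDs and the "Non-significant" label are assumptions.

```r
# Invented data: two assays measured at three sites, three samples per site
pattern <- c(-0.2, 0, 0.2)
npx <- data.frame(
  OlinkID = rep(c("OID00001", "OID00002"), each = 9),
  Site = factor(rep(rep(c("Site_A", "Site_B", "Site_C"), each = 3), times = 2)),
  NPX = c(1 + pattern, 3 + pattern, 5 + pattern,   # OID00001: site means differ
          4 + pattern, 4 + pattern, 4 + pattern)   # OID00002: no site effect
)

# One linear model and F-test per assay
res <- do.call(rbind, lapply(split(npx, npx$OlinkID), function(d) {
  tab <- anova(lm(NPX ~ Site, data = d))
  data.frame(OlinkID   = d$OlinkID[1],
             term      = "Site",
             df        = tab["Site", "Df"],
             sumsq     = tab["Site", "Sum Sq"],
             meansq    = tab["Site", "Mean Sq"],
             statistic = tab["Site", "F value"],
             p.value   = tab["Site", "Pr(>F)"])
}))
rownames(res) <- NULL

# Benjamini-Hochberg adjustment across assays and significance flag
res$Adjusted_pval <- p.adjust(res$p.value, method = "fdr")
res$Threshold <- ifelse(res$Adjusted_pval < 0.05, "Significant", "Non-significant")
```

For the first assay the between-site sum of squares is 24 on 2 degrees of freedom, so meansq is 12. Dividing by the residual mean square gives the F-statistic.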
Post-hoc ANOVA analysis (olink_anova_posthoc)
olink_anova_posthoc performs a post-hoc ANOVA test using the function
emmeans from the R library emmeans with Tukey p-value
adjustment per assay (by OlinkID) at confidence level 0.95.
The function handles both factor and numerical variables and/or
covariates. The post-hoc test for a numerical variable compares the
difference in means of the outcome variable (default: NPX) for 1
standard deviation (SD) difference in the numerical variable, e.g. mean
NPX at mean (numerical variable) versus mean NPX at mean (numerical
variable) + 1*SD (numerical variable).
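For a numerical variable, the comparison of "mean versus mean + 1 SD" amounts to the fitted slope multiplied by the standard deviation of that variable. A minimal sketch with an invented Age column and a plain lm fit in place of emmeans:

```r
# Invented data: NPX increases roughly linearly with Age
d <- data.frame(Age = 1:10)
d$NPX <- 2 * d$Age + rep(c(-0.1, 0.1), times = 5)

fit <- lm(NPX ~ Age, data = d)

# Predicted NPX at the mean of Age and at mean + 1 SD
at_mean    <- predict(fit, newdata = data.frame(Age = mean(d$Age)))
at_mean_sd <- predict(fit, newdata = data.frame(Age = mean(d$Age) + sd(d$Age)))

estimate <- unname(at_mean_sd - at_mean)
slope_times_sd <- unname(coef(fit)["Age"]) * sd(d$Age)
```

Both quantities agree, so the estimate for a numerical variable can be read as the expected change in the outcome per standard deviation of that variable.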
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- olinkid_list: Character vector of OlinkIDs on which to perform the
post-hoc analysis. If not specified, all assays in df are used.
- variable: Single character value or character array. A single
character value should represent a column in the df.
Otherwise, if length > 1, the included variable names will be used in
crossed analyses. It can also accept the notations ':' or '*'.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should represent a column in the df. It can also
accept the notations ':' or '*', while crossed analysis will not be
inferred from main effects.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- effect: Term on which to perform the post-hoc analysis. Character
vector. Must be a subset of or identical to the variable; no adjustment
is performed.
- mean_return: Logical. If TRUE, returns the mean of each factor level
rather than the difference in means (default). Note that no p-value is
returned for mean_return = TRUE.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# Calculate the p-value for the ANOVA
anova_results_oneway <- olink_anova(df = npx_data1,
                                    variable = 'Site')
# Extract the significant proteins
anova_results_oneway_significant <- anova_results_oneway %>%
  filter(Threshold == 'Significant') %>%
  pull(OlinkID)
anova_posthoc_oneway_results <- olink_anova_posthoc(df = npx_data1,
                                                    olinkid_list = anova_results_oneway_significant,
                                                    variable = 'Site',
                                                    effect = 'Site')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- contrast <chr>: Variables (in term) that are
compared.
- estimate <dbl>: Difference in mean NPX between
variables (from contrast).
- conf.low <dbl>: Low bound of the confidence interval
for the difference in means.
- conf.high <dbl>: High bound of the confidence
interval for the difference in means.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Tukey).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
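The shape of this table can be reproduced for a single assay with base R. The sketch uses TukeyHSD on an aov fit instead of emmeans, which is a swapped-in technique for illustration. The data and the "Non-significant" label are assumptions.

```r
# Invented data for one assay: three sites, three samples each
pattern <- c(-0.2, 0, 0.2)
d <- data.frame(
  Site = factor(rep(c("Site_A", "Site_B", "Site_C"), each = 3)),
  NPX  = c(1 + pattern, 3 + pattern, 5 + pattern)
)

# Pairwise comparisons of site means with Tukey adjustment
tk <- TukeyHSD(aov(NPX ~ Site, data = d), conf.level = 0.95)$Site

posthoc <- data.frame(
  term          = "Site",
  contrast      = rownames(tk),
  estimate      = unname(tk[, "diff"]),
  conf.low      = unname(tk[, "lwr"]),
  conf.high     = unname(tk[, "upr"]),
  Adjusted_pval = unname(tk[, "p adj"])
)
posthoc$Threshold <- ifelse(posthoc$Adjusted_pval < 0.05,
                            "Significant", "Non-significant")
```

Each row is one contrast between two factor levels, with the estimate being the difference in mean NPX and the interval referring to that difference.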
One-way non-parametric test (olink_one_non_parametric)
The olink_one_non_parametric function is a wrapper that performs
either a Kruskal-Wallis test or a Friedman test for each assay using the
function kruskal.test from the R library stats or the
function friedman_test from the R library rstatix and
a post-hoc test using the function wilcox_test from the R
library rstatix. The function handles both factor and numerical
variables, and/or confounding factors.
Samples with missing variable information or factor levels are
excluded from the analysis. Character columns in the input data frame
are converted to factors. The automatic handling of the data from above
is announced by a message if the flag verbose=TRUE.
Adjusted p-values are calculated using the function
wilcox_test from the R library rstatix with the
Benjamini & Hochberg (1995) method (“fdr”). The threshold is
determined by logic evaluation of Adjusted_pval < 0.05.
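The choice between the two tests can be sketched with the base R equivalents kruskal.test and friedman.test (the latter stands in for rstatix::friedman_test here). run_test is a hypothetical helper and the data are invented: four subjects measured at three time points.

```r
# Invented data: every subject increases from T1 to T3
d <- expand.grid(Subject = paste0("P", 1:4),
                 Time = c("T1", "T2", "T3"),
                 stringsAsFactors = TRUE)
d$NPX <- as.integer(d$Time) + (as.integer(d$Subject) - 1) / 10

# Hypothetical helper mirroring the 'dependence' switch
run_test <- function(data, dependence = FALSE) {
  if (dependence) {
    # Dependent groups: Friedman test, blocking on Subject
    friedman.test(NPX ~ Time | Subject, data = data)
  } else {
    # Independent groups: Kruskal-Wallis test
    kruskal.test(NPX ~ Time, data = data)
  }
}

kw <- run_test(d)
fr <- run_test(d, dependence = TRUE)
```

The Friedman test ranks within each subject, so it needs exactly one observation per subject and time point. The Kruskal-Wallis test ranks all observations together.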
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- variable: Single character value or character array. In case of
single character then that should represent a column in the df.
- dependence: Logical. Default: FALSE. When the groups are
independent, the Kruskal-Wallis test is run; when the groups are
dependent, the Friedman test is run.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# One-way Kruskal-Wallis Test
kruskal_results <- olink_one_non_parametric(df = npx_df,
variable = "Time")
# One-way Friedman Test
friedman_results <- olink_one_non_parametric(df = npx_df,
variable = "Time",
dependence = TRUE)
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation.
- df <dbl>: Numerator of degrees of freedom.
- method <chr>: Name of the performed test.
- statistic <dbl>: Value of the test’s statistic.
- p.value <dbl>: P-value for the test.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Post-hoc one-way non-parametric analysis (olink_one_non_parametric_posthoc)
olink_one_non_parametric_posthoc performs a post-hoc Wilcoxon test
using the function wilcox_test from the R library
rstatix with Benjamini & Hochberg p-value adjustment per
assay (by OlinkID) at confidence level 0.95. The function handles both
factor and numerical variables and/or covariates.
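Pairwise Wilcoxon comparisons with Benjamini & Hochberg adjustment are also available in base R through pairwise.wilcox.test, which is used here in place of rstatix::wilcox_test. The single-assay data are invented.

```r
# Invented data for one assay: three time points, four samples each
d <- data.frame(
  Time = factor(rep(c("T1", "T2", "T3"), each = 4)),
  NPX  = c(1.0, 1.1, 1.2, 1.3,
           2.0, 2.1, 2.2, 2.3,
           3.0, 3.1, 3.2, 3.3)
)

# All pairwise Wilcoxon tests, p-values adjusted with Benjamini & Hochberg
pw <- pairwise.wilcox.test(d$NPX, d$Time, p.adjust.method = "BH")

# Lower-triangular matrix of adjusted p-values, one cell per contrast
pw$p.value
```

With only four samples per group, even completely separated groups cannot reach a p-value below 2/70. Small groups therefore limit what a rank-based post-hoc test can detect.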
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- olinkid_list: Character vector of OlinkIDs on which to perform the
post-hoc analysis. If not specified, all assays in df are used.
- variable: Single character value or character array. A single
character value should represent a column in the df.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# Friedman test
friedman_results <- olink_one_non_parametric(df = npx_df,
                                             variable = "Time",
                                             dependence = TRUE)
# Keep the assays with a significant result
significant_assays <- friedman_results %>%
  filter(Threshold == 'Significant') %>%
  dplyr::select(OlinkID) %>%
  distinct() %>%
  pull()
# Post-hoc test for the results from the Friedman test
friedman_posthoc_results <- olink_one_non_parametric_posthoc(npx_df,
                                                             variable = "Time",
                                                             olinkid_list = significant_assays)
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation.
- contrast <chr>: Variables (in term) that are
compared.
- estimate <dbl>: Difference in mean NPX between
variables (from contrast).
- conf.low <dbl>: Low bound of the confidence interval
for the location parameter.
- conf.high <dbl>: High bound of the confidence
interval for the location parameter.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Regression models for ordinal data (olink_ordinalRegression)
The olink_ordinalRegression is a wrapper function that performs an
ANOVA F-test for each assay (ordinal transformed) using the function
Anova from the R library car and Type II sum of
squares. The function handles both factor and numerical variables,
and/or confounding factors.
Samples with missing variable information or factor levels are
excluded from the analysis. Character columns in the input data frame
are converted to factors. The automatic handling of the data from above
is announced by a message if the flag verbose=TRUE.
Crossed/interaction analysis, i.e. A*B formula notation, is inferred
from the variable argument in the following cases:
- c('A', 'B')
- c('A:B')
- c('A:B', 'B') or c('A:B', 'A')
Inference is specified in a message if verbose=TRUE.
For covariates, crossed analyses need to be specified explicitly,
i.e. two main effects will not be expanded with a c('A', 'B') notation.
Main effects present in the variable take precedence. The formula
notation of the final model is specified in a message if
verbose=TRUE.
Adjusted p-values are calculated using the function p.adjust
from the R library stats with the Benjamini & Hochberg
(1995) method (“fdr”). The threshold is determined by logic evaluation
of Adjusted_pval < 0.05. Covariates are not included in the p-value
adjustment.
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- variable: Single character value or character array. A single
character value should represent a column in the df.
Otherwise, if length > 1, the included variable names will be used in
crossed analyses. It can also accept the notations ':' or '*'.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should represent a column in the df. It can also
accept the notations ':' or '*', while crossed analysis will not be
inferred from main effects.
- return.covariates: Logical. Default: FALSE. Returns F-test results
for the covariates. Note: Adjusted p-values will be NA for
covariates.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# Two-way ordinal regression, no covariates
ordinalRegression_results_twoway <- olink_ordinalRegression(df = npx_data1,
                                                            variable = c('Site', 'Time'))
# One-way ordinal regression, Treatment as covariate
ordinalRegression_oneway <- olink_ordinalRegression(df = npx_data1,
                                                    variable = 'Site',
                                                    covariates = 'Treatment')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- df <dbl>: Numerator of degrees of freedom.
- statistic <dbl>: Value of F-statistic.
- p.value <dbl>: P-value for the test.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Post-hoc of regression models for ordinal data analysis (olink_ordinalRegression_posthoc)
olink_ordinalRegression_posthoc performs a post-hoc ANOVA test using
the function emmeans from the R library emmeans with
Tukey p-value adjustment per assay (by OlinkID) at confidence level
0.95. The function handles both factor and numerical variables and/or
covariates.
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt, Panel and an outcome factor with at
least 3 levels.
- olinkid_list: Character vector of OlinkIDs on which to perform the
post-hoc analysis. If not specified, all assays in df are used.
- variable: Single character value or character array. A single
character value should represent a column in the df.
Otherwise, if length > 1, the included variable names will be used in
crossed analyses. It can also accept the notations ':' or '*'.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should represent a column in the df. It can also
accept the notations ':' or '*', while crossed analysis will not be
inferred from main effects.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- effect: Term on which to perform the post-hoc analysis. Character
vector. Must be a subset of or identical to the variable; no adjustment
is performed.
- mean_return: Logical. If TRUE, returns the mean of each factor level
rather than the difference in means (default). Note that no p-value is
returned for mean_return = TRUE.
- verbose: Logical. Default: TRUE. Whether information about removed
samples, factor conversion and final model formula is printed to
the console.
# Two-way ordinal regression
ordinalRegression_results <- olink_ordinalRegression(df = npx_data1,
                                                     variable = "Treatment:Time")
# Extract the significant proteins
significant_assays <- ordinalRegression_results %>%
  filter(Threshold == 'Significant' & term == 'Treatment:Time') %>%
  select(OlinkID) %>%
  distinct() %>%
  pull()
# Post-hoc test for the model NPX ~ Treatment*Time, with Site as covariate
ordinalRegression_posthoc_results <- olink_ordinalRegression_posthoc(npx_data1,
                                                                     variable = c("Treatment:Time"),
                                                                     covariates = "Site",
                                                                     olinkid_list = significant_assays,
                                                                     effect = "Treatment:Time")
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- contrast <chr>: Variables (in term) that are
compared.
- estimate <dbl>: Difference in mean NPX between
variables (from contrast).
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Tukey).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Linear mixed effects model analysis (olink_lmer)
The olink_lmer fits a linear mixed effects model for every protein
(by OlinkID) in every panel, using the function lmer from the R
library lmerTest and the function anova from the R
library stats. The function handles both factor and numerical
variables and/or covariates.
Samples with missing variable information or factor levels are
excluded from the analysis. Character columns in the input data frame
are converted to factors. The automatic handling of the data from above
is announced by a message if the flag verbose=TRUE.
Crossed/interaction analysis, i.e. A*B formula notation, is inferred
from the variable argument in the following cases:
- c('A','B')
- c('A:B')
- c('A:B', 'B') or c('A:B', 'A')
The inferred formula is reported in a message if verbose=TRUE.
For covariates, crossed analyses need to be specified explicitly,
i.e. two main effects given as c('A','B') will not be expanded into an
interaction. Main effects present in the variable take precedence. The
formula notation of the final model is reported in a message if
verbose=TRUE.
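As a rough illustration of the inference rules above, the sketch below builds a model formula from variable, covariates and random. The helper sketch_lmer_formula is hypothetical and is not part of Olink® Analyze; the random-intercept form (1|Subject) is an assumption, and the package's internal implementation may differ.

```r
# Hypothetical helper, not part of OlinkAnalyze: mimics the documented rules.
sketch_lmer_formula <- function(variable, random, covariates = NULL,
                                outcome = "NPX") {
  # Split "A:B" or "A*B" into main effects and drop duplicates, so that
  # c("A", "B"), "A:B" and c("A:B", "B") all yield the crossed term A*B.
  main_effects <- unique(unlist(strsplit(variable, "[:*]")))
  fixed <- paste(main_effects, collapse = "*")
  # Covariates are appended as given; their main effects are not crossed.
  if (!is.null(covariates)) {
    fixed <- paste(c(fixed, covariates), collapse = " + ")
  }
  # Assumed random-intercept term for each random effect, e.g. (1|Subject).
  random_terms <- paste0("(1|", random, ")", collapse = " + ")
  as.formula(paste(outcome, "~", fixed, "+", random_terms))
}

f1 <- sketch_lmer_formula(c("Treatment", "Time"), random = "Subject")
f2 <- sketch_lmer_formula("Treatment:Time", random = "Subject")
f3 <- sketch_lmer_formula(c("Treatment:Time", "Time"), random = "Subject",
                          covariates = c("Site", "Age"))
cat(deparse(f1), "\n")
cat(deparse(f3), "\n")
```

Here f1 and f2 describe the same crossed model, while the two covariates in f3 enter only as additive terms.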
Adjusted p-values are calculated using the function p.adjust
from the R library stats with the Benjamini & Hochberg
(1995) method (“fdr”). The Threshold column is determined by the logical
evaluation of Adjusted_pval < 0.05. Covariates are not included in the
p-value adjustment.
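The adjustment step can be reproduced on its own with base R. The p-values and assay names below are made up for illustration, and the label "Non-significant" is an assumption about the wording of the non-significant category.

```r
# Illustrative raw p-values, one per assay (made-up numbers and names).
p_values <- c(OID_A = 0.001, OID_B = 0.012, OID_C = 0.040,
              OID_D = 0.200, OID_E = 0.900)

# Benjamini & Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust.
adjusted <- p.adjust(p_values, method = "fdr")

# Threshold is a logical evaluation of the adjusted p-value.
threshold <- ifelse(adjusted < 0.05, "Significant", "Non-significant")

data.frame(p.value = p_values, Adjusted_pval = adjusted,
           Threshold = threshold)
```

Note that OID_C has a raw p-value below 0.05 but is no longer significant after adjustment.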
Function arguments
- df: NPX data frame in long format. It should minimally contain protein
name (Assay), OlinkID, UniProt, Panel, 1-2 variables with at least 2
levels, and subject ID.
- variable: Single character value or character array. A single
character value should name a column in df. If length > 1, the included
variable names will be used in crossed analyses. The notations ':' and
'*' are also accepted.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- random: Single character value or character array with random
effects.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should name a column in df. The notations ':' and '*'
are also accepted, but crossed analysis will not be inferred from main
effects.
- return.covariates: Logical. Default: False. Returns F-test results
for the covariates. Note: Adjusted p-values will be NA for
covariates.
- verbose: Logical. Default: True. Whether information about removed
samples, factor conversion and the final model formula is printed to
the console.
# Linear mixed model with one variable.
lmer_results_oneway <- olink_lmer(df = npx_data1,
                                  variable = 'Site',
                                  random = 'Subject')
# Linear mixed model with two variables.
lmer_results_twoway <- olink_lmer(df = npx_data1,
                                  variable = c('Site', 'Treatment'),
                                  random = 'Subject')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- sumsq <dbl>: Sum of squares.
- meansq <dbl>: Mean square.
- NumDF <dbl>: Numerator degrees of freedom.
- DenDF <dbl>: Denominator degrees of freedom.
- statistic <dbl>: Value of F-statistic.
- p.value <dbl>: P-value for the test.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Post-hoc linear mixed effects model analysis
(olink_lmer_posthoc)
The olink_lmer_posthoc function is similar to olink_lmer but performs
a post-hoc analysis based on a linear mixed effects model using
the function lmer from the R library lmerTest and the
function emmeans from the R library emmeans. The
function handles both factor and numerical variables and/or covariates.
Differences in estimated marginal means are calculated for all pairwise
levels of a given output variable. Degrees of freedom are estimated
using Satterthwaite’s approximation. The post-hoc test for a numerical
variable compares the difference in means of the outcome variable
(default: NPX) for 1 standard deviation difference in the numerical
variable, e.g. mean NPX at mean(numerical variable) versus mean NPX at
mean(numerical variable) + 1*SD(numerical variable). The output tibble
is arranged by ascending adjusted p-values.
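The comparison for a numerical variable can be illustrated with a minimal sketch. It uses a plain linear model (lm) on simulated data instead of a mixed model, so it only demonstrates the "mean versus mean + 1 SD" idea and not the package's actual computation. The variable names and the sign convention of the difference are assumptions.

```r
set.seed(1)
# Simulated data (made up): NPX depends linearly on a numerical variable.
age <- rnorm(50, mean = 60, sd = 10)
npx <- 2 + 0.05 * age + rnorm(50, sd = 0.3)
fit <- lm(npx ~ age)

# Predicted mean NPX at mean(age) and at mean(age) + 1 * SD(age).
at_mean    <- predict(fit, newdata = data.frame(age = mean(age)))
at_mean_sd <- predict(fit, newdata = data.frame(age = mean(age) + sd(age)))
estimate <- unname(at_mean_sd - at_mean)

# For a linear term this difference equals slope * SD.
slope_times_sd <- unname(coef(fit)["age"] * sd(age))
c(estimate = estimate, slope_times_sd = slope_times_sd)
```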
Function arguments
- df: NPX data frame in long format. It should minimally contain protein
name (Assay), OlinkID, UniProt, Panel, 1-2 variables with at least 2
levels, and subject ID.
- variable: Single character value or character array. A single
character value should name a column in df. If length > 1, the included
variable names will be used in crossed analyses. The notations ':' and
'*' are also accepted.
- olinkid_list: Character vector of OlinkIDs on which to perform the
post-hoc analysis. If not specified, all assays in df are used.
- effect: Term on which to perform the post-hoc analysis. Character
vector. Must be a subset of or identical to variable.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- random: Single character value or character array with random
effects.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should name a column in df. The notations ':' and '*'
are also accepted, but crossed analysis will not be inferred from main
effects.
- mean_return: Logical. If TRUE, returns the mean of each factor level
rather than the difference in means, which is the default behavior. Note
that no p-value is returned for mean_return = TRUE and no adjustment is
performed.
- verbose: Logical. Default: True. Whether information about removed
samples, factor conversion and the final model formula is printed to
the console.
# Linear mixed model with two variables.
lmer_results_twoway <- olink_lmer(df = npx_data1,
                                  variable = c('Site', 'Treatment'),
                                  random = 'Subject')
# Extract the significant proteins
lmer_results_twoway_significant <- lmer_results_twoway %>%
  filter(Threshold == 'Significant', term == 'Treatment') %>%
  pull(OlinkID)
# Perform the post-hoc analysis
lmer_posthoc_twoway_results <- olink_lmer_posthoc(df = npx_data1,
  olinkid_list = lmer_results_twoway_significant,
  variable = c('Site', 'Treatment'),
  random = 'Subject',
  effect = 'Treatment')
Function output
A tibble with the following columns:
- Assay <chr>: Assay name.
- OlinkID <chr>: Unique Olink ID.
- UniProt <chr>: UniProt ID.
- Panel <chr>: Olink Panel.
- term <chr>: Name of the variable that was used for
the p-value calculation. The “:” between variables indicates interaction
between variables.
- contrast <chr>: Variables (in term) that are
compared.
- estimate <dbl>: Difference in mean NPX between
variables (from contrast).
- conf.low <dbl>: Lower bound of the confidence interval
for the mean.
- conf.high <dbl>: Upper bound of the confidence
interval for the mean.
- Adjusted_pval <dbl>: Adjusted p-value for the test
(Benjamini & Hochberg).
- Threshold <chr>: Text indication if assay is
significant (adjusted p-value < 0.05).
Pathway Enrichment (olink_pathway_enrichment)
The olink_pathway_enrichment function can be used to perform Gene Set
Enrichment Analysis (GSEA) or Over-representation Analysis (ORA) using
MSigDB, Reactome, KEGG, or GO. MSigDB includes curated gene sets (C2)
and ontology gene sets (C5), which encompass Reactome, KEGG, and GO.
This function performs enrichment using the GSEA or
enricher functions from clusterProfiler from Bioconductor. The
function uses the estimate from a previous statistical analysis for one
contrast for all proteins. MSigDB is subset to the chosen database if
ontology is KEGG, GO, or Reactome. test_results must contain estimates
for all assays. Post-hoc results can be used but should be filtered for
one contrast to improve interpretability.
Alternative statistical results can be used as input as long as they
include the columns “OlinkID”, “Assay”, and “estimate”. A column named
“Adjusted_pval” is also needed for ORA. Any statistical results that
contain one estimate per protein will work as long as the estimates are
comparable to each other.
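A minimal sketch of what such an alternative input could look like is given below. All IDs and values are made up. The derived objects (a ranked list of estimates for GSEA, and a set of significant proteins for ORA using a 0.05 cut-off) are assumptions about how such results are typically prepared for enrichment tools, not a description of the package internals.

```r
# Hypothetical statistical results with the required columns.
test_results <- data.frame(
  OlinkID = c("OID_1", "OID_2", "OID_3", "OID_4"),  # made-up IDs
  Assay = c("IL6", "TNF", "CXCL8", "CCL2"),
  estimate = c(0.8, -1.1, 0.2, 1.5),
  Adjusted_pval = c(0.01, 0.003, 0.6, 0.04),        # only needed for ORA
  stringsAsFactors = FALSE
)

# Check that the required columns are present.
required <- c("OlinkID", "Assay", "estimate")
stopifnot(all(required %in% names(test_results)))

# GSEA-style input: one estimate per protein, ranked in decreasing order.
ranked <- sort(setNames(test_results$estimate, test_results$Assay),
               decreasing = TRUE)

# ORA-style input: proteins passing an assumed 0.05 cut-off.
ora_set <- test_results$Assay[test_results$Adjusted_pval < 0.05]

ranked
ora_set
```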
clusterProfiler was originally developed by Guangchuang Yu at the
School of Basic Medical Sciences at Southern Medical University.
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L
Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141. doi: 10.1016/j.xinn.2021.100141
Function arguments
- data: NPX data frame in long format with columns Assay, OlinkID,
UniProt, SampleID, QC_Warning, NPX, and LOD.
- test_results: A data frame of statistical test results including
Adjusted_pval and estimate columns.
- method: String of method name. Must be either “GSEA” (default) or
“ORA”.
- ontology: String of database to query. Must be one of “MSigDb”,
“KEGG”, “GO”, or “Reactome”.
- organism: String of name of organism. Must be either “human” or
“mouse”.
# Exclude control samples before running the t-test
npx_df <- npx_data1 %>% filter(!grepl("control", SampleID, ignore.case = TRUE))
ttest_results <- olink_ttest(
  df = npx_df,
  variable = "Treatment",
  alternative = "two.sided")
try({ # This expression might fail if dependencies are not installed
  gsea_results <- olink_pathway_enrichment(data = npx_data1, test_results = ttest_results)
  ora_results <- olink_pathway_enrichment(
    data = npx_data1,
    test_results = ttest_results, method = "ORA")
}, silent = TRUE)
Function output
A data frame of enrichment results. Columns for ORA include:
- ID <chr>: Pathway ID from MSigDB
- Description <chr>: Description of Pathway from
MSigDB
- GeneRatio <chr>: ratio of input proteins that are
annotated in a term
- BgRatio <chr>: ratio of all genes that are annotated
in this term
- pvalue <dbl>: p-value of enrichment
- p.adjust <dbl>: Adjusted p-value
(Benjamini-Hochberg)
- qvalue <dbl>: false discovery rate, the estimated
probability that the enrichment of the term represents a false
positive finding
- geneID <chr>: list of input proteins (Gene Symbols)
annotated in a term delimited by “/”
- Count <dbl>: Number of input proteins that are
annotated in a term
Columns for GSEA:
- ID <chr>: Pathway ID from MSigDB
- Description <chr>: Description of Pathway from
MSigDB
- setSize <dbl>: number of input proteins that are
annotated in a term
- enrichmentScore <dbl>: Enrichment score, degree to
which a gene set is over-represented at the top or bottom of the ranked
list of genes
- NES <dbl>: Normalized Enrichment Score, normalized to
account for differences in gene set size and in correlations between
gene sets and expression data sets. NES can be used to compare analysis
results across gene sets.
- pvalue <dbl>: p-value of enrichment
- p.adjust <dbl>: Adjusted p-value
(Benjamini-Hochberg)
- qvalue <dbl>: false discovery rate, the estimated
probability that the normalized enrichment score represents a false
positive finding
- rank <dbl>: the position in the ranked list where the
maximum enrichment score occurred
- leading_edge <chr>: contains tags, list, and signal.
Tags gives an indication of the percentage of genes contributing to the
enrichment score. List gives an indication of where in the list the
enrichment score is obtained. Signal represents the enrichment signal
strength and combines the tag and list.
- core_enrichment <chr>: list of input proteins (Gene
Symbols) annotated in a term delimited by “/”
Visualization
Boxplots for outcomes (olink_boxplot)
The olink_boxplot function is used to generate boxplots of NPX values
stratified on a variable for a given list of proteins. olink_boxplot
uses the functions ggplot and geom_boxplot of the R
library ggplot2.
Function arguments
- df: NPX data frame in long format should minimally contain protein
name (Assay), OlinkID, UniProt and a grouping variable.
- variable: Single character value indicating the column name to use
as a grouping variable in the x axis.
- olinkid_list: Character vector of OlinkIDs that should be used for
the boxplot. If not specified, all assays in df are used.
- posthoc_results: Data frame from ANOVA post-hoc analysis. This data
frame needs to be generated using the olink_anova_posthoc()
function.
- ttest_results: Data frame from t-test analysis. This data frame needs
to be generated using the olink_ttest() function.
- verbose: Logical. Default: False. Flag indicating whether plots are
printed in addition to being returned as a list.
- number_of_proteins_per_plot: Number of boxplots to include in the
facets plot. Default 6.
plot <- npx_data1 %>%
  na.omit() %>% # remove missing values, which exist for Site
  olink_boxplot(variable = "Site",
                olinkid_list = c("OID00488", "OID01276"),
                number_of_proteins_per_plot = 2)
plot[[1]]
anova_posthoc_results <- npx_data1 %>%
  olink_anova_posthoc(olinkid_list = c("OID00488", "OID01276"),
                      variable = 'Site',
                      effect = 'Site')
plot2 <- npx_data1 %>%
  na.omit() %>% # remove missing values, which exist for Site
  olink_boxplot(variable = "Site",
                olinkid_list = c("OID00488", "OID01276"),
                number_of_proteins_per_plot = 2,
                posthoc_results = anova_posthoc_results)
plot2[[1]]
Function output
A list of objects of class ggplot.
Note: Plots will not appear in the Plots panel of RStudio unless the
result is assigned to a variable and then printed (see sample code
above).
Boxplots for QC (olink_dist_plot)
The olink_dist_plot function generates boxplots of NPX values for
each sample, faceted by Olink panel. This is used as an initial QC step
to identify potential outliers. olink_dist_plot uses the functions
ggplot and geom_boxplot of the R library
ggplot2.
Function arguments
- df: NPX data frame in long format should minimally contain SampleID,
NPX and Panel.
- color_g: Character value indicating the column name that should be
used as fill color. Default: QC_Warning.
npx_data1 %>%
  filter(Panel == 'Olink Cardiometabolic') %>% # For this example only plotting one panel.
  olink_dist_plot() +
  theme(axis.text.x = element_blank()) # With many samples, the x-axis text can be removed or rotated
Function output
A ggplot object.
Point-range plot for LMER (olink_lmer_plot)
The function olink_lmer_plot generates a point-range plot for a given
list of proteins based on a linear mixed effects model. The points
illustrate the mean NPX level for each group and the error bars
illustrate 95% confidence intervals. Facets are labeled by the protein
name and corresponding OlinkID for the protein.
Function arguments
- df: NPX data frame in long format. It should minimally contain protein
name (Assay), OlinkID, UniProt, Panel, 1-2 variables with at least 2
levels, and subject ID.
- variable: Single character value or character array. A single
character value should name a column in df. If length > 1, the included
variable names will be used in crossed analyses. The notations ':' and
'*' are also accepted.
- outcome: Name of the column from df that contains the dependent
variable. Default: NPX.
- random: Single character value or character array with random
effects.
- covariates: Single character value or character array. Default:
NULL. Confounding factors to include in the analysis. A single
character value should name a column in df. The notations ':' and '*'
are also accepted, but crossed analysis will not be inferred from main
effects.
- x_axis_variable: Character. Which main effect to use as x-axis in
the plot.
- col_variable: Character. If provided, the interaction effect
col_variable:x_axis_variable will be plotted with x_axis_variable on the
x-axis and col_variable as color.
- number_of_proteins_per_plot: Number of plots to include in the list
of point-range plots. Defaults to 6 plots per figure.
- verbose: Logical. Default: True. Whether information about removed
samples, factor conversion and the final model formula is printed to
the console.
plot <- olink_lmer_plot(df = npx_data1,
                        olinkid_list = c("OID01216", "OID01217"),
                        variable = c('Site', 'Treatment'),
                        x_axis_variable = 'Site',
                        col_variable = 'Treatment',
                        random = 'Subject')
plot[[1]]
Function output
A list of objects of class ggplot.
Note: Plots will not appear in the Plots panel of RStudio unless the
result is assigned to a variable and then printed (see sample code
above).
Principal components analysis (PCA) plot (olink_pca_plot)
The olink_pca_plot function generates a PCA projection of all samples
from NPX data along two principal components (default: PC2 vs. PC1),
colored by the variable given in color_g (default: QC_Warning) and
including the percentage of explained variance. The function uses the
functions prcomp and ggplot from the R libraries stats and ggplot2,
respectively.
By default, the values are scaled and centered in the PCA, and proteins
with missing NPX values are removed from the corresponding assay(s).
Unique sample names are required. Imputation by the median is done for
assays with missingness <10% for multi-plate projects and <5% for
single plate projects. The plot is printed, and a list of
ggplot objects is returned.
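The preprocessing described above can be sketched with base R on a toy matrix. The data are simulated, the 10% cut-off is the multi-plate value from the text, and dropping assays at or above the cut-off is an assumption made only to keep the example complete.

```r
set.seed(42)
# Toy wide matrix (made-up values): 20 samples x 5 assays.
npx <- matrix(rnorm(100, mean = 5), nrow = 20,
              dimnames = list(paste0("S", 1:20), paste0("OID", 1:5)))
npx["S3", "OID2"] <- NA  # 1 of 20 values missing, i.e. 5% missingness

# Keep assays below the multi-plate cut-off (<10% missing).
missingness <- colMeans(is.na(npx))
npx_kept <- npx[, missingness < 0.10, drop = FALSE]

# Replace each remaining missing value with the median of its assay.
imputed <- apply(npx_kept, 2, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})

# Centered and scaled PCA, then the share of variance per component.
pca <- prcomp(imputed, center = TRUE, scale. = TRUE)
explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * explained[1:2], 1)  # percentages for the two plotted PCs
```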
If byPanel = TRUE, the data processing (imputation of missing values
etc) and subsequent PCA is performed separately per panel. A faceted
plot is printed, while the individual ggplot objects are returned.
The arguments outlierDefX and outlierDefY can be used to identify
outliers in the PCA. Samples more than +/-outlierDef[X,Y] standard
deviations from the mean of the plotted PC will be labelled. Both
arguments have to be specified.
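The outlier rule can be illustrated on simulated PC scores. This is a sketch under the assumption that a sample is flagged when it exceeds the limit on either axis; the column names PCX and PCY and the data are made up.

```r
set.seed(7)
# Made-up PC scores for 30 samples; the last one is extreme on the x-axis.
scores <- data.frame(SampleID = paste0("S", 1:30),
                     PCX = c(rnorm(29), 8),
                     PCY = rnorm(30),
                     stringsAsFactors = FALSE)
outlierDefX <- 2.5
outlierDefY <- 4

# TRUE when a value lies more than k standard deviations from the mean.
beyond <- function(v, k) abs(v - mean(v)) > k * sd(v)

scores$Outlier <- as.integer(beyond(scores$PCX, outlierDefX) |
                             beyond(scores$PCY, outlierDefY))
scores[scores$Outlier == 1, ]
```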
Function arguments (selection)
- df: NPX data frame in long format. It should minimally contain
SampleID, NPX and the column that will be used for grouping/coloring.
- color_g: Character value indicating the column name that should be
used as fill color. Default QC_Warning.
- x_val: Integer indicating which principal component to plot along
the x-axis. Default 1.
- y_val: Integer indicating which principal component to plot along
the y-axis. Default 2.
- label_samples: Logical. If TRUE, points are replaced with SampleID.
Default FALSE.
- drop_assays: Logical. All assays with any missing values will be
dropped. Takes precedence over sample drop.
- drop_samples: Logical. All samples with any missing values will be
dropped.
- n_loadings: Integer. Plot the top n_loadings ranked by size.
- loadings_list: Character vector indicating for which OlinkIDs to
plot loadings. Arguments n_loadings and loadings_list can be used
together.
- byPanel: Logical. Perform the PCA per panel (default FALSE).
- outlierDefX: (Optional) The number of standard deviations along the
PC plotted on the x-axis that defines an outlier.
- outlierDefY: (Optional) The number of standard deviations along the
PC plotted on the y-axis that defines an outlier.
- outlierLines: Logical. Draw dashed lines at +/-outlierDef[X,Y]
standard deviations from the mean of the plotted PCs (default
FALSE).
- verbose: Logical. Default: True. If information about removed
samples, factor conversion and final model formula is to be printed to
the console.
- quiet: Logical. Default: False. If TRUE, the resulting plot is not
printed.
npx_data1 %>%
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE')) %>%
  olink_pca_plot(df = .,
                 color_g = "QC_Warning", byPanel = TRUE)
Function output
A list of objects of class ggplot (silently returned). Plots
are also printed unless quiet = TRUE is set. If outlierDefX and
outlierDefY are specified, a list of outliers can be extracted from the
ggplot object based on these parameters.
# Make sample names unique by appending the Index
npx_data <- npx_data1 %>%
  mutate(SampleID = paste(SampleID, "_", Index, sep = ""))
g <- olink_pca_plot(df = npx_data, color_g = "QC_Warning",
                    outlierDefX = 2.5, outlierDefY = 4, byPanel = TRUE, quiet = TRUE)
# Collect the plotted data from each panel and keep the flagged samples
lapply(g, function(x){x$data}) %>%
  bind_rows() %>%
  filter(Outlier == 1) %>%
  select(SampleID, Outlier, Panel)
#> SampleID Outlier Panel
#> 1 B22_103 1 Cardiometabolic
#> 2 B68_149 1 Cardiometabolic
#> 3 B9_88 1 Cardiometabolic
#> 4 A28_30 1 Inflammation
#> 5 A57_59 1 Inflammation
Heatmap for visualizing pathway enrichment
(olink_pathway_heatmap)
The olink_pathway_heatmap function generates a heatmap of proteins
related to pathways using the enrichment results from the
olink_pathway_enrichment function. Either the top terms can be
visualized or terms containing a certain keyword. For each term, the
proteins in the test_result data frame that are related to that term
will be visualized by their estimate. This visualization can be used to
determine how many proteins of interest are involved in a particular
pathway and in which direction their estimates point.
Function arguments
- enrich_results: data frame of enrichment results from
olink_pathway_enrichment()
- test_results: filtered results from statistical test with Assay,
OlinkID, and estimate columns
- method: method used in olink_pathway_enrichment (“GSEA” (default) or
“ORA”)
- keyword: (optional) keyword to filter enrichment results on, if not
specified, displays top terms
- number_of_terms: number of terms to display, default is 20
# GSEA heatmap from t-test results
try({ # This expression might fail if dependencies are not installed
  olink_pathway_heatmap(enrich_results = gsea_results, test_results = ttest_results)
})
# ORA heatmap from t-test results with the keyword "cell"
try({ # This expression might fail if dependencies are not installed
  olink_pathway_heatmap(enrich_results = ora_results, test_results = ttest_results,
                        method = "ORA", keyword = "cell")
})
Function output
A heatmap as a ggplot object
Bar graph for visualizing pathway enrichment
(olink_pathway_visualization)
The olink_pathway_visualization function generates a bar graph of the
top terms or terms related to a certain keyword for results from the
olink_pathway_enrichment function. The bar represents either the
normalized enrichment score (NES) for GSEA results or counts (number of
proteins) for ORA results colored by adjusted p-value. The ORA
visualization also contains the number of proteins out of the total
proteins in that pathway as a ratio after the bar.
Function arguments
- enrich_results: data frame of enrichment results from
olink_pathway_enrichment()
- method: method used in olink_pathway_enrichment (“GSEA” (default) or
“ORA”)
- keyword: (optional) keyword to filter enrichment results on, if not
specified, displays top terms
- number_of_terms: number of terms to display, default is 20
Function output
A bar graph as a ggplot object.
Scatterplot for QC (olink_qc_plot)
The olink_qc_plot function generates a facet plot per Panel, plotting
IQR vs. median for all samples, using ggplot2::ggplot,
ggplot2::geom_point and stats::IQR. This is a good first check to find
out if any samples have a tendency to be classified as outliers.
Horizontal dashed lines indicate +/-3 standard deviations from the mean
IQR. Vertical dashed lines indicate +/-3 standard deviations from the
mean sample median.
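The quantities on the two axes and the +/-3 standard deviation limits can be reproduced with base R. The sketch below uses simulated long-format data for a single panel, and it assumes a sample is flagged when it falls outside the limits on either axis.

```r
set.seed(3)
# Simulated long-format data (made up): 25 samples x 40 assays, one panel.
df <- expand.grid(SampleID = paste0("S", 1:25),
                  OlinkID = paste0("OID", 1:40),
                  stringsAsFactors = FALSE)
df$NPX <- rnorm(nrow(df), mean = 5, sd = 1)
# Shift one sample upwards so that it stands out on the median axis.
df$NPX[df$SampleID == "S25"] <- df$NPX[df$SampleID == "S25"] + 4

# Per-sample IQR and median of NPX.
iqr <- tapply(df$NPX, df$SampleID, IQR)
med <- tapply(df$NPX, df$SampleID, median)
qc <- data.frame(SampleID = names(iqr),
                 IQR = as.numeric(iqr),
                 sample_median = as.numeric(med[names(iqr)]),
                 stringsAsFactors = FALSE)

# Flag samples more than 3 SD from the mean IQR or mean sample median.
beyond <- function(v, k) abs(v - mean(v)) > k * sd(v)
qc$Outlier <- as.integer(beyond(qc$IQR, 3) | beyond(qc$sample_median, 3))
qc[qc$Outlier == 1, ]
```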
Function arguments
- df: NPX data frame in long format should minimally contain SampleID,
Index, NPX and Panel.
- color_g: Character value indicating the column name that should be
used as fill color. Default QC_Warning.
- plot_index: Logical. Default FALSE. If FALSE, a point will be
plotted for a sample. If TRUE, a sample’s unique index number is
displayed.
- label_outliers: Logical. If TRUE, an outlier sample will be labeled
by its SampleID.
- IQR_outlierDef: The number of standard deviations from the mean IQR
that defines an outlier (default 3)
- median_outlierDef: The number of standard deviations from the mean
sample median that defines an outlier. (default 3)
- outlierLines: Logical. Draw dashed lines at +/-IQR_outlierDef and
+/-median_outlierDef standard deviations from the mean IQR and sample
median respectively (default TRUE)
- facetNrow: Integer. The number of rows that the panels are arranged
on.
- facetNcol: Integer. The number of columns that the panels are
arranged on.
npx_data1 %>%
  filter(!str_detect(SampleID, 'CONTROL_SAMPLE'),
         Panel == 'Olink Inflammation') %>%
  olink_qc_plot(color_g = "QC_Warning")
Function output
An object of class ggplot. A list of outliers can be
extracted from the ggplot object.
qc <- olink_qc_plot(npx_data1, color_g = "QC_Warning",
                    IQR_outlierDef = 3, median_outlierDef = 3)
qc$data %>%
  filter(Outlier == 1) %>%
  select(SampleID, Panel, IQR, sample_median, Outlier)
#> # A tibble: 1 × 5
#> SampleID Panel IQR sample_median Outlier
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 A48 Inflammation 8.64 4.53 1
Heatmap (olink_heatmap_plot)
The olink_heatmap_plot function generates a heatmap for all samples
and proteins using pheatmap::pheatmap. By default, the heatmap centers
and scales NPX across all proteins and clusters samples and proteins
using a dendrogram. Unique sample names are required.
Grouping variables can be annotated and colored on the left side of the
heatmap.
Function arguments
- df: NPX data frame in long format which should minimally contain
SampleID, NPX, OlinkID, Assay. Optionally columns of choice for
annotations.
- variable_row_list: Columns in df to be annotated for rows in the
heatmap.
- variable_col_list: Columns in df to be annotated for columns in the
heatmap.
- center_scale: Logical. Default: True. If data should be centered and
scaled across assays.
- cluster_rows: Logical. Default: True. Determining if rows should be
clustered.
- cluster_cols: Logical. Default: True. Determining if columns should
be clustered.
- show_rownames: Logical. Default: True. Determining if row names are
shown.
- show_colnames: Logical. Default: True. Determining if column names
are shown.
- annotation_legend: Logical. Default: True. Determining if legend for
annotations should be shown.
- fontsize: Default: 10. Font size for all text.
- na_col: Default: Black. Color of the cells with NA.
first10 <- npx_data1 %>%
  pull(OlinkID) %>%
  unique() %>%
  head(10)
first15samples <- npx_data1$SampleID %>%
  unique() %>%
  head(15)
npx_data_small <- npx_data1 %>%
  filter(!str_detect(SampleID, 'CONT')) %>%
  filter(OlinkID %in% first10) %>%
  filter(SampleID %in% first15samples)
olink_heatmap_plot(npx_data_small, variable_row_list = 'Treatment')
Function output
An object of class ggplot.
Plot results of t-test (olink_volcano_plot)
The olink_volcano_plot function generates a volcano plot using
results from the olink_ttest function using the functions ggplot
and geom_point of the R library ggplot2. The estimated
difference is shown on the x-axis and -log10(p-value) on the
y-axis. A horizontal dotted line indicates p-value = 0.05. Dots are
colored based on Benjamini-Hochberg adjusted p-value cutoff 0.05 and can
optionally be annotated by OlinkID.
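The difference between the dotted line (raw p-value) and the coloring (adjusted p-value) can be seen in a small base R sketch. The numbers are made up, and the label "Non-significant" is an assumption about the wording of the non-significant category.

```r
# Made-up t-test results for five assays.
res <- data.frame(estimate = c(1.2, -0.9, 0.3, 0.1, -0.05),
                  p.value = c(0.0004, 0.003, 0.04, 0.3, 0.8))

# y-axis value of the volcano plot and height of the dotted line.
res$neg_log10_p <- -log10(res$p.value)
line_height <- -log10(0.05)

# Coloring is based on the Benjamini-Hochberg adjusted p-value.
res$Adjusted_pval <- p.adjust(res$p.value, method = "BH")
res$Threshold <- ifelse(res$Adjusted_pval < 0.05,
                        "Significant", "Non-significant")
res
```

The third assay lies above the dotted line (raw p-value 0.04) but would not be colored as significant, because its adjusted p-value exceeds 0.05.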
Function arguments
- p.val_tbl: a data frame of results generated by
olink_ttest.
- x_lab: Optional. Character value to use as the x-axis label.
- olinkid_list: Optional. Character vector of proteins (OlinkID) to
label in the plot. If not provided, by default the function will label
all significant proteins.
# Perform t-test
ttest_results <- olink_ttest(df = npx_data1,
                             variable = 'Treatment')
# Select names of proteins to show
top_10_name <- ttest_results %>%
  slice_head(n = 10) %>%
  pull(OlinkID)
# Volcano plot
olink_volcano_plot(p.val_tbl = ttest_results,
                   x_lab = 'Treatment',
                   olinkid_list = top_10_name)
Function output
An object of class ggplot.
Theming function (set_plot_theme)
This function sets a coherent plot theme for plots by adding it to a
ggplot object. It is mainly used for aesthetic reasons.
npx_data1 %>%
  filter(OlinkID == 'OID01216') %>%
  ggplot(aes(x = Treatment, y = NPX, fill = Treatment)) +
  geom_boxplot() +
  set_plot_theme()
Color theming (olink_color_discrete, olink_color_gradient,
olink_fill_discrete, olink_fill_gradient)
These functions set a coherent coloring theme for the plots by
adding it to a ggplot object. They are mainly used for aesthetic
reasons.
npx_data1 %>%
  filter(OlinkID == 'OID01216') %>%
  ggplot(aes(x = Treatment, y = NPX, fill = Treatment)) +
  geom_boxplot() +
  set_plot_theme() +
  olink_fill_discrete()