Often times, gene expression studies such as microarrays and RNA-Seq result in hundreds to thousands of differentially expressed genes (DEGs). It becomes very difficult to understand the biological significance of such massive data sets, especially when there are multiple conditions and comparisons being analyzed. This package facilitates visualization and downstream analyses of differential gene expression results, using pathway enrichment and protein-protein interaction networks, to aid researchers in uncovering underlying biology and pathophysiology from their gene expression studies.
We have included an example data set of gene expression results in this package
as the object exampleDESeqResults
. This is a list of 2 data frames, generated
using the results()
functions from the package
DESeq2 (Love et al. 2014). The data
is from an RNA-Seq study investigating COVID-19 and non-COVID-19 sepsis patients
at admission (T1) compared to approximately1 week later (T2) in the ICU,
indexed over time (i.e., T2 vs T1) (An et al. 2023).
To install and load the package:
# We'll also be using some functions from dplyr
# BiocManager::install("pathlinkR")
library(dplyr)
library(pathlinkR)
One of the first visualizations commonly performed with gene expression studies
is to identify the number of DEGs. These are typically defined using specific
cutoffs for both fold change and statistical significance. Thresholds of
adjusted p-value <0.05 and absolute fold change >1.5 are used as the default,
though any value can be specified. pathlinkR includes the function
eruption()
to create a volcano plot.
## A quick look at the DESeq2 results table
data("exampleDESeqResults")
knitr::kable(head(exampleDESeqResults[[1]]))
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | |
---|---|---|---|---|---|---|
ENSG00000000938 | 16292.64814 | -0.5624954 | 0.1458274 | -3.857268 | 0.0001147 | 0.0013531 |
ENSG00000002586 | 1719.51750 | 0.4501181 | 0.1520122 | 2.961066 | 0.0030658 | 0.0153820 |
ENSG00000002919 | 870.64168 | -0.2445729 | 0.1249293 | -1.957690 | 0.0502664 | 0.1236844 |
ENSG00000002933 | 266.65476 | 0.8838310 | 0.2093628 | 4.221528 | 0.0000243 | 0.0004313 |
ENSG00000003249 | 11.43282 | 1.3287128 | 0.2881385 | 4.611369 | 0.0000040 | 0.0001200 |
ENSG00000003509 | 207.88545 | -0.1825614 | 0.1763556 | -1.035189 | 0.3005807 | 0.4453252 |
## Generate a volcano plot from the first data frame, with default thresholds
eruption(
rnaseqResult=exampleDESeqResults[[1]],
title=names(exampleDESeqResults[1])
)
There are multiple options available for customizing this volcano plot, including:
Since we’re looking at COVID-19 in this example data set, let’s highlight genes in the interferon signaling pathway. We’ll use a database of Reactome pathways used by the pathway enrichment package Sigora to find the genes involved in interferon signaling. More information on Reactome pathways and the Sigora package can be found in the pathway enrichment section below.
Let’s make the following adjustment to the standard volcano plot:
data("sigoraDatabase")
interferonGenes <- sigoraDatabase %>%
filter(pathwayName == "Interferon Signaling") %>%
pull(ensemblGeneId)
eruption(
rnaseqResult=exampleDESeqResults[[1]],
title=names(exampleDESeqResults[1]),
highlightGenes=interferonGenes,
highlightName="Interferon genes (red)",
label="highlight",
nonlog2=TRUE,
n=10
)
In addition to creating volcano plots, we can also visualize our DEGs using
heatmaps of genes involved in a specific pathways (e.g. one identified as
significant by pathwayEnrichment()
). The function plotFoldChange()
accomplishes this by taking in an input list of DESeq2::results()
data frames,
just like pathwayEnrichment()
, and creating a heatmap of fold changes for the
constituent genes.
plotFoldChange(
exampleDESeqResults,
pathName="Interferon alpha/beta signaling"
)
Another example, with more customization:
exampleDESeqResultsRenamed <- setNames(
exampleDESeqResults,
c("Pos", "Neg")
)
plotFoldChange(
exampleDESeqResultsRenamed,
manualTitle="Signature genes",
genesToPlot=c("CD4", "CD8A","CD8B", "CD28", "ZAP70"),
geneFormat="hgnc",
colSplit=c("Pos", "Neg"),
log2FoldChange=TRUE,
colAngle=45,
clusterRows=FALSE,
clusterColumns=TRUE,
invert=TRUE
)
pathlinkR includes tools for constructing and visualizing Protein-Protein
Interaction (PPI) networks. Here we leverage PPI data gathered from
InnateDB to generate a list of interactions among
DE genes identified in gene expression analyses. These interactions can then be
used to build PPI networks within R, with multiple options for controlling the
type of network, such as support for first, minimum, or zero order networks. The
two main functions used to accomplish this are ppiBuildNetwork
and
ppiPlotNetwork
.
Let’s continue looking at the DEGs from the COVID positive patients over time,
using the significant DEGs to build a PPI network. Since the data frame we’re
inputting includes all measured genes (not just the significant ones), we’ll use
the filterInput=TRUE
option to ensure the network is made only with those
genes which pass the standard thresholds (defined above). Since we’re
visualizing a network of DEGs, let’s colour the nodes to indicate the direction
of their dysregulation (i.e. up- or down-regulated) by specifying
fillType="foldChange"
.
exNetwork <- ppiBuildNetwork(
rnaseqResult=exampleDESeqResults[[1]],
filterInput=TRUE,
order="zero"
)
ppiPlotNetwork(
network=exNetwork,
title=names(exampleDESeqResults)[1],
fillColumn=LogFoldChange,
fillType="foldChange",
label=TRUE,
labelColumn=hgncSymbol,
legend=TRUE
)
The nodes with blue labels (e.g. STAT1, FBXO6, CDH1, etc.) are hubs within the
network; i.e. those genes which have a high betweenness score. The statistic
used to determine hub nodes can be set in ppiBuildNetwork()
with the
“hubMeasure” option.
Let’s make some tweaks to this standard network. The output of
ppiBuildNetwork()
is a tidygraph object, meaning we can use the suite of
dplyr verbs to make changes. So let’s add a column to denote if a DEG is
from the interferon pathway (defined above) and highlight only those genes.
exNetworkInterferon <- mutate(
exNetwork,
isInterferon = if_else(name %in% interferonGenes, "y", "n")
)
ppiPlotNetwork(
network=exNetworkInterferon,
title=names(exampleDESeqResults)[1],
fillColumn=isInterferon,
fillType="categorical",
catFillColours=c("y"="red", "n"="grey"),
label=TRUE,
labelColumn=hgncSymbol,
legend=TRUE,
legendTitle="Interferon\ngenes"
)
pathlinkR includes two functions for further analyzing PPI networks. First,
ppiEnrichNetwork()
will use the node table from a network to test for enriched
Reactome pathways or Hallmark gene sets (see the next section for more detail on
the pathway enrichment methods):
exNetworkPathways <- ppiEnrichNetwork(
network=exNetwork,
analysis="hallmark",
filterResults="default",
geneUniverse = rownames(exampleDESeqResults[[1]])
)
exNetworkPathways
# A tibble: 5 × 10
pathwayId pathwayName pValue pValueAdjusted genes numCandidateGenes
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 INTERFERON GAMMA R… INTERFERON… 1.68e-7 0.00000791 HLA-… 45
2 INTERFERON ALPHA R… INTERFERON… 2.20e-6 0.0000518 IRF7… 30
3 TNFA SIGNALING VIA… TNFA SIGNA… 3.58e-5 0.000560 CEBP… 32
4 APOPTOSIS APOPTOSIS 1.01e-3 0.0118 CASP… 21
5 INFLAMMATORY RESPO… INFLAMMATO… 1.98e-3 0.0186 MXD1… 28
# ℹ 4 more variables: numBgGenes <dbl>, geneRatio <dbl>, totalGenes <int>,
# topLevelPathway <chr>
Second, the function ppiExtractSubnetwork()
can extract a minimally-connected
subnetwork from a starting network, using the genes from an enriched pathway as
the “starting” nodes for extraction. For example, below we use the results from
the Hallmark enrichment above to pull out a subnetwork of genes from the
“Interferon Gamma Response” term, then plot this reduced network while
highlighting the genes from the pathway:
exSubnetwork <- ppiExtractSubnetwork(
network=exNetwork,
pathwayEnrichmentResult=exNetworkPathways,
pathwayToExtract="INTERFERON GAMMA RESPONSE"
)
ppiPlotNetwork(
network=exSubnetwork,
fillType="oneSided",
fillColumn=degree,
label=TRUE,
labelColumn=hgncSymbol,
legendTitle = "Degree"
)
Alternatively you can use the “genesToExtract” argument in
ppiExtractSubnetwork()
to supply your own set of genes (as Ensembl IDs) to
extract as a subnetwork.
The essence of pathway enrichment is the concept of over-representation: that is, are there more genes belonging to a specific pathway present in our DEG list than we would expect to find by chance? To calculate this, the simplest method is to compare the ratio of DEGs in some pathway to all DEGs, and all genes in tat pathway to all genes in all pathways in a database. pathlinkR mainly uses the Reactome database (Fabregat et al. 2017) for this purpose.
One issue that can occur with over-representation analysis is the assumption that each gene in each pathway has “equal” value in belonging to that pathway. In reality, a single protein can have multiple (and sometimes very different) functions, and belong to multiple pathways, like protein kinases. There are also pathways that have substantial overlap with cellular machinery, like the TLR pathways. This can lead to enrichment of multiple similar pathways or even unrelated “false-positives” that make parsing through the results very difficult.
One solution is to use unique gene pairs, as described by the creators of the
package Sigora (Foroushani et al.
2013). This methodology decreases the number of similar and unrelated pathways
from promiscuous genes, focusing more on the pathways that are likely related to
the underlying biology. This approach is the default used in the pathlinkR
function pathwayEnrichment()
.
As an alternative, pathlinkR also support gene set enrichment analysis, using the implementation from fgsea (Korotkevich et al. 2019). For this method, both Reactome and Hallmark data can used to construct the gene sets, and the same input types are supported.
The pathwayEnrichment()
function takes as input a list of data frames (each
from DESeq2::results()
), and by default will split the genes into up- and
down-regulated before performing pathway enrichment one each set. The name of
the data frames in the list should indicate the comparison that was made in the
DESeq2 results, as it will be used to identify the results. For
analysis="sigora"
we also need to provide a Gene Pair Signature Repository
(gpsRepo
) which contains the pathways and gene pairs to be tested. Leaving
this argument to “default” will use the reaH
GPS repository from
Sigora, containing human Reactome pathways. Alternatively one can supply
their own GPS repository; see ?sigora::makeGPS()
for details on how to
make one.
## Note the structure of `exampleDESeqResults`: a named list of "DESeqResults"
## objects, with genes identified by Ensembl IDs
exampleDESeqResults
$`COVID Pos Over Time`
DataFrame with 5000 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000938 16292.6481 -0.562495 0.145827 -3.85727 1.14662e-04
ENSG00000002586 1719.5175 0.450118 0.152012 2.96107 3.06577e-03
ENSG00000002919 870.6417 -0.244573 0.124929 -1.95769 5.02664e-02
ENSG00000002933 266.6548 0.883831 0.209363 4.22153 2.42652e-05
ENSG00000003249 11.4328 1.328713 0.288138 4.61137 4.00026e-06
... ... ... ... ... ...
ENSG00000284882 121.6850 -0.5997942 0.204597 -2.9315844 3.37238e-03
ENSG00000284948 38.7903 -0.1786622 0.179800 -0.9936710 3.20383e-01
ENSG00000284977 73.4571 -0.7469480 0.182492 -4.0930511 4.25734e-05
ENSG00000285417 25.7393 -0.0180603 0.304676 -0.0592769 9.52732e-01
ENSG00000285444 17.3871 -0.3702265 0.264241 -1.4010952 1.61186e-01
padj
<numeric>
ENSG00000000938 0.001353100
ENSG00000002586 0.015382030
ENSG00000002919 0.123684426
ENSG00000002933 0.000431269
ENSG00000003249 0.000119958
... ...
ENSG00000284882 0.016523420
ENSG00000284948 0.465445666
ENSG00000284977 0.000659367
ENSG00000285417 0.970172528
ENSG00000285444 0.286419134
$`COVID Neg Over Time`
DataFrame with 5000 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000938 16292.6481 -0.967706 0.188265 -5.140117 2.74567e-07
ENSG00000002586 1719.5175 0.724006 0.196562 3.683353 2.30186e-04
ENSG00000002919 870.6417 -0.101169 0.161486 -0.626486 5.30996e-01
ENSG00000002933 266.6548 1.010147 0.274661 3.677795 2.35259e-04
ENSG00000003249 11.4328 0.647755 0.412732 1.569433 1.16547e-01
... ... ... ... ... ...
ENSG00000284882 121.6850 -0.427594 0.267027 -1.60132 1.09307e-01
ENSG00000284948 38.7903 -0.291125 0.234652 -1.24067 2.14728e-01
ENSG00000284977 73.4571 -1.049690 0.235115 -4.46458 8.02262e-06
ENSG00000285417 25.7393 0.490122 0.420899 1.16446 2.44236e-01
ENSG00000285444 17.3871 0.363210 0.351279 1.03397 3.01152e-01
padj
<numeric>
ENSG00000000938 2.39613e-05
ENSG00000002586 2.63737e-03
ENSG00000002919 6.61318e-01
ENSG00000002933 2.68493e-03
ENSG00000003249 2.29349e-01
... ...
ENSG00000284882 0.218554310
ENSG00000284948 0.353434564
ENSG00000284977 0.000245775
ENSG00000285417 0.386061932
ENSG00000285444 0.446699110
enrichedResultsSigora <- pathwayEnrichment(
inputList=exampleDESeqResults,
analysis="sigora",
filterInput=TRUE,
gpsRepo="default"
)
head(enrichedResultsSigora)
# A tibble: 6 × 12
comparison direction pathwayId pathwayName pValue pValueAdjusted genes
<chr> <chr> <chr> <fct> <dbl> <dbl> <chr>
1 COVID Pos Over … Up R-HSA-38… Chemokine … 7.04e-46 7.05e-43 CCR3…
2 COVID Pos Over … Up R-HSA-19… Immunoregu… 5.46e-45 5.46e-42 CD1A…
3 COVID Pos Over … Up R-HSA-38… Costimulat… 2.56e-40 2.56e-37 CD28…
4 COVID Pos Over … Up R-HSA-67… Neutrophil… 7.88e-31 7.89e-28 ABCA…
5 COVID Pos Over … Up R-HSA-20… Cell surfa… 3.55e-21 3.55e-18 ATP1…
6 COVID Pos Over … Up R-HSA-14… Alpha-defe… 1.02e-20 1.02e-17 CD4;…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <int>,
# geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>
If you prefer to test the up- and down-regulated genes together, simply set the
option split=FALSE
:
enrichedResultsSigoraNoSplit <- pathwayEnrichment(
inputList=exampleDESeqResults,
analysis="sigora",
filterInput=TRUE,
gpsRepo="default",
split=FALSE
)
head(enrichedResultsSigoraNoSplit)
# A tibble: 6 × 12
comparison direction pathwayId pathwayName pValue pValueAdjusted genes
<chr> <chr> <chr> <fct> <dbl> <dbl> <chr>
1 COVID Pos Over… All R-HSA-67… Neutrophil… 1.77e-197 1.77e-194 ABCA…
2 COVID Pos Over… All R-HSA-90… Interferon… 5.55e-168 5.55e-165 ADAR…
3 COVID Pos Over… All R-HSA-91… Interferon… 5.87e- 72 5.88e- 69 ADAR…
4 COVID Pos Over… All R-HSA-19… Immunoregu… 3.44e- 66 3.45e- 63 CD1A…
5 COVID Pos Over… All R-HSA-87… Interferon… 2.56e- 55 2.56e- 52 FCGR…
6 COVID Pos Over… All R-HSA-38… Chemokine … 4.94e- 53 4.94e- 50 CCR1…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <int>,
# geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>
For those who still prefer traditional over-representation analysis, we include
the option of doing so by setting analysis="reactomepa"
, which uses
ReactomePA (Yu et al.
2016). When using this method, we recommend providing a gene universe to serve
as a background for the enrichment test; here we’ll use all the genes which were
tested for significance by DESeq2 (i.e. all genes from the count matrix),
converting them to Entrez gene IDs before running the test.
data("mappingFile")
exGeneUniverse <- mappingFile %>%
filter(ensemblGeneId %in% rownames(exampleDESeqResults[[1]])) %>%
pull(entrezGeneId)
enrichedResultsRpa <- pathwayEnrichment(
inputList=exampleDESeqResults,
analysis="reactomepa",
filterInput=TRUE,
split=TRUE,
geneUniverse=exGeneUniverse
)
head(enrichedResultsRpa)
# A tibble: 6 × 12
comparison direction pathwayId pathwayName pValue pValueAdjusted genes
<chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 COVID Pos Over … Up R-HSA-20… Generation… 3.63e-10 0.000000219 PLCG…
2 COVID Pos Over … Up R-HSA-20… Translocat… 1.51e- 9 0.000000302 HLA-…
3 COVID Pos Over … Up R-HSA-38… PD-1 signa… 1.51e- 9 0.000000302 HLA-…
4 COVID Pos Over … Up R-HSA-20… Phosphoryl… 1.22e- 7 0.0000184 HLA-…
5 COVID Pos Over … Up R-HSA-38… Costimulat… 1.22e- 6 0.000146 CD28…
6 COVID Pos Over … Up R-HSA-20… TCR signal… 3.61e- 6 0.000321 PLCG…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <dbl>,
# geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>
In addition to the Reactome database used when setting analysis
to “sigora” or
“reactomepa”, we also provide over-representation analysis using the Hallmark
gene sets from
the Molecular Signatures Database (MSigDb). These are 50 gene sets that
represent “specific, well-defined biological states or processes with coherent
expression” (Liberzon et al. 2015). This database provides a more high-level
summary of key biological processes compared to the more granular Reactome
pathways.
enrichedResultsHm <- pathwayEnrichment(
inputList=exampleDESeqResults,
analysis="hallmark",
filterInput=TRUE,
split=TRUE
)
head(enrichedResultsHm)
# A tibble: 6 × 12
comparison direction pathwayId pathwayName pValue pValueAdjusted genes
<chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
1 COVID Pos Over … Up HEME MET… HEME METAB… 2.69e-32 1.34e-30 SLC4…
2 COVID Pos Over … Up IL2 STAT… IL2 STAT5 … 5.89e- 4 1.47e- 2 PLPP…
3 COVID Pos Over … Down INTERFER… INTERFERON… 3.14e-30 1.48e-28 EIF2…
4 COVID Pos Over … Down INTERFER… INTERFERON… 2.37e-28 5.58e-27 EIF2…
5 COVID Pos Over … Down INFLAMMA… INFLAMMATO… 6.83e- 9 1.07e- 7 KIF1…
6 COVID Pos Over … Down TNFA SIG… TNFA SIGNA… 1.75e- 7 2.06e- 6 MXD1…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <dbl>,
# geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>
Now that we have (a lot of) pathway enrichment results from multiple
comparisons, its time to visualize them. The function plotPathways
does this
by grouping Reactome pathways (or Hallmark gene sets) under parent groups, and
indicates if each pathway is up- or down-regulated in each comparison, making it
easy to identify which pathways are shared or unique to different DEG lists.
Because there are often many pathways, you can split the plot into multiple
columns (up to 3), and truncate the pathway names to make the results fit more
easily.
Sometimes a pathway may be enriched in both up- and down-regulated genes from the same DEG list (these usually occur with larger pathways). Such occurrences are indicated by a white asterisk where the more significant (lower adjusted p-value) direction is displayed. You can also change the angle/labels of the comparisons, or add the number of DEGs in each comparison below the labels. Lastly, you can specify which pathways or top pathway groups to include for visualization.
pathwayPlots(
pathwayEnrichmentResults=enrichedResultsSigora,
columns=2
)
Let’s again plot the sigora results, this time with the following tweaks:
pathwayPlots(
pathwayEnrichmentResults=enrichedResultsSigora,
specificTopPathways="Immune System",
colourValues=c("#440154", "#FDE725"),
newGroupNames=c("COVID\nPositive", "COVID\nNegative"),
showNumGenes=TRUE,
xAngle="horizontal",
nameWidth=50
)
Joining with `by = join_by(comparison)`
From these results, you can see that while many of the immune system pathways change in the same direction over time in COVID-19 and non-COVID-19 sepsis patients, a few unique ones stand out, mostly related to interferon signaling (“Interferon Signaling”, “Interferon gamma signaling”, “Interferon alpha/beta signaling”, “ISG15 antiviral mechanism”). This likely reflects an elevated early antiviral response in COVID-19 patients that decreased over time, compared to no change in non-COVID-19 sepsis patients.
pathlinkR includes functions for turning the pathway enrichment results from either Reactome-based method (“sigora” or “reactomepa”) into networks, using the overlap of the genes assigned to each pathway to determine their similarity to one other. In these networks, each pathway is a node, with connections or edges between them determined via a distance measure. A threshold can be set, where two pathways with a minimum similarity measure are considered connected, and would have an edge drawn between their nodes.
We provide a pre-computed distance matrix of Reactome pathways, generated using
Jaccard distance, but there is support for multiple distance measures to be
used. Once this “foundation” of pathway interactions is created, a pathway
network can be built using the createPathnet
function:
data("sigoraExamples")
pathwayDistancesJaccard <- getPathwayDistances(pathwayData = sigoraDatabase)
Running distance calculations (this may take a while)...
startingPathways <- pathnetFoundation(
mat=pathwayDistancesJaccard,
maxDistance=0.8
)
# Get the enriched pathways from the "COVID Pos Over Time" comparison
exPathwayNetworkInput <- sigoraExamples %>%
filter(comparison == "COVID Pos Over Time")
myPathwayNetwork <- pathnetCreate(
pathwayEnrichmentResult=exPathwayNetworkInput,
foundation=startingPathways
)
There are two options for visualization, the first being a static network:
pathnetGGraph(
myPathwayNetwork,
labelProp=0.1,
nodeLabelSize=4,
nodeLabelOverlaps=8,
segColour="red"
)
Nodes (pathways) which are filled in are enriched pathways (i.e. those output by
pathwayEnrichment()
). Size of nodes is correlated with statistical
significance, while edge thickness relates to the similarity of two connected
pathways.
Though this type of visualization is useful, we can also display this network using an alternate method that creates an interactive display:
pathnetVisNetwork(myPathwayNetwork)
Nodes within this network can be dragged around and re-positioned. Or hold the SHIFT key while clicking-and-dragging to select (and move) multiple nodes at once. Clicking on a single node will highlight its direct neighbours, and the dropdown at the top-left corner will highlight pathways from a specific category (e.g. Immune/Hemostasis).
Another option is to use only DEGs when generating the pathway network information, as outlined below:
candidateData <- sigoraExamples %>%
filter(comparison == "COVID Pos Over Time") %>%
select(pathwayId, genes) %>%
tidyr::separate_rows(genes, sep=";") %>%
left_join(
sigoraDatabase,
by=c("pathwayId", "genes" = "hgncSymbol"),
multiple="all"
) %>%
relocate(pathwayId, ensemblGeneId, "hgncSymbol"=genes, pathwayName) %>%
distinct()
## Now that we have a smaller table in the same format as sigoraDatabase, we
## can construct our own matrix of pathway distances
candidateDistData <- getPathwayDistances(
pathwayData=candidateData,
distMethod="jaccard"
)
candidateStartingPathways <- pathnetFoundation(
mat=candidateDistData,
maxDistance=0.9
)
candidatesAsNetwork <- pathnetCreate(
pathwayEnrichmentResult=filter(
sigoraExamples,
comparison == "COVID Pos Over Time"
),
foundation=candidateStartingPathways,
trim=FALSE
)
pathnetVisNetwork(candidatesAsNetwork, nodeLabelSize=30)
Sigora uses a Gene-Pair Signature (GPS) Repository that stores information on which gene pairs are unique for which pathways. We recommend using the one already loaded with pathlinkR, which is “reaH” (Reactome Human). You can also generate your own GPS repo using Sigora’s own functions and a custom set of pathways (e.g. from another pathway database like GO or KEGG). Please consult the Sigora documentation on how to generate your custom GPS repository.
Because there are now multiple gene pairs vs. single genes, the gene-pair
“universe” is greatly increased and it is more likely for a result to be
significant. Therefore, the cutoff threshold for significance is more stringent
(adjusted p-value < 0.001) and a more conservative adjustment method
(Bonferroni) is used. For regular over-representation analysis, a less
conservative adjustment method (Benjamini-Hochberg) is used with adjusted
p-value < 0.05. These are automatically set with filterResults="default"
. You
can adjust these cut-offs by setting filterResults
to different values between
0 and 1, or 1 if you want all the pathways (this may be useful for comparing
which enriched genes appear in which comparisons, even if the enrichment is not
significant).
An AY, Baghela AS, Falsafi R, Lee AH, Trahtemberg U, Baker AJ, dos Santos CC, Hancock REW. Severe COVID-19 and non-COVID-19 severe sepsis converge transcriptionally after a week in the intensive care unit, indicating common disease mechanisms. Front Immunol. 2023;6(14):1167917.
Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinform. 2017;18:142.
Foroushani ABK, Brinkman FSL, Lynn DJ. Pathway-GPS and sigora: identifying relevant pathways based on the over-representation of their gene-pair signatures. PeerJ. 2013;1:e229.
Korotkevich G, Sukhov V, Sergushichev A (2019). “Fast gene set enrichment analysis.” bioRxiv. doi:10.1101/060012, http://biorxiv.org/content/early/2016/06/20/060012
Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
Yu G, He QY. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol Biosyst. 2016;12(2):477-9.
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
time zone: America/Vancouver
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DESeq2_1.46.0 SummarizedExperiment_1.36.0
[3] Biobase_2.66.0 MatrixGenerics_1.18.0
[5] matrixStats_1.4.1 GenomicRanges_1.58.0
[7] GenomeInfoDb_1.42.0 IRanges_2.40.0
[9] S4Vectors_0.44.0 BiocGenerics_0.52.0
[11] pathlinkR_1.2.0 dplyr_1.1.4
[13] BiocStyle_2.34.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_1.8.9
[4] shape_1.4.6.1 magrittr_2.0.3 ggtangle_0.0.4
[7] magick_2.8.5 farver_2.1.2 rmarkdown_2.28
[10] GlobalOptions_0.1.2 fs_1.6.5 zlibbioc_1.52.0
[13] vctrs_0.6.5 memoise_2.0.1 ggtree_3.14.0
[16] rstatix_0.7.2 tinytex_0.53 htmltools_0.5.8.1
[19] S4Arrays_1.6.0 broom_1.0.7 Formula_1.2-5
[22] gridGraphics_0.5-1 SparseArray_1.6.0 sass_0.4.9
[25] bslib_0.8.0 htmlwidgets_1.6.4 plyr_1.8.9
[28] cachem_1.1.0 igraph_2.1.1 lifecycle_1.0.4
[31] iterators_1.0.14 pkgconfig_2.0.3 gson_0.1.0
[34] Matrix_1.7-1 R6_2.5.1 fastmap_1.2.0
[37] GenomeInfoDbData_1.2.13 clue_0.3-65 aplot_0.2.3
[40] enrichplot_1.26.1 digest_0.6.37 colorspace_2.1-1
[43] patchwork_1.3.0 AnnotationDbi_1.68.0 RSQLite_2.3.7
[46] ggpubr_0.6.0 vegan_2.6-8 labeling_0.4.3
[49] fansi_1.0.6 httr_1.4.7 polyclip_1.10-7
[52] abind_1.4-8 mgcv_1.9-1 compiler_4.4.1
[55] bit64_4.5.2 withr_3.0.2 doParallel_1.0.17
[58] backports_1.5.0 BiocParallel_1.40.0 carData_3.0-5
[61] viridis_0.6.5 DBI_1.2.3 highr_0.11
[64] ggforce_0.4.2 R.utils_2.12.3 ggsignif_0.6.4
[67] MASS_7.3-61 DelayedArray_0.32.0 rjson_0.2.23
[70] permute_0.9-7 tools_4.4.1 ape_5.8
[73] R.oo_1.26.0 glue_1.8.0 nlme_3.1-166
[76] GOSemSim_2.32.0 grid_4.4.1 reshape2_1.4.4
[79] cluster_2.1.6 fgsea_1.32.0 generics_0.1.3
[82] gtable_0.3.6 R.methodsS3_1.8.2 tidyr_1.3.1
[85] data.table_1.16.2 car_3.1-3 tidygraph_1.3.1
[88] utf8_1.2.4 XVector_0.46.0 ggrepel_0.9.6
[91] foreach_1.5.2 pillar_1.9.0 stringr_1.5.1
[94] yulab.utils_0.1.7 circlize_0.4.16 splines_4.4.1
[97] tweenr_2.0.3 treeio_1.30.0 lattice_0.22-6
[100] bit_4.5.0 tidyselect_1.2.1 GO.db_3.20.0
[103] ComplexHeatmap_2.22.0 locfit_1.5-9.10 Biostrings_2.74.0
[106] knitr_1.48 gridExtra_2.3 bookdown_0.41
[109] xfun_0.49 graphlayouts_1.2.0 visNetwork_2.1.2
[112] stringi_1.8.4 UCSC.utils_1.2.0 lazyeval_0.2.2
[115] ggfun_0.1.7 yaml_2.3.10 evaluate_1.0.1
[118] codetools_0.2-20 ggraph_2.2.1 tibble_3.2.1
[121] qvalue_2.38.0 BiocManager_1.30.25 ggplotify_0.1.2
[124] cli_3.6.3 munsell_0.5.1 jquerylib_0.1.4
[127] Rcpp_1.0.13 png_0.1-8 parallel_4.4.1
[130] ggplot2_3.5.1 blob_1.2.4 clusterProfiler_4.14.0
[133] DOSE_4.0.0 tidytree_0.4.6 viridisLite_0.4.2
[136] sigora_3.1.1 scales_1.3.0 purrr_1.0.2
[139] crayon_1.5.3 GetoptLong_1.0.5 rlang_1.1.4
[142] cowplot_1.1.3 fastmatch_1.1-4 KEGGREST_1.46.0