1 Introduction
2 Installation
3 Visualizing RNA-Seq data with volcano plots
4 Visualizing fold changes across comparisons
5 Building and visualizing PPI networks
- 5.1 Enriching networks and extracting subnetworks
6 Performing pathway enrichment
7 Plotting pathway enrichment results
8 Generating networks from enriched pathways
- 8.1 Pathway networks using only DEGs
9 Supplemental materials
- 9.1 Gene-pair signatures
- 9.2 Why are there different p-value cut-offs for sigora vs. ReactomePA/Hallmark?
10 Citations
11 Session information

1 Introduction

Often times, gene expression studies such as microarrays and RNA-Seq result in hundreds to thousands of differentially expressed genes (DEGs). It becomes very difficult to understand the biological significance of such massive data sets, especially when there are multiple conditions and comparisons being analyzed. This package facilitates visualization and downstream analyses of differential gene expression results, using pathway enrichment and protein-protein interaction networks, to aid researchers in uncovering underlying biology and pathophysiology from their gene expression studies.

We have included an example data set of gene expression results in this package as the object exampleDESeqResults. This is a list of 2 data frames, generated using the results() functions from the package DESeq2 (Love et al. 2014). The data is from an RNA-Seq study investigating COVID-19 and non-COVID-19 sepsis patients at admission (T1) compared to approximately1 week later (T2) in the ICU, indexed over time (i.e., T2 vs T1) (An et al. 2023).

2 Installation

To install and load the package:

# We'll also be using some functions from dplyr
# BiocManager::install("pathlinkR")
library(dplyr)
library(pathlinkR)

3 Visualizing RNA-Seq data with volcano plots

One of the first visualizations commonly performed with gene expression studies is to identify the number of DEGs. These are typically defined using specific cutoffs for both fold change and statistical significance. Thresholds of adjusted p-value <0.05 and absolute fold change >1.5 are used as the default, though any value can be specified. pathlinkR includes the function eruption() to create a volcano plot.

## A quick look at the DESeq2 results table
data("exampleDESeqResults")
knitr::kable(head(exampleDESeqResults[[1]]))

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
ENSG00000000938	16292.64814	-0.5624954	0.1458274	-3.857268	0.0001147	0.0013531
ENSG00000002586	1719.51750	0.4501181	0.1520122	2.961066	0.0030658	0.0153820
ENSG00000002919	870.64168	-0.2445729	0.1249293	-1.957690	0.0502664	0.1236844
ENSG00000002933	266.65476	0.8838310	0.2093628	4.221528	0.0000243	0.0004313
ENSG00000003249	11.43282	1.3287128	0.2881385	4.611369	0.0000040	0.0001200
ENSG00000003509	207.88545	-0.1825614	0.1763556	-1.035189	0.3005807	0.4453252

## Generate a volcano plot from the first data frame, with default thresholds
eruption(
    rnaseqResult=exampleDESeqResults[[1]],
    title=names(exampleDESeqResults[1])
)

There are multiple options available for customizing this volcano plot, including:

Adjust the cutoffs for adjusted p-value and fold change
Switch the colours of significant and non-significant genes
Change the number of labelled genes
Adjust the x- and y-axis limits
Plot either log2 or non-log2 fold change (for a better sense of magnitude)
Highlight specific gene sets of interest

Since we’re looking at COVID-19 in this example data set, let’s highlight genes in the interferon signaling pathway. We’ll use a database of Reactome pathways used by the pathway enrichment package Sigora to find the genes involved in interferon signaling. More information on Reactome pathways and the Sigora package can be found in the pathway enrichment section below.

Let’s make the following adjustment to the standard volcano plot:

Highlight the interferon genes and provide a brief description
Converts the x-axis from log2 fold change to non-log2 fold change
Label only the top 10 up- and down-regulated genes which belong to the interferon pathway

data("sigoraDatabase")
interferonGenes <- sigoraDatabase %>% 
    filter(pathwayName == "Interferon Signaling") %>% 
    pull(ensemblGeneId)

eruption(
    rnaseqResult=exampleDESeqResults[[1]],
    title=names(exampleDESeqResults[1]),
    highlightGenes=interferonGenes,
    highlightName="Interferon genes (red)",
    label="highlight",
    nonlog2=TRUE,
    n=10
)

4 Visualizing fold changes across comparisons

In addition to creating volcano plots, we can also visualize our DEGs using heatmaps of genes involved in a specific pathways (e.g. one identified as significant by pathwayEnrichment()). The function plotFoldChange() accomplishes this by taking in an input list of DESeq2::results() data frames, just like pathwayEnrichment(), and creating a heatmap of fold changes for the constituent genes.

plotFoldChange(
    exampleDESeqResults,
    pathName="Interferon alpha/beta signaling"
)

Another example, with more customization:

Shorten the names of each comparison
Provide a custom list of genes to visualize, and add an informative title
Use log2 fold changes in the heatmap cells
Swap the rows and columns from the default, and add a split between the rows (comparisons)

exampleDESeqResultsRenamed <- setNames(
    exampleDESeqResults,
    c("Pos", "Neg")
)

plotFoldChange(
    exampleDESeqResultsRenamed,
    manualTitle="Signature genes",
    genesToPlot=c("CD4", "CD8A","CD8B", "CD28", "ZAP70"),
    geneFormat="hgnc",
    colSplit=c("Pos", "Neg"),
    log2FoldChange=TRUE,
    colAngle=45,
    clusterRows=FALSE,
    clusterColumns=TRUE,
    invert=TRUE
)

5 Building and visualizing PPI networks

pathlinkR includes tools for constructing and visualizing Protein-Protein Interaction (PPI) networks. Here we leverage PPI data gathered from InnateDB to generate a list of interactions among DE genes identified in gene expression analyses. These interactions can then be used to build PPI networks within R, with multiple options for controlling the type of network, such as support for first, minimum, or zero order networks. The two main functions used to accomplish this are ppiBuildNetwork and ppiPlotNetwork.

Let’s continue looking at the DEGs from the COVID positive patients over time, using the significant DEGs to build a PPI network. Since the data frame we’re inputting includes all measured genes (not just the significant ones), we’ll use the filterInput=TRUE option to ensure the network is made only with those genes which pass the standard thresholds (defined above). Since we’re visualizing a network of DEGs, let’s colour the nodes to indicate the direction of their dysregulation (i.e. up- or down-regulated) by specifying fillType="foldChange".

exNetwork <- ppiBuildNetwork(
    rnaseqResult=exampleDESeqResults[[1]],
    filterInput=TRUE,
    order="zero"
)

ppiPlotNetwork(
    network=exNetwork,
    title=names(exampleDESeqResults)[1],
    fillColumn=LogFoldChange,
    fillType="foldChange",
    label=TRUE,
    labelColumn=hgncSymbol,
    legend=TRUE
)

The nodes with blue labels (e.g. STAT1, FBXO6, CDH1, etc.) are hubs within the network; i.e. those genes which have a high betweenness score. The statistic used to determine hub nodes can be set in ppiBuildNetwork() with the “hubMeasure” option.

Let’s make some tweaks to this standard network. The output of ppiBuildNetwork() is a tidygraph object, meaning we can use the suite of dplyr verbs to make changes. So let’s add a column to denote if a DEG is from the interferon pathway (defined above) and highlight only those genes.

exNetworkInterferon <- mutate(
    exNetwork,
    isInterferon = if_else(name %in% interferonGenes, "y", "n")
)

ppiPlotNetwork(
    network=exNetworkInterferon,
    title=names(exampleDESeqResults)[1],
    fillColumn=isInterferon,
    fillType="categorical",
    catFillColours=c("y"="red", "n"="grey"),
    label=TRUE,
    labelColumn=hgncSymbol,
    legend=TRUE,
    legendTitle="Interferon\ngenes"
)

5.1 Enriching networks and extracting subnetworks

pathlinkR includes two functions for further analyzing PPI networks. First, ppiEnrichNetwork() will use the node table from a network to test for enriched Reactome pathways or Hallmark gene sets (see the next section for more detail on the pathway enrichment methods):

exNetworkPathways <- ppiEnrichNetwork(
    network=exNetwork, 
    analysis="hallmark",
    filterResults="default",
    geneUniverse = rownames(exampleDESeqResults[[1]])
)

exNetworkPathways

# A tibble: 5 × 10
  pathwayId           pathwayName  pValue pValueAdjusted genes numCandidateGenes
  <chr>               <chr>         <dbl>          <dbl> <chr>             <dbl>
1 INTERFERON GAMMA R… INTERFERON… 1.68e-7     0.00000791 HLA-…                45
2 INTERFERON ALPHA R… INTERFERON… 2.20e-6     0.0000518  IRF7…                30
3 TNFA SIGNALING VIA… TNFA SIGNA… 3.58e-5     0.000560   CEBP…                32
4 APOPTOSIS           APOPTOSIS   1.01e-3     0.0118     CASP…                21
5 INFLAMMATORY RESPO… INFLAMMATO… 1.98e-3     0.0186     MXD1…                28
# ℹ 4 more variables: numBgGenes <dbl>, geneRatio <dbl>, totalGenes <int>,
#   topLevelPathway <chr>

Second, the function ppiExtractSubnetwork() can extract a minimally-connected subnetwork from a starting network, using the genes from an enriched pathway as the “starting” nodes for extraction. For example, below we use the results from the Hallmark enrichment above to pull out a subnetwork of genes from the “Interferon Gamma Response” term, then plot this reduced network while highlighting the genes from the pathway:

exSubnetwork <- ppiExtractSubnetwork(
    network=exNetwork,
    pathwayEnrichmentResult=exNetworkPathways,
    pathwayToExtract="INTERFERON GAMMA RESPONSE"
)

ppiPlotNetwork(
    network=exSubnetwork,
    fillType="oneSided",
    fillColumn=degree,
    label=TRUE,
    labelColumn=hgncSymbol,
    legendTitle = "Degree"
)

Alternatively you can use the “genesToExtract” argument in ppiExtractSubnetwork() to supply your own set of genes (as Ensembl IDs) to extract as a subnetwork.

6 Performing pathway enrichment

The essence of pathway enrichment is the concept of over-representation: that is, are there more genes belonging to a specific pathway present in our DEG list than we would expect to find by chance? To calculate this, the simplest method is to compare the ratio of DEGs in some pathway to all DEGs, and all genes in tat pathway to all genes in all pathways in a database. pathlinkR mainly uses the Reactome database (Fabregat et al. 2017) for this purpose.

One issue that can occur with over-representation analysis is the assumption that each gene in each pathway has “equal” value in belonging to that pathway. In reality, a single protein can have multiple (and sometimes very different) functions, and belong to multiple pathways, like protein kinases. There are also pathways that have substantial overlap with cellular machinery, like the TLR pathways. This can lead to enrichment of multiple similar pathways or even unrelated “false-positives” that make parsing through the results very difficult.

One solution is to use unique gene pairs, as described by the creators of the package Sigora (Foroushani et al. 2013). This methodology decreases the number of similar and unrelated pathways from promiscuous genes, focusing more on the pathways that are likely related to the underlying biology. This approach is the default used in the pathlinkR function pathwayEnrichment().

As an alternative, pathlinkR also support gene set enrichment analysis, using the implementation from fgsea (Korotkevich et al. 2019). For this method, both Reactome and Hallmark data can used to construct the gene sets, and the same input types are supported.

The pathwayEnrichment() function takes as input a list of data frames (each from DESeq2::results()), and by default will split the genes into up- and down-regulated before performing pathway enrichment one each set. The name of the data frames in the list should indicate the comparison that was made in the DESeq2 results, as it will be used to identify the results. For analysis="sigora" we also need to provide a Gene Pair Signature Repository (gpsRepo) which contains the pathways and gene pairs to be tested. Leaving this argument to “default” will use the reaH GPS repository from Sigora, containing human Reactome pathways. Alternatively one can supply their own GPS repository; see ?sigora::makeGPS() for details on how to make one.

## Note the structure of `exampleDESeqResults`: a named list of "DESeqResults"
## objects, with genes identified by Ensembl IDs
exampleDESeqResults

$`COVID Pos Over Time`
 
 
DataFrame with 5000 rows and 6 columns
                  baseMean log2FoldChange     lfcSE       stat      pvalue
                 <numeric>      <numeric> <numeric>  <numeric>   <numeric>
ENSG00000000938 16292.6481      -0.562495  0.145827   -3.85727 1.14662e-04
ENSG00000002586  1719.5175       0.450118  0.152012    2.96107 3.06577e-03
ENSG00000002919   870.6417      -0.244573  0.124929   -1.95769 5.02664e-02
ENSG00000002933   266.6548       0.883831  0.209363    4.22153 2.42652e-05
ENSG00000003249    11.4328       1.328713  0.288138    4.61137 4.00026e-06
...                    ...            ...       ...        ...         ...
ENSG00000284882   121.6850     -0.5997942  0.204597 -2.9315844 3.37238e-03
ENSG00000284948    38.7903     -0.1786622  0.179800 -0.9936710 3.20383e-01
ENSG00000284977    73.4571     -0.7469480  0.182492 -4.0930511 4.25734e-05
ENSG00000285417    25.7393     -0.0180603  0.304676 -0.0592769 9.52732e-01
ENSG00000285444    17.3871     -0.3702265  0.264241 -1.4010952 1.61186e-01
                       padj
                  <numeric>
ENSG00000000938 0.001353100
ENSG00000002586 0.015382030
ENSG00000002919 0.123684426
ENSG00000002933 0.000431269
ENSG00000003249 0.000119958
...                     ...
ENSG00000284882 0.016523420
ENSG00000284948 0.465445666
ENSG00000284977 0.000659367
ENSG00000285417 0.970172528
ENSG00000285444 0.286419134

$`COVID Neg Over Time`
 
 
DataFrame with 5000 rows and 6 columns
                  baseMean log2FoldChange     lfcSE      stat      pvalue
                 <numeric>      <numeric> <numeric> <numeric>   <numeric>
ENSG00000000938 16292.6481      -0.967706  0.188265 -5.140117 2.74567e-07
ENSG00000002586  1719.5175       0.724006  0.196562  3.683353 2.30186e-04
ENSG00000002919   870.6417      -0.101169  0.161486 -0.626486 5.30996e-01
ENSG00000002933   266.6548       1.010147  0.274661  3.677795 2.35259e-04
ENSG00000003249    11.4328       0.647755  0.412732  1.569433 1.16547e-01
...                    ...            ...       ...       ...         ...
ENSG00000284882   121.6850      -0.427594  0.267027  -1.60132 1.09307e-01
ENSG00000284948    38.7903      -0.291125  0.234652  -1.24067 2.14728e-01
ENSG00000284977    73.4571      -1.049690  0.235115  -4.46458 8.02262e-06
ENSG00000285417    25.7393       0.490122  0.420899   1.16446 2.44236e-01
ENSG00000285444    17.3871       0.363210  0.351279   1.03397 3.01152e-01
                       padj
                  <numeric>
ENSG00000000938 2.39613e-05
ENSG00000002586 2.63737e-03
ENSG00000002919 6.61318e-01
ENSG00000002933 2.68493e-03
ENSG00000003249 2.29349e-01
...                     ...
ENSG00000284882 0.218554310
ENSG00000284948 0.353434564
ENSG00000284977 0.000245775
ENSG00000285417 0.386061932
ENSG00000285444 0.446699110

enrichedResultsSigora <- pathwayEnrichment(
    inputList=exampleDESeqResults,
    analysis="sigora",
    filterInput=TRUE,
    gpsRepo="default"
)

head(enrichedResultsSigora)

# A tibble: 6 × 12
  comparison       direction pathwayId pathwayName   pValue pValueAdjusted genes
  <chr>            <chr>     <chr>     <fct>          <dbl>          <dbl> <chr>
1 COVID Pos Over … Up        R-HSA-38… Chemokine … 7.04e-46       7.05e-43 CCR3…
2 COVID Pos Over … Up        R-HSA-19… Immunoregu… 5.46e-45       5.46e-42 CD1A…
3 COVID Pos Over … Up        R-HSA-38… Costimulat… 2.56e-40       2.56e-37 CD28…
4 COVID Pos Over … Up        R-HSA-67… Neutrophil… 7.88e-31       7.89e-28 ABCA…
5 COVID Pos Over … Up        R-HSA-20… Cell surfa… 3.55e-21       3.55e-18 ATP1…
6 COVID Pos Over … Up        R-HSA-14… Alpha-defe… 1.02e-20       1.02e-17 CD4;…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <int>,
#   geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>

If you prefer to test the up- and down-regulated genes together, simply set the option split=FALSE:

enrichedResultsSigoraNoSplit <- pathwayEnrichment(
    inputList=exampleDESeqResults,
    analysis="sigora",
    filterInput=TRUE,
    gpsRepo="default",
    split=FALSE
)

head(enrichedResultsSigoraNoSplit)

# A tibble: 6 × 12
  comparison      direction pathwayId pathwayName    pValue pValueAdjusted genes
  <chr>           <chr>     <chr>     <fct>           <dbl>          <dbl> <chr>
1 COVID Pos Over… All       R-HSA-67… Neutrophil… 1.77e-197      1.77e-194 ABCA…
2 COVID Pos Over… All       R-HSA-90… Interferon… 5.55e-168      5.55e-165 ADAR…
3 COVID Pos Over… All       R-HSA-91… Interferon… 5.87e- 72      5.88e- 69 ADAR…
4 COVID Pos Over… All       R-HSA-19… Immunoregu… 3.44e- 66      3.45e- 63 CD1A…
5 COVID Pos Over… All       R-HSA-87… Interferon… 2.56e- 55      2.56e- 52 FCGR…
6 COVID Pos Over… All       R-HSA-38… Chemokine … 4.94e- 53      4.94e- 50 CCR1…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <int>,
#   geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>

For those who still prefer traditional over-representation analysis, we include the option of doing so by setting analysis="reactomepa", which uses ReactomePA (Yu et al. 2016). When using this method, we recommend providing a gene universe to serve as a background for the enrichment test; here we’ll use all the genes which were tested for significance by DESeq2 (i.e. all genes from the count matrix), converting them to Entrez gene IDs before running the test.

data("mappingFile")
exGeneUniverse <- mappingFile %>% 
    filter(ensemblGeneId %in% rownames(exampleDESeqResults[[1]])) %>% 
    pull(entrezGeneId)

enrichedResultsRpa <- pathwayEnrichment(
    inputList=exampleDESeqResults,
    analysis="reactomepa",
    filterInput=TRUE,
    split=TRUE,
    geneUniverse=exGeneUniverse
)

head(enrichedResultsRpa)

# A tibble: 6 × 12
  comparison       direction pathwayId pathwayName   pValue pValueAdjusted genes
  <chr>            <chr>     <chr>     <chr>          <dbl>          <dbl> <chr>
1 COVID Pos Over … Up        R-HSA-20… Generation… 3.63e-10    0.000000219 PLCG…
2 COVID Pos Over … Up        R-HSA-20… Translocat… 1.51e- 9    0.000000302 HLA-…
3 COVID Pos Over … Up        R-HSA-38… PD-1 signa… 1.51e- 9    0.000000302 HLA-…
4 COVID Pos Over … Up        R-HSA-20… Phosphoryl… 1.22e- 7    0.0000184   HLA-…
5 COVID Pos Over … Up        R-HSA-38… Costimulat… 1.22e- 6    0.000146    CD28…
6 COVID Pos Over … Up        R-HSA-20… TCR signal… 3.61e- 6    0.000321    PLCG…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <dbl>,
#   geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>

In addition to the Reactome database used when setting analysis to “sigora” or “reactomepa”, we also provide over-representation analysis using the Hallmark gene sets from the Molecular Signatures Database (MSigDb). These are 50 gene sets that represent “specific, well-defined biological states or processes with coherent expression” (Liberzon et al. 2015). This database provides a more high-level summary of key biological processes compared to the more granular Reactome pathways.

enrichedResultsHm <- pathwayEnrichment(
    inputList=exampleDESeqResults,
    analysis="hallmark",
    filterInput=TRUE,
    split=TRUE
)

head(enrichedResultsHm)

# A tibble: 6 × 12
  comparison       direction pathwayId pathwayName   pValue pValueAdjusted genes
  <chr>            <chr>     <chr>     <chr>          <dbl>          <dbl> <chr>
1 COVID Pos Over … Up        HEME MET… HEME METAB… 2.69e-32       1.34e-30 SLC4…
2 COVID Pos Over … Up        IL2 STAT… IL2 STAT5 … 5.89e- 4       1.47e- 2 PLPP…
3 COVID Pos Over … Down      INTERFER… INTERFERON… 3.14e-30       1.48e-28 EIF2…
4 COVID Pos Over … Down      INTERFER… INTERFERON… 2.37e-28       5.58e-27 EIF2…
5 COVID Pos Over … Down      INFLAMMA… INFLAMMATO… 6.83e- 9       1.07e- 7 KIF1…
6 COVID Pos Over … Down      TNFA SIG… TNFA SIGNA… 1.75e- 7       2.06e- 6 MXD1…
# ℹ 5 more variables: numCandidateGenes <dbl>, numBgGenes <dbl>,
#   geneRatio <dbl>, totalGenes <int>, topLevelPathway <chr>

7 Plotting pathway enrichment results

Now that we have (a lot of) pathway enrichment results from multiple comparisons, its time to visualize them. The function plotPathways does this by grouping Reactome pathways (or Hallmark gene sets) under parent groups, and indicates if each pathway is up- or down-regulated in each comparison, making it easy to identify which pathways are shared or unique to different DEG lists. Because there are often many pathways, you can split the plot into multiple columns (up to 3), and truncate the pathway names to make the results fit more easily.

Sometimes a pathway may be enriched in both up- and down-regulated genes from the same DEG list (these usually occur with larger pathways). Such occurrences are indicated by a white asterisk where the more significant (lower adjusted p-value) direction is displayed. You can also change the angle/labels of the comparisons, or add the number of DEGs in each comparison below the labels. Lastly, you can specify which pathways or top pathway groups to include for visualization.

pathwayPlots(
    pathwayEnrichmentResults=enrichedResultsSigora, 
    columns=2
)

Let’s again plot the sigora results, this time with the following tweaks:

Only show immune-related pathways based on the top pathway grouping
Change the colour scaling used for the adjusted p-values
Condense the comparison names to better fit onto the plot, and make them display horizontally
Add the number of DEGs used for enrichment below the comparison name
Increase the truncation cutoff value of pathway names for more words

pathwayPlots(
    pathwayEnrichmentResults=enrichedResultsSigora,
    specificTopPathways="Immune System",
    colourValues=c("#440154", "#FDE725"),
    newGroupNames=c("COVID\nPositive", "COVID\nNegative"),
    showNumGenes=TRUE,
    xAngle="horizontal",
    nameWidth=50
)

Joining with `by = join_by(comparison)`

From these results, you can see that while many of the immune system pathways change in the same direction over time in COVID-19 and non-COVID-19 sepsis patients, a few unique ones stand out, mostly related to interferon signaling (“Interferon Signaling”, “Interferon gamma signaling”, “Interferon alpha/beta signaling”, “ISG15 antiviral mechanism”). This likely reflects an elevated early antiviral response in COVID-19 patients that decreased over time, compared to no change in non-COVID-19 sepsis patients.

8 Generating networks from enriched pathways

pathlinkR includes functions for turning the pathway enrichment results from either Reactome-based method (“sigora” or “reactomepa”) into networks, using the overlap of the genes assigned to each pathway to determine their similarity to one other. In these networks, each pathway is a node, with connections or edges between them determined via a distance measure. A threshold can be set, where two pathways with a minimum similarity measure are considered connected, and would have an edge drawn between their nodes.

We provide a pre-computed distance matrix of Reactome pathways, generated using Jaccard distance, but there is support for multiple distance measures to be used. Once this “foundation” of pathway interactions is created, a pathway network can be built using the createPathnet function:

data("sigoraExamples")

pathwayDistancesJaccard <- getPathwayDistances(pathwayData = sigoraDatabase)

Running distance calculations (this may take a while)...

startingPathways <- pathnetFoundation(
    mat=pathwayDistancesJaccard,
    maxDistance=0.8
)

# Get the enriched pathways from the "COVID Pos Over Time" comparison
exPathwayNetworkInput <- sigoraExamples %>% 
    filter(comparison == "COVID Pos Over Time")

myPathwayNetwork <- pathnetCreate(
    pathwayEnrichmentResult=exPathwayNetworkInput,
    foundation=startingPathways
)

There are two options for visualization, the first being a static network:

pathnetGGraph(
    myPathwayNetwork,
    labelProp=0.1,
    nodeLabelSize=4,
    nodeLabelOverlaps=8,
    segColour="red"
)

Nodes (pathways) which are filled in are enriched pathways (i.e. those output by pathwayEnrichment()). Size of nodes is correlated with statistical significance, while edge thickness relates to the similarity of two connected pathways.

Though this type of visualization is useful, we can also display this network using an alternate method that creates an interactive display:

pathnetVisNetwork(myPathwayNetwork)

Nodes within this network can be dragged around and re-positioned. Or hold the SHIFT key while clicking-and-dragging to select (and move) multiple nodes at once. Clicking on a single node will highlight its direct neighbours, and the dropdown at the top-left corner will highlight pathways from a specific category (e.g. Immune/Hemostasis).

8.1 Pathway networks using only DEGs

Another option is to use only DEGs when generating the pathway network information, as outlined below:

candidateData <- sigoraExamples %>%
    filter(comparison == "COVID Pos Over Time") %>% 
    select(pathwayId, genes) %>%
    tidyr::separate_rows(genes, sep=";") %>%
    left_join(
        sigoraDatabase,
        by=c("pathwayId", "genes" = "hgncSymbol"),
        multiple="all"
    ) %>%
    relocate(pathwayId, ensemblGeneId, "hgncSymbol"=genes, pathwayName) %>%
    distinct()

## Now that we have a smaller table in the same format as sigoraDatabase, we
## can construct our own matrix of pathway distances
candidateDistData <- getPathwayDistances(
    pathwayData=candidateData,
    distMethod="jaccard"
)

candidateStartingPathways <- pathnetFoundation(
    mat=candidateDistData,
    maxDistance=0.9
)

candidatesAsNetwork <- pathnetCreate(
    pathwayEnrichmentResult=filter(
        sigoraExamples, 
        comparison == "COVID Pos Over Time"
    ),
    foundation=candidateStartingPathways,
    trim=FALSE
)

pathnetVisNetwork(candidatesAsNetwork, nodeLabelSize=30)

9 Supplemental materials

9.1 Gene-pair signatures

Sigora uses a Gene-Pair Signature (GPS) Repository that stores information on which gene pairs are unique for which pathways. We recommend using the one already loaded with pathlinkR, which is “reaH” (Reactome Human). You can also generate your own GPS repo using Sigora’s own functions and a custom set of pathways (e.g. from another pathway database like GO or KEGG). Please consult the Sigora documentation on how to generate your custom GPS repository.

9.2 Why are there different p-value cut-offs for sigora vs. ReactomePA/Hallmark?

Because there are now multiple gene pairs vs. single genes, the gene-pair “universe” is greatly increased and it is more likely for a result to be significant. Therefore, the cutoff threshold for significance is more stringent (adjusted p-value < 0.001) and a more conservative adjustment method (Bonferroni) is used. For regular over-representation analysis, a less conservative adjustment method (Benjamini-Hochberg) is used with adjusted p-value < 0.05. These are automatically set with filterResults="default". You can adjust these cut-offs by setting filterResults to different values between 0 and 1, or 1 if you want all the pathways (this may be useful for comparing which enriched genes appear in which comparisons, even if the enrichment is not significant).

10 Citations

An AY, Baghela AS, Falsafi R, Lee AH, Trahtemberg U, Baker AJ, dos Santos CC, Hancock REW. Severe COVID-19 and non-COVID-19 severe sepsis converge transcriptionally after a week in the intensive care unit, indicating common disease mechanisms. Front Immunol. 2023;6(14):1167917.

Fabregat A, Sidiropoulos K, Viteri G, Forner O, Marin-Garcia P, Arnau V, D’Eustachio P, Stein L, Hermjakob H. Reactome pathway analysis: a high-performance in-memory approach. BMC Bioinform. 2017;18:142.

Foroushani ABK, Brinkman FSL, Lynn DJ. Pathway-GPS and sigora: identifying relevant pathways based on the over-representation of their gene-pair signatures. PeerJ. 2013;1:e229.

Korotkevich G, Sukhov V, Sergushichev A (2019). “Fast gene set enrichment analysis.” bioRxiv. doi:10.1101/060012, http://biorxiv.org/content/early/2016/06/20/060012

Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

Yu G, He QY. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol Biosyst. 2016;12(2):477-9.

11 Session information

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Vancouver
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] DESeq2_1.46.0               SummarizedExperiment_1.36.0
 [3] Biobase_2.66.0              MatrixGenerics_1.18.0      
 [5] matrixStats_1.4.1           GenomicRanges_1.58.0       
 [7] GenomeInfoDb_1.42.0         IRanges_2.40.0             
 [9] S4Vectors_0.44.0            BiocGenerics_0.52.0        
[11] pathlinkR_1.2.0             dplyr_1.1.4                
[13] BiocStyle_2.34.0           

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_1.8.9         
  [4] shape_1.4.6.1           magrittr_2.0.3          ggtangle_0.0.4         
  [7] magick_2.8.5            farver_2.1.2            rmarkdown_2.28         
 [10] GlobalOptions_0.1.2     fs_1.6.5                zlibbioc_1.52.0        
 [13] vctrs_0.6.5             memoise_2.0.1           ggtree_3.14.0          
 [16] rstatix_0.7.2           tinytex_0.53            htmltools_0.5.8.1      
 [19] S4Arrays_1.6.0          broom_1.0.7             Formula_1.2-5          
 [22] gridGraphics_0.5-1      SparseArray_1.6.0       sass_0.4.9             
 [25] bslib_0.8.0             htmlwidgets_1.6.4       plyr_1.8.9             
 [28] cachem_1.1.0            igraph_2.1.1            lifecycle_1.0.4        
 [31] iterators_1.0.14        pkgconfig_2.0.3         gson_0.1.0             
 [34] Matrix_1.7-1            R6_2.5.1                fastmap_1.2.0          
 [37] GenomeInfoDbData_1.2.13 clue_0.3-65             aplot_0.2.3            
 [40] enrichplot_1.26.1       digest_0.6.37           colorspace_2.1-1       
 [43] patchwork_1.3.0         AnnotationDbi_1.68.0    RSQLite_2.3.7          
 [46] ggpubr_0.6.0            vegan_2.6-8             labeling_0.4.3         
 [49] fansi_1.0.6             httr_1.4.7              polyclip_1.10-7        
 [52] abind_1.4-8             mgcv_1.9-1              compiler_4.4.1         
 [55] bit64_4.5.2             withr_3.0.2             doParallel_1.0.17      
 [58] backports_1.5.0         BiocParallel_1.40.0     carData_3.0-5          
 [61] viridis_0.6.5           DBI_1.2.3               highr_0.11             
 [64] ggforce_0.4.2           R.utils_2.12.3          ggsignif_0.6.4         
 [67] MASS_7.3-61             DelayedArray_0.32.0     rjson_0.2.23           
 [70] permute_0.9-7           tools_4.4.1             ape_5.8                
 [73] R.oo_1.26.0             glue_1.8.0              nlme_3.1-166           
 [76] GOSemSim_2.32.0         grid_4.4.1              reshape2_1.4.4         
 [79] cluster_2.1.6           fgsea_1.32.0            generics_0.1.3         
 [82] gtable_0.3.6            R.methodsS3_1.8.2       tidyr_1.3.1            
 [85] data.table_1.16.2       car_3.1-3               tidygraph_1.3.1        
 [88] utf8_1.2.4              XVector_0.46.0          ggrepel_0.9.6          
 [91] foreach_1.5.2           pillar_1.9.0            stringr_1.5.1          
 [94] yulab.utils_0.1.7       circlize_0.4.16         splines_4.4.1          
 [97] tweenr_2.0.3            treeio_1.30.0           lattice_0.22-6         
[100] bit_4.5.0               tidyselect_1.2.1        GO.db_3.20.0           
[103] ComplexHeatmap_2.22.0   locfit_1.5-9.10         Biostrings_2.74.0      
[106] knitr_1.48              gridExtra_2.3           bookdown_0.41          
[109] xfun_0.49               graphlayouts_1.2.0      visNetwork_2.1.2       
[112] stringi_1.8.4           UCSC.utils_1.2.0        lazyeval_0.2.2         
[115] ggfun_0.1.7             yaml_2.3.10             evaluate_1.0.1         
[118] codetools_0.2-20        ggraph_2.2.1            tibble_3.2.1           
[121] qvalue_2.38.0           BiocManager_1.30.25     ggplotify_0.1.2        
[124] cli_3.6.3               munsell_0.5.1           jquerylib_0.1.4        
[127] Rcpp_1.0.13             png_0.1-8               parallel_4.4.1         
[130] ggplot2_3.5.1           blob_1.2.4              clusterProfiler_4.14.0 
[133] DOSE_4.0.0              tidytree_0.4.6          viridisLite_0.4.2      
[136] sigora_3.1.1            scales_1.3.0            purrr_1.0.2            
[139] crayon_1.5.3            GetoptLong_1.0.5        rlang_1.1.4            
[142] cowplot_1.1.3           fastmatch_1.1-4         KEGGREST_1.46.0

Analyze and visualize RNA-Seq data with pathlinkR

4 November 2024

Contents