Concept guide · 12 min read Intermediate

GSEA vs ORA: Pathway Enrichment Explained

Two ways to turn a gene list into biology. When to use which, illustrated with real examples.

Plain-language stats Worked examples with numbers GO · KEGG · Reactome · MSigDB

ORA (Over-Representation Analysis) and GSEA (Gene Set Enrichment Analysis) are the two dominant methods for translating a list of differentially expressed genes into biological pathways. The difference, in one line: ORA tests your thresholded significant-gene list against pathways using a hypergeometric test; GSEA tests your full ranked gene list using a running-sum statistic. ORA is faster and easier to read; GSEA catches coordinated, subtle shifts that ORA misses by filtering them out.

TL;DR
ORA

Over-Representation Analysis

Asks: “Is my significant-gene list enriched for any pathway?” Uses a hypergeometric test on a thresholded list (e.g. padj < 0.05).

GSEA

Gene Set Enrichment Analysis

Asks: “Are any pathways coordinately up- or down-regulated?” Uses the full ranked gene list — no significance threshold.

ORA is faster and easier to interpret. GSEA catches subtle coordinated changes ORA misses. Many studies report both, then highlight pathways where the two agree.

Section 1 · The problem

A gene list isn’t biology

You ran differential expression, applied your cutoffs, and got back 500 significant genes. Now what? Reading 500 gene symbols one by one is impractical — and even if you did, the human brain can’t hold that many entities at once and pick out a pattern. You need to translate gene IDs into themes a biologist can reason about: inflammation, cell cycle, angiogenesis, fatty-acid metabolism.

That translation is called functional enrichment analysis (sometimes pathway enrichment or gene-set enrichment). The recipe is always the same: take your experimental result, compare it against a curated catalog of gene sets that someone has labeled with biological meaning (a GO term, a KEGG pathway, a hallmark signature), and ask “does my result light up any of these in a non-random way?”

The two methods on this page differ in what they use as input and what statistical test they apply. Everything else — the gene set catalogs, the biology, the way you read the results — is shared.

Why this matters

The phrase “my data is enriched for cell cycle genes” is doing more lifting than it looks. Without enrichment analysis you have a gene list. With it, you have a hypothesis. The method you choose changes which hypotheses you can see.

Section 2 · ORA

ORA — the simple, intuitive approach

Over-Representation Analysis is the older, simpler idea. You start with two lists:

  • Foreground — your significant DEGs (often padj < 0.05 and |log2FC| > 1).
  • Background — every gene you could have detected in this experiment (all expressed genes, or all annotated genes in your reference).

For each pathway in your catalog (say, the GO term vasculature development), you ask: “Of the 412 genes in my foreground, how many are in this pathway? And how many would I expect by chance, given the size of the pathway and the size of my background?” If the observed count is much larger than the expected count, the pathway is over-represented — enriched.

The statistical test is the hypergeometric test (sometimes implemented as a one-sided Fisher’s exact test on a 2×2 table). The math is the same as drawing colored balls from an urn: how surprising is it to draw 47 or more blue balls when only 380 of the 22,000 balls in the urn are blue, and you drew 412 balls total?

[Figure 1 · ORA urn intuition. Background: 22,000 expressed genes, 380 in the pathway (~1.7%). Foreground: 412 significant DEGs, 47 in the pathway (~11.4%), against ~7 expected by chance.]

Figure 1 · ORA intuition. The background contains 380 genes annotated to vasculature development (~1.7%). By chance, your 412-gene DEG list should hit ~7 of them. Instead it hits 47, about 6.6× the expected number. The hypergeometric test quantifies how unlikely that excess is under the null.

Worked example · GSE151427 cardiac vs paraxial mesoderm

You have 412 up-regulated DEGs (padj < 0.05, log2FC > 1) out of 22,000 expressed genes. The GO term vasculature development has 380 annotated genes in the human genome. Of your DEGs, 47 overlap with that term.

Hypergeometric test: p = 1 × 10⁻¹⁵, BH-adjusted FDR = 1 × 10⁻¹³. Translation: there is essentially no chance you’d see this much overlap by drawing 412 genes at random. The cardiac differentiation samples have switched on vasculature programs.
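If you want to see the urn arithmetic spelled out, here is a minimal sketch using only the Python standard library. The function name `hypergeom_upper_tail` is ours, not from any enrichment package, and it computes the exact upper tail P(X ≥ overlap) from the four counts in the worked example. The p-value it prints is the exact tail under those four counts, so it may not match the rounded figure quoted above, which will depend on the tool and background actually used.

```python
from math import comb

def hypergeom_upper_tail(overlap, pathway_size, foreground_size, background_size):
    """P(X >= overlap) when foreground_size genes are drawn without replacement
    from background_size genes, pathway_size of which belong to the pathway."""
    max_overlap = min(pathway_size, foreground_size)
    total = comb(background_size, foreground_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, foreground_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total  # exact integer ratio, converted to float once

# Counts from the worked example above.
background, pathway, foreground, overlap = 22_000, 380, 412, 47

expected = foreground * pathway / background   # overlap expected by chance
fold = overlap / expected                      # fold enrichment
p_value = hypergeom_upper_tail(overlap, pathway, foreground, background)

print(f"expected overlap = {expected:.1f}, fold enrichment = {fold:.1f}")
print(f"upper-tail p = {p_value:.2e}")
```

The first line of output reads `expected overlap = 7.1, fold enrichment = 6.6`. In practice you would run this test once per pathway and then adjust the resulting p-values for multiple testing.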

When ORA is the right choice

Use ORA when

  • You have a clean significant-gene list (clear thresholds like padj < 0.05, |log2FC| > 1).
  • You want a fast, easy-to-explain result (“47 of my 412 DEGs are in this pathway, p = 1e−15”).
  • Your experiment has strong effects — clear winners that show up regardless of threshold.
  • You’re screening many gene sets and need to triage quickly.

Avoid ORA when

  • Your effects are subtle: many pathway genes nudged by log2FC ~ 0.3 in the same direction. They’ll all fail your cutoff individually and the pathway vanishes.
  • You want a threshold-independent answer. Move from padj < 0.05 to padj < 0.01 and your enriched-pathway list can change dramatically.
  • Your DEG list is tiny (< 20 genes). Statistical power is poor, and an overlap of one or two genes can make a small pathway look enriched by coincidence.

Section 3 · GSEA

GSEA — the ranked-list, threshold-free approach

Gene Set Enrichment Analysis was formalized by Subramanian and colleagues in 2005, building on the 2003 diabetes study by Mootha and colleagues. It answered a real frustration: in some diseases (the original case was type-2 diabetes and oxidative phosphorylation), no single gene in a relevant pathway crossed conventional significance thresholds, yet the pathway as a whole was clearly shifted. A threshold-based method like ORA has nothing to work with in that situation. The genes were all whispering in the same direction.

GSEA fixes this by refusing to threshold. You hand it the full ranked list of every gene in your experiment, ordered by some statistic that captures direction and strength — typically log2FC or a signed score like sign(log2FC) × −log10(p). Up-regulated genes pile at the top, down-regulated at the bottom.
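As a rough illustration of the signed score mentioned above, the sketch below builds a ranked list from a handful of made-up results. The gene names, fold changes, and p-values are hypothetical, and the `p_floor` guard is our own addition to avoid taking the log of zero.

```python
import math

def signed_rank_score(log2fc, p_value, p_floor=1e-300):
    """sign(log2FC) * -log10(p); p_floor guards against log10(0)."""
    sign = (log2fc > 0) - (log2fc < 0)
    return sign * -math.log10(max(p_value, p_floor))

# Hypothetical differential-expression results: gene -> (log2FC, raw p-value).
de_results = {
    "GENE_A": (2.1, 1e-8),
    "GENE_B": (-1.7, 1e-6),
    "GENE_C": (0.3, 0.04),
    "GENE_D": (-0.2, 0.60),
    "GENE_E": (0.9, 1e-3),
}

# Sort descending: up-regulated genes at the top, down-regulated at the bottom.
ranked = sorted(
    ((gene, signed_rank_score(lfc, p)) for gene, (lfc, p) in de_results.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for gene, score in ranked:
    print(f"{gene}\t{score:+.2f}")
```

The resulting order is GENE_A, GENE_E, GENE_C, GENE_D, GENE_B. Note that every gene stays in the list, including the non-significant GENE_D; that is the whole point of a threshold-free ranking.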

For each pathway, GSEA walks down the ranked list from top to bottom. Every time it encounters a pathway member, it adds to a running enrichment score. Every time it encounters a non-member, it subtracts a small amount. If the pathway is concentrated at the top of the list (coordinately up-regulated), the running score climbs early and peaks high — that peak is the enrichment score (ES).
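The walk described above can be sketched in a few lines. This is a simplified version of the weighted running sum, assuming the usual convention that each hit is weighted by the absolute value of its ranking score and each miss subtracts a constant step; the function name and the ten-gene toy list are hypothetical.

```python
def running_enrichment(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, running-sum curve) for one gene set.

    ranked_genes must already be sorted from most up-regulated to most
    down-regulated; scores[i] is the ranking statistic of ranked_genes[i].
    """
    members = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(members)
    n_miss = len(ranked_genes) - n_hits
    hit_total = sum(abs(s) ** weight for s, m in zip(scores, members) if m)
    if n_hits == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partly overlap the list with non-zero scores")

    running, curve = 0.0, []
    for score, is_member in zip(scores, members):
        if is_member:
            running += abs(score) ** weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss                       # small step down on a miss
        curve.append(running)

    es = max(curve, key=abs)   # largest deviation from zero, sign preserved
    return es, curve

# Toy ranked list: 10 genes, three pathway members near the top.
genes = [f"g{i}" for i in range(1, 11)]
scores = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]
es, curve = running_enrichment(genes, scores, {"g1", "g2", "g4"})
print(f"ES = {es:.3f}")
```

Because hits sum to +1 and misses sum to −1, the curve always returns to zero at the end of the list; only where it peaks carries information.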

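Statistical significance comes from permutation: shuffle the gene labels (gene-set permutation) or the sample labels (phenotype permutation) thousands of times, recompute the ES under the null each time, and ask how unusual your real ES is. Dividing the observed ES by the mean null ES of the same sign gives the normalized enrichment score (NES), which makes gene sets of different sizes comparable. It is reported alongside a p-value and FDR.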
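To make the permutation step concrete, here is a small sketch of the gene-set flavour: it draws random gene sets of the same size, rebuilds the null distribution of ES, and normalizes against null scores of the same sign. The synthetic scores, the set layout, and the +1 smoothing of the p-value are our assumptions for illustration; real tools differ in details and also estimate an FDR across all gene sets.

```python
import random

def enrichment_score(scores, member_flags):
    """Weighted running-sum ES (weight exponent 1) for one gene set."""
    hit_total = sum(abs(s) for s, m in zip(scores, member_flags) if m)
    n_miss = member_flags.count(False)
    running, best = 0.0, 0.0
    for s, m in zip(scores, member_flags):
        running += abs(s) / hit_total if m else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def gene_set_permutation(scores, member_flags, n_perm=1000, seed=0):
    """Return (ES, NES, p) using random gene sets of the same size as the null."""
    rng = random.Random(seed)
    observed = enrichment_score(scores, member_flags)
    set_size, n_genes = sum(member_flags), len(scores)
    null = []
    for _ in range(n_perm):
        chosen = set(rng.sample(range(n_genes), set_size))
        null.append(enrichment_score(scores, [i in chosen for i in range(n_genes)]))
    # Compare only against null scores on the same side of zero as the observed ES.
    same_sign = [abs(es) for es in null if (es >= 0) == (observed >= 0)]
    if not same_sign:
        raise ValueError("no same-sign null scores; increase n_perm")
    nes = abs(observed) / (sum(same_sign) / len(same_sign))
    nes = nes if observed >= 0 else -nes
    extreme = sum(es >= abs(observed) for es in same_sign)
    p_value = (extreme + 1) / (len(same_sign) + 1)   # +1 avoids reporting p = 0
    return observed, nes, p_value

# Synthetic ranked list: 200 genes with scores from +3 down to -3,
# and a 15-gene set concentrated in the top 30 ranks.
scores = [3 - 6 * i / 199 for i in range(200)]
flags = [i < 30 and i % 2 == 0 for i in range(200)]
observed, nes, p_value = gene_set_permutation(scores, flags)
print(f"ES = {observed:.2f}, NES = {nes:.2f}, p = {p_value:.4f}")
```

Gene-set permutation is cheap and works with any ranked list, but it ignores gene-gene correlation, which is one reason phenotype permutation is preferred when you have enough samples.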

[Figure 2 · GSEA running enrichment. Pathway: vasculature development · NES = 2.41 · FDR < 0.001. Y-axis: running ES (−0.4 to 0.8), peak ES = 0.78 at the leading edge. X-axis: gene rank from 1 (up-regulated in cardiac) to 20,432 (down-regulated).]

Figure 2 · A canonical GSEA plot. The green curve is the running enrichment score as we walk down the ranked gene list from up-regulated (left) to down-regulated (right). Tick marks show where pathway members fall. Members cluster heavily at the top ⇒ the curve climbs fast, peaks early, and the peak height is the enrichment score (ES). The set of genes from rank 1 up to the peak is the leading edge — the genes actually driving the enrichment.

When GSEA is the right choice

Use GSEA when

  • You suspect subtle, coordinated changes (small log2FCs across many pathway genes in the same direction).
  • You’d rather not pick an arbitrary significance threshold and live with its consequences.
  • You have weak global effects and ORA returns “no significant pathways”.
  • You want directional information: GSEA reports up- vs down-regulated pathways separately via the sign of NES.

Avoid GSEA when

  • You only have a thresholded gene list (no log2FCs / no statistics for non-significant genes).
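  • Your sample size is tiny (< 3 per group): sample-label permutation is unreliable with so few samples, which leaves only gene-set permutation, and that ignores gene-gene correlation.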
  • You need to report the result to a non-technical audience: the leading edge concept and NES are harder to explain than “X% of my DEGs are in pathway Y”.
  • You’re running it against very large databases without filtering — default gene-set size cutoffs (15–500) silently exclude a lot.

Section 4 · Side-by-side

ORA vs GSEA at a glance

One table for the whole picture. If you remember nothing else from this page, keep this:

Aspect | ORA | GSEA
Input | Significant-gene list + background | Full ranked gene list (every gene + score)
Statistical test | Hypergeometric / Fisher’s exact | Kolmogorov-Smirnov-like running sum + permutation
Threshold-dependent? | Yes: results change with cutoff | No: uses all genes
Detects subtle coordinated changes? | Poorly: filtered out by threshold | Well: this is what it was built for
Directional? | No (run up- and down-regulated lists separately) | Yes: sign of NES tells you up vs down
Speed | Fast: seconds | Slow: 1–5 min on large databases
Interpretation | “X of my N DEGs are in pathway Y” | “Pathway Y is coordinately shifted; the leading edge is...”
Best for | Clean, strong-effect experiments | Subtle, biologically coordinated effects
Foundational citation | Boyle et al. 2004 (GO::TermFinder) | Subramanian et al. 2005 (GSEA)

The practical answer

Run both. They answer different questions and they fail in different ways. The pathways that show up in both methods are the ones you can lean on. Pathways only in ORA are usually being driven by a handful of high-fold-change genes; pathways only in GSEA are usually coordinated whispers worth looking at in a heatmap before claiming.

Section 5 · Choosing a database

Which gene-set database should I use?

ORA and GSEA are methods. They both need a catalog of gene sets to test against. The four big ones cover most needs.

GO — Gene Ontology

3 branches · 40,000+ terms · most organisms

The default. Three branches: Biological Process, Molecular Function, Cellular Component. Terms are organized hierarchically (vasculature development is-a system development).

Pros Comprehensive, well-curated, available for every model organism.

Cons Heavy redundancy — many overlapping terms appear as “separate” hits. Top-level BP terms (metabolic process, cellular process) are too vague to interpret.

KEGG

Kyoto Encyclopedia of Genes & Genomes

Curated pathway maps with visual diagrams (the classic boxes-and-arrows you’ve seen in textbooks). A few hundred pathways per organism.

Pros Clean, illustrated, biology-focused (metabolism, signaling, disease).

Cons Smaller coverage than GO. License restrictions on commercial use — matters if your tool is hosted commercially.

Reactome

Hierarchical · manually curated · reactions & complexes

Manually curated pathways with explicit reactions, protein complexes, and cross-references. Hierarchical like GO but the leaves are actual molecular events.

Pros High quality, explicit mechanisms, good for translating “pathway” into testable biochemistry.

Cons Smaller than GO. Strong bias toward human and disease biology — weaker in non-mammalian organisms.

MSigDB

Molecular Signatures Database · standard for GSEA

A meta-collection built for GSEA. Key sub-collections: H (50 hallmark gene sets), C2 (curated gene sets, including KEGG and Reactome pathways), C5 (ontology gene sets, GO-derived), C7 (immunologic signatures), C8 (cell type signatures).

Pros Standardized format. The 50 Hallmark gene sets are refined to reduce redundancy and are the gold-standard starting point for many studies.

Cons Strongly human-centric — though mouse orthologs are now provided.

Figure 3 · Database fit matrix (scale: best fit > good > ok > weak > poor)

Use case | GO BP | KEGG | Reactome | MSigDB Hallmark
Exploratory first pass | best | good | ok | best
Mechanism (specific reactions) | weak | best | best | ok
Cancer / oncology | good | good | good | best
Non-model organism | best | ok | weak | poor
Reporting to wet-lab audience | ok | best | good | best

Figure 3 · Which database fits which use case. There’s no single right answer — run two complementary databases (e.g. GO BP for breadth + MSigDB Hallmarks for canonical themes) and compare.

Section 6 · Common pitfalls

Five ways to misread enrichment results

Most enrichment-related mistakes in papers fall into one of these five buckets. Reviewers spot them. Avoid them.

1

“This pathway is significant, therefore it’s biologically important.”

Significance is a statement about how unlikely the observation is under the null — not about magnitude or importance. A pathway with FDR = 1e−30 driven by 200 genes nudged by log2FC 0.3 may matter less biologically than a pathway with FDR = 0.01 driven by 5 genes nudged by log2FC 3. Look at fold changes and biology.

2

“My top pathway is `metabolic process` — this must be metabolic disease.”

GO BP top-level terms (metabolic process, cellular process, biological regulation) cover so many genes that they’re almost always enriched when anything is enriched. They’re too vague to interpret. Filter to mid-depth terms or use a slim subset, and look at the more specific child terms.

3

“I got 50 enriched pathways — the biology is rich.”

Most enrichment results are heavily redundant. angiogenesis, blood vessel development, vasculature development, and cardiovascular system development share most of their genes and will all show up together. They’re one finding, not four. Use semantic-similarity clustering (e.g. simplifyEnrichment) or report only the most specific term per cluster.
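To see why overlapping terms amount to one finding, here is a deliberately simple sketch that groups enriched terms by gene overlap (Jaccard index). This is not the GO semantic-similarity approach used by tools like simplifyEnrichment; it is a cruder stand-in, and the gene memberships and the 0.5 threshold are made up for illustration.

```python
def jaccard(a, b):
    """Shared genes divided by total distinct genes across both sets."""
    return len(a & b) / len(a | b)

def group_redundant_terms(term_genes, threshold=0.5):
    """Greedy single pass: each term joins the first group whose representative
    (the group's first term) it overlaps at or above the threshold."""
    groups = []
    for term, genes in term_genes.items():
        for group in groups:
            representative = group[0]
            if jaccard(genes, term_genes[representative]) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])   # no similar group found: start a new one
    return groups

# Hypothetical enriched terms, listed most significant first.
enriched = {
    "angiogenesis": {"A", "B", "C", "D", "E"},
    "blood vessel development": {"A", "B", "C", "D", "F"},
    "vasculature development": {"A", "B", "C", "D", "E", "F"},
    "cell cycle": {"X", "Y", "Z"},
}

for group in group_redundant_terms(enriched):
    print(f"{group[0]}  (+{len(group) - 1} overlapping terms)")
```

Here the three vascular terms collapse into one group and cell cycle stands alone, so four "hits" become two findings. Passing terms in order of significance means the most significant term in each group ends up as its representative.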

4

“GSEA found nothing significant — there’s no enrichment.”

GSEA can underperform when a pathway is small (say 8 genes), all dramatically changed, but sandwiched among 20,000 unrelated genes. The running-sum walk dilutes the signal. ORA on those 8 genes would scream. Run both methods before claiming “no enrichment”.

5

“I’ll just use the default gene-set size cutoffs.”

The Broad GSEA tool excludes gene sets with fewer than 15 or more than 500 members by default, and other implementations apply similar cutoffs (clusterProfiler defaults to minGSSize = 10 and maxGSSize = 500). That excludes specific TF target sets, rare disease signatures, and many curated immunologic subsets, silently. Check your tool’s size parameters (minGSSize / maxGSSize in clusterProfiler) and adjust to your question.
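One way to make the size filter visible rather than silent is to split your catalog yourself before running anything. The sketch below assumes set size is counted after intersecting with the genes you actually measured; the set names and sizes are hypothetical.

```python
def split_by_size(gene_sets, universe, min_size=15, max_size=500):
    """Return (kept, dropped) dicts mapping set name -> size within the universe."""
    kept, dropped = {}, {}
    for name, genes in gene_sets.items():
        size = len(set(genes) & universe)   # only genes present in your data count
        target = kept if min_size <= size <= max_size else dropped
        target[name] = size
    return kept, dropped

# Hypothetical universe of 1,000 measured genes and three gene sets.
universe = {f"G{i}" for i in range(1000)}
gene_sets = {
    "TF_TARGETS_SMALL": [f"G{i}" for i in range(8)],
    "HALLMARK_LIKE": [f"G{i}" for i in range(100)],
    "BROAD_TERM": [f"G{i}" for i in range(800)],
}

kept, dropped = split_by_size(gene_sets, universe)
print("tested: ", kept)
print("skipped:", dropped)
```

With the 15 to 500 cutoffs, only HALLMARK_LIKE is tested. Listing what was skipped, and why, is a cheap addition to a methods section.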

Reviewer red flag

If your manuscript reports “the top 20 enriched GO terms” without showing fold enrichment, gene overlap, or evidence that they aren’t a redundant cluster of the same parent term — a careful reviewer will ask for a redo.

Section 7 · Decision flowchart

Which method should I use?

Follow the arrows. The honest answer at most leaves is “run both” — but the flowchart tells you where to start.

Figure 4 · Method-choice flowchart

  • Strong, clear DEGs? (>100 genes, |log2FC| > 1)
      • YES: start with ORA. Fast, interpretable, “X of N DEGs in pathway Y”. Databases: GO BP · KEGG · Hallmark.
      • NO / subtle: GSEA preferred. Threshold-free; catches coordinated whispers. Rank by signed -log10(p), then permute.
  • Publishing the result, or using custom gene sets?
      • YES: run BOTH and report agreement (most robust + most reviewer-proof).
      • NO: stick with your initial pick (quick exploration / sanity check).

Figure 4 · Pragmatic decision tree. For publication: run both, then report the union and highlight the intersection. Pathways flagged by both methods are your strongest claims.

Section 8 · In TransXplorer

How TransXplorer handles enrichment

The Pathway Enrichment tab runs both ORA and GSEA side by side in a single panel, so you don’t have to pick blind. Choose your method (or both), choose a database (GO BP / MF / CC, KEGG, Reactome, MSigDB Hallmark / C2 / C5 / C7), and hit run. Results appear as ranked bar charts you can sort, filter, and export.

Behind the scenes: clusterProfiler v4 for both methods, organism backgrounds pulled from AnnotationHub covering ~1,800 organisms, and gene-symbol / ENSEMBL / Entrez auto-detection so you don’t have to convert IDs yourself. Every pathway is clickable — drill down to see exactly which of your DEGs are driving it, with their log2FC and padj inline.

For GSEA, the leading-edge gene list is exported alongside the enrichment statistic. For ORA, the actual overlap (foreground ∩ pathway) is shown explicitly so you can verify the count. Both methods produce side-by-side ranked plots in the same colour scheme as everywhere else in the tool.


FAQ

Should I use Bonferroni or FDR adjustment on enrichment results?
FDR, specifically Benjamini-Hochberg (BH). Bonferroni is far too conservative for pathway-level testing: GO BP alone tests thousands of overlapping terms that are highly correlated. BH is the convention in the field for a reason. Report adjusted p-values (the p.adjust or qvalue columns in clusterProfiler output), not raw p-values.
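For a feel of how much gentler BH is than Bonferroni, here is a minimal sketch of both adjustments on five made-up pathway p-values. The function name is ours; in practice you would rely on your enrichment tool's built-in adjustment.

```python
def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
bh = benjamini_hochberg(raw)
bonferroni = [min(1.0, p * len(raw)) for p in raw]

for p, q, b in zip(raw, bh, bonferroni):
    print(f"raw = {p:<6}  BH = {q:.4f}  Bonferroni = {b:.4f}")
```

On these numbers Bonferroni inflates the third p-value to 0.195 while BH gives about 0.051. With thousands of GO terms instead of five, the gap becomes far larger.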
Why do my enriched pathways change when I tighten my DEG threshold?
That’s the threshold-dependence problem of ORA. Moving from padj < 0.05 to padj < 0.01 changes the foreground gene set, which changes the hypergeometric inputs, which changes the result. This is exactly the kind of fragility GSEA was designed to remove — if you find your ORA results swing with threshold, run GSEA as a sanity check.
Can I use my own custom gene set?
Yes — upload a GMT (Gene Matrix Transposed) file in TransXplorer’s Pathway Enrichment tab. GMT is the standard format MSigDB and GSEA use: one row per gene set, tab-separated, first column is the set name, second column is a description, all subsequent columns are gene symbols. Both ORA and GSEA accept it.
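If you want to sanity-check a GMT file before uploading it, a parser for the layout described above fits in a dozen lines. This is a sketch: the set names and descriptions in the sample are hypothetical, and it silently skips rows with fewer than three columns, which you may prefer to report instead.

```python
import io

def parse_gmt(handle):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ...

    Returns a dict mapping each set name to a set of gene symbols.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 or not fields[0]:
            continue   # blank line, or a row without any genes
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets

# Two hypothetical gene sets plus a trailing blank line.
sample = (
    "HALLMARK_DEMO\thypothetical description\tVEGFA\tKDR\tFLT1\n"
    "CUSTOM_SET\tna\tTP53\tMDM2\n"
    "\n"
)

gene_sets = parse_gmt(io.StringIO(sample))
for name, genes in gene_sets.items():
    print(name, len(genes), sorted(genes))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper. Checking set sizes at this point also tells you which custom sets would fall below a tool's minimum size cutoff.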
Do organism backgrounds matter for ORA?
A lot. The hypergeometric test is sensitive to the size of the universe (background). Using every annotated gene instead of just your expressed genes overstates significance, because much of that “background” could never have been called as a DEG. TransXplorer uses your expression matrix to define the background automatically, scaled to the right organism.
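The effect of the background is easy to demonstrate with the same foreground and overlap tested against two universes. All counts below are hypothetical: 12 of 200 DEGs fall in a pathway that has 100 members among 12,000 expressed genes, or 130 members among 25,000 annotated genes.

```python
from math import comb

def upper_tail(overlap, pathway, foreground, background):
    """Exact hypergeometric P(X >= overlap)."""
    total = comb(background, foreground)
    hits = sum(
        comb(pathway, k) * comb(background - pathway, foreground - k)
        for k in range(overlap, min(pathway, foreground) + 1)
    )
    return hits / total

foreground, overlap = 200, 12

# Background restricted to genes that were actually expressed (hypothetical counts).
p_expressed = upper_tail(overlap, pathway=100, foreground=foreground, background=12_000)

# Inflated background: every annotated gene, expressed or not (hypothetical counts).
p_inflated = upper_tail(overlap, pathway=130, foreground=foreground, background=25_000)

print(f"expressed-gene background: p = {p_expressed:.2e}")
print(f"all-annotated background:  p = {p_inflated:.2e}")
```

The inflated universe lowers the expected overlap, so the same 12 genes look more surprising than they should. Nothing about the biology changed; only the denominator did.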
What about pathway analysis for single-cell data?
Different methods exist that score pathways per-cell instead of per-experiment: UCell, AUCell, decoupleR, VISION. They’re out of scope for this page (which is about bulk RNA-seq), but the same gene-set databases (GO, KEGG, MSigDB) are reused.

Section 10 · Further reading

Foundational and recent references

A short, opinionated reading list. The first two are essential history; the rest are practical.

  • Subramanian A, Tamayo P, Mootha VK, et al. (2005) — Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102(43):15545–15550. doi:10.1073/pnas.0506580102
  • Mootha VK, Lindgren CM, Eriksson KF, et al. (2003) — PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34(3):267–273. doi:10.1038/ng1180 — the paper that motivated GSEA.
  • Boyle EI, Weng S, Gollub J, et al. (2004) — GO::TermFinder — open source software for accessing Gene Ontology information and finding significantly enriched terms in gene lists. Bioinformatics 20(18):3710–3715. doi:10.1093/bioinformatics/bth456
  • Wu T, Hu E, Xu S, et al. (2021) — clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3):100141. doi:10.1016/j.xinn.2021.100141
  • Liberzon A, Birger C, Thorvaldsdóttir H, et al. (2015) — The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Systems 1(6):417–425. doi:10.1016/j.cels.2015.12.004