Walkthrough · 15 min read · Beginner-friendly · No code required

Your First RNA-Seq Analysis

A walkthrough for biologists with zero coding — using a published cardiac differentiation dataset, in your browser.

Free forever · Browser-based · Real published dataset
TL;DR · The short version
  • RNA-seq measures gene expression across thousands of genes per sample, letting you compare cell types, treatments, or disease states.
  • TransXplorer turns the entire workflow into one configuration screen and one click: load counts, set parameters, hit Run DEG Analysis, and the platform does QC, batch correction, differential expression, pathway enrichment, and visualisation in one go.
  • This walkthrough uses a real published dataset (GSE151427: cardiac vs paraxial mesoderm endothelial cells) so you can see exactly what the outputs look like before trying your own data.
Section 1

What RNA-seq analysis actually answers

Strip away the acronyms and it’s a comparison.

Every cell in your body carries the same DNA, but no two cell types do the same job. A neuron and a hepatocyte share an identical genome and yet behave nothing alike. The difference is which genes are switched on in each cell, and how loudly. That switching pattern — the transcriptome — is what RNA-seq measures.

RNA-seq works by extracting messenger RNA from a sample, fragmenting it, sequencing the fragments, and mapping each fragment back to a gene. The output is a table: roughly 20,000 rows (one per protein-coding gene) and one column per sample, where each cell holds a count of how many reads landed on that gene. From that count matrix flows everything else.
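If it helps to see the shape of that table, here is a minimal Python sketch of a count matrix. You never need to write anything like this to use TransXplorer. The gene symbols are real, but the sample names and counts are invented for illustration, and counts-per-million is only the simplest possible library-size scaling, not the TMM method discussed later.

```python
# Toy count matrix: rows are genes, columns are samples.
# Sample names and counts are invented for illustration.
samples = ["CMEC_1", "CMEC_2", "PMEC_1", "PMEC_2"]
counts = {
    "GATA4":   [850, 910, 12, 20],
    "ALDH1A2": [5, 8, 640, 720],
    "ACTB":    [9145, 11082, 9348, 13260],
}

# Library size = total reads assigned to genes in each sample (column sum).
library_sizes = [
    sum(row[i] for row in counts.values()) for i in range(len(samples))
]

def cpm(gene):
    """Counts per million: scale each count by its sample's library size."""
    return [1e6 * c / n for c, n in zip(counts[gene], library_sizes)]

print(dict(zip(samples, library_sizes)))
print(cpm("GATA4"))
```

The two CMEC columns have different library sizes, which is why raw counts cannot be compared across samples until they are scaled.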

The question almost every RNA-seq experiment is built to answer is the same: which genes differ between two (or more) conditions, and what biology do those differences point to? Tumour versus healthy tissue. Drug-treated versus vehicle-treated. Wild-type versus knockout. Day 6 versus day 8 of a differentiation protocol. Whatever the contrast, the analytical task is the same: find the genes that move, and figure out what story they tell.

None of this is trivial. With ~20,000 genes tested at once, chance differences pile up — if you call a gene “significant” at p < 0.05, you’d expect roughly 1,000 false positives just from rolling dice that many times. Read counts are noisy: the same gene in the same condition can vary two- or threefold across biological replicates for entirely uninteresting reasons. Sequencing is technical: counts depend on library size, GC bias, and which day the samples ran on the machine. And the genes that change biologically may not be the ones with the biggest fold change — transcription factors often move modestly but downstream of them, structural genes shift dramatically.
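The false-positive arithmetic above, and the standard remedy, can be sketched in a few lines of Python. This is an illustration, not TransXplorer's code: it assumes the Benjamini-Hochberg procedure, which is the usual default behind "padj" in DESeq2, edgeR, and limma, and the five p-values are invented.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of unchanged genes that pass a raw p-value cutoff."""
    return n_tests * alpha

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(expected_false_positives(20_000, 0.05))  # about 1,000 genes
raw = [0.001, 0.008, 0.039, 0.041, 0.20]       # invented p-values
print(benjamini_hochberg(raw))
```

In this toy example the two raw p-values just under 0.05 no longer pass once adjusted, which is the behaviour you want when thousands of genes are tested at once.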

Sorting through all of that — normalising for technical noise, modelling biological variability, correcting for multiple testing, translating gene lists into pathways — is exactly what software like TransXplorer is for. It does not do your biology for you. But it turns a count matrix into a ranked, statistically defensible, biologically grouped set of findings you can actually read, without having to write a line of code.

The conceptual core

RNA-seq analysis is differential analysis. Counts in, ranked lists of biology-changing genes out. Everything else — normalisation, batch correction, dispersion estimation, multiple testing, enrichment — is machinery for getting that comparison right.

Section 2

Meet your dataset: GSE151427

22 samples, two cell types, a beautifully clean question.

The dataset you’ll work with comes from Orlova et al. (2014, Arterioscler Thromb Vasc Biol), doi:10.1161/ATVBAHA.113.302598, deposited in NCBI GEO as GSE151427. The authors took human pluripotent stem cells and pushed them down two different developmental paths, both of which produce endothelial cells — the cells that line every blood vessel in your body.

One path goes through cardiac mesoderm and gives rise to CMECs (cardiac mesoderm endothelial cells), the cells that build the vessels feeding your heart. The other goes through paraxial mesoderm and gives rise to PMECs (paraxial mesoderm endothelial cells), which form trunk and limb vasculature. Under a microscope the two populations are indistinguishable: same shape, same markers like CDH5 and PECAM1, same overall endothelial identity. But put them in vivo and they make wildly different vascular beds. Something at the transcriptional level is encoding that fate decision.

The experiment is exactly the comparison you’d design to find it: 11 CMEC replicates and 11 PMEC replicates, harvested across two differentiation timepoints (day 6 and day 8), processed in two sequencing batches. The question is unambiguous — which genes differ between CMECs and PMECs? — and the biology is well enough characterised that you can sanity-check your results against published cardiac master regulators like GATA4, GATA5, HAND2, and MYL7, and paraxial markers like ALDH1A2, FZD10, and CTHRC1. If your pipeline returns those genes at the top of the list, you know the analysis is working.

Accession: GSE151427 (NCBI GEO)
Samples: 22 (11 CMEC + 11 PMEC)
Batches: 2 (day 6 + day 8)
Organism: H. sapiens (hPSC-derived)

TransXplorer ships this dataset as an example so you don’t need to download anything to follow along. From the analysis screen you’ll see a one-click Load Example Counts shortcut that pulls in the count matrix and the sample metadata together. The first thing TransXplorer does is something useful and quietly clever: it parses the sample names and auto-assigns them to groups, so the 22 samples come in already sorted into 11 CMEC and 11 PMEC. You confirm the assignment with one click and you’re ready to configure.
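For the curious, group detection from sample names can be sketched roughly like this. It is an illustration of the idea only: the function name, the sample names, and the matching rule are assumptions, and TransXplorer's actual parser may work differently.

```python
import re

def assign_groups(sample_names, labels=("CMEC", "PMEC")):
    """Map each sample name to the single group label it contains, else None."""
    groups = {}
    for name in sample_names:
        # Split on anything that is not a letter or digit: "cmec-d8" -> CMEC, D8
        tokens = re.split(r"[^A-Za-z0-9]+", name.upper())
        matches = [lab for lab in labels if lab.upper() in tokens]
        # Leave ambiguous or unlabelled samples for a human to resolve.
        groups[name] = matches[0] if len(matches) == 1 else None
    return groups

# Hypothetical sample names, not the real GSE151427 identifiers.
names = ["CMEC_d6_rep1", "PMEC_d6_rep1", "cmec-d8-rep2", "sample_17"]
print(assign_groups(names))
```

Anything the rule cannot place comes back as None rather than a guess, which mirrors why the confirmation step exists: automatic assignment is a convenience, and the final call is yours.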

fig. 1 TransXplorer auto-detected sample groups modal: 22 GSE151427 samples sorted into 11 CMEC and 11 PMEC ready to confirm
TransXplorer parses sample names and auto-assigns groups — 22 samples sorted into 11 CMEC and 11 PMEC, ready to confirm with one click.

You can read the original GEO record at ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE151427 for the full experimental design, library preparation details, and original processed counts.

Section 3

The TransXplorer workflow — configure once, click once

One sidebar. One button. Everything else is downstream.

Most RNA-seq pipelines spread the work across half a dozen scripts, tabs, or notebooks. TransXplorer collapses it into a single screen called Transcriptome Analysis. On the left you have a sidebar where you load data and set parameters; on the right, an empty canvas waiting to fill with results. There is one action button, and it is named exactly what it does.

The full pipeline in one click
Load data: GEO / CSV / FASTQ
Configure: one sidebar
Run DEG Analysis: one button

Step 1 — load your data

Three options live side by side. Import from GEO takes any GSE accession and pulls counts and metadata directly from NCBI. Upload counts accepts a CSV of genes by samples with a separate metadata sheet. Load Example Counts drops in the GSE151427 dataset described above. For this walkthrough, click the example button.

Step 2 — set the parameters

Underneath the data loader, the sidebar exposes the choices that actually shape your analysis. Everything has a defensible default; you only change what you need.

  • DEG analysis package: edgeR (default). Pick DESeq2, edgeR, or limma-voom. All three are well-established, peer-reviewed, and widely cited; any is defensible in a paper. The screenshot below shows edgeR.
  • Normalisation method: TMM. Pairs canonically with your DEG choice: TMM for edgeR, median-of-ratios for DESeq2, voom for limma. Each accounts for library-size differences across samples in a slightly different way.
  • Comparison method: Simple (Two Groups). For this dataset, a straight two-group test. Multi-factor and complex designs are available for experiments with treatment, time, and batch interacting.
  • Reference vs treatment: CMEC → PMEC. Set CMEC as reference (control) and PMEC as treatment (case). This matters for sign interpretation: a positive log2FC will mean “higher in PMEC” and a negative one “higher in CMEC.”
  • Significance thresholds: padj < 0.05, |log2FC| > 2. A focused, strongly-changing set. Loosen |log2FC| to 1 if you want a broader, hypothesis-generating screen; tighten padj to 0.01 for a stricter, higher-confidence set.
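The sign convention in the reference-vs-treatment setting is easy to get backwards, so here is a small Python sketch of it. It is deliberately naive: real packages such as edgeR estimate log2FC from a fitted statistical model rather than from raw group means, and the expression values and the pseudocount of 1 are assumptions made for illustration.

```python
import math

def log2_fold_change(reference, treatment, pseudocount=1.0):
    """log2(mean treatment / mean reference); positive = higher in treatment."""
    ref_mean = sum(reference) / len(reference)
    trt_mean = sum(treatment) / len(treatment)
    # The pseudocount avoids dividing by zero for unexpressed genes.
    return math.log2((trt_mean + pseudocount) / (ref_mean + pseudocount))

def linear_fold(log2fc):
    """Convert a log2 fold change into an unsigned linear fold difference."""
    return 2 ** abs(log2fc)

cmec = [850, 910, 880]   # reference group (invented normalised values)
pmec = [15, 20, 25]      # treatment group (invented normalised values)

lfc = log2_fold_change(cmec, pmec)
print(lfc)               # negative, so this gene is higher in CMEC
print(linear_fold(2))    # the |log2FC| > 2 cutoff means at least 4-fold
print(linear_fold(1))    # loosening to 1 means at least 2-fold
```

Swapping which group is the reference flips every sign but changes nothing else, which is why it is worth fixing the convention before you read any plot.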

Step 3 — click the button

Underneath the parameters is a single primary action: Run DEG Analysis. This is the only button you need to click. It triggers the full pipeline: quality control on every sample, automatic batch-effect detection across four metrics (PVCA, kBET, silhouette, combined), limma::removeBatchEffect correction if the metrics flag batch as a problem, the differential expression test you configured, pathway enrichment on the resulting DEG list, and the full visualisation suite. End-to-end takes roughly 30 to 90 seconds, depending on dataset size.
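To give a feel for what batch correction does, here is a heavily simplified Python sketch that shifts each batch to a common mean for one gene. This is not what limma::removeBatchEffect does internally: limma fits a linear model and can protect the group effect through a design matrix. Plain mean-centring like this is only reasonable when groups are balanced across batches, as they are in the invented values below.

```python
def center_batches(values, batches):
    """Shift each batch so its mean equals the gene's overall mean.

    values  : log-expression of ONE gene, one number per sample
    batches : batch label for each sample, in the same order
    """
    overall = sum(values) / len(values)
    by_batch = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]

# Invented log2 expression: batch "B" reads 2 units higher across the board.
values  = [5.0, 1.0, 7.0, 3.0]
groups  = ["CMEC", "PMEC", "CMEC", "PMEC"]   # balanced across batches
batches = ["A", "A", "B", "B"]

corrected = center_batches(values, batches)
print(corrected)
```

After correction the two CMEC samples agree with each other, the two PMEC samples agree with each other, and the CMEC-versus-PMEC gap is untouched, which is the outcome a batch correction is aiming for.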

fig. 2 The TransXplorer configuration sidebar showing edgeR, TMM normalization, CMEC reference vs PMEC treatment, and significance thresholds padj < 0.05 and |log2FC| > 2 all set in a single panel
Set parameters once — method, normalisation, groups, thresholds — all in one sidebar.
fig. 3 TransXplorer auto-generated batch report with PVCA 34.4 percent, kBET 0.469, Silhouette 0.371, combined score 0.559, Excellent correction verdict, and batch clustering reduced by 84.4 percent after limma removeBatchEffect
TransXplorer auto-detects batch effects (here: PVCA 34.4%, kBET 0.469, Silhouette 0.371, combined 0.559) and applies limma::removeBatchEffect — reducing batch clustering by 84.4%.
Why one button matters

The difference between a tool that requires a methods chapter to operate and one that lets a biologist focus on the biology is exactly this: collapsing a six-tab click-through into a single configured run. The pipeline did not get simpler — QC, batch correction, dispersion estimation, multiple-testing correction, and enrichment all still happen — it just stopped asking you to babysit each step.

Section 4

Reading your results — the three views that matter

A volcano, an enrichment chart, and a heatmap. That’s most of what you need.

When the run finishes you’ll see a multi-panel results view: tables, plots, downloads. Three of those panels carry most of the information you actually want, and learning to read them is most of what learning RNA-seq analysis is about.

4a. The volcano plot — where the action is

A volcano plot is the single most useful chart in differential expression. Every dot is one gene. The x-axis is log2FC — how much the gene changed, on a log scale. The y-axis is -log10(padj) — how statistically convinced we are that the change is real. Bigger movements push dots outward; stronger statistics push dots upward. The most interesting genes end up in the top corners.

Because we set CMEC as the reference and PMEC as the treatment, the sign convention is straightforward: negative log2FC (left, blue) means higher in CMEC, and positive log2FC (right, red) means higher in PMEC. Dashed lines mark the thresholds you set in the sidebar — vertical lines at log2FC = ±2, a horizontal line at padj = 0.05. Anything past both lines is on your DEG list. Anything in the bottom-middle blob is statistical noise: small effects, weak evidence, mostly housekeeping.
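The placement of a dot on the volcano is simple enough to sketch in Python. This is an illustration of the plot's logic rather than TransXplorer's implementation; the function name is made up, the thresholds are the ones set in the sidebar, and the third gene below is invented.

```python
import math

def volcano_point(log2fc, padj, lfc_cut=2.0, padj_cut=0.05):
    """Return (x, y, label) for one gene on a volcano plot."""
    x = log2fc
    y = -math.log10(padj)          # smaller padj means a higher dot
    if padj < padj_cut and log2fc > lfc_cut:
        label = "up in PMEC"       # right of +2 and above the padj line
    elif padj < padj_cut and log2fc < -lfc_cut:
        label = "up in CMEC"       # left of -2 and above the padj line
    else:
        label = "not significant"
    return x, y, label

print(volcano_point(-5.75, 9.7e-15))   # GATA4, values from the results table
print(volcano_point(7.24, 1.1e-07))    # ALDH1A2, values from the results table
print(volcano_point(0.4, 0.6))         # an invented gene in the central blob
print(-math.log10(0.05))               # height of the horizontal dashed line
```

Because the y-axis is a negative logarithm, the horizontal threshold at padj = 0.05 sits at a height of about 1.3, and every step of 1 upward is another tenfold drop in the adjusted p-value.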

The fastest way to gut-check an analysis is to look at what’s in the top corners and ask: do these genes match the biology I expect?

Hover over the top-left for this dataset and the cardiac master regulators land in your lap. GATA4 sits at log2FC = -5.75 with padj ≈ 1×10^-14: a roughly 54-fold higher level in CMECs (2^5.75 ≈ 53.8), backed by an adjusted p-value small enough to leave very little room for chance. GATA5, MYL7, HAND2, and SMAD6 sit right next to it. Each one is a canonical cardiac transcription factor, structural protein, or signalling regulator. The biology isn’t whispering; it’s shouting.

Hover the top-right and you get the paraxial mesoderm story. ALDH1A2 at log2FC = +7.24 — a retinoic-acid synthesising enzyme that’s a textbook paraxial-trunk marker. FZD10 at +5.52, a Wnt receptor; CTHRC1 at +4.33, a Wnt-pathway modulator; TNFRSF19 at +4.24. Wnt signalling everywhere, exactly what you’d expect for trunk mesoderm patterning.

The genes that came back are the genes that should have come back. That’s how you know the analysis worked.
fig. 4 Static volcano plot from TransXplorer showing GATA4 with a hover tooltip at log2FC = -5.75 and adjusted p-value = 3.48e-11, blue downregulated genes on the left, red upregulated genes on the right, with threshold dashed lines at log2FC = +/-2
The volcano plot in the TransXplorer results view, hover-tooltip on GATA4 visible. log2FC = -5.75, padj = 3.48e-11 — one of the strongest cardiac signals you’ll ever see in a screen.

Now hover over your own version

The static screenshot above is the panel inside the TransXplorer app. The plot below is the same data, embedded live so you can hover, zoom, pan, and toggle direction groups. Try it: hover over the top-left dot for GATA4, drag-select a region to zoom, click the legend entries to hide and reveal cell-type groups.

Live: hover any point to see the gene, log2FC, and adjusted p-value. Click the legend to toggle direction groups. Top hits are labelled.

Gene      log2FC  padj     Direction   Biology
GATA4     -5.75   9.7e-15  up in CMEC  Cardiac master transcription factor
GATA5     -5.83   1.0e-13  up in CMEC  Endocardial / cardiac TF
MYL7      -5.95   9.1e-12  up in CMEC  Atrial myosin light chain
SMAD6     -5.64   3.8e-15  up in CMEC  BMP-pathway inhibitor, cardiac
ALDH1A2   +7.24   1.1e-07  up in PMEC  Retinoic acid synthesis, paraxial
FZD10     +5.52   2.8e-12  up in PMEC  Wnt receptor
CTHRC1    +4.33   3.8e-15  up in PMEC  Wnt-pathway modulator
TNFRSF19  +4.24   3.8e-14  up in PMEC  TROY, neural / paraxial

4b. Pathway enrichment — what biology emerged

A list of gene symbols, even a great one, is just a list. The interpretive jump happens when you ask whether those genes share a function. That’s pathway enrichment: take your DEG list, intersect it with curated gene sets that represent known biological processes, and find the sets that are unexpectedly over-represented.

TransXplorer runs enrichment automatically as part of Run DEG Analysis — by default against the Gene Ontology Biological Process database (GO:BP). You don’t configure it, you don’t click anything extra; when the volcano appears, the enrichment panel appears alongside it.

For this dataset, the top hit is the one you want to see: Heart Development (GO:0007507), driven by CMEC-up genes. Right behind it sit Cardiac Muscle Tissue Development, Negative Regulation of Multicellular Organismal Process, and various branches of MAPK-cascade regulation. In one panel, the platform has translated “here are 212 differentially expressed genes” into “these cells differ in their heart-development programme.” That sentence is the entire point of running an enrichment.

fig. 5 Top 15 GO Biological Process enrichment results from TransXplorer with Heart Development GO:0007507 as the top hit, followed by Cardiac Muscle Tissue Development, MAPK Cascade Regulation, and other cardiovascular and developmental pathways
Top 15 pathways enriched in CMEC-up genes. Heart Development (GO:0007507) is the top hit — exactly what you’d expect for cardiac-lineage cells.

A small subtlety worth understanding: TransXplorer’s default is over-representation analysis (ORA), which tests whether your significant DEG list is enriched for a pathway compared to the background of all measured genes. There’s a sister method, GSEA, that uses every gene ranked by fold change and is more sensitive to coordinated subtle shifts. For a deep dive on when to pick which, see the methods page linked below.
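Over-representation analysis boils down to one question: if you drew your DEG list at random from the background, how often would it overlap a pathway this much? A common way to answer it is the hypergeometric test, sketched below in Python. Whether TransXplorer uses exactly this test is an assumption. The 212 DEGs come from this walkthrough, while the background of 20,000 genes, the pathway size of 500, and the overlap of 25 are invented for illustration.

```python
from math import comb

def ora_pvalue(background, pathway_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn without replacement
    from the background and pathway_size of them belong to the pathway."""
    upper = min(pathway_size, deg_count)
    total = comb(background, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Overlap expected by chance alone: 212 * 500 / 20000 = 5.3 genes.
expected_overlap = 212 * 500 / 20_000
p = ora_pvalue(background=20_000, pathway_size=500, deg_count=212, overlap=25)
print(expected_overlap, p)
```

Seeing 25 pathway genes where chance predicts about 5 gives a very small p-value. In practice that p-value is then adjusted across all the pathways tested, for the same multiple-testing reason that applies to genes.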

4c. The heatmap — visual confirmation

The third panel is your sanity check. TransXplorer takes the top 50 DEGs and clusters both the rows (genes) and the columns (samples), producing a heatmap in which the colour of each cell shows expression level (red high, blue low). Above the columns sit two annotation tracks: one showing Group (CMEC vs PMEC) and one showing Batch (day 6 vs day 8).

You’re looking for two things. First, do the samples cluster cleanly by group? The column dendrogram should split into two big branches, one all-CMEC and one all-PMEC. If a CMEC sample wanders into the PMEC cluster, you have a labelling problem or an outlier. Second, do the gene blocks pop? The CMEC-up half of the gene list should glow red on the CMEC columns and blue on the PMEC columns. The PMEC-up half should do the opposite. Sharp diagonal contrast is what you want.

The Batch row is where you confirm that batch correction actually worked. If batch were still driving variation, the column clustering would split by day rather than by cell type — day-6 CMECs would cluster with day-6 PMECs against day-8 of both. Here, the Batch row is scrambled across the column dendrogram, which is exactly what you want: limma::removeBatchEffect did its job and the biology took over.
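The same "group, not batch" check can be expressed as a number. The Python sketch below is a crude stand-in for reading the dendrogram, not the hierarchical clustering the heatmap actually uses: for each sample it finds the most similar other sample by Pearson correlation and asks whether the pair shares a label. The expression profiles and sample names are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def neighbour_agreement(profiles, labels):
    """Fraction of samples whose most-correlated other sample shares a label."""
    names = list(profiles)
    hits = 0
    for s in names:
        others = [t for t in names if t != s]
        best = max(others, key=lambda t: pearson(profiles[s], profiles[t]))
        hits += labels[s] == labels[best]
    return hits / len(names)

# Invented profiles over four genes (two CMEC-up, then two PMEC-up).
profiles = {
    "s1": [9.0, 8.5, 1.0, 1.5], "s2": [8.8, 9.1, 1.2, 0.9],
    "s3": [1.1, 0.8, 8.9, 9.3], "s4": [0.9, 1.3, 9.2, 8.7],
}
group = {"s1": "CMEC", "s2": "CMEC", "s3": "PMEC", "s4": "PMEC"}
batch = {"s1": "day6", "s2": "day8", "s3": "day6", "s4": "day8"}

print(neighbour_agreement(profiles, group))   # high: samples pair up by group
print(neighbour_agreement(profiles, batch))   # low: batch is not driving it
```

High agreement for group alongside low agreement for batch is the numerical version of a heatmap whose Group row is clean and whose Batch row is scrambled.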

fig. 6 Heatmap of top 50 DEGs from TransXplorer with CMEC and PMEC samples clustering separately on the column dendrogram, Group annotation row showing clean separation, Batch annotation row showing day 6 and day 8 scrambled across both clusters indicating successful batch correction
Top 50 DEGs clustered. CMEC and PMEC samples separate cleanly on the column dendrogram (Group annotation row). The Batch annotation row (day 6 / day 8) is scrambled across the clusters — batch is no longer driving variation. Output is publication-ready vector PDF.
How to use these three views together

Volcano tells you which genes moved. Enrichment tells you what those genes do collectively. Heatmap tells you whether you can trust the result. If all three agree — identity-defining genes at the top of the volcano, a relevant pathway leading the enrichment, and a heatmap that splits by biology rather than batch — you have a result worth writing up.

Section 5

What’s next

Bring your own data, go deeper on the methods, or open the full in-app tour.

Try it with your own data

The example dataset exists so you can see what the outputs look like before committing time to your own. When you’re ready, three on-ramps lead into TransXplorer’s analysis screen. Counts CSV is the fastest: drop in a CSV of genes × samples with a separate metadata file describing sample groups. GEO import takes any GSE accession — type GSE12345, hit fetch, and TransXplorer pulls the counts and metadata from NCBI directly into the analysis. FASTQ processing is for when you’ve received raw reads from a sequencing facility: a separate FASTQ Processing tab runs Salmon to align and quantify, then deposits the count matrix straight into the analysis screen.
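Before uploading your own files, it can save a round trip to check that the counts table and the metadata sheet describe the same samples. The Python sketch below shows one way to do that. The column names ("gene", "sample", "group") and the file layout are assumptions made for illustration, so check TransXplorer's own upload instructions for the exact format it expects.

```python
import csv
import io

# Invented miniature files; real ones would be read from disk.
counts_csv = """gene,CMEC_1,CMEC_2,PMEC_1,PMEC_2
GATA4,850,910,12,20
ALDH1A2,5,8,640,720
"""
metadata_csv = """sample,group
CMEC_1,CMEC
CMEC_2,CMEC
PMEC_1,PMEC
PMEC_2,PMEC
"""

def check_inputs(counts_text, metadata_text):
    """Return a list of problems; an empty list means the two files agree."""
    rows = list(csv.reader(io.StringIO(counts_text)))
    header, body = rows[0], rows[1:]
    count_samples = header[1:]          # first column holds gene names
    meta = {r["sample"]: r["group"]
            for r in csv.DictReader(io.StringIO(metadata_text))}
    problems = []
    for s in count_samples:
        if s not in meta:
            problems.append(f"{s} has counts but no metadata row")
    for s in meta:
        if s not in count_samples:
            problems.append(f"{s} has metadata but no counts column")
    for row in body:
        # Raw counts should be non-negative whole numbers.
        if not all(v.isdigit() for v in row[1:]):
            problems.append(f"{row[0]} has non-integer counts")
    return problems

print(check_inputs(counts_csv, metadata_csv))
```

Mismatched sample names between the two files are among the most common reasons an upload fails or groups come out wrong, so a check like this is worth the thirty seconds.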

Go deeper on the concepts

This walkthrough kept things at the level of “what you’re looking at and what it means.” The companion concept pages do the methods-level deep dives: what the statistical models actually do, when to pick one over another, what the failure modes look like.

Want the complete in-app walkthrough?

Every panel, every option, every export — with the actual TransXplorer interface in front of you. Open TransXplorer and click Tutorial in the navbar.

If TransXplorer helps your research, please cite us

The preprint is open access on bioRxiv. A peer-reviewed version is in submission.

Verma VM, Oler E, Syed H, Han S, Berjanskii M, Mason AL, Wishart DS, Wong GK. TransXplorer: An automated translational discovery platform for RNA-seq data. bioRxiv. 2026. doi:10.64898/2026.05.15.724657

Frequently asked questions

Do I need to install anything?
No. TransXplorer runs entirely in your browser. There’s nothing to install — no R, no Python, no Docker, no command line. Open the URL, you’re in.
Can I use my own data?
Yes. Three options: a counts CSV (genes × samples plus a metadata sheet), a GEO accession (any GSE ID fetched directly from NCBI), or raw FASTQ processed through the FASTQ Processing tab (Salmon under the hood).
How big can my dataset be?
Typical bulk RNA-seq experiments with 10–200 samples run in seconds to a few minutes. Very large cohorts (500+ samples) take longer but are supported. The bottleneck is usually the enrichment step on large gene sets, not the DE itself.
What if I have single-cell data?
TransXplorer is built for bulk RNA-seq. Single-cell needs fundamentally different statistical methods (sparse counts, cell-level rather than sample-level inference). A sister tool called CellScope is in development for single-cell workflows.
Can I cite TransXplorer in a paper?
Yes — please do. Preprint: doi:10.64898/2026.05.15.724657. Citing helps others find the tool and supports continued development.
Where can I get help?
Open an issue on the GitHub repository for bugs or feature requests, or email varinde2@ualberta.ca for direct support.