In February I argued that statistics and machine learning, as empirical methods, cannot correct for the biases embedded in the data they consume. The correction must come from outside the dataset—from context. But context is a word that hides a great deal of work. It is not a handful of literature references or a pathway database. It is an organized conceptual framework, built up over decades of observation, and it only becomes useful to others when it has been written down.
This month, we are releasing Omics Data Science (Edition 2026.1)—a 391-page textbook that is our first serious attempt to write that framework down.
The rest of this letter is about why a book like this is needed now, and what "writing down the framework" actually means.
Twenty years ago, the scientist who analyzed an omics dataset was, by necessity, the scientist who understood the method. Tools were scarce enough that each tool came bundled with its own training path.
Today the reverse is true. Tools have proliferated, interfaces have matured, and a graduate student can produce a PCA plot, a volcano plot, or a differential abundance table within an hour of their first exposure. What has not proliferated at the same rate is the conceptual framework that tells you when a PCA is capturing technical variation rather than biology, when a volcano plot's cut-off is hiding a real signal, and when a differential abundance test is answering the wrong question. That framework used to be transmitted informally, through supervisors, reading groups, and years of quiet frustration. The transmission was slow, unreliable, and tied to where and with whom you happened to train.
It is tempting to assume that AI will close this gap—that as automation improves, the need for methodological understanding will fade. The opposite is true.
When an automated pipeline selects a normalization, runs a correction, chooses a threshold, and delivers a result, it does not remove the interpretive burden; it buries it. The researcher ends up holding a conclusion whose provenance they cannot easily inspect, and the pressure to "just trust the output" grows with every improvement in the underlying model. This is exactly the failure mode that the "auto-pilot with full control" design described in January was meant to prevent. But control is only usable if the researcher has the vocabulary to read what the tools are showing them.
The book is not a software manual and not a catalog of algorithms. It is an ordered account of how modern omics analysis actually works—what each step is for, why it is there, what the alternatives look like, and how the disciplines relate to one another. The chapters are structured so that every workflow step in our platforms—MetaboAnalyst, ExpressAnalyst, MicrobiomeAnalyst, ProteoAnalyst, NetworkAnalyst, OmicsNet, OmicsAnalyst—has a principled explanation the reader can recover.
The table of contents is, in effect, the skeleton of the conceptual map that the February letter pointed to.
Tools without context are dangerous. Context without tools is academic. January described the tools, February argued for the context, and this month the context takes a form you can hold. The letters to come will describe how the two begin to weave back together—how an analysis session can carry the reasoning of the book with it, so that a researcher is not only handed a result but given the explanation that makes it meaningful.