microbial: The study of microbes
meta: in communities
genomics: and development of the necessary computational instruments to read their genomes (and learn what they do).
I believe that the study of microbial communities and their appreciation as interactive agents hold potential for:
For most of the 20th century, microbes were studied in isolation. Central insights into fundamental and general molecular mechanisms were gained due to the ease of handling and genetic modification of model organisms, such as E. coli or baker’s yeast. While it was obvious that these organisms do not occur in isolation in nature, the vast diversity of microbial communities only became apparent with the application of high-throughput sequencing (from around 2004 onwards): Most microbes in most environments had never been cultured and whole branches of the tree of life had been hidden from science. “Metagenomics” became the discoverer’s ship or the moon rover of microbiology. But in addition to cataloguing what is out there (or in here, in the case of the human microbiome), there were new questions.
What does that world of microbes do with us?
Do microbes, as communities, affect our well-being?
Or that of other organisms, e.g., the plants that we eat?
And do the ecological rules governing these communities determine those effects?
As a start, formal (read: mathematical) descriptions needed to be defined. Because metagenomics and high-throughput marker gene sequencing generate a census of microbial communities, numerical methods from community ecology were imported. Despite a high degree of adaptation to the specific characteristics and meanings of ecological data, those methods share a common inheritance with the statistical methods employed in many other analytical research fields. In addition, microbial ecology inherited minds and methods from omics in cellular biology and the biomedical sciences (I, for example, first practiced transcriptome analysis before I became a microbiome researcher).
In consequence, there is now a lively ecosystem of statistical and machine learning methods that wrangle community matrices (taxa x samples; and less frequently, but increasingly: functions x samples). With their aid, researchers reveal microbiome trends and patterns which are associated with environmental or host characteristics. These patterns can be informative about host health or the status of natural or biotechnological ecosystems. The microbes or combination of microbes in which these patterns are most striking are of interest as marker species or because they can carry special relevance for the microbial processes.
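To make the term "community matrix" concrete, here is a minimal sketch in plain Python. It is only an illustration and not taken from any of the tools or publications mentioned on this page: the taxon names, sample names, and read counts are invented, and the helper functions `relative_abundances` and `shannon_diversity` are hypothetical names chosen for this example. The Shannon index is used as one familiar summary imported from community ecology.

```python
import math

# Hypothetical toy community matrix (taxa x samples); all names and counts
# are invented for illustration and do not come from a real data set.
SAMPLES = ["gut_A", "gut_B", "soil_A"]
COUNTS = {
    "Bacteroides":  [120, 80, 0],
    "Escherichia":  [30, 20, 5],
    "Streptomyces": [0, 0, 45],
}


def relative_abundances(counts, n_samples):
    """Divide each count by its sample (column) total."""
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        taxon: [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for taxon, row in counts.items()
    }


def shannon_diversity(rel, n_samples):
    """Shannon index H = -sum(p * ln p) per sample, skipping absent taxa."""
    return [
        -sum(row[j] * math.log(row[j]) for row in rel.values() if row[j] > 0)
        for j in range(n_samples)
    ]


REL = relative_abundances(COUNTS, len(SAMPLES))
DIVERSITY = dict(zip(SAMPLES, shannon_diversity(REL, len(SAMPLES))))
```

In this toy example the two gut samples have different sequencing totals but identical proportions, so they receive the same diversity value. This is the kind of normalization step that most community-matrix analyses start from.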
Microbiomes play roles in a multitude of processes, many of which have not even been gauged, and a great diversity of microbes may provide the key to steering microbial systems, or at least serve as valuable markers. Applying the collection of microbiome bioinformatics and data analysis methods therefore continues to unearth new findings.
You can roughly divide my microbiome work into three fields:
Here are links to selected related publications:
Microbiome data are not like other data: in contrast to community ecology data, they are the product of omics measurements (or at least high-throughput marker gene sequencing). They are also not the same as genomics (or transcriptomics/proteomics) of single, and in particular model, organisms. Bioinformatics and data analysis for microbiomes may borrow methods from these fields, but they need to meet further challenges. One set of challenges stems from the entanglement of our ability to detect, recognize, and observe a microorganism with our ability to quantify it: microorganisms and parts of genomes that are already known are more easily detected and more precisely quantified, yet these make up only a small part of most microbial systems. Rare organisms or genomic regions are more difficult to describe and to recognize unequivocally, so measurement errors remain large. As with many high-throughput technologies, there is often a trade-off between measurement effort and precision and/or sensitivity, but this dimension is seldom explored in microbiome research and often neglected in interpretations. Similar issues exist for sample processing and measurements: paradoxically, despite the high analytical costs and lengthy computational efforts, microbiome experiments are not usually better designed than others (they frequently lack the controls and replication that would have been part of simpler studies). These challenges can be met if the bioinformatics methods that delineate and detect microbial or functional signatures from the omics data report uncertainties. Data analysis methods that incorporate this information and that are more robust and/or adaptable to very specific experimental designs need to be developed and employed.
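The trade-off between measurement effort and sensitivity can be illustrated with a deliberately simplified calculation. The sketch below assumes that every sequencing read is an independent draw from the community, so a taxon with relative abundance p is seen at least once in N reads with probability 1 - (1 - p)^N. This idealization ignores extraction, amplification, and classification biases, and the function names are hypothetical ones chosen for this example.

```python
import math


def detection_probability(rel_abundance, depth):
    """Probability of observing at least one read of a taxon.

    Assumes reads are independent draws from the community, which is an
    idealization that real sequencing experiments only approximate.
    """
    return 1.0 - (1.0 - rel_abundance) ** depth


def depth_for_detection(rel_abundance, target=0.95):
    """Smallest number of reads at which detection_probability >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - rel_abundance))


# A taxon making up 0.1% of the community, sequenced at increasing depth.
RARE = 0.001
CURVE = {depth: detection_probability(RARE, depth) for depth in (100, 1000, 10000)}
NEEDED = depth_for_detection(RARE)
```

Under these assumptions, a taxon at 0.1% relative abundance needs roughly 3,000 reads per sample to be detected with 95% probability, and detecting it is still far from quantifying it precisely.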
What is most interesting to study within a microbiome? In theory, it is possible to focus on any aspect of a meta-omics survey (including both well-known and new entities, e.g., how many different microbial taxa are present? or which strains of a particular genus of microbes? or how abundant are transcripts with a particular function compared to others? or do microbiomes under condition A change more quickly than under condition B?). Yet, there’s always a risk of aspects outside the focus confounding the view. Choosing these aspects (or features, as a data scientist would say) wisely is difficult, but of paramount importance. It requires both microbiological knowledge (e.g., is information on genus identity helpful or should my data be resolved to strain level? do I need to make a distinction between enzymes that likely operate at different pH or can I lump together all the enzymes in a pathway?) and the abovementioned bioinformatics insights into the certainty with which data at any level can be obtained. In addition, there is no reason why microbiome research should be stuck with traditional features (e.g., species, a concept that is awkward to apply to non-sexually reproducing microbes): advances in bioinformatics open the door for new categories, for example structurally similar proteins or protein surfaces. Once bioinformatics and statistics methods that recognize and handle them are established, these categories may prove relevant in host-microbiome interactions.
Asking single-aspect questions like the ones introduced at the beginning of the previous paragraph and streamlining bioinformatics and data analysis tools for such studies has been very productive. However, this approach is overly simplistic: in human microbiome research, focusing on a single aspect without taking into account the rest of the picture has led to discussions more akin to “is the dress blue or white?” than scientific theory building. It is especially self-limiting in multi-meta-omics studies: here, a common strategy is to process each meta-omics data set with a particular focus (say, metagenomics for which taxa are present, metatranscriptomics for which functions the transcripts have, and metabolomics for how abundant each molecule is) and then hope that data analysis will magically find links that were purposefully disregarded during data processing. Preserving the links that are already present in sequence-based data or making use of existing knowledge on links (e.g., between enzyme classes and metabolites) is a much more promising approach. This strategy should be statistically more powerful, and it can distinguish different mechanisms behind observations and pinpoint functionally important community members. Methods development for both the storage and handling of these interconnected data and their analysis is imperative if we really want to transform microbiome research from a collection of citable curiosities to an impactful science.
When I say “we”, I mean a growing community of researchers with different backgrounds: microbiologists, ecologists, medical researchers, soil scientists, plant scientists, biotechnologists, bioinformaticians, biostatisticians, data scientists, modelers, and the first “microbiome natives” (who learned about microbiomes during their university education) meet in microbiome research. Scientific research often requires insights from multiple perspectives and contributions from several experts. My research is testament to this: if you skim through the list of publications below, you see that I wrote hardly any of them on my own. Moreover, the authors of the vast majority of them come from different places, involving individuals from more than 60 institutes or universities. I am fascinated by their different perspectives and the paradigms in their fields: what kind of questions do they ask? what assumptions do they make? why? can their approach be applied in a different field?
Open Science allows wide access to scientific advances. It prevents misconduct and innocent misunderstandings. For my own research, I use pre-print servers as much as possible, where manuscripts can be freely downloaded even before they have been peer-reviewed. The raw data that I generated during my time in wet-lab science are in public repositories and I enforce publication of data in projects that I lead. The software that I have (co-)developed and maintain is open-source software (see for example my GitHub). Below, you can find links to course materials and to the Science Park Study Group, a group of like-minded colleagues who promote skill sharing and open data in biology and beyond.
I am part of the Biosystems Data Analysis (BDA) group at the Swammerdam Institute for Life Sciences (SILS) of the University of Amsterdam. BDA leads the omics and systems biology data science at SILS, together with its collaboration partners. It blends a wide range of pertinent expertise, including biology, bio-analytics, data processing and mining, data analysis/machine learning, data fusion, and modelling. Our biological data science and methods development makes use of biological knowledge and validated models to limit the computational problems to those parts that can effectively be tackled by algorithms based on realistic data sizes.