In this module we are working on data analysis, programming languages, exploratory data analysis, and functional annotation. But it may be worth raising a very timely question: how might foundation models change the way we analyse and interpret omics data?
In this context, foundation models are large-scale models trained on massive biological datasets, such as DNA sequences, transcriptomes, proteomes, or single-cell profiles, to learn general representations that can later be reused across many different tasks. In recent years they have started to be applied in genomics, functional annotation, variant prediction, and single-cell analysis, and they are increasingly presented as a new layer of infrastructure for bioinformatics.
This raises an interesting question: if these models can learn complex patterns at scale, could they transform tasks such as functional annotation, variant interpretation, multi-omics integration, or even part of exploratory data analysis? There are already studies showing promising results in genome annotation at single-nucleotide resolution and in the use of pretrained models for diverse downstream tasks with limited additional data.
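To make the "reuse with limited additional data" idea concrete, here is a minimal sketch of how such transfer typically looks in practice: a pretrained DNA foundation model serves as a frozen feature extractor, and a small classifier is trained on its embeddings from only a handful of labelled sequences. The checkpoint name, the mean-pooling step, and the toy sequences and labels are illustrative assumptions (based on publicly documented usage of the Nucleotide Transformer family), not a prescription from the papers cited below.

```python
# Sketch: pretrained DNA foundation model as a frozen feature extractor.
# Assumes the transformers and scikit-learn packages; the checkpoint id
# below is an assumption -- substitute whichever model you actually use.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
from sklearn.linear_model import LogisticRegression

CHECKPOINT = "InstaDeepAI/nucleotide-transformer-500m-human-ref"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForMaskedLM.from_pretrained(CHECKPOINT)
model.eval()

def embed(sequences):
    """Mean-pool the last hidden layer into one vector per sequence."""
    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**batch, output_hidden_states=True)
    hidden = out.hidden_states[-1]                # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)  # zero out padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Toy labelled set: even a few examples can probe whether the pretrained
# representation already separates the classes of interest.
seqs = ["ATGCGTACGTTAGC", "GGGCCCTTTAAACG"]  # placeholder sequences
labels = [1, 0]                              # e.g. enhancer vs background
clf = LogisticRegression().fit(embed(seqs), labels)
```

The point of the sketch is the division of labour: the expensive pretraining is amortised once, while the task-specific part stays small enough to inspect and validate with the classical statistics this module covers.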
At the same time, an important caution emerges: greater predictive power does not automatically mean greater biological understanding. Recent benchmarking studies show real promise, but also make clear that major challenges remain in interpretability, rigorous evaluation, generalization, and biological reliability.
So perhaps the question is not only whether foundation models will change bioinformatics, but also what role expert knowledge will continue to play.
- Will EDA, statistical validation, and classical functional annotation remain just as important?
- Or will some of these tasks increasingly be mediated by ever more powerful pretrained models?
- And above all, how do we prevent a very powerful tool from also becoming a new black box?
References
· Dalla-Torre H, et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nature Methods (2025).
· de Almeida BP, et al. Annotating the genome at single-nucleotide resolution with DNA foundation models. Nature Methods (2025).
· Feng H, et al. Benchmarking DNA foundation models for genomic and genetic tasks. Nature Communications (2025).
· Tomaz da Silva P, et al. Nucleotide dependency analysis of genomic language models. Nature Genetics (2025).