Exploring the use of AI Foundation Models

Dear all -

Our AI/ML digital twin subgroup is actively exploring the use of AI Foundation Models. I want to post this topic here as a placeholder before thinking/planning for a larger project. I appreciate your input and any experiences that you can share.

Here is a list of some interesting models (this is an expanding list that I will curate and keep up-to-date here in this post):

  1. TabPFN: Accurate predictions on small data with a tabular foundation model | Nature

  2. Evo2: Evo 2: DNA Foundation Model | Arc Institute

  3. Geneformer: Transfer learning enables predictions in network biology | Nature

  4. scPlantFormer: scPlantFormer: A Lightweight Foundation Model for Plant Single-Cell Omics Analysis | Research Square

  5. RETFound: A foundation model for generalizable disease detection from retinal images | Nature

Questions:

  1. What are the usual or standard inputs for such models, and what are the expected outputs? How flexible are these models across modalities (text, image, time series, tabular, biological sequence)?

  2. What are the expected gains in using such models compared with traditional ML methods? What are you thinking and planning to use them for?

  3. Computational efficiencies (what resources do you need)? Time efficiencies in setting things up?

  4. Any other FMs that you are using? Could you please share?

Thank you all! @AIMLawg

8 Likes

Hi, great thought. I am working along similar lines. We can discuss; let me know.

1 Like

Dear @jgong,

I believe this topic and these questions are very exciting and important for AWG-related projects.

I wanted to leave a message here as a reminder to myself to come back to this discussion. Over the past two years, I have been working mainly on ophthalmology-related foundation models, especially RETFound. I have fine-tuned it, adapted it, used it as an encoder, and redefined its architecture for new tasks. I am also happy to share that, as of 2026, I am now the Principal Investigator of the GlobalRETFound foundation model!

I will share my ideas and responses to your questions here soon, and I hope some great ideas will emerge as other @AIMLawg members contribute to the discussion!

1 Like

I’m also very curious about this topic. Count me in if you’re planning any meetings; I would love to see how I could get involved!

1 Like

I am looking for the group involved in Topor Studies. Can anyone tell me who/where that is?

Thank you for the questions. I am still studying and learning more about these foundation models, so my understanding is still developing.

From what I understand, each model is useful for a different type of data. Geneformer, scGPT and scFoundation seem useful for single-cell omics data. TabPFN seems useful for tabular datasets. Evo 2 is more related to biological sequences, and RETFound is more focused on medical image data.

What interests me is that these models may help generate hypotheses faster, compare patterns across datasets, and support the identification of possible biological signals. For a deeper study, it seems that they could help analyze cross-tissue molecular signaling in spaceflight data, especially possible links between peripheral tissues and brain-related responses.

At this stage, I would probably start with pretrained models or embeddings rather than training anything from scratch, because I am still learning the technical requirements.
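To make the "start from embeddings" idea concrete, here is a minimal sketch of what that workflow could look like: keep the pretrained model frozen, take its embeddings as fixed feature vectors, and fit only a very small classifier on top. Everything in it is an assumption for illustration. The `toy_embeddings` helper just generates two synthetic clusters in place of real model output, and the "control" and "flight" labels are hypothetical. A nearest-centroid classifier is used here only because it needs no extra libraries; it is one simple option, not a recommendation from any of the model authors.

```python
import random
from statistics import fmean


def fit_centroids(embeddings, labels):
    """Average the frozen embeddings of each class into one centroid."""
    by_label = {}
    for vec, lab in zip(embeddings, labels):
        by_label.setdefault(lab, []).append(vec)
    return {
        lab: [fmean(column) for column in zip(*vecs)]
        for lab, vecs in by_label.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], vec))


def toy_embeddings(n_per_class, dim=8, seed=0):
    """Placeholder data: two Gaussian clusters standing in for the
    embeddings a real pretrained model would produce (assumption)."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for lab, center in (("control", 0.0), ("flight", 3.0)):
        for _ in range(n_per_class):
            vectors.append([rng.gauss(center, 1.0) for _ in range(dim)])
            labels.append(lab)
    return vectors, labels


# Fit on one set of "embeddings", evaluate on a held-out set.
train_X, train_y = toy_embeddings(20, seed=0)
test_X, test_y = toy_embeddings(10, seed=1)

centroids = fit_centroids(train_X, train_y)
correct = sum(predict(centroids, v) == t for v, t in zip(test_X, test_y))
accuracy = correct / len(test_X)
print(f"held-out accuracy on toy embeddings: {accuracy:.2f}")
```

With a real model, only `toy_embeddings` would change: it would be replaced by whatever call extracts embeddings from the pretrained model, while the small classifier on top stays cheap enough to run on a laptop.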

I am also using ChatGPT as a support tool to help me organize information, validate my understanding, compare models, and build a clearer base before moving into deeper technical work. I have also started using Grok, but I have not yet felt the same level of precise interaction as I do with ChatGPT. This may be because I have been using ChatGPT for longer, and much of my study background and previous information is already more organized here. I feel that, to fully migrate to Grok, I would need to integrate much more of my profile and study context there. However, I have already started making some comparisons between the two tools.

I also use ChatGPT to translate texts from Portuguese to English and to help me formulate clearer and more appropriate responses.

3 Likes

Hi,

Here I want to share a foundation model that I think will become an important resource soon 🙂

https://doi.org/10.1126/science.aec8514

They pretrained a model on 112 million cell transcriptomes across 12 species, with an evolutionary span of up to 1.53 billion years.

Then they evaluated what the model had learned on several classification tasks in a zero-shot setting, that is, without task-specific fine-tuning.

I think foundation models and datasets like these will become core infrastructure and will be treated as revolutionary tools, on the level of whole-genome sequencing or MRI.

1 Like