Hi AWG members,
As many of you may know, there are several bulk and single-cell RNAseq foundation models (FMs) available, such as BulkRNABert, MOJO, BulkFormer, CellWhisperer, scFoundation, and scGPT. I am curious how many of you are using RNAseq FMs in your research. If you are using these or other FMs, please let me know in the comments and tell me how you are using them (i.e., what questions you are asking and what data or information you are getting out of them).
Thanks!
@AWGall
Hello, I’m using the GPT-5.5 model. Since I study behavioral neuroscience and nutraceuticals, I ask it questions like this: "What is the vitamin precursor to serotonin? Could you provide the sources you’ve verified?" Verifying the sources is necessary to make sure they are correct.
Thanks, @Michelle. So in your example, you’re using the GPT-5.5 large language model rather than a specific RNAseq model. Am I understanding your use case correctly?
Hi @asaravia
I use foundation models in ophthalmology, mainly RETFound.
It was trained on a huge number of retinal images using a self-supervised learning approach, and it picks up patterns that are not obvious to us. For reasons that are not fully clear, it outperforms classical models on many tasks.
So when I want to study something with retinal images that has not been studied or discovered yet, I simply pass the image through the RETFound encoder and collect the outputs as embeddings.
Then I work with the embeddings directly. For example, I ask what the biological age of this retina is, or what the heart and kidney status of the person with this retinal image is.
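A minimal sketch of the "work with the embeddings directly" step, assuming each image has already been turned into a fixed-length embedding vector by the encoder (the encoder call is not shown, and the vectors and labels below are made-up placeholders). It predicts a continuous label such as biological age with k-nearest-neighbour regression over cosine similarity; this is one simple choice of downstream model, not necessarily how RETFound embeddings are used in practice.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(
    train: List[Tuple[Sequence[float], float]],
    query: Sequence[float],
    k: int = 3,
) -> float:
    """Predict a label for `query` as the mean label of its k most similar
    labelled embeddings (k-nearest-neighbour regression)."""
    if not train:
        raise ValueError("need at least one labelled embedding")
    ranked = sorted(
        train, key=lambda item: cosine_similarity(item[0], query), reverse=True
    )
    top = ranked[: min(k, len(ranked))]
    return sum(label for _, label in top) / len(top)


# Hypothetical toy embeddings (real encoder outputs are much longer vectors),
# each paired with a hypothetical label such as age in years.
labelled = [
    ([1.0, 0.1, 0.0], 30.0),
    ([0.9, 0.2, 0.1], 34.0),
    ([0.0, 1.0, 0.2], 70.0),
    ([0.1, 0.9, 0.3], 66.0),
]
new_image_embedding = [0.95, 0.15, 0.05]
print(knn_predict(labelled, new_image_embedding, k=2))  # prints 32.0
```

A nearest-neighbour model needs no training step, which makes it a convenient first probe of whether the embeddings carry the signal at all; a trained regressor on the same vectors is the usual next step.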
Hi @asaravia , thanks for starting this conversation!
Yes, we have been experimenting with scGPT and BulkRNABert in our work. We are primarily using scGPT for cell type annotation on single-cell data, and the main question we are asking is whether the model can reliably identify rare cell populations without us having to manually define marker genes each time. The results are promising, but we find that it needs fine-tuning on our specific tissue context to give reliable outputs.
With BulkRNABert, we are exploring whether we can get meaningful embeddings across samples from different cohorts and use them for downstream clinical outcome prediction. So far the embeddings are quite useful for clustering samples, but the actual prediction tasks need more labeled data to fine-tune properly.
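To make the clustering step concrete, here is a minimal k-means sketch in plain Python, assuming per-sample embeddings have already been extracted. The vectors below are hypothetical placeholders, not BulkRNABert output, and k-means is just one common clustering choice rather than the method described above.

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Assign each embedding to one of k clusters with Lloyd's algorithm.

    Centroids are initialised from the first k points, which keeps the sketch
    reproducible but is not a robust initialisation strategy.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 2-D sample embeddings forming two well-separated groups.
samples = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
print(kmeans(samples, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

With samples from several cohorts, it is worth checking whether the resulting clusters track cohort of origin (a possible batch effect) or a biological variable of interest.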
We have not worked with CellWhisperer or scFoundation yet, but they are on our list. We would be very curious to hear from others who have used them, especially on non-human datasets, since that is something we are moving toward.
What tissue types or organisms are you working with? That would help us understand which FM makes the most sense for your use case.