Title: HBISS Recap: AI Agents Building AI Biology Models (Biomni and the Integrated Biology Environment); Kexin Huang (Phylo, formerly Stanford)
Hi @AWGall
June 25, 2026
Today’s Horizons in Biosciences and Informatics Seminar Series (HBISS) session, with about 75 attendees, featured Kexin Huang, co-founder and CEO of Phylo and first author of the original Biomni paper, on building autonomous biomedical research agents.
Kexin did his PhD in computer science at Stanford with Jure Leskovec, and his talk traced the arc from the open-source Biomni project to its hosted successor, Biomni Lab, and its Integrated Biology Environment (IBE).
Here is the link to view the recording if you were unable to join.
The space relevance is direct. Ryan has run Biomni for months and pointed it at OSDR’s three public APIs (BioData, RadLab, and the Environmental Data App), all of which the agent reached without trouble. With 1,000+ curated OSDR datasets, this is a fast route to mining spaceflight life science data.
What Kexin covered
Why agents. Demand for biomedical expertise grows exponentially while the supply of biologists grows linearly, so data goes unanalyzed and ideas wait in line. Biomni’s thesis: scientists provide the ideas, the agent handles execution.
Biomni and its environment. Released open source at Stanford in 2025, Biomni was one of the first general-purpose biomedical AI agents. Because agents need a biology environment to act in, the team ran an action discovery agent across literature in 25 biomedical domains to map the tools, databases, and protocols needed to reproduce results, then built the essential ones into a single environment (Biomni-E1): bioinformatics tools (DESeq2, Scanpy, PLINK, BWA), major databases (PDB, UniProt, GenBank), ML predictors, and wet-lab know-how. The agent reasons with an LLM, plans with retrieval, and acts by writing and running code.
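For anyone who has not seen an agent that "acts by writing and running code," here is a minimal sketch of that loop in Python. It is not Biomni's implementation: `scripted_model` is a hypothetical stand-in for the LLM, the gene counts are made up, and a real system would run the code in an isolated sandbox instead of calling `exec` in-process.

```python
import contextlib
import io


def scripted_model(history):
    """Stand-in for an LLM: returns the next code action, or None when done.

    Hypothetical stub; a real agent would call a language model here.
    """
    actions = [
        "counts = {'GeneA': 120, 'GeneB': 8, 'GeneC': 45}",
        "top = max(counts, key=counts.get)\nprint(top)",
    ]
    step = len(history)
    return actions[step] if step < len(actions) else None


def run_agent(model, max_steps=10):
    """Alternate between asking the model for code and executing it."""
    namespace = {}  # state shared across steps, like a notebook kernel
    history = []    # (code, observation) pairs fed back to the model
    for _ in range(max_steps):
        code = model(history)
        if code is None:
            break
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
            observation = buffer.getvalue().strip()
        except Exception as exc:
            # Errors become observations the model can react to.
            observation = f"error: {exc!r}"
        history.append((code, observation))
    return history


trace = run_agent(scripted_model)
```

Each entry in `trace` pairs the code the model wrote with what came back from running it, which is also the raw material a "show traces" style view would need.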
Biomni Lab and the IBE (live demo). The hosted successor is a workspace where biologists collaborate with the agent. Each task spins up a virtual machine with provisioned CPUs or GPUs, an agent-native HPC layer runs AlphaFold, Boltz, RFdiffusion, and full pipelines from plain English, a skill hub holds community best practices, and a “show traces” view exposes every step, all code, data, APIs, and outputs.
Training and fine-tuning foundation models. The capability that landed hardest for our community: beyond curating data and running omics workflows, the agent trains and fine-tunes AI foundation models from plain English. A short prompt fine-tunes Boltz on a new dataset or trains scGPT on Perturb-seq data, and you can pre-train or build new models the same way. It still handles standard omics processing too, taking a single-cell count matrix through UMAP, PCA, clustering, marker genes, and labels, plus multi-omics integration. You then interrogate the model you just trained in the same environment: ask “what sequence features drive localization,” and the agent applies an interpretability method to that model and pulls in the literature to answer. Combine that with agent-launched experiments and you get an agentic lab in a loop. Ryan noted he trained a foundation model in about a week that would otherwise have taken months.
Three hard problems set scientific agents apart from coding agents:
- Hallucination. A stepwise review checks every major step for hallucinations, missing information, and factual mistakes and self-corrects; a separate deep-scan reports datasets used, hallucination patterns, and limitations.
- Scalability. Agent-managed sandboxes let one agent launch many machines (up to ~200), allocate memory per task, and run in parallel over a shared file system, so terabyte jobs and multi-day pipelines become routine.
- Reproducibility. Traceability records everything from question to result, and agentic pipelines built on Nextflow give 100% reproducibility for well-specified tasks while keeping the LLM free for exploration.
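To make the reproducibility point concrete, here is a small sketch of a trace log that fingerprints each step from its code, parameters, and input checksums. This is an illustration under my own assumptions, not how Biomni Lab or Nextflow implement it; the `Trace` class, the `fingerprint` helper, and the file names are all hypothetical.

```python
import hashlib
import json


def fingerprint(step):
    """Hash a step's code, parameters, and input checksums in a stable order."""
    payload = json.dumps(step, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Trace:
    """Append-only log linking a question to every step behind its result."""

    def __init__(self, question):
        self.question = question
        self.steps = []

    def record(self, code, params, input_checksums, output):
        step = {"code": code, "params": params, "inputs": input_checksums}
        entry = {"id": fingerprint(step), "step": step, "output": output}
        self.steps.append(entry)
        return entry["id"]


trace = Trace("Which genes respond to the treatment?")
first = trace.record(
    "normalize(counts)", {"method": "log1p"}, {"counts.csv": "abc123"}, "norm.csv"
)

# Same code, parameters, and inputs give the same id, whatever the key order.
again = fingerprint(
    {"inputs": {"counts.csv": "abc123"}, "params": {"method": "log1p"}, "code": "normalize(counts)"}
)

# Changing one parameter gives a different id.
changed = fingerprint(
    {"code": "normalize(counts)", "params": {"method": "cpm"}, "inputs": {"counts.csv": "abc123"}}
)
```

The idea is that a well-specified step is identified by what went into it, so a rerun can be checked against the recorded id while exploratory steps stay free-form.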
Evaluation. Public benchmarks often carry ambiguous questions and narrow coverage. Fixing those issues raised the agent’s scores, so the failures sat in the benchmarks, not the agent. Biomni-Bench mines user traces for common tasks and asks the original first authors to write questions and rubrics, with process-level grading that judges the intermediate steps, not just the final answer.
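A rough sketch of what process-level grading can look like, as opposed to grading only the final answer. The rubric steps, the weights, and the 40% share given to the final answer are assumptions for illustration; the talk did not give Biomni-Bench's actual scoring formula.

```python
def grade(rubric, completed_steps, final_answer_correct, final_weight=0.4):
    """Score a run on its intermediate steps and its final answer.

    rubric maps a step name to its weight; weights are normalized here.
    """
    total = sum(rubric.values())
    process = sum(w for name, w in rubric.items() if name in completed_steps) / total
    final = 1.0 if final_answer_correct else 0.0
    return (1 - final_weight) * process + final_weight * final


# Hypothetical rubric for a differential-expression task.
rubric = {"quality_control": 1, "normalization": 1, "differential_expression": 2}

# Right answer reached while skipping QC and normalization.
lucky = grade(rubric, {"differential_expression"}, final_answer_correct=True)

# Every step done properly, final answer wrong.
careful = grade(rubric, set(rubric), final_answer_correct=False)
```

Answer-only grading would score these two runs as all or nothing. With the steps in the rubric, the skipped work costs the first run points and the sound process earns the second run partial credit.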
What agents unlock. Kexin closed with applications: wet-lab automation through PyLabRobot (Hamilton, Opentrons) and experiment-as-API services (Adaptyx, Ginkgo) for closed-loop work; TusoAI, which improved a single-cell method by roughly 40% in about a day and 200 iterations, at a cost near 40 cents, compared with a version that took a postdoc months; and a program of agents running across 20,000 genes, shown as an Agentic Atlas for target prioritization with the Michael J. Fox Foundation. The longer vision is agentic organizations, where session traces become a shared, traceable knowledge base that a background agent mines for insights and gaps.
Key discussion highlights
The Q&A was long and substantive, running well past the hour.
- Dave Nguyen @dave asked whether validating a Biomni result with a second agent is undercut by shared biases. Kexin: skills let scientists encode their own approach so the agent follows their lead, and the stepwise review catches hallucination patterns; self-improvement from user traces requires human approval, and only high-confidence best practices get folded in.
- Jon @jgalazka asked what repositories should do to enable these platforms, and @pwrose asked about MCP. Kexin: define APIs. API-native resources like PDB and UniProt integrate cleanly, databases without APIs get data lakes with added metadata, and you can drop in a link or MCP for the agent to use or request native integration. This connects directly to the OSDR knowledge-graph MCP that Peter Rose and Amanda have been building.
- Pooneh Bagher asked how to publish this work and satisfy reviewers. Kexin: cite the Biomni paper (publishing soon) plus the tools and packages used, and the agent compiles that reference list on request; protected health data needs a separate, non-public deployment under a data processing agreement.
- Venkat Chitikala @Kinnera_1002 asked about the space relevance. Ryan answered: OSDR holds 1,000+ datasets and three APIs, and the agent can mine spaceflight data for drug predictions and hypotheses about biological dysfunction from microgravity, radiation, and confinement, across mammalian, plant, and microbial data. About 185 OSDR/GeneLab-enabled papers, preprints, and theses exist to date.
- Amanda Saravia-Butler @asaravia asked about versioning, metadata, and context. Tools are pinned by default through container images and upgradable on request, a background knowledge graph (PrimeKG) resolves identifiers across ontologies, and an in-session manager compacts and saves key information automatically, which is why a project started in April can continue months later without context loss.
- Mo Hamza @Mo_Hamza asked whether agents replace or assist bioinformatics (human-in-the-loop by design), and Hussein asked whether they find novel biology or just reinforce existing databases (the agent can reason toward new hypotheses, but validation still needs orthogonal data and experiments).
- Kirill Grigorev @kirill flagged strong interest in integrating OSDR’s APIs further. Ryan walked Kexin through OSDR’s metadata structure, and Kexin suggested a web API plus an OSDR skill that encodes the know-how for repeatable access.
Next HBISS
July 22, 2026: lunar regolith and its effects on plant biology and the brain, with some interesting toxicity connections.
Watch Forum-Space for speakers and details.
All links from the chat and presentation
- Phylo - main link to try the free version of Biomni
- The Integrated Biology Environment (Phylo blog)
- Biomni: A General-Purpose Biomedical AI Agent (Huang et al. 2025, bioRxiv)
- TusoAI: Agentic Optimization for Scientific Methods (Turcan, Huang, et al. 2025, arXiv)
- Executable Code Actions Elicit Better LLM Agents (CodeAct, arXiv 2402.01030)
- PyLabRobot
- Scale AI and Phylo on coding agents for drug discovery
- Kexin Huang Google Scholar
- OSDR BioData API
- RadLab
- Environmental Data App (EDA)
- OSDR/GeneLab Google Scholar (enabled pubs, preprints, theses)
- HBISS series
- AWG Forum-Space About