HBISS Recap: Knowledge Graphs for Space Life Sciences; Peter Rose (UCSD) & Amanda Saravia-Butler (AI4LS)

@AWGall

March 12, 2026

Today’s Horizons in Biosciences and Informatics Seminar series (HBISS) featured @asaravia and Peter Rose presenting on the GeneLab Knowledge Graph and how Model Context Protocol (MCP) servers can connect it to external knowledge graphs using natural language queries.

Here is the recording if you missed the meeting (I hit record a few mins after the meeting began, sorry :sweat_smile:). All links from the chat are below, including those shared by Melissa from the Monarch KG group :slight_smile:

What Peter covered

Peter walked through the SPOKE-GeneLab composite knowledge graph built on Neo4j. The graph integrates OSDR study metadata, differential gene expression, DNA methylation, and amplicon/metagenomic data with self-describing meta nodes that make it AI-ready. He demonstrated a Cypher query combining hypermethylated promoter regions with downregulated genes across spaceflight vs. ground control comparisons, and showed how the composite graph links GeneLab data to the broader SPOKE biomedical knowledge graph through shared nodes (genes, cell types, anatomy).
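The logic behind that query (genes whose promoter is hypermethylated and whose expression is downregulated in the same comparison) can be sketched outside the graph in plain Python. This is only an illustration of the intersection, not Peter's Cypher query: the gene names, values, field names, and thresholds below are all made up for the example.

```python
# Hypothetical per-gene results for one spaceflight vs. ground control comparison.
# All gene names, values, field names, and thresholds are illustrative assumptions.
methylation = {
    "GeneA": {"region": "promoter", "delta_methylation": 0.25, "q_value": 0.01},
    "GeneB": {"region": "promoter", "delta_methylation": 0.30, "q_value": 0.02},
    "GeneC": {"region": "gene_body", "delta_methylation": 0.40, "q_value": 0.001},
    "GeneD": {"region": "promoter", "delta_methylation": -0.20, "q_value": 0.01},
    "GeneE": {"region": "promoter", "delta_methylation": 0.15, "q_value": 0.04},
}
expression = {
    "GeneA": {"log2fc": -1.8, "adj_p": 0.003},
    "GeneB": {"log2fc": 0.9, "adj_p": 0.01},
    "GeneC": {"log2fc": -2.1, "adj_p": 0.0005},
    "GeneD": {"log2fc": -1.5, "adj_p": 0.02},
    "GeneE": {"log2fc": -1.2, "adj_p": 0.04},
}


def hypermethylated_promoter_genes(methylation, min_delta=0.1, max_q=0.05):
    """Genes with a significant methylation gain in a promoter region."""
    return {
        gene
        for gene, m in methylation.items()
        if m["region"] == "promoter"
        and m["delta_methylation"] >= min_delta
        and m["q_value"] <= max_q
    }


def downregulated_genes(expression, max_log2fc=-1.0, max_adj_p=0.05):
    """Genes with a significant drop in expression."""
    return {
        gene
        for gene, e in expression.items()
        if e["log2fc"] <= max_log2fc and e["adj_p"] <= max_adj_p
    }


def hypermethylated_and_downregulated(methylation, expression):
    """Intersect the two gene sets and return them in sorted order."""
    return sorted(
        hypermethylated_promoter_genes(methylation) & downregulated_genes(expression)
    )


candidates = hypermethylated_and_downregulated(methylation, expression)
print(candidates)
```

In the graph itself the same intersection would be expressed as a pattern match over study, assay, and gene nodes, which is what makes it possible to run across many comparisons at once.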

What Amanda covered

Amanda demonstrated how MCP servers allow users to query the GeneLab knowledge graph (and connect it to external graphs) entirely through natural language in a chatbot client like Claude, with no Cypher or coding required. She showed three live demos:

  1. Querying OSD-244 (Rodent Research 6, thymus, muscle atrophy) for differentially expressed genes across 30-day and 60-day spaceflight timepoints, generating volcano plots and Venn diagrams, then pulling related publications from PubMed via its MCP connector
  2. Combining the GeneLab KG with the Monarch knowledge graph to find genes that are both hypermethylated in the promoter region and downregulated in OSD-48, then using Monarch for pathway enrichment analysis (growth factor signaling, lipid metabolism, circadian clock, ECM remodeling)
  3. Using GeneLab KG + SPOKE-OKN + Monarch + PubMed together to analyze 16S amplicon data from OSD-267 (Veggie hardware validation test), identifying the top 20 most abundant bacteria in spaceflight roots and cross-referencing them against known plant and human pathogens across multiple knowledge graphs
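For anyone curious what happens behind the natural language prompt: an MCP client sends tool calls to the server as JSON-RPC 2.0 messages. Here is a rough sketch of what one such request might look like. The tool name `query_genelab_kg` and its arguments are hypothetical placeholders, not the actual interface of the GeneLab MCP server.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, used only for illustration.
request = build_tool_call(
    request_id=1,
    tool_name="query_genelab_kg",
    arguments={
        "study": "OSD-244",
        "question": "differentially expressed genes, spaceflight vs. ground control",
    },
)

print(json.dumps(request, indent=2))
```

The chatbot decides which tool to call and fills in the arguments from your prompt, which is why no Cypher or coding is needed on the user side.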

Amanda outlined the near-term plan: host the GeneLab MCP server publicly so users only need to add a connector URL to their chatbot of choice and start querying. Longer term, a “Space Life Sciences” overarching MCP server will route queries across multiple KG sub-servers automatically. Registration on PyPI is planned once testing is complete.

Key discussion highlights

  • Melissa Haendel (Monarch Initiative) raised important points about KG interoperability: even when graphs use the same ontology terms, they can model source data differently. She encouraged the team to train Claude to document equivalency decisions and provenance. Melissa shared several standards and resources (linked below) and expressed strong interest in collaborating.
  • Rebecca Ringuette @rebecca.ringuette stressed the importance of being able to see the actual code and data sources behind any AI-generated plot, and suggested adding DOIs to study nodes. Amanda confirmed DOI properties are planned.
  • Nick Brereton @nicholas.brereton noted that FDR thresholds could be too strict for cross-experiment work and asked about integrating the Environmental Data App and RadLab. Amanda confirmed MCP servers for both the RadLab API and EDA API are planned once those APIs are in OpenAPI format.
  • Adam Amara @adam.amara asked about an open-source dump of the graph database for testing in other graph engines. Peter shared a Neo4j dump file (linked below).
  • Simon Cole @simoncole asked about variance from experimental differences (read depth, library prep) across studies. Amanda noted that full metadata (currently in OSDR but not yet in the KG) will be accessible via API-based MCP queries so users can make informed decisions.
  • Anu Iris @anuiris asked about accessibility for non-coders. Amanda confirmed that once hosted, using the tool will require nothing more than adding a connector in your chatbot and writing natural language prompts. Tutorials and template prompts are planned.
  • Peter Rose noted that Claude (Opus 4.6 and Sonnet 4.6) currently gives the best and most consistent results among the LLMs tested.
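
Nick's point about FDR thresholds can be made concrete with a small Benjamini-Hochberg example. The p-values are invented for illustration, and this is a generic sketch of the correction, not the GeneLab pipeline's actual implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def count_passing(adjusted, threshold):
    """Count how many tests pass at the given FDR threshold."""
    return sum(1 for q in adjusted if q <= threshold)


# Hypothetical raw p-values for five genes in one comparison.
raw_p = [0.001, 0.008, 0.02, 0.06, 0.30]
adjusted = benjamini_hochberg(raw_p)

strict = count_passing(adjusted, 0.05)
relaxed = count_passing(adjusted, 0.10)
print(strict, relaxed)
```

With these made-up numbers, three genes pass at FDR 0.05 and four pass at 0.10, so a gene that is borderline in one study can drop out of a cross-experiment overlap purely because of where the cutoff sits.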

Next HBISS

April 30, 2026 — Bowhead whales and improved DNA repair in long-lived species :whale: Bowhead whales live ~200 years, and contrary to expectations (elephants have extra tumor suppressor genes), bowhead fibroblasts actually require fewer oncogenic hits for malignant transformation but compensate with dramatically superior DNA repair via cold-inducible RNA binding protein (CIRBP). Exciting potential applications for spaceflight radiation countermeasures!


All links from the chat and presentation

10 Likes

@AstroJac

The recording is here :slight_smile:

I had to stop midway through Nick’s presentation but I’m going right back. I have never seen a better presentation on our research. This is so powerful!

1 Like

Thank you muchly

1 Like

Thanks everyone for this great presentation!

Regarding the knowledge graph Peter proposed and the MCP Amanda proposed, I see them as fundamental units, not only for human (us!) research, but also for agentic research.

By agentic research, I mean a community of agents that handle different types of data inputs (omics, demographics, and more), while communicating with each other, and with an orchestrator guiding them with the big picture in mind. For me, that big picture is novelty, utility, and creativity in science, while building on top of the best existing research.

Knowledge graphs break down the complexity of a dataset by showing its nodes and the edges that connect them. MCP, on the other hand, helps agents think inside the world of OSDR datasets.

This idea combines Clawbot-like agentic labs (https://openclaw.ai/), this paper (https://www.biorxiv.org/content/10.64898/2026.02.27.708412v1.abstract), and the autoresearch idea (karpathy/autoresearch on GitHub: AI agents running research on single-GPU nanochat training automatically).
I’m tagging some seniors who may like this idea!

@asaravia @rtscott2001 @lauren.sanders @jgong @james.casaletto

4 Likes

Thanks for sharing this solid survey of tools in knowledge graphs.

I’ve put together an agentic graph from a sample of OSDR research artifacts related to Arabidopsis after chatting with @borjabarbero and @dr.richard.barker. Check out the graph and try exploring or querying it using an agent. There’s a walkthrough at the bottom left. Reach out if you would like a demo or have any feedback.

5 Likes

Nice! @asaravia @PlantAWG

1 Like

@shaobsh Your knowledge graph is super fun to play with.

After playing with it for a bit, I made this figure to summarize what I found. There are a couple of AI typos, but only because I used Gemini to merge some of the screenshots I got from the knowledge graph.

Food for thought?

To summarize my understanding of the overall model, which I illustrated in the flowchart at the top of the figure: it shows the linear progression from Environmental Inputs (Microgravity, Radiation, Light) through Molecular Sensing, leading to Transcriptomic & Epigenetic Responses, and ultimately resulting in Phenotypic Adaptation.


Major and Common Responses identified in the network:

  1. Fundamental stress and defense mechanisms: The core central network, which connects to all other data types, highlights that Stress Response (green nodes) is the single most common and interconnected transcriptomic response to spaceflight. A space biologist would identify that the plant perceives the space environment primarily as a profound stressor.

    • Specific mechanisms: This includes the upregulation of genes involved in oxidative defense (ROS signaling) and general defense signaling, such as auxin signaling, which are essential for survival when standard gravity-based positional and structural cues are missing.
  2. Growth and metabolic downregulation: The central network also reveals a strong common response of Downregulated genes (blue nodes). This indicates that plants actively suppress major energy-intensive programs, specifically general growth, developmental processes, and standard metabolic pathways.

    • Space Biologist’s Interpretation: Instead of thriving, plants in space reallocate resources away from growth and reproduction toward immediate survival and stress mitigation, resulting in the smaller, slower-growing plants often observed in spaceflight experiments.
  3. Epigenetic regulation as a mediator: The data in the bottom-left panel confirms that Epigenetic Changes (DNA methylation, chromatin remodeling, and small RNAs) are common and critical regulators. These modifications provide the mechanistic link between the environmental inputs and the observed transcriptomic changes.

    • Impact: This demonstrates that the plant is not just reacting in real time, but is also setting up heritable or long-term structural changes in its genome to maintain the stress-adapted state throughout its life cycle in microgravity.
  4. Integration of the entire “Space Stressor Complex”: The arrows pointing from the Radiation Biology and Phytochrome Light Switch panels reinforce that the plant’s transcriptome is not just responding to microgravity. The green stress network is heavily integrated with responses to high-LET cosmic radiation (DNA damage repair) and altered light signaling (photobiology circuit integration). The graph explicitly labels “Radiation-Induced Stress Signalling” and “Photobiology Circuit Integration” as key drivers of the central gene expression profile, showing that all three environmental factors contribute to the overall response.

  5. A highly coordinated (if non-ideal) adaptive state: The space biologist would conclude that the plant’s response to spaceflight is not chaotic. It is a highly coordinated, integrated system. The networks show a structured approach to adaptation, where radiation-induced and photobiological signals are integrated into a transcriptomic stress profile, mediated by specific epigenetic changes. However, this adapted state is characterized by high stress, high energy consumption for defense, and suppressed growth.

This integrated atlas provides a comprehensive overview for a space plant biologist, mapping out how the environmental stressors of spaceflight often combine to create a unique, stress-dominated state of adaptation in plants, based on currently available transcriptional data.

5 Likes