Causal Inference Sub Group 2025

Hi @jakubm, very sorry about that. I will send out an updated calendar invite today and will make sure to include your email.

Thank you @lauren.sanders !

Below is a brief explanation of what I demonstrated yesterday. I cannot add notebooks here, so the code is at https://github.com/yigitk/Causal_Inference_Examples.
The first experiment introduces a ground truth method and outlines three different approaches to control for confounding factors and identify causal effects: the backdoor criterion, the front-door criterion, and instrumental variables.
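
For anyone who wants a feel for the backdoor idea without opening the notebooks, here is a minimal sketch on simulated data. It is not taken from the linked repo: the function names (`simulate`, `naive_effect`, `backdoor_effect`), the single binary confounder, and the true effect of 1.0 are all assumptions chosen for illustration. It compares a naive treated-vs-control difference with the backdoor adjustment, which averages the within-stratum differences weighted by how common each confounder value is.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate a binary confounder Z that drives both treatment X and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Z raises the chance of being treated.
        x = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # Assumed ground truth: X adds 1.0 to Y, Z adds 2.0.
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_effect(rows):
    """Difference in mean outcome between treated and control, ignoring Z."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)

def backdoor_effect(rows):
    """Backdoor adjustment: sum over z of P(z) * (E[Y|X=1,z] - E[Y|X=0,z])."""
    total = 0.0
    for z_val in (0, 1):
        stratum = [r for r in rows if r[0] == z_val]
        treated = [y for _, x, y in stratum if x == 1]
        control = [y for _, x, y in stratum if x == 0]
        total += (len(stratum) / len(rows)) * (mean(treated) - mean(control))
    return total

rows = simulate()
naive = naive_effect(rows)
adjusted = backdoor_effect(rows)
print(f"naive: {naive:.2f}, backdoor-adjusted: {adjusted:.2f}")
```

In this setup the naive estimate is inflated because treated units are more likely to have Z = 1, while the adjusted estimate should land close to the simulated effect of 1.0.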

In the second experiment, we use real data from the UC Irvine Adult dataset (https://archive.ics.uci.edu/dataset/2/adult) to examine the impact of occupation on earnings. By accounting for education status and other unmeasured variables through data preparation and confounding adjustments, we found an effect size of approximately 20%.

Anu, I am sorry, I just saw this. Yes, would you like the list? Please shoot me an email at soroush.seylani@gmail.com

Did I miss a memo? Did we not have a meeting today?

I don’t think we did. Or I missed it too.

Not sure, I still have the “Ask to Join” sadly and it looked like no one was there. @rtscott2001 @lauren.sanders Any way to solve the issue with “Ask to Join” instead of letting me in? (yigitkucuk92@gmail.com)

@yigitkucuk92 @jakubm @soroush.seylani

The causal inference meeting is now on Mondays; it was changed a couple of weeks ago.

Always remember to check your Calendar to see ALL meetings (AWG Meeting Calendar on the left; link here: https://awg.osdr.space/upcoming-events )

Here is the Monday Causal Inference meeting calendar invite:
https://awg.osdr.space/t/causal-inference-subgroup-biweekly-meeting/2435

The lead for the subgroup is @pmisra30

The Chair of the entire @AIMLawg is @lauren.sanders

Sorry if you felt like you missed out. Nothing was missed 🙂 Thanks for asking!

Thanks Ryan! That’s what it seemed like to me.

Thanks Ryan and everyone for a lightning-fast response 🙂 Appreciate you all!

Hi all - looking forward to our meeting today at 1:30PM Pacific time!

@dr.richard.barker and @anna.lewkowicz if you are able to attend, let’s spend some time discussing the plant regolith causal inference project idea using OSD-476. I reviewed the data and there is info in the Assay Table which links the RNAseq samples to the image samples (I believe there was a question about this in the last meeting). I can demo this today.

Thanks all for a great meeting! Notes below.

Scientific project update: using OSD-476 to predict the causal factors behind the difference in plant growth on lunar regolith vs. regolith simulant.
The plan is to try 3 experiments:

  • gene expression only
  • image only
  • gene expression + image

Action items:

  • @everyone: message Anna @anna.lewkowicz if you are interested in contributing to the lunar regolith project!

  • James @james.casaletto will present CRISPv3.0, which supports image data, at the next meeting on 5/26

  • @everyone: post here or bring to the meeting any other datasets (OSDR or otherwise) that you want to test with CRISPv3.0!

  • Yigit @yigitkucuk92: help James integrate DoWhy into this project

Dear team – sorry for the last-minute notice. I’ll be late to our meeting today if I can make it at all. Have another very urgent obligation I need to attend to.
cheers
-james

Hi @james.casaletto no problem! I should be able to attend the whole meeting. Are there any updates or a GitHub repo for CRISPv3.0?

Dear @james.casaletto, @lauren.sanders, and @rtscott2001, here is the code for the causal inference DoWhy experiments for the subgroup: yigitk/crspv1.1-causal-inference on GitHub. It is a Streamlit application for interactive causal analysis using the DoWhy library. It allows users to explore potential causal relationships in their data through an easy-to-use web interface.

It has a README and can run a comprehensive analysis of all the variables (11103 as it stands at the moment). I will present some of it in the meeting in case James misses it.

Thank you Yigit! I will join the meeting in 2 minutes.

Hi @everyone - Thank you for a great meeting today!

First, let’s find a better time to meet. Please fill out this poll: https://forms.gle/Up86DaSLd2cNjoAV8

Here are the notes and actions from today:

@yigitkucuk92

  • Created a Streamlit app

  • Loads the /data from the crispv1.1 repo (which is liver RNA-seq; the 1/0 “threshold” column encodes high/low ORO lipid retention values)

  • Can choose different weights, e.g. linear regression

  • Can run a refutation test: add a random confounder and see whether it changes the causal estimate. Can also add a placebo

  • The code originally iterates through all the genes and treats each one as the cause versus the effect. This is a cool biological question: which gene causes which other gene?

  • Collects the pairwise correlations to find the most highly correlated pairs
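
For those who missed the demo, the two refutation checks in the notes above can be sketched by hand. This does not call DoWhy and is not the app's code: the estimator (`stratified_effect`), the simulated data, and the effect size of 1.5 are assumptions made up for illustration. The idea is that adding a random, irrelevant covariate should barely move the estimate, and replacing the treatment with a shuffled placebo should push the estimate toward zero.

```python
import random

def stratified_effect(treatment, outcome, covariates):
    """Average the treated-vs-control gap within each covariate stratum,
    weighted by stratum size. Strata missing one arm are skipped."""
    strata = {}
    for t, y, c in zip(treatment, outcome, covariates):
        strata.setdefault(c, {0: [], 1: []})[t].append(y)
    n = len(treatment)
    effect = 0.0
    for groups in strata.values():
        if groups[0] and groups[1]:
            size = len(groups[0]) + len(groups[1])
            gap = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
            effect += (size / n) * gap
    return effect

# Simulated data (assumed): confounder z, treatment t, outcome y with true effect 1.5.
rng = random.Random(0)
n = 10000
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
t = [1 if rng.random() < (0.7 if zi else 0.3) else 0 for zi in z]
y = [1.5 * ti + 1.0 * zi + rng.gauss(0, 1) for ti, zi in zip(t, z)]

original = stratified_effect(t, y, [(zi,) for zi in z])

# Refutation 1: add a random binary "confounder" that is unrelated to t and y.
w = [rng.randint(0, 1) for _ in range(n)]
with_random = stratified_effect(t, y, [(zi, wi) for zi, wi in zip(z, w)])

# Refutation 2: placebo treatment, i.e. the real treatment column shuffled.
placebo_t = t[:]
rng.shuffle(placebo_t)
placebo = stratified_effect(placebo_t, y, [(zi,) for zi in z])

print(f"original: {original:.2f}")
print(f"with random confounder: {with_random:.2f}")
print(f"placebo treatment: {placebo:.2f}")
```

If the estimate shifts a lot under the random confounder, or stays far from zero under the placebo, that is a hint the original estimate should not be trusted.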

Feedback:

  • Instead of all pairwise correlations, use a t-test for feature reduction to find features that are related to the “threshold” column. Then perform causal inference to predict the “threshold” column.

  • Also use the synthetic data since there are ground truth causal features: AI4LS/crispv1.1/data/synthetic at main · nasa/AI4LS · GitHub
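
The t-test feature reduction suggested above could look roughly like the sketch below. It uses made-up data rather than the crispv1.1 files: the gene names, the group shift of 2.0, and the helpers `welch_t` and `top_features` are hypothetical. Each feature is scored by the absolute Welch t statistic between the threshold = 1 and threshold = 0 groups, and only the top k features are kept for the causal step.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def top_features(data, labels, k):
    """Rank features by |t| between the label == 1 and label == 0 groups; keep the top k."""
    scores = {}
    for name, values in data.items():
        high = [v for v, lab in zip(values, labels) if lab == 1]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        scores[name] = abs(welch_t(high, low))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Made-up data: 200 samples, 6 genes, a binary "threshold" column.
rng = random.Random(1)
n = 200
labels = [i % 2 for i in range(n)]
data = {}
for g in range(6):
    shift = 2.0 if g < 2 else 0.0  # only gene_0 and gene_1 differ between groups
    data[f"gene_{g}"] = [rng.gauss(shift * lab, 1.0) for lab in labels]

selected, scores = top_features(data, labels, k=2)
print(selected)
```

With thousands of genes, a multiple-testing correction on the p-values would probably be a safer cutoff than a fixed k, but that is left out here to keep the sketch short.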

Contact Yigit with any questions about his code!

yigitkucuk92@gmail.com

@anna.lewkowicz

  • Update on project collaborating with Lunar Regolith subgroup in Plant AWG

  • OSD-476 dataset; plants were grown on 3 different Apollo lunar regolith and on 4 different lunar regolith simulants (JSC-#)

  • Also there is differential gene expression data from all the plants

  • The idea for this project is to combine the image data and the DGE data and use CRISP to infer which genes cause the inferred phenotype

  • First need to “describe” the phenotype, using a program called SOAPP. It detects plants and characterizes things like their color or size

  • Ran into an issue: some of the plants were so dark that SOAPP couldn’t even detect them. Manually enhanced the colors, but that made the color analysis useless.

  • Solution: used the AI feature of the Photoroom photo-editing software to remove all but 1 plant from each picture. Then could change the SOAPP detection threshold for each plant.

  • Currently have 2 files: data on each plant’s size and data on each plant’s color. Are these sufficient for CRISP? Yes! 🙂

  • Next steps:

    • binarize all of the columns (maybe use median as the high/low threshold) to use as targets in CRISP
    • run PCA to see which of the columns explain the most variance in the dataset
    • use regolith vs simulant as the environments in CRISP
    • Alex DongHyeon Seo @alexdseo suggests GLARE (GeneLab Representation Learning Pipeline) to make these analyses easy. GLARE preprint: https://doi.org/10.1101/2024.06.04.597470
  • Discussion of comparing CRISPv1.1 and CRISPv3.0 - can CRISPv3.0 overcome the issue with SOAPP?
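
The "binarize all of the columns" next step above can be sketched in a few lines. The column names and values below are invented and are not the actual SOAPP output; `binarize_by_median` is a hypothetical helper. Each numeric column is split at its own median, giving a 1/0 target in the same spirit as the “threshold” column in the crispv1.1 data.

```python
from statistics import median

def binarize_by_median(table):
    """Map each numeric column to 1 (above its median) or 0 (at or below it)."""
    result = {}
    for column, values in table.items():
        cutoff = median(values)
        result[column] = [1 if v > cutoff else 0 for v in values]
    return result

# Hypothetical per-plant measurements; names and numbers are made up.
plants = {
    "area_px": [120, 340, 210, 90, 400, 275],
    "mean_green": [0.31, 0.55, 0.42, 0.28, 0.61, 0.47],
}

targets = binarize_by_median(plants)
print(targets)
```

Values exactly equal to the median go to the low group here, which is an arbitrary choice; with many tied values it may be worth checking that the two groups stay roughly balanced.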

Potential new project idea from Alex DongHyeon Seo, Richard Barker, Simon Gilroy:

  • Working on an extension of GLARE

  • Right now GLARE creates a latent representation, but we want to build a causal representation and connect it with agentic AI

  • Ultimately want to do hypothesis generation with agentic AI

  • Presentation at a future meeting

CRISP v3.0 is finished and available at https://github.com/crowdplat/NASA-NOIS2-171/tree/main. I asked the developers to add the documentation to the README, which should be available soon. I have that documentation separately if you want to try using it before they have the chance to update the README.
cheers
-james

Thank you @james.casaletto ! This is great! Let’s go over CRISPv3.0 as a group once the documentation is finished, maybe at our next meeting. We can discuss how to integrate it with Anna’s OSD-476 project and how to compare CRISPv3.0 with CRISPv1.1 and the SOAPP analysis.