Hi @jakubm, very sorry about that. I will send out an updated calendar invite today and will make sure to include your email.
Thank you @lauren.sanders !
Below is a brief explanation of what I demonstrated yesterday. I cannot add notebooks here, so the code is at https://github.com/yigitk/Causal_Inference_Examples:
The first experiment introduces a ground truth method and outlines three different approaches to control for confounding factors and identify causal effects: the backdoor criterion, the front-door criterion, and instrumental variables.
In the second experiment, we use real data from the UC Irvine Adult dataset (https://archive.ics.uci.edu/dataset/2/adult) to examine the impact of occupation on earnings. By accounting for education status and other unmeasured variables through data preparation and confounding adjustments, we found an effect size of approximately 20%.
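For anyone who wants to see the backdoor idea in isolation, here is a minimal sketch on synthetic data. It is not taken from the notebooks in the repo; the data-generating process, the variable names (`z`, `t`, `y`) and the true effect of 2.0 are all assumptions made up for illustration. It compares a naive difference in means with a backdoor-adjusted estimate that stratifies on a single binary confounder.

```python
import random

def simulate(n=20000, seed=0):
    """Synthetic rows (z, t, y): binary confounder Z affects both T and Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        # Z raises the probability of receiving the treatment
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        # Assumed ground truth: T adds 2.0 to Y, Z adds 3.0 to Y
        y = 2.0 * t + 3.0 * z + rng.gauss(0, 1)
        rows.append((z, t, y))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

def naive_effect(rows):
    """Difference in mean outcome, ignoring the confounder."""
    treated = [y for z, t, y in rows if t == 1]
    control = [y for z, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def backdoor_effect(rows):
    """Backdoor adjustment: sum over z of P(z) * (E[Y|T=1,z] - E[Y|T=0,z])."""
    total = 0.0
    for z_val in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_val]
        p_z = len(stratum) / len(rows)
        y1 = mean([y for t, y in stratum if t == 1])
        y0 = mean([y for t, y in stratum if t == 0])
        total += p_z * (y1 - y0)
    return total

rows = simulate()
naive = naive_effect(rows)
adjusted = backdoor_effect(rows)
print(f"naive: {naive:.2f}, backdoor-adjusted: {adjusted:.2f}")
```

The naive estimate is inflated because treated units are more likely to have Z = 1, while the adjusted estimate should land close to the assumed ground truth. Real data like the Adult dataset needs more care, since not all confounders are observed.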
Anu, I am sorry, I just saw this. Yes, would you like the list? Please shoot me an email at soroush.seylani@gmail.com
Did I miss a memo? We didn't have a meeting today?
I don't think we have. Or I missed it too.
Not sure, I still have the "Ask to Join" sadly and it looked like no one was there. @rtscott2001 @lauren.sanders Any way to solve the issue with "Ask to Join" instead of letting me in? (yigitkucuk92@gmail.com)
@yigitkucuk92 @jakubm @soroush.seylani
The causal inference meeting is now on Mondays; it was changed a couple of weeks ago.
Always remember to check your Calendar to see ALL meetings (AWG Meeting Calendar on the left; link here: https://awg.osdr.space/upcoming-events )
Here is the Monday Causal Inference meeting calendar invite:
https://awg.osdr.space/t/causal-inference-subgroup-biweekly-meeting/2435
The lead for the subgroup is @pmisra30
The Chair of the entire @AIMLawg is @lauren.sanders
Sorry if you felt like you missed out. Nothing was missed. Thanks for asking!
Thanks Ryan! That's what it seemed to me.
Thanks Ryan and everyone for a lightning-fast response. Appreciate you all!
Hi all - looking forward to our meeting today at 1:30PM Pacific time!
@dr.richard.barker and @anna.lewkowicz if you are able to attend, let's spend some time discussing the plant regolith causal inference project idea using OSD-476. I reviewed the data and there is info in the Assay Table which links the RNAseq samples to the image samples (I believe there was a question about this in the last meeting). I can demo this today.
Thanks all for a great meeting! Notes below.
Scientific project update: using OSD-476 to predict causal factors impacting the difference between lunar regolith vs. regolith simulant for plant growth
The plan is to try 3 experiments:
- gene expression only
- image only
- gene expression + image
Action items:
- @everyone: message Anna @anna.lewkowicz if you are interested in contributing to the lunar regolith project!
- James @james.casaletto will present the CRISPv3.0 version, which supports image data, at the next meeting on 5/26
- @everyone: post here or bring to the meeting any other datasets (OSDR or otherwise) that you want to test with CRISPv3.0!
- Yigit @yigitkucuk92: help James integrate DoWhy into this project
Dear team, sorry for the last-minute notice. I'll be late to our meeting today if I can make it at all. Have another very urgent obligation I need to attend to.
cheers
-james
Hi @james.casaletto no problem! I should be able to attend the whole meeting. Are there any updates or a GitHub repo for CRISPv3.0?
Dear @james.casaletto, @lauren.sanders, and @rtscott2001, here is the code for the causal inference DoWhy experiments for the subgroup: GitHub - yigitk/crspv1.1-causal-inference. It is a Streamlit application for interactive causal analysis using the DoWhy library. It allows users to explore potential causal relationships in their data through an easy-to-use web interface.
It has a README and, as it stands at the moment, can do a comprehensive analysis of all the variables (11103). I will present some of it in the meeting in case James misses it.
Thank you Yigit! I will join the meeting in 2 minutes.
Hi @everyone - Thank you for a great meeting today!
First, let's find a better time to meet. Please fill out this poll: https://forms.gle/Up86DaSLd2cNjoAV8
Here are the notes and actions from today:
- Created a Streamlit app
- Loads the /data from the crispv1.1 repo (liver RNAseq; the threshold 1/0 column is high/low ORO lipid retention values)
- Can choose different weights, e.g. linear regression
- Can run a refute test: add a random confounder and see whether it changes the causal inference. Can also add a placebo
- The code originally iterates through all the genes and treats each one as the cause versus the effect, which raises a cool biological question: which gene causes which other gene?
- Collects the pairwise correlations to find the most highly correlated pairs
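To make the refute-test idea concrete, here is a rough sketch in plain Python. It does not use DoWhy or the Streamlit app's code; the tiny `ols` helper, the simulated variables (`w`, `t`, `y`) and the assumed true effect of 1.5 are all hypothetical and only illustrate the logic. Adding a random "confounder" should leave the estimate nearly unchanged, while shuffling the treatment (a placebo) should push the estimate toward zero.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def effect(treatment, outcome, covariates):
    """Coefficient on treatment in: outcome ~ 1 + treatment + covariates."""
    X = [[1.0, t] + list(c) for t, c in zip(treatment, covariates)]
    return ols(X, outcome)[1]

# Simulated data (assumption): w confounds t and y, true effect of t is 1.5
rng = random.Random(1)
n = 3000
w = [rng.gauss(0, 1) for _ in range(n)]
t = [0.7 * wi + rng.gauss(0, 1) for wi in w]
y = [1.5 * ti + 2.0 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

original = effect(t, y, [[wi] for wi in w])

# Refuter 1: add a random "confounder"; the estimate should barely move
noise = [rng.gauss(0, 1) for _ in range(n)]
with_random = effect(t, y, [[wi, ni] for wi, ni in zip(w, noise)])

# Refuter 2: placebo treatment; a shuffled treatment should show ~no effect
placebo_t = t[:]
rng.shuffle(placebo_t)
placebo = effect(placebo_t, y, [[wi] for wi in w])

print(f"original={original:.3f} random_confounder={with_random:.3f} placebo={placebo:.3f}")
```

If the estimate shifts a lot under the random confounder, or the placebo still shows a sizeable effect, that is a hint the original estimate is not trustworthy.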
Feedback:
- Instead of all pairwise correlations, use a t-test for feature reduction, to find features that are related to the "threshold" column. Then perform causal inference to predict the "threshold" column.
- Also use the synthetic data, since there are ground truth causal features: AI4LS/crispv1.1/data/synthetic at main · nasa/AI4LS · GitHub
Contact Yigit if you have any questions about his code!
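The t-test feedback could look roughly like the sketch below. This is only an illustration with made-up feature names (`gene_a`, `gene_b`, `gene_c`) and simulated values, not code from the repo. It ranks features by the absolute Welch t statistic between the threshold = 1 and threshold = 0 groups and keeps the top `k`; it does not compute p-values, which would need a stats library.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / ((variance(a) / len(a) + variance(b) / len(b)) ** 0.5)

def top_features(features, labels, k):
    """Return the k feature names with the largest |t| between label groups."""
    scores = {}
    for name, values in features.items():
        high = [v for v, l in zip(values, labels) if l == 1]
        low = [v for v, l in zip(values, labels) if l == 0]
        scores[name] = abs(welch_t(high, low))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Simulated example (assumption): only gene_a depends on the threshold label
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
features = {
    "gene_a": [rng.gauss(2.0 * l, 1) for l in labels],  # shifted when label == 1
    "gene_b": [rng.gauss(0.0, 1) for _ in labels],      # unrelated to the label
    "gene_c": [rng.gauss(0.0, 1) for _ in labels],      # unrelated to the label
}
selected = top_features(features, labels, k=1)
print(selected)
```

The reduced feature set would then be passed on to the causal inference step, instead of iterating over every gene pair.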
- Update on project collaborating with Lunar Regolith subgroup in Plant AWG
- OSD-476 dataset: plants were grown on 3 different Apollo lunar regolith samples and on 4 different lunar regolith simulants (JSC-#)
- There is also differential gene expression (DGE) data from all the plants
- The idea for this project is to combine the image data and the DGE data and use CRISP to infer which genes cause the inferred phenotype
- First need to "describe" the phenotype, using a program called SOAPP, which detects plants and characterizes things like their color or size
- Ran into an issue: some of the plants were so dark that SOAPP couldn't even detect them. Manually enhanced the colors, but that made the color analysis useless.
- Solution: used the AI feature of the Photoroom photo editing software to remove all but 1 plant from each picture. Could then change the SOAPP detection threshold for each plant.
- Currently have 2 files: data on each plant's size, and data on each plant's color. Are these sufficient for CRISP? Yes!
- Next steps:
  - binarize all of the columns (maybe use median as the high/low threshold) to use as targets in CRISP
  - run PCA to see which of the columns explain the most variance in the dataset
  - use regolith vs simulant as the environments in CRISP
  - Alex DongHyeon Seo @alexdseo suggests GLARE - GeneLab Representation Learning Pipeline to make these analyses easy. GLARE pre-print: https://doi.org/10.1101/2024.06.04.597470
- Discussion of comparing CRISPv1.1 and CRISPv3.0: can CRISPv3.0 overcome the issue with SOAPP?
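For the binarization step in the next steps above, a minimal sketch could look like this. The column names (`area_px`, `mean_green`) and the values are hypothetical, not the actual SOAPP output, and sending values equal to the median to the "low" class is just one possible convention.

```python
from statistics import median

def binarize_by_median(table):
    """Map each column to 1 (above its median) or 0 (at or below its median)."""
    out = {}
    for col, values in table.items():
        m = median(values)
        out[col] = [1 if v > m else 0 for v in values]
    return out

# Hypothetical per-plant measurements (assumption: one value per plant per column)
plants = {
    "area_px": [120.0, 340.0, 95.0, 410.0, 220.0],
    "mean_green": [0.31, 0.52, 0.28, 0.61, 0.44],
}
targets = binarize_by_median(plants)
print(targets)
```

Each binarized column could then serve as a high/low target, similar to the threshold column in the crispv1.1 liver data.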
Potentially new project idea from Alex DongHyeon Seo, Richard Barker, Simon Gilroy:
- Working on an extension of GLARE
- Right now GLARE creates a latent representation, but we want to do a causal representation, and connect with agentic AI
- Ultimately want to do hypothesis generation with agentic AI
- Presentation at a future meeting
CRISP v3.0 has been finished and is available at https://github.com/crowdplat/NASA-NOIS2-171/tree/main. I asked the developers to add the documentation to the README, which should be available soon. I have that documentation separately if you want to try using it before they have the chance to update the README.
cheers
-james
Thank you @james.casaletto ! This is great! Let's go over CRISPv3.0 as a group once the documentation is finished, maybe at our next meeting. We can discuss how to integrate it with Anna's OSD-476 project and how to compare CRISPv3.0 with CRISPv1.1 and the SOAPP analysis.