In the past 3.5 years (but especially from 2021-2023), I counted ~125 people in the @ALSDAawg who provided expertise and feedback on the ~63 assay metadata and data standards for the phenotypic, physiological, imaging, biomedical, and behavioral data types. I’m talking about members who debated standards, put in effort to give us real guidance, told me our v1 drafts were wrong or were missing values, provided nomenclature, checked out other databases to learn from, etc.
The first tab of this Public Config list shows them all (in column B). The second tab shows all the omics ones.
OSDR was released as an integrated repository in early 2023, and we began releasing curated, reusable, machine-readable datasets for “non-omics” data (phenotypic, physiological, imaging, biomedical, and behavioral). We create the standards as spaceflight science datasets are deposited to ALSDA/OSDR.
Look at any of the ~50 ALSDA datasets in OSDR, and you’ll see the assay metadata and data standards being used. The assay tables use the metadata configs, and all tabular files use the dependent variable data standards via nomenclature, units, and results.
Since the first dataset release in 2023 (and the few publications now mining non-omics data), we have received many inquiries about why our system works so well, with people wanting more details. These queries come from scientists, physicians, programmers, and engineers at other databases, commercial organizations, other governmental agencies, and international groups, and especially from non-space domains (even NIH and EBI!).
The OSDR/ALSDA non-omics standards we’ve developed are based on: 1) feedback received from you, the community SMEs, 2) gold standard papers, and 3) scouring other existing databases to extract best practices (or rather, we tried; we learned that the large majority of these standards do not exist anywhere else, BUT we did learn from a few other non-NASA repos).
The OSDR database paper published this week covers the bare essentials of this topic, but not in any depth:
Samrawit G Gebre, Ryan T Scott, Amanda M Saravia-Butler, Danielle K Lopez, Lauren M Sanders, Sylvain V Costes, NASA open science data repository: open science for life in space, Nucleic Acids Research, 2024; gkae1116, https://doi.org/10.1093/nar/gkae1116
At the November meeting of the ALSDA AWG, we began the process of starting a manuscript on this topic of ‘non-omics’ data. The goal is to expand on the standards, explain how they were all developed, provide a guide for new data (and standards) as they are continually submitted/expanded, and get into challenges, lessons learned, AI-readiness, data formats, and other topics. This will be discussed more at the December ALSDA AWG meeting.
This post is a call to bring those original ~125 @ALSDAawg members back around (and we do have documentation/notes of who :sweat_smile:), as they are being offered co-authorship. So please keep an eye out for an outline and manuscript coming your way if you are interested.
As our OSDR/ALSDA non-omics standards cover both ingress of data into OSDR (collection, curation) and egress of data (data engineering standards/structures, machine-readability), we have also involved a few @AIMLawg members, to whom we’re offering co-authorship as well.
Please let me and @rachelrgilbert know if you have questions.
Cheers all
@anuiris @drsaswatidas @keith.siew @marie.mortreux @mbouxsein @russell.turner @anarayan09 @katherine.j.baxter @kira.rienecker @stephanie.a.puukila @egle @jwilley @pinpin @XLPalmer @botanynerd @chm2042 @Dr.Overbey @canelson @james.casaletto @lauren.sanders @wcromer @jbraun @vfajardo @szewczyk