New Metadata & Data Standards Paper - 3.5 years, 125 AWG members, 62 assays

In the past 3.5 years (especially 2021-2023), I counted ~125 people in the @ALSDAawg who provided expertise and feedback on the ~63 assay metadata and data standards for the phenotypic, physiological, imaging, biomedical, and behavioral data types. I’m talking about members who debated standards, put in real effort to give us guidance, told me our v1 drafts were wrong or missing values, provided nomenclature, checked other databases to learn from, etc.

First tab of this Public Config list shows them all (in column B). The 2nd tab shows all the omics ones :hugs:

OSDR as an integrated repository was released in early 2023, and we began releasing curated, reusable, machine-readable datasets for “non-omics” data (phenotypic, physiological, imaging, biomedical, and behavioral). As spaceflight science datasets are deposited to ALSDA/OSDR, we then create the standards.

Look at any of the ~50 ALSDA datasets in OSDR, and you’ll see the assay metadata and data standards being used :hugs: The assays tables use the metadata configs, and all tabular files use the dependent variable data standards via nomenclature, units, and results.
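
As a rough illustration of what "using the data standards" can mean for a tabular file, here is a minimal Python sketch. The standard entries, column names, and units below are made up for illustration and are not the actual OSDR/ALSDA configs; the idea is only that each dependent variable has a controlled name, an expected unit, and a numeric result.

```python
import csv
import io

# Hypothetical data standard: each dependent variable has a controlled
# name and an expected unit. These entries are illustrative only.
STANDARD = {
    "body_weight": "g",
    "bone_mineral_density": "g/cm^2",
}

def check_table(csv_text, standard):
    """Return a list of problems found in a long-format results table.

    Assumes (hypothetically) the columns: sample_id, variable, unit, result.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        variable = row["variable"]
        if variable not in standard:
            problems.append(f"row {row_number}: unknown variable '{variable}'")
            continue
        if row["unit"] != standard[variable]:
            problems.append(
                f"row {row_number}: '{variable}' should be in "
                f"{standard[variable]}, got {row['unit']}"
            )
        try:
            float(row["result"])
        except ValueError:
            problems.append(
                f"row {row_number}: result '{row['result']}' is not numeric"
            )
    return problems

example = """sample_id,variable,unit,result
M1,body_weight,g,24.1
M2,body_weight,kg,0.025
M3,tail_length,mm,81
"""

for problem in check_table(example, STANDARD):
    print(problem)
```

With this toy input, the check flags the `kg` unit on the second sample and the `tail_length` variable that is not in the toy standard. A real check would read the published configs rather than a hard-coded dictionary.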

Since the first dataset release in 2023 (and the few publications now mining non-omics data), we have received many inquiries about why our system works so well, with people wanting more details. These queries come from scientists, physicians, programmers, and engineers from other databases, commercial entities, other governmental agencies, and international groups, and especially from non-space domains (even NIH and EBI!).

The OSDR/ALSDA non-omics standards we’ve developed are based on: 1) feedback from you, the community SMEs, 2) gold standard papers, and 3) scouring other existing databases to extract best practices (or rather, we tried; we learned that the large majority of these standards do not exist anywhere else, BUT we did learn from a few other non-NASA repos :hugs:).

The OSDR database paper published this week covers the bare essentials of this topic, but not in any depth:

Samrawit G Gebre, Ryan T Scott, Amanda M Saravia-Butler, Danielle K Lopez, Lauren M Sanders, Sylvain V Costes, NASA open science data repository: open science for life in space, Nucleic Acids Research, 2024; gkae1116, https://doi.org/10.1093/nar/gkae1116

At the November meeting of the ALSDA AWG, we began the process of starting a manuscript on this topic of ‘non-omics’ data. The goal is to expand on the standards, explain how they were all developed, provide a guide for new data (and standards) as they are continually submitted/expanded, and get into challenges, lessons learned, AI-readiness, data formats, and other topics. This will be discussed more at the December ALSDA AWG meeting.

This post is a call to bring those original ~125 @ALSDAawg members back around :hugs: (and we do have documentation/notes of who :sweat_smile:), as they are offered co-authorship. So please keep an eye out for an outline and manuscript coming your way if you are interested.

As our OSDR/ALSDA non-omics standards cover both ingress of data into OSDR (collection, curation) and egress of data (data engineering standards/structures, machine-readability), we have also involved a few @AIMLawg members, to whom we’re offering co-authorship as well.

Please let me and @rachelrgilbert know if you have questions.

Cheers all

@anuiris @drsaswatidas @keith.siew @marie.mortreux @mbouxsein @russell.turner @anarayan09 @katherine.j.baxter @kira.rienecker @stephanie.a.puukila @egle @jwilley @pinpin @XLPalmer @botanynerd @chm2042 @Dr.Overbey @canelson @james.casaletto @lauren.sanders @wcromer @jbraun @vfajardo @szewczyk

9 Likes

This is going to be a BIG achievement under Open Science Initiatives! Looking forward to the manuscript.
Thank you for the acknowledgement!
Cheers!:tada:

2 Likes

Thank you very much for all the details you provided in this message.
Looking forward to ALSDA AWG meeting in December!

2 Likes

So excited to see this work mature! Ryan, big kudos.

2 Likes

Thank YOU Kira! Your contributions were essential :100: Plus the standards have become even more refined over time with feedback from neuro-behavioralists. Cheers, and I look forward to your input on the upcoming manuscript :slight_smile: :confetti_ball:

2 Likes

Great work, Ryan. We can add some more parameters for study of oxidative stress. Spectrophotometric assays of antioxidant enzymes/molecules like SOD (superoxide dismutase), Catalase, reduced glutathione etc. are very popular methods used by researchers. Moreover, as mentioned in the spreadsheet Western blotting technique is meant for protein detection (qualitative analysis) rather than quantitation. Thanks.

3 Likes

Thanks @kajari.das63 :slightly_smiling_face::tada:

As spaceflight science datasets are deposited to ALSDA/OSDR, we then create the standards.

Look at any of the ~50 ALSDA datasets in OSDR, and you’ll see the assay metadata and data standards being used :hugs:

The assays tables use the metadata configs, and all tabular files use the dependent variable data standards via nomenclature and results

2 Likes

Hi Ryan,

Incredible spreadsheet!

Might be a little off-topic, but what is the best way for me to submit feedback on the Nanopore assay metadata collection sheets for ALSDA?

Thanks!

Sincerely,
Theo

1 Like

Nanopore is not on the first tab of that list, since it is an omics assay. The first tab is for non-genomic, non-sequencing technologies.

If you attend the ALSDA AWG meeting on Dec 17, I’ll also make sure to explain the difference between the “measurement” (gene expression), the “technology” (bulk RNA-seq), and the name of the device itself (e.g. Nanopore).
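
For anyone following along before the meeting, that three-way distinction can be sketched as a small record. This is only an illustration: the class and field names are hypothetical and not OSDR's actual schema, and the values are simply the examples given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssayDescription:
    """Hypothetical record separating the three concepts."""
    measurement: str  # what is being measured
    technology: str   # the method used to measure it
    platform: str     # the specific device or instrument

nanopore_example = AssayDescription(
    measurement="gene expression",
    technology="bulk RNA-seq",
    platform="Nanopore",
)

print(nanopore_example)
```

Keeping the three as separate fields means the same measurement can be listed under a different technology or device without renaming the assay.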

3 Likes

Sounds like a plan. I added it to my calendar - I usually have a meeting that runs over at that time but I can definitely make the latter half. Talk then! :slight_smile:

1 Like