I am Rutuja Gurav, and I was part of the FDL team that used RadLab data to forecast radiation exposure using ML. I have been in communication with @kirill, who invited me to join this WG. He mentioned that, moving forward, the RadLab AWG wants to focus on applications alongside data acquisition efforts.
To that end, drawing on my experience using RadLab data for ML, I discussed a couple of considerations for the next version of RadLab with @kirill:
1. Some application-agnostic data cleaning.
2. Including a “Segments” table in the DB that provides information about data availability and gaps on a per-instrument basis.
I have described this in some detail in these slides. Please reach out to me if this topic is of interest.
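To make the “Segments” idea a bit more concrete, here is a minimal sketch of how such a table could be derived from raw timestamps. Everything in it is an assumption for illustration: the function names `find_segments` and `build_segments_table`, the column names, and the gap threshold are hypothetical and do not come from the RadLab schema.

```python
from datetime import datetime, timedelta


def find_segments(timestamps, max_gap):
    """Group timestamps into contiguous segments.

    A new segment starts whenever two consecutive samples are more
    than `max_gap` apart. Returns a list of (start, end, n_samples).
    """
    segments = []
    start = prev = None
    count = 0
    for ts in sorted(timestamps):
        if prev is not None and ts - prev > max_gap:
            # Gap detected: close the current segment and open a new one.
            segments.append((start, prev, count))
            start, count = ts, 0
        if start is None:
            start = ts
        prev = ts
        count += 1
    if start is not None:
        segments.append((start, prev, count))
    return segments


def build_segments_table(readings, max_gap):
    """Build hypothetical Segments rows from {instrument: [timestamps]}."""
    rows = []
    for instrument, timestamps in readings.items():
        for start, end, n in find_segments(timestamps, max_gap):
            rows.append({"instrument": instrument, "start": start,
                         "end": end, "n_samples": n})
    return rows


# Made-up example: one instrument with a gap between minute 2 and minute 10.
t0 = datetime(2020, 1, 1)
readings = {"instrument_a": [t0 + timedelta(minutes=m) for m in (0, 1, 2, 10, 11)]}
for row in build_segments_table(readings, max_gap=timedelta(minutes=2)):
    print(row)
```

With a table like this, a user could check which time ranges are covered by an instrument before pulling any data, and the gaps are simply the intervals between consecutive segments.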
Hi @rutujagurav, happy to see you here! Thanks for the very fruitful discussion we had. I have the segments table on my todo list now (it’s going to benefit both the users and us internally, actually). As for application-agnostic data cleaning, this can undoubtedly be a value add. Data PIs already perform some cleaning before the data gets into the database (so, and this is just semantics, what you’re referring to as “level 0” is technically already level 2). But preprocessing specifically for ML applications, with a common target spec for it, is important. Please keep me in the loop on these discussions; we welcome all suggestions!