Hi @MicrobesAWG @nicholas.brereton @katherine.j.baxter @ben.sikes @jaume.puig @asaravia,
We are planning to process the following microbial RNAseq datasets: OSD-95, OSD-138, OSD-185, OSD-145, and OSD-554. Before we begin, I’d like to get some feedback on the reference genomes we plan to use for alignment.
The table below lists the organism and strain recorded in OSDR for each dataset, with the proposed reference genome in the last two columns. I was able to find an assembly for every organism/strain listed in OSDR except the E. coli strain used in OSD-95 (ATCC 4157). Can we use the K-12 assembly (listed below) for this strain, or is there a different assembly we should use instead?
| OSD | Organism | Strain | Reference genome accession | Assembly name |
| --- | --- | --- | --- | --- |
| OSD-95 | Escherichia coli | ATCC 4157 | GCA_000005845 | ASM584v2 |
| OSD-138 | Bacillus subtilis | subsp. subtilis 168 | GCA_000009045 | ASM904v1 |
| OSD-185 | Bacillus subtilis | subsp. subtilis 168 | GCA_000009045 | ASM904v1 |
| OSD-145 | Staphylococcus aureus | UAMS-1 | GCA_000788115 | ASM78811v1 |
| OSD-554 | Pseudomonas aeruginosa | UCBPP-PA14 | GCA_000014625 | ASM1462v1 |
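
For anyone scripting against this list, the proposed references in the table above can be captured as a small lookup. This is only a sketch: the names `PROPOSED_REFERENCES` and `datasets_by_accession` are hypothetical and not part of any existing pipeline. It groups the datasets that share an assembly, so each reference would only need to be handled once.

```python
from collections import defaultdict

# Proposed references, transcribed from the table in this post.
# Tuple layout: (organism, strain, accession, assembly name).
PROPOSED_REFERENCES = {
    "OSD-95": ("Escherichia coli", "ATCC 4157", "GCA_000005845", "ASM584v2"),
    "OSD-138": ("Bacillus subtilis", "subsp. subtilis 168", "GCA_000009045", "ASM904v1"),
    "OSD-185": ("Bacillus subtilis", "subsp. subtilis 168", "GCA_000009045", "ASM904v1"),
    "OSD-145": ("Staphylococcus aureus", "UAMS-1", "GCA_000788115", "ASM78811v1"),
    "OSD-554": ("Pseudomonas aeruginosa", "UCBPP-PA14", "GCA_000014625", "ASM1462v1"),
}


def datasets_by_accession(refs):
    """Group OSD dataset IDs by the reference accession they share."""
    groups = defaultdict(list)
    for osd, (_organism, _strain, accession, _assembly) in refs.items():
        groups[accession].append(osd)
    return dict(groups)


if __name__ == "__main__":
    for accession, datasets in datasets_by_accession(PROPOSED_REFERENCES).items():
        print(accession, ", ".join(datasets))
```

The OSD-95 entry reflects the proposal as posted; it is the one still open for discussion.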
thanks,
Barbara
For the E. coli, it looks like there is a genome available for that specific strain here: ATCC® 4157™ | Escherichia coli | ATCC Genome Portal
It differs quite a bit from the K-12 MG1655 strain: ATCC reports 95.73% similarity between the two.
Buck
Our pipeline can handle assemblies from either NCBI or Ensembl. I think we’d have issues with annotation if we tried to use the ATCC sequences.
I also noticed that the S. aureus UAMS-1 strain assembly is only at the scaffold level. NCBI reports that it is substantially similar to the ATCC 12600 strain (Staphylococcus aureus genome assembly ASM609491v1 - NCBI - NLM), which has a complete genome assembly.
It looks like DSMZ has the strain too: https://bacdive.dsmz.de/strain/4409
Deposited here: Escherichia coli genome assembly 39282_E01-4 - NCBI - NLM
ATCC; ATCC 4157 ← NCTC; NCTC 86 ← Lister Institute;
Thanks! I’ll see if we can use that version.
DSMZ also has a good compilation for the Staph strain: https://bacdive.dsmz.de/strain/14487
All of the assemblies listed as ‘complete’ come up as 2 contigs (even the one that you linked), which probably means that some region of the genome is highly repetitive or has another structural issue that complicates full assembly.
The assembly you linked comes from single-cell nanopore long-read sequencing, which gives the same coverage/assembly as another one sequenced with PacBio: Staphylococcus aureus genome assembly ASM102710v1 - NCBI - NLM
I think that either would be fine.
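
If it helps when comparing candidates, the contig count of a downloaded assembly can be checked by counting FASTA header lines. A minimal sketch, assuming the genome is available locally as a plain or gzipped FASTA file; `count_contigs` is a hypothetical helper, not part of any existing pipeline.

```python
import gzip


def count_contigs(fasta_path):
    """Count sequence records (lines starting with '>') in a FASTA file.

    Files whose name ends in '.gz' are read as gzip; anything else as plain text.
    """
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    with opener(fasta_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

The count alone does not say whether a record is a chromosome, a plasmid, or an unplaced contig; the header lines themselves usually do, so they are worth a look as well.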
Thanks super much Buck! @microbeMinded 
I think Buck solved the E. coli genome question, and you seem all good for the others. Thanks so much for processing these!
Nick
THANK YOU for pointing them out as low-hanging fruit to be processed so analysis can be done.
Spread the word @AWGall: if any AWG members know of other OSD datasets that they WISH had been processed so AWG analysis could be done, just let @rtscott2001, @asaravia, or the general OSDR/GeneLab team know.