Microbial RNASeq alignment filter

barbara.novak · May 21, 2026, 6:54pm

As we process the 5 prioritized microbial RNASeq datasets (OSD-95, 138, 145, 185, 554), we’ve noticed that the alignment rate for some samples is quite poor. In the eukaryotic RNASeq pipeline, we apply a >=60% RSEM unalignable filter. Any samples that don’t pass the filter remain in the raw counts table but are excluded from count normalization and differential expression analysis. We are curious whether we should apply a similar cutoff to microbial RNASeq data.

The closest equivalent to the eukaryotic pipeline threshold in the microbial pipeline appears to be Bowtie 2 “overall_alignment_rate” <= 40%. At this cutoff, only OSD-95 and OSD-554 would be affected. However, it’s not clear to us if that threshold is appropriate for microbial data since it is based on eukaryotic data.
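For illustration, here is a minimal sketch of how such a cutoff could be applied. It assumes the rate is read from the last line of the standard Bowtie 2 alignment summary ("NN.NN% overall alignment rate"). The function names and the 40% default are hypothetical and not part of the current pipeline.

```python
import re

# Bowtie 2 ends its alignment summary (written to stderr) with a line such as
# "38.70% overall alignment rate".
_RATE_RE = re.compile(r"^\s*([\d.]+)% overall alignment rate", re.MULTILINE)


def overall_alignment_rate(summary_text: str) -> float:
    """Return the overall alignment rate (in percent) from a Bowtie 2 summary."""
    match = _RATE_RE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def fails_filter(rate: float, cutoff: float = 40.0) -> bool:
    """True if the sample's rate is at or below the cutoff (assumed 40%)."""
    return rate <= cutoff
```

A flagged sample would stay in the raw counts table and only be left out of normalization and differential expression, mirroring the eukaryotic behavior described above.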

Table of overall alignment rates for each dataset:

Dataset	N	N with 100 - overall_alignment_rate >=50%	N with >=60%	N with >=70%	N with >=80%	N with >=90%	Min overall_alignment_rate (%)	Median overall_alignment_rate (%)	Mean overall_alignment_rate (%)
OSD-95	21	7	6	4	4	3	2.42	70.43	60.08
OSD-138	18	0	0	0	0	0	99.18	99.30	99.31
OSD-145	26	0	0	0	0	0	99.16	99.31	99.33
OSD-185	6	0	0	0	0	0	56.75	99.44	89.87
OSD-554	66	46	38	26	13	7	2.73	36.25	38.70
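As a rough sketch of how one row of the table above could be computed from per-sample rates: the function name and dictionary keys are hypothetical, and the rates are assumed to be percentages, so the unaligned fraction is taken as 100 minus the rate.

```python
from statistics import mean, median


def summarize(rates, unaligned_cutoffs=(50, 60, 70, 80, 90)):
    """Summarize overall alignment rates (in percent) for one dataset."""
    row = {"N": len(rates)}
    for cutoff in unaligned_cutoffs:
        # Count samples whose unaligned percentage (100 - rate) meets the cutoff.
        row[f"unaligned>={cutoff}"] = sum(1 for r in rates if 100 - r >= cutoff)
    row["min"] = min(rates)
    row["median"] = median(rates)
    row["mean"] = round(mean(rates), 2)
    return row
```

Running this per dataset over several cutoffs shows how many samples each candidate threshold would exclude before committing to one.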

For those of you with experience in differential gene expression in microbial RNASeq, are there thresholds typically applied for filtering samples based on alignment rate?

@nicholas.brereton @ben.sikes @daniela.bezdan @jaume.puig: iirc these were the datasets you asked us to process

nicholas.brereton · May 22, 2026, 11:52am

Hi Barbara,

I would investigate this instead of using a cutoff (you could then maybe add some extra QC steps in future). Low mapping could mean contamination (rRNA, human, reagent, or other bacteria in the culture), but it could also mean the ref is further away than we hoped.

To investigate, maybe try to annotate the unmapped reads (kraken2 or something else) as well as check for low complexity/rRNA. Assembly of the unmapped reads would be useful, and just blast the most abundant contigs.
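A possible starting point for the kraken2 step, sketched as a small helper that builds the command line. It assumes the unmapped reads were already written out by Bowtie 2 (for example with its `--un-gz` or `--un-conc-gz` options). The helper name, file names, and database path are hypothetical placeholders.

```python
import shlex


def kraken2_command(db, reads, report, output, paired=False, threads=4):
    """Build a kraken2 command (argv list) to classify unmapped reads."""
    if paired and len(reads) != 2:
        raise ValueError("paired mode expects exactly two read files")
    cmd = ["kraken2", "--db", db, "--threads", str(threads),
           "--report", report, "--output", output]
    if any(r.endswith(".gz") for r in reads):
        cmd.append("--gzip-compressed")
    if paired:
        cmd.append("--paired")
    cmd.extend(reads)
    return cmd


# Hypothetical file names; print the command for review instead of running it.
cmd = kraken2_command("kraken_db", ["sample1_unmapped.fq.gz"],
                      "sample1.kreport", "sample1.kraken")
print(shlex.join(cmd))
```

The per-sample report can then be checked for human, rRNA-rich, or unexpected bacterial taxa before deciding whether a low-mapping sample reflects contamination or a distant reference.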

I’m not really sure what mapping rate to expect exactly but it should be high (>80% or higher really) - no @emmanuel.gonzalez?

Hope that’s helpful,
Nick

barbara.novak · June 1, 2026, 6:09pm

That is very helpful. I think we’ll leave filtering off for now and plan to investigate this as a possible pipeline update after we process more datasets and see how they behave.
