Hi @MicrobesAWG,
As we process the 5 microbial RNAseq datasets identified for prioritization (OSD-95, OSD-138, OSD-145, OSD-185, OSD-554), we've noticed that the alignment rate for some samples is quite poor. In the eukaryotic RNAseq pipeline, we apply a filter that flags samples with >=60% of reads reported as unalignable by RSEM. Samples that fail the filter remain in the raw counts table but are excluded from count normalization and differential expression analysis. We are curious whether we should apply a similar cutoff to microbial RNAseq data.
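To make the filter behaviour concrete, here is a minimal sketch that assumes per-sample unaligned percentages are already available. The function `split_samples` and the sample names are hypothetical and are not taken from the pipeline's code. Every sample stays in the raw counts table, and only samples below the cutoff are passed on to normalization and differential expression.

```python
from typing import Dict, List, Tuple


def split_samples(
    unaligned_pct: Dict[str, float], cutoff: float = 60.0
) -> Tuple[List[str], List[str]]:
    """Return (samples kept for normalization/DE, samples excluded).

    A sample is excluded when its unaligned percentage is >= cutoff.
    Excluded samples are NOT dropped from the raw counts table; they are
    only left out of normalization and differential expression.
    """
    kept = [s for s, pct in unaligned_pct.items() if pct < cutoff]
    excluded = [s for s, pct in unaligned_pct.items() if pct >= cutoff]
    return kept, excluded


if __name__ == "__main__":
    # Hypothetical values for illustration, not from any OSD dataset.
    example = {"sample_A": 12.5, "sample_B": 60.0, "sample_C": 75.3}
    kept, excluded = split_samples(example)
    print("normalize:", kept)      # ['sample_A']
    print("raw only:", excluded)   # ['sample_B', 'sample_C']
```

Because the comparison is `>=`, a sample sitting exactly at the cutoff is excluded, which matches the ">=60%" wording above.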
The closest equivalent to the eukaryotic pipeline threshold in the microbial pipeline appears to be a Bowtie 2 "overall_alignment_rate" <= 40% (i.e., >=60% of reads unaligned). At this cutoff, only OSD-95 and OSD-554 would be affected. However, it's not clear to us whether that threshold is appropriate for microbial data, since it was derived from eukaryotic data.
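One way to apply the equivalent microbial cutoff is to read the rate from each sample's Bowtie 2 log. The sketch below assumes the log contains Bowtie 2's standard summary, which ends with a line such as `95.12% overall alignment rate`. The helper names and the example log are hypothetical.

```python
import re
from typing import Optional

# Bowtie 2 ends its alignment summary (written to stderr) with a line like
# "95.12% overall alignment rate".
_RATE_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)% overall alignment rate")


def overall_alignment_rate(log_text: str) -> Optional[float]:
    """Extract the overall alignment rate (percent) from Bowtie 2 summary text."""
    match = _RATE_RE.search(log_text)
    return float(match.group(1)) if match else None


def fails_cutoff(rate: float, max_unaligned: float = 60.0) -> bool:
    """True when the unaligned share (100 - rate) is >= max_unaligned,
    i.e. overall_alignment_rate <= 100 - max_unaligned (40% by default)."""
    return 100.0 - rate >= max_unaligned


if __name__ == "__main__":
    # Hypothetical single-end summary in the Bowtie 2 format.
    log = """10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    7580 (75.80%) aligned 0 times
    2000 (20.00%) aligned exactly 1 time
    420 (4.20%) aligned >1 times
24.20% overall alignment rate
"""
    rate = overall_alignment_rate(log)
    print(rate, fails_cutoff(rate))  # 24.2 True
```

The cutoff is expressed as an unaligned percentage, so the same function can be reused with a different `max_unaligned` if the group settles on a microbial-specific threshold.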
Table of overall alignment rates for each dataset:
| Dataset | N samples | N with unaligned (100 - overall_alignment_rate) >=50% | N with unaligned >=60% | N with unaligned >=70% | N with unaligned >=80% | N with unaligned >=90% | Min overall_alignment_rate (%) | Median overall_alignment_rate (%) | Mean overall_alignment_rate (%) |
|---|---|---|---|---|---|---|---|---|---|
| OSD-95 | 21 | 7 | 6 | 4 | 4 | 3 | 2.42 | 70.43 | 60.08 |
| OSD-138 | 18 | 0 | 0 | 0 | 0 | 0 | 99.18 | 99.30 | 99.31 |
| OSD-145 | 26 | 0 | 0 | 0 | 0 | 0 | 99.16 | 99.31 | 99.33 |
| OSD-185 | 6 | 0 | 0 | 0 | 0 | 0 | 56.75 | 99.44 | 89.87 |
| OSD-554 | 66 | 46 | 38 | 26 | 13 | 7 | 2.73 | 36.25 | 38.70 |
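For anyone who wants to reproduce these summary columns for other datasets, here is a minimal sketch that assumes a list of per-sample Bowtie 2 overall alignment rates in percent. The function name and example values are hypothetical. The threshold columns count samples whose unaligned share (100 minus the rate) meets each cutoff.

```python
from statistics import mean, median
from typing import Dict, List

# Unaligned-percentage cutoffs, matching the table's threshold columns.
THRESHOLDS = (50, 60, 70, 80, 90)


def summarize(rates: List[float]) -> Dict[str, float]:
    """Summarize per-sample Bowtie 2 overall alignment rates (percent)."""
    row: Dict[str, float] = {"N": len(rates)}
    for t in THRESHOLDS:
        # Count samples whose unaligned share is at or above this cutoff.
        row[f"unaligned>={t}"] = sum(1 for r in rates if 100.0 - r >= t)
    row["min"] = min(rates)
    row["median"] = median(rates)
    row["mean"] = round(mean(rates), 2)
    return row


if __name__ == "__main__":
    # Hypothetical rates for illustration only.
    print(summarize([2.42, 35.0, 70.43, 99.1]))
```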
For those of you with experience in differential gene expression analysis of microbial RNAseq data: are there thresholds you typically apply to filter samples based on alignment rate?
@nicholas.brereton @ben.sikes @daniela.bezdan @jaume.puig: iirc, these were the datasets you asked us to process.