Hi AWG members,
I’m excited to announce that we have created a (bulk) RNAseq pipeline for prokaryotic organisms that is ready for your review. Please use the link below to review the pipeline and provide your feedback as soon as possible, and no later than Monday, 3/10/2025.
Pipeline document detailing each step of the pipeline:
https://github.com/nasa/GeneLab_Data_Processing/blob/DEV_RNAseq_vG/RNAseq/Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md
Example outputs from raw counts through DGE from OSD-185:
GL-DPPD-7115_Prokaryotic_RNAseq_OSD-185_Example_Outputs
Differences from the eukaryotic pipeline, GL-DPPD-7101-G:
- Bowtie2 is used for alignment instead of STAR
- featureCounts is used for gene quantification instead of RSEM
- rRNA genes are removed from featureCounts results on a dataset-wide basis and rRNA removal logs are all reported in the same file
- Raw counts data are imported into R with the read.csv() function instead of tximport, and the dds object is created with the DESeqDataSetFromMatrix() function instead of the DESeqDataSetFromTximport() function
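To make the dataset-wide rRNA removal easier to picture, here is a minimal Python sketch of the idea. It is an illustration only and not the pipeline's actual code (see the pipeline document for that). The function name `remove_rrna`, the gene IDs, and the in-memory table layout are all hypothetical.

```python
def remove_rrna(counts, rrna_ids):
    """Drop rRNA genes from a counts table for all samples at once.

    counts:   dict mapping gene ID -> list of raw counts, one per sample
    rrna_ids: set of gene IDs annotated as rRNA (assumed to be known already)

    Returns (filtered_counts, removed_per_sample), where removed_per_sample
    is the number of rRNA counts dropped from each sample, i.e. the kind of
    information a single dataset-wide removal log could hold.
    """
    n_samples = len(next(iter(counts.values()))) if counts else 0
    removed_per_sample = [0] * n_samples
    filtered_counts = {}
    for gene_id, row in counts.items():
        if gene_id in rrna_ids:
            # Tally what is being removed so it can be reported per sample.
            removed_per_sample = [r + c for r, c in zip(removed_per_sample, row)]
        else:
            filtered_counts[gene_id] = row
    return filtered_counts, removed_per_sample


# Hypothetical two-sample example.
example_counts = {
    "rRNA_1": [100, 80],
    "geneA": [10, 12],
    "geneB": [0, 3],
}
filtered, removed = remove_rrna(example_counts, {"rRNA_1"})
print(filtered)  # {'geneA': [10, 12], 'geneB': [0, 3]}
print(removed)   # [100, 80]
```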
Specific questions we would like your feedback on:
- In step 4a, the Bowtie2 default --end-to-end mode is currently run such that only the primary alignment is reported and, if multiple alignments have the same best score, Bowtie2 chooses one at random.
  - Is this sufficient, or should we modify the parameters to report multiple alignments?
- In step 4c/d, is it sufficient to only publish the sorted bam file, or do you also want the unsorted bam file?
- In step 7c (counting with featureCounts):
  - Should the -G and -J options be added to improve read counting for exon-exon junctions?
  - If we modify the Bowtie2 command to return multiple alignments, please answer the following questions:
    - Should the --primary option be specified to only count the primary alignment (and ignore secondary, tertiary, etc. alignments)?
    - Or should multi-mapped reads be considered when counting and, if so, how?
      - Do not count multi-mapped reads
      - Randomly select which alignment to count
      - Count each alignment as 1
      - Count each alignment equally as a fraction of the total number of alignments (e.g. if there are 2 alignments, each would be counted as 0.5)
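To help compare the four multi-mapped read options listed above, here is a minimal Python sketch of how each one would change the resulting gene counts. It is a toy illustration under stated assumptions, not featureCounts itself: the function `count_reads`, the strategy names, and the read and gene IDs are all hypothetical.

```python
import random
from collections import defaultdict

STRATEGIES = {"skip", "random", "each", "fraction"}


def count_reads(alignments, strategy, seed=0):
    """Count reads per gene under one multi-mapping strategy.

    alignments: dict mapping read ID -> list of gene IDs, one entry per
                reported alignment (a uniquely mapped read has one entry)
    strategy:   "skip"     do not count multi-mapped reads
                "random"   randomly select which alignment to count
                "each"     count each alignment as 1
                "fraction" count each alignment as 1 / (number of alignments)
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    rng = random.Random(seed)  # seeded so the random choice is reproducible
    counts = defaultdict(float)
    for genes in alignments.values():
        if len(genes) == 1:
            counts[genes[0]] += 1.0  # uniquely mapped reads always count once
        elif strategy == "skip":
            continue
        elif strategy == "random":
            counts[rng.choice(genes)] += 1.0
        elif strategy == "each":
            for gene in genes:
                counts[gene] += 1.0
        else:  # "fraction"
            for gene in genes:
                counts[gene] += 1.0 / len(genes)
    return dict(counts)


# Hypothetical example: read1 maps uniquely, read2 maps to two genes.
example = {"read1": ["geneA"], "read2": ["geneA", "geneB"]}
print(count_reads(example, "skip"))      # {'geneA': 1.0}
print(count_reads(example, "each"))      # {'geneA': 2.0, 'geneB': 1.0}
print(count_reads(example, "fraction"))  # {'geneA': 1.5, 'geneB': 0.5}
```

Note that "each" inflates the total count above the number of reads (3 counts from 2 reads here), while "random" and "fraction" keep the total equal to the number of reads.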
A big thank you to @alexis.torres and @crystal.han on the Data Processing Team for leading this effort!