ANCOMBC2 pairwise comparison: How does it decide which features to keep?

olabiyi · January 23, 2025, 11:53pm

@MicrobesAWG Does anyone have experience with ANCOMBC2 and how it decides which features (taxa/ASVs) to keep in the differential abundance output when running pairwise comparisons?

Thanks!

rtscott2001 · January 24, 2025, 3:02am

@olabiyi you may also want to ask this at the next Microbes AWG mtg. It’s usually the first Wednesday of the month, 10-11am PT

@jaume.puig @daniela.bezdan - thoughts on agenda for Feb 5th mtg, maybe open to @olabiyi asking there?

Others, ideas, thoughts? @nicholas.brereton @katherine.j.baxter @vrvinothan @ben.sikes @joel.babdor @lauraelf @anna.simpson @philipjsweet @Zerrin @sudip.sharma.temple @mkagd001

nicholas.brereton · January 24, 2025, 10:37am

Hi @olabiyi,

The DA output would be after the pairwise comparison, so one would retain everything for interpretation. You can then focus on specific dynamics based on that interpretation/biology.

For treatment prior to DA analysis, this depends on the data and the question (what’s being compared and why). Below are some general filtering steps to consider before pairwise comparisons; we use them to improve meaningful biological insight for amplicon, metaT, WMS and untargeted metabolomics data. Generally, we want to maintain data integrity and change as little as possible, but specifically prior to the pairwise test:

Minimum occurrence threshold: an ESV should be present in enough samples to represent the condition. This is highly question-specific, but at the very least one wants counts in more than 2 samples (it’s normally sensible to set this higher to be meaningful, based on biological reps).
Sparsity filter: remove ESVs with >90% of their counts in one sample (one can assess sparsity and raise this depending on the question); this can impact FDR a lot. There is a sparsity correction, but I imagine it won’t do well with extreme sparsity.
Structural zeros: I’m told ANCOMBC2 now has an explicit step for structural zeros, but I’d just remove these and put them in a “significantly DA” bucket. It doesn’t make sense to “test” something that’s presence vs absence using any abundance criteria (or at all, really).
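
To make the three steps above concrete, here is a minimal Python sketch of that triage. It is only an illustration under stated assumptions: the function names, the dict-of-lists count table and the default thresholds (3 samples, 90%) are hypothetical choices for this example, and this is not what ANCOMBC2 does internally.

```python
from collections import defaultdict


def min_occurrence_ok(counts, min_samples=3):
    """True if the feature is nonzero in at least min_samples samples."""
    return sum(1 for c in counts if c > 0) >= min_samples


def dominated_by_one_sample(counts, max_fraction=0.9):
    """True if one sample holds more than max_fraction of the feature's counts."""
    total = sum(counts)
    return total > 0 and max(counts) / total > max_fraction


def groups_with_all_zeros(counts, groups):
    """Groups in which the feature is zero in every sample."""
    seen_nonzero = defaultdict(bool)
    for count, group in zip(counts, groups):
        seen_nonzero[group] = seen_nonzero[group] or count > 0
    return {g for g, nonzero in seen_nonzero.items() if not nonzero}


def triage(table, groups, min_samples=3, max_fraction=0.9):
    """Split features into (tested, presence_absence, dropped)."""
    tested, presence_absence, dropped = [], [], []
    for feature, counts in table.items():
        if not min_occurrence_ok(counts, min_samples) or dominated_by_one_sample(
            counts, max_fraction
        ):
            dropped.append(feature)
        elif groups_with_all_zeros(counts, groups):
            # Present in one condition, absent in another: set aside, do not test.
            presence_absence.append(feature)
        else:
            tested.append(feature)
    return tested, presence_absence, dropped


# Hypothetical toy data: 3 samples in group A, 3 in group B.
groups = ["A", "A", "A", "B", "B", "B"]
table = {
    "ASV1": [10, 12, 9, 11, 8, 14],
    "ASV2": [5, 7, 6, 0, 0, 0],
    "ASV3": [0, 0, 200, 1, 0, 1],
    "ASV4": [3, 0, 0, 0, 4, 0],
}
print(triage(table, groups))
```

With this toy table, ASV1 goes on to testing, ASV2 lands in the presence/absence bucket, and ASV3 and ASV4 are dropped by the sparsity and occurrence filters respectively. The order of the checks (occurrence and sparsity first, then structural zeros) is a design choice in this sketch, not something prescribed above.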

Otherwise, I’d make sure to keep the highest ASV resolution possible (not collapsing anything based on annotation until interpretation) and not remove anything based on annotation on a first pass, i.e. our ancient cyanobacteria or alpha-proteobacteria friends (chloroplast or mitochondria ASVs, depending on environment), given that removing them can confound and can be done downstream with eyes open. I’d also check how the correction is performing if you have very unbalanced library sizes.

For a real expert, you could ask @emmanuel.gonzalez

Hope that helps a bit!

olabiyi · January 24, 2025, 3:13pm

Thanks @rtscott2001, I’ll ask the question at the meeting on the 5th of Feb. Hopefully I can find an answer before then.

olabiyi · January 24, 2025, 3:24pm

@nicholas.brereton thanks for the wonderful recommendations. However, I was asking specifically about how ANCOMBC2 does its filtering before pairwise comparisons. I am asking because we are updating GeneLab’s amplicon Illumina workflow to run on Nextflow and to implement differential abundance testing, which is currently missing. ANCOMBC2 does pairwise comparisons between groups if you ask it to, but even when you set the library cutoff and prevalence cutoff parameters to zero (i.e. do not drop any sample or ASV), it still drops some ASVs from the final results. So how does it make this decision? My observation is that if either group in a pairwise comparison (A vs B) has a count of zero for an ASV, it drops that ASV entirely from that pairwise comparison.

I am looking to confirm my observation or gain new insights into how ANCOMBC2 decides which ASVs to drop.

Thanks.

nicholas.brereton · January 24, 2025, 4:30pm

If I get your question, that example would be a structural zero. I think they mention the step explicitly in the Bioconductor chat.

olabiyi · January 24, 2025, 5:14pm

Hi @nicholas.brereton, that must be it. My observation was right, but I just didn’t use the right terminology. It basically didn’t analyze (i.e. dropped) the ‘structural zeros’, as stated in the Bioconductor documentation. I just confirmed this by checking the zero_ind data frame of the output object generated by ANCOMBC2. If even one group had a structural zero for an ASV, that ASV wasn’t analyzed further.
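
For anyone following along, the rule I observed can be sketched in a few lines of Python. This is only a restatement of the behavior described above, not ANCOMBC2 code: the nested dict below is a hypothetical stand-in for a per-group structural-zero indicator table (the real zero_ind data frame is laid out differently), and skipped_in_pair is a made-up helper name.

```python
def skipped_in_pair(zero_ind, group_a, group_b):
    """Features left out of the group_a vs group_b comparison.

    A feature is skipped if it is flagged as a structural zero in
    either of the two groups being compared.
    """
    return sorted(
        feature
        for feature, flags in zero_ind.items()
        if flags[group_a] or flags[group_b]
    )


# Hypothetical indicator table: True means "structural zero in this group".
zero_ind = {
    "ASV1": {"A": False, "B": False, "C": False},
    "ASV2": {"A": False, "B": True, "C": False},
    "ASV3": {"A": True, "B": False, "C": True},
}

for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, skipped_in_pair(zero_ind, *pair))
```

Note that the set of skipped ASVs depends on the pair: in this toy table ASV2 is skipped for A vs B and B vs C but still tested for A vs C.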

Thanks for your help.

nicholas.brereton · January 24, 2025, 5:27pm

Great! Just for clarity, these are the MOST significant parts of the data. So, while they should be removed from the test, they should always be reintegrated into the DA result, i.e. they should just be automatically assigned as significant if ANCOMBC2 doesn’t do this.

It’d be cool to see some GeneLab comparisons of real data against DESeq2 at some point.
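
As a rough illustration of the reintegration idea in this reply, here is a small Python sketch. Everything in it is an assumption made for the example: the record fields ("feature", "significant", "present_in", "source"), the reintegrate helper and the nested-dict indicator table are hypothetical, not ANCOMBC2 output or API.

```python
def reintegrate(tested_results, zero_ind, group_a, group_b):
    """Append presence/absence features to the tested results as significant.

    Features absent from both groups are left out, since there is no
    difference between the groups to report.
    """
    merged = [dict(row, source="tested") for row in tested_results]
    for feature, flags in zero_ind.items():
        a_zero, b_zero = flags[group_a], flags[group_b]
        if a_zero != b_zero:  # structural zero in exactly one of the two groups
            merged.append(
                {
                    "feature": feature,
                    "significant": True,
                    "present_in": group_b if a_zero else group_a,
                    "source": "structural_zero",
                }
            )
    return merged


# Hypothetical inputs for an A vs B comparison.
tested_results = [{"feature": "ASV1", "significant": False}]
zero_ind = {
    "ASV1": {"A": False, "B": False},
    "ASV2": {"A": False, "B": True},
    "ASV3": {"A": True, "B": False},
    "ASV4": {"A": True, "B": True},
}

for row in reintegrate(tested_results, zero_ind, "A", "B"):
    print(row)
```

Keeping a "source" field makes it easy to tell downstream which calls came from the statistical test and which were assigned from presence/absence alone.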
