Input on Microbe Annotations for Microarrays Requested

Hi AWG members @AnimalAWG @MultiOmicsAWG,

We have been processing microarray datasets and came across some bacterial datasets whose species are not supported by BioMart. Normally, we use BioMart to map the probeset-level output data to gene-level annotations. Since that is not possible for the bacterial data, we can either:

(1) Omit annotations entirely in the outputs for bacteria datasets
(2) Use annotation data from the manufacturer

For the latter, note that the annotation data may be quite old. For example, Thermo Fisher provides the following annotation files, both created in 2016 (login required for the links below):

We would like your feedback on whether or not these annotations from the manufacturer would still be useful to include in the output data. Alternatively, we are open to any other sources of annotations that you may know of.

Thank you!


Generally I think we’d want to provide some annotation even if it is outdated. So unless someone comes up with a good open annotation source, the manufacturer annotation might be the best you can do.


Thank you for your feedback! We will make sure to include the annotations in that case. Our current outputs include the following annotation fields:

'ENSEMBL', 'SYMBOL', 'GENENAME', 'REFSEQ', 'ENTREZID', 'STRING_id', 'GOSLIM_IDS'

Some of these fields might not be available in the manufacturer annotations, but we will include what is available.
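To make "include what is available" concrete, here is a minimal Python sketch of how the output fields could be filled from a manufacturer row. The pairing of output fields to manufacturer columns in `FIELD_MAP`, the function name `map_annotation_row`, and the placeholder values in `MISSING` are all assumptions for illustration, not the pipeline's actual configuration.

```python
# Output field -> manufacturer column. This pairing is an assumption made for
# illustration; None marks output fields with no obvious manufacturer column.
FIELD_MAP = {
    "ENSEMBL": "Ensembl",
    "SYMBOL": "Gene Symbol",
    "GENENAME": "Gene Title",
    "REFSEQ": "RefSeq Transcript ID",
    "ENTREZID": "Entrez Gene",
    "STRING_id": None,
    "GOSLIM_IDS": None,
}

# Cell values treated as "not populated" (assumed placeholders).
MISSING = {"", "---", "\u2014"}


def map_annotation_row(row):
    """Translate one manufacturer row (a dict) into the output fields."""
    out = {"Probe Set ID": row.get("Probe Set ID")}
    for out_field, src_col in FIELD_MAP.items():
        raw = (row.get(src_col) or "").strip() if src_col else ""
        if raw in MISSING:
            # Field is unavailable or empty in the manufacturer data.
            out[out_field] = None
            continue
        # Multiple values in one cell are separated by "///".
        out[out_field] = [v.strip() for v in raw.split("///")]
    return out


# Usage with a few columns taken from a manufacturer row.
example = {
    "Probe Set ID": "AFFX-Athal_GAPDH_at",
    "Gene Symbol": "GAPC1",
    "Gene Title": "glyceraldehyde-3-phosphate dehydrogenase C subunit 1",
    "Ensembl": "---",
    "Entrez Gene": "819567",
    "RefSeq Transcript ID": "NM_111283",
}
print(map_annotation_row(example))
```

Fields without a manufacturer counterpart come back as `None`, so the output keeps the same columns for every dataset.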

Are there any other fields besides the ones listed above that you would like to see in the output? Here are all the fields provided by Thermo Fisher for the bacterial species linked above (note: some fields might not actually be populated):

Fields:
Probe Set ID, GeneChip Array, Species Scientific Name, Annotation Date, Sequence Type, Sequence Source, Transcript ID(Array Design), Target Description, Representative Public ID, Archival UniGene Cluster, UniGene ID, Genome Version, Alignments, Gene Title, Gene Symbol, Chromosomal Location, Unigene Cluster Type, Ensembl, Entrez Gene, SwissProt, EC, OMIM, RefSeq Protein ID, RefSeq Transcript ID, FlyBase, AGI, WormBase, MGI Name, RGD Name, SGD accession number, Gene Ontology Biological Process, Gene Ontology Cellular Component, Gene Ontology Molecular Function, Pathway, InterPro, Trans Membrane, QTL, Annotation Description, Annotation Transcript Cluster, Transcript Assignments, Annotation Notes

Sample row 1:
Probe Set ID: AFFX-Athal_actin_at
GeneChip Array: P. aeruginosa Genome Array
Species Scientific Name: Pseudomonas aeruginosa
Annotation Date: 30-Mar-16
Sequence Type: Control sequence
Sequence Source: GenBank
Transcript ID(Array Design): AFFX-Athal_actin
Target Description: U37281 Arabidopsis thaliana actin-2 mRNA
Representative Public ID: U37281
UniGene ID: At.23605
Gene Title: actin 7
Gene Symbol: ACT7
Entrez Gene: 830841
SwissProt: P53492
RefSeq Protein ID: NP_196543
RefSeq Transcript ID: NM_121018
Gene Ontology Biological Process: 0006007 // glucose catabolic process // inferred from reviewed computational analysis /// 0006094 // gluconeogenesis // inferred from reviewed computational analysis /// 0007010 // cytoskeleton organization // inferred from reviewed computational analysis /// 0007010 // cytoskeleton organization // traceable author statement /// 0009416 // response to light stimulus // inferred from expression pattern /// 0009611 // response to wounding // inferred from expression pattern /// 0009733 // response to auxin // inferred from expression pattern /// 0009845 // seed germination // inferred from mutant phenotype /// 0010053 // root epidermal cell differentiation // inferred from mutant phenotype /// 0010498 // proteasomal protein catabolic process // inferred from reviewed computational analysis /// 0032880 // regulation of protein localization // inferred from reviewed computational analysis /// 0048364 // root development // inferred from mutant phenotype /// 0048767 // root hair elongation // inferred from mutant phenotype /// 0048767 // root hair elongation // inferred from reviewed computational analysis /// 0051301 // cell division // inferred from mutant phenotype
Gene Ontology Cellular Component: 0005618 // cell wall // inferred from direct assay /// 0005730 // nucleolus // inferred from direct assay /// 0005737 // cytoplasm // not recorded /// 0005739 // mitochondrion // inferred from direct assay /// 0005829 // cytosol // inferred from direct assay /// 0005856 // cytoskeleton // inferred from sequence or structural similarity /// 0005886 // plasma membrane // inferred from direct assay /// 0009506 // plasmodesma // inferred from direct assay /// 0009570 // chloroplast stroma // inferred from direct assay /// 0009941 // chloroplast envelope // inferred from direct assay
Gene Ontology Molecular Function: 0000166 // nucleotide binding // inferred from electronic annotation /// 0005200 // structural constituent of cytoskeleton // inferred from sequence or structural similarity /// 0005515 // protein binding // inferred from physical interaction /// 0005524 // ATP binding // inferred from electronic annotation
Annotation Description: This probe set was annotated using the Design Representative Id based pipeline to a Entrez Gene identifier using 1 transcripts. // false // Design Representative Id // R
Annotation Transcript Cluster: U37281
Transcript Assignments: AFFX-Athal_actin // --- // unknown // --- // --- /// U37281 // U37281 Arabidopsis thaliana actin-2 mRNA // gb // --- // ---
Empty (---): Archival UniGene Cluster, Genome Version, Alignments, Chromosomal Location, Unigene Cluster Type, Ensembl, EC, OMIM, FlyBase, AGI, WormBase, MGI Name, RGD Name, SGD accession number, Pathway, InterPro, Trans Membrane, QTL, Annotation Notes

Sample row 2:
Probe Set ID: AFFX-Athal_GAPDH_at
GeneChip Array: P. aeruginosa Genome Array
Species Scientific Name: Pseudomonas aeruginosa
Annotation Date: 30-Mar-16
Sequence Type: Control sequence
Sequence Source: GenBank
Transcript ID(Array Design): AFFX-Athal_GAPDH
Target Description: M64116 Arabidopsis thaliana glyceraldehyde 3-phosphate dehydrogenase C subunit (GapC) gene
Representative Public ID: M64116
UniGene ID: At.22963
Gene Title: glyceraldehyde-3-phosphate dehydrogenase C subunit 1
Gene Symbol: GAPC1
Entrez Gene: 819567
SwissProt: P25858 /// Q41949 /// Q56WW5
RefSeq Protein ID: NP_187062
RefSeq Transcript ID: NM_111283
Gene Ontology Biological Process: 0006006 // glucose metabolic process // inferred from electronic annotation /// 0006007 // glucose catabolic process // inferred from reviewed computational analysis /// 0006094 // gluconeogenesis // inferred from reviewed computational analysis /// 0006094 // gluconeogenesis // traceable author statement /// 0006096 // glycolytic process // inferred from direct assay /// 0006096 // glycolytic process // inferred from electronic annotation /// 0006096 // glycolytic process // inferred from sequence or structural similarity /// 0006096 // glycolytic process // inferred from reviewed computational analysis /// 0006096 // glycolytic process // traceable author statement /// 0006098 // pentose-phosphate shunt // inferred from reviewed computational analysis /// 0006833 // water transport // inferred from reviewed computational analysis /// 0006972 // hyperosmotic response // inferred from reviewed computational analysis /// 0006979 // response to oxidative stress // inferred from direct assay /// 0006979 // response to oxidative stress // inferred from expression pattern /// 0007010 // cytoskeleton organization // inferred from reviewed computational analysis /// 0007030 // Golgi organization // inferred from reviewed computational analysis /// 0009060 // aerobic respiration // inferred from reviewed computational analysis /// 0009266 // response to temperature stimulus // inferred from reviewed computational analysis /// 0009408 // response to heat // inferred from expression pattern /// 0009651 // response to salt stress // inferred from expression pattern /// 0009651 // response to salt stress // inferred from reviewed computational analysis /// 0009744 // response to sucrose // inferred from expression pattern /// 0010154 // fruit development // inferred from mutant phenotype /// 0010498 // proteasomal protein catabolic process // inferred from reviewed computational analysis /// 0034976 // response to endoplasmic reticulum stress // inferred from reviewed computational analysis /// 0042542 // response to hydrogen peroxide // inferred from direct assay /// 0046686 // response to cadmium ion // inferred from expression pattern /// 0046686 // response to cadmium ion // inferred from reviewed computational analysis /// 0048316 // seed development // inferred from mutant phenotype /// 0051775 // response to redox state // inferred from direct assay /// 0055114 // oxidation-reduction process // inferred from electronic annotation
Gene Ontology Cellular Component: 0005634 // nucleus // inferred from direct assay /// 0005737 // cytoplasm // not recorded /// 0005739 // mitochondrion // inferred from direct assay /// 0005740 // mitochondrial envelope // inferred from direct assay /// 0005774 // vacuolar membrane // inferred from direct assay /// 0005794 // Golgi apparatus // inferred from reviewed computational analysis /// 0005829 // cytosol // inferred from direct assay /// 0005829 // cytosol // traceable author statement /// 0005886 // plasma membrane // inferred from direct assay /// 0009507 // chloroplast // inferred from direct assay /// 0016020 // membrane // inferred from direct assay /// 0048046 // apoplast // inferred from direct assay
Gene Ontology Molecular Function: 0003677 // DNA binding // inferred from electronic annotation /// 0004365 // glyceraldehyde-3-phosphate dehydrogenase (NAD+) (phosphorylating) activity // traceable author statement /// 0005507 // copper ion binding // inferred from direct assay /// 0008886 // glyceraldehyde-3-phosphate dehydrogenase (NADP+) (non-phosphorylating) activity // inferred from direct assay /// 0016491 // oxidoreductase activity // inferred from electronic annotation /// 0016620 // oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor // inferred from electronic annotation /// 0050661 // NADP binding // inferred from electronic annotation /// 0051287 // NAD binding // inferred from electronic annotation
Annotation Description: This probe set was annotated using the Design Representative Id based pipeline to a UniGene identifier using 1 transcripts. // false // Design Representative Id // R
Annotation Transcript Cluster: M64116
Transcript Assignments: AFFX-Athal_GAPDH // --- // unknown // --- // --- /// M64116 // M64116 Arabidopsis thaliana glyceraldehyde 3-phosphate dehydrogenase C subunit (GapC) gene // gb // --- // ---
Empty (---): Archival UniGene Cluster, Genome Version, Alignments, Chromosomal Location, Unigene Cluster Type, Ensembl, EC, OMIM, FlyBase, AGI, WormBase, MGI Name, RGD Name, SGD accession number, Pathway, InterPro, Trans Membrane, QTL, Annotation Notes
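The Gene Ontology columns in the sample rows pack several terms into one cell, with ` /// ` between terms and ` // ` between the accession, term name, and evidence description. The sketch below shows one way such a cell could be split into structured records. The function name `parse_go_field` is hypothetical, the empty-cell placeholders are assumed, and prefixing the numeric accession with `GO:` follows the usual GO identifier form rather than anything stated in the file.

```python
# Cell values treated as "not populated" (assumed placeholders).
MISSING = {"", "---", "\u2014"}


def parse_go_field(value):
    """Split one Gene Ontology cell into a list of term records."""
    value = value.strip()
    if value in MISSING:
        return []
    terms = []
    # Terms are separated by "///"; parts within a term by "//".
    for entry in value.split("///"):
        parts = [p.strip() for p in entry.split("//")]
        if len(parts) != 3:
            # Skip anything that does not look like "id // name // evidence".
            continue
        accession, name, evidence = parts
        terms.append({
            "id": "GO:" + accession,  # assumed: numeric accession -> GO:xxxxxxx
            "name": name,
            "evidence": evidence,
        })
    return terms


# Usage with the start of a Biological Process cell from the sample rows.
cell = (
    "0006007 // glucose catabolic process // inferred from reviewed "
    "computational analysis /// 0006094 // gluconeogenesis // "
    "traceable author statement"
)
for term in parse_go_field(cell):
    print(term["id"], term["name"])
```

Keeping the evidence text alongside each ID leaves room to filter out weaker annotations later, if that turns out to be useful.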

We would appreciate any feedback on this topic by Friday, August 9th. Thank you all!
