DiTaxa is an open-source bioinformatics software tool for analysing 16S rRNA sequence data, developed at the Helmholtz Centre for Infection Research (HZI). The tool is designed for reference- and alignment-free identification of microbial biomarkers and the prediction of host phenotypes in microbiome-related studies.
Instead of traditional OTU clustering methods, DiTaxa decomposes sequence data into frequently occurring variable-length subsequences using a Nucleotide-Pair Encoding (NPE) approach. These subsequences represent characteristic patterns in microbial communities and can serve as features for machine learning in disease and biomarker analysis.
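To illustrate the general idea of pair-encoding segmentation, the following is a minimal sketch, not DiTaxa's actual implementation. It assumes a byte-pair-encoding-style procedure in which the most frequent adjacent symbol pair is merged repeatedly. The function names (`learn_merges`, `apply_merge`, `segment`) and the toy sequences are hypothetical and chosen only for this example.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each left-to-right occurrence of `pair` with one joined symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(sequences, num_merges):
    """Learn up to `num_merges` merge operations from nucleotide sequences.

    Illustrative assumption: a plain BPE-style loop over the whole corpus.
    """
    # Start with every sequence split into single nucleotides.
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all sequences.
        pair_counts = Counter()
        for symbols in corpus:
            pair_counts.update(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        best_pair, best_count = pair_counts.most_common(1)[0]
        if best_count < 2:
            break  # no pair repeats, so nothing is worth merging
        merges.append(best_pair)
        corpus = [apply_merge(symbols, best_pair) for symbols in corpus]
    return merges


def segment(sequence, merges):
    """Split a sequence into variable-length subsequences using learned merges."""
    symbols = list(sequence)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Toy data, invented for this example only.
toy_sequences = ["ACGTACGT", "ACGTTT"]
toy_merges = learn_merges(toy_sequences, num_merges=3)
toy_segments = segment("ACGTAC", toy_merges)
print(toy_segments)
```

Counting how often each resulting subsequence occurs in a sample would give one possible feature vector for a downstream classifier. How DiTaxa itself builds and selects such features is not described here.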
Key applications include:
Biomarker discovery in microbiome data (e.g. in inflammation or chronic diseases)
Classification of disease phenotypes based on 16S data
Efficient, scalable processing of large sequence datasets without reference databases
DiTaxa is used in research to improve the resolution and accuracy of microbial diagnostics, and its source code is available on GitHub.