Boa_g: Boa for Genomics

Boa is a domain-specific language originally designed for mining software repositories. Boa_g is a domain-specific language designed for analyzing biological data.

Source code and Documentation

GitHub Repository

Background

Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult because of significant barriers to organizing, extracting, and analyzing the relevant data. Shared data science infrastructures like Boa_g are needed to efficiently process and parse the data contained in large repositories. The main features of Boa_g are inspired by existing languages for data-intensive computing, and Boa_g can easily integrate data from biological data repositories.

Boa_g Results and Dataset

Boa for genomics (Boa_g) has been implemented to analyze the metadata of RefSeq's 153,848 annotation (GFF) and assembly (FASTA) files. Boa_g is a substantial improvement over existing solutions such as Python and MongoDB: its domain-specific language runs on Hadoop infrastructure, which gives it a smaller storage footprint, lets it scale well, and means queries need fewer lines of code.
Boa_g databases significantly reduce the storage required for the raw data and significantly speed up queries over large datasets, because the Hadoop infrastructure automatically parallelizes and distributes the computations.
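
For a sense of the Python baseline mentioned above, here is a minimal sketch of a script that tallies feature types in a single GFF3 annotation file. It is not Boa_g code and not the project's actual benchmark script. The function name `count_feature_types` and the sample record are hypothetical, made up for illustration. It assumes only the standard GFF3 layout: nine tab-separated columns, with the feature type in the third.

```python
from collections import Counter


def count_feature_types(gff_text):
    """Count GFF3 feature types (column 3) in the given text."""
    counts = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # A GFF3 feature line has nine tab-separated columns; ignore anything else.
        if len(columns) != 9:
            continue
        counts[columns[2]] += 1
    return counts


# Hypothetical three-feature record, made up for illustration only.
sample = "\n".join([
    "##gff-version 3",
    "chr1\texample\tgene\t1\t1000\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t1\t1000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\tCDS\t10\t900\t.\t+\t0\tID=cds1;Parent=mrna1",
])
feature_counts = count_feature_types(sample)
```

A script like this handles one file at a time. Running it across many files would need extra code for iteration and distribution, which is the part the text above attributes to the Hadoop infrastructure.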

Boa_g illustration

The examples on this website are a few illustrations of query results obtained with Boa_g. A web-based interface is also provided for writing and submitting more complex queries to our infrastructure. Please see our GitHub Repository for more information.

Tree of Life provides summary statistics for each node in the tree of life.

Summary Statistics displays summary statistics of phylogenetic trees based on the NCBI taxonomy.

The Assembly programs example gives some insight into the programs that have been used for genome assembly, along with statistics on contigs, scaffolds, and so on.
