Source code and documentation on GitHub


The Boag dataset, data schema, and data generation steps can be obtained here.

The current dataset includes RefSeq data from NCBI.

 

Integrate Boag with Jupyter Notebook

Boag scripts can be integrated with Jupyter notebooks that use R or Python.

For more details and examples from the "Shared Data Science Infrastructure for Genomics data" paper, see our GitHub repository here: Jupyter notebook

Query 1: Smallest and largest genome
Query 2: Exon and gene counts, and number of exons per gene after 2016
Query 3: Popularity of the top assembly programs in Bacteria
Query 4: Assembly quality changes after 2016
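
As a rough illustration of the Python side of such a notebook, the sketch below parses the text output of a Boag query into tuples that can be fed to a plotting or data-frame library. The output format is an assumption, not something stated on this page: each result line is taken to look like `counts[] = name, weight`. The function name `parse_boag_output` and the sample text are hypothetical. The notebooks in the GitHub repository remain the authoritative examples.

```python
import re

# Assumed result-line shape: <variable>[<index>] = <value>, <weight>
LINE = re.compile(r"^(?P<var>\w+)\[[^\]]*\]\s*=\s*(?P<value>.*)$")


def parse_boag_output(text):
    """Parse assumed 'var[] = value, weight' lines into (var, value, weight) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        # Split on the last comma so values that contain commas stay intact.
        value, sep, weight = match.group("value").rpartition(",")
        if not sep:
            continue  # no weight on this line
        rows.append((match.group("var"), value.strip(), float(weight)))
    return rows


# Made-up output text, for illustration only.
sample_output = "counts[] = A, 4.0\ncounts[] = B, 3.0\n\nnot a result line\n"
rows = parse_boag_output(sample_output)
```

In a notebook, `rows` could then be passed to a data-frame constructor or a plotting call of your choice.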

 

Run Boag from the command line

For more details, see our GitHub repository here: Command Line

 

Download full dataset and VirtualBox

The web interface is provided inside the VirtualBox image. Google Drive Link.

 

This sample Boag script returns the three most frequently used assemblers:

    
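        # Each input record is one genome.
        g: Genome = input;
        # Keep the three assembler names with the largest total weight.
        counts: output top(3) of string weight int;
        data := getAssembler(g.refseq);
        # Emit every assembler recorded for this genome with weight 1.
        foreach (i: int; def(data.assembler[i]))
            counts << data.assembler[i].name weight 1;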
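
To make the aggregation in the script above concrete, here is a minimal Python sketch of what a `top(3) of string weight int` output amounts to: sum the weights emitted per name, then keep the three names with the largest totals. It does not use Boag itself. The record layout (a dict with an `assemblers` list), the function name `top_assemblers`, and the placeholder names `A` to `D` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_assemblers(genomes, k=3):
    """Return the k assembler names with the highest total weight."""
    counts = Counter()
    for genome in genomes:
        # Mirrors `counts << name weight 1`: each occurrence adds weight 1.
        for name in genome.get("assemblers", []):
            counts[name] += 1
    return counts.most_common(k)


# Made-up records with placeholder assembler names, for illustration only.
sample = [
    {"assemblers": ["A", "B"]},
    {"assemblers": ["A", "B", "C"]},
    {"assemblers": ["A", "C", "D"]},
    {"assemblers": ["A", "B"]},
]

top3 = top_assemblers(sample)
print(top3)  # [('A', 4), ('B', 3), ('C', 2)]
```

The real script runs this aggregation over the whole dataset on the Boag infrastructure; the sketch only shows the counting logic on a handful of in-memory records.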