Code and Software

Committed to Open Source, Re-usable Scientific Software

In GitHub we trust.

Scientific software is at its best when it is well engineered, well tested, open source and freely available, and when it supports analytic reproducibility. With all the software we put together, we strive to hit all these targets, whether we're writing scripts for short analyses or large packages. We're not perfect yet, but we're getting there.

Naturally, all of this is available through online repositories under non-restrictive open source licenses. Our favoured place to push software, whatever state it's in, is GitHub.

We recently started keeping open lab books, and for an increasing number of our projects we push even our day-to-day code and findings to GitHub. You can check out our work below.


Better genome browsing through better community involvement.

Genome browsers help us view genomic data as linear tracks, but they're only as useful as the data they contain. Many communities working on non-model organisms, and many small genomics communities, struggle to bring together the knowledge of the scientist at the bench and that of the genomicist. GeeFu integrates multiple technologies to take in atomic pieces of information about gene-level objects, as well as data from high-throughput experiments, and gives full credit for contributions using Mozilla Open Badges and GitHub. It is also a great, easy-to-set-up genome browser.

GeeFu is a Ruby on Rails-based RESTful web-service application that serves genome feature data on request. It is ideally suited to serving large amounts of data, such as those from high-throughput sequencing experiments. GeeFu is used in conjunction with web-service viewports such as BioDalliance and WebApollo to create very fast, data-rich, attractive, RDBMS-agnostic genome browsers that can easily be extended into fuller custom web applications using the powerful Rails framework.
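The core job of a feature web service like this is to answer "which features overlap this region?" for each request. The sketch below is not GeeFu's code or API: the `Feature` struct, the in-memory store (standing in for the database) and the DAS-style `chr1:start,stop` segment syntax are all assumptions made for illustration.

```ruby
require "json"

# Hypothetical feature record; a real service would keep these in an RDBMS.
Feature = Struct.new(:seqid, :start, :stop, :type, :name, keyword_init: true)

# Hypothetical in-memory store standing in for the database.
FEATURES = [
  Feature.new(seqid: "chr1", start: 100, stop: 500, type: "gene", name: "geneA"),
  Feature.new(seqid: "chr1", start: 900, stop: 1200, type: "gene", name: "geneB"),
  Feature.new(seqid: "chr2", start: 50, stop: 300, type: "gene", name: "geneC")
].freeze

# Parse an assumed DAS-style segment string such as "chr1:1,1000".
def parse_segment(segment)
  m = /\A(?<seqid>[^:]+):(?<start>\d+),(?<stop>\d+)\z/.match(segment)
  raise ArgumentError, "bad segment: #{segment}" unless m
  [m[:seqid], Integer(m[:start], 10), Integer(m[:stop], 10)]
end

# Return the features overlapping the closed interval [start, stop] as JSON,
# i.e. the body a feature web service might send back for one request.
def features_json(segment, features = FEATURES)
  seqid, start, stop = parse_segment(segment)
  hits = features.select { |f| f.seqid == seqid && f.start <= stop && f.stop >= start }
  JSON.generate(hits.map(&:to_h))
end

puts features_json("chr1:400,1000")
```

A browser viewport such as BioDalliance would issue many such small region queries as the user scrolls, which is why the overlap test, rather than whole-genome dumps, is the operation worth making fast.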

GeeFu on GitHub


SAMtools wrapped in Ruby. Downloaded 20478 times.

SAMtools is an insanely popular tool for working with BAM files of sequence read alignments. Ruby is a high-level scripting language. bio-samtools gives easy access to BAM file data, through SAMtools functionality, in Ruby.

The bio-samtools package is a Ruby wrapper around libbam, the core shared object library from the SAMtools package (libbam.1.dylib on Mac OS X, with an equivalent shared object on Linux). SAMtools is a set of utilities that manipulates alignments of next-generation sequence reads to longer reference sequences stored in the BAM format. SAMtools can sort, merge and index alignments, and can swiftly retrieve the reads in any region. bio-samtools completely hides the low-level SAMtools C API; by wrapping SAMtools in this way, it lets the scientist use the high-level, easily learned Ruby language, which facilitates quick development.
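Fast region retrieval rests on BAM indexing, which uses the hierarchical binning scheme given in the SAM/BAM format specification: each alignment is assigned to the smallest bin that fully contains it, so a region query only needs to consult a handful of bins. bio-samtools hides all of this from you; the sketch below is a pure-Ruby transcription of the specification's `reg2bin`/`reg2bins` C functions, for illustration only, and is not part of the bio-samtools API.

```ruby
# Binning scheme from the SAM/BAM specification, transcribed to Ruby.
# Coordinates are 0-based and half-open: [beg, fin).

# Smallest bin that fully contains the region (bins are 16 kb at the finest
# level, then 128 kb, 1 Mb, 8 Mb, 64 Mb, and bin 0 spans everything).
def reg2bin(beg, fin)
  fin -= 1
  return ((1 << 15) - 1) / 7 + (beg >> 14) if beg >> 14 == fin >> 14
  return ((1 << 12) - 1) / 7 + (beg >> 17) if beg >> 17 == fin >> 17
  return ((1 << 9) - 1) / 7 + (beg >> 20) if beg >> 20 == fin >> 20
  return ((1 << 6) - 1) / 7 + (beg >> 23) if beg >> 23 == fin >> 23
  return ((1 << 3) - 1) / 7 + (beg >> 26) if beg >> 26 == fin >> 26
  0
end

# Every bin that may hold alignments overlapping the region; only these
# bins' index entries need reading, rather than the whole file.
def reg2bins(beg, fin)
  fin -= 1
  bins = [0]
  [[1, 26], [9, 23], [73, 20], [585, 17], [4681, 14]].each do |offset, shift|
    ((offset + (beg >> shift))..(offset + (fin >> shift))).each { |k| bins << k }
  end
  bins
end

p reg2bin(0, 1)        # => 4681
p reg2bin(0, 16_385)   # => 585, because the region crosses a 16 kb boundary
p reg2bins(0, 1)       # => [0, 1, 9, 73, 585, 4681]
```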

install on the command line: 
gem install bio-samtools
bio-samtools on GitHub


Render publication-quality genomic diagrams, straight from data. Downloaded 8252 times.

Publication-quality images can be hard to get when your analysis software generates blocky little PNG or JPG files that won't upscale. The SVG format scales wonderfully, and bio-svgenes makes it easy to generate figures of genomic data in SVG format as you analyse.
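SVG scales cleanly because it is vector markup: shapes are described by coordinates, not pixels. As a minimal sketch of that idea (plain string building, not the bio-svgenes API), the hypothetical `Gene` struct and `genes_to_svg` helper below draw each gene as a labelled rectangle, scaling sequence coordinates to the drawing width.

```ruby
# Hypothetical feature record: 1-based, inclusive coordinates.
Gene = Struct.new(:name, :start, :stop)

# Escape the characters that would break SVG/XML text content.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Draw each gene as a labelled rectangle on its own row, scaling sequence
# coordinates to the drawing width. Returns the SVG document as a string.
def genes_to_svg(genes, seq_length, width: 800, track_height: 20)
  scale = width.to_f / seq_length
  row = track_height + 15
  shapes = genes.each_with_index.map do |gene, i|
    x = ((gene.start - 1) * scale).round(2)
    w = ((gene.stop - gene.start + 1) * scale).round(2)
    y = 15 + i * row
    %(<rect x="#{x}" y="#{y}" width="#{w}" height="#{track_height}" fill="steelblue"/>\n) +
      %(<text x="#{x}" y="#{y - 3}" font-size="10">#{xml_escape(gene.name)}</text>)
  end
  <<~SVG
    <svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{15 + genes.size * row}">
    #{shapes.join("\n")}
    </svg>
  SVG
end

genes = [Gene.new("geneA", 1, 400), Gene.new("geneB", 601, 1000)]
puts genes_to_svg(genes, 1000, width: 500)
```

Writing the returned string to a `.svg` file gives a figure that stays sharp at any zoom level.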

install on the command line:
gem install bio-svgenes
bio-svgenes on GitHub


Citizen Science for Ash Dieback. Accessed 51624 times.

A Facebook game that lets members of the public join the effort to understand a disease that has killed millions of ash trees across Europe. Fraxinus helps the public help us analyse genetic variation in ash trees and in the ash dieback pathogen.

Fraxinus on Facebook


Reference-free SNP detection.

Finding genetic variation in sequence data usually requires a reference sequence. Some of the latest software doesn't need to take this approach, and Bubbleparse can help you find and classify the real variants (SNPs) in the complex output from such programs.

Bubbleparse is a tool for identifying SNPs with expected hetero- and homozygosity between different samples. It uses the efficient de Bruijn graph representation provided by Cortex (Iqbal et al. 2011) and implements a new algorithm for identifying and classifying bubbles. Given sufficient coverage, the method can call all possible variants, but it will also call errors, paralogs and misassemblies. Bubbleparse gives each bubble a type classification determined by the number of paths through the bubble and the number of colours that follow each path, and collates quantities such as node coverage per colour path, k-mer quality score and coverage ratio.
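To make "bubbles", "paths" and "colours" concrete, here is a toy sketch in Ruby. It is not Bubbleparse or Cortex code: it builds a tiny coloured de Bruijn graph from reads (one colour per sample), finds simple two-path bubbles that fork and reconverge, and records which colours cover every node of each path. Simplifications are deliberate: no reverse complements, no error or coverage filtering, only SNP-like bubbles, and the `het`/`hom` labelling in the hypothetical `genotypes` helper is an illustration, not Bubbleparse's classification scheme.

```ruby
require "set"

# Build a coloured de Bruijn graph: nodes are k-mers, edges join k-mers that
# are adjacent in a read, and each node counts its coverage per colour.
def build_graph(samples, k)
  succ = {}
  pred = {}
  coverage = Hash.new { |h, kmer| h[kmer] = Hash.new(0) }
  samples.each do |colour, reads|
    reads.each do |read|
      kmers = (0..read.length - k).map { |i| read[i, k] }
      kmers.each { |kmer| coverage[kmer][colour] += 1 }
      kmers.each_cons(2) do |a, b|
        (succ[a] ||= Set.new) << b
        (pred[b] ||= Set.new) << a
      end
    end
  end
  [succ, pred, coverage]
end

# Follow an unbranched path until a node with in-degree > 1 (the merge
# point). Returns [internal_nodes, merge_node], or nil if none is found.
def walk_branch(node, succ, pred, max_len)
  path = []
  max_len.times do
    return [path, node] if pred.fetch(node, []).size > 1
    path << node
    nexts = succ.fetch(node, [])
    return nil unless nexts.size == 1
    node = nexts.first
  end
  nil
end

# Find fork nodes with two successors whose branches reconverge, and record
# for each path the colours that cover every node on it.
def find_bubbles(succ, pred, coverage, max_len: 50)
  succ.keys.filter_map do |fork|
    next unless succ[fork].size == 2
    walks = succ[fork].map { |b| walk_branch(b, succ, pred, max_len) }
    next if walks.include?(nil) || walks.map(&:last).uniq.size != 1
    paths = walks.map(&:first)
    next if paths.any?(&:empty?) # keep only SNP-like bubbles in this sketch
    followers = paths.map do |path|
      coverage[path.first].keys.select { |c| path.all? { |n| coverage[n][c] > 0 } }
    end
    { fork: fork, merge: walks.first.last, paths: paths,
      alleles: paths.map { |p| p.first[-1] }, followers: followers }
  end
end

# Hypothetical labelling: a colour following both paths looks heterozygous;
# one following a single path looks homozygous for that path's allele.
def genotypes(bubble)
  f = bubble[:followers]
  f.flatten.uniq.to_h do |c|
    on = [0, 1].select { |i| f[i].include?(c) }
    [c, on.size == 2 ? "het" : "hom#{on.first}"]
  end
end

# Toy data: samples A and B differ by one SNP (G/T); C carries both alleles.
SAMPLES = {
  "A" => ["GATTACAGGCTTCA"],
  "B" => ["GATTACATGCTTCA"],
  "C" => ["GATTACAGGCTTCA", "GATTACATGCTTCA"]
}.freeze

succ, pred, coverage = build_graph(SAMPLES, 4)
find_bubbles(succ, pred, coverage).each do |b|
  puts "#{b[:fork]} -> #{b[:merge]} alleles=#{b[:alleles]} genotypes=#{genotypes(b)}"
end
```

With k = 4, the SNP produces one bubble whose two paths each contain k nodes; in real data an error or paralog can produce a bubble of the same shape, which is why per-colour coverage on each path matters.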

Bubbleparse on GitHub


OpenBadges for OpenScience.

Merit is a web app built on the Java Play 2 Framework for creating and distributing Mozilla Open Badges. Badges are great for recognising achievement, and they can incentivise positive behaviours or reward contributions on many levels.
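Merit itself is written in Java; to keep one language across the examples on this page, here is a Ruby sketch (not Merit's code) of the kind of object a badge issuer hands out: a hosted badge assertion in what we understand to be the Open Badges 1.0 format, with the recipient's email salted and hashed so it is not published in clear text. Field names follow our reading of that specification; the email address and URLs are made-up placeholders.

```ruby
require "digest"
require "json"
require "securerandom"
require "time"

# Build an Open Badges 1.0-style hosted assertion as a Ruby hash. The
# recipient email is published only as "sha256$" + hex(SHA-256(email + salt)).
# The badge and assertion URLs passed in are hypothetical placeholders.
def badge_assertion(email:, badge_url:, assertion_url:, salt: SecureRandom.hex(8))
  {
    uid: SecureRandom.uuid,
    recipient: {
      type: "email",
      hashed: true,
      salt: salt,
      identity: "sha256$" + Digest::SHA256.hexdigest(email + salt)
    },
    badge: badge_url,
    verify: { type: "hosted", url: assertion_url },
    issuedOn: Time.now.utc.iso8601
  }
end

# Check whether an email address is the hashed recipient of an assertion.
def recipient?(assertion, email)
  r = assertion[:recipient]
  r[:identity] == "sha256$" + Digest::SHA256.hexdigest(email + r[:salt])
end

assertion = badge_assertion(
  email: "alice@example.org",
  badge_url: "https://example.org/badges/contributor.json",
  assertion_url: "https://example.org/assertions/1.json"
)
puts JSON.pretty_generate(assertion)
puts recipient?(assertion, "alice@example.org") # prints "true"
```

Hashing with a per-assertion salt lets anyone who already knows the earner's email confirm the badge, without the assertion itself exposing that address.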

Merit on GitHub

© Dan MacLean. All Rights Reserved.