Software Projects

Here are a few projects that have come out of my own work and that of my team.


bio-samtools

The bio-samtools package is a Ruby wrapper around libbam.so (for Linux) and libbam.1.dylib (for Mac OS X), the core shared object library from the SAMtools package. SAMtools is a set of utilities for manipulating alignments of next-generation sequence reads to longer reference sequences stored in the BAM format. SAMtools can sort, merge and index alignments, and can quickly retrieve the reads in any region. bio-samtools hides the low-level SAMtools C API completely. By wrapping SAMtools in this way, it lets the scientist use the high-level, easily learned Ruby language, which facilitates quick development. This software is available as a BioRuby plug-in at RubyGems.org and as source code on GitHub.

BubbleParse

Bubbleparse is a tool for identifying SNPs with expected hetero- and homozygosity between different samples, and is optimised for finding [...]. Bubbleparse uses the efficient de Bruijn graph representation provided by Cortex (Iqbal et al. 2011) and implements a new algorithm for identifying and classifying bubbles. Given sufficient coverage, the method can call all possible variants, but it will also call errors, paralogs and misassemblies. Bubbleparse gives each bubble a type classification determined by the number of paths through the bubble and the number of colours (samples) that follow each path. Alongside these classifications it collates quantities such as node coverage per colour path, k-mer quality score and coverage ratio. Developed in my group in collaboration with Mario Caccamo's group at TGAC, Bubbleparse is available on GitHub.
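The classification idea can be illustrated with a small sketch. It is not Bubbleparse's code. It assumes a bubble is given as a list of paths, each carrying k-mer coverage per colour. A colour is counted as "following" a path when its coverage reaches a hypothetical `min_cov`. The type label format (`"2p:1,1"`) and the `coverage_ratio` definition are likewise illustrative assumptions rather than Bubbleparse's actual output.

```python
from dataclasses import dataclass, field


@dataclass
class BubblePath:
    """One path through a bubble, with k-mer coverage recorded per colour (sample)."""
    coverage: dict = field(default_factory=dict)

    def colours(self, min_cov=1):
        # Assumed rule: a colour follows this path if its coverage is at least min_cov.
        return sorted(c for c, cov in self.coverage.items() if cov >= min_cov)


def classify_bubble(paths, min_cov=1):
    """Hypothetical type label: number of paths, then colours per path (descending)."""
    colour_counts = sorted((len(p.colours(min_cov)) for p in paths), reverse=True)
    return f"{len(paths)}p:" + ",".join(str(n) for n in colour_counts)


def coverage_ratio(paths, colour):
    """Assumed metric: share of a colour's coverage that lies on its best-covered path."""
    covs = [p.coverage.get(colour, 0) for p in paths]
    total = sum(covs)
    return max(covs) / total if total else 0.0


# Two samples, each following a different branch: a homozygous difference.
homo = [BubblePath({0: 20, 1: 0}), BubblePath({0: 0, 1: 18})]
# Sample 0 follows both branches, as expected for a heterozygous site.
het = [BubblePath({0: 10, 1: 15}), BubblePath({0: 12})]

print(classify_bubble(homo), coverage_ratio(homo, 0))
print(classify_bubble(het), coverage_ratio(het, 0))
```

Under these assumptions, a bubble whose colours split cleanly between paths (`2p:1,1`) looks like a homozygous difference between samples. One where a colour follows both paths (`2p:2,1`) looks heterozygous. Bubbles with more paths are candidates for errors, paralogs or misassemblies.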

Becard

Assembly of next-generation sequence reads into longer contiguous sequences is still sometimes more of an art than a science, and many assemblies fail in certain common ways. Becard is an in-development tool that identifies regions of an assembly corresponding to common mis-assemblies and allows the assembly to be rebuilt. Implemented in Java and designed as an end-user tool, this software is available on request.

bio-gngm

bio-gngm is a BioRuby plugin that implements and extends the methods of Austin et al. (2011) and the NGM web-tool to cover different genetic backgrounds and expected zygosity. It provides a generalised framework for exploring next-generation sequence data from samples of organisms showing mutant phenotypes, in search of potential causative mutations.

Gee Fu

Gee Fu is a Ruby on Rails-based RESTful web-service application that serves genome feature data on request. It is ideally suited to serving large amounts of data, such as that from high-throughput sequencing experiments. gee_fu can be used in conjunction with web-service viewports such as AnnoJ to create very fast, data-rich, attractive, RDBMS-agnostic genome browsers. These can easily be extended into fuller custom web applications using the powerful Rails framework. See it on GitHub.

NiBLS

NiBLS is an algorithm that determines small RNA generative loci from high-throughput sequencing data. The algorithm builds a network, or graph, of the small RNAs by linking them according to their proximity on the target genome. For each subnetwork in the resulting graph, it computes the clustering coefficient, a measure of how interconnected the subnetwork is, and uses it to identify the generative loci. See it on GitHub.
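The network-and-clustering idea can be sketched in a few dozen lines. This is a minimal illustration, not the NiBLS implementation. It makes several assumptions:

* Each small RNA is reduced to a single start coordinate on one chromosome.
* Two small RNAs are linked when they lie within a hypothetical `max_dist` of each other.
* A subnetwork's clustering coefficient is the average of its nodes' local coefficients, with nodes of degree below 2 counted as 0.
* The thresholds `min_cc` and `min_size` are illustrative values, not published parameters.

```python
def build_network(positions, max_dist):
    """Link every pair of small RNAs whose genome positions are within max_dist."""
    adj = {i: set() for i in range(len(positions))}
    order = sorted(adj, key=lambda i: positions[i])
    for k, i in enumerate(order):
        for j in order[k + 1:]:
            if positions[j] - positions[i] > max_dist:
                break  # positions are sorted, so no later read can be close enough
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Yield each subnetwork (connected component) as a list of node ids."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        yield comp


def clustering_coefficient(adj, nodes):
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v in nodes:
        nbrs = list(adj[v])
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of v's neighbours.
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
        total += 2 * links / (k * (k - 1))
    return total / len(nodes)


def call_loci(positions, max_dist=50, min_cc=0.5, min_size=3):
    """Report (start, end) of subnetworks that are large and interconnected enough."""
    adj = build_network(positions, max_dist)
    loci = []
    for comp in connected_components(adj):
        if len(comp) >= min_size and clustering_coefficient(adj, comp) >= min_cc:
            coords = [positions[i] for i in comp]
            loci.append((min(coords), max(coords)))
    return sorted(loci)


print(call_loci([100, 110, 120, 130, 5000, 9000, 9200]))  # one dense cluster
print(call_loci([0, 40, 80, 120]))  # a sparse chain, not a locus
```

The second call shows why a clustering coefficient is used rather than proximity alone. The reads form a connected chain, but no neighbours of a read are linked to each other. The subnetwork therefore scores 0 and is not reported as a locus.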

PhosCalc

PhosCalc is a system for estimating the site of phosphorylation in a phosphorylated peptide with ambiguous sites. It does this by generating ions from peptide sequences and calculating their masses. It also generates ions and masses for water loss, for different cysteine modifications and for different charge states. See it on GitHub.
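The ion-generation step can be sketched as follows. This is an illustrative reconstruction, not PhosCalc's code. It computes b and y fragment ion m/z values from standard monoisotopic residue masses, with optional water loss and multiple charge states. It does this for every placement of the phosphate on S, T or Y. Carbamidomethylation as the default cysteine modification, the chosen charge states and the ion label format are assumptions.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.966331          # HPO3 added per phosphorylated residue
CARBAMIDOMETHYL = 57.021464  # assumed default cysteine modification


def residue_masses(seq, phospho_sites=(), cys_mod=CARBAMIDOMETHYL):
    """Per-residue masses including the cysteine modification and phosphates."""
    masses = []
    for i, aa in enumerate(seq):
        m = RESIDUE[aa] + (cys_mod if aa == "C" else 0.0)
        if i in phospho_sites:
            m += PHOSPHO
        masses.append(m)
    return masses


def fragment_ions(seq, phospho_sites=(), charges=(1, 2), cys_mod=CARBAMIDOMETHYL):
    """Return {(ion label, charge): m/z} for b, y and water-loss ions."""
    res = residue_masses(seq, phospho_sites, cys_mod)
    ions = {}
    for n in range(1, len(seq)):
        b = sum(res[:n])             # b ion: N-terminal residues
        y = sum(res[-n:]) + WATER    # y ion: C-terminal residues plus water
        for z in charges:
            for label, neutral in ((f"b{n}", b), (f"y{n}", y),
                                   (f"b{n}-H2O", b - WATER), (f"y{n}-H2O", y - WATER)):
                ions[(label, z)] = (neutral + z * PROTON) / z
    return ions


def candidate_spectra(seq, n_phos, **kwargs):
    """Theoretical ions for every placement of n_phos phosphates on S, T or Y."""
    sites = [i for i, aa in enumerate(seq) if aa in "STY"]
    return {combo: fragment_ions(seq, combo, **kwargs) for combo in combinations(sites, n_phos)}


cands = candidate_spectra("PSTK", 1)
for sites, ions in cands.items():
    print(sites, round(ions[("b2", 1)], 4), round(ions[("b3", 1)], 4))
```

In this sketch, ions that differ between candidate placements are the ones that can discriminate between sites. For `PSTK`, b2 separates a phosphate on S from one on T, whereas b3 is identical for both. Matching such site-determining ions against an observed spectrum is one way to score the candidates.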