Here are a few projects that have come out of my own and my team's work.
The bio-samtools package is a Ruby wrapper around libbam.so (for Linux) and libbam.1.dylib (for Mac OS X), the core shared object library from the SAMtools package. SAMtools is a set of utilities that manipulate alignments of next-generation sequence reads to longer reference sequences stored in the BAM format. SAMtools can sort, merge and index alignments and can quickly retrieve the reads in any region. bio-samtools hides the low-level SAMtools C API completely. By wrapping SAMtools in this way, it lets the scientist work in Ruby, a high-level, easily learned language that facilitates quick development. This software is available as a BioRuby plug-in at RubyGems.org and as source code on GitHub.
Assembly of next-generation sequence reads into longer contiguous sequences is still sometimes more of an art than a science, and many assemblies fail in certain common ways. Becard is an in-development tool that identifies regions in assemblies that correspond to common mis-assemblies and allows the assembly to be rebuilt. Implemented in Java and designed as an end-user tool, this software is available on request.
bio-gngm is a BioRuby plug-in that implements and extends the methods in Austin et al. (2011) and in the NGM web tool for different backgrounds and expected zygosity. It provides a generalised framework for exploring next-generation sequence data from organisms showing mutant phenotypes, in search of potential causative mutations.
Gee Fu is a Ruby on Rails-based RESTful web-service application that serves genome feature data on request. It is well suited to serving large amounts of data, such as those from high-throughput sequencing experiments. Gee Fu can be used in conjunction with web-service viewports such as AnnoJ to create very fast, data-rich, attractive, RDBMS-agnostic genome browsers that can easily be extended into fuller custom web applications using the Rails framework. See on GitHub.
NiBLS is an algorithm that determines small RNA generative locales from high-throughput sequencing data. The algorithm builds a network, or graph, of the small RNAs by linking them according to their proximity on the target genome. For each sub-network in the resulting graph, the clustering coefficient, a measure of how interconnected the sub-network is, is used to identify the generative locales. See on GitHub.
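The graph idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NiBLS implementation: the gap threshold (`max_gap`), the cutoff (`min_cc`), the minimum sub-network size (`min_size`) and the use of the mean local clustering coefficient are all hypothetical choices made here, and reads are assumed to lie on a single chromosome.

```python
from itertools import combinations


def build_graph(reads, max_gap):
    """Link two reads (start, end) when the gap between them is <= max_gap."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        (s1, e1), (s2, e2) = reads[i], reads[j]
        gap = max(s1, s2) - min(e1, e2)  # negative when the reads overlap
        if gap <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Return the connected sub-networks as sets of node indices."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def clustering_coefficient(adj, comp):
    """Mean local clustering coefficient over the nodes of one sub-network."""
    total = 0.0
    for node in comp:
        nbrs = adj[node]
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(comp)


def candidate_locales(reads, max_gap=50, min_cc=0.6, min_size=3):
    """Report (start, end, n_reads, coefficient) for well-connected sub-networks."""
    adj = build_graph(reads, max_gap)
    locales = []
    for comp in components(adj):
        if len(comp) < min_size:
            continue
        cc = clustering_coefficient(adj, comp)
        if cc >= min_cc:
            start = min(reads[i][0] for i in comp)
            end = max(reads[i][1] for i in comp)
            locales.append((start, end, len(comp), round(cc, 3)))
    return sorted(locales)
```

With these assumed defaults, three mutually overlapping reads form a fully connected sub-network and are reported, whereas three reads strung out in a chain (each linked only to its immediate neighbour) have a coefficient of zero and are not.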
PhosCalc is a system for estimating the site of phosphorylation in a phosphorylated peptide that has several candidate sites. It does this by generating theoretical ions from the peptide sequence and calculating their masses, producing separate ions and masses for water loss, different cysteine modifications and different charge states. See on GitHub.
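The ion-generation step can be illustrated with a short Python sketch. It is not PhosCalc's code and it leaves out any scoring against an observed spectrum: it assumes standard monoisotopic residue masses, b- and y-type fragment ions, serine, threonine and tyrosine as candidate sites, and carbamidomethylation as the example cysteine modification. All function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
PHOSPHO = 79.96633          # HPO3 added to the phosphorylated residue
CARBAMIDOMETHYL = 57.02146  # one possible cysteine modification


def candidate_sites(peptide):
    """Zero-based positions that could carry the phosphate (S, T or Y)."""
    return [i for i, aa in enumerate(peptide) if aa in "STY"]


def residue_masses(peptide, phospho_site, cys_mod=0.0):
    """Per-residue masses with the phosphate placed at phospho_site."""
    masses = []
    for i, aa in enumerate(peptide):
        mass = MONO[aa]
        if aa == "C":
            mass += cys_mod
        if i == phospho_site:
            mass += PHOSPHO
        masses.append(mass)
    return masses


def fragment_ions(peptide, phospho_site, charges=(1, 2), cys_mod=0.0):
    """Map (ion label, charge) to m/z for b and y ions and their water losses."""
    masses = residue_masses(peptide, phospho_site, cys_mod)
    n = len(peptide)
    ions = {}
    for i in range(1, n):
        neutral = {
            f"b{i}": sum(masses[:i]),
            f"y{i}": sum(masses[n - i:]) + WATER,
        }
        for label, mass in neutral.items():
            for z in charges:
                ions[(label, z)] = (mass + z * PROTON) / z
                ions[(label + "-H2O", z)] = (mass - WATER + z * PROTON) / z
    return ions
```

Calling `fragment_ions` once per entry in `candidate_sites(peptide)` gives one theoretical ion set per candidate site. Fragments that span the region between two candidate sites differ by the phosphate mass, which is what makes the sites distinguishable in principle.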