Fraxinus, citizen science and human computation

Citizen science is a broad term that describes the involvement of the general public in scientific studies. We have developed a Facebook game, called Fraxinus, that allows untrained players to carry out multiple sequence alignments of high-throughput sequencing reads in order to identify genetic variants such as SNPs and INDELs.

The human capacity for problem solving is key to this. We can therefore think of the crowd as a distributed, organic computer that is programmed very differently to the familiar digital computer. 

With this project we hope to characterise the ‘human computer’ for work in bioinformatics. Most importantly, we want to know whether the human computer can do anything better than the digital one and, if so, how we can reverse-engineer those processes. We are also asking what can be achieved by the ‘crowd-sourced processor’. How long can such a machine run, and what sort of productivity can it generate? What are the limits on the complexity of the problems it can handle? How easily do the human processors understand the task, and what role does training play in this?

To date we have shown convincingly that the human computer can solve certain classes of alignment problem better than the digital one: INDELs in tricky regions are identified with much higher scores by the human players. We are now working on quantifying the productivity of the machine and extending our knowledge of how humans solve these problems.


OpenAshDieBack: A hub for crowdsourced community genomics

Ash dieback is a devastating disease of ash trees caused by the aggressive fungal pathogen Chalara fraxinea. This fungus emerged in Poland in the early 1990s and has since spread west across Europe, reaching native forests in the UK in late 2012.

To kick-start genomic analyses of the pathogen and host, groups at TSL took the unconventional step of rapidly generating and releasing genomic sequence data through our ‘hub’ at 

By doing this we aimed to foster open science and make it possible for experts around the world to access and analyse the data immediately, speeding up the process of discovery. We saw that by providing data as soon as possible we were able to stimulate open community engagement to tackle this devastating pathogen.

We also aim to include smaller contributions from the wider group of bench and field scientists. We are currently producing new bioinformatics tools that allow the dissemination and collection of data from our dispersed but engaged community. We are also examining mechanisms for incentivising contributions from scientists beyond the traditional ‘names on papers’ reward. For example, we will add Mozilla Open Badges as reward and recognition for contributions.


‘Next-Gen’ Genetics and Genomics

A major focus for us is the development of new methods for making the most of modern sequencing technologies, especially when working with sequence from organisms that lack a high-quality or finished reference genome.

We have developed reference-free SNP discovery platforms that classify SNPs based on topological features of internal data structures such as the de Bruijn graph. Our software is sensitive and accurate, and can generate many thousands of SNPs and rank them by allele frequency, allowing homozygous and heterozygous variants to be distinguished accurately, directly from sequence reads. We also study graph topology to identify structures that correspond to precise SNPs, and apply machine learning techniques to classify SNPs further.
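
As an illustration of the kind of graph topology involved, the sketch below builds a small de Bruijn graph from reads and looks for simple ‘bubbles’: a node with two successors whose branches rejoin after exactly k-1 steps, which is the signature an isolated SNP leaves in such a graph. This is a minimal textbook-style sketch and not our software; the function names, the toy sequences and the use of branch k-mer counts as a crude allele frequency are all assumptions made for the example. It also ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Return (graph, kmer_counts): graph maps each (k-1)-mer to its successors."""
    graph = defaultdict(set)
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
            kmer_counts[kmer] += 1
    return graph, kmer_counts


def walk(graph, node, max_steps):
    """Follow a path from node while each node has exactly one successor."""
    path = [node]
    for _ in range(max_steps):
        successors = graph.get(path[-1], ())
        if len(successors) != 1:
            break
        path.append(next(iter(successors)))
    return path


def find_snp_bubbles(graph, kmer_counts, k):
    """Return (left_context, base_a, base_b, freq_a) for each simple SNP bubble."""
    bubbles = []
    for node, successors in list(graph.items()):
        if len(successors) != 2:
            continue
        a, b = sorted(successors)
        path_a = walk(graph, a, k - 1)
        path_b = walk(graph, b, k - 1)
        # An isolated SNP alters k-1 consecutive nodes; both branches then rejoin.
        if len(path_a) == k and len(path_b) == k and path_a[-1] == path_b[-1]:
            count_a = kmer_counts[node + a[-1]]
            count_b = kmer_counts[node + b[-1]]
            freq_a = count_a / (count_a + count_b)
            bubbles.append((node, a[-1], b[-1], freq_a))
    return bubbles


# Hypothetical toy data: two alleles differing at a single base.
ref = "ATGGCGTACCTGA"
alt = "ATGGCGAACCTGA"
reads = [ref] * 3 + [alt]
graph, counts = build_de_bruijn(reads, k=4)
bubbles = find_snp_bubbles(graph, counts, k=4)
```

In this toy case a branch frequency near 0.5 would suggest a heterozygous site, while two samples each carrying only one branch would suggest a homozygous difference. Real data would need coverage thresholds and error handling that are left out here.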

Identifying causative mutations directly from sequence data in different genetic backgrounds is a major first goal of many functional phenotype studies. We have developed software and algorithms that simplify and streamline this process and make it easy to apply in practice in the non-bioinformatics lab. We are currently extending this work to make it generally applicable to organisms with unsequenced genomes.


Evolutionary Modelling

Understanding biological systems, from the molecular mechanisms underlying transcriptional regulation of genes to the genetic factors affecting genome size, requires the use of models. Often in biology these are implicit models based on flow diagrams of interactions elucidated in numerous laboratory experiments. As the complexity of these systems becomes more apparent, more explicit models that incorporate this knowledge into a mathematical framework allow us to see emergent dynamics and properties that are not obvious from implicit diagrams.

We have studied the action of some important pathogen and plant molecular circuits. This work includes Boolean modelling of the Type III secretion system of Pseudomonas syringae and the first reconstructions of possible large-scale small RNA networks in Arabidopsis, which revealed important insights into their structure and dynamics. A reconstruction of the transcriptional response of Arabidopsis to flg22 treatment, an important indicator of infection, identified key interactors and potential regulators that further experimentation is starting to confirm.
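
To show what a Boolean model looks like in practice, the sketch below defines a three-node circuit, updates all nodes synchronously and searches for the attractor (a fixed point or cycle) reached from a given starting state. The node names and update rules are hypothetical and chosen only for illustration; they are not the rules of the Type III secretion system model.

```python
# Hypothetical three-node circuit; names and rules are illustrative only.
RULES = {
    "signal": lambda s: s["signal"],  # external input, held constant
    "regulator": lambda s: s["signal"] and not s["repressor"],
    "repressor": lambda s: s["regulator"],
}


def step(state):
    """Synchronous update: every node is recomputed from the current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def find_attractor(state, max_steps=100):
    """Iterate until a state repeats; return the repeating cycle of states."""
    seen = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state)
    raise RuntimeError("no attractor found within max_steps")


# With the signal on, the negative feedback loop gives a cycle;
# with the signal off, the circuit settles into a fixed point.
cycle_on = find_attractor({"signal": True, "regulator": False, "repressor": False})
cycle_off = find_attractor({"signal": False, "regulator": True, "repressor": False})
```

Even this small example shows an emergent property that a static diagram does not make obvious: the same wiring either oscillates or switches off depending on a single input.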

Recently, we have begun studies of genome size in filamentous plant pathogens. By applying agent-based discrete models to various scenarios and running simulations, we can gain insight into the mechanisms of pathogen genome size change and identify features of the system.
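
A toy version of such a simulation is sketched below. Each agent is a genome with a size; in every generation a genome may expand or contract by a fixed amount, and the next generation is resampled with a weight that penalises larger genomes. All parameter names, default values and the inverse-size fitness rule are assumptions made for this example and do not describe our actual models.

```python
import random


def simulate(pop_size=50, generations=100, start_size=1000,
             gain_prob=0.3, loss_prob=0.2, step_size=10, seed=0):
    """Return the mean genome size of the population after each generation."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    population = [start_size] * pop_size
    mean_sizes = []
    for _ in range(generations):
        mutated = []
        for size in population:
            r = rng.random()
            if r < gain_prob:
                size += step_size  # expansion event
            elif r < gain_prob + loss_prob:
                size = max(step_size, size - step_size)  # deletion event
            mutated.append(size)
        # Assumed selection rule: fitness is inversely proportional to size.
        weights = [1.0 / s for s in mutated]
        population = rng.choices(mutated, weights=weights, k=pop_size)
        mean_sizes.append(sum(population) / pop_size)
    return mean_sizes


history = simulate()
```

Varying `gain_prob`, `loss_prob` or the selection rule and comparing the resulting trajectories is the kind of scenario testing that this modelling approach allows.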

© Dan MacLean. All Rights Reserved.