

Assembly Improvements Using Sequence Metadata

Christie Robertson 1, Gary Montry 2, Mon-Chaio Lo 1, Joe Slagel, and Todd M. Smith 1
1. Geospiza, Inc., 2442 NW Market St., Seattle, WA 98107 USA
2. Southwest Parallel Software, Albuquerque, NM


Assembly programs align nucleotide sequences to each other based on sequence similarity. Because each assembly algorithm relies on thresholds to decide which sequences are similar enough to align, every algorithm will inevitably join some sequences that do not belong together and fail to join others that do. An algorithm that performs well on one data set might fail badly on another. Assembly algorithms are being challenged by increasingly diverse biological questions, including EST clustering, genotyping, and comparative genomics, and by problems inherent to certain data sets, such as repetitive DNA. We are re-engineering Phrap to improve its performance and utility by optimizing the core algorithms and developing a framework to store, manipulate, and view sequence data. XML-formatted hints and constraints will instruct the core alignment program on how parts of the data, or the data set as a whole, can be handled in individualized ways. The re-engineered Phrap already allows alignments to incorporate information about mate pairs: reads sequenced from the same template, which therefore have a known order and orientation with respect to each other. We are also using mate pair information to create larger scaffold structures, with known gap sizes between contigs.
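As an illustration of how mate pairs can yield a gap size between two contigs, the following is a minimal Python sketch, not the method used in the re-engineered Phrap. It assumes that each spanning mate pair has a known expected template (insert) length, that the forward read lies on the left contig and the reverse read on the right contig, and that the median of the per-pair estimates is an acceptable summary. The names MatePairLink and estimate_gap, and all coordinates in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class MatePairLink:
    """One mate pair whose two reads fall on different contigs (hypothetical record)."""
    insert_size: int      # expected template length, in bases
    left_read_start: int  # 0-based start of the forward read on the left contig
    right_read_end: int   # exclusive end of the reverse read on the right contig


def estimate_gap(links, left_contig_length):
    """Estimate the gap between two adjacent contigs from spanning mate pairs.

    For each pair, the template covers the tail of the left contig, the gap,
    and the head of the right contig, so:
        insert_size = (left_contig_length - left_read_start) + gap + right_read_end
    Solving for gap gives one estimate per pair; the median is returned.
    A negative result suggests that the contigs overlap.
    """
    if not links:
        raise ValueError("at least one spanning mate pair is required")
    estimates = [
        link.insert_size
        - (left_contig_length - link.left_read_start)
        - link.right_read_end
        for link in links
    ]
    return median(estimates)


if __name__ == "__main__":
    # Illustrative values only; not taken from any real assembly.
    example_links = [
        MatePairLink(insert_size=3000, left_read_start=4000, right_read_end=500),
        MatePairLink(insert_size=3000, left_read_start=4200, right_read_end=650),
        MatePairLink(insert_size=3200, left_read_start=4100, right_read_end=700),
    ]
    print(estimate_gap(example_links, left_contig_length=5000))
```

The median is used here only because it tolerates an occasional chimeric or mis-sized template better than the mean; a real scaffolder would also need to model the insert size distribution and check read orientation.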


GSAC 2003 DNA Sequencing and Analysis Conference
