
Second Generation DNA Alignment Tools
Christie Robertson¹, Eric Flynn¹, Gary Montry², Sandra Porter¹, and Todd M. Smith¹
¹ Geospiza, Inc., 2442 NW Market St., Seattle, WA 98107 USA
² Southwest Parallel Software, Albuquerque, NM
The Human Genome Project spurred the development of high-throughput technologies,
especially in DNA sequencing. Not only did this effort determine the sequence
of the human genome, it also catalyzed the development of an entire
industry based on DNA sequencing and genomics. Because these technologies produce
enormous amounts of data, they depend on bioinformatics programs for data
management. Phrap, Cross_Match, RepeatMasker, and Consed have played an integral
role in genome projects and have come to be accepted as standard tools for genomic
alignment and assembly. As sequencing technology and software have evolved,
however, so too have the scientific applications that rely on these programs.
Specific needs associated with whole-genome shotgun sequencing, EST cluster
analysis, and genotyping applications highlight the importance of updating
standard bioinformatics programs to meet the requirements of a broader community.
We are re-engineering Phrap, Cross_Match, and RepeatMasker to improve
their performance and utility by optimizing the core algorithms
and by developing a framework to store, manipulate, and view assembled sequence
data. We are also developing a structure through which XML-formatted
hints and constraints can pass instructions to the core alignment
program, telling it how to handle specific parts of the data, or the data set
as a whole, in individualized ways. Hints regarding read pairs, associations
or non-associations between reads or contigs, sequencing reaction conditions,
highly repetitive regions, reference sequences, and other information can then
be used to direct sequence alignment without altering the underlying data.
In addition, we are developing a new viewing program for reviewing, editing,
and manipulating sequences, giving users finer control over their data.
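The abstract describes the XML hint mechanism only in outline, so the following is a hypothetical sketch of the idea, not the authors' design. It assumes an invented hint file with `<readpair>`, `<separate>`, and `<repeat>` elements (all element and attribute names are illustrative), parses it with Python's standard `xml.etree.ElementTree`, and keeps the hints in lookup tables separate from the reads. The helper names `parse_hints`, `accept_overlap`, and `pair_consistent`, and the policy of vetoing overlaps that fall entirely inside a hinted repeat, are likewise assumptions chosen for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical hint file; element and attribute names are illustrative only.
SAMPLE_HINTS = """<hints>
  <readpair forward="r1.f" reverse="r1.r" min_insert="1500" max_insert="2500"/>
  <separate a="r7" b="r9"/>
  <repeat read="r3" start="120" end="480"/>
</hints>"""


@dataclass
class Hints:
    # read name -> (mate name, minimum insert size, maximum insert size)
    pairs: dict[str, tuple[str, int, int]] = field(default_factory=dict)
    # unordered read pairs that must never be joined (non-associations)
    separate: set[frozenset[str]] = field(default_factory=set)
    # read name -> list of half-open (start, end) repeat intervals
    repeats: dict[str, list[tuple[int, int]]] = field(default_factory=dict)


def parse_hints(xml_text: str) -> Hints:
    """Load hint elements into lookup tables; the reads themselves are never touched."""
    hints = Hints()
    for el in ET.fromstring(xml_text):
        if el.tag == "readpair":
            fwd, rev = el.get("forward"), el.get("reverse")
            lo, hi = int(el.get("min_insert")), int(el.get("max_insert"))
            hints.pairs[fwd] = (rev, lo, hi)
            hints.pairs[rev] = (fwd, lo, hi)
        elif el.tag == "separate":
            hints.separate.add(frozenset((el.get("a"), el.get("b"))))
        elif el.tag == "repeat":
            span = (int(el.get("start")), int(el.get("end")))
            hints.repeats.setdefault(el.get("read"), []).append(span)
        # Unknown tags are skipped, so new hint types do not break old readers.
    return hints


def accept_overlap(hints, read_a, read_b, a_start, a_end, score, min_score):
    """Decide whether a candidate overlap between two reads may be used."""
    if frozenset((read_a, read_b)) in hints.separate:
        return False  # an explicit non-association hint vetoes the join
    for start, end in hints.repeats.get(read_a, []):
        if start <= a_start and a_end <= end:
            return False  # overlap lies wholly inside a hinted repeat in read_a
    return score >= min_score


def pair_consistent(hints, read, read_pos, mate_pos):
    """Check that a placed read and its mate are separated by a hinted insert size."""
    if read not in hints.pairs:
        return True  # no pairing hint, so nothing to check
    _, lo, hi = hints.pairs[read]
    return lo <= abs(mate_pos - read_pos) <= hi


if __name__ == "__main__":
    h = parse_hints(SAMPLE_HINTS)
    print(accept_overlap(h, "r7", "r9", 0, 200, score=300, min_score=50))  # False
    print(accept_overlap(h, "r3", "r4", 400, 700, score=300, min_score=50))  # True
    print(pair_consistent(h, "r1.f", 0, 3000))  # False
```

Keeping hints in their own tables, rather than editing or masking the reads, mirrors the stated goal of directing alignment without altering the underlying data: the same reads can be reassembled under a different hint file.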
RECOMB 2003: DNA Sequencing Technologies and Computation