
rPhrap - A Modern Phrap Assembler
Christie Robertson 1, Gary Montry 2, and Todd M. Smith 1
1. Geospiza, Inc., 2442 NW Market St., Seattle, WA 98107 USA
2. Southwest Parallel Software, Albuquerque, NM
The Phrap assembly algorithm, originally released over ten years ago, continues to stand the test of time. Not only is Phrap the standard assembler for many sequencing groups, it also serves as the internal engine in several other assembly programs. However, no updates to Phrap have been released since 1999. We have re-engineered and updated Phrap to create rPhrap. rPhrap improves on Phrap in three ways: it performs better on larger and more difficult datasets, it can restrict assembly based on the identity agreement of reads, and it integrates mate pair constraints, in which pairs of reads sequenced from the same template have a known distance and orientation with respect to each other. rPhrap further uses mate pairs to order and orient contigs onto scaffolds. Additionally, large datasets can be divided into groups (“read clusters”) that can be assembled separately. rPhrap can also be run in stages: the program writes a file containing the pairwise match tables and can take that file as input on a later run, so the matches do not have to be recalculated. We used datasets with validated reference solutions to examine the effects of these changes on assembly results. Incorporating mate pairs successfully resolved many of the repetitive structures present in the datasets, yielding an average scaffold length about twice the average contig length.
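To illustrate what a mate pair constraint expresses, the following is a minimal Python sketch of a consistency check for two reads placed on the same contig. It is not taken from rPhrap: the `Placement` type, the `mate_pair_consistent` function, and the insert-size bounds are hypothetical names and parameters chosen for illustration, and the sketch assumes the common convention that mates from one template face each other on opposite strands.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of where a read sits on a contig."""
    start: int      # leftmost contig coordinate of the read (0-based)
    end: int        # one past the rightmost contig coordinate
    forward: bool   # True if the read aligns to the contig's forward strand


def mate_pair_consistent(a: Placement, b: Placement,
                         min_insert: int, max_insert: int) -> bool:
    """Return True if two mates satisfy orientation and distance limits.

    Assumption: mates point toward each other (forward read on the left,
    reverse read on the right) and the template spans from the start of
    the forward read to the end of the reverse read.
    """
    # Mates must lie on opposite strands.
    if a.forward == b.forward:
        return False

    fwd, rev = (a, b) if a.forward else (b, a)

    # The forward read must not start to the right of the reverse read,
    # otherwise the pair faces outward.
    if fwd.start > rev.start:
        return False

    # The implied template length must fall inside the expected range.
    span = rev.end - fwd.start
    return min_insert <= span <= max_insert
```

A placement that violates such a check, for example two mates that land 10 kb apart when the template is expected to be 2 to 4 kb long, suggests that reads from different copies of a repeat have been joined.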
GSAC 2004 DNA Sequencing and Analysis Conference