rPhrap Input Documentation

rPhrap version 4.23 and higher


Mate pair library (.cl) files
Mate pair library names
The .newcl file
Read cluster files (.rc)
Multi-file sequence input
Excluded reads (.excl) files
Restart files (incremental assembly)


__________________________________
Mate pair library (.cl) files
__________________________________

To use mate pairs in assembly, rPhrap needs to know how mate-paired reads are designated and how far apart mate-paired reads are expected to fall. rPhrap gets this information from the .cl file, which describes the mate-pair families and their expected lengths for the assembly set. If the input multi-sequence FASTA file is called foo.screen, rPhrap will look for an input file called foo.screen.cl.

Note: If this file is not present, and the user does not specify another such file (through the -CLF command-line option), then rPhrap will not have any information about clone sizes, and so it cannot run using mate pair constraints.
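The file lookup described above can be sketched in Python as follows. The helper names default_cl_path and resolve_cl_path are hypothetical. The sketch also assumes that a file named with -CLF takes precedence when both it and the default foo.screen.cl exist. It only illustrates the documented behavior and is not rPhrap's own code.

```python
import os
from typing import Optional


def default_cl_path(fasta_path: str) -> str:
    # foo.screen -> foo.screen.cl: ".cl" is appended, not substituted.
    return fasta_path + ".cl"


def resolve_cl_path(fasta_path: str, clf_option: Optional[str] = None) -> Optional[str]:
    """Return the .cl file to read, or None if no clone-size data is available.

    Hypothetical helper. Assumption: a file named via -CLF is used as given
    and takes precedence over the default <input>.cl file.
    """
    if clf_option:
        return clf_option
    candidate = default_cl_path(fasta_path)
    # No default .cl file and no -CLF option: mate pair constraints are unavailable.
    return candidate if os.path.isfile(candidate) else None
```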

Each line in the .cl file that does not begin with a pound sign (#) is expected to describe one template, or subclone, family. Each family line must contain four entries: the family name followed by the minimum, average, and maximum clone lengths. A typical .cl file might look like this:

    #family name   min clone len   avg clone len   max clone len
    AESE                    4000            6000            9000
    IEBFD                   1000            3000            5000
    EE                          1000 3000 5000
    # this is another comment

All entries need to be separated by one or more spaces, with no commas.
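A minimal parser for the format described above might look like the following Python sketch. The names CloneLibrary and parse_cl are hypothetical. Three behaviors are assumptions of the sketch, not documented rPhrap behavior: skipping blank lines, ignoring leading whitespace before "#", and accepting tabs as separators. Entries are kept in file order, because that order matters when rPhrap matches read names to libraries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CloneLibrary:
    name: str
    min_len: int
    avg_len: int
    max_len: int


def parse_cl(text: str) -> List[CloneLibrary]:
    """Parse .cl file contents into libraries, preserving file order."""
    libraries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Comment lines start with '#'; blank lines are skipped (assumption).
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # any run of whitespace separates entries
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 entries, got {len(fields)}")
        name, lo, avg, hi = fields
        # int() rejects values with stray commas such as "4000,".
        libraries.append(CloneLibrary(name, int(lo), int(avg), int(hi)))
    return libraries
```

A comma-separated line such as "AESE,4000,6000,9000" is rejected with a ValueError, which is consistent with the no-commas rule.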

For more information on how reads are assigned to libraries by family name, see the Mate pair library names section below.


__________________________________
Mate pair library names
__________________________________

rPhrap has a default method for determining template library names. For each read ID, rPhrap uses the following procedure:

1) rPhrap examines every entry in the subclone library list obtained from the .cl file. If the read ID begins with the exact characters of one of the subclone library entries, the read is assigned to that subclone family. For example, if the file shown in the previous section (Mate pair library (.cl) files) were used to initialize the subclone library list, any reads beginning with "AESE" would be assigned to the AESE library.

Note: Two library names that begin with the same characters, where one name is longer than the other, can still be specified as separate subclone libraries. Suppose one library's reads have names beginning with AESEA, and a second library's reads have names beginning with AES. Make both entries in the .cl file, but put the longer name above the shorter one. This works because rPhrap searches the subclone list in order and uses the first entry that matches. AES reads will not match AESEA when that entry is searched first, whereas AESEA reads would match AES if AES appeared before AESEA in the .cl file. Always double-check the subclone library list in the output file to make sure rPhrap is assigning the reads as you intended.

2) If rPhrap fails to find a suitable match in the subclone library list, it begins parsing the read ID to build its own library name. First, the name is truncated at the "." delimiter. Then, the truncated name is scanned for an underscore (_). If an underscore is found, the name is truncated before the underscore, and the remaining name is used as the new subclone family name. For example, the read ID "AES179_A12.x2" would generate a new family name AES179.

If there is no underscore in the read name, rPhrap works backward from the "." delimiter until it encounters a non-digit character. The name is truncated after this non-digit character, and the remaining portion becomes the new subclone library name. Therefore, a read ID like AES179.x2 would create a new subclone library named AES.
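The two-step naming procedure can be sketched in Python as below. library_for_read is a hypothetical name. The sketch cuts the ID at the first "." and at the first underscore, which is an assumption. The documentation does not say how IDs with several dots or underscores, or with no "." at all, are handled.

```python
from typing import Sequence


def library_for_read(read_id: str, cl_names: Sequence[str]) -> str:
    """Hypothetical sketch of the two-step library naming procedure."""
    # Step 1: the first .cl entry, in file order, that prefixes the read ID wins.
    for name in cl_names:
        if read_id.startswith(name):
            return name

    # Step 2: build a family name from the read ID itself.
    base = read_id.split(".", 1)[0]  # assumption: cut at the first "."
    if "_" in base:
        # AES179_A12.x2 -> AES179 (assumption: cut at the first underscore)
        return base.split("_", 1)[0]
    # No underscore: drop the trailing digits, so AES179.x2 -> AES.
    return base.rstrip("0123456789")
```

Passing ["AESEA", "AES"] as the library list assigns a read such as AESEA12.x1 to AESEA. Passing ["AES", "AESEA"] assigns the same read to AES, which is the ordering pitfall described in the note on library names.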


Copyright © 2005 Southwest Parallel Software, Inc. and Geospiza, Inc.
All Rights Reserved.