Clustalw2
Multiple sequence alignment and phylogenetic analysis allow the identification of conserved positions in protein and nucleic acid sequences. This can lead to an appreciation of the evolutionary history of a group of sequences.
- The Clustal family of programs is commonly used to produce multiple sequence alignments. Other alignment programs are available as well.
- There are 3 different ways to use the clustal programs.
- Web-based clustalw can be used HERE.
- ClustalX GUI (and also the command-line clustalw executable) is available for download HERE. exampledata
- Command-line clustalw2 is installed on rous
- The 3 options differ mainly in their interface and in bootstrapping: bootstrap values for the tree can be calculated only in the command-line and GUI versions. Other details of running the program are the same. For this lesson, we will use the web-based version to align these sequences.
- Log in to rous and copy the clustal training files to your home directory.
cp -r /net/n3/data/Teaching/IAP_2010_day3/clustal .
NOTE: -r means "recursive": copy the folder and everything inside it to the new location ".", which is the current directory. With no trailing "/" on the clustal path, the folder itself is copied, not just its contents.
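The behaviour of -r can be checked safely in a scratch directory before touching real data. The following is a minimal sketch: the directories src/clustal and dest and the file contents are made up for illustration and live under a temporary directory created by mktemp, not on rous.

```shell
# Scratch-directory demo of "cp -r"; every path below is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/clustal" "$tmp/dest"
echo ">demo" > "$tmp/src/clustal/tiny.pep"

# Copy the folder itself (no trailing "/") into dest, as in the lesson.
(cd "$tmp/dest" && cp -r "$tmp/src/clustal" .)

# The folder and its contents now exist in dest; the source is left in place.
ls "$tmp/dest/clustal"
ls "$tmp/src/clustal"
```

Both ls commands list tiny.pep, which shows that cp duplicates the folder rather than moving it.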
- Enter the clustal directory:
cd clustal
- View the raw sequence files:
cat *.pep
- Launch the clustalw application:
clustalw2
- The following interactive menu appears. Options 1 and 2 are used in this demonstration. Additional information is available HERE.
**************************************************************
******** CLUSTAL 2.0.12 Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice:
- Select option 1 to load sequences and specify the file "tiny.pep". Before being returned to the original menu, you should see:
Sequences should all be in 1 file.
7 formats accepted:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF, RSF.
Enter the name of the sequence file : tiny.pep
Sequence format is Pearson
Sequences assumed to be PROTEIN
Sequence 1: zf_12a1_a 28 aa
Sequence 2: zf_12a1_b 28 aa
Sequence 3: hs_a11 28 aa
Sequence 4: hs_22a1 28 aa
Sequence 5: zf_a11 28 aa
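The sequence count and lengths that clustalw2 reports can be cross-checked from the shell. The following is a rough sketch under stated assumptions: demo.pep is a hypothetical stand-in for tiny.pep, rebuilt by removing the gap characters from the alignment shown in this lesson, so the real file may be laid out differently.

```shell
# Hypothetical stand-in for tiny.pep, rebuilt from the alignment in this lesson.
tmp=$(mktemp -d)
cat > "$tmp/demo.pep" <<'EOF'
>zf_12a1_a
VADLVFLVDGSWSVGRENFRFIRSFIGA
>zf_12a1_b
KADLVFLIDGSWSIGDDSFAKVRQFVFS
>hs_a11
YMDIVIVLDGSNSIYPWVEVQHFLINIL
>hs_22a1
HYDLVFLLDTSSSVGKEDFEKVRQWVAN
>zf_a11
YMDIVIVLDGSNSIYPWNEVQDFLINIL
EOF

# Print each sequence name followed by its residue count.
summary=$(awk '
  /^>/ { if (name != "") print name, len; name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name, len }
' "$tmp/demo.pep")
echo "$summary"
```

Each of the five names is printed with a length of 28, in agreement with the "28 aa" lines in the clustalw2 output.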
- From the original menu, now select option 2, "Multiple Alignments". This activates the alignment menu:
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now Slow/Accurate
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options
I. Iteration = NONE
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
- All default options are correct except those under option 9. Select it to see the output format menu:
********* Format of Alignment Output *********
F. Toggle FASTA format output = OFF
1. Toggle CLUSTAL format output = ON
2. Toggle NBRF/PIR format output = OFF
3. Toggle GCG/MSF format output = OFF
4. Toggle PHYLIP format output = OFF
5. Toggle NEXUS format output = OFF
6. Toggle GDE format output = OFF
7. Toggle GDE output case = LOWER
8. Toggle CLUSTALW sequence numbers = OFF
9. Toggle output order = ALIGNED
0. Create alignment output file(s) now?
T. Toggle parameter output = OFF
R. Toggle sequence range numbers = OFF
H. HELP
- Use toggle 4 to activate PHYLIP output, then hit return to go back to the alignment menu.
- Select option 1 to do the alignment. Accept the default conditions and hit return until you end up at the main menu. Exit the program with "X".
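The same alignment can probably be run without the menus by passing options on the command line. The following is a sketch, not a tested recipe for rous: it assumes clustalw2 is on your PATH, that you are inside the clustal directory, and that the flag names match those printed by "clustalw2 -help" on your installation. The status variable is only there for illustration.

```shell
# Non-interactive sketch of the menu session; run it inside the clustal directory.
# If clustalw2 or tiny.pep is missing, it only reports that it skipped the run.
infile="tiny.pep"
if command -v clustalw2 >/dev/null 2>&1 && [ -f "$infile" ]; then
  # First run: default CLUSTAL format output.
  clustalw2 -INFILE="$infile" -ALIGN -OUTFILE=tiny.aln
  # Second run: PHYLIP format output.
  clustalw2 -INFILE="$infile" -ALIGN -OUTPUT=PHYLIP -OUTFILE=tiny.phy
  status="ran"
else
  status="skipped"
  echo "clustalw2 or $infile not found; nothing was run" >&2
fi
echo "$status"
```

Two runs are used because, in this sketch, each invocation writes a single output format, whereas the menu toggles allow CLUSTAL and PHYLIP output in one pass.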
- View the contents of the alignment files with:
cat tiny.[ap][lh][ny]
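NOTE: Square brackets are shell filename pattern (glob) syntax, not regular expression syntax. For example, [ap] matches either an "a" or a "p" at that position, so the pattern above matches tiny.aln and tiny.phy.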
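The bracket pattern can be tried on empty scratch files to see which names it picks up. In this sketch the file names only mimic what a clustalw2 run might leave behind (tiny.dnd stands in for a guide tree file), and the temporary directory is created just for the demo.

```shell
# Scratch demo of the bracket pattern; the files are empty placeholders.
tmp=$(mktemp -d)
touch "$tmp/tiny.aln" "$tmp/tiny.phy" "$tmp/tiny.dnd" "$tmp/tiny.pep"

# The shell expands the pattern into file names before ls (or cat) runs.
matches=$(cd "$tmp" && ls tiny.[ap][lh][ny])
echo "$matches"
```

Only tiny.aln and tiny.phy are listed. Note that the pattern is looser than it looks: it would also match names such as tiny.ahy if such a file existed.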
This is the clustal format output:
CLUSTAL 2.0.12 multiple sequence alignment
zf_12a1_a VADLVFLVDGSWSVGRENFRFIRSFIGA--
zf_12a1_b KADLVFLIDGSWSIGDDSFAKVRQFVFS--
hs_22a1 HYDLVFLLDTSSSVGKEDFEKVRQWVAN--
hs_a11 YMDIVIVLDGSNSIYP--WVEVQHFLINIL
zf_a11 YMDIVIVLDGSNSIYP--WNEVQDFLINIL
*:*:::* * *: : :: ::
This is the phylip format output:
5 30
zf_12a1_a VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1 HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11 YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11 YMDIVIVLDG SNSIYP--WN EVQDFLINIL
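The first line of the PHYLIP output gives the number of sequences and the alignment length, which makes a quick consistency check possible. The sketch below copies the output above into a hypothetical file demo.phy and assumes that sequence names contain no spaces, which holds here but is not guaranteed for every PHYLIP file.

```shell
# Check that a PHYLIP file agrees with its own header line.
tmp=$(mktemp -d)
cat > "$tmp/demo.phy" <<'EOF'
5 30
zf_12a1_a VADLVFLVDG SWSVGRENFR FIRSFIGA--
zf_12a1_b KADLVFLIDG SWSIGDDSFA KVRQFVFS--
hs_22a1 HYDLVFLLDT SSSVGKEDFE KVRQWVAN--
hs_a11 YMDIVIVLDG SNSIYP--WV EVQHFLINIL
zf_a11 YMDIVIVLDG SNSIYP--WN EVQDFLINIL
EOF

result=$(awk '
  # Header: number of sequences, then number of alignment columns.
  NR == 1 { nseq = $1; ncol = $2; next }
  # Data rows: join the 10-residue blocks and compare the length to the header.
  NF > 0  { seq = ""; for (i = 2; i <= NF; i++) seq = seq $i
            rows++; if (length(seq) != ncol) bad++ }
  END     { print ((rows == nseq && bad == 0) ? "consistent" : "inconsistent") }
' "$tmp/demo.phy")
echo "$result"
```

For the alignment above this prints "consistent": there are 5 rows, and each row has 30 columns once the gap characters are counted.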