6
6
- Wiki Documentation
7
7
---
8
8
9
- This page describes Bio.AlignIO, a new multiple sequence Alignment
9
+ This page describes `Bio.AlignIO`, a new multiple sequence Alignment
10
10
Input/Output interface for Biopython 1.46 and later.
11
11
12
12
In addition to the built in API documentation, there is a whole chapter
13
13
in the [Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
14
14
on Bio.AlignIO, and although there is some overlap it is well worth
15
- reading in addition to this WIKI page. There is also the [API
15
+ reading in addition to this page. There is also the [API
16
16
documentation](http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html)
17
- (which you can read online, or from within Python with the help
17
+ (which you can read online, or from within Python with the `help()`
18
18
command).
19
19
20
20
Aims
23
23
You may already be familiar with the [Bio.SeqIO](SeqIO "wikilink")
24
24
module which deals with files containing one or more sequences
25
25
represented as [SeqRecord](SeqRecord "wikilink") objects. The purpose of
26
- the SeqIO module is to provide a simple uniform interface to assorted
26
+ the `SeqIO` module is to provide a simple uniform interface to assorted
27
27
sequence file formats.
28
28
29
- Similarly, Bio.AlignIO deals with files containing one or more sequence
30
- alignments represented as Alignment objects. Bio.AlignIO uses the same
31
- set of functions for input and output as in Bio.SeqIO, and the same
29
+ Similarly, `Bio.AlignIO` deals with files containing one or more sequence
30
+ alignments represented as Alignment objects. `Bio.AlignIO` uses the same
31
+ set of functions for input and output as in `Bio.SeqIO`, and the same
32
32
names for the file formats supported.
33
33
34
- Note that the inclusion of Bio.AlignIO does lead to some duplication or
35
- choice in how to deal with some file formats. For example, Bio.AlignIO
36
- and Bio.Nexus will both read alignments from NEXUS files - but Bio.NEXUS
37
- allows more control and the use of trees.
34
+ Note that the inclusion of `Bio.AlignIO` does lead to some duplication or
35
+ choice in how to deal with some file formats. For example, `Bio.AlignIO`
36
+ and `Bio.Nexus` will both read alignments from NEXUS files - but
37
+ `Bio.Nexus` allows more control and the use of trees.
38
38
39
39
My vision is that for reading or writing sequence alignments you should
40
- try Bio.AlignIO as your first choice. In some cases you may only care
40
+ try `Bio.AlignIO` as your first choice. In some cases you may only care
41
41
about the sequences themselves, in which case try using
42
42
[Bio.SeqIO](SeqIO "wikilink") on the alignment file directly. Unless you
43
43
have some very specific requirements, I hope this should suffice.
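
As a rough illustration of the two routes described above, the sketch below reads the same small alignment once with `Bio.AlignIO` (as a single alignment object) and once with `Bio.SeqIO` (as plain sequence records). The two-sequence FASTA text is made-up toy data, and an in-memory `StringIO` handle stands in for a real file; it assumes a reasonably recent Biopython.

```python
from io import StringIO

from Bio import AlignIO, SeqIO

# Toy FASTA alignment (hypothetical data, used only for illustration).
fasta_alignment = ">seq1\nACTG-A\n>seq2\nACTGCA\n"

# Route 1: treat the file as one alignment.
alignment = AlignIO.read(StringIO(fasta_alignment), "fasta")
alignment_length = alignment.get_alignment_length()

# Route 2: ignore the alignment structure and just iterate over the sequences.
records = list(SeqIO.parse(StringIO(fasta_alignment), "fasta"))
record_ids = [record.id for record in records]

print(alignment_length)
print(record_ids)
```

Note that the same format name (`"fasta"` here) is used with both modules.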
@@ -98,48 +98,50 @@ Fib\_gamma](http://pfam.sanger.ac.uk/family?acc=PF09395). At the time of
98
98
writing, this contained 14 sequences with an alignment length of 77
99
99
amino acids, and is shown below in the PFAM or Stockholm format:
100
100
101
- # STOCKHOLM 1.0
102
- #=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
103
- #=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
104
- #=GS O02676_CROCR/1-77 AC O02676.1
105
- #=GS Q6X869_TENEC/1-77 AC Q6X869.1
106
- #=GS FIBG_HUMAN/40-116 AC P02679.3
107
- #=GS O02689_TAPIN/1-77 AC O02689.1
108
- #=GS O02688_PIG/1-77 AC O02688.1
109
- #=GS O02672_9CETA/1-77 AC O02672.1
110
- #=GS O02682_EQUPR/1-77 AC O02682.1
111
- #=GS Q6X870_CYNVO/1-77 AC Q6X870.1
112
- #=GS FIBG_RAT/40-116 AC P02680.3
113
- #=GS Q6X866_DROAU/1-76 AC Q6X866.1
114
- #=GS O93568_CHICK/40-116 AC O93568.1
115
- #=GS FIBG_XENLA/38-114 AC P17634.1
116
- Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
117
- Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
118
- O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
119
- Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
120
- FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
121
- #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
122
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
123
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
124
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
125
- #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
126
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
127
- #=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
128
- O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
129
- O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
130
- O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
131
- O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
132
- Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
133
- FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
134
- Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
135
- O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
136
- #=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
137
- #=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
138
- #=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
139
- FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
140
- #=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
141
- #=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
142
- //
101
+ ```
102
+ # STOCKHOLM 1.0
103
+ #=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
104
+ #=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
105
+ #=GS O02676_CROCR/1-77 AC O02676.1
106
+ #=GS Q6X869_TENEC/1-77 AC Q6X869.1
107
+ #=GS FIBG_HUMAN/40-116 AC P02679.3
108
+ #=GS O02689_TAPIN/1-77 AC O02689.1
109
+ #=GS O02688_PIG/1-77 AC O02688.1
110
+ #=GS O02672_9CETA/1-77 AC O02672.1
111
+ #=GS O02682_EQUPR/1-77 AC O02682.1
112
+ #=GS Q6X870_CYNVO/1-77 AC Q6X870.1
113
+ #=GS FIBG_RAT/40-116 AC P02680.3
114
+ #=GS Q6X866_DROAU/1-76 AC Q6X866.1
115
+ #=GS O93568_CHICK/40-116 AC O93568.1
116
+ #=GS FIBG_XENLA/38-114 AC P17634.1
117
+ Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
118
+ Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
119
+ O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
120
+ Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
121
+ FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
122
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
123
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
124
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
125
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
126
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
127
+ #=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
128
+ #=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
129
+ O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
130
+ O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
131
+ O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
132
+ O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
133
+ Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
134
+ FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
135
+ Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
136
+ O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
137
+ #=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
138
+ #=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
139
+ #=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
140
+ FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
141
+ #=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
142
+ #=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
143
+ //
144
+ ```
143
145
144
146
You will notice that there is plenty of annotation information here,
145
147
including accession numbers for each sequence and also some PDB database
@@ -149,53 +151,54 @@ chick fibrinogen proteins.
149
151
This file contains a single alignment, so we can use the
150
152
`Bio.AlignIO.read()` function to load it in Biopython. Let's assume
151
153
you have downloaded this alignment from Sanger, or have copy and pasted
152
- the text above, and saved this as a file called `PF09395\_seed.sth` on
154
+ the text above, and saved this as a file called `PF09395_seed.sth` on
153
155
your computer. Then in Python:
154
156
155
157
``` python
156
158
from Bio import AlignIO
157
159
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
158
- print "Alignment length %i" % alignment.get_alignment_length()
160
+ print("Alignment length %i" % alignment.get_alignment_length())
159
161
for record in alignment:
160
-     print record.seq, record.id
162
+     print(record.seq + " " + record.id)
161
163
```
162
164
163
165
That should give:
164
166
165
- Alignment length 77
166
- GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
167
- RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
168
- RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
169
- RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
170
- RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
171
- RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
172
- RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
173
- RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
174
- RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
175
- RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
176
- RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
177
- RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
178
- RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
179
- RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
167
+ ```
168
+ Alignment length 77
169
+ GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
170
+ RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
171
+ RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
172
+ RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
173
+ RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
174
+ RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
175
+ RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
176
+ RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
177
+ RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
178
+ RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
179
+ RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
180
+ RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
181
+ RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
182
+ RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
183
+ ```
180
184
181
185
Alignment Output
182
186
----------------
183
187
184
188
As in [Bio.SeqIO](SeqIO "wikilink"), there is a single output function
185
- **Bio.AlignIO.write()**. This takes three arguments: some alignments, a
189
+ `Bio.AlignIO.write()`. This takes three arguments: some alignments, a
186
190
file handle to write to, and the format to use.
187
191
188
- As of Biopython 1.48, the alignment object acquired a **format()**
192
+ As of Biopython 1.48, the alignment object acquired a `.format()`
189
193
method to give a string containing the alignment in the specified file
190
194
format, e.g.
191
195
192
196
``` python
193
197
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
194
- print alignment.format("fasta")
198
+ print(alignment.format("fasta"))
195
199
```
196
200
197
- This wiki section needs to be filled out, so in the short term please
198
- refer to the Bio.AlignIO chapter in the Tutorial.
201
+ Please refer to the Bio.AlignIO chapter in the Tutorial for more details.
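
For completeness, here is a minimal sketch of calling `Bio.AlignIO.write()` with its three arguments (alignments, handle, format). It assumes a recent Biopython in which alignments are `MultipleSeqAlignment` objects; the two short sequences are made-up toy data rather than the PF09395 alignment, and a `StringIO` handle is used in place of an open file so the example needs nothing on disk.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Toy two-sequence alignment (hypothetical data, used only for illustration).
alignment = MultipleSeqAlignment(
    [
        SeqRecord(Seq("ACTG-A"), id="seq1", description=""),
        SeqRecord(Seq("ACTGCA"), id="seq2", description=""),
    ]
)

# An in-memory handle stands in for a file opened for writing.
handle = StringIO()

# Arguments: the alignments (here a one-element list), the handle, the format name.
count = AlignIO.write([alignment], handle, "fasta")
fasta_text = handle.getvalue()

print(count)
print(fasta_text)
```

With a real file you would pass a handle opened for writing instead of the `StringIO` object; the return value is the number of alignments written.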
199
202
200
203
File Format Conversion
201
204
----------------------