Commit 4bd8c87

Print functions and back-tick markup for AlignIO page etc
See #47.
1 parent ceef80e commit 4bd8c87

1 file changed

+80
-77
lines changed

wiki/AlignIO.md

Lines changed: 80 additions & 77 deletions
@@ -6,15 +6,15 @@ tags:
66
- Wiki Documentation
77
---
88

9-
This page describes Bio.AlignIO, a new multiple sequence Alignment
9+
This page describes `Bio.AlignIO`, a new multiple sequence Alignment
1010
Input/Output interface for BioPython 1.46 and later.
1111

1212
In addition to the built in API documentation, there is a whole chapter
1313
in the [Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
1414
on Bio.AlignIO, and although there is some overlap it is well worth
15-
reading in addition to this WIKI page. There is also the [API
15+
reading in addition to this page. There is also the [API
1616
documentation](http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html)
17-
(which you can read online, or from within Python with the help
17+
(which you can read online, or from within Python with the `help()`
1818
command).
1919

2020
Aims
@@ -23,21 +23,21 @@ Aims
2323
You may already be familiar with the [Bio.SeqIO](SeqIO "wikilink")
2424
module which deals with files containing one or more sequences
2525
represented as [SeqRecord](SeqRecord "wikilink") objects. The purpose of
26-
the SeqIO module is to provide a simple uniform interface to assorted
26+
the `SeqIO` module is to provide a simple uniform interface to assorted
2727
sequence file formats.
2828

29-
Similarly, Bio.AlignIO deals with files containing one or more sequence
30-
alignments represented as Alignment objects. Bio.AlignIO uses the same
31-
set of functions for input and output as in Bio.SeqIO, and the same
29+
Similarly, `Bio.AlignIO` deals with files containing one or more sequence
30+
alignments represented as Alignment objects. `Bio.AlignIO` uses the same
31+
set of functions for input and output as in `Bio.SeqIO`, and the same
3232
names for the file formats supported.
3333

34-
Note that the inclusion of Bio.AlignIO does lead to some duplication or
35-
choice in how to deal with some file formats. For example, Bio.AlignIO
36-
and Bio.Nexus will both read alignments from NEXUS files - but Bio.NEXUS
37-
allows more control and the use of trees.
34+
Note that the inclusion of `Bio.AlignIO` does lead to some duplication or
35+
choice in how to deal with some file formats. For example, `Bio.AlignIO`
36+
and `Bio.Nexus` will both read alignments from NEXUS files - but
37+
`Bio.NEXUS` allows more control and the use of trees.
3838

3939
My vision is that for reading or writing sequence alignments you should
40-
try Bio.AlignIO as your first choice. In some cases you may only care
40+
try `Bio.AlignIO` as your first choice. In some cases you may only care
4141
about the sequences themselves, in which case try using
4242
[Bio.SeqIO](SeqIO "wikilink") on the alignment file directly. Unless you
4343
have some very specific requirements, I hope this should suffice.
@@ -98,48 +98,50 @@ Fib\_gamma](http://pfam.sanger.ac.uk/family?acc=PF09395). At the time of
9898
writing, this contained 14 sequences with an alignment length of 77
9999
amino acids, and is shown below in the PFAM or Stockholm format:
100100

101-
# STOCKHOLM 1.0
102-
#=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
103-
#=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
104-
#=GS O02676_CROCR/1-77 AC O02676.1
105-
#=GS Q6X869_TENEC/1-77 AC Q6X869.1
106-
#=GS FIBG_HUMAN/40-116 AC P02679.3
107-
#=GS O02689_TAPIN/1-77 AC O02689.1
108-
#=GS O02688_PIG/1-77 AC O02688.1
109-
#=GS O02672_9CETA/1-77 AC O02672.1
110-
#=GS O02682_EQUPR/1-77 AC O02682.1
111-
#=GS Q6X870_CYNVO/1-77 AC Q6X870.1
112-
#=GS FIBG_RAT/40-116 AC P02680.3
113-
#=GS Q6X866_DROAU/1-76 AC Q6X866.1
114-
#=GS O93568_CHICK/40-116 AC O93568.1
115-
#=GS FIBG_XENLA/38-114 AC P17634.1
116-
Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
117-
Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
118-
O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
119-
Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
120-
FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
121-
#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
122-
#=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
123-
#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
124-
#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
125-
#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
126-
#=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
127-
#=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
128-
O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
129-
O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
130-
O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
131-
O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
132-
Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
133-
FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
134-
Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
135-
O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
136-
#=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
137-
#=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
138-
#=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
139-
FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
140-
#=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
141-
#=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
142-
//
101+
```
102+
# STOCKHOLM 1.0
103+
#=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
104+
#=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
105+
#=GS O02676_CROCR/1-77 AC O02676.1
106+
#=GS Q6X869_TENEC/1-77 AC Q6X869.1
107+
#=GS FIBG_HUMAN/40-116 AC P02679.3
108+
#=GS O02689_TAPIN/1-77 AC O02689.1
109+
#=GS O02688_PIG/1-77 AC O02688.1
110+
#=GS O02672_9CETA/1-77 AC O02672.1
111+
#=GS O02682_EQUPR/1-77 AC O02682.1
112+
#=GS Q6X870_CYNVO/1-77 AC Q6X870.1
113+
#=GS FIBG_RAT/40-116 AC P02680.3
114+
#=GS Q6X866_DROAU/1-76 AC Q6X866.1
115+
#=GS O93568_CHICK/40-116 AC O93568.1
116+
#=GS FIBG_XENLA/38-114 AC P17634.1
117+
Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
118+
Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
119+
O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
120+
Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
121+
FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
122+
#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
123+
#=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
124+
#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
125+
#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
126+
#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
127+
#=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
128+
#=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
129+
O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
130+
O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
131+
O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
132+
O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
133+
Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
134+
FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
135+
Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
136+
O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
137+
#=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
138+
#=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
139+
#=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
140+
FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
141+
#=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
142+
#=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
143+
//
144+
```
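
To make the layout of the Stockholm text above more concrete, here is a minimal standalone sketch that separates the `#=GS ... AC` accession lines from the sequence rows. It is not Biopython's parser: the function name `split_stockholm` and the two-record `SAMPLE` (lines copied from the file above) are assumptions made for illustration, and all other markup (`#=GR`, `#=GC`, `DR` cross-references) is simply skipped.

```python
def split_stockholm(text):
    """Return (sequences, accessions) for the first alignment in `text`.

    Illustrative sketch only; real files should be read with Bio.AlignIO.
    """
    sequences = {}   # sequence name -> aligned residues (gaps kept)
    accessions = {}  # sequence name -> accession from "#=GS <name> AC <value>"
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            break  # end of the first alignment
        if not line or line.startswith("# STOCKHOLM"):
            continue
        if line.startswith("#=GS"):
            # Layout: "#=GS <seqname> <feature> <free text>"
            _, name, feature, value = line.split(None, 3)
            if feature == "AC":
                accessions[name] = value
        elif line.startswith("#"):
            continue  # other markup (#=GR, #=GC, #=GF) is ignored here
        else:
            name, residues = line.split(None, 1)
            # Concatenate so that interleaved blocks would also work
            sequences[name] = sequences.get(name, "") + residues
    return sequences, accessions


SAMPLE = """# STOCKHOLM 1.0
#=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
#=GS FIBG_HUMAN/40-116 AC P02679.3
Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
//
"""

sequences, accessions = split_stockholm(SAMPLE)
for name, residues in sequences.items():
    print(name, accessions.get(name), len(residues))
```

Every row of an alignment has the same length, which is why gaps (`-`) are kept rather than stripped.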
143145

144146
You will notice that there is plenty of annotation information here,
145147
including accession numbers for each sequence and also some PDB database
@@ -149,53 +151,54 @@ chick fibrinogen proteins.
149151
This file contains a single alignment, so we can use the
150152
`Bio.AlignIO.read()` function to load it in Biopython. Let's assume
151153
you have downloaded this alignment from Sanger, or have copy and pasted
152-
the text above, and saved this as a file called `PF09395\_seed.sth` on
154+
the text above, and saved this as a file called `PF09395_seed.sth` on
153155
your computer. Then in python:
154156

155157
``` python
156158
from Bio import AlignIO
157159
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
158-
print "Alignment length %i" % alignment.get_alignment_length()
160+
print("Alignment length %i" % alignment.get_alignment_length())
159161
for record in alignment :
160-
print record.seq, record.id
162+
print(record.seq + " " + record.id)
161163
```
162164

163165
That should give:
164166

165-
Alignment length 77
166-
GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
167-
RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
168-
RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
169-
RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
170-
RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
171-
RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
172-
RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
173-
RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
174-
RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
175-
RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
176-
RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
177-
RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
178-
RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
179-
RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
167+
```
168+
Alignment length 77
169+
GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
170+
RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
171+
RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
172+
RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
173+
RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
174+
RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
175+
RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
176+
RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
177+
RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
178+
RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
179+
RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
180+
RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
181+
RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
182+
RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
183+
```
180184

181185
Alignment Output
182186
----------------
183187

184188
As in [Bio.SeqIO](SeqIO "wikilink"), there is a single output function
185-
**Bio.AlignIO.write()**. This takes three arguments: some alignments, a
189+
`Bio.AlignIO.write()`. This takes three arguments: some alignments, a
186190
file handle to write to, and the format to use.
187191

188-
As of Biopython 1.48, the alignment object acquired a **format()**
192+
As of Biopython 1.48, the alignment object acquired a `.format()`
189193
method to give a string containing the alignment in the specified file
190194
format, e.g.
191195

192196
``` python
193197
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
194-
print alignment.format("fasta")
198+
print(alignment.format("fasta"))
195199
```
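
For readers unfamiliar with FASTA, the string returned for `"fasta"` is roughly a `>` header line per record followed by the wrapped sequence. The sketch below is a plain-Python approximation, not Biopython's writer: the helper name `to_fasta` and the 60-column wrap are assumptions, and Biopython's own output may differ in details such as the header description.

```python
def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA text, wrapped at `width`."""
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)
        # Split the sequence into chunks of at most `width` characters
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# One record taken from the alignment shown earlier on this page
fasta_text = to_fasta([
    ("FIBG_HUMAN/40-116",
     "RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML"),
])
print(fasta_text)
```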
196200

197-
This wiki section needs to be filled out, so in the short term please
198-
refer to the Bio.AlignIO chapter in the Tutorial.
201+
Please refer to the Bio.AlignIO chapter in the Tutorial for more details.
199202

200203
File Format Conversion
201204
----------------------
