Illustration with Examples
In this section we give three examples that demonstrate what input and
output files of MASIA look like and how to run the program.
The following is the multiple sequence alignment in
GCG format (file extension *.msf).
Symbol comparison table: dayhoff
Gap weight: 8
Gap length weight: 0
MSF: 121 Check: 0 ..
Name: P02246 Len: 121 Check: 4021 Weight: 1.00
Name: P02244 Len: 121 Check: 2714 Weight: 1.00
Name: P02245 Len: 121 Check: 2709 Weight: 1.00
Name: P22766 Len: 121 Check: 809 Weight: 1.00
Name: P80255 Len: 121 Check: 7844 Weight: 1.00
Name: P22761 Len: 121 Check: 7484 Weight: 1.00
Name: P02247 Len: 121 Check: 717 Weight: 1.00
Name: P27686 Len: 121 Check: 845 Weight: 1.00
Name: P22765 Len: 121 Check: 9962 Weight: 1.00
Name: P22764 Len: 121 Check: 8331 Weight: 1.00
//
1 50
P02246 GFPIPDPYCW DISFRTFYTI VDDEHKTLFN GILL.LSQAD NADHLNELRR CTGKHFLNEQ
P02244 GFPIPDPYVW DPSFRTFYSI IDDEHKTLFN GIFH.LAIDD NADNLGELRR CTGKHFLNEQ
P02245 GFPIPDPYGW DPSFRTFYSI IDDEHKTLFN GIFH.LAIDD NADNLGELRR CTGKHFLNQE
P22766 GFPVPDPFIW DASFKTFYDD LDNQHKQLFQ AILT.QGNVG GATAGDNAYA CLVAHFLFEE
P80255 GFEIPEPYKW DESFQVFYEK LDEEHKQIFN AIFA.LCGGN NAGNLKSLVD VTANHFADEE
P22761 GFEVPEPFKW DESFQVFYDK LDEEHKQIFN AIFA.LGGGN NADNLKKMID VTANHFADEE
P02247 GWEIPEPYVW DESFRVFYEQ LDEEHKKIFK GIFDCIRD.N SAPNLATLVK VTTNHFTHEE
P27686 PFDIPEPYVW DESFRVFYDN LDDEHKGLFK GVFNCAADMS SAGNLKHLID VTTTHFRNEE
P22765 .MKIPVPYAW TPDFKTTYEN IDSEHRTLFN GLFA.LSEFN TQHQLNAAIE VFTLHFHDEQ
P22764 .VKVPEPFAW NESFATSYKN IDLEHRTLFN GLFA.LSEFN TRDQLLACKE VFVMHFRDEQ
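To make the layout of the *.msf file concrete, the following Python sketch collects the aligned sequence of each entry. It is not part of MASIA. The function name parse_msf is hypothetical, and the sketch assumes that the alignment starts after the "//" line, that each alignment row begins with the sequence name, and that rows consisting only of numbers are coordinate rulers.

```python
def parse_msf(text):
    """Return {name: aligned sequence} from GCG MSF text (a sketch, not a full reader)."""
    sequences = {}
    in_alignment = False
    for line in text.splitlines():
        if line.strip() == "//":
            # Everything before "//" is header information.
            in_alignment = True
            continue
        if not in_alignment:
            continue
        fields = line.split()
        # Skip blank lines and coordinate rulers such as "1 50".
        if not fields or all(f.isdigit() for f in fields):
            continue
        name, blocks = fields[0], fields[1:]
        # Rows for the same name in later alignment blocks are appended.
        sequences[name] = sequences.get(name, "") + "".join(blocks)
    return sequences
```

Calling parse_msf on the contents of the file above would give one entry per sequence name (P02246, P02244, and so on), with "." kept as the gap symbol.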
The macro conses performs consensus analysis at different conservation levels at each position of the multiple sequence alignment. The cutoff for the frequency of appearance across the aligned sequences is set from 40% to 100%, and the method for searching for conservation of a property is the dominant criterion, as described in Method Description.
The macro file conses contains the following commands:
property name=consensus cut=0.40 art=d
property name=consensus cut=0.50 art=d
property name=consensus cut=0.60 art=d
property name=consensus cut=0.70 art=d
property name=consensus cut=0.80 art=d
property name=consensus cut=0.90 art=d
property name=consensus cut=1.00 art=d
After running the macro conses, the following outputs are generated with MASIA.
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
------------------------------------------------------------------------
consensus GFPIPEPY WDESFRTFY LDDEHKTLFNGIFA.L NNADNL L VT HFLDEE
consensus GF IPEPY WDESFRTFY LD EHKTLFNGIF .L NNADNL L VT HF EE
consensus GF IP PY WD SF TFY D EHK LFNGIF .L A NL L VT HF EE
consensus GF IP PY WD SF FY D EHK LFNGIF .L A L T HF E
consensus P P WD SF FY D EHK F F . A L HF E
consensus P P W SF Y D EH F . L HF E
consensus P P W F Y D H F . HF
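The effect of the cut parameter can be illustrated with a small Python sketch. It is only an approximation of the idea, not MASIA's implementation: it assumes that a column is reported when the most frequent symbol reaches the cutoff frequency, it treats the gap symbol like any other symbol, and the function name consensus is hypothetical. The dominant criterion itself is defined in Method Description.

```python
from collections import Counter

def consensus(seqs, cut):
    """Most frequent symbol per column if its frequency >= cut, else a blank."""
    out = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= cut:
            out.append(residue)
        else:
            out.append(" ")
    return "".join(out)

# First ten alignment columns of the ten sequences in the *.msf file above.
first_columns = [
    "GFPIPDPYCW", "GFPIPDPYVW", "GFPIPDPYGW", "GFPVPDPFIW", "GFEIPEPYKW",
    "GFEVPEPFKW", "GWEIPEPYVW", "PFDIPEPYVW", ".MKIPVPYAW", ".VKVPEPFAW",
]
for cut in (0.40, 0.50, 1.00):
    print(f"{cut:.2f} {consensus(first_columns, cut)}")
```

For these ten columns the sketch gives GFPIPEPY W at cut=0.40 and GF IPEPY W at cut=0.50, which agrees with the start of the first two consensus rows above.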
The macro ssio groups the commands that perform secondary structure
(α-helix, β-strand, and turn) and inside/outside predictions,
followed by the commands that generate the outputs. The methods for
the group and property commands used here have been described
in Hanggi & Braun (1994), FEBS Letters, 344, 147-153.
The macro file ssio contains the following commands:
characteristic name=alph
group x=3 y=5
property name=alpha4 art=e
characteristic name=beta
group x=3 y=5
property name=beta4 art=e
characteristic name=turn
group x=4 y=4
property name=turn4 art=e
characteristic name=insd
group
property name=inout art=e
characteristic name=outs
group
property name=inout look=o art=e
The resulting output file *.res is:
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
alpha4 HHH H HH H HH H H H H HH HH .H H HH H H HH

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
beta4 B B B B B BBB B BBB BB .B B B BBB B

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
turn4 T T T TT T T TTT T T TT . T TT TT T T

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
inout ioiooo ioooio i ooiooo ooiio ii .i ioo oi iioio i ooo

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
inout ioiooo ioooio i ooiooo ooiio ii .i ioo oi iioio i ooo
The output *.mas file is:
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
alph HHH H HH H HH H H H H HH HH H H HH H H HH
HHHHHHHHHHHHHHHHHH HHHHHHHHHHHHH HHHHHHHH HHHHH
beta B B B B B BBB B BBB BB B B B BBB B
BBBBBBBBB BBBBBBBBBBBB BBBBBBBBB
turn T T T TT T T TTT T T TT T TT TT T T
insd i i i i i i ii ii i i i ii i i
i i i i i i ii ii i i i ii i i
outs o ooo ooo o oo ooo oo o oo o o o ooo
o ooo ooo o oo ooo oo o oo o o o ooo
---------------------------------------------------------------------
alph HHHHHHHHHHHHHHHHHH HHHHHHHHHHHHH HHHHHHHH HHHHH
beta BBBBBBBBB BBBBBBBBBBBB BBBBBBBBB
turn
insd i i i i i i ii ii i i i ii i i
outs o ooo ooo o oo ooo oo o oo o o o ooo
SSP HHHHHHHHHHHHBBBBBBBBB BBBBBBBBBBBB HHHHHHBBBBBBBBBHHHH
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
To compare the predicted results with experimental data, a file filename.sec containing the corresponding experimental secondary structure and inside/outside assignments is needed, as shown below:
#TOPOLOGY FILE
alph H1 22 37
alph H2 41 60
insd 9 4 7 8 10 11 14 15 17 18
insd 9 23 26 27 30 33 34 35 37 38
insd 9 40 41 45 49 52 53 54 56 57
insd 2 59 60
outs 9 1 2 3 5 6 9 12 13 16
outs 9 19 20 21 22 24 25 28 29 31
outs 9 32 36 39 42 43 44 46 47 48
outs 4 50 51 55 58
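The record layout of the topology file can be read off the example: an alph record appears to carry a label with a start and end position, and an insd or outs record appears to carry a count followed by that many positions. The Python sketch below reads the file under exactly these assumptions, with one record per line. It is not taken from MASIA, and the function name parse_topology is hypothetical.

```python
def parse_topology(text):
    """Split a topology file into segments and per-kind position lists."""
    segments = []    # (kind, label, start, end), e.g. ("alph", "H1", 22, 37)
    positions = {}   # kind -> list of positions, e.g. "insd" -> [4, 7, ...]
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines such as "#TOPOLOGY FILE".
        if not fields or fields[0].startswith("#"):
            continue
        kind = fields[0]
        if kind in ("insd", "outs"):
            count = int(fields[1])
            values = [int(f) for f in fields[2:]]
            if len(values) != count:
                raise ValueError(f"expected {count} positions in: {line!r}")
            positions.setdefault(kind, []).extend(values)
        else:
            # Assumed layout for segment records: kind label start end.
            segments.append((kind, fields[1], int(fields[2]), int(fields[3])))
    return segments, positions
```

Applied to the example file, the insd and outs records together cover each of the positions 1 to 60 exactly once, which is a useful consistency check before running the comparison.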
The comparison file *.cpr from MASIA is:
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
alph HHH H HH H HH H H H H HH HH H H HH H H HH
HHHHHHHHHHHHHHHHHH HHHHHHHHHHHHH HHHHHHHH HHHHH
beta B B B B B BBB B BBB BB B B B BBB B
BBBBBBBBB BBBBBBBBBBBB BBBBBBBBB
turn T T T TT T T TTT T T TT T TT TT T T
insd i i i i i i ii ii i i i ii i i
i i i i i i ii ii i i i ii i i
outs o ooo ooo o oo ooo oo o oo o o o ooo
o ooo ooo o oo ooo oo o oo o o o ooo
---------------------------------------------------------------------
alph HHHHHHHHHHHHHHHHHH HHHHHHHHHHHHH HHHHHHHH HHHHH
beta BBBBBBBBB BBBBBBBBBBBB BBBBBBBBB
turn
insd i i i i i i ii ii i i i ii i i
outs o ooo ooo o oo ooo oo o oo o o o ooo
SSP HHHHHHHHHHHHBBBBBBBBB BBBBBBBBBBBB HHHHHHBBBBBBBBBHHHH
exp. Str HHHHHHHHHHHHH HHH HHHHHHHHHHHHHHHHHHH
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
accuracy: 22.0
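How MASIA computes the accuracy value is not stated in this section. One plausible reading is the percentage of residues whose predicted state equals the experimental state, and the Python sketch below implements only that reading. It assumes that gap columns are excluded and that a blank counts as a third state; the function name percent_agreement is hypothetical. Note that the example sequence has 59 residues once the gap is excluded, and 13/59 rounds to 22.0%.

```python
def percent_agreement(predicted, observed, sequence, gap="."):
    """Percentage of non-gap positions where predicted and observed states agree."""
    if not (len(predicted) == len(observed) == len(sequence)):
        raise ValueError("all three strings must have the same length")
    # Positions that hold a residue rather than a gap symbol.
    scored = [i for i, residue in enumerate(sequence) if residue != gap]
    if not scored:
        return 0.0
    hits = sum(1 for i in scored if predicted[i] == observed[i])
    return 100.0 * hits / len(scored)
```

The three arguments correspond to the SSP row, the exp. Str row, and the sequence row of the *.cpr file, each taken as a fixed-width string of the same length.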
In the macro kings, the rules for secondary structure prediction are grouped by structure type (beta, alph, and turn) to perform the pattern analysis. All the rules were generalized with a symbolic induction method and can be found in King RD et al. (1990) J. Mol. Biol. 216, 441-457. We have added all the properties they require to the property library prop.lib.
The macro file kings:
characteristic name=beta abbrev=B
group crit=0.60 print=B
property name=sm/pl art=e
property name=al/lg-pl art=e
property name=ar/al/M art=e
property name=al/lg-pl art=e
property name=hb/sm-P art=e
step *4
characteristic name=alph grpcrit=0.50 abbrev=H
group crit=0.75 print=H
property *4 name=all-KP art=e
property name=sm-P/pl-ar art=e
property name=all-KP art=e
property name=All art=e
property name=hb/sm-P art=e
property name=pl-ar/ch art=e
property name=ar/M art=e
property name=ar/al/M art=e
property *2 name=all-P art=e
property name=hb/P art=e
step *13
group crit=0.80 print=H
property name=all-KP art=e
property *2 name=sm/pl art=e
property name=charged-H art=e
property name=al/lg-pl art=e
property name=all-P art=e
property name=sm-P/pl art=e
property name=hphobic art=e
property name=ar/vhb art=e
property name=ti/pl art=e
property name=sm-P/pl art=e
step *10
group crit=0.80 print=H
property name=All art=e
property name=sm-P/pl-ar art=e
property name=All art=e
property name=al/sm+hb-T art=e
property name=ti/n+hl/T art=e
property name=sm-P/pl-ar art=e
property name=vhb art=e
property name=lg art=e
property name=positive art=e
property name=all-P art=e
property name=All art=e
step *10
characteristic name=turn abbrev=T
group print=T
property name=All art=e
property name=ti/sm+pl art=e cut=1.20
property name=All art=e
property name=ti/pl-ar art=e cut=1.20
step *3
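In the result file exam3.res, the all-KP row appears four times in succession where the macro has property *4 name=all-KP, and the all-P row appears twice for property *2 name=all-P. This suggests that *N on a property command is a repeat count. The Python sketch below expands a macro into the resulting sequence of property rows under that reading. It assumes one command per line, and the function name expand_properties is hypothetical.

```python
def expand_properties(macro_text):
    """List property names in order, repeating 'property *N' entries N times."""
    names = []
    for line in macro_text.splitlines():
        fields = line.split()
        # Only property commands produce rows; skip group, step, and so on.
        if not fields or fields[0] != "property":
            continue
        repeat = 1
        name = None
        for field in fields[1:]:
            if field.startswith("*"):
                repeat = int(field[1:])          # assumed repeat count
            elif field.startswith("name="):
                name = field[len("name="):]
        if name is not None:
            names.extend([name] * repeat)
    return names
```

For the first group of the alph characteristic this yields all-KP four times, then sm-P/pl-ar, all-KP, All, and so on, in the same order as the rows of the second block of exam3.res.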
After running the macro kings, the result and MASIA files are generated as follows.
The result file exam3.res:
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
sm/pl s sss sssss ss sss ssssss ss s. s sssss ss ss s s ssss
al/lg-pl a a a a a aa aa .a a a a a
ar/al/M a a a a a aa a a aa aa .a a a a aa
al/lg-pl a a a a a aa aa .a a a a a
hb/sm-P h h hhhh hh hhh hhh hh hhhhhhh.hhhhhh h hhhhhhhhhhhh h

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
sm-P/pl-ar s s s sss ss ss sss ss ss s. s sss s ss ss s s ss
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA
hb/sm-P h h hhhh hh hhh hhh hh hhhhhhh.hhhhhh h hhhhhhhhhhhh h
pl-ar/ch p p ppp p pp pppppp p p. pp p p p p p ppp
ar/M a a a a aa a a a . aa
ar/al/M a a a a a aa a a aa aa .a a a a aa
all-P aaa a aaaa aaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaa
all-P aaa a aaaa aaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaa
hb/P hh hh hhhh h hhh h hh hh hhh .hh hh hhhhh hhhhhh

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
all-KP a a a a aa aaaaaaa aaaaa aaa aaaa.aaaaaaaaaa aaaaaaaaaaaaaa
sm/pl s sss sssss ss sss ssssss ss s. s sssss ss ss s s ssss
sm/pl s sss sssss ss sss ssssss ss s. s sssss ss ss s s ssss
charged-H c c cc c c ccc c . c cc
al/lg-pl a a a a a aa aa .a a a a a
all-P aaa a aaaa aaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaa
sm-P/pl s s sssss ss sss ssssss ss s. s sssss ss ss s s ssss
hphobic hh h hhh h hhh h hhhhh hhh .hh h hhhhh hhhhhh
ar/vhb aa a aaa a aa a a aa aaa .aa a a a aa a a aa
ti/pl t t ttttt t ttt tttttt tt t. tt ttttt tt t t tttt
sm-P/pl s s sssss ss sss ssssss ss s. s sssss ss ss s s ssss

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA
sm-P/pl-ar s s s sss ss ss sss ss ss s. s sss s ss ss s s ss
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA
al/sm+hb-T a a a a a aa .aa a a aa a a
ti/n+hl/T t ttt tt ttt t tt t. tt ttttt t t tt ttt
sm-P/pl-ar s s s sss ss ss sss ss ss s. s sss s ss ss s s ss
vhb hh h h h h h hh hhh .hh h h h hh h h h
lg lll l l l ll ll l lll ll ll .l l l ll ll
positive p pp . p
all-P aaa a aaaa aaaaaaaaaaaaaaaaaaaaaa.aaaaaaaaaaaaaaaaaaaaaaaaa
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA

sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
-----------------------------------------------------------------------
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA
ti/sm+pl t t t t t. t ttt t t t
All AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.AAAAAAAAAAAAAAAAAAAAAAAAA
ti/pl-ar t t ttt t tt t t tt tt t. ttt t t t tt
The MASIA file exam3.mas:
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
beta BBBBB BBBBBBBBBBB BBBBBBBBBBBBBB BBBBBBBBBBBBBBBB
BBBBB BBBBBBBBBBB BBBBBBBBBBBBBB BBBBBBBBBBBBBBBB
alph HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
alph HHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHH
HHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHH
alph HHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHH
HHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHH
turn TTTTTT TTTT TTTTTTTT
TTTTTT TTTT TTTTTTTT
---------------------------------------------------------------------
beta BBBBB BBBBBBBBBBB BBBBBBBBBBBBBB BBBBBBBBBBBBBBBB
alph HHHHHHHHHHHHHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHHHH
turn TTTTTT TTTT TTTTTTTT
SSP TTTTTTHHHHHTTTTHHHHHHHHHHH TTTTTTTTHHHHHHHHHHH
exp. Str HHHHHHHHHHHHH HHH HHHHHHHHHHHHHHHHHHH
sequence GFPIPDPYCWDISFRTFYTIVDDEHKTLFNGILL.LSQADNADHLNELRRCTGKHFLNEQ
accuracy: 67.8