I. CENTER FOR INFORMATION BIOLOGY AND DNA DATA BANK OF JAPAN
I-b. Laboratory for Gene-Product Informatics - Ken Nishikawa Group

RESEARCH ACTIVITIES

(1) Eigenvalue analysis of amino acid substitution matrices reveals a sharp transition of the mode of sequence conservation in proteins

Akira R. Kinjo and Ken Nishikawa

--The pattern of amino acid substitutions and sequence conservation over many structure-based alignments of protein sequences was analyzed as a function of percentage sequence identity. The substitution statistics were converted into log-odds amino acid substitution matrices, to which eigenvalue decomposition was applied. The most important component of the substitution matrices exhibited a sharp transition at a sequence identity of 30-35%, which coincides with the twilight zone. Above the transition point, the dominant component is related to the mutability of amino acids and acts to disfavor any substitution, whereas below the transition point, the dominant component is related to the hydrophobicity of amino acids, and substitutions between residues of similar hydrophobic character are positively favored. Implications for protein evolution and sequence analysis are discussed. See Ref. 1 for details.
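
As a rough illustration of the procedure summarized above, the following sketch builds a log-odds matrix from aligned-pair counts and extracts its dominant eigen-component. It is not the authors' implementation: the four-letter alphabet and the counts in TOY_COUNTS are invented, the helper names are hypothetical, and "most important component" is assumed here to mean the eigenvalue of largest absolute value. A plain Jacobi rotation routine stands in for whatever eigen-solver was actually used.

```python
import math

def log_odds_matrix(counts):
    """Turn symmetric aligned-pair counts into a log-odds substitution matrix.

    Assumes every count is positive; zero counts would need separate handling.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    joint = [[counts[i][j] / total for j in range(n)] for i in range(n)]
    background = [sum(row) for row in joint]  # marginal residue frequencies
    return [[math.log(joint[i][j] / (background[i] * background[j]))
             for j in range(n)] for i in range(n)]

def jacobi_eigen(matrix, sweeps=50, tol=1e-20):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvectors[k] pairs with eigenvalues[k].
    """
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors

def dominant_component(matrix):
    """Return the eigenpair whose eigenvalue has the largest absolute value."""
    values, vectors = jacobi_eigen(matrix)
    k = max(range(len(values)), key=lambda i: abs(values[i]))
    return values[k], vectors[k]

# Invented pair counts for a hypothetical four-letter alphabet (symmetric).
TOY_COUNTS = [
    [60, 5, 3, 2],
    [5, 50, 4, 1],
    [3, 4, 40, 3],
    [2, 1, 3, 30],
]

scores = log_odds_matrix(TOY_COUNTS)
eigenvalue, eigenvector = dominant_component(scores)
print("dominant eigenvalue:", eigenvalue)
print("dominant eigenvector:", eigenvector)
```

Repeating this for matrices compiled at different sequence-identity levels and comparing the dominant eigenvectors would be one way to look for the kind of transition described above.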

(2) Estimation of the number of authentic orphan genes in bacterial genomes

Satoshi Fukuchi and Ken Nishikawa

--Genome annotation produces a considerable number of putative proteins lacking sequence similarity to known proteins. These are referred to as "orphans". The proportion of orphan genes varies among genomes and is independent of genome size. In the present study, we show that the proportion of orphan genes roughly correlates with the isolation index of organisms (IIO), an indicator introduced here that represents the degree of isolation of a given genome as measured by sequence similarity. However, some genomes are outliers with respect to the linear correlation and may contain excess orphan genes. Comparisons of genome sequences among closely related strains revealed that some of the annotated genes are not conserved, suggesting that they are ORFs occurring by chance. Excluding these non-conserved ORFs within closely related genomes improved the correlation between the proportion of orphan genes and the IIO values. Assuming that the correlation holds in general, this relationship was used to estimate the number of "authentic" orphan genes in a genome. Using this definition of authentic orphan genes, the anomalies arising from over-assignments, e.g., in the percentages of structural annotations, were corrected for 16 genomes, including those of five archaea. See Ref. 2 for details.
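
The estimation step can be sketched as a simple linear fit, shown below under several assumptions. The IIO values are taken as given inputs (their definition is in Ref. 2), all numbers are invented, the function names are hypothetical, and the fitted fraction is applied to the annotated gene count as a simplification. This is an illustration of the idea, not the procedure used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_authentic_orphans(iio, n_genes, n_annotated_orphans, slope, intercept):
    """Estimate authentic orphans from the fitted orphan-fraction vs. IIO line.

    Assumption: the estimate cannot exceed the number of annotated orphans.
    """
    expected_fraction = min(max(slope * iio + intercept, 0.0), 1.0)
    expected = round(expected_fraction * n_genes)
    return min(expected, n_annotated_orphans)

# Invented reference genomes: (IIO, fraction of genes annotated as orphans).
reference = [(0.10, 0.06), (0.25, 0.11), (0.40, 0.15), (0.55, 0.21), (0.70, 0.25)]
slope, intercept = fit_line([x for x, _ in reference], [y for _, y in reference])

# Invented outlier genome with an apparent excess of orphan genes.
authentic = estimate_authentic_orphans(
    iio=0.30, n_genes=2500, n_annotated_orphans=700,
    slope=slope, intercept=intercept,
)
print("estimated authentic orphans:", authentic)
```

Capping the estimate at the annotated orphan count keeps the correction one-directional: the fit can only remove suspected chance ORFs, never add orphans that were not annotated.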

(3) Alternative splice variants encoding unstable protein domains exist in the human brain

Keiichi Homma, Reiko F. Kikuno, Takahiro Nagase, Osamu Ohara and Ken Nishikawa

--Alternative splicing has been recognized as a major mechanism by which protein diversity is increased without significantly increasing genome size in animals, and it has crucial medical implications, as many alternative splice variants are known to cause diseases. Although knowing what structural changes alternative splicing introduces into the encoded proteins is important for assessing its significance, the problem has not been adequately explored. Therefore, we systematically examined the structures of the proteins encoded by the alternative splice variants in the HUGE protein database, which is derived from long (>4 kb) human brain cDNAs. Limiting our analyses to reliable alternative splice junctions, we found that these junctions have a slight tendency to avoid the interior of SCOP domains and a strong, statistically significant tendency to coincide with SCOP domain boundaries. These findings reflect the occurrence of some alternative splicing events that utilize protein structural units as a cassette. However, 50 cases were identified in which SCOP domains are disrupted in the middle by alternative splicing. In six of these cases, insertions are introduced at the molecular surface, presumably affecting protein functions, while in 11 of the cases alternatively spliced variants were found to encode pairs of stable and unstable proteins. The mRNAs encoding such unstable proteins are much less abundant than those encoding stable proteins and tend not to have corresponding mRNAs in non-primate species. We propose that most unstable proteins encoded by alternative splice variants lack normal functions and represent evolutionary dead ends. See Ref. 3 for details.

(4) Construction and characterization of chimeric proteins composed of type-1 and type-2 periplasmic binding proteins MglB and ArgT

Kenji Kashiwagi, Kaoru Fukami-Kobayashi, Kiyotaka Shiba and Ken Nishikawa

--The type-1 and type-2 periplasmic binding proteins (PBPs) MglB and ArgT, respectively, are believed to have evolved from a common ancestor into siblings showing topological differences in their main chain connectivity. At first glance, they show similar structures, but more detailed examination reveals that the chain connectivity of ArgT is more convoluted than that of MglB. Reflecting that complexity, the folding of ArgT is complicated and involves intermediate folds, whereas the folding of MglB is a simple two-state transition. In the present study, we constructed and characterized several chimeras made up of various subdomains of MglB and ArgT with the aim of gaining insight into the evolution of protein folding and protein structure. Although these chimeras did not fold as compactly as their parental proteins, some did exhibit cooperative folding, which suggests that novel proteins with new connectivity and new folding pathways could have emerged at a fairly high rate throughout the evolution of proteins. See Ref. 4 for details.

PUBLICATIONS

Papers
1. Kinjo, A.R. and Nishikawa, K. (2004). Eigenvalue analysis of amino acid substitution matrices reveals a sharp transition of the mode of sequence conservation in proteins. Bioinformatics, 20 (16), 2504-2508.
2. Fukuchi, S. and Nishikawa, K. (2004). Estimation of the number of authentic orphan genes in bacterial genomes. DNA Res., 11, 219-231.
3. Homma, K., Kikuno, R.F., Nagase, T., Ohara, O. and Nishikawa, K. (2004). Alternative splice variants encoding unstable protein domains exist in the human brain. J. Mol. Biol., 343, 1207-1220.
4. Kashiwagi, K., Fukami-Kobayashi, K., Shiba, K. and Nishikawa, K. (2004). Construction and characterization of chimeric proteins composed of type-1 and type-2 periplasmic binding proteins MglB and ArgT. Biosci. Biotechnol. Biochem., 68 (4), 808-813.
5. Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S. et al. (2004). Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biology, 2 (6), 856-875.
6. Kinjo, A.R., Horimoto, K. and Nishikawa, K. (2005). Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins, 58, 158-165.
7. Kinjo, A.R. and Nishikawa, K. Recoverable one-dimensional encoding of protein three-dimensional structures. Bioinformatics, in press.

Reviews
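8. Yoshimune, K., Fukuchi, S., Moriguchi, M. and Nishikawa, K. (2004). Environmental adaptation strategies of extremophiles as seen from their proteins. Bioscience & Industry, 62, 17-22. (in Japanese)
9. Fukuchi, S. and Nishikawa, K. (2004). Programs and databases for protein structure analysis. Tanpakushitsu Kakusan Koso (Protein, Nucleic Acid and Enzyme), special issue "Manual for using high-performance bio-instruments and new technologies" (Ohara, O. et al., eds., Kyoritsu Shuppan), 49 (11), 1944-1948. (in Japanese)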

Database
GTOP (database of protein three-dimensional structures in genomes):
http://spock.genes.nig.ac.jp/~genome/gtop.html
PMD (protein mutant database):
http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html
TTDB (database of prokaryotic transcription factors):
http://spock.genes.nig.ac.jp/~ttdb/

ORAL PRESENTATIONS

1. Kinjo, A.R.: Competition between protein folding and aggregation inside the cell: Studies by density functional theory (invited talk). NMRS 2004 Symposium on NMR, Drug Design, and Bioinformatics, Saha Institute of Nuclear Physics, Kolkata, India, Feb. 2004.
2. Nishikawa, K.: Genome-wide compositional changes of DNA and proteins in thermophilic bacteria for adaptation to higher temperatures. The 1st Pacific-Rim International Conference on Protein Science, Yokohama, Apr. 2004.
3. Nishikawa, K.: A study of comparative genomics based on domain structures of proteins. Satellite Symposium of PRICPS2004, Yokohama, Apr. 2004.

POSTER PRESENTATIONS

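1. Fukuchi, S., Fukami-Kobayashi, K., Homma, K., Ota, M. and Nishikawa, K.: Inserted amino acid sequences found through analysis of H-Invitational human cDNAs. The 27th Annual Meeting of the Molecular Biology Society of Japan, Kobe, Dec. 2004.
2. Homma, K., Kikuno, R.F., Nagase, T., Ohara, O. and Nishikawa, K.: Some alternative splice variants encode unstable proteins. The 27th Annual Meeting of the Molecular Biology Society of Japan, Kobe, Dec. 2004.
3. Nagashima, T., Mitsui, T., Kinjo, A.R. and Nishikawa, K.: Exploring the conformational space of proteins using Wang-Landau MD. The 42nd Annual Meeting of the Biophysical Society of Japan, Kyoto, Dec. 2004.
4. Kinjo, A.R. and Nishikawa, K.: Reproducing native structures from one-dimensional information of proteins. The 42nd Annual Meeting of the Biophysical Society of Japan, Kyoto, Dec. 2004.
5. Minezaki, Y. and Nishikawa, K.: Comprehensive identification of transcription factors by an automated method and comparative genome analysis. The 42nd Annual Meeting of the Biophysical Society of Japan, Kyoto, Dec. 2004.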

EDUCATION

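1. Fukuchi, S.: The 9th DDBJing training course, Tokyo, Mar.
2. Nishikawa, K., Fukuchi, S. and Kinjo, A.R.: Protein three-dimensional structure prediction using databases. Genome Literacy Course organized by the Japan Science and Technology Corporation, Tokyo, Jul.
3. Nishikawa, K.: Seminar at the Department of Bioinformatics, College of Science and Engineering, Ritsumeikan University, Nov.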