I. CENTER FOR INFORMATION BIOLOGY AND DNA DATA BANK OF JAPAN
I-d. Laboratory for Research and Development of
Biological Databases - Hideaki Sugawara Group
RESEARCH ACTIVITIES
(1)
Information systems for molecular biology and its
related disciplines
1) From Web services to
a Bioportal
Yasumasa Shigemoto†,
Haruka Sakai, Takashi Abe, Satoru Miyazaki††
and Hideaki Sugawara (†Hitachi
soft, ††Tokyo
Univ. of Sci.)
--The publicly
available bioinformatics resources, comprising
databases and analytical tools, have expanded in
recent years. While the information environment for
life sciences has gradually become richer,
it is still difficult to combine multiple,
heterogeneous bioinformatics resources for a
specific research purpose. To set up and run an
integrated system, it is often necessary to write
and update custom programs. In addition, different
research groups continually write programs that
have overlapping functions. We need an information
environment that is conducive to efficient and
appropriate bioinformatics resource utilization for
a wide range of users. Therefore, the Center for
Information Biology and DNA Data Bank of Japan, in
alliance with the National Institute of Informatics
(NII) and the Mitsubishi Research Institute, Inc.
(MRI), started a three-year project in 2003,
“Research and Development of the New Generation of
Bio-portal", to enhance the information environment
for the relevant user communities. In this project,
the Laboratory for Research and Development of
Biological Databases is responsible for the
development of biological Web services. The project
site opened at http://www.bioportal.jp/ in 2004. In
addition to the biological Web services, the site
provides a Web page, named “Genome Menu", of links
to sites that offer complete genome sequences and
their annotation.
2) Expansion of Genome
Information Broker (GIB)
Masaki Hirahata, Naoto Tanaka, Takashi Abe,
Satoru Miyazaki† and Hideaki Sugawara (†Tokyo
Univ. of Sci.)
--GIB was originally created for the retrieval and
analysis of E. coli genomic information as a set.
We have added microbial genome data to GIB whenever
genome sequencing was completed and the data were
made open to the public. At the GIB Web page
(http://gib.genes.nig.ac.jp/),
keyword search, homology search, links to DBGET,
KEGG and GTOP, and visualization of the data are
available for more than 200 strains as of December
2004. We have utilized XML, CORBA and a distributed
database in order to cope with the explosion of
microbial genome information.
(2) Information systems on microbes (ref. 1)
1) WFCC-MIRCEN World
Data Centre for Microorganisms
(WDCM)
Yasumasa Shigemoto†,
Junko Nagaya and Hideaki Sugawara (†Fujitsu)
--WFCC and MIRCEN stand for World Federation for
Culture Collections and Microbial Resource Centers
Network, respectively. The laboratory hosts WDCM
and maintains the World Directory of microbial
resource centers. The on-line World Directory
contains detailed information on 469 centers in
65 countries, together with lists of their
holdings. Any culture collection can register,
update and delete its information at http://www.wdcm.org/.
With funds from the American Society for
Microbiology and UNESCO, WDCM was able to promote
the updating of the data by culture collections.
2) Development of an
e-Workbench for Biological Classification and
Identification (InforBIO)
Naoto Tanaka, Kouji Koorikawa†,
Takashi Abe, Satoru Miyazaki††
and Hideaki Sugawara (†Hitachi
soft, ††Tokyo
Univ. of Sci.)
--We continued the development of an e-Workbench
named InforBIO using Java, XML and a relational
database management system in the public domain.
We have distributed InforBIO to several
laboratories that study microbes and have improved
its utility and robustness based on their feedback
(http://lilium.genes.nig.ac.jp/index_e.html).
3) An information
system for pathogenic microorganisms
Masaki Hirahata, Naoto Tanaka, Yasumasa
Shigemoto† and Hideaki Sugawara (†Fujitsu)
--We participated
in a national project for the resource center of
pathogenic microorganisms. Our role is to develop
an information system for pathogenic fungi and
actinomycetes, and also a portal site for
pathogenic microorganisms in general (http://www.wdcm.org/byogen/).
(*) The information system on pathogenic
microorganisms has been supported by Special
Coordination Funds for Promoting Science and
Technology.
(3) Applications of IT to the International
Nucleotide Sequence Database (ref. 2)
1) Development of Open
Annotation System
Satoru Miyazaki†,
Takashi Abe and Hideaki Sugawara (†Tokyo
Univ. of Sci.)
--A number of complete genome sequences have been
submitted to INSD since 1995. The annotation
information, however, is not consistent among
genome sequencing teams. In addition, researchers
outside a team might have more information and
knowledge on some genes and biological molecules.
Therefore, it is important to develop a system that
allows any expert to evaluate the annotation given
by the team and to attach more valuable
information. As a new feature of INSD, we are
developing the “Open Annotation System (OASYS)" as
an annotation editor for the distributed
environment of the Internet.
(*) The OASYS project has been supported by
BIRD of Japan Science and Technology Corporation
(JST).
2) Exhaustive evaluation
of microbial genome information by use of
GRID
Takehide Kosuge, Toshihisa Okido, Yasumasa
Shigemoto†,
Masaki Hirahata, Naoto Tanaka, Yuzuru Maruyama,
Takashi Abe, Satoru Miyazaki††
and Hideaki Sugawara (†Fujitsu,
††Tokyo
Univ. of Sci.)
--The tsunami of biological data and the multiple
views of data analysis require an expandable and
flexible information environment. GRID computing is
expected to be the solution. We prepared a
computational environment composed of 5 sites in
OBIGrid and succeeded in analyzing horizontal gene
transfer and clusters of ORFs in more than 100
microbial genomes that were stored in the Genome
Information Broker as of May 2003. This scheme is
being applied to more than 300 thousand ORFs from
the genomic sequences of 124 microbial species. In
2004, we evaluated the results of the analysis and
developed a site to disseminate them to the public.
We also applied the workflow to all the microbial
genome sequences that were publicly available by
September 2004.
(4)
Genomics
1) Development of the
H-Invitational Database
Yasumasa Shigemoto†,
Satoru Miyazaki††
and Hideaki Sugawara (†Fujitsu,
††Tokyo
Univ. of Sci.)
--We performed an
exhaustive integrative characterization of 41,118
full-length cDNAs that capture the gene transcripts
as complete functional cassettes, providing an
unequivocal report of structural and functional
diversity at the gene level. Our international
collaboration has validated 21,037 human gene
candidates by analysis of high-quality full-length
cDNA clones through curation using unified
criteria. We have developed a human gene database,
called the H-Invitational Database (H-InvDB;
http://www.h-invitational.jp/). The H-InvDB
platform represents a substantial contribution to
resources needed for the exploration of human
biology and pathology.
2) Splicing Profile
Based Protein Categorization between Human and
Mouse Genomes
Åke Västermark†, Yasumasa Shigemoto††,
Takashi Abe and Hideaki Sugawara (†Univ.
of Oxford, ††Fujitsu)
--We compared the gene structures of human and
mouse to explore the relationship between gene
function and exon-intron structure. The central
question is whether protein function is more
strongly correlated with splicing profile than with
sequence similarity. To approach this question, we
devised a splicing profile similarity (SPS) index,
which measures relative exon length discrepancy.
Arbitrarily chosen human proteins were compared, in
terms of SPS and amino acid sequence similarity, to
1) their mouse orthologues and 2) their human
paralogues, which epitomise functional equivalence
and non-equivalence, respectively, in order to
elucidate the global relationship between a)
biological function, b) splicing profile similarity
and c) sequence similarity. Protein function is
more strongly correlated with splicing profile
similarity than with sequence similarity, as
demonstrated by the fact that human-mouse
orthologues (HMOs) display significantly higher
splicing profile similarity than do human-human
paralogues (HHPs), despite the mutual sequence
similarity between these two categories. This
finding indicates that splicing profile-based
protein categorisation is biologically meaningful
(ref. 4).
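The report describes the SPS index only as a measure of relative exon length discrepancy and does not give its formula. The sketch below is therefore an illustration under stated assumptions, not the authors' definition: the function name `sps_index`, the position-by-position pairing of exons, and the rule that an unpaired exon counts as a full mismatch are all hypothetical choices.

```python
from itertools import zip_longest


def sps_index(exons_a, exons_b):
    """Illustrative splicing profile similarity in the range [0, 1].

    ASSUMPTION: this is NOT the formula from the report. Exons are paired
    by position, each pair contributes the relative length discrepancy
    |a - b| / max(a, b), and the index is 1 minus the mean discrepancy.
    An exon without a partner is treated as a complete mismatch.
    """
    if not exons_a or not exons_b:
        raise ValueError("each gene needs at least one exon length")
    if min(exons_a) <= 0 or min(exons_b) <= 0:
        raise ValueError("exon lengths must be positive")

    discrepancies = []
    # Pad the shorter profile with 0 so unpaired exons give a discrepancy of 1.
    for a, b in zip_longest(exons_a, exons_b, fillvalue=0):
        discrepancies.append(abs(a - b) / max(a, b))
    return 1.0 - sum(discrepancies) / len(discrepancies)


if __name__ == "__main__":
    # Hypothetical coding-exon lengths, for illustration only.
    human = [120, 96, 210, 84]
    mouse_orthologue = [120, 99, 210, 84]
    human_paralogue = [300, 45, 150]
    print(sps_index(human, mouse_orthologue))
    print(sps_index(human, human_paralogue))
```

Under this definition, identical exon-length profiles score 1.0 and the score falls as exon lengths or exon counts diverge, so it can be set beside an amino acid sequence similarity score for the same pair of proteins.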
3) Phylogenetic
Analyses of Environmental Samples on the Basis of
Self-Organizing Map (SOM)
Takashi Abe, Toshimichi Ikemura†
and Hideaki Sugawara (†SOKEN-DAI)
--The metagenomic approach, which is genome
analysis of a mixture of uncultured microorganisms,
has recently been developed to search for novel and
industrially useful genes and to study microbial
diversity in a wide variety of environments. We
previously modified the conventional SOM for genome
informatics to make the learning process and the
resulting map independent of the order of data
input (refs. 5, 6). In the present study, we
developed the SOM as a novel bioinformatics
strategy to capture and visualize the microbial
diversity and the relative abundance of
microorganisms within an environmental sample.
First, we constructed SOMs of tri- and
tetranucleotide frequencies in 1- and 5-kb sequence
fragments from prokaryotic genomes for which the
complete sequence is available. The sequences could
be classified primarily according to species and
into 11 major phylogenetic groups without any
information regarding the species. For example, 88%
of the 5-kb sequences were classified into the
correct phylogenetic group. Importantly, the
classification could be done without orthologous
sequence sets, and the SOM is therefore especially
useful for analyzing novel sequences from poorly
characterized species for industrial applications
and scientific studies. With the SOM method, all
non-rRNA sequences in the Database that were from
unidentified or uncultured bacteria and longer than
1 kb were classified into major phylogenetic groups
(ref. 7). The present method can also be developed
into a tool for surveying pathogenic microorganisms
in environmental and clinical samples that cannot
be cultured easily, and in sterilized samples.
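The input to such a SOM is one oligonucleotide frequency vector per sequence fragment. The following is a minimal sketch of that preprocessing step only, assuming non-overlapping fragments and overlapping k-mer windows; the function names `fragments` and `oligo_frequencies` are hypothetical, the report does not say how ambiguous bases or the two DNA strands were handled, and the SOM training itself is not shown.

```python
from itertools import product


def fragments(sequence, size=5000):
    """Split a sequence into non-overlapping fragments of exactly `size` bases.

    A trailing piece shorter than `size` is dropped (an assumption).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def oligo_frequencies(fragment, k=4):
    """Return the relative frequency of every k-mer over the alphabet ACGT.

    The vector has 4**k entries in lexicographic order (AAAA, AAAC, ...).
    Windows containing any other character (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = fragment.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    # Toy sequence for illustration; real input would be genome sequences.
    genome = "ACGTTGCAAGGCTTAACCGG" * 600  # 12,000 bases
    vectors = [oligo_frequencies(f, k=4) for f in fragments(genome, 5000)]
    print(len(vectors), len(vectors[0]))
```

Each fragment yields a 256-dimensional vector for tetranucleotides (64 for trinucleotides with `k=3`), which is the kind of fixed-length input a SOM can cluster without any orthologous sequence sets.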
PUBLICATIONS
Papers
1. Sugawara, H., Abe, T., Tanaka, N. and
Miyazaki, S. (2004). Encounter of microbiology with
the data science in the phase called post-genome
sequencing. Soil microorganisms. 58 (2),
57-67.
2. Miyazaki, S., Sugawara, H., Ikeo, K., Gojobori,
T. and Tateno, Y. (2004). DDBJ in the stream of
various biological data. Nucleic Acids Research.
32, D31-D34.
3. Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan,
C., Fukuchi, S., Koyanagi, K., Barrero, R., Tamura,
T., Yamaguchi-Kabata, Y., Tanino, M., Yura, K.,
Miyazaki, S., Ikeo, K., Homma, K., Kasprzyk, A.,
Nishikawa, T., Hirakawa, M., Thierry-Mieg, J.,
Thierry-Mieg, D., Ashurst, J., Jia, L., Nakao, M.,
Thomas, M., Mulder, N., Karavidopoulou, Y., Jin,
L., Kim, S., Yasuda, T., Lenhard, B., Eveno, E.,
Suzuki, Y., Yamasaki, C., Takeda, J., Gough, C.,
Hilton, P., Fujii, Y., Sakai, H., Tanaka, S., Amid,
C., Bellgard, M., Bonaldo, M. de F., Bono, H.,
Bromberg, S., Brookes, A., Bruford, E., Carninci,
P., Chelala, C., Couillault, C., De Souza, SJ.,
Debily, M., Devignes, M., Dubchak, I., Endo, T.,
Estreicher, A., Eyras, E., Fukami-Kobayashi, K.,
Gopinath, G., Graudens, E., Hahn, Y., Han, M.,
Han, Z., Hanada, K., Hanaoka, H., Harada, E.,
Hashimoto, K., Hinz, U., Hirai, M., Hishiki, T.,
Hopkinson, I., Imbeaud, S., Inoko, H., Kanapin, A.,
Kaneko, Y., Kasukawa, T., Kelso, J., Kersey, P.,
Kikuno, R., Kimura, K., Korn, B., Kuryshev, V.,
Makalowska, I., Makino, T., Mano, S.,
Mariage-Samson, R., Mashima, J., Matsuda, H.,
Mewes, H., Minoshima, S., Nagai, K., Nagasaki, H.,
Nagata, N., Nigam, R., Ogasawara, O., Ohara, O.,
Ohtsubo, M., Okada, N., Okido, T., Oota, S., Ota,
M., Ota, T., Otsuki, T., Piatier-Tonneau, D.,
Poustka, A., Ren, S., Saitou, N., Sakai, K.,
Sakamoto, S., Sakate, R., Schupp, I., Servant, F.,
Sherry, S., Shiba, R., Shimizu, N., Shimoyama, M.,
Simpson, AJ., Soares, B., Steward, C., Suwa, M.,
Suzuki, M., Takahashi, A., Tamiya, G., Tanaka, H.,
Taylor, T., Terwilliger, J., Unneberg, P.,
Veeramachaneni, V., Watanabe, S., Wilming, L.,
Yasuda, N., Yoo, H., Stodolsky, M., Makalowski, W.,
Go, M., Nakai, K., Takagi, T., Kanehisa, M.,
Sakaki, Y., Quackenbush, J., Okazaki, Y.,
Hayashizaki, Y., Hide, W., Chakraborty, R.,
Nishikawa, K., Sugawara, H., Tateno, Y., Chen, Z.,
Oishi, M., Tonellato, P., Apweiler, R., Okubo, K.,
Wagner, L., Wiemann, S., Strausberg, R., Isogai,
T., Auffray, C., Nomura, N., Gojobori, T. and
Sugano, S. (2004). Integrative annotation of 21,037
human genes validated by full-length cDNA clones.
PLoS Biol., 2 (6), e162.
4. Vastermark, A., Shigemoto, Y., Abe, T. and
Sugawara, H. (2004). Splicing Profile-based Protein
Categorization between Human and Mouse Genome by
use of DDBJ Web Services. Genome Informatics
15, 13-20.
5. Abe, T., Kanaya, S., Kinouchi, M. and Ikemura,
T. (2004). Genome Informatics for Unveiling Hidden
Genome Signatures. Proceedings of the Institute of
Statistical Mathematics 52, 207-215.
6. Abe, T., Kanaya, S., Kinouchi, M., Kosaka, Y.
and Ikemura, T. (2004). Novel bioinformatics for
unveiling hidden characteristics in genome
sequences and searching in silico for genetic
signal sequences. Proceedings of the 8th World
Multi-Conference on Systemics, Cybernetics and
Informatics.
7. Abe, T., Ikemura, T., Kanaya, S., Kinouchi, M.
and Sugawara, H. (2004). Novel genome informatics
for unveiling hidden signatures in genome
sequences: self-organizing map (SOM) of
oligonucleotide frequencies. Proceedings of
Information-Based Induction Sciences, 94-99.
Books
8. Sugawara, H. (2004). Tsunami of data:
Data resources and utilization. Microbial Genetic
Resources and Biodiscovery. Kurtboke, I. and
Swings, J. ed., (National Library of Australia),
40-56.
Databases
9. Japanese Bio-portal site (Jabion),
http://www.bioportal.jp/
10. Genome Information Broker, http://gib.genes.nig.ac.jp/
11. WFCC-MIRCEN World Data Centre for
Microorganisms (WDCM), http://www.wdcm.org/
12. The portal site for pathogenic microorganisms,
http://www.wdcm.org/byogen/
13. e-Workbench for Biological Classification and
Identification, http://lilium.genes.nig.ac.jp/index_e.html
14. H-Invitational Database, http://www.h-invitational.jp/
ORAL
PRESENTATIONS
1. Sugawara, H., Culture collections face
challenges and opportunities, International
Symposium Towards a New Era's Microbial Resource
Center, Beijing, February, 2004.
2. Miyazaki, S., Sugawara, H., Exhaustive analysis
of microbial genomes by Web services and GRID.
JST-BIR International Workshop “Integrated Databases
and DataGrid for Structural Biology and Molecular
Biology", Osaka, March, 2004.
3. Sugawara, H., Evolution of WFCC-MIRCEN World
Data Centre for Microorganisms (WDCM). ISBER US
Meeting 2004, New York City, May, 2004.
4. Sugawara, H., Gene Trek in Procaryote Space
powered by a GRID environment. Proceedings of the
First International Workshop on Life Science Grid,
LSGRID2004, Kanazawa city, May, 2004.
5. Sugawara, H., The Butterfly Effect. JSCC Award
Lecture, Tsukuba, October, 2004.
6. Sugawara, H., WFCC-MIRCEN World Data Centre for
Microorganisms (WDCM) meets Global Biodiversity
Information Facility (GBIF). 19th International
CODATA Conference The Information Society: New
Horizons for Science, Berlin, November, 2004.
7. Kosuge, T., Okido, T., Hirahata, M., Shigemoto,
Y., Miyazaki, S., Abe, T., Gojobori, T., Sugawara,
H., Development of a common protocol for the
prediction of microbial genes. Genome Informatics
Workshop, Yokohama, December, 2004.
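8. Sugawara, H., International cooperation and
information networks. 1st NITE Biotechnology
Workshop “Recent topics and future developments
surrounding microbial resource centers", Tokyo,
January, 2004 (in Japanese).
9. Abe, T., Sugawara, H., Ikemura, T.,
Bioinformatics for making use of the untapped
genome resources hidden in the environment.
International Bio EXPO, Tokyo, May, 2004 (in
Japanese).
10. Sugawara, H., Microorganisms and data science.
2004 Annual Meeting of the Japanese Society of Soil
Microbiology, Tsukuba, June, 2004 (in Japanese).
11. Abe, T., Ikemura, T., Nakagawa, S., Kozuki, T.,
Kinouchi, M., Kanaya, S., Sugawara, H., A novel
informatics method for phylogenetic estimation of
hard-to-culture microbial communities based on
environmental DNA sequences: the Self-Organizing
Map. 6th Annual Meeting of the Society of
Evolutionary Studies, Japan, Tokyo, August, 2004
(in Japanese).
12. Kobayashi, S., Kawamoto, S., Mizuta, Y.,
Muljadi, H., Demiya, S. M., Iwama, H., Takezaki,
N., Itoh, T., Araki, J., Yoshinari, Y., Kitamoto,
A., Gojobori, T., Sugawara, H., Miyazaki, S.,
Takeda, H., Fujiyama, A., Development of a
new-generation bio-portal: toward the
popularization of genetics through Web services.
76th Annual Meeting of the Genetics Society of
Japan, Suita, September, 2004 (in Japanese).
13. Abe, T., Sugawara, H., Kanaya, S., Kinouchi,
M., Nakagawa, S., Kozuki, T., Ikemura, T.,
Phylogenetic classification of genomic DNA fragment
sequences from hard-to-culture environmental
microorganisms using the Self-Organizing Map (SOM).
76th Annual Meeting of the Genetics Society of
Japan, Suita, September, 2004 (in Japanese).
14. Tanaka, N., Kosuge, T., Okido, T., Hirahata,
M., Shigemoto, Y., Miyazaki, S., Abe, T., Sugawara,
H., Unified re-evaluation of microbial ORFs
registered in the International Nucleotide Sequence
Database. Japan Society for Microbial Systematics,
Ito, November, 2004 (in Japanese).
15. Mizushima, H., Sugawara, H., Kano, T.,
Oroguchi, T., Standardization of SNPs data exchange
toward the interoperability of biological
databases. IPAB Symposium 2004, Tokyo, December,
2004 (in Japanese).
16. Abe, T., Sugawara, H., Kinouchi, Kanaya,
Ikemura, T., Novel genome informatics for searching
for unknown signal sequences hidden in genomes.
27th Annual Meeting of the Molecular Biology
Society of Japan, Kobe, December, 2004 (in
Japanese).
17. Sugawara, H., Life science viewed from
sequences: are features other than sequences
useful? 27th Annual Meeting of the Molecular
Biology Society of Japan, Kobe, December, 2004 (in
Japanese).
18. Sugawara, H., The bioinformation environment
accelerated by Web services. Annual Meeting of the
Biophysical Society of Japan, Kyoto, December, 2004
(in Japanese).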
EDUCATION
1. Dr. H. Sugawara was invited to give a lecture
on “Databases are the key to bioinformatics" at
the 2nd Open Symposium of Joint Research with
Wakayama Pref., Wakayama, 2004 (in Japanese).
2. Dr. H. Sugawara was invited to give a lecture on
“Invitation to information biology" at Campus
system Research Group of Private Universities,
Hamanako, August 2004 (in Japanese).
3. Dr. T. Abe was invited to give a lecture on
“Genome analysis by PC-cluster." at Working group
of scientific system, Tokyo, August, 2004 (in
Japanese).
ACADEMIC SOCIETY ACTIVITIES
1. Dr. H. Sugawara organized the 2nd
International Conference on Biodata
Interoperability, Tokyo, June, 2004.
2. Dr. H. Sugawara organized International program
committee of the International Congress for Culture
Collections, Tsukuba, October, 2004.
3. Dr. H. Sugawara organized Program committee of
Genome Informatics 2004, Yokohama, December,
2004.
4. Task Force of Biological Resource Centers, OECD
Working Party for Biotechnology (Vice-chair).
5. World Federation for Culture Collections,
Executive board member and journal editor.
6. Japanese Society for Extremophiles (Council
member).
7. Japan Society of Information and Knowledge
(Director).