G. CENTER FOR GENETIC RESOURCE INFORMATION
G-a. Genetic Informatics Laboratory - Yukiko Yamazaki Group

RESEARCH ACTIVITIES

1 SHIGEN project
1.1 E. coli Databases - E. coli strain DB and PEC (Profiling of Escherichia coli Chromosome) database

Takehiro Yamakawa, Junichi Kato, Akiko Nishimura and Yukiko Yamazaki

--As a result of the National BioResource Project, many new E. coli resources have been accumulated. These resources include mutant strains established through the research activities of individual researchers, genomic DNA plasmids such as the cosmid collection and pLC plasmids, and deletion clones for each ORF (mobile plasmid clones and deletion mutants). The E. coli strain database was rebuilt to compile these resources and to allow users to request them over the internet directly from the list page after searching. Gene information for each ORF in the strain database is linked to GenoBase (E. coli W3110 strain genome database) and to PEC (Profiling of the E. coli MG1655 chromosome). Since newly developed large-scale chromosomal deletion mutants of E. coli that lack 2.4% to 29.7% of the parental chromosome (Ref. 5) are now available, an "Essential Genes and Minimal Genome" section was added to PEC, providing a minimal genome view and detailed information on each deletion mutant. The successful reduction of the E. coli genome size decreased the number of genes classified as "unknown". Gene essentiality information, together with the large resource collections, should be very useful for advancing E. coli research. PEC not only collects the latest public information on MG1655 but also gives researchers a platform for comparative genomics among microbial genomes.
--PEC database can be accessed through the SHIGEN server
(http://www.shigen.nig.ac.jp/ecoli/pec/).

1.2 CARD R-DB and IMSR (International Mouse Strain Resources)

Takehiro Yamakawa, Hideki Kato, Naomi Nakagata, Kenichi Yamamura and Yukiko Yamazaki

--CARD R-BASE is a database of transgenic mouse strains established and deposited by individual researchers. All resources are stored as frozen embryos and are available on request. For each strain, the database contains the genetic background, the genes disrupted or introduced, and the relevant human diseases. This year, we introduced new links from CARD R-BASE to the gene detail pages of Mouse Genome Informatics (MGI). CARD R-BASE has been a member of the International Mouse Strain Resources (IMSR) since a new version was implemented in July 2004. IMSR is a searchable online database of mouse strains and stocks available worldwide and is maintained by The Jackson Laboratory. Before the CARD R-BASE data set could be merged into IMSR, the nomenclature of strain types, mutant types and gene/allele names had to be annotated manually. To encourage researchers who deposit mouse strains to become familiar with the controlled vocabulary for strain names, a new tool named "GODFATHER" was developed. GODFATHER is a simple naming tool with a web interface, divided into three levels according to the user's level of knowledge. By filling in the necessary information, such as the background mouse name, gene name and laboratory code, the user obtains a candidate name for the mouse strain. An English version of GODFATHER is also planned. CARD R-BASE has also established bilateral links with the Exchangeable Gene Trap Clones (EGTC) database, which provides trapped gene sequence information and homology search results.
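The kind of name composition that GODFATHER performs can be illustrated with a minimal sketch. The function name, the argument names and the output pattern (background, then "Tg(gene)", a line number and the laboratory code) are assumptions made for illustration, loosely following the common pattern for transgene names. They are not taken from the actual tool, whose three user levels are not modeled here.

```python
def candidate_strain_name(background: str, gene: str, lab_code: str,
                          line_number: int = 1) -> str:
    """Compose a candidate transgenic strain name (hypothetical, simplified pattern)."""
    fields = {"background": background, "gene": gene, "lab_code": lab_code}
    cleaned = {}
    for label, value in fields.items():
        value = value.strip()
        if not value:
            # Every field is required to build a name.
            raise ValueError(f"{label} must not be empty")
        cleaned[label] = value
    if line_number < 1:
        raise ValueError("line_number must be 1 or greater")
    # Assumed pattern: <background>-Tg(<gene>)<line number><lab code>
    return (f"{cleaned['background']}-Tg({cleaned['gene']})"
            f"{line_number}{cleaned['lab_code']}")


if __name__ == "__main__":
    # Illustrative inputs only; not an actual CARD R-BASE record.
    print(candidate_strain_name("C57BL/6J", "Alb1", "Card"))
```

A real naming tool would additionally validate each field against controlled vocabularies (registered laboratory codes, official gene symbols), which this sketch leaves out.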
--The International Federation of Mouse Resource (IFMR), established in 2004, is a collaborating group of mouse repositories and resource centers worldwide whose collective goal is to archive and provide strains of mice as cryopreserved embryos and gametes, ES cell lines, and live breeding stock to the research community. CARD R-BASE and RIKEN BRC (BioResource Center) are members of IFMR. The goals of the IFMR are (1) to coordinate repositories and resource centers, (2) to establish consistent, highest-quality animal health standards in all resource centers, (3) to provide genetic verification and quality control of genetic background and mutations, and (4) to provide resource training to enhance users' ability to utilize cryopreserved resources. The IMSR database would be a good starting point for an IFMR database.
--The CARD R-DB is accessible at
http://cardb.cc.kumamoto-u.ac.jp/transgenic/.

1.3 Mouse polymorphism database

Takehiro Yamakawa, Eri Kibukawa, Toshihiko Shiroishi and Yukiko Yamazaki

--The Mouse Microsatellite Database of Japan (MMDBJ) provided SSLPs among different mouse strains and cSNPs between B6 and MSM mice based on their full-length cDNA sequence information. Since a large number of MSM BAC end sequences as well as B6/MSM gSNPs are now available, the name of the database was changed from MMDBJ to Mouse Polymorphism Database and all information was brought together in a genomic viewer. We adopted Ensembl, which is freely provided by EBI, as the genome browser, added our original data on MSM BAC ends, MSM/B6 cSNPs and microsatellite polymorphisms as DAS (Distributed Annotation System) sources, and then customized the browser. Although keeping the Ensembl data up to date is not a simple task, adding original data to Ensembl as DAS sources was quicker to implement than developing an original genome browser from scratch. All data are now accessible through the internet. We have since started developing our own genome browser (shigen genomic browser) to overcome several functional limitations of Ensembl.
--The Mouse Polymorphism Database is available at
http://shigen.lab.nig.ac.jp/mouse/polymorphism/

1.4 Oryzabase and Plant Ontology

Takehiro Yamakawa, Nori Kurata and Yukiko Yamazaki

--Oryzabase is a comprehensive rice database that collects as much public knowledge as possible and provides it through a user-friendly interface to support rice research. As part of the National BioResource Project, a large number of new rice resources have been collected and are ready to be distributed to researchers through Oryzabase. We developed an online resource order system that allows users to send order e-mails to each resource center directly from the resource list page. The system selects the appropriate Material Transfer Agreement (MTA) form depending on which resource center maintains the requested resource, and automatically sends a reply e-mail to the sender with the form attached.
--Oryzabase has started to provide genomic sequence information using the shigen genomic browser.
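The MTA selection step described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the resource identifiers, center names, form file names and the ReplyMail structure are all hypothetical, and the actual order system is not described in enough detail here to reproduce it.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; the real centers and form files are not given in the text.
RESOURCE_TO_CENTER = {"RICE-0001": "center_a", "RICE-0002": "center_b"}
CENTER_TO_MTA = {"center_a": "mta_center_a.pdf", "center_b": "mta_center_b.pdf"}


@dataclass
class ReplyMail:
    recipient: str
    subject: str
    attachment: str


def build_reply(sender: str, resource_id: str) -> ReplyMail:
    """Pick the MTA form of the center holding the resource and address a reply to the sender."""
    try:
        center = RESOURCE_TO_CENTER[resource_id]
    except KeyError:
        raise ValueError(f"unknown resource: {resource_id}") from None
    return ReplyMail(recipient=sender,
                     subject=f"MTA form for {resource_id}",
                     attachment=CENTER_TO_MTA[center])


if __name__ == "__main__":
    print(build_reply("user@example.org", "RICE-0002"))
```

Keeping the center-to-form mapping in one table means that adding a resource center only requires a new table entry, not a change to the ordering logic.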
--The POC (Plant Ontology Consortium) aims to develop, curate and share controlled vocabularies that describe plant growth stages and structures (anatomy). Japanese rice researchers independently started to define organ-specific developmental stages and collected stage-specifically expressed genes and their mutants from journal articles. Oryzabase integrated these data into its existing gene and mutant collections. We have been developing a multiple-ontology platform in order to integrate several different ontologies and vocabularies consistently. A three-dimensional matrix consisting of "time" (development/growth), "space" (location/structure/anatomy), and "features" (characteristics) was conceived at an early stage of development. Recently the zebrafish group (ZFIN) started to establish PATO (Phenotype And Trait Ontology) to describe mutant phenotypes. PATO describes phenotypes as combinations of terms and their attributes, and it is a more general concept than our "features", which means that most species can share it. Therefore, Oryzabase has started to apply PATO and is replacing "features" with PATO-type expressions. Because all available information was collected as fully and as quickly as possible, the current Oryzabase has many inconsistencies among genes, mutants and phenotypes. Our next task will be to resolve these problems.
--Oryzabase is available at
http://www.shigen.nig.ac.jp/rice/oryzabase/.
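As a rough illustration of the "time", "space" and "features" axes, with the feature expressed as a PATO-style attribute and value pair, one annotation could be modeled as follows. The class, the field names and the example terms are hypothetical and are not taken from Oryzabase or from the PATO vocabulary itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PhenotypeAnnotation:
    mutant: str
    stage: str      # "time": developmental or growth stage
    structure: str  # "space": organ or anatomical location
    attribute: str  # PATO-style attribute, e.g. "length"
    value: str      # PATO-style value, e.g. "short"


def find(annotations: List[PhenotypeAnnotation],
         stage: Optional[str] = None,
         structure: Optional[str] = None) -> List[PhenotypeAnnotation]:
    """Select annotations along the time and/or space axis; None means 'any'."""
    return [a for a in annotations
            if (stage is None or a.stage == stage)
            and (structure is None or a.structure == structure)]


# Illustrative records only; not actual database entries.
EXAMPLES = [
    PhenotypeAnnotation("mutant-1", "seedling", "leaf", "color", "pale green"),
    PhenotypeAnnotation("mutant-2", "heading", "panicle", "length", "short"),
    PhenotypeAnnotation("mutant-3", "seedling", "root", "length", "short"),
]

if __name__ == "__main__":
    for hit in find(EXAMPLES, stage="seedling"):
        print(hit)
```

Separating attribute from value is what makes the description shareable across species: the stage and structure terms differ between organisms, while pairs such as "length" and "short" do not.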

1.5 KOMUGI

Takehiro Yamakawa, Takashi Endo, Yasunari Ogihara, Hitoshi Tsujimoto, Taihachi Kawahara, Tetsuro Sasakuma and Yukiko Yamazaki

--KOMUGI is a wheat resource database and is unique worldwide as a wheat-specific database. The bioresources in KOMUGI include wild strains; landraces; experimental strains such as chromosome lines, cytoplasm substitution lines, and mutants; and DNA resources such as EST clones and arrays. KOMUGI is also a member of the National BioResource Project, and the e-mail resource order system has been implemented. Array data were newly incorporated into the KOMUGI database and are used internally at the moment. The task of adding the most recent information to the KOMUGI gene dictionary remains, although the "MacGene" system was developed for this purpose last year. Since MacGene is a stand-alone system rather than a web-based platform, dictionary keepers working individually at different locations still have difficulty sharing common reference numbers. Cross-linking sequence accessions and KOMUGI genes is another task to be carried out next.
--KOMUGI database is available at
http://www.shigen.nig.ac.jp/wheat/komugi/.

1.6 NBRP-databases

--As the information center of the National BioResource Project, we intensively support the construction of resource databases at each resource center, including those for the following species, so that all resource centers can make their resources public.

1.6.1 NBRP-C. elegans

Miharu Ikizawa, Shohei Mitani, Hiroshi Kagoshima, and Yukiko Yamazaki

--NBRP-C. elegans provides deletion mutants on request after isolating mutants of the targeted gene from a mutant pool. Users can check the status of mutant screening on the web page, and the resource curators also manage the screening process through a web interface. The system also provides the results of statistical analysis. The database has bilateral cross-links to WormBase, the internationally known C. elegans database. Promoter information is also part of this database and is maintained by several volunteers. A common search site for the resource database and the promoter database is now available.

1.6.2 NBRP-Silkworm

Miharu Ikizawa, Yutaka Banno, Hiroshi Fujii, and Yukiko Yamazaki

--The NBRP-Silkworm database consists of three parts: (1) gene dictionary, (2) references, and (3) strain phenotypes. Since the traditional classification of phenotypes and resources was not consistent enough for constructing an electronic database, intensive manual annotation was performed. As there is no other silkworm gene database in the world, this database is the first silkworm gene dictionary and will be very useful for scientific research.
--We are planning to incorporate genomic information when it is opened to the public.

1.6.3 NBRP-Legume (Lotus and Glycine) database

Takehiro Yamakawa, Shoko Isobe, Masatsugu Hashiguchi, Ryo Akashi, Satoshi Tabata, and Yukiko Yamazaki

--NBRP-LegumeBase is a resource ordering site shared by the Lotus japonicus resource database and the Glycine max resource database. The Lotus japonicus database provides wild accessions, root cultures and Recombinant Inbred Lines (RILs) as resources. Some strains were phenotypically characterized when grown in northern and southern areas. Although the Glycine max database currently contains only the wild species Glycine soja, RILs and DNA resources such as EST and full-length cDNA clones are expected to be added. Legumes follow rice (the monocot model plant) and Arabidopsis (the dicot model plant) as the third model plant.

1.6.4 NBRP-Medaka database

Takehiro Yamakawa, Yuko Wakamatsu, and Yukiko Yamazaki

--The NBRP Medaka web server was moved from Nagoya University to the National Institute of Genetics this year. Medaka genomic information has been incorporated in constructing the Medaka resource database.
--The Medaka site is also preparing "Medaka Book", a collection of web-based protocols, in collaboration with several Medaka researchers using PukiWiki software. The Medaka web atlas is another attractive feature, and its brain section is now under construction. An interactive interface and a glossary will be integrated into the atlas images and illustrations.
--The Medaka group has adopted the PATO ontology established by the zebrafish group to describe mutant phenotypes, so the database will be designed to handle PATO data.

1.6.5 NBRP-Drosophila database

Takehiro Yamakawa, Masatoshi Tomaru, Masatoshi Yamamoto, Ryu Ueda, and Yukiko Yamazaki

--NBRP Drosophila consists of four organizations, each with its own database. To make these databases more useful without changing the current systems, we developed a one-stop-shop site, "Flystock", from which users can search the resources of all four databases at once and place orders directly from the resource list page.
--We are planning to develop a maintenance system to manage each database and to make connections between Flystock and each database.
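The one-stop search offered by Flystock can be pictured as a single query fanned out to every member database, with the results merged and labeled by source. The sketch below assumes each database can be represented as an in-memory mapping from stock ID to description; the database names, stock IDs and descriptions are invented for illustration, and the real site presumably queries the separate systems over the network.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the four member databases.
EXAMPLE_DATABASES: Dict[str, Dict[str, str]] = {
    "db_a": {"A-001": "white eye mutant", "A-002": "wild type"},
    "db_b": {"B-010": "curly wing balancer"},
    "db_c": {"C-100": "White eye, RNAi line"},
    "db_d": {"D-500": "wild-caught strain"},
}


def search_all(databases: Dict[str, Dict[str, str]],
               keyword: str) -> List[Tuple[str, str, str]]:
    """Search every database for the keyword and return (database, stock ID, description)."""
    needle = keyword.lower()
    hits = []
    for db_name, records in databases.items():
        for stock_id, description in records.items():
            if needle in description.lower():  # case-insensitive substring match
                hits.append((db_name, stock_id, description))
    return hits


if __name__ == "__main__":
    for hit in search_all(EXAMPLE_DATABASES, "white"):
        print(hit)
```

Because each hit carries the name of its source database, an order placed from the merged list can still be routed to the organization that holds the stock.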

PUBLICATION

Papers
1. Mochida, K., Yamazaki, Y. and Ogihara, Y. (2004). Discrimination of homologous gene expression in hexaploid wheat by SNP analysis of contigs grouped from a large number of expressed sequence tags. Mol. Gen. Genomics. 270, 371-377.
2. Sakai, T., Miura, I., Yamada-Ishibashi, S., Akita, Y., Kohara, Y., Yamazaki, Y., Inoue, T., Kominami, R., Moriwaki, K., Shiroishi, T., Yonekawa, H. and Kikkawa, Y. (2004). Update of Mouse Microsatellite Database of Japan (MMDBJ). Exp. Anim. 53 (2), 151-154.
3. Kawai, Y., Ishii, Y., Arakawa, K., Uemura, K., Saitoh, B., Nishimura, J., Kitazawa, H., Yamazaki, Y., Tateno, Y., Itoh, T. and Saito, T. (2004). Structural and Functional Differences in Two Cyclic Bacteriocins with the Same Sequences Produced by Lactobacilli, Applied and Environmental Microbiology, 70 (5), 2906-2911.
4. Yamazaki, Y., Nagato, Y. and Kurata, N. (2004). Oryzabase (Integrated Rice Database) in 2004, Rice Genetics Newsletter, 20, 9-10.
5. Hashimoto, M., Ichimura, T., Mizoguchi, H., Tanaka, K., Fujimitsu, K., Keyamura, K., Ote, T., Yamakawa, T., Yamazaki, Y., Mori, H., Katayama, T. and Kato, J. (2005). Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome. Mol. Microbiol. 55 (1), 137-149.
6. Moriguchi, K., Suzuki, T., Ito, Y., Yamazaki, Y., Niwa, Y. and Kurata, N. (2005). Functional Isolation of Novel Nuclear Proteins Showing a Variety of Subnuclear Localizations. The Plant Cell 17, 389-403.

Reviews/Books
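7. Ara, T. and Yamazaki, Y. (2004). E. coli genome databases. In: New Developments in Genomics and Proteomics (ゲノミクス・プロテオミクスの新展開), エヌ・ティ・エヌ, 15-21. (in Japanese)
8. Yamazaki, Y. (2004). BioResource centers. Tanpakushitsu Kakusan Koso (Protein, Nucleic Acid and Enzyme) 49 (11), 1956-1963. (in Japanese)
9. Yamazaki, Y. (2004). Generation and maintenance of model animals. Supplementary volume, 1-17. (in Japanese)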

Database
10. JMSR http://www.shigen.nig.ac.jp/mouse/jmsr/
11. Mouse Polymorphism DB http://www.shigen.nig.ac.jp/mouse/mmdbj/
12. CARD R-BASE http://cardb.cc.kumamoto-u.ac.jp/transgenic/
13. FlyStock http://218.44.182.89/%7Eflystock/html/indexU2886j.html
14. NigFly http://www.shigen.nig.ac.jp/fly/nigfly/
15. Oryzabase http://www.shigen.nig.ac.jp/rice/oryzabase/
16. KOMUGI http://www.shigen.nig.ac.jp/wheat/komugi/
17. Barley Germplasm Database http://www.shigen.nig.ac.jp/barley/
18. PEC http://www.shigen.nig.ac.jp/ecoli/pec/
19. E.coli Strain Database http://www.shigen.nig.ac.jp/ecoli/strain/
20. GRW http://www.shigen.nig.ac.jp/shigen/grw/
21. WGR http://www.shigen.nig.ac.jp/shigen/wgr/
22. NBRP-Chrysanthemum Database http://www.shigen.nig.ac.jp/kiku/
23. NBRP-Algae DB http://www.shigen.nig.ac.jp/algae/
24. NBRP-Silkworm http://shigen.lab.nig.ac.jp/silkwormbase/
25. NBRP-Legume http://shigen.lab.nig.ac.jp/legume/legumebase/
26. NBRP-C.elegans http://www.shigen.nig.ac.jp/cele/
27. NBRP http://www.nbrp.jp/

ORAL PRESENTATIONS

1. Sato, K., Yamazaki, Y. and Takeda, K. Comparative sequence analysis of barley ESTs and rice genome. Plant and Animal Genome XII, San Diego, January, 2004.
2. Yamazaki, Y. Integrating a new platform for rice ontology in Oryzabase (Integrated Rice Science Database). The 12nd Rice Genome Workshop 2004, Tsukuba, February, 2004.
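3. Yamazaki, Y. Ontology efforts in the SHIGEN project. CARD Symposium, Kumamoto, February, 2004. (in Japanese)
4. Yamazaki, Y. Current status and future of "biological resource databases" in Japan. Education Seminar Forum, Tokyo, March, 2004. (in Japanese)
5. Yamazaki, Y. SHIGEN (SHared Information of GENetic resources) Project in Japan. Tomato: a New Model Plant in the Genomics Era, Kazusa, March, 2004.
6. Yamazaki, Y. Toward the realization of a network for biological genetic resource information. NBRP Symposium, Kobe, December, 2004. (in Japanese)
7. Yamazaki, Y. Oryzabase (Integrated Rice Science Database). Rice Annotation Project Meeting, Tsukuba, December, 2004.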

POSTER PRESENTATIONS

1. Yamazaki, Y. KOMUGI-Integrated Wheat Science Database-, Plant and Animal Genome XII, San Diego, January, 2004.
2. Yamakawa, T. Oryzabase. Plant and Animal Genome XII, San Diego, January, 2004.
Sato, K. Plant and Animal Genome XII, San Diego, January, 2004.
3. Mochida, K., Kawaura, K., Nemoto, Y., Murai, K., Yamazaki, Y., Shin-i, T., Kohara, Y. and Ogihara, Y. Global characterization of gene expression patterns of stress-treated tissues in common wheat by large-scale analysis of expressed sequence tags. Plant and Animal Genome XII, San Diego, January, 2004.
4. Kawaura, K., Mochida, K., Yamazaki, Y. and Ogihara, Y. Wheat genome science XXII. Construction of a 22k oligo microarray for bread wheat and comprehensive analysis of genes expressed in response to salt stress. The 27th Annual Meeting of the Molecular Biology Society of Japan, Kobe, December, 2004. (in Japanese)

EDUCATION

Concurrent Associate Professor, Kyoto University
Education Seminar Forum, March
Visiting Professor, Kumamoto University

SOCIAL CONTRIBUTIONS AND OTHERS

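Member, Liaison Committee for Genetic Resources Research, Science Council of Japan
Steering committee member: Lotus japonicus, ES cells, Medaka
Evaluation committee member: BRC, Institute of Biological Resources (生物資源研究所)
Review committee member: RIKEN BRC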