IPSJ-TS     Information Processing Society of Japan    Trial Standard

IPSJ-TS 0005:2002

Basic Subset of Coded Character Sets (BUCS)

Publication of Version 1.0E (this version) 2002-03-29
Publication of Version 1.xE --
Publication of Version 1.yE --

Errata --
Please send comments about this document to the TS desk, IPSJ/ITSCJ.

Copyright ©2002 IPSJ/ITSCJ, All Rights Reserved.

The normative version of this specification is the Japanese version, available on the ITSCJ website.

Table of contents

1. Scope
2. Normative References
3. Definitions
4. Criterion
5. Listing of BUCS
6. Basic Subset
Annex 1. Bibliography


This document has been reviewed and endorsed by the IPSJ/ITSCJ technical committee. The authors of this document are the members of Working Group Five of the IPSJ/ITSCJ TS Committee. They developed this document on the basis of the Basic Unified Character Set activity (http://www.u-gakugei.ac.jp/~chinese) undertaken at Tokyo Gakugei University.

1. Scope

This document specifies the Basic Subset of Coded Character Sets (BUCS), which consists of the Kanji characters with a high degree of functional importance. The characters have been extracted from national and regional standards and specifications of coded character sets in accordance with the criteria given in section 4, and are listed with their representative character shapes. BUCS is essentially a sub-repertoire of ISO/IEC 10646-1:2000.

2. Normative References

The following document contains provisions which, through reference in this text, constitute provisions of this Trial Standard.

ISO/IEC 10646-1:2000 Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane

3. Definitions

3.1 degree of functional importance

The priority of a character, determined by its frequency and applicability in the language description environment.

3.2 representative character shape, alternative character shape

A representative character shape is the shape with which a character is most typically rendered. Other shapes of the character are referred to as alternative character shapes. The representative character shape sometimes depends on the country or region where the character is actually used.

4. Criterion

The Kanji characters of BUCS are defined in accordance with the following criteria and procedures:

a) The 20,902 Kanji characters included in ISO/IEC 10646-1:1993 are candidates. The Kanji characters in ISO/IEC 10646-1:2000 may also be candidates.

b) The Kanji characters used in Japan are extracted from the 6,355 Kanji of JIS X 0208:1997, taking into account their degrees of functional importance as shown in references a) through h) of Annex 1. Bibliography. Apparently erroneous characters, e.g., 妛 and 袮, are removed. Incorrect shapes are replaced with the corresponding correct shapes (if any) in JIS X 0221-1:2001.

c) The Kanji characters used in countries other than Japan are extracted from the characters of ISO/IEC 10646-1:2000, with reference to h) of Annex 1. Bibliography.
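
Criterion a) can be pictured as a code point range check. The following Python sketch assumes that the 20,902 candidates correspond to the range U+4E00 through U+9FA5, and that the additional candidates of ISO/IEC 10646-1:2000 correspond to CJK Unified Ideographs Extension A, U+3400 through U+4DB5. Neither range is stated in this document, and the names URO_1993, EXT_A_2000 and is_candidate are hypothetical.

```python
# Assumed code point ranges; this document does not state them explicitly.
URO_1993 = range(0x4E00, 0x9FA5 + 1)    # assumed home of the 20,902 candidates
EXT_A_2000 = range(0x3400, 0x4DB5 + 1)  # assumed additional candidates (Extension A)


def is_candidate(ch: str) -> bool:
    """Return True if ch lies in one of the assumed candidate ranges."""
    cp = ord(ch)
    return cp in URO_1993 or cp in EXT_A_2000


if __name__ == "__main__":
    print(len(URO_1993))       # size of the first assumed range
    print(is_candidate("漢"))   # an ideograph inside the first range
    print(is_candidate("A"))   # a Latin letter, outside both ranges
```

The check only identifies candidates. Criteria b) and c) then select from them by functional importance, which a range test alone cannot express.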

5. Listing of BUCS

BUCS contains 7,946 representative character shapes and 5,085 alternative character shapes. Each shape is identified by its corresponding UCS code position.

For the representative character shapes of traditional characters, the shapes of 康煕字典 (Kangxi Dictionary) are used. For the representative character shapes of other characters, the country- or region-specific shapes are used.

The representative character shapes are ordered according to 康煕字典 (Kangxi Dictionary) and assigned sequential numbers. For each representative character shape, its alternative character shapes, if any, are listed in the order China, Taiwan, Japan, Korea.

These alternative character shapes must have a variant relationship with each other. The relationship should be recognized by one or more of the references in Annex 1. Bibliography.
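
The listing structure described in this section can be sketched as a small data model. The Python below is illustrative only: the class and field names are hypothetical, the source letters follow Table 6.2, and placing "U" (specific region) after Korea is an assumption, since the text gives an order only for China, Taiwan, Japan and Korea.

```python
from dataclasses import dataclass, field

# Listing order of alternative shapes: China, Taiwan, Japan, Korea.
# Putting "U" (specific region) last is an assumption of this sketch.
SOURCE_ORDER = {"C": 0, "T": 1, "J": 2, "K": 3, "U": 4}


@dataclass
class Alternative:
    code_point: int  # UCS code position of the alternative shape
    source: str      # source code letter, one of the SOURCE_ORDER keys


@dataclass
class Entry:
    seq: int         # sequential number of the representative shape
    code_point: int  # UCS code position of the representative shape
    alternatives: list = field(default_factory=list)

    def ordered_alternatives(self):
        """Return the alternatives in China, Taiwan, Japan, Korea order."""
        return sorted(self.alternatives, key=lambda a: SOURCE_ORDER[a.source])


if __name__ == "__main__":
    # Made-up entry: the sequential number and source letters are not
    # taken from the actual BUCS tables.
    entry = Entry(seq=0, code_point=0x570B,
                  alternatives=[Alternative(0x56FD, "J"), Alternative(0x56FD, "C")])
    print([a.source for a in entry.ordered_alternatives()])
```

Keeping the source order in one mapping makes the listing rule explicit and easy to adjust if the position assumed for "U" turns out to differ.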

6. Basic Subset

The basic subset is shown in Table 6.1.N (N=1, 2, 3, 4, 5 or 6).

Table 6.1 The Basic Subset
Sequential numbers of             Table(P) [PDF form,        Table(H) [HTML form]
representative character shapes   with shape representation]
   0 through 1499                 Table 6.1.1(P)             Table 6.1.1(H)
1500 through 2999                 Table 6.1.2(P)             Table 6.1.2(H)
3000 through 4499                 Table 6.1.3(P)             Table 6.1.3(H)
4500 through 5999                 Table 6.1.4(P)             Table 6.1.4(H)
6000 through 7499                 Table 6.1.5(P)             Table 6.1.5(H)
7500 through 7945                 Table 6.1.6(P)             Table 6.1.6(H)
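
Because Table 6.1 splits the sequential numbers into blocks of 1500, the sub-table that holds a given representative character shape can be computed directly. The following Python sketch shows one way to do so. The function name sub_table_for is hypothetical, and the constants are read off Table 6.1.

```python
TABLE_SIZE = 1500  # sequential numbers per sub-table, per Table 6.1
LAST_SEQ = 7945    # highest sequential number, per Table 6.1


def sub_table_for(seq: int) -> str:
    """Return the label of the sub-table that contains sequential number seq."""
    if not 0 <= seq <= LAST_SEQ:
        raise ValueError(f"sequential number out of range: {seq}")
    # Integer division selects the block; sub-tables are numbered from 1.
    return f"Table 6.1.{seq // TABLE_SIZE + 1}"


if __name__ == "__main__":
    for seq in (0, 1499, 1500, 7945):
        print(seq, sub_table_for(seq))
```

The last sub-table is shorter than the others, which is why the upper bound is checked against the highest sequential number and not against a multiple of 1500.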

In Table 6.1.N, the following source codes are used, with the meanings given in Table 6.2.

Table 6.2 Source codes
Code Meaning
C character shape in China
J character shape in Japan
K character shape in Korea
T character shape in Taiwan
U character shape in a specific region

NOTE — Font Acknowledgement

The following fonts were used in the production of the PDF version of Table 6.1:
Arial : AGFA Monotype Corporation
Batang : HangYang System
MingLiU : DynaLab Inc.
MS-Mincho, MS P Gothic : RICOH COMPANY, LTD.
RgHeiseiMin-W3 : RYOBI Limited. (JG2)
SimSun : ZHONGYI Electronic Co.

NOTE — Notice to Readers

The fonts and font data used in the production of the tables may not be extracted or otherwise used in any commercial product without permission or license granted by the respective typeface owner(s).

Annex 1. Bibliography

NOTE — This Annex is not included in the normative text of the original TS 0005 (Japanese version).

a) 康煕字典 (同文書局版), 中華書局(北京), 1958
   Kangxi Dictionary (Tongwen shuju), Zhonghua Shuju(Beijing), 1958

b) 学研漢和大字典 (藤堂明保), 学習研究社, 1978
   Gakken Kanwa Daijiten (Todo Akiyasu), Gakushu kenkyusha, 1978

c) デイリーコンサイス漢字辞典 (佐竹秀雄, et al.), 三省堂, 1995
   Daily Concise Kanji Dictionary (Satake Hideo, et al.), Sanseido Publishing, 1995

d) 新華字典, 商務印書館(北京), 1980
   Xinhua Zidian, Commercial Press(Beijing), 1980

e) 大漢和辞典 (諸橋轍次), 大修館書店, 1960
   Daikanwa Jiten (Morohashi Tetsuji), Taishukan Shoten, 1960

f) 漢語大字典 (徐中舒), 四川辞書出版社・湖北辞書出版社(成都), 1986
   Hanyu Dazidian (Xu Zhongshu), Sichuan Cishu Chubanshe・Hubei Cishu Chubanshe(Chengdu), 1986

g) 漢語大詞典 (羅竹風), 上海辞書出版社(上海), 1994
   Hanyu Dacidian (Luo Zhufeng), Shanghai Cishu Chubanshe(Shanghai), 1994

h) 現代漢語通用字表, 中国国家語言文字工作委員会(北京), 1988
   Xiandai Hanyu Tongyongzibiao, Zhongguo Guojia Yuyan Wenzi Gongzuo Weiyuanhui(Beijing), 1988

i) ユニコード漢字情報辞典, 三省堂, 2000
   Unicode Kanji Information Dictionary, Sanseido Publishing, 2000