RFC 1468 Japanese Translation

1468 Japanese Character Encoding for Internet Messages. J. Murai, M. Crispin, E. van der Poel. June 1993. (Format: TXT=10970 bytes) (Status: INFORMATIONAL)

|Network Working Group                                           J. Murai
|Request for Comments: 1468                               Keio University
|                                                              M. Crispin
|                                                       Panda Programming
|                                                         E. van der Poel
|                                                               June 1993
|
|            Japanese Character Encoding for Internet Messages

           Japanese Character Encoding for Internet Messages

| Status of this Memo
|
|    This memo provides information for the Internet community.  It does
|    not specify an Internet standard.  Distribution of this memo is
|    unlimited.

Status of This Memo

  This memo provides information for the Internet community. It does
  not specify any Internet standard. Distribution of this memo is
  unlimited.

| Introduction
|
|    This document describes the encoding used in electronic mail [RFC822]
|    and network news [RFC1036] messages in several Japanese networks. It
|    was first specified by and used in JUNET [JUNET]. The encoding is now
|    also widely used in Japanese IP communities.
|
|    The name given to this encoding is "ISO-2022-JP", which is intended
|    to be used in the "charset" parameter field of MIME headers (see
|    [MIME1] and [MIME2]).

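Introduction

  This document describes the character encoding used for electronic
  mail [RFC822] and network news [RFC1036] in Japanese networks. It
  was first specified and used in JUNET [JUNET]. This encoding is now
  widely used in the Japanese IP community.

  This encoding is given the name "ISO-2022-JP", which is intended to
  be used in the "charset" parameter of MIME headers (see [MIME1] and
  [MIME2]).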

| Description
|
|    The text starts in ASCII [ASCII], and switches to Japanese characters
|    through an escape sequence. For example, the escape sequence ESC $ B
|    (three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
|    following this escape sequence are Japanese characters, which are
|    encoded in two bytes each.  To switch back to ASCII, the escape
|    sequence ESC ( B is used.
|
|    The following table gives the escape sequences and the character sets
|    used in ISO-2022-JP messages. The ISOREG number is the registration
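|    number in ISO's registry [ISOREG].
|
|       Esc Seq    Character Set                  ISOREG
|
|       ESC ( B    ASCII                             6
|       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
|       ESC $ @    JIS X 0208-1978                  42
|       ESC $ B    JIS X 0208-1983                  87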
|
|    Note that JIS X 0208 was called JIS C 6226 until the name was changed
|    on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.
|
|    The "Roman" character set of JIS X 0201 [JISX0201] is identical to
|    ASCII except for backslash (\) and tilde (~). The backslash is
|    replaced by the Yen sign, and the tilde is replaced by overline. This
|    set is Japan's national variant of ISO 646 [ISO646].
|
|    The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
|    Katakana and some other symbols and characters. Each character takes
|    up two bytes.
|
|    For further details about the JIS Japanese national character set
|    standards, refer to [JISX0201] and [JISX0208].  For further
|    information about the escape sequences, see [ISO2022] and [ISOREG].
|
|    If there are JIS X 0208 characters on a line, there must be a switch
|    to ASCII or to the "Roman" set of JIS X 0201 before the end of the
|    line (i.e., before the CRLF). This means that the next line starts in
|    the character set that was switched to before the end of the previous
|    line.
|
|    Also, the text must end in ASCII.
|
|    Other restrictions are given in the Formal Syntax below.

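Description

  The text starts in ASCII [ASCII] and switches to Japanese characters
  by means of an escape sequence. For example, the bytes following the
  escape sequence ESC $ B (three bytes, hexadecimal 1B 24 42) are
  Japanese characters, each represented by two bytes. To switch back
  to ASCII, the escape sequence ESC ( B is used.

  The following table shows the escape sequences (Esc Seq) and the
  character sets used in ISO-2022-JP messages. The ISOREG number is
  the registration number in ISO's international register [ISOREG].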

       Esc Seq    Character Set                  ISOREG

       ESC ( B    ASCII                             6
       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
       ESC $ @    JIS X 0208-1978                  42
       ESC $ B    JIS X 0208-1983                  87
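The switching behaviour described above can be sketched as a small tokenizer that splits a byte string into runs, one per character set. This is a minimal illustration, not code from this memo: the names `ESCAPES` and `split_runs` and the charset labels are hypothetical, and the sketch neither validates byte ranges nor rejects escape sequences outside the table.

```python
# Escape sequences from the table above, mapped to illustrative labels.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (charset, payload) runs.

    Text starts in ASCII; each 3-byte escape sequence switches the
    character set used for the bytes that follow it.
    """
    runs: list[tuple[str, bytes]] = []
    charset = "ASCII"
    buf = bytearray()
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESCAPES:
            if buf:  # close the run collected so far
                runs.append((charset, bytes(buf)))
                buf.clear()
            charset = ESCAPES[seq]
            i += 3
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((charset, bytes(buf)))
    return runs
```

For example, `split_runs(b"Hi \x1b$BF|K\\\x1b(B!")` yields an ASCII run, a JIS X 0208-1983 run holding two characters (four bytes), and a final ASCII run.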

  Note: JIS X 0208 was called JIS C 6226 until its name was changed
  on March 1, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.

  The "Roman" set of JIS X 0201 is identical to ASCII except that the
  backslash and the tilde are replaced: the backslash by the Yen sign
  and the tilde by an overline. This set is Japan's national variant
  of ISO 646 [ISO646].
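Because the "Roman" set differs from ASCII in only two positions, decoding it to Unicode needs only two substitutions. A minimal sketch, assuming the conventional Unicode mappings U+00A5 YEN SIGN for byte 0x5C and U+203E OVERLINE for byte 0x7E; the function name `decode_jis_roman` is hypothetical.

```python
def decode_jis_roman(data: bytes) -> str:
    """Decode bytes in the JIS X 0201 "Roman" set to a Unicode string."""
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\u00a5")  # Yen sign instead of backslash
        elif b == 0x7E:
            out.append("\u203e")  # overline instead of tilde
        else:
            out.append(chr(b))  # every other position matches ASCII
    return "".join(out)
```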

  The JIS X 0208 [JISX0208] character set consists of Kanji,
  Hiragana, Katakana, and some other symbols and characters. Each
  character is represented by two bytes.
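Python's standard `iso2022_jp` codec can be used to see this two-byte representation. A small sketch; the escape sequences in the output are the ones this codec happens to choose (ESC $ B for JIS X 0208-1983), and the sample text is arbitrary.

```python
text = "日本"  # two JIS X 0208 characters
encoded = text.encode("iso2022_jp")
print(encoded)  # b'\x1b$BF|K\\\x1b(B'

payload = encoded[3:-3]  # drop ESC $ B at the start and ESC ( B at the end
assert payload == b"F|K\\"
assert len(payload) == 2 * len(text)            # two bytes per character
assert all(0x21 <= b <= 0x7E for b in payload)  # each byte is one-of-94
```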

  For further details about the JIS Japanese national character set
  standards, refer to [JISX0201] and [JISX0208]. For further details
  about the escape sequences, see [ISO2022] and [ISOREG].

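  If a line contains JIS X 0208 characters, there must be a switch to
  ASCII or to the JIS X 0201 "Roman" set before the end of the line
  (that is, before the CRLF). This means that the next line starts in
  the character set that was switched to before the end of the
  previous line.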

  Also, the text must end in ASCII.
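The two rules above (return to a single-byte set before each CRLF, and end in ASCII) can be checked with a small state machine. A sketch assuming CRLF line endings and only the four escape sequences from the table; `check_line_state` and the state labels are hypothetical, and the other restrictions in the Formal Syntax are not checked.

```python
# Escape sequences from the table; the state labels are illustrative.
SINGLE_BYTE = {b"\x1b(B": "ascii", b"\x1b(J": "roman"}
DOUBLE_BYTE = {b"\x1b$@", b"\x1b$B"}


def check_line_state(data: bytes) -> list[str]:
    """Report violations of the end-of-line and end-of-text rules."""
    problems: list[str] = []
    state = "ascii"  # text starts in ASCII
    lines = data.split(b"\r\n")
    for number, line in enumerate(lines, start=1):
        i = 0
        while i < len(line):
            seq = line[i:i + 3]
            if seq in SINGLE_BYTE:
                state = SINGLE_BYTE[seq]
                i += 3
            elif seq in DOUBLE_BYTE:
                state = "jisx0208"
                i += 3
            else:
                i += 1  # ordinary data byte; the state carries on
        if state == "jisx0208" and number < len(lines):
            problems.append(f"line {number}: still in JIS X 0208 at CRLF")
    if state != "ascii":
        problems.append("text does not end in ASCII")
    return problems
```

A line may legitimately end in the "Roman" set, in which case the next line starts in that set; only the final state must be ASCII.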

  Other restrictions are given in the "Formal Syntax" section below.

| Formal Syntax
|
|    The notational conventions used here are identical to those used in
|    RFC 822 [RFC822].
|
|    The * (asterisk) convention is as follows:
|
|       l*m something
|
|    meaning at least l and at most m somethings, with l and m taking
|    default values of 0 and infinity, respectively.
|
|   message             = headers 1*( CRLF *single-byte-char *segment
|                         single-byte-seq *single-byte-char )
|                                            ; see also [MIME1] "body-part"
|                                            ; note: must end in ASCII
|
|   headers             = <see [RFC822] "fields" and [MIME1] "body-part">
|
|   segment             = single-byte-segment / double-byte-segment
|
|   single-byte-segment = single-byte-seq 1*single-byte-char
|
|   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
|
|   single-byte-seq     = ESC "(" ( "B" / "J" )
|
|   double-byte-seq     = ESC "$" ( "@" / "B" )
|
|   CRLF                = CR LF
|
|                                                     ; ( Octal, Decimal.)
|
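|   ESC                 = <ISO 2022 ESC, escape>      ; (    33,      27.)
|
|   SI                  = <ISO 2022 SI, shift-in>     ; (    17,      15.)
|
|   SO                  = <ISO 2022 SO, shift-out>    ; (    16,      14.)
|
|   CR                  = <ASCII CR, carriage return> ; (    15,      13.)
|
|   LF                  = <ASCII LF, linefeed>        ; (    12,      10.)
|
|   one-of-94           = <any one of 94 values>      ; (41-176, 33.-126.)
|
|   7BIT                = <any 7-bit value>           ; ( 0-177,  0.-127.)
|
|   single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
|                          including CRLF, and not including ESC, SI, SO>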

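Formal Syntax

  The notational conventions used here are identical to those used in
  RFC 822 [RFC822].

  The * (asterisk) convention is as follows:

       l*m something

  meaning at least l and at most m somethings, with l and m taking
  default values of 0 and infinity, respectively.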

   message             = headers 1*( CRLF *single-byte-char *segment
                         single-byte-seq *single-byte-char )
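                                  ; see also [MIME1] "body-part"
                                  ; note: must end in ASCII

   headers             = <see [RFC822] "fields" and [MIME1] "body-part">

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*single-byte-char

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

   CRLF                = CR LF

                                                    ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>      ; (    33,      27.)

   SI                  = <ISO 2022 SI, shift-in>     ; (    17,      15.)

   SO                  = <ISO 2022 SO, shift-out>    ; (    16,      14.)

   CR                  = <ASCII CR, carriage return> ; (    15,      13.)

   LF                  = <ASCII LF, linefeed>        ; (    12,      10.)

   one-of-94           = <any one of 94 values>      ; (41-176, 33.-126.)

   7BIT                = <any 7-bit value>           ; ( 0-177,  0.-127.)

   single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
                          including CRLF, and not including ESC, SI, SO>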
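The rules above can be transcribed into byte-level regular expressions for testing. A sketch using Python's `re` module; it checks only the part of the message after `headers` (starting with the first CRLF), and all names are hypothetical. Read literally, the `message` rule requires a single-byte-seq in every CRLF-delimited unit, so a body line of plain ASCII with no escape sequence does not match; the sketch follows the rule as written rather than relaxing it.

```python
import re

# Byte classes and sequences from the Formal Syntax, as regex fragments.
SBC = rb"(?:[\x00-\x0c\x10-\x1a\x1c-\x7f]|\r(?!\n))"  # single-byte-char
ONE_OF_94 = rb"[\x21-\x7e]"                           # one-of-94
SBS = rb"\x1b\([BJ]"                                  # single-byte-seq
DBS = rb"\x1b\$[@B]"                                  # double-byte-seq

# segment = single-byte-segment / double-byte-segment
SEGMENT = (rb"(?:" + SBS + SBC + rb"+"
           + rb"|" + DBS + rb"(?:" + ONE_OF_94 + ONE_OF_94 + rb")+)")
# CRLF *single-byte-char *segment single-byte-seq *single-byte-char
UNIT = rb"\r\n" + SBC + rb"*" + SEGMENT + rb"*" + SBS + SBC + rb"*"
BODY = re.compile(rb"(?:" + UNIT + rb")+")


def body_matches(body: bytes) -> bool:
    """True if `body` matches 1*( CRLF ... ), i.e. the rule minus headers."""
    return BODY.fullmatch(body) is not None
```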

| MIME Considerations
|
|    The name given to the JUNET character encoding is "ISO-2022-JP". This
|    name is intended to be used in MIME messages as follows:
|
|       Content-Type: text/plain; charset=iso-2022-jp
|
|    The ISO-2022-JP encoding is already in 7-bit form, so it is not
|    necessary to use a Content-Transfer-Encoding header. It should be
|    noted that applying the Base64 or Quoted-Printable encoding will
|    render the message unreadable in current JUNET software.
|
|    ISO-2022-JP may also be used in MIME Part 2 headers.  The "B"
|    encoding should be used with ISO-2022-JP text.

MIME Considerations

  The JUNET character encoding is named "ISO-2022-JP". This name is
  intended to be used in MIME messages as follows:

       Content-Type: text/plain; charset=iso-2022-jp
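Python's standard `email` package can produce a part labelled this way. A sketch, assuming Python 3: given the charset "iso-2022-jp", `MIMEText` encodes the body with the ISO-2022-JP codec and, because the resulting bytes are all 7-bit, sets Content-Transfer-Encoding to 7bit rather than Base64 or Quoted-Printable. This reflects the library's own charset table, not anything defined in this memo; the sample text and subject are arbitrary.

```python
from email.mime.text import MIMEText

text = "日本語のテキスト"  # sample Japanese body text
msg = MIMEText(text, "plain", "iso-2022-jp")
msg["Subject"] = "test"

print(msg["Content-Type"])               # text/plain; charset="iso-2022-jp"
print(msg["Content-Transfer-Encoding"])  # 7bit

wire = msg.as_bytes()
assert all(byte < 0x80 for byte in wire)  # the whole message is 7-bit
```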

  Because the ISO-2022-JP encoding is already in 7-bit form, no
  Content-Transfer-Encoding header is needed. Note that applying the
  Base64 or Quoted-Printable encoding makes the message unreadable
  with current JUNET software.

  ISO-2022-JP may also be used in MIME Part 2 headers. The "B"
  encoding should be used with ISO-2022-JP text.
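For headers, Python's `email.header.Header` applies the "B" (Base64) encoding when given the "iso-2022-jp" charset, producing an encoded-word of the kind described in [MIME2]. A sketch; the sample subject text is arbitrary, and the exact formatting of the encoded-word is the library's choice.

```python
from email.header import Header, decode_header

subject = "日本語"
encoded = Header(subject, "iso-2022-jp").encode()
print(encoded)  # an encoded-word of the form =?iso-2022-jp?b?...?=

# Decoding returns the raw ISO-2022-JP bytes together with the charset name.
[(raw, charset)] = decode_header(encoded)
assert raw.decode(charset) == subject
```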

| Background Information
|
|    The JUNET encoding was described in the JUNET User's Guide [JUNET]
|    (JUNET Riyou No Tebiki Dai Ippan).
|
|    The encoding is based on the particular usage of ISO 2022 announced
|    by 4/1 (see [ISO2022] for details). However, the escape sequence
|    normally used for this announcement is not included in ISO-2022-JP
|    messages.
|
|    The Kana set of JIS X 0201 is not used in ISO-2022-JP messages.
|
|    In the past, some systems erroneously used the escape sequence ESC (
|    H in JUNET messages. This escape sequence is officially registered
|    for a Swedish character set [ISOREG], and should not be used in ISO-
|    2022-JP messages.
|
|    Some systems do not distinguish between ESC ( B and ESC ( J or
|    between ESC $ @ and ESC $ B for display. However, when relaying a
|    message to another system, the escape sequences must not be altered
|    in any way.
|
|    The human user (not implementor) should try to keep lines within 80
|    display columns, or, preferably, within 75 (or so) columns, to allow
|    insertion of ">" at the beginning of each line in excerpts. Each JIS
|    X 0208 character takes up two columns, and the escape sequences do
|    not take up any columns. The implementor is reminded that JIS X 0208
|    characters take up two bytes and should not be split in the middle to
|    break lines for displaying, etc.
|
|    The JIS X 0208 standard was revised in 1990, to add two characters at
|    the end of the table. Although ISO 2022 specifies special additional
|    escape sequences to indicate the use of revised character sets, it is
|    suggested here not to make use of this special escape sequence in
|    ISO-2022-JP text, even if the two characters added to JIS X 0208 in
|    1990 are used.
|
|    For further information about Japanese character encodings such as PC
|    codes, FTP locations of implementations, etc, see "Electronic
|    Handling of Japanese Text" [JPN.INF].

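Background

  The JUNET encoding is described in the JUNET User's Guide [JUNET]
  (JUNET Riyou No Tebiki Dai Ippan).

  This encoding is based on the particular usage of ISO 2022 announced
  by 4/1 (see [ISO2022] for details). However, the escape sequence
  normally used for this announcement is not included in ISO-2022-JP
  messages.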

  The Kana set of JIS X 0201 is not used in ISO-2022-JP messages.

  In the past, some systems erroneously used the escape sequence
  ESC ( H in JUNET messages. This escape sequence is officially
  registered for a Swedish character set [ISOREG] and should not be
  used in ISO-2022-JP messages.
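A program could flag escape sequences that are not in the table, such as ESC ( H here, or ESC ( I, which designates the JIS X 0201 Katakana set excluded above. A sketch that assumes every escape sequence is ESC plus two bytes, which holds for the four allowed sequences but not for ISO 2022 in general; `disallowed_escapes` is a hypothetical name.

```python
import re

# The four escape sequences permitted by the table in this memo.
ALLOWED = {b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B"}


def disallowed_escapes(data: bytes) -> list[bytes]:
    """Return each ESC (with up to two following bytes) not in ALLOWED."""
    return [
        m.group()
        for m in re.finditer(rb"\x1b.{0,2}", data, re.DOTALL)
        if m.group() not in ALLOWED
    ]
```

A truncated escape at the very end of the data is reported as the bare bytes that remain.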

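  Some systems do not distinguish between ESC ( B and ESC ( J, or
  between ESC $ @ and ESC $ B, when displaying text. However, when
  relaying a message to another system, the escape sequences must not
  be altered in any way.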
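One way to honour both statements is to fold equivalent escape sequences only in a copy used for display and to relay the original bytes unchanged. A minimal sketch; `display_copy` and `relay` are hypothetical names. Plain byte replacement is safe here because ESC (0x1B) never occurs inside a JIS X 0208 character, whose two bytes are each in the range 0x21 to 0x7E.

```python
def display_copy(data: bytes) -> bytes:
    """Fold ESC ( J into ESC ( B and ESC $ @ into ESC $ B, for display only."""
    return data.replace(b"\x1b(J", b"\x1b(B").replace(b"\x1b$@", b"\x1b$B")


def relay(data: bytes) -> bytes:
    """Pass the message on exactly as received; escapes are untouched."""
    return data
```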

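  Users (as opposed to implementors) should try to keep each line
  within 80 display columns, or preferably within about 75 columns,
  so that ">" can be inserted at the beginning of each line when
  quoting. Each JIS X 0208 character takes up two columns, and escape
  sequences take up none. Implementors are reminded that JIS X 0208
  characters take up two bytes and should not be split in the middle
  when breaking lines for display or similar purposes.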
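The column rule for users and the byte rule for implementors can be combined in a small column counter that always steps over a JIS X 0208 character as a pair. A sketch under assumptions not stated in this memo: the line starts in a single-byte set, every byte outside JIS X 0208 mode takes one column, and only the four escape sequences in the table occur; `display_columns` is a hypothetical name.

```python
DOUBLE = {b"\x1b$@", b"\x1b$B"}  # switch to JIS X 0208
SINGLE = {b"\x1b(B", b"\x1b(J"}  # switch to ASCII or JIS X 0201 Roman


def display_columns(line: bytes) -> int:
    """Count display columns in one ISO-2022-JP line.

    Escape sequences take no columns, each JIS X 0208 character (two
    bytes) takes two columns, and each single-byte character takes one.
    """
    columns = 0
    double = False
    i = 0
    while i < len(line):
        seq = line[i:i + 3]
        if seq in DOUBLE or seq in SINGLE:
            double = seq in DOUBLE
            i += 3
        elif double:
            columns += 2  # one two-byte character
            i += 2        # step over the whole pair, never half of it
        else:
            columns += 1
            i += 1
    return columns
```

For example, `b"ab\x1b$BF|K\\\x1b(Bc"` has two ASCII characters, two JIS X 0208 characters and one more ASCII character, for 7 columns.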

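  The JIS X 0208 standard was revised in 1990 to add two characters at
  the end of the table. Although ISO 2022 specifies special additional
  escape sequences to indicate the use of a revised character set, it
  is suggested that they not be used in ISO-2022-JP text, even if the
  two characters added to JIS X 0208 in 1990 are used.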

  For further information about Japanese character codes on PCs, FTP
  locations of implementations, and so on, see "Electronic Handling of
  Japanese Text" [JPN.INF].

| References

References

   [ASCII] American National Standards Institute, "Coded character set
   -- 7-bit American national standard code for information
   interchange", ANSI X3.4-1986.

   [ISO646] International Organization for Standardization (ISO),
   "Information technology -- ISO 7-bit coded character set for
   information interchange", International Standard, Ref. No. ISO/IEC
   646:1991.

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, Ref. No. ISO
   2022-1986 (E).

   [ISOREG] International Organization for Standardization (ISO),
   "International Register of Coded Character Sets To Be Used With
   Escape Sequences".

   [JISX0201] Japanese Standards Association, "Code for Information
   Interchange", JIS X 0201-1976.

   [JISX0208] Japanese Standards Association, "Code of the Japanese
   graphic character set for information interchange", JIS X 0208-1978,
   -1983 and -1990.

   [JPN.INF] Ken R. Lunde, "Electronic Handling of
   Japanese Text", March 1992,
   msi.umn.edu(128.101.24.1):pub/lunde/japan[123].inf

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988.

   [MIME1] Borenstein N., and N. Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and
   Describing the Format of Internet Message Bodies", RFC 1341,
   Bellcore, Innosoft, June 1992.

   [MIME2] Moore, K., "Representation of Non-ASCII Text in Internet
   Message Headers", RFC 1342, University of Tennessee, June 1992.

   [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
   Text Messages", STD 11, RFC 822, UDEL, August 1982.

   [RFC1036] Horton M., and R. Adams, "Standard for Interchange of USENET
   Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
   Studies, December 1987.

| Acknowledgements
|
|    Many people assisted in drafting this document. The authors wish to
|    thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
|    Handa.

Acknowledgements

  Many people assisted in writing this document. The authors wish to
  thank in particular Akira Kato, Masahiro Sekiguchi, and Ken'ichi
  Handa.

| Security Considerations
|
|    Security issues are not discussed in this memo.

Security Considerations

  Security issues are not discussed in this memo.

| Authors' Addresses

Authors' Addresses

   Jun Murai
   Keio University
   5322 Endo, Fujisawa
   Kanagawa 252 Japan

   Fax: +81 466 49 1101
   EMail: jun@wide.ad.jp


   Mark Crispin
   Panda Programming
   6158 Lariat Loop NE
   Bainbridge Island, WA 98110-2098
   USA

   Phone: +1 206 842 2385
   EMail: MRC@PANDA.COM


   Erik M. van der Poel
   A-105 Park Avenue
   4-4-10 Ohta, Kisarazu
   Chiba 292 Japan

   Phone: +81 438 22 5836
   Fax:   +81 438 22 5837
   EMail: erik@poel.juice.or.jp