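A language tag is a sequence of one or more "subtags". Each subtag is a sequence of letters and digits, and the subtags within a tag are separated by hyphens ("-", [Unicode] U+002D). There are several kinds of subtag, distinguished by their length, their position in the tag, and their content.
The language tag syntax is defined in ABNF [RFC5234]. The full grammar is given first, followed by a step-by-step explanation of each production: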
Language-Tag = langtag ; normal language tags
/ privateuse ; private use tag
/ grandfathered ; grandfathered tags
langtag = language
["-" script]
["-" region]
*("-" variant)
*("-" extension)
["-" privateuse]
language = 2*3ALPHA ; shortest ISO 639 code
["-" extlang] ; sometimes followed by
; extended language subtags
/ 4ALPHA ; or reserved for future use
/ 5*8ALPHA ; or registered language subtag
extlang = 3ALPHA ; selected ISO 639 codes
*2("-" 3ALPHA) ; permanently reserved
script = 4ALPHA ; ISO 15924 code
region = 2ALPHA ; ISO 3166-1 code
/ 3DIGIT ; UN M.49 code
variant = 5*8alphanum ; registered variants
/ (DIGIT 3alphanum)
extension = singleton 1*("-" (2*8alphanum))
; Single alphanumerics
; "x" reserved for private use
singleton = DIGIT ; 0 - 9
/ %x41-57 ; A - W
/ %x59-5A ; Y - Z
/ %x61-77 ; a - w
/ %x79-7A ; y - z
privateuse = "x" 1*("-" (1*8alphanum))
grandfathered = irregular ; non-redundant tags registered
/ regular ; during the RFC 3066 era
irregular = "en-GB-oed" ; irregular tags do not match
/ "i-ami" ; the 'langtag' production and
/ "i-bnn" ; would not otherwise be
/ "i-default" ; considered 'well-formed'
/ "i-enochian" ; These tags are all valid,
/ "i-hak" ; but most are deprecated
/ "i-klingon" ; in favor of more modern
/ "i-lux" ; subtags or subtag
/ "i-mingo" ; combination
/ "i-navajo"
/ "i-pwn"
/ "i-tao"
/ "i-tay"
/ "i-tsu"
/ "sgn-BE-FR"
/ "sgn-BE-NL"
/ "sgn-CH-DE"
regular = "art-lojban" ; these tags match the 'langtag'
/ "cel-gaulish" ; production, but their subtags
/ "no-bok" ; are not extended language
/ "no-nyn" ; or variant subtags: their meaning
/ "zh-guoyu" ; is defined by their registration
/ "zh-hakka" ; and all of these are deprecated
/ "zh-min" ; in favor of a more modern
/ "zh-min-nan" ; subtag or sequence of subtags
/ "zh-xiang"
alphanum = (ALPHA / DIGIT) ; letters and numbers
Language-Tag
Language-Tag = langtag ; normal language tags
/ privateuse ; private use tag
/ grandfathered ; grandfathered tags
A Language-Tag is exactly one of langtag, privateuse, or grandfathered: langtag is a normal language tag, privateuse is a tag for private use, and grandfathered is a grandfathered tag (this set never changes).
langtag
langtag = language
["-" script]
["-" region]
*("-" variant)
*("-" extension)
["-" privateuse]
A langtag (language tag) consists of one language, followed in this order by zero or one "-" script, zero or one "-" region, zero or more "-" variant, zero or more "-" extension, and zero or one "-" privateuse.
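The langtag production can be transcribed almost line by line into a regular expression. The following is a minimal sketch, not a reference implementation: the names LANGTAG_RE and parse_langtag and the group names are chosen here for illustration, and the check covers syntax only (it does not consult any subtag registry).

```python
import re

# One regex line per ABNF line of the langtag production.
# Names are illustrative; this checks syntax only.
LANGTAG_RE = re.compile(r"""
    (?P<language>
        [A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}   # 2*3ALPHA plus optional extlang
      | [A-Za-z]{4}                          # 4ALPHA, reserved for future use
      | [A-Za-z]{5,8}                        # 5*8ALPHA, registered language subtag
    )
    (?:-(?P<script>[A-Za-z]{4}))?            # optional script
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?   # optional region
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>[xX](?:-[A-Za-z0-9]{1,8})+))?
""", re.VERBOSE)


def parse_langtag(tag):
    """Return a dict of the tag's parts, or None if it is not a langtag."""
    m = LANGTAG_RE.fullmatch(tag)
    if m is None:
        return None
    parts = m.groupdict()
    # "-a-b" splits into ["", "a", "b"]; drop the empty first item.
    parts["variants"] = parts["variants"].split("-")[1:]
    parts["extensions"] = parts["extensions"].lstrip("-")
    return parts
```

For example, parse_langtag("zh-Hant-TW") yields language "zh", script "Hant", and region "TW", while a grandfathered tag such as "i-klingon" returns None because it does not match this production.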
language
language = 2*3ALPHA ; shortest ISO 639 code
["-" extlang] ; sometimes followed by
; extended language subtags
/ 4ALPHA ; or reserved for future use
/ 5*8ALPHA ; or registered language subtag
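language takes one of three forms: 2*3ALPHA ["-" extlang], 4ALPHA, or 5*8ALPHA. The first is the shortest ISO 639 code, sometimes followed by extended language subtags; the second is reserved for future use; the third is a registered language subtag.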
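Because the three alternatives differ only in length, the primary language subtag can be classified by counting letters. A small sketch follows, assuming the caller passes only the primary subtag (without any extlang); the function name and the returned labels are made up for this example.

```python
def language_kind(primary):
    """Classify a primary language subtag by the ABNF alternative it matches."""
    # ALPHA means ASCII letters only.
    if not (primary.isascii() and primary.isalpha()):
        return None
    n = len(primary)
    if 2 <= n <= 3:
        return "iso639"       # 2*3ALPHA: shortest ISO 639 code
    if n == 4:
        return "reserved"     # 4ALPHA: reserved for future use
    if 5 <= n <= 8:
        return "registered"   # 5*8ALPHA: registered language subtag
    return None
```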
script
script = 4ALPHA ; ISO 15924 code
script (the writing system) is a sequence of 4 letters whose value is one of the codes listed in ISO 15924.
region
region = 2ALPHA ; ISO 3166-1 code
/ 3DIGIT ; UN M.49 code
region is either a sequence of 2 letters (an ISO 3166-1 code) or a sequence of 3 digits (a UN M.49 code).
variant
variant = 5*8alphanum ; registered variants
/ (DIGIT 3alphanum)
variant is either a sequence of 5 to 8 letters or digits, or one digit followed by 3 letters or digits. All variants are registered.
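One consequence of the two alternatives is that a four-character variant must start with a digit, so it cannot be confused with a four-letter script subtag. The sketch below checks the shape only; is_variant is a hypothetical helper name, and it does not verify that the variant is actually registered.

```python
import re

# variant = 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def is_variant(subtag):
    """True if the subtag has the shape of a variant (shape only)."""
    return VARIANT_RE.fullmatch(subtag) is not None
```

With this check, "1901" and "rozaj" are accepted, while "Latn" is rejected because it is four characters long and starts with a letter.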
extension
extension = singleton 1*("-" (2*8alphanum))
; Single alphanumerics
; "x" reserved for private use
singleton = DIGIT ; 0 - 9
/ %x41-57 ; A - W
/ %x59-5A ; Y - Z
/ %x61-77 ; a - w
/ %x79-7A ; y - z
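extension consists of a singleton, which is a single digit or letter (the letter "x" is excluded because it is reserved for privateuse), followed by one or more "-" (2*8alphanum), that is, hyphen-separated subtags of 2 to 8 letters or digits each.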
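A single extension can be split into its singleton and the subtags that follow it. This is a rough sketch under the assumption that the input is exactly one extension (for example "u-co-phonebk"); parse_extension and the two regex names are illustrative only.

```python
import re

# singleton: any digit or letter except "x" / "X"
SINGLETON_RE = re.compile(r"[0-9A-WY-Za-wy-z]")
# each following subtag: 2*8alphanum
EXT_SUBTAG_RE = re.compile(r"[A-Za-z0-9]{2,8}")


def parse_extension(text):
    """Return (singleton, [subtags]) for one extension, or None if malformed."""
    head, *rest = text.split("-")
    if SINGLETON_RE.fullmatch(head) is None:
        return None
    # 1*(...) means at least one subtag must follow the singleton.
    if not rest or any(EXT_SUBTAG_RE.fullmatch(s) is None for s in rest):
        return None
    return head, rest
```

Here "x-abc" is rejected because "x" is not a singleton, and "u-a" is rejected because extension subtags need at least two characters.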
privateuse
privateuse = "x" 1*("-" (1*8alphanum))
privateuse begins with "x", followed by one or more "-" (1*8alphanum), where (1*8alphanum) is a sequence of 1 to 8 letters or digits.
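As a quick illustration, the private-use sequence can be checked and split as follows. The helper name privateuse_subtags is an assumption of this sketch; "X" is accepted as well as "x" because literal strings in ABNF are case-insensitive.

```python
import re

# privateuse = "x" 1*("-" (1*8alphanum))
PRIVATEUSE_RE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")


def privateuse_subtags(text):
    """Return the subtags after the leading "x", or None if malformed."""
    if PRIVATEUSE_RE.fullmatch(text) is None:
        return None
    return text.split("-")[1:]
```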
grandfathered
grandfathered = irregular ; non-redundant tags registered
/ regular ; during the RFC 3066 era
grandfathered is either irregular or regular. Both are lists of non-redundant tags that were registered during the RFC 3066 era.
irregular
irregular = "en-GB-oed" ; irregular tags do not match
/ "i-ami" ; the 'langtag' production and
/ "i-bnn" ; would not otherwise be
/ "i-default" ; considered 'well-formed'
/ "i-enochian" ; These tags are all valid,
/ "i-hak" ; but most are deprecated
/ "i-klingon" ; in favor of more modern
/ "i-lux" ; subtags or subtag
/ "i-mingo" ; combination
/ "i-navajo"
/ "i-pwn"
/ "i-tao"
/ "i-tay"
/ "i-tsu"
/ "sgn-BE-FR"
/ "sgn-BE-NL"
/ "sgn-CH-DE"
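irregular tags do not match the langtag production and would not otherwise be considered "well-formed"; they count as well-formed only because the grammar lists them explicitly. All of these tags are valid, but most are deprecated in favor of more modern subtags or subtag combinations.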
regular
regular = "art-lojban" ; these tags match the 'langtag'
/ "cel-gaulish" ; production, but their subtags
/ "no-bok" ; are not extended language
/ "no-nyn" ; or variant subtags: their meaning
/ "zh-guoyu" ; is defined by their registration
/ "zh-hakka" ; and all of these are deprecated
/ "zh-min" ; in favor of a more modern
/ "zh-min-nan" ; subtag or sequence of subtags
/ "zh-xiang"
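regular tags do match the langtag production, but their subtags are not extended language or variant subtags: their meaning is defined by their registration. All of them are deprecated in favor of a more modern subtag or sequence of subtags.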
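Since both grandfathered lists are fixed, recognizing them comes down to a case-insensitive set lookup. The sketch below copies the two lists from the grammar above; the names IRREGULAR, REGULAR, and grandfathered_kind are chosen for this example.

```python
# Tags copied from the irregular and regular productions, stored lowercase.
IRREGULAR = frozenset(t.lower() for t in [
    "en-GB-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-BE-FR", "sgn-BE-NL", "sgn-CH-DE",
])
REGULAR = frozenset(t.lower() for t in [
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
])


def grandfathered_kind(tag):
    """Return "irregular", "regular", or None for a non-grandfathered tag."""
    key = tag.lower()  # compare without regard to case
    if key in IRREGULAR:
        return "irregular"
    if key in REGULAR:
        return "regular"
    return None
```

A full Language-Tag check would try this lookup alongside the langtag and privateuse productions, since a tag only needs to match one of the three.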