Preface
CSDN address: http://blog.csdn.net/game3108/article/details/52957669
I have recently been reading Apple's official Swift documentation, The Swift Programming Language, and am recording some notes here.
Extended Grapheme Clusters
Swift uses extended grapheme clusters to represent Character values. The documentation puts it this way:
Every instance of Swift’s Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.
For the precise definition, see the Unicode standard document Grapheme Cluster Boundaries:
The Unicode Standard provides algorithms for determining grapheme cluster boundaries, in two variants: legacy grapheme clusters and extended grapheme clusters.
A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a “stack”.
An extended grapheme cluster is the same as a legacy grapheme cluster, with the addition of some other characters. The continuing characters are extended to include all spacing combining marks, such as the spacing (but dependent) vowel signs in Indic scripts.
The exact boundary rules are described in detail in that document.
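To make the definition concrete, here is a minimal sketch of one Character built from two Unicode scalars. It assumes Swift 4 or later, where String is itself a collection of Character values; the names combined, scalarCount, and characterCount are illustrative only.

```swift
// "é" written as a base letter "e" followed by a combining acute accent (U+0301).
let combined: Character = "e\u{301}"
let text = String(combined)

// Number of Unicode scalars that make up the text.
let scalarCount = text.unicodeScalars.count

// Number of extended grapheme clusters, i.e. Character values.
let characterCount = text.count

print(scalarCount, characterCount) // prints "2 1"
```

The two scalars form a single grapheme cluster, so Swift reports one Character even though the underlying sequence has two elements.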
Example
Take an example from Apple's documentation:
let precomposed: Character = "\u{D55C}" // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}" // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
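Hangul syllables can be written either as one precomposed scalar or as a decomposed sequence of jamo. The two values above are therefore the same Character, and Swift treats them as equal because its comparison is based on canonical equivalence.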
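The equivalence can be checked directly. This is a small sketch rather than anything taken from Apple's documentation; the constants areEqual, precomposedScalars, and decomposedScalars are names introduced here for illustration.

```swift
let precomposed: Character = "\u{D55C}"
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"

// Character equality uses canonical equivalence, not scalar-by-scalar comparison.
let areEqual = (precomposed == decomposed)

// The underlying scalar sequences still differ in length.
let precomposedScalars = String(precomposed).unicodeScalars.count
let decomposedScalars = String(decomposed).unicodeScalars.count

print(areEqual, precomposedScalars, decomposedScalars) // prints "true 1 3"
```

So two values can compare as equal while occupying a different number of scalars, which is why scalar counts and character counts should not be used interchangeably.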
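Because of this representation, getting the number of characters in a String requires "".characters.count: first obtain the characters view, then count its elements. (From Swift 4 onward, String is itself a collection of Character values, so count can be called on the string directly.)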
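Swift's String is built from 21-bit Unicode scalar values (comparable to UTF-32, in that each scalar is a single unit), and its character count is measured in extended grapheme clusters. NSString in Objective-C, by contrast, measures length in UTF-16 code units.
So the same string may report a different length once it is converted to NSString: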
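import Foundation // needed for NSString

let str = "Hello \u{1F600}" // the last character is an emoji (U+1F600 here, as a stand-in)
str.characters.count // returns 7 (Swift 3 spelling; use str.count in Swift 4 and later)
(str as NSString).length // returns 8, because the emoji takes two UTF-16 code units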
In other words, what you see is no longer what you get. When converting between a Swift String and an NSString, pay attention to the Unicode encoding and to how length is measured.
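The different lengths can also be seen without Foundation by counting each view of the same string. The sketch below assumes Swift 4 or later and uses U+1F600 as an arbitrary emoji outside the Basic Multilingual Plane; greeting and the four count constants are illustrative names.

```swift
let greeting = "Hello \u{1F600}"

// Extended grapheme clusters: what Swift calls characters.
let characterCount = greeting.count

// Unicode scalars: the emoji is a single scalar.
let scalarCount = greeting.unicodeScalars.count

// UTF-16 code units: the emoji needs a surrogate pair.
// This is the same number NSString's length reports.
let utf16Count = greeting.utf16.count

// UTF-8 code units: the emoji needs four bytes.
let utf8Count = greeting.utf8.count

print(characterCount, scalarCount, utf16Count, utf8Count) // prints "7 7 8 10"
```

When an offset or length has to be handed to an API that works on NSString, the utf16 view is the one whose count matches.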
References
1. The Swift Programming Language
2. Why is Swift counting this Grapheme Cluster as two characters instead of one?
3. Grapheme Cluster Boundaries