This article describes how word and paragraph boundaries are defined, how line breaks are represented, and how you can separate a string by paragraph.
Word Boundaries
The text system determines word boundaries in a language-specific manner according to Unicode Standard Annex #29, with additional locale-specific customization as described in that document. On OS X, Cocoa provides APIs related to word boundaries, such as the NSAttributedString methods doubleClickAtIndex: and nextWordFromIndex:forward:, but you cannot modify the way the word-boundary algorithms themselves work.
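The following is a minimal sketch of calling the two NSAttributedString methods named above, assuming a macOS command-line target linked against AppKit (Cocoa). WordAtIndex is a hypothetical helper for illustration only. Because the code only queries the system, the boundaries it reports come from the built-in rules; nothing here customizes them.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper: returns the word a double-click at `index` would select.
static NSString *WordAtIndex(NSString *string, NSUInteger index) {
    NSAttributedString *text = [[NSAttributedString alloc] initWithString:string];
    NSRange range = [text doubleClickAtIndex:index];
    return [string substringWithRange:range];
}

int main(void) {
    @autoreleasepool {
        NSString *sample = @"Hello, world. Bonjour!";

        // Index 8 falls inside "world".
        NSLog(@"double-click selects: %@", WordAtIndex(sample, 8));

        // nextWordFromIndex:forward: reports a neighboring word boundary; the exact
        // index depends on the system's locale-aware rules, so it is only logged here.
        NSAttributedString *text = [[NSAttributedString alloc] initWithString:sample];
        NSUInteger next = [text nextWordFromIndex:0 forward:YES];
        NSLog(@"next word boundary after index 0: %lu", (unsigned long)next);
    }
    return 0;
}
```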
Line and Paragraph Separator Characters
There are a number of ways in which a line or paragraph break can be represented. Historically, \n, \r, and \r\n have been used. Unicode defines an unambiguous paragraph separator, U+2029 (for which Cocoa provides the constant NSParagraphSeparatorCharacter), and an unambiguous line separator, U+2028 (for which Cocoa provides the constant NSLineSeparatorCharacter).
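As a small illustration, the sketch below prints the values of the two AppKit constants and builds a string containing one line break inside a paragraph followed by a paragraph break. It assumes a macOS target that imports Cocoa, where these constants are defined; the sample text is arbitrary.

```objc
#import <Cocoa/Cocoa.h>

int main(void) {
    @autoreleasepool {
        // Both constants are unichar values defined by AppKit.
        NSLog(@"paragraph separator: U+%04X", (unsigned)NSParagraphSeparatorCharacter);
        NSLog(@"line separator: U+%04X", (unsigned)NSLineSeparatorCharacter);

        // One paragraph with an internal line break (U+2028), then a new paragraph (U+2029).
        NSString *text = [NSString stringWithFormat:@"Line one%CLine two%CNext paragraph",
                          (unichar)NSLineSeparatorCharacter,
                          (unichar)NSParagraphSeparatorCharacter];
        NSLog(@"%lu characters", (unsigned long)text.length);
    }
    return 0;
}
```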
In the Cocoa text system, NSParagraphSeparatorCharacter is treated consistently as a paragraph break, and NSLineSeparatorCharacter is treated consistently as a line break that is not a paragraph break (that is, a line break within a paragraph). However, in other contexts, there are few guarantees as to how these characters will be treated. POSIX-level software, for example, often recognizes only \n as a break. Some older Macintosh software recognizes only \r, and some Windows software recognizes only \r\n. Often there is no distinction between line and paragraph breaks.
Which line or paragraph break character you should use depends on how your data may be used and on which platforms. The Cocoa text system recognizes \n, \r, and \r\n all as paragraph breaks, equivalent to NSParagraphSeparatorCharacter. When it inserts paragraph breaks, for example with insertNewline:, it uses \n. Ordinarily, NSLineSeparatorCharacter is used only for breaks that are specifically line breaks and not paragraph breaks, for example in insertLineBreak:, or for representing HTML <br> elements.
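Foundation's NSString range methods appear to follow the same convention, which makes it easy to check how a given break is classified. In this sketch (the sample string is arbitrary), paragraphRangeForRange: ends paragraphs at \r\n and U+2029 but not at U+2028, while lineRangeForRange: also ends a line at U+2028. Each returned range includes its terminator.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // "A\r\nB<U+2028>C<U+2029>D"
        NSString *text = [NSString stringWithFormat:@"A\r\nB%CC%CD",
                          (unichar)0x2028, (unichar)0x2029];

        NSRange firstParagraph  = [text paragraphRangeForRange:NSMakeRange(0, 0)]; // "A\r\n"
        NSRange secondParagraph = [text paragraphRangeForRange:NSMakeRange(3, 0)]; // "B<LS>C<PS>"
        NSRange secondLine      = [text lineRangeForRange:NSMakeRange(3, 0)];      // "B<LS>"

        NSLog(@"%@ %@ %@", NSStringFromRange(firstParagraph),
              NSStringFromRange(secondParagraph), NSStringFromRange(secondLine));
    }
    return 0;
}
```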
If your breaks are specifically intended as line breaks and not paragraph breaks, you should typically use NSLineSeparatorCharacter. Otherwise, you may use \n, \r, or \r\n, depending on what other software is likely to process your text. The default choice for Cocoa is usually \n.
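When text is headed for software that recognizes only one convention, one option is to rewrite every paragraph break before exporting it. The sketch below is one possible approach, not a Cocoa API: NormalizeParagraphBreaks is a hypothetical helper that enumerates paragraphs and re-emits each break as a caller-chosen terminator, leaving any U+2028 line separators inside a paragraph untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites every paragraph break (\n, \r, \r\n, U+2029)
// as `terminator`. Line separators (U+2028) inside a paragraph are kept.
static NSString *NormalizeParagraphBreaks(NSString *string, NSString *terminator) {
    NSMutableString *result = [NSMutableString stringWithCapacity:string.length];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByParagraphs
                            usingBlock:^(NSString *paragraph, NSRange paragraphRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [result appendString:paragraph];
        // enclosingRange extends past paragraphRange only when a terminator follows,
        // so a final paragraph without a break does not gain one.
        if (NSMaxRange(enclosingRange) > NSMaxRange(paragraphRange)) {
            [result appendString:terminator];
        }
    }];
    return result;
}

int main(void) {
    @autoreleasepool {
        NSString *mixed = @"One\nTwo\rThree\r\nFour";
        // Prepare the text for software that expects \r\n.
        NSString *windows = NormalizeParagraphBreaks(mixed, @"\r\n");
        NSLog(@"%@", [windows stringByReplacingOccurrencesOfString:@"\r\n" withString:@"<CRLF>"]);
    }
    return 0;
}
```

Passing @"\n" instead produces the usual Cocoa default.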
Separating a String “by Paragraph”
A common approach to separating a string “by paragraph” is simply to use:
NSArray *arr = [myString componentsSeparatedByString:@"\n"];
This, however, ignores the fact that there are a number of other ways in which a paragraph or line break may be represented in a string: \r, \r\n, or the Unicode separators.
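Two short cases show what goes wrong with the naive split. The sample strings are arbitrary: with a \r\n break, the first component keeps a stray \r, and with a U+2029 break, the string is not split at all.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // \r\n break: splitting on "\n" leaves "\r" at the end of the first component.
        NSArray<NSString *> *crlf = [@"First\r\nSecond" componentsSeparatedByString:@"\n"];
        NSLog(@"first component length: %lu", (unsigned long)crlf.firstObject.length);

        // U+2029 break: splitting on "\n" finds nothing to split on.
        NSString *unicodeText = [NSString stringWithFormat:@"First%CSecond", (unichar)0x2029];
        NSArray<NSString *> *ps = [unicodeText componentsSeparatedByString:@"\n"];
        NSLog(@"component count: %lu", (unsigned long)ps.count);
    }
    return 0;
}
```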
Instead, you can use methods such as enumerateSubstringsInRange:options:usingBlock: and enumerateLinesUsingBlock:, which take into account the variety of possible line terminations, as illustrated in the following example.
NSString *string = @"First paragraph.\nSecond paragraph.\r\nThird paragraph."; // sample input
NSRange range = NSMakeRange(0, string.length);
[string enumerateSubstringsInRange:range
                           options:NSStringEnumerationByParagraphs
                        usingBlock:^(NSString * _Nullable paragraph, NSRange paragraphRange, NSRange enclosingRange, BOOL * _Nonnull stop) {
    // paragraph excludes its terminator; enclosingRange also covers the terminator.
    NSLog(@"%@", paragraph);
}];
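Where existing code expects an array, as the componentsSeparatedByString: approach does, the enumeration can collect its results instead. This is a minimal sketch; ParagraphsOfString and LinesOfString are hypothetical helpers, not Foundation API. The sample string includes a U+2028 line separator so the difference between paragraphs and lines is visible.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits by paragraph, honoring \n, \r, \r\n, and U+2029.
static NSArray<NSString *> *ParagraphsOfString(NSString *string) {
    NSMutableArray<NSString *> *paragraphs = [NSMutableArray array];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByParagraphs
                            usingBlock:^(NSString *paragraph, NSRange paragraphRange,
                                         NSRange enclosingRange, BOOL *stop) {
        [paragraphs addObject:paragraph];
    }];
    return paragraphs;
}

// Hypothetical helper: splits by line, so U+2028 also ends an element.
static NSArray<NSString *> *LinesOfString(NSString *string) {
    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    [string enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
        [lines addObject:line];
    }];
    return lines;
}

int main(void) {
    @autoreleasepool {
        // 0x2028 is the Unicode line separator (NSLineSeparatorCharacter in AppKit).
        NSString *text = [NSString stringWithFormat:@"One\nTwo\r\nThree%CStill three\rFour",
                          (unichar)0x2028];
        NSLog(@"paragraphs: %@", ParagraphsOfString(text));
        NSLog(@"lines: %@", LinesOfString(text));
    }
    return 0;
}
```

With this sample, ParagraphsOfString keeps "Three" and "Still three" in one element because U+2028 does not end a paragraph, while LinesOfString returns them separately.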