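During data processing you will inevitably run into text strings that need to be split, replaced, and so on. R provides several functions that handle these tasks quickly and conveniently.
String-splitting function strsplit()
Its call form is: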
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
Here split is the delimiter to split on. The result is returned as a list by default. If you want a vector instead, use the unlist() function.
For example:
> x = "character vector, each element of which is to be split. Other inputs, including a factor, will give an error." # create a character string x
> x # print x
[1] "character vector, each element of which is to be split. Other inputs, including a factor, will give an error."
> strsplit(x, split = "\\s+") # `\s+` matches one or more whitespace characters, including spaces, tabs, and newlines. The first `\` escapes the second `\`.
[[1]]
[1] "character" "vector," "each" "element" "of"
[6] "which" "is" "to" "be" "split."
[11] "Other" "inputs," "including" "a" "factor,"
[16] "will" "give" "an" "error."
> strsplit(x, split = " ") # split = " " splits on each single space. If split is set to the empty string "", strsplit instead splits the string into individual characters.
[[1]]
[1] "character" "vector," "each" "element" "of"
[6] "which" "is" "to" "be" "split."
[11] "Other" "inputs," "including" "a" "factor,"
[16] "will" "give" "an" "error."
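As a minimal sketch of the points above (flattening the list with unlist(), splitting on the empty string, and the fixed argument), the following uses a short sample string `s` that is hypothetical and chosen only for illustration:

```r
# Hypothetical sample string, not taken from the example above
s <- "a.b c"

# strsplit() returns a list; unlist() flattens it into a character vector
words <- unlist(strsplit(s, split = " "))
words   # "a.b" "c"

# An empty split breaks the string into single characters
chars <- unlist(strsplit(s, split = ""))
chars   # "a" "." "b" " " "c"

# fixed = TRUE treats split as a literal string, not a regular expression,
# so "." matches only a real dot
parts <- unlist(strsplit(s, split = ".", fixed = TRUE))
parts   # "a" "b c"
```

Without fixed = TRUE, a split of "." would be read as the regular expression "any character", which is rarely what is wanted.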
substr() and substring() are the most commonly used functions for extracting part of a string. The two do the same job and differ only in how their arguments are specified.
substr(): both the start and stop arguments must be supplied; omitting either one raises an error.
substring(): only the first argument needs to be supplied. If last is not set, it defaults to 1000000L, a value large enough that it normally reaches the end of the string.
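The difference between the two can be sketched as follows. The sample string `s` is hypothetical, and tryCatch() is used only so that the error from the missing stop argument does not halt the script:

```r
s <- "abcdefgh"   # hypothetical sample string

# substr() needs both start and stop
mid <- substr(s, 2, 4)
mid    # "bcd"

# substring() can omit last and then runs to the end of the string
tail_part <- substring(s, 3)
tail_part   # "cdefgh"

# Calling substr() without stop signals an error; catch it for illustration
res <- tryCatch(substr(s, 2), error = function(e) "error")
res    # "error"
```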
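chartr(): replaces specific characters in a string with the characters you want. The old argument gives the characters to be replaced in the original string; new gives the characters that replace them.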
chartr(old, new, x)
chartr replaces the old characters in the object x with the new characters.
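This is somewhat similar to rename in the shell, but old must not contain more characters than new, and any characters in new beyond the length of old are ignored. Unlike rename, chartr cannot replace arbitrary strings: it maps characters one for one by position, so its use is somewhat limited.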
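A small sketch of this position-by-position mapping and of the length rule follows. The sample string `s` is hypothetical, and tryCatch() is there only to show the error without stopping the script:

```r
s <- "hello world"   # hypothetical sample string

# Each character in old maps to the character at the same position in new:
# "l" becomes "0" and "o" becomes "1"
mapped <- chartr("lo", "01", s)
mapped   # "he001 w1r0d"

# Extra characters at the end of new are ignored ("X" is never used)
extra <- chartr("l", "0X", s)
extra    # "he00o wor0d"

# old longer than new signals an error
res <- tryCatch(chartr("lo", "0", s), error = function(e) "error")
res      # "error"
```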
gsub() replaces every match; sub() replaces only the first match.
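The contrast is easiest to see on a string with several matches. This is a minimal sketch with a hypothetical sample string `s`:

```r
s <- "a-b-c"   # hypothetical sample string

# sub() replaces only the first "-"
first_only <- sub("-", "+", s)
first_only   # "a+b-c"

# gsub() replaces every "-"
all_matches <- gsub("-", "+", s)
all_matches  # "a+b+c"
```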