If the data file was edited on Windows and saved in UTF-16, reading it with R can fail with the errors below. There are three ways to solve this.
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", na.strings=c("", "NA"), sep="\t", header=T)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>R'
In addition: Warning messages:
1: In read.table("/media/xxx/sampInfo_origin.txt", :
line 1 appears to contain embedded nulls
2: In read.table("/media/xxx/sampInfo_origin.txt", :
line 2 appears to contain embedded nulls
3: In read.table("/media/xxx/sampInfo_origin.txt", :
line 3 appears to contain embedded nulls
4: In read.table("/media/xxx/sampInfo_origin.txt", :
line 4 appears to contain embedded nulls
5: In read.table("/media/albert/xxx/sampInfo_origin.txt", :
line 5 appears to contain embedded nulls
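The "embedded nulls" warnings and the <ff><fe> bytes in the error point to UTF-16LE. In that encoding every ASCII character takes two bytes, the second of which is 0x00, and the file begins with the byte-order mark (BOM) FF FE. The short R sketch below illustrates this and adds a quick check for the BOM; has_utf16le_bom() is a hypothetical helper, not part of base R.

```r
# How ASCII text looks once encoded as UTF-16LE: each character is followed
# by a 0x00 byte, which read.table() reports as "embedded nulls".
utf16_bytes <- iconv("Run", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16_bytes)
#> [1] 52 00 75 00 6e 00

# Hypothetical helper: TRUE if the file starts with the UTF-16LE BOM (FF FE).
# Note: UTF-32LE files also start with FF FE; they are not distinguished here.
has_utf16le_bom <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 2L)
  identical(head_bytes, as.raw(c(0xFF, 0xFE)))
}
```

Calling has_utf16le_bom() on a file before reading it tells you whether a fileEncoding argument is likely to be needed.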
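Solution 1: specify the encoding with fileEncoding="UTF16LE" or fileEncoding="UTF16". The <ff><fe> bytes in the error message are the UTF-16 little-endian byte-order mark, which is why UTF-16LE is the right choice here.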
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16LE", sep="\t", header=T)
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16", sep="\t", header=T)
> head(sampInfo)
Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72 GSMxxx25 66 <NA> <NA> <NA> NA
2 SRRxxx73 GSMxxx26 72 <NA> <NA> <NA> NA
3 SRRxxx75 GSMxxx28 61 <NA> <NA> <NA> NA
4 SRRxxx74 GSMxxx27 72 <NA> <NA> <NA> NA
5 SRRxxx76 GSMxxx29 50 <NA> <NA> <NA> NA
6 SRRxxx77 GSMxxx30 59 <NA> <NA> <NA> NA
disease_activity donor gender leflumide nsaids othermeds phenotype
1 <NA> C137 male <NA> <NA> <NA> Healthy
2 <NA> C141 male <NA> <NA> <NA> Healthy
3 <NA> C383 male <NA> <NA> <NA> Healthy
4 <NA> C148 female <NA> <NA> <NA> Healthy
5 <NA> C391 female <NA> <NA> <NA> Healthy
6 <NA> C392 female <NA> <NA> <NA> Healthy
classification status plaquenil rituximab steroids sulfasalazine tissue
1 H H <NA> <NA> <NA> <NA> Blood
2 H H <NA> <NA> <NA> <NA> Blood
3 H H <NA> <NA> <NA> <NA> Blood
4 H H <NA> <NA> <NA> <NA> Blood
5 H H <NA> <NA> <NA> <NA> Blood
6 H H <NA> <NA> <NA> <NA> Blood
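Solution 1 can be automated by inspecting the first bytes before calling read.table(). This is a minimal sketch, assuming the only cases worth handling are a UTF-16LE BOM (FF FE), a UTF-8 BOM (EF BB BF) or no BOM at all; read_table_bom() is a hypothetical name. It relies on R's documented behaviour of dropping the BOM when the connection encoding is "UTF-16LE" or "UTF-8-BOM", and it uses the hyphenated encoding name "UTF-16LE".

```r
# Hypothetical helper: pick fileEncoding from the byte-order mark (BOM),
# then delegate to read.table(). Extra arguments are passed through.
read_table_bom <- function(path, ...) {
  bom <- readBin(path, what = "raw", n = 3L)
  encoding <- ""  # "" = native encoding, no re-encoding
  if (length(bom) >= 2L && identical(bom[1:2], as.raw(c(0xFF, 0xFE)))) {
    # UTF-16LE (Windows "Unicode"); R removes this BOM for "UTF-16LE"
    encoding <- "UTF-16LE"
  } else if (length(bom) >= 3L &&
             identical(bom[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))) {
    # UTF-8 with BOM; "UTF-8-BOM" tells R to strip the BOM
    encoding <- "UTF-8-BOM"
  }
  read.table(path, fileEncoding = encoding, ...)
}
```

For example, read_table_bom("/media/xxx/sampInfo_origin.txt", sep = "\t", header = TRUE, na.strings = c("", "NA")) should behave like the fileEncoding="UTF16LE" command above on this file, while plain UTF-8 files are read without any encoding argument.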
Solution 2: open the file in Excel, save it as a CSV file, and read the CSV with read.csv.
> sampInfo=read.csv("/media/xxx/sampInfo_origin.csv", comment.char = "#", sep=",", header=T)
> head(sampInfo)
Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72 GSMxxx25 66 <NA> <NA> <NA> NA
2 SRRxxx73 GSMxxx26 72 <NA> <NA> <NA> NA
3 SRRxxx75 GSMxxx28 61 <NA> <NA> <NA> NA
4 SRRxxx74 GSMxxx27 72 <NA> <NA> <NA> NA
5 SRRxxx76 GSMxxx29 50 <NA> <NA> <NA> NA
6 SRRxxx77 GSMxxx30 59 <NA> <NA> <NA> NA
disease_activity donor gender leflumide nsaids othermeds phenotype
1 <NA> C137 male <NA> <NA> <NA> Healthy
2 <NA> C141 male <NA> <NA> <NA> Healthy
3 <NA> C383 male <NA> <NA> <NA> Healthy
4 <NA> C148 female <NA> <NA> <NA> Healthy
5 <NA> C391 female <NA> <NA> <NA> Healthy
6 <NA> C392 female <NA> <NA> <NA> Healthy
classification status plaquenil rituximab steroids sulfasalazine tissue
1 H H <NA> <NA> <NA> <NA> Blood
2 H H <NA> <NA> <NA> <NA> Blood
3 H H <NA> <NA> <NA> <NA> Blood
4 H H <NA> <NA> <NA> <NA> Blood
5 H H <NA> <NA> <NA> <NA> Blood
6 H H <NA> <NA> <NA> <NA> Blood
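If Excel is not at hand, the conversion to CSV can also be scripted in R. This is a sketch, assuming the input is a tab-separated UTF-16LE file like sampInfo_origin.txt; utf16_tsv_to_csv() is a hypothetical helper, and the CSV it writes is UTF-8 rather than whatever encoding Excel would choose.

```r
# Hypothetical helper: read a tab-separated UTF-16LE file once and write it
# back out as a UTF-8 CSV that read.csv() can load without fileEncoding.
utf16_tsv_to_csv <- function(in_path, out_path) {
  df <- read.table(in_path, fileEncoding = "UTF-16LE", sep = "\t",
                   header = TRUE, na.strings = c("", "NA"),
                   check.names = FALSE)
  # Missing values are written as NA, so they survive the round trip
  write.csv(df, out_path, row.names = FALSE, fileEncoding = "UTF-8")
  invisible(df)
}
```

Reading the whole table and writing it back keeps empty cells as NA, which matches the na.strings setting used in the original read.table() call.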
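Solution 3: on Linux, open sampInfo_origin.txt in gedit and save it as sampInfo_origin01.txt with “Character Encoding” set to UTF-8 and “Line ending” set to “Unix/Linux”. The converted file can then be read without any fileEncoding argument.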
> sampInfo=read.table("/media/xxx/sampInfo_origin01.txt", sep="\t", header=T)
> head(sampInfo,2)
Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72 GSMxxx25 66 <NA> <NA> <NA> NA
2 SRRxxx73 GSMxxx26 72 <NA> <NA> <NA> NA
disease_activity donor gender leflumide nsaids othermeds phenotype
1 <NA> C137 male <NA> <NA> <NA> Healthy
2 <NA> C141 male <NA> <NA> <NA> Healthy
classification status plaquenil rituximab steroids sulfasalazine tissue
1 H H <NA> <NA> <NA> <NA> Blood
2 H H <NA> <NA> <NA> <NA> Blood
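When many files need converting, the gedit step can be scripted in R instead. A minimal sketch, assuming the input is UTF-16LE with an optional FF FE BOM and Windows CRLF line endings; convert_to_utf8_unix() is a hypothetical helper that writes UTF-8 with LF line endings, which is what the gedit settings above are meant to produce.

```r
# Hypothetical helper: re-encode a UTF-16LE text file as UTF-8 with
# Unix (LF) line endings. The whole file is read into memory.
convert_to_utf8_unix <- function(in_path, out_path) {
  bytes <- readBin(in_path, what = "raw", n = file.size(in_path))
  # Drop the UTF-16LE byte-order mark if present
  if (length(bytes) >= 2L && identical(bytes[1:2], as.raw(c(0xFF, 0xFE)))) {
    bytes <- bytes[-(1:2)]
  }
  # iconv() accepts a list of raw vectors and returns a character string
  text <- iconv(list(bytes), from = "UTF-16LE", to = "UTF-8")
  if (is.na(text)) stop("input is not valid UTF-16LE")
  # Convert Windows CRLF line endings to Unix LF
  text <- gsub("\r\n", "\n", text, fixed = TRUE)
  con <- file(out_path, open = "wb")
  on.exit(close(con))
  writeBin(charToRaw(text), con)
  invisible(out_path)
}
```

For instance, convert_to_utf8_unix("/media/xxx/sampInfo_origin.txt", "/media/xxx/sampInfo_origin01.txt") followed by the plain read.table() call above should give the same result as the gedit route.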