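What this covers:
- Getting to know the sed editor
- Getting started with the gawk editor
- sed editor basics
One of the most common uses of shell scripts is processing text files, but the shell's own commands are clumsy at manipulating file contents. To process any kind of data in a shell script, it helps to know the sed and gawk tools on Linux, which can greatly simplify the data-handling tasks involved.
Text processing
When we need to process text files automatically without opening an interactive text editor, sed and gawk are our best options.
The sed editor
Also called the stream editor, sed edits a data stream according to a set of rules supplied before it starts processing the data.
The sed editor manipulates the data in the stream according to commands, which can be typed at the terminal or stored in a script file.
sed performs the following steps:
- Reads one line of data from the input at a time
- Matches that data against the supplied commands
- Modifies the data in the stream as the commands specify
- Outputs the new data to STDOUT
This process repeats until every line in the stream has been processed.
The sed command has the following format: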
sed options script file
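The options modify the behavior of the sed command:
Option | Description |
---|---|
-e script | Add the commands in script to those run while processing the input |
-f file | Add the commands in file to those run while processing the input |
-n | Suppress the automatic output of each line; use the print command to produce output |
The script parameter specifies a single command to apply to the stream data. To use more than one command, either give them on the command line with the -e option or put them in a separate file and use the -f option.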
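The three options can be compared side by side in a minimal sketch. The file names fish.txt and num.sed are hypothetical and are created in a temporary directory purely for illustration.

```shell
# Hypothetical sample files in a throwaway directory (names are illustrative only).
tmpdir=$(mktemp -d)
printf 'one fish\ntwo fish\n' > "$tmpdir/fish.txt"
printf 's/one/1/\ns/two/2/\n' > "$tmpdir/num.sed"

# -e: give each command on the command line.
out_e=$(sed -e 's/fish/cat/' -e 's/one/1/' "$tmpdir/fish.txt")

# -f: read the commands from a file, one per line.
out_f=$(sed -f "$tmpdir/num.sed" "$tmpdir/fish.txt")

# -n: suppress the automatic output; the p flag then prints only changed lines.
out_n=$(sed -n 's/two/2/p' "$tmpdir/fish.txt")

printf '%s\n---\n%s\n---\n%s\n' "$out_e" "$out_f" "$out_n"
rm -rf "$tmpdir"
```

Each result is captured in a variable so the three behaviors can be inspected separately.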
Defining editor commands on the command line
By default sed applies the given commands to the STDIN input stream, so it works well with pipes.
wsx@wsx-laptop:~/tmp$ echo "This is a test" | sed 's/test/big test/'
This is a big test
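The s command replaces text matching the first pattern between the slashes with the second string (the whole match is replaced, and the pattern may be a regular expression). In this example, big test replaces test.
Suppose we have the following text: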
wsx@wsx-laptop:~/tmp$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
Run the command and look at the output:
wsx@wsx-laptop:~/tmp$ sed 's/dog/cat/' data1.txt
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
As the output shows, the string matching the pattern was changed on every line.
Remember that sed does not modify the data in the text file; it only sends the modified data to STDOUT.
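To make that point concrete, here is a small sketch, assuming hypothetical files in.txt and out.txt in a temporary directory. It redirects sed's STDOUT to a second file and confirms the input file is untouched.

```shell
tmpdir=$(mktemp -d)
printf 'The lazy dog.\n' > "$tmpdir/in.txt"

# sed writes the edited stream to STDOUT; redirect it to keep the result.
sed 's/dog/cat/' "$tmpdir/in.txt" > "$tmpdir/out.txt"

original=$(cat "$tmpdir/in.txt")   # still the unmodified text
edited=$(cat "$tmpdir/out.txt")    # the substituted text
printf 'input file:  %s\noutput file: %s\n' "$original" "$edited"
rm -rf "$tmpdir"
```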
Using multiple editor commands on the command line
The -e option lets us run more than one command:
wsx@wsx-laptop:~/tmp$ sed -e 's/brown/green/; s/dog/cat/' data1.txt
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
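Both commands are applied to every line of the file. The commands must be separated by a semicolon, with no space between the end of a command and the semicolon.
If you would rather not use semicolons, you can separate the commands using the bash shell's secondary prompt.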
wsx@wsx-laptop:~/tmp$ sed -e '
> s/brown/green/
> s/fox/elephant/
> s/dog/cat/' data1.txt
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
Reading editor commands from a file
When there are many sed commands to run, it is more convenient to put them in a separate file and point sed at it with the -f option.
wsx@wsx-laptop:~/tmp$ cat script1.sed
s/brown/green/
s/fox/elephant/
s/dog/cat/
wsx@wsx-laptop:~/tmp$ sed -f script1.sed data1.txt
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
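In this case there is no need for a semicolon after each command; sed treats each line as a separate command.
The gawk program
gawk is a more advanced text-processing tool that provides a programming-like environment for modifying and reorganizing the data in a file.
Note: not every distribution installs gawk by default, so install it first if it is missing.
gawk is the GNU version of the original Unix awk. It takes stream editing a step further by offering a programming language rather than just editor commands.
With it we can:
- Define variables to store data
- Use arithmetic and string operators to work on data
- Use structured programming concepts to add logic to the data processing
- Generate formatted reports by extracting data elements from a file and rearranging or reformatting them
gawk's report-generation ability is typically used to extract data elements from large text files and format them into a readable report, so the important data is easier to read.
Basic command format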
gawk options program file
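The available gawk options are shown below:
Option | Description |
---|---|
-F fs | Specify the field separator that delimits the data fields in a line |
-f file | Read the program from the specified file |
-v var=value | Define a variable in the gawk program and its default value |
-mf N | Specify the maximum number of fields to process in the data file |
-mr N | Specify the maximum record size in the data file |
-W keyword | Specify the compatibility mode or warning level for gawk |
gawk's strength lies in its program scripts (it pays to play to a tool's strengths): we can write a script that reads lines of text, processes and displays the data, and produces any kind of output report.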
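As a small illustration of the -v option from the table, the sketch below passes a variable into the program. The variable name prefix is hypothetical, and the fallback to plain awk is an assumption made only so the example runs where gawk is not installed.

```shell
# Use gawk when available, otherwise fall back to the system awk
# (an assumption for portability; the program is the same either way).
AWK=$(command -v gawk || command -v awk)

# -v assigns the variable "prefix" before the program starts running.
greeting=$(printf 'World\n' | "$AWK" -v prefix="Hello" '{print prefix ", " $1 "!"}')
printf '%s\n' "$greeting"
```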
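Reading the script from the command line
The script commands must be enclosed in a pair of braces, and because the gawk command line expects the script to be a single text string, the script must also be wrapped in single quotes.
Here is a simple example: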
wsx@wsx-laptop:~/tmp$ gawk '{print "Hello World!"}'
Hello World!
This is a test
Hello World!
This is
Hello World!
Hello World!
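The print command writes text to STDOUT. Running this command may be a little disappointing at first because nothing happens: no file name was given, so gawk waits for data from STDIN. Each time we type a line and press Enter, gawk runs the program script on that line.
To end the program we must signal that the data stream is finished. In bash, the Ctrl+D key combination generates an EOF (End-of-File) character.
Using data field variables
One of gawk's main features is its ability to work with the data in text files: it automatically assigns a variable to each data element in a line.
- $0 represents the entire line of text
- $1 represents the first data field in the line
- $2 represents the second data field in the line
- $n represents the nth data field in the line
When gawk reads a line of text, it splits the line into data fields using the predefined field separator. The default field separator is any whitespace character (such as a space or a tab).
In the following example gawk reads a text file and displays the first data field of each line.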
wsx@wsx-laptop:~/tmp$ cat data2.txt
One line of test text.
Two lines of test text.
Three lines of test text.
wsx@wsx-laptop:~/tmp$ gawk '{print $1}' data2.txt
One
Two
Three
The -F option specifies a different field separator:
wsx@wsx-laptop:~/tmp$ gawk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
...
This short program displays the first data field of each entry in the system's password file.
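The effect of -F can be checked on a single inline record rather than the real /etc/passwd. This is only a sketch; the sample entry is copied from the format shown in these notes, and the fallback to plain awk is an assumption for machines without gawk.

```shell
AWK=$(command -v gawk || command -v awk)
entry='wsx:x:1000:1000::/home/wsx:/bin/bash'

# With -F: the colon splits the record; $1 is the user name and $7 the shell.
user_shell=$(printf '%s\n' "$entry" | "$AWK" -F: '{print $1, $7}')

# Without -F the record contains no whitespace, so the whole line is field $1.
whole=$(printf '%s\n' "$entry" | "$AWK" '{print $1}')

printf '%s\n%s\n' "$user_shell" "$whole"
```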
Using multiple commands in the program script
Just put a semicolon between the commands.
wsx@wsx-laptop:~/tmp$ echo "My name is Shixiang" | gawk '{$4="Christine"; print $0}'
My name is Christine
As with sed, you can also type the program script one line at a time at the secondary prompt.
Reading the program from a file
wsx@wsx-laptop:~/tmp$ cat script2.gawk
{print $1 " 's home directory is " $6}
wsx@wsx-laptop:~/tmp$ gawk -F: -f script2.gawk /etc/passwd
root 's home directory is /root
daemon 's home directory is /usr/sbin
bin 's home directory is /bin
sys 's home directory is /dev
sync 's home directory is /bin
games 's home directory is /usr/games
man 's home directory is /var/cache/man
lp 's home directory is /var/spool/lpd
mail 's home directory is /var/mail
news 's home directory is /var/spool/news
uucp 's home directory is /var/spool/uucp
proxy 's home directory is /bin
...
A program file can contain several commands:
wsx@wsx-laptop:~/tmp$ cat script3.gawk
{
text = "'s home directory is "
print $1 text $6
}
wsx@wsx-laptop:~/tmp$ gawk -F: -f script3.gawk /etc/passwd
root's home directory is /root
daemon's home directory is /usr/sbin
bin's home directory is /bin
sys's home directory is /dev
sync's home directory is /bin
games's home directory is /usr/games
man's home directory is /var/cache/man
lp's home directory is /var/spool/lpd
mail's home directory is /var/mail
news's home directory is /var/spool/news
...
Running a script before processing data
The BEGIN keyword makes gawk run the script that follows it before reading any data.
wsx@wsx-laptop:~/tmp$ cat data3.txt
Line 1
Line 2
Line 3
wsx@wsx-laptop:~/tmp$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
After gawk runs the BEGIN script, it uses the second script to process the file data.
Running a script after processing data
Like BEGIN, the END keyword lets us specify a script that gawk runs after it has read all the data.
wsx@wsx-laptop:~/tmp$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}
> END {print "End of File"}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
End of File
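BEGIN and END become more useful once a variable carries state between them. The following sketch counts input lines; the variable name count is hypothetical, and the fallback to plain awk is an assumption for machines without gawk.

```shell
AWK=$(command -v gawk || command -v awk)

report=$(printf 'Line 1\nLine 2\nLine 3\n' | "$AWK" '
BEGIN { count = 0; print "Counting lines" }   # runs once, before any input
      { count = count + 1 }                    # runs for every input line
END   { print "Total lines: " count }          # runs once, after all input
')
printf '%s\n' "$report"
```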
Putting everything together gives a neat little program script that builds a complete report from a simple data file.
wsx@wsx-laptop:~/tmp$ cat script4.gawk
BEGIN {
print "The latest list of users and shells"
print " UserID \t Shell"
print "-------- \t ------"
FS=":"
}
{
print $1 " \t " $7
}
END {
print "This concludes the listing"
}
wsx@wsx-laptop:~/tmp$ gawk -f script4.gawk /etc/passwd
The latest list of users and shells
UserID Shell
-------- ------
root /bin/bash
daemon /usr/sbin/nologin
bin /usr/sbin/nologin
sys /usr/sbin/nologin
sync /bin/sync
games /usr/sbin/nologin
man /usr/sbin/nologin
lp /usr/sbin/nologin
mail /usr/sbin/nologin
news /usr/sbin/nologin
uucp /usr/sbin/nologin
proxy /usr/sbin/nologin
www-data /usr/sbin/nologin
backup /usr/sbin/nologin
list /usr/sbin/nologin
irc /usr/sbin/nologin
gnats /usr/sbin/nologin
nobody /usr/sbin/nologin
systemd-timesync /bin/false
systemd-network /bin/false
systemd-resolve /bin/false
systemd-bus-proxy /bin/false
syslog /bin/false
_apt /bin/false
lxd /bin/false
messagebus /bin/false
uuidd /bin/false
dnsmasq /bin/false
sshd /usr/sbin/nologin
pollinate /bin/false
wsx /bin/bash
This concludes the listing
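We will continue with advanced gawk programming later.
sed editor basics
This section introduces some basic commands and features that can be built into scripts.
More substitution options
We have already used the s command to replace text within a line. The command has some additional options.
Substitution flags
By default, the substitute command s replaces only the first occurrence in each line. To replace text at other positions in the line, use a substitution flag, which is placed after the substitution command string.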
s/pattern/replacement/flags
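There are four kinds of substitution flags:
- A number, indicating which occurrence of the pattern to replace
- g, indicating that every occurrence of the matching text is replaced
- p, indicating that the line is printed if a substitution was made
- w file, which writes the result of the substitution to a file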
wsx@wsx-laptop:~/tmp$ cat data4.txt
This is a test of the test script.
This is the second test of the test script.
wsx@wsx-laptop:~/tmp$ sed 's/test/trial/2' data4.txt
This is a test of the trial script.
This is the second test of the trial script.
This command replaces only the second occurrence of the pattern in each line, whereas the g flag replaces every occurrence.
wsx@wsx-laptop:~/tmp$ sed 's/test/trial/g' data4.txt
This is a trial of the trial script.
This is the second trial of the trial script.
The p substitution flag prints lines that match the pattern given in the substitute command. It is usually combined with sed's -n option.
wsx@wsx-laptop:~/tmp$ cat data5.txt
This is a test line.
This is a different line.
wsx@wsx-laptop:~/tmp$ sed -n 's/test/trial/p' data5.txt
This is a trial line.
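The -n option suppresses sed's normal output, while the p flag outputs the modified lines. Used together, they print only the lines changed by the substitute command.
The w flag produces the same output but also saves it (only the lines changed by the substitute command) to the specified file.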
wsx@wsx-laptop:~/tmp$ sed 's/test/trial/w test.txt' data5.txt
This is a trial line.
This is a different line.
wsx@wsx-laptop:~/tmp$ cat test.txt
This is a trial line.
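The difference between the default behavior, a numeric flag, and the g flag is easiest to see on one line with three matches. This sketch uses a made-up input string.

```shell
line='test test test'

first=$(printf '%s\n' "$line" | sed 's/test/trial/')     # no flag: first match only
second=$(printf '%s\n' "$line" | sed 's/test/trial/2')   # 2: second match only
all=$(printf '%s\n' "$line" | sed 's/test/trial/g')      # g: every match

printf '%s\n%s\n%s\n' "$first" "$second" "$all"
```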
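Replacing characters
Some characters are awkward to use in a substitution pattern; the forward slash is a common example.
Substituting path names in a file gets tedious. For example, to replace the bash shell with the C shell in /etc/passwd, each slash has to be escaped with a backslash: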
wsx@wsx-laptop:~/tmp$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
wsx@wsx-laptop:~/tmp$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd
root:x:0:0:root:/root:/bin/csh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
To avoid this, the sed editor lets us choose a different character as the string delimiter in the substitute command:
wsx@wsx-laptop:~/tmp$ sed 's!/bin/bash!/bin/csh!' /etc/passwd
root:x:0:0:root:/root:/bin/csh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
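As a quick check that the two spellings are equivalent, the sketch below runs both on one sample record. The choice of # as the delimiter is arbitrary; any character that does not appear in the pattern or replacement should work.

```shell
entry='root:x:0:0:root:/root:/bin/bash'

# Default delimiter: every slash in the path must be escaped.
escaped=$(printf '%s\n' "$entry" | sed 's/\/bin\/bash/\/bin\/csh/')

# Alternate delimiter: the path can be written as-is.
alternate=$(printf '%s\n' "$entry" | sed 's#/bin/bash#/bin/csh#')

printf '%s\n%s\n' "$escaped" "$alternate"
```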
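Using addresses
To apply a command only to a specific line or group of lines, use line addressing.
There are two forms:
- A numeric range of lines
- A text pattern that filters out lines
Both use the same format to specify the address: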
[address]command
Several commands can also be grouped under one address:
address {
command1
command2
command3
}
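Numeric line addressing
The sed editor numbers the first line of the text stream 1 and numbers the following lines sequentially.
The address can be a single line number, or a range given as a starting line number, a comma, and an ending line number.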
wsx@wsx-laptop:~/tmp$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
wsx@wsx-laptop:~/tmp$ sed '2s/dog/cat/' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
wsx@wsx-laptop:~/tmp$ sed '2,3s/dog/cat/' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
wsx@wsx-laptop:~/tmp$ sed '2,$s/dog/cat/' data1.txt # the dollar sign refers to the last line
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
Using text pattern filters
sed lets us specify a text pattern to filter the lines a command applies to, in the following format:
/pattern/command
For example, to change the default shell of the user wsx, we can use this sed command:
wsx@wsx-laptop:~/tmp$ grep wsx /etc/passwd
wsx:x:1000:1000:"",,,:/home/wsx:/bin/bash
wsx@wsx-laptop:~/tmp$ sed '/wsx/s/bash/csh/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
wsx:x:1000:1000:"",,,:/home/wsx:/bin/csh
Regular expressions let us build advanced pattern-matching expressions that match all kinds of data, combining wildcards and special characters into concise patterns for almost any form of text. We will cover them later.
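As a small preview of why regular expressions matter for addressing, compare a loose pattern with an anchored one. The backup entry below is invented for illustration: it happens to contain the text wsx in its comment field.

```shell
data='backup:x:34:34:wsx backup:/var/backups:/bin/bash
wsx:x:1000:1000::/home/wsx:/bin/bash'

# /wsx/ matches any line containing "wsx", so both entries are changed.
loose=$(printf '%s\n' "$data" | sed '/wsx/s/bash/csh/')

# /^wsx:/ matches only lines that start with "wsx:", so just that user changes.
strict=$(printf '%s\n' "$data" | sed '/^wsx:/s/bash/csh/')

printf '%s\n---\n%s\n' "$loose" "$strict"
```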
Grouping commands
Braces group several commands together.
wsx@wsx-laptop:~/tmp$ sed '2{
> s/fox/elephant/
> s/dog/cat/
> }' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown elephant jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
An address range can also be given before a group of commands.
wsx@wsx-laptop:~/tmp$ sed '3,${
s/brown/green/
s/lazy/active/
}' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
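Deleting lines
To delete specific lines from the text stream, use the delete command d, which deletes every line matching the given address. Use it with care: if you forget the address, every line in the stream is deleted.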
wsx@wsx-laptop:~/tmp$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
wsx@wsx-laptop:~/tmp$ sed 'd' data1.txt
The delete command is most useful when combined with an address.
wsx@wsx-laptop:~/tmp$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
wsx@wsx-laptop:~/tmp$ sed '3d' data6.txt
This is line number 1.
This is line number 2.
This is line number 4.
By a range of line numbers:
wsx@wsx-laptop:~/tmp$ sed '2,3d' data6.txt
This is line number 1.
This is line number 4.
Using the special end-of-text character:
wsx@wsx-laptop:~/tmp$ sed '2,$d' data6.txt
This is line number 1.
Pattern matching works as well:
wsx@wsx-laptop:~/tmp$ sed '/number 1/d' data6.txt
This is line number 2.
This is line number 3.
This is line number 4.
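sed deletes the lines containing text that matches the pattern.
Remember that sed does not modify the original file.
Two text patterns can also delete a range of lines, but be especially careful when doing this: the first pattern turns line deletion "on" and the second turns it "off". sed deletes every line between the two matching lines, including the matching lines themselves.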
wsx@wsx-laptop:~/tmp$ cat data7.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.
wsx@wsx-laptop:~/tmp$ sed '/1/,/3/d' data7.txt
This is line number 4.
The second line containing "1" triggered the delete command again, and because the stop pattern "3" was never found after it, the rest of the data stream was deleted.
wsx@wsx-laptop:~/tmp$ sed '/1/,/5/d' data7.txt
wsx@wsx-laptop:~/tmp$ sed '/2/,/4/d' data7.txt
This is line number 1.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.
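When the start pattern may recur, a numeric range is one way to avoid the accidental second deletion. The sketch below contrasts the two on a made-up six-line input.

```shell
sample='line 1
line 2
line 3
line 4
line 1 again
keep me'

# Pattern range: the second "1" reopens the range and no "3" follows to close it.
by_pattern=$(printf '%s\n' "$sample" | sed '/1/,/3/d')

# Numeric range: only lines 1 through 3 are deleted.
by_number=$(printf '%s\n' "$sample" | sed '1,3d')

printf '%s\n---\n%s\n' "$by_pattern" "$by_number"
```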
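Inserting and appending text
sed can insert and append lines of text in the data stream:
- The insert command i adds a new line before the specified line
- The append command a adds a new line after the specified line
Note that in their traditional form these commands cannot be written on a single line: the text to insert or append goes on its own line after i\ or a\. GNU sed also accepts the one-line form used in the examples here.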
wsx@wsx-laptop:~/tmp$ echo "Test Line 2" | sed 'i\Test Line 1'
Test Line 1
Test Line 2
wsx@wsx-laptop:~/tmp$ echo "Test Line 2" | sed 'a\Test Line 1'
Test Line 2
Test Line 1
To insert or append data inside the data stream, use addressing to tell sed where the data should appear.
wsx@wsx-laptop:~/tmp$ sed '3i\ This is an inserted line.' data6.txt
This is line number 1.
This is line number 2.
This is an inserted line.
This is line number 3.
This is line number 4.
wsx@wsx-laptop:~/tmp$ sed '3a\ This is an inserted line.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an inserted line.
This is line number 4.
To add lines of data at the end of the data stream, use $ as the address:
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is a new line.
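A command of roughly the following shape appends text after the last line. This is a sketch on a shortened two-line input, not necessarily the exact command that produced the output above. It uses the traditional multi-line form, where every added line except the last ends with a backslash.

```shell
appended=$(printf 'This is line number 1.\nThis is line number 2.\n' | sed '$a\
This is a new line.\
This is another new line.')

printf '%s\n' "$appended"
```

The $ address selects the last line of the stream, so both new lines land after it.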
Changing lines
The change command replaces the entire contents of a line in the data stream. It works the same way as the insert and append commands.
wsx@wsx-laptop:~/tmp$ sed '3c\This is a changed line.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line.
This is line number 4.
wsx@wsx-laptop:~/tmp$ sed '/number 3/c\This is a changed line.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line.
This is line number 4.
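The transform command
The transform command (y) is the only sed command that operates on single characters. Its format is:
[address]y/inchars/outchars/
The transform command maps the characters in inchars one-to-one onto those in outchars. If the two are not the same length, sed reports an error.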
wsx@wsx-laptop:~/tmp$ sed 'y/123/789/' data6.txt
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 4.
The transform command is global: it converts every occurrence of the specified characters in the line, regardless of where they appear.
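Both properties, the global one-to-one mapping and the length check, can be seen in a short sketch on a made-up input line.

```shell
# Every a, b, and c is mapped to its uppercase counterpart, wherever it occurs.
upper=$(printf 'a cab in a bay\n' | sed 'y/abc/ABC/')
printf '%s\n' "$upper"

# Mismatched lengths: sed refuses the script and exits with a non-zero status.
if printf 'x\n' | sed 'y/abc/AB/' 2>/dev/null; then
    mismatch=accepted
else
    mismatch=rejected
fi
printf 'unequal lengths: %s\n' "$mismatch"
```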
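Printing revisited
Three more commands print information from the data stream:
- The p command prints a text line
- The equal sign (=) command prints line numbers
- The l command lists a line, making non-printable characters visible
Printing lines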
wsx@wsx-laptop:~/tmp$ echo "this is a test" | sed 'p'
this is a test
this is a test
The p command prints the existing data text. Its most common use is printing lines that match a text pattern.
wsx@wsx-laptop:~/tmp$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
wsx@wsx-laptop:~/tmp$ sed -n '/number 3/p' data6.txt
This is line number 3.
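With the -n option on the command line, all other lines are suppressed and only the lines containing the matching text pattern are printed.
It is also a quick way to print selected lines from the data stream: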
wsx@wsx-laptop:~/tmp$ sed -n '2,3p' data6.txt
This is line number 2.
This is line number 3.
Printing line numbers
wsx@wsx-laptop:~/tmp$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
wsx@wsx-laptop:~/tmp$ sed '=' data1.txt
1
The quick brown fox jumps over the lazy dog.
2
The quick brown fox jumps over the lazy dog.
3
The quick brown fox jumps over the lazy dog.
4
The quick brown fox jumps over the lazy dog.
5
The quick brown fox jumps over the lazy dog.
6
The quick brown fox jumps over the lazy dog.
7
The quick brown fox jumps over the lazy dog.
8
The quick brown fox jumps over the lazy dog.
9
The quick brown fox jumps over the lazy dog.
This is handy for locating a specific text pattern:
wsx@wsx-laptop:~/tmp$ sed -n '/number 4/{
> =
> p
> }' data6.txt
4
This is line number 4.
Listing lines
wsx@wsx-laptop:~/tmp$ cat data9.txt
This line contains tabs.
wsx@wsx-laptop:~/tmp$ sed -n 'l' data9.txt
This\tline\tcontains\ttabs.$
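The same behavior can be reproduced without a data file. In this sketch, which assumes GNU sed's output format, the tab is written as \t and the end of the line is marked with $.

```shell
# printf emits a real tab between "a" and "b"; l makes it visible.
listed=$(printf 'a\tb\n' | sed -n 'l')
printf '%s\n' "$listed"
```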
Using files with sed
Writing to a file
The w command writes lines to a file. Its format is:
[address]w filename
Write the first two lines of the text to another file:
wsx@wsx-laptop:~/tmp$ sed '1,2w test.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
wsx@wsx-laptop:~/tmp$ cat test.txt
This is line number 1.
This is line number 2.
If you do not want the lines displayed on STDOUT (sed prints the data stream by default), use the -n option of the sed command.
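A minimal sketch of that combination follows, writing to a hypothetical file first_two.txt in a temporary directory. With -n, nothing reaches STDOUT while the addressed lines are still saved.

```shell
tmpdir=$(mktemp -d)

# -n silences STDOUT; the w command still writes lines 1-2 to the file.
shown=$(printf 'one\ntwo\nthree\n' | sed -n "1,2w $tmpdir/first_two.txt")
saved=$(cat "$tmpdir/first_two.txt")

printf 'stdout: [%s]\nfile:\n%s\n' "$shown" "$saved"
rm -rf "$tmpdir"
```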
Reading data
The read command is r. It inserts data from a separate file into the data stream.
wsx@wsx-laptop:~/tmp$ cat data12.txt
This is an added line.
This is the second added line.
wsx@wsx-laptop:~/tmp$ sed '3r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an added line.
This is the second added line.
This is line number 4.
The effect is similar to the insert command i and the append command a.
It also works with text pattern addresses:
wsx@wsx-laptop:~/tmp$ sed '/number 2/r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is an added line.
This is the second added line.
This is line number 3.
This is line number 4.
Adding at the end of the text:
wsx@wsx-laptop:~/tmp$ sed '$r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is an added line.
This is the second added line.
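A neat use of the read command is combining it with the delete command to replace placeholder text in a file with data from another file. Suppose a form letter is stored in a text file: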
wsx@wsx-laptop:~/tmp$ cat notice.std
Would the following people:
LIST
please report to the ship's captain.
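The form letter puts the generic placeholder LIST where the list of people belongs. We first insert the text after the placeholder, then delete the placeholder.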
wsx@wsx-laptop:~/tmp$ sed '/LIST/{
> r data10.txt
> d
> }' notice.std
Would the following people:
This line contains an escape character.
please report to the ship's captain.
wsx@wsx-laptop:~/tmp$ cat data10.txt
This line contains an escape character.
wsx@wsx-laptop:~/tmp$ cat data11.txt
wangshx zhdan
wsx@wsx-laptop:~/tmp$ sed '/LIST/{
r data11.txt
d
}' notice.std
Would the following people:
wangshx zhdan
please report to the ship's captain.
As the output shows, the placeholder has been replaced by the text from the data file.
The end.