Using WebHDFS and HttpFS
WebHDFS
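Introduction
Provides a RESTful interface to HDFS, through which HDFS file operations can be performed.
Installation
The WebHDFS service is built into HDFS; no separate installation or startup is required.
Configuration
WebHDFS is controlled by a switch in hdfs-site.xml, which is enabled by default: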
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
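To check the switch on a particular node, a short script can read hdfs-site.xml and apply the same default. This is a minimal sketch that assumes the standard `<configuration>`/`<property>`/`<name>`/`<value>` layout. `webhdfs_enabled` is a hypothetical helper. It inspects the file only, not the running NameNode, and it ignores settings that come from other configuration files.

```python
import xml.etree.ElementTree as ET


def webhdfs_enabled(xml_text: str, default: bool = True) -> bool:
    """Return the effective dfs.webhdfs.enabled value from hdfs-site.xml content.

    Uses `default` (true, the Hadoop default) when the property is absent or its
    value is not "true"/"false" (case-insensitive); if the property appears more
    than once, the last occurrence wins.
    """
    root = ET.fromstring(xml_text)
    result = default
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() != "dfs.webhdfs.enabled":
            continue
        value = (prop.findtext("value") or "").strip().lower()
        result = (value == "true") if value in ("true", "false") else default
    return result
```

For example, `webhdfs_enabled(open("/path/to/hdfs-site.xml", encoding="utf-8").read())`, where the path is a placeholder for your own configuration directory.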
Usage
Connect to the NameNode's HTTP port to perform file operations (50070 by default in Hadoop 2.x; Hadoop 3.x uses 9870).
For example: curl "http://ctrl:50070/webhdfs/v1/?op=liststatus&user.name=root" | python -mjson.tool
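The same LISTSTATUS call can be made from Python without curl. This is a minimal standard-library sketch that assumes simple (pseudo) authentication through the `user.name` parameter, as in the curl command. The helper names are hypothetical. The response shape, `{"FileStatuses": {"FileStatus": [...]}}` with `pathSuffix` and `type` fields on each entry, follows the WebHDFS REST API documentation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build http://<host>:<port>/webhdfs/v1<path>?op=<op>&user.name=<user>."""
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_status(payload: dict) -> list:
    """Extract (type, pathSuffix) pairs from a LISTSTATUS JSON response."""
    statuses = payload["FileStatuses"]["FileStatus"]
    return [(s["type"], s["pathSuffix"]) for s in statuses]


def fetch_listing(host: str, port: int, path: str, user: str) -> list:
    """Call LISTSTATUS on a live endpoint (requires network access to the server)."""
    with urlopen(webhdfs_url(host, port, path, "LISTSTATUS", user)) as resp:
        return list_status(json.load(resp))
```

For example, `fetch_listing("ctrl", 50070, "/", "root")` lists the entries directly under the HDFS root directory.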
More operations
Reference: official WebHDFS REST API documentation
HttpFS (Hadoop HDFS over HTTP)
Introduction
HttpFS is a server that provides a REST HTTP gateway supporting all HDFS file system operations (read and write), and it is interoperable with the WebHDFS REST HTTP API.
Installation
HttpFS ships with Hadoop, so no separate installation is needed. The service is not started by default and must be started manually.
Configuration
httpfs-site.xml
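There is a configuration file, httpfs-site.xml, which can usually be left at its defaults.
hdfs-site.xml
Add the following configuration. In the two property names, root stands for the OS user that starts the service; replace it with the actual user name. (The official HttpFS documentation places these properties in core-site.xml and refers to the user that runs the HttpFS server; in this example both HDFS and HttpFS run as root.)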
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
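Because the user name is part of the property name, the two entries can be generated for whichever account runs the service. This is a minimal sketch, and `proxyuser_properties` is a hypothetical helper. The `*` wildcard allows impersonation from any host and for any group. Both values also accept comma-separated lists if you want to narrow them, for example to the HttpFS host only.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: str = "*", groups: str = "*") -> str:
    """Render the hadoop.proxyuser.<user>.hosts / .groups <property> elements as XML text."""
    fragments = []
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
        fragments.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(fragments)
```

For example, `print(proxyuser_properties("hdfs"))` prints the two properties for an `hdfs` service account, ready to paste inside `<configuration>`.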
Starting and stopping
sbin/httpfs.sh start
sbin/httpfs.sh stop
Once started, it listens on port 14000 by default:
[root@ctrl sbin]# netstat -antp | grep 14000
tcp 0 0 :::14000 :::* LISTEN 7415/java
[root@ctrl sbin]#
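Where netstat is unavailable, or when checking from another machine, a TCP connection attempt is a rough equivalent. This is a minimal sketch, and `is_listening` is a hypothetical helper. A successful connect only shows that something accepts TCP connections on the port, not that it is HttpFS.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or unresolvable host name.
        return False
```

For example, `is_listening("ctrl", 14000)` should return True once `sbin/httpfs.sh start` has completed.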
Usage
curl "http://ctrl:14000/webhdfs/v1/?op=liststatus&user.name=root" | python -mjson.tool
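Because HttpFS exposes the same REST API, write operations use the same URL layout, only with port 14000. This is a minimal sketch of MKDIRS, assuming simple authentication with `user.name`. The helper names and the `/tmp/demo` path are hypothetical. Per the WebHDFS REST API documentation, MKDIRS is an HTTP PUT and answers with `{"boolean": true}`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def mkdirs_request(host: str, port: int, path: str, user: str) -> Request:
    """Build a PUT request for the MKDIRS operation."""
    query = urlencode({"op": "MKDIRS", "user.name": user})
    url = f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
    return Request(url, method="PUT")


def mkdirs(host: str, port: int, path: str, user: str) -> bool:
    """Create `path` on HDFS (requires a reachable HttpFS or WebHDFS server)."""
    with urlopen(mkdirs_request(host, port, path, user)) as resp:
        # The server replies with a JSON body of the form {"boolean": true}.
        return json.load(resp)["boolean"]
```

For example, `mkdirs("ctrl", 14000, "/tmp/demo", "root")` creates the directory through the HttpFS gateway.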
More operations
References:
Official WebHDFS REST API documentation
Official HttpFS documentation
Relationship between WebHDFS and HttpFS
WebHDFS vs HttpFS: the major difference is that WebHDFS needs access to every node of the cluster, and when data is read it is transmitted directly from the node that holds it, whereas with HttpFS a single node acts as a "gateway" and is the single point of data transfer to the client. HttpFS can therefore become a bottleneck during large file transfers, but it minimizes the footprint needed to access HDFS: clients only need to reach that one host.
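The difference is visible in how an OPEN (read) request is answered. Per the WebHDFS REST API documentation, the NameNode replies with HTTP 307 and a Location header pointing at a DataNode, and the client then fetches the bytes from that DataNode. HttpFS streams the data back itself. This is a minimal sketch that makes the difference visible by not following redirects. The helper names are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import quote, urlencode, urlparse


def open_status(host: str, port: int, path: str, user: str) -> Tuple[int, Optional[str]]:
    """Send an OPEN request WITHOUT following redirects; return (status, Location)."""
    query = urlencode({"op": "OPEN", "user.name": user})
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", f"/webhdfs/v1{quote(path)}?{query}")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        # Closing without reading the body is fine; only the headers are needed.
        conn.close()


def data_served_by(status: int, location: Optional[str], requested_host: str) -> str:
    """Return the host that will stream the file bytes to the client.

    WebHDFS: the NameNode answers 307 with a Location header naming a DataNode.
    HttpFS: the gateway answers 200 and streams the data itself.
    """
    if status == 307 and location:
        return urlparse(location).hostname
    return requested_host
```

On a multi-node cluster, `data_served_by(*open_status("ctrl", 50070, "/tmp/a.txt", "root"), "ctrl")` should name a DataNode. The same call against port 14000 should name `ctrl`. The file path is a placeholder.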