Hadoop
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be read with streaming access.
The two core components of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.
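As an illustration only (this is not Hadoop itself), the map/shuffle/reduce flow can be mimicked by a classic shell pipeline: the "map" stage emits one record per word, `sort` plays the role of the shuffle that groups equal keys, and `uniq -c` acts as the reduce stage:

```shell
# Word count, MapReduce style (illustration only, not Hadoop):
#   map: one word per line | shuffle: sort groups equal keys | reduce: count each group
out=$(echo "dfs data dfs" | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$out"   # prints: data 1 / dfs 2 (one pair per line)
```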
HDFS architecture
(1) NameNode
(2) DataNode
(3) Secondary NameNode
NameNode
(1) The management node of the entire file system. It maintains the file system's directory tree, the metadata of each file/directory, and the list of data blocks belonging to each file. It also receives user operation requests.
(2) Its files include:
fsimage: the metadata image file; a snapshot of the NameNode's in-memory metadata at some point in time.
edits: the operation (edit) log file.
fstime: records the time of the most recent checkpoint.
(3) All of these files are stored in the local Linux file system.
SecondaryNameNode
(1) A remedy for metadata loss, though not true HA: it does not support hot standby. It only needs to be configured.
(2) Checkpoint process: it downloads the metadata files (fsimage and edits) from the NameNode, merges the two into a new fsimage, saves it locally, and pushes it back to the NameNode to replace the old fsimage.
(3) By default it is installed on the NameNode host, but that is not safe!
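How often the SecondaryNameNode checkpoints is tunable: the interval in seconds is the `dfs.namenode.checkpoint.period` property in hdfs-site.xml, 3600 by default. A sketch (the 1800 value is just an example):

```xml
<!-- hdfs-site.xml: checkpoint every 30 minutes instead of the default hour -->
<property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>1800</value>
</property>
```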
DataNode
(1) Provides storage for the actual file data.
(2) Block: the most basic storage unit. For a file of length size, the file is divided sequentially into fixed-size, numbered pieces starting at offset 0; each such piece is called a block. The default HDFS block size is 128 MB, so for example a 256 MB file has 256/128 = 2 blocks.
The block size is set by the dfs.blocksize property (dfs.block.size in older releases).
(3) Unlike ordinary file systems, a file in HDFS that is smaller than one block does not occupy the block's full storage space.
(4) Replication: multiple replicas, three by default, controlled by the dfs.replication property in hdfs-site.xml.
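The block arithmetic above amounts to ceil(file_size / block_size); a minimal sketch, assuming the default 128 MB block size:

```shell
# Block count = ceil(file size / block size); a 256 MB file needs 2 blocks
BLOCK_MB=128
FILE_MB=256
NUM_BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
echo "$NUM_BLOCKS"   # prints 2
```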
I. Standalone mode
1. Create a user and set its password (here the password is set to redhat):
[root@server1 ~]# useradd -u 1000 hadoop
[root@server1 ~]# passwd hadoop
2. Install and configure Hadoop
[root@server1 ~]# mv hadoop-3.0.3.tar.gz jdk-8u181-linux-x64.tar.gz /home/hadoop
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ ls
hadoop-3.0.3.tar.gz  jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf jdk-8u181-linux-x64.tar.gz
[hadoop@server1 ~]$ tar zxf hadoop-3.0.3.tar.gz
[hadoop@server1 ~]$ ln -s jdk1.8.0_181/ java
[hadoop@server1 ~]$ ln -s hadoop-3.0.3 hadoop
[hadoop@server1 ~]$ ls
hadoop        hadoop-3.0.3.tar.gz  jdk1.8.0_181
hadoop-3.0.3  java                 jdk-8u181-linux-x64.tar.gz
3. Configure the environment variables
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/home/hadoop/java
[hadoop@server1 ~]$ vim .bash_profile
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/java/bin
[hadoop@server1 ~]$ source .bash_profile
[hadoop@server1 ~]$ jps
2133 Jps
4. Run a test job
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1 dfsadmin
II. Pseudo-distributed mode
The NameNode and DataNode both run on this single host.
1. Edit the configuration files
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
? ? <property>
? ? ? ? <name>fs.defaultFS</name>
? ? ? ? <value>hdfs://localhost:9000</value>
? ? </property>
</configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
? ? <property>
? ? ? ? <name>dfs.replication</name>
? ? ? ? <value>1</value>
? ? </property>
</configuration>
2. Generate an SSH key pair for passwordless login
[hadoop@server1 hadoop]$ ssh-keygen
[hadoop@server1 hadoop]$ ssh-copy-id localhost
3. Format the NameNode and start the services
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ cd sbin/
[hadoop@server1 sbin]$ ./start-dfs.sh
[hadoop@server1 sbin]$ jps
2458 NameNode
2906 Jps
2765 SecondaryNameNode
2575 DataNode
4. View in a browser: http://172.25.68.1:9870
5. Test: create a directory and upload files
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-05-28 10:20 input
Delete the local input and output directories and rerun the job (to verify that files are pulled from the distributed file system):
[hadoop@server1 hadoop]$ rm -fr input/ output/
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server1 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
**This time input and output do not appear in the current directory; they were uploaded to the distributed file system and can be seen in the web UI.**
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*
1 dfsadmin
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output      ## fetch the output directory from the distributed file system
[hadoop@server1 hadoop]$ cd output/
[hadoop@server1 output]$ ls
part-r-00000  _SUCCESS
[hadoop@server1 output]$ cat *
1 dfsadmin
III. Fully distributed mode
The master host is the NameNode; the other slave hosts are DataNodes.
1. First stop the services and clear the old data
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps
3927 Jps
[hadoop@server1 hadoop]$ cd /tmp/
[hadoop@server1 tmp]$ ls
hadoop  hadoop-hadoop  hsperfdata_hadoop
[hadoop@server1 tmp]$ rm -fr *
2. Bring up two new virtual machines to serve as worker nodes
[root@server2 ~]# useradd -u 1000 hadoop
[root@server3 ~]# useradd -u 1000 hadoop
[root@server1 ~]# yum install -y nfs-utils
[root@server2 ~]# yum install -y nfs-utils
[root@server3 ~]# yum install -y nfs-utils
[root@server1 ~]# systemctl start rpcbind
[root@server2 ~]# systemctl start rpcbind
[root@server3 ~]# systemctl start rpcbind
3. Start the NFS service on server1 and configure the export
[root@server1 ~]# systemctl start nfs-server
[root@server1 ~]# vim /etc/exports
/home/hadoop  *(rw,anonuid=1000,anongid=1000)
[root@server1 ~]# exportfs -rv
exporting *:/home/hadoop
[root@server1 ~]# showmount -e
Export list for server1:
/home/hadoop *
4. Mount the export on server2 and server3
[root@server2 ~]# mount 172.25.68.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem                1K-blocks    Used Available Use% Mounted on
/dev/mapper/rhel-root      10258432 1097104   9161328  11% /
devtmpfs                     497292       0    497292   0% /dev
tmpfs                        508264       0    508264   0% /dev/shm
tmpfs                        508264   13072    495192   3% /run
tmpfs                        508264       0    508264   0% /sys/fs/cgroup
/dev/sda1                   1038336  141516    896820  14% /boot
tmpfs                        101656       0    101656   0% /run/user/0
172.25.68.1:/home/hadoop   10258432 2796544   7461888  28% /home/hadoop
[root@server3 ~]# mount 172.25.68.1:/home/hadoop/ /home/hadoop/
5. Set up passwordless login from server1 to server2 and server3
[root@server1 tmp]# su - hadoop
Last login: Tue May 28 10:17:23 CST 2019 on pts/0
[hadoop@server1 ~]$ ssh 172.25.68.2
[hadoop@server2 ~]$ logout
Connection to 172.25.68.2 closed.
[hadoop@server1 ~]$ ssh 172.25.68.3
[hadoop@server3 ~]$ logout
Connection to 172.25.68.3 closed.
6. Edit the configuration files (server1 is the NameNode; server2 and server3 are DataNodes)
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server1 hadoop]$ vim core-site.xml
<configuration>
? ? <property>
? ? ? ? <name>fs.defaultFS</name>
? ? ? ? <value>hdfs://172.25.68.1:9000</value>
? ? </property>
</configuration>
[hadoop@server1 hadoop]$ vim hdfs-site.xml
<configuration>
? ? <property>
? ? ? ? <name>dfs.replication</name>
? ? ? ? <value>2</value>
? ? </property>
</configuration>
[hadoop@server1 hadoop]$ vim workers
[hadoop@server1 hadoop]$ cat workers
172.25.68.2
172.25.68.3
7. Format the NameNode and start the services
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server1]
Starting datanodes
Starting secondary namenodes [server1]
[hadoop@server1 hadoop]$ jps   # SecondaryNameNode now appears
4673 SecondaryNameNode
4451 NameNode
4787 Jps
The slave nodes come up as DataNodes:
[hadoop@server2 hadoop]$ jps
2384 DataNode
2447 Jps
[hadoop@server3 hadoop]$ jps
2386 DataNode
2447 Jps
8. Test
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir -p /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir input
[hadoop@server1 hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output 'dfs[a-z.]+'
The web UI now shows two nodes, and the data has been uploaded.
9. Add a new node, server4
[root@server4 ~]# useradd -u 1000 hadoop
[root@server4 ~]# yum install -y nfs-utils
[root@server4 ~]# systemctl start rpcbind
[root@server4 ~]# mount 172.25.68.1:/home/hadoop /home/hadoop
[root@server4 ~]# su - hadoop
[hadoop@server4 hadoop]$ pwd
/home/hadoop/hadoop/etc/hadoop
[hadoop@server4 hadoop]$ vim workers
172.25.68.2
172.25.68.3
172.25.68.4
[hadoop@server4 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ sbin/start-dfs.sh
[hadoop@server4 hadoop]$ jps
3029 DataNode
3081 Jps
Check in the browser: the node was added successfully.
Upload a file from server4:
[hadoop@server4 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 15.8634 s, 33.1 MB/s
[hadoop@server4 hadoop]$ bin/hdfs dfs -put bigfile
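With the default 128 MB block size, this 500 MB file should be split into ceil(500/128) = 4 blocks. The arithmetic can be checked locally; the fsck path below assumes the file landed under /user/hadoop:

```shell
# Expected HDFS block count for the 500 MB file with 128 MB blocks
BLOCK_MB=128
FILE_MB=500
echo $(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))   # prints 4
# On the cluster, the actual blocks and replica locations can be inspected with:
#   bin/hdfs fsck /user/hadoop/bigfile -files -blocks -locations
```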