1. SNN (SecondaryNameNode)
1.1 SecondaryNameNode working mechanism
1. When the SecondaryNameNode starts a checkpoint, the NameNode stops using the current edits files (515-516) and temporarily records new operations in a fresh edits file (517).
2. The SecondaryNameNode downloads the NameNode's fsimage (514) and the edits files (515-516) to its local disk.
3. The SecondaryNameNode loads fsimage 514 into memory and replays the edits files (515-516) from beginning to end, producing a new fsimage file (516).
4. The SecondaryNameNode pushes the new fsimage (516) back to the NameNode.
5. The NameNode receives fsimage 516.ckpt and rolls it over to fsimage 516; the new edits file 517.new is rolled over to edits 517, which is now the latest edits file. (The fsimage and edits files themselves can be inspected offline, see the sketch after this list.)
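For reference, a minimal sketch of inspecting these files offline with the oiv/oev tools (also listed in the hdfs usage output below); the file names here are placeholders following the numbering above:
# dump an fsimage to XML to inspect the namespace it contains
hdfs oiv -p XML -i fsimage_0000000000000000516 -o fsimage516.xml
# dump an edits file to XML to inspect the logged operations
hdfs oev -i edits_0000000000000000515-0000000000000000516 -o edits515-516.xml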
1.2 Why the SecondaryNameNode is worth learning
The SNN checkpoint flow mainly comes up in interviews, but you must understand it: it helps you get a basic grasp of how HDFS is implemented underneath.
In production we do not use the SecondaryNameNode; we use HDFS HA (hot standby).
There are two NameNodes:
NN Active and NN Standby (hot standby)
2. hadoop commands
2.1 Where the hadoop command comes from
[hadoop@ruozedata001 bin]$ ./hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
s3guard manage data on S3
trace view and modify Hadoop tracing settings
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
2.2 Common compression formats in Hadoop
hadoop, zlib, snappy, lz4, bzip2, openssl (the native libraries reported by hadoop checknative)
2.3 Checking whether compression is supported
[hadoop@ruozedata001 bin]$ hadoop checknative
20/11/28 20:51:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
To build the native libraries with compression support, see: https://blog.csdn.net/u010452388/article/details/99691421
The build involves Maven.
When running a job or a program throws a class-related exception, check what should be on the classpath (see the sketch after the link below):
[hadoop@ruozedata001 bin]$ hadoop classpath
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/etc/hadoop:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/hdfs/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/yarn/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce/*:/home/hadoop/app/hadoop/contrib/capacity-scheduler/*.jar
[hadoop@ruozedata001 bin]$
http://cn.voidcc.com/question/p-tenieuea-bex.html
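A minimal sketch of how the output of hadoop classpath is typically used when a client program fails with a missing-class error; MyApp.jar and com.example.MyApp are placeholders:
# put the full Hadoop classpath on the JVM classpath before launching the client program
export HADOOP_CLASSPATH=$(hadoop classpath)
java -cp "$HADOOP_CLASSPATH:MyApp.jar" com.example.MyApp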
3. hdfs commands
3.1 Where the hdfs command comes from
[hadoop@ruozedata001 bin]$ ./hdfs
Usage: hdfs [--config confdir] COMMAND
where COMMAND is one of:
dfs run a filesystem command on the file systems supported in Hadoop.
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
journalnode run the DFS journalnode
zkfc run the ZK Failover Controller daemon
datanode run a DFS datanode
dfsadmin run a DFS admin client
diskbalancer Distributes data evenly among disks on a given node
haadmin run a DFS HA admin client
fsck run a DFS filesystem checking utility
balancer run a cluster balancing utility
jmxget get JMX exported values from NameNode or DataNode.
mover run a utility to move block replicas across
storage types
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to an legacy fsimage
oev apply the offline edits viewer to an edits file
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
snapshotDiff diff two snapshots of a directory or diff the
current directory contents with a snapshot
lsSnapshottableDir list all snapshottable dirs owned by the current user
Use -help to see options
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies list/get/set block storage policies
version print the version
Most commands print help when invoked w/o parameters.
[hadoop@ruozedata001 bin]$
3.2 A friendly reminder
hadoop fs and hdfs dfs are equivalent;
the two scripts end up running the same class.
In bin/hadoop:
# the core commands
if [ "$COMMAND" = "fs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
In bin/hdfs:
elif [ "$COMMAND" = "dfs" ] ; then
CLASS=org.apache.hadoop.fs.FsShell
hdfs dfs commands:
Usage: hadoop fs [generic options]
[-cat [-ignoreCrc] <src> ...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>] equivalent to put
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>] equivalent to get
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-du [-s] [-h] [-x] <path> ...]
[-find <path> ... <expression> ...]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-mv <src> ... <dst>] 【Not recommended in production: if something goes wrong mid-move the data can end up incomplete. Prefer cp, verify the copy, then delete the source; see the sketch after this list.】
[-rm [-f] [-r|-R] [-skipTrash] <src> ...] 【-skipTrash is not recommended】
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
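A minimal sketch of the cp-verify-delete pattern recommended above; the paths are placeholders, and the checksum comparison assumes both copies live in the same HDFS so that hdfs dfs -checksum results are comparable:
# copy instead of moving, then verify before removing the source
hdfs dfs -cp /data/src/part-00000 /data/dst/part-00000
SRC_SUM=$(hdfs dfs -checksum /data/src/part-00000 | awk '{print $NF}')
DST_SUM=$(hdfs dfs -checksum /data/dst/part-00000 | awk '{print $NF}')
if [ "$SRC_SUM" = "$DST_SUM" ]; then
  hdfs dfs -rm /data/src/part-00000   # goes to the trash, never -skipTrash
else
  echo "checksum mismatch, keeping the source" >&2
fi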
3.3 dfsadmin administrative commands
[hadoop@ruozedata001 bin]$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
[-report [-live] [-dead] [-decommissioning]]
[-safemode <enter | leave | get | wait>] 【safe mode】
[-saveNamespace]
[-rollEdits]
[-restoreFailedStorage true|false|check]
[-refreshNodes]
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]
[-setSpaceQuota <quota> <dirname>...<dirname>]
[-clrSpaceQuota <dirname>...<dirname>]
[-finalizeUpgrade]
[-rollingUpgrade [<query|prepare|finalize>]]
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[-refreshSuperUserGroupsConfiguration]
[-refreshCallQueue]
[-refresh <host:ipc_port> <key> [arg1..argn]
[-reconfig <datanode|...> <host:ipc_port> <start|status|properties>]
[-printTopology]
[-refreshNamenodes datanode_host:ipc_port]
[-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
[-setBalancerBandwidth <bandwidth in bytes per second>]
[-fetchImage <local directory>]
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]
[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
[-getDatanodeInfo <datanode_host:ipc_port>]
[-metasave filename]
[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
[-listOpenFiles [-blockingDecommission] [-path <path>]]
[-help [cmd]]
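A few dfsadmin operations that come up often in practice, as a hedged sketch; the directory and quota values are placeholders:
hdfs dfsadmin -report                                  # cluster capacity and per-datanode usage
hdfs dfsadmin -setQuota 100000 /user/hadoop/project    # cap the number of names (files + directories)
hdfs dfsadmin -setSpaceQuota 10t /user/hadoop/project  # cap raw space, replicas included
hdfs dfsadmin -clrSpaceQuota /user/hadoop/project      # remove the space quota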
3.4 Shell script wrapper: an alert script for HA failover state
(Advanced class topic: wrapping hdfs haadmin in a shell script that alerts when the HA state switches; a sketch follows the usage output below.)
[hadoop@ruozedata001 bin]$ hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId> [--forceactive]]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
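A minimal sketch of such an alert script, assuming an HA cluster whose NameNode service IDs are nn1 and nn2; the IDs and the mail alert are placeholders:
#!/bin/bash
# check which NameNode is active and alert if nn1 is no longer active (i.e. a failover happened)
STATE_NN1=$(hdfs haadmin -getServiceState nn1 2>/dev/null)
STATE_NN2=$(hdfs haadmin -getServiceState nn2 2>/dev/null)
echo "$(date '+%F %T') nn1=$STATE_NN1 nn2=$STATE_NN2"
if [ "$STATE_NN1" != "active" ]; then
  echo "HDFS HA state changed: nn1=$STATE_NN1 nn2=$STATE_NN2" | mail -s "HDFS HA alert" admin@example.com
fi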
Health check:
[hadoop@ruozedata001 bin]$ hdfs fsck /
20/11/28 21:14:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://ruozedata001:50070/fsck?ugi=hadoop&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.0.3 for path / at Sat Nov 28 21:14:58 CST 2020
.
/1.log: Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741868_1044. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
.
/2.log: Under replicated BP-1245831-192.168.0.3-1605965291938:blk_1073741869_1045. Target Replicas is 2 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
.....................................Status: HEALTHY
Total size: 257432 B
Total dirs: 19
Total files: 39
Total symlinks: 0
Total blocks (validated): 37 (avg. block size 6957 B)
Minimally replicated blocks: 37 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 2 (5.4054055 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 2 (5.1282053 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Sat Nov 28 21:14:58 CST 2020 in 4 milliseconds
The filesystem under path '/' is HEALTHY
[hadoop@ruozedata001 bin]$
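The under-replicated blocks above come from having a single datanode while the target replication factor is 2. A hedged sketch of how this is usually resolved: either add datanodes and let HDFS re-replicate on its own, or lower the replication factor to match the cluster:
hdfs dfs -setrep -w 1 /1.log /2.log     # -w waits until the new replication factor is reached
hdfs fsck / -files -blocks -locations   # re-check block placement in detail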
4. Safe mode
[hadoop@ruozedata001 bin]$ hdfs dfsadmin -safemode get 【start HDFS first】
20/11/28 21:37:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
[hadoop@ruozedata001 bin]$
OFF: both reads and writes work
ON: writes are rejected, reads still work
[hadoop@ruozedata001 bin]$ hdfs dfs -put 3.log /
20/11/28 21:39:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create file/3.log._COPYING_. Name node is in safe mode.
[hadoop@ruozedata001 bin]$
[hadoop@ruozedata001 bin]$
[hadoop@ruozedata001 bin]$ hdfs dfs -ls /
20/11/28 21:40:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:31 /1.log
-rw-r--r-- 2 hadoop supergroup 4 2020-11-25 22:33 /2.log
drwxr-xr-x - hadoop supergroup 0 2020-11-28 19:09 /system
drwx------ - hadoop supergroup 0 2020-11-22 19:52 /tmp
drwxr-xr-x - hadoop supergroup 0 2020-11-21 21:50 /user
drwxr-xr-x - hadoop supergroup 0 2020-11-22 19:52 /wordcount
[hadoop@ruozedata001 bin]$ hdfs dfs -cat /1.log
20/11/28 21:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
123
[hadoop@ruozedata001 bin]$
4.1 Passive entry into safe mode
Sooner or later you will see the words "safe mode" when reading HDFS logs; do not panic,
but it does mean your HDFS cluster has a problem and is effectively in a protective mode.
Usually you first try to leave safe mode by running the command manually 【do this first】.
4.2 Actively entering safe mode, for maintenance
During this window no new data can enter HDFS. The relevant commands are sketched below.
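The corresponding commands, for reference (all appear in the dfsadmin usage above):
hdfs dfsadmin -safemode enter   # enter safe mode manually before maintenance (writes are blocked)
hdfs dfsadmin -safemode get     # check the current state
hdfs dfsadmin -safemode leave   # leave safe mode after maintenance, or after a passive trigger is resolved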
5. Trash
5.1 Setting the trash retention interval
Set fs.trash.interval in core-site.xml:
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
7 * 24 * 60 = 10080 (minutes, i.e. 7 days)
[hadoop@ruozedata001 ~]$ hdfs dfs -rm /1.log
20/11/28 21:59:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/28 21:59:48 INFO fs.TrashPolicyDefault: Moved: 'hdfs://ruozedata001:9000/1.log' to trash at: hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log
hdfs://ruozedata001:9000/user/hadoop/.Trash/Current/1.log is the file's location in the trash
[hadoop@ruozedata001 ~]$ hdfs dfs -rm -skipTrash /2.log
20/11/28 22:00:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /2.log
[hadoop@ruozedata001 ~]$
【In production the trash must be enabled, and the retention interval should be as long as practical, e.g. 7 days.】
【When deleting, never use -skipTrash; let files go to the trash just in case. A sketch of restoring a file from the trash follows.】
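A minimal sketch of recovering a file that was deleted into the trash (the path matches the example above):
# a deleted file sits under .Trash/Current with its original path appended
hdfs dfs -ls /user/hadoop/.Trash/Current/
hdfs dfs -mv /user/hadoop/.Trash/Current/1.log /1.log   # move it back to restore it
# hdfs dfs -expunge forces an immediate checkpoint/cleanup of the trash, so use it with care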
6. Balancing data across datanodes
[hadoop@ruozedata001 sbin]$ sh ./start-balancer.sh
[hadoop@ruozedata001 sbin]$ cat /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-balancer-ruozedata001.log
2020-11-28 22:07:35,135 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: namenodes = [hdfs://ruozedata001:9000]
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, run during upgrade = false]
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: included nodes = []
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: excluded nodes = []
2020-11-28 22:07:35,138 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: source nodes = []
2020-11-28 22:07:35,242 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2020-11-28 22:07:36,086 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2020-11-28 22:07:36,087 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2020-11-28 22:07:36,090 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2020-11-28 22:07:36,103 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.0.3:50010
2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 over-utilized: []
2020-11-28 22:07:36,104 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 underutilized: []
[hadoop@ruozedata001 sbin]$
threshold = 10.0
A node is considered balanced when |its disk usage - the cluster-wide average usage| < 10%.
Example with three nodes at 90%, 60% and 80%: average = 230% / 3 ≈ 76%
node 1: 90% - 76% = 14%, over the threshold by 4%
node 2: 60% - 76% = -16%, 16% below the average, also outside the threshold
node 3: 80% - 76% = 4%, within the threshold
【In production, write a scheduled script and run it every night during the business low-traffic window; a sketch follows below.】
./start-balancer.sh
Parameter dfs.datanode.balance.bandwidthPerSec: raise it from 10m to 50m.
It controls how much network bandwidth the balancing operation may use.
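A minimal sketch of such a nightly job, using the bandwidth and threshold values above; the script path in the crontab line is a placeholder:
#!/bin/bash
# allow each datanode to use up to 50 MB/s (52428800 bytes/s) for balancing during this run
hdfs dfsadmin -setBalancerBandwidth 52428800
# run the balancer with a 10% threshold; progress goes to the balancer log under $HADOOP_HOME/logs
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/sbin/start-balancer.sh -threshold 10
# crontab entry, e.g. 1 a.m. every day:  0 1 * * * /home/hadoop/scripts/balancer.sh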
If production has only 3 machines and 3 replicas, is this scheduled script useful? No: every node already holds a replica of every block, so there is nothing to move.
7. Balancing multiple disks within a single node
7.1 Configure hdfs-site.xml
<property>
<name>dfs.datanode.data.dir</name>
<value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
/data01 100G
/data02 200G
/data03 490G
[hadoop@ruozedata001 sbin]$ hdfs diskbalancer
usage: hdfs diskbalancer [command] [options]
DiskBalancer distributes data evenly between different disks on a
datanode. DiskBalancer operates by generating a plan, that tells datanode
how to move data between disks. Users can execute a plan by submitting it
to the datanode.
To get specific help on a particular command please run
hdfs diskbalancer -help <command>.
--help <arg> valid commands are plan | execute | query | cancel |
report
[hadoop@ruozedata001 sbin]$
Apache Hadoop 2.x: not supported; dfs.disk.balancer.enabled cannot be found in
https://hadoop.apache.org/docs/r2.10.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Apache Hadoop 3.x: supported; dfs.disk.balancer.enabled is present and defaults to true
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
CDH Hadoop 2.x: supported; dfs.disk.balancer.enabled is present but defaults to false
How do you run it?
Documentation: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html
<property>
<name>dfs.disk.balancer.enabled</name>
<value>true</value>
</property>
[hadoop@ruozedata001 hadoop]$ hdfs diskbalancer -plan ruozedata001
20/11/28 22:37:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/28 22:37:02 INFO planner.GreedyPlanner: Starting plan for Node : ruozedata001:50020
20/11/28 22:37:02 INFO planner.GreedyPlanner: Compute Plan for Node : ruozedata001:50020 took 1 ms
20/11/28 22:37:03 INFO command.Command: No plan generated. DiskBalancing not needed for node: ruozedata001 threshold used: 10.0
hdfs diskbalancer -execute ruozedata001.plan.json (execute the plan)
hdfs diskbalancer -query ruozedata001
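A minimal sketch of the whole plan/execute/query cycle as one script; the hostname matches the examples above, and the plan file name is the one reported by the plan step:
#!/bin/bash
NODE=ruozedata001
# 1. generate a plan describing how to move blocks between this node's disks
hdfs diskbalancer -plan $NODE
# 2. execute the plan (skip this if the plan step reported "No plan generated")
hdfs diskbalancer -execute $NODE.plan.json
# 3. poll the progress of the data move
hdfs diskbalancer -query $NODE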
Production:
【In production, write a scheduled script and run it every night during the business low-traffic window.】
8. Summary: how to troubleshoot
1. Analyze it yourself first; you must find the log and locate the error in it.
2. Search Baidu / Google.
3. Ask the teacher, colleagues, or the study group.
4. Check the Apache issue (JIRA) site.
5. Import the source code into IDEA and debug it.
6. How to find the log file:
the config file points to it (e.g. MySQL's my.cnf points to the data/hostname.err file)
the logs folder under the current directory
/var/log
ps -ef to read the process description
Homework:
1. Write up the SNN mechanism.
2. Go through the hadoop and hdfs commands.
3. Organize the four topics above.
4. Publish it as a blog post.
5. Compile Hadoop with compression support.