1. Modify the HDFS configuration
Add the following properties to hdfs-site.xml on both clusters:
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
<description></description>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>0.0.0.0</value>
<description></description>
</property>
<property>
<name>dfs.namenode.https-bind-host</name>
<value>0.0.0.0</value>
<description></description>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when connecting to datanodes.
</description>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
<description>Whether datanodes should use datanode hostnames when connecting to other datanodes for data transfer.</description>
</property>
<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>*</value>
<description></description>
</property>
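2. Configure hosts on both clusters
Data migration between Hadoop clusters is a distributed transfer, so every host in each cluster must be able to resolve the hostnames of the other cluster. In the /etc/hosts file on every host of both clusters, add IP-to-hostname mappings for all hosts of both clusters.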
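A quick way to confirm that a hosts file covers both clusters is to list the hostnames that have no entry. The following is a minimal sketch, not part of the original procedure; the function name missing_hosts is made up for illustration.

```shell
#!/bin/sh
# Print every hostname from the argument list that has no entry in the given hosts file.
# Usage: missing_hosts HOSTS_FILE HOSTNAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field after the IP column.
        if ! sed 's/#.*//' "$hosts_file" | awk -v h="$name" '
                { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                END { exit (found ? 0 : 1) }'; then
            echo "$name"
        fi
    done
}
```

For example, running missing_hosts /etc/hosts cd-hadoop3-1 sp-dev-1 sp-dev-2 on each node should print nothing once all mappings are in place.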
3. Create shared principals with the same encryption types in both clusters
On the source cluster ZETA_RANGER.COM:
kadmin.local: addprinc krbtgt/ZETA_RANGER.COM@PANEL.COM
kadmin.local: addprinc krbtgt/PANEL.COM@ZETA_RANGER.COM
On the destination cluster PANEL.COM:
kadmin.local: addprinc krbtgt/ZETA_RANGER.COM@PANEL.COM
kadmin.local: addprinc krbtgt/PANEL.COM@ZETA_RANGER.COM
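Note: Use the same password for each principal on both KDCs so that the keys match. If the default encryption types of the two KDCs differ, specify the same encryption types explicitly in addprinc, for example: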
kadmin.local: addprinc -e "aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal" krbtgt/ZETA_RANGER.COM@PANEL.COM
Verify that the newly added principals have the same encryption types in both clusters:
kadmin.local: getprinc krbtgt/ZETA_RANGER.COM@PANEL.COM
kadmin.local: getprinc krbtgt/PANEL.COM@ZETA_RANGER.COM
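Comparing the getprinc output by eye is error-prone, so the encryption types can be reduced to a sorted list and diffed. This is a sketch under the assumption that the key lines look like "Key: vno 1, aes256-cts-hmac-sha1-96", which can vary between krb5 versions; princ_enctypes is a hypothetical helper name.

```shell
#!/bin/sh
# Read getprinc output on stdin and print the distinct encryption types, sorted.
# Assumes key lines of the form "Key: vno N, ENCTYPE[, salt info]".
princ_enctypes() {
    sed -n 's/^Key: vno [0-9][0-9]*, *\([^,]*\).*$/\1/p' | sort -u
}
```

On each KDC, something like kadmin.local -q "getprinc krbtgt/PANEL.COM@ZETA_RANGER.COM" | princ_enctypes > /tmp/enctypes.txt produces a file that can be copied over and compared with diff.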
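4. Add name-mapping rules for trusted principals from the other realm in both HDFS clusters
In core-site.xml, use the hadoop.security.auth_to_local property to add name-mapping rules for principals from the trusted KDC realm, so that the source and destination clusters each trust the other's principals. Add the following: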
RULE:[2:$1@$0]([ndj]n/.*@PANEL.COM)s/.*/hdfs/
RULE:[2:$1@$0](hdfs/.*@PANEL.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive/.*@PANEL.COM)s/.*/hive/
RULE:[2:$1@$0]([ndj]n@PANEL.COM)s/.*/hdfs/
RULE:[2:$1@$0](hdfs@PANEL.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@PANEL.COM)s/.*/hive/
RULE:[2:$1@$0]([nd]n@ZETA_RANGER.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@ZETA_RANGER.COM)s/.*/hive/
RULE:[2:$1@$0]([nd]n/.*@ZETA_RANGER.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive/.*@ZETA_RANGER.COM)s/.*/hive/
RULE:[1:$1@$0](^.*@ZETA_RANGER.COM$)s/^(.*)@ZETA_RANGER.COM$/$1/g
RULE:[2:$1@$0](^.*@ZETA_RANGER.COM$)s/^(.*)@ZETA_RANGER.COM$/$1/g
RULE:[1:$1@$0](^.*@PANEL.COM$)s/^(.*)@PANEL.COM$/$1/g
RULE:[2:$1@$0](^.*@PANEL.COM$)s/^(.*)@PANEL.COM$/$1/g
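Note: So that the other side can recognize and map the relevant principals, it is best to configure mapping rules for every principal you need to use in both clusters. Because the [2:$1@$0] format builds the string primary@REALM without the host part, the patterns containing /.* are not expected to match; the variants without the slash do the mapping.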
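Once the rules are in core-site.xml, the mapping can be checked from the command line with Hadoop's HadoopKerberosName class, which prints the short name each principal resolves to. The wrapper below is only a sketch; show_mapping is a hypothetical name, and the principals passed to it are placeholders to replace with the ones actually used in your clusters.

```shell
#!/bin/sh
# Print the auth_to_local mapping for each principal given as an argument.
# Requires the hadoop command and the core-site.xml of the cluster being checked.
show_mapping() {
    for principal in "$@"; do
        hadoop org.apache.hadoop.security.HadoopKerberosName "$principal"
    done
}
```

For example, show_mapping nn/cd-hadoop3-1@PANEL.COM hive/sp-dev-1@ZETA_RANGER.COM run on either cluster shows whether both realms are mapped as intended.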
5. Edit the krb5.conf file on all hosts of both clusters
On all nodes of the source cluster ZETA_RANGER.COM, add the following to krb5.conf:
[capaths]
ZETA_RANGER.COM = {
PANEL.COM = .
}
On all nodes of the destination cluster PANEL.COM, add the following to krb5.conf:
[capaths]
PANEL.COM = {
ZETA_RANGER.COM = .
}
Add each realm to the [realms] section of the other cluster's file, for example:
[realms]
PANEL.COM = {
kdc = cd-hadoop3-1
admin_server = cd-hadoop3-1
}
ZETA_RANGER.COM = {
kdc = sp-dev-1
admin_server = sp-dev-1
}
Add the following to [domain_realm] so that all hosts can be recognized by the other KDC:
[domain_realm]
.panel.com = PANEL.COM
panel.com = PANEL.COM
.zeta_ranger.com = ZETA_RANGER.COM
zeta_ranger.com = ZETA_RANGER.COM
6. Restart the KDC on both clusters
service krb5kdc restart
service kadmin restart
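After the KDCs are back up, the trust can be checked before touching HDFS by asking the local KDC for a cross-realm ticket with the MIT kvno tool. This is a sketch that assumes a valid ticket is already in the credential cache (obtained with kinit); check_cross_realm is a hypothetical helper name.

```shell
#!/bin/sh
# Request a cross-realm TGT for REMOTE_REALM from the local KDC.
# Assumes a valid ticket already exists in the credential cache (run kinit first).
# Usage: check_cross_realm LOCAL_REALM REMOTE_REALM
check_cross_realm() {
    local_realm=$1
    remote_realm=$2
    if kvno "krbtgt/${remote_realm}@${local_realm}"; then
        echo "cross-realm ticket for ${remote_realm} obtained"
    else
        echo "could not obtain cross-realm ticket for ${remote_realm}" >&2
        return 1
    fi
}
```

On a source-cluster node this would be called as check_cross_realm ZETA_RANGER.COM PANEL.COM, and with the realms swapped on a destination-cluster node; klist then shows the extra krbtgt entry.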
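7. Restart the HDFS and YARN services on both clusters
Note: YARN must be restarted; otherwise submitting MR jobs will fail.
8. List a directory on the other cluster's HDFS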
hadoop fs -ls hdfs://cd-hadoop3-1:8020
9. Upload a file to the other HDFS cluster
hadoop fs -put /tmp/test hdfs://cd-hadoop3-1:8020/tmp
10. Use distcp to transfer data to the other HDFS cluster
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -m 30 hdfs://sp-dev-2:8020/tmp/test hdfs://cd-hadoop3-1:8020/tmp
Note: distcp actually runs as an MR job. If the user submitting jobs to YARN is subject to access control, make sure that user has the required permissions.
————————————————
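Copyright notice: This is an original article by CSDN blogger "snail_bing", licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/snail_bing/article/details/120264129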