- The core of the Hadoop framework consists of HDFS (Hadoop Distributed File System) and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides the computation over them.
- Hadoop documentation for each release
- Preparation before installing
- hadoop 1.2.1 binary package
- Upload the package to one of the machines
- Prepare 5 machines (VMs; here 192.168.10.215~192.168.10.219)
- Disable the firewall on all 5 machines:
[root@dn3 ~]# systemctl stop firewalld.service
[root@dn3 ~]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@dn3 ~]# iptables -F
[root@dn3 ~]# vi /etc/selinux/config
#SELINUX=enforcing        # comment this line out
#SELINUXTYPE=targeted     # comment this line out
SELINUX=disabled          # add this line
# Never write SELINUXTYPE=disabled; if you did,
# you may need this: http://www.mamicode.com/info-detail-1847013.html
[root@dn3 ~]# setenforce 0
# reboot all the VMs
[root@dn3 ~]# reboot
[root@dn3 ~]# iptables -F
- Change the hostname [the command below is for CentOS 7; look up the equivalent for other Linux versions]
hostnamectl set-hostname <hostname>
Here the NN node's hostname is set to nn, the SNN node's to snn, and the three DN nodes' to dn1, dn2, and dn3.
- Edit the hosts file
127.0.0.1 localhost localhost
192.168.10.219 nn nn
192.168.10.218 snn snn
192.168.10.217 dn1 dn1
192.168.10.216 dn2 dn2
192.168.10.215 dn3 dn3
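The entries above can be generated mechanically from the node table; a minimal stand-alone sketch (the node list below is just the table restated, not read from anywhere):

```python
# Sketch: build the /etc/hosts entries from the node table above.
nodes = [
    ("192.168.10.219", "nn"),
    ("192.168.10.218", "snn"),
    ("192.168.10.217", "dn1"),
    ("192.168.10.216", "dn2"),
    ("192.168.10.215", "dn3"),
]

lines = ["127.0.0.1 localhost localhost"]
lines += ["%s %s %s" % (ip, name, name) for ip, name in nodes]
hosts_text = "\n".join(lines)
print(hosts_text.splitlines()[1])  # → 192.168.10.219 nn nn
```

Every node needs the full table, since HDFS resolves the other nodes by hostname; the automation script later appends the same entries over ssh.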
- Check which JDK versions are compatible with your Hadoop release. Hadoop 1.2.1 is compatible with JDK 6, and the official wiki states:
It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.
So on all 5 machines simply run yum -y install java-1.6.0-openjdk.x86_64.
To find the JDK's install path:
[root@localhost hadoop]# rpm -ql java-1.6.0-openjdk.x86_64
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/java
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/keytool
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/orbd
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/pack200
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/policytool
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/rmid
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/rmiregistry
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/servertool
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/tnameserv
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/unpack200
# So the JDK is installed under /usr/lib/jvm
[root@localhost hadoop]# ls /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin
java keytool orbd pack200 policytool rmid rmiregistry servertool tnameserv unpack200
- About passwordless login:
- Passwordless login actually involves two exchanges; it is not truly authentication-free. Suppose A wants to log in to B without a password:
- In the first exchange, A sends its connection request and public key identity to B. B looks the key up in its authorized_keys file; if it matches, B treats the machine as allowed to log in without a password, generates a random challenge, encrypts it with A's public key, and sends it back.
- In the second exchange, A decrypts the challenge with its private key and returns the result; B verifies it and the login succeeds.
- In other words, for A to log in to B without a password, B must hold A's public key.
- Pick one machine that can log in to the other 4 without a password; it should also be able to log in to itself [optional — if you skip self-login, the HDFS startup scripts will pause to prompt for a password before continuing]. Here the nn node is used [any node works; it does not have to be the name node]: it logs in to itself and to the other four machines without a password.
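What "B must hold A's public key" means on disk can be sketched locally (a hypothetical stand-alone sketch using a temp directory and a fake key string, not the real ~/.ssh):

```python
import os
import stat
import tempfile

# a_pub stands in for A's ~/.ssh/id_rsa.pub content (fake key text).
a_pub = "ssh-rsa AAAAB3Nza...fake... root@nn\n"

ssh_dir = tempfile.mkdtemp()                       # stand-in for B's ~/.ssh
auth_file = os.path.join(ssh_dir, "authorized_keys")

# Append A's public key to B's authorized_keys
# (the same thing `cat id_rsa.pub >> authorized_keys` does).
with open(auth_file, "a") as f:
    f.write(a_pub)

# sshd ignores the file unless its permissions are strict.
os.chmod(auth_file, 0o600)

mode = stat.S_IMODE(os.stat(auth_file).st_mode)
print(mode == 0o600 and a_pub in open(auth_file).read())  # → True
```

In practice OpenSSH's `ssh-copy-id root@host` performs this append (and the permission fixup) in one step.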
# Install ssh (my system already ships with it, so: nothing to do)###################################
[root@localhost yum.repos.d]# yum install -y ssh
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.tuna.tsinghua.edu.cn
* updates: mirrors.aliyun.com
No package ssh available.
Error: Nothing to do
# Install rsync######################################################################
[root@localhost yum.repos.d]# yum install -y rsync
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.aliyun.com
* extras: mirrors.tuna.tsinghua.edu.cn
* updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package rsync.x86_64 0:3.0.9-18.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=========================================================================================
Package Arch Version Repository Size
=========================================================================================
Installing:
rsync x86_64 3.0.9-18.el7 base 360 k
Transaction Summary
=========================================================================================
Install 1 Package
Total download size: 360 k
Installed size: 732 k
Downloading packages:
rsync-3.0.9-18.el7.x86_64.rpm | 360 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : rsync-3.0.9-18.el7.x86_64 1/1
Verifying : rsync-3.0.9-18.el7.x86_64 1/1
Installed:
rsync.x86_64 0:3.0.9-18.el7
Complete!
[root@localhost yum.repos.d]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:3h7izAi6QdCeHwDrb8PdeeoMzaJH0zP4n75SQBxlSr8.
ECDSA key fingerprint is MD5:3a:e3:ca:15:c7:24:cf:56:37:27:31:70:14:70:d5:01.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
# Generate the key pair######################################################################
[root@localhost yum.repos.d]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:0kVvd1pHxg8NSCq84SRl2Yi0Zr48LjA8LNz4pxx3Cms root@localhost.localdomain
The key's randomart image is:
+---[RSA 2048]----+
| ...o+.....+o|
| .=o..o. .oo|
| = = o o ..=|
| + = = . . +o|
|.oo o S . |
|.o*. . o |
| ..* .+. |
| .E*oo. |
| .+oo. |
+----[SHA256]-----+
# Append the public key to the local authorized_keys file#################################################
[root@localhost yum.repos.d]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Restrict permissions######################################################################
[root@localhost yum.repos.d]# chmod 0600 ~/.ssh/authorized_keys
# Test passwordless login to localhost######################################################################
[root@localhost yum.repos.d]# ssh localhost
Last login: Wed Dec 20 22:36:50 2017 from 192.168.10.211
# Log out######################################################################
[root@localhost ~]# exit
logout
Connection to localhost closed.
[root@localhost yum.repos.d]#
- Let 219 (nn) log in to the other machines without a password [copy 219's public key to each machine and append it to that machine's authorized_keys; 219's own key was already appended above; without this you will keep getting password prompts]
[root@snn ~]# scp nn:/root/.ssh/id_rsa.pub /root/219_id_rsa.pub
The authenticity of host 'nn (192.168.10.219)' can't be established.
ECDSA key fingerprint is SHA256:3h7izAi6QdCeHwDrb8PdeeoMzaJH0zP4n75SQBxlSr8.
ECDSA key fingerprint is MD5:3a:e3:ca:15:c7:24:cf:56:37:27:31:70:14:70:d5:01.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'nn,192.168.10.219' (ECDSA) to the list of known hosts.
root@nn's password:
id_rsa.pub 100% 389 80.7KB/s 00:00
[root@dn1 ~]# cat /root/219_id_rsa.pub >> ~/.ssh/authorized_keys
- Configure core-site.xml under <hadoop root>/conf
<configuration>
<!-- HDFS endpoint address, i.e. the name node's address -->
<property>
<name>fs.default.name</name>
<value>hdfs://nn:9000</value>
</property>
<!-- base location for Hadoop working files, including the namenode's working directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop</value>
</property>
</configuration>
- Configure hdfs-site.xml under <hadoop root>/conf
<configuration>
<property>
<!-- HDFS replication factor; must be <= the number of datanodes -->
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
# Configure the data nodes in the slaves file: IPs or hostnames
dn1
dn2
dn3
# Configure the secondary name node in the masters file: IP or hostname
snn
ip,user,pwd,hostname,nodes
192.168.10.215,root,mico,dn1,DN
192.168.10.216,root,mico,dn2,DN
192.168.10.217,root,mico,dn3,DN
192.168.10.218,root,mico,snn,SNN
192.168.10.219,root,mico,nn,NN
software
java-1.6.0-openjdk.x86_64
rsync
; software versions
[version]
; hadoop version
hadoop=1.2.1
; openjdk version
openjdk=1.6.0
; server paths
[path]
hadoop=/opt/
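The script below reads these values with configparser; a minimal sketch of that lookup (the ini content is inlined here for illustration instead of being read from config.ini):

```python
import configparser

# The same sections/keys as config.ini above, inlined for the sketch.
ini_text = """
[version]
hadoop=1.2.1
openjdk=1.6.0
[path]
hadoop=/opt/
"""

conf = configparser.ConfigParser()
conf.read_string(ini_text)   # the script calls conf.read("./config.ini") instead
print(conf.get("version", "hadoop"))  # → 1.2.1
print(conf.get("path", "hadoop"))     # → /opt/
```

Note that `;`-prefixed lines are comments in ini syntax, so configparser skips them.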
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# stop the firewall: systemctl stop firewalld.service
# disable the firewall on boot: systemctl disable firewalld.service
# disable selinux: vi /etc/selinux/config
# make the selinux setting effective: setenforce 0
# set the hostnames
# edit the hosts file
# reboot
# install the jdk
# passwordless login
# yum install -y ssh
# yum install -y rsync
# generate the key pair: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# append the public key to the local authorized_keys
# test passwordless login
# extract the hadoop tarball
# set JAVA_HOME in hadoop-env.sh
# configure core-site.xml:
# <configuration>
#     <!-- HDFS endpoint address, i.e. the name node's address -->
#     <property>
#         <name>fs.default.name</name>
#         <value>hdfs://nn:9000</value>
#     </property>
#     <!-- base location for hadoop working files, including the namenode's working directory -->
#     <property>
#         <name>hadoop.tmp.dir</name>
#         <value>/opt/hadoop</value>
#     </property>
# </configuration>
# configure hdfs-site.xml
# <configuration>
#     <property>
#         <!-- HDFS replication factor; <= the number of datanodes -->
#         <name>dfs.replication</name>
#         <value>3</value>
#     </property>
# </configuration>
# configure the data nodes in slaves: IPs or hostnames
#     dn1
#     dn2
#     dn3
# configure the secondary name node in masters: IP or hostname
#     snn
# add hadoop to the system environment variables
# hadoop namenode -format
# start-dfs.sh
import csv
import paramiko
import os
from www.colorprint.color_print import ColorPrint
import configparser
import urllib.request
from tqdm import tqdm
import tarfile
from lxml import etree
# insert text as the first line of a file
def insert_first_line(file, text):
    with open(file, 'r+', encoding="utf8") as f:
        content = f.read()
        f.seek(0, 0)
        f.write(text + "\n" + content)

# write lines to a file, one per line
def write_lines(file, lines):
    with open(file, 'w', encoding='utf8') as f:
        f.write("\n".join(lines))
# download progress bar (tqdm reporthook)
def process_hook(t):
    last_b = [0]

    def inner(b=1, bsize=1, tsize=None):
        """
        b : int, optional
            Number of blocks just transferred [default: 1].
        bsize : int, optional
            Size of each block (in tqdm units) [default: 1].
        tsize : int, optional
            Total size (in tqdm units). If [default: None] remains unchanged.
        """
        if tsize is not None:
            t.total = tsize
        t.update((b - last_b[0]) * bsize)
        last_b[0] = b
    return inner

conf = configparser.ConfigParser()

# read a value from the ini config file
def read_ini(file, section, _name_):
    conf.read(file, encoding="utf8")
    return conf.get(section, _name_)
# read the csv file, skipping the header row
def read(file):
    config_ = []
    with open(file, encoding='utf8') as f:
        f_csv = csv.reader(f)
        next(f_csv)
        for row in f_csv:
            config_.append(row)
    return config_

servers = read("./server.csv")
data_nodes = []
name_node = []
second_name_node = []
hadoop_home = ""
hadoop_name = ""
hadoop_file_name = ""
# run a single command on a remote host and return all of its output as one string
def ssh(ip, user_name, pass_wd, cmd):
    result_str = ""
    err_str = ""
    try:
        ssh_client = paramiko.SSHClient()
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh_client.connect(ip, 22, user_name, pass_wd, timeout=5)
        # print("running remote command on %s: %s" % (ip, cmd))
        std_in, stdout, stderr = ssh_client.exec_command(cmd)
        # stdin.write("Y")  # simple interaction: send 'Y'
        out = stdout.readlines()
        err = stderr.readlines()
        # echo to the screen
        for o in out:
            result_str += o
            print(o)
        for e in err:
            err_str += e
        if len(err_str) != 0:
            print(err_str)
        ssh_client.close()
    except Exception as e:
        print('%s\tError:%s\n' % (ip, e))
    return result_str
def upload(host_ip, username, password, local_path, remote_path):
    t = paramiko.Transport((host_ip, 22))
    t.connect(username=username, password=password)   # log in to the remote server
    sftp = paramiko.SFTPClient.from_transport(t)      # sftp transfer
    sftp.put(local_path, remote_path)
    t.close()

def download(host_ip, username, password, remote_path, local_path):
    t = paramiko.Transport((host_ip, 22))
    t.connect(username=username, password=password)   # log in to the remote server
    sftp = paramiko.SFTPClient.from_transport(t)      # sftp transfer
    sftp.get(remote_path, local_path)
    t.close()
# stop the firewall
def stop_firewall():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "systemctl stop firewalld.service")
        print(rs)

# disable the firewall on boot
def disable_firewall():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "systemctl disable firewalld.service")
        print(rs)

# disable selinux in /etc/selinux/config
def edit_selinux_config():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config")
        print(rs)

# make the selinux setting effective immediately
def selinux_config_effective():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "setenforce 0")
        print(rs)

# set each node's hostname
def set_host_name():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "hostnamectl set-hostname " + l[3])
        print(rs)

# append every node's entry to every node's /etc/hosts
def edit_host_file():
    for target in servers:
        for l in servers:
            cmd = "echo '" + l[0] + " " + l[3] + " " + l[3] + "'>>/etc/hosts"
            rs = ssh(target[0], target[1], target[2], cmd)
            print(rs)

# reboot all nodes
def reboot():
    for l in servers:
        ssh(l[0], l[1], l[2], "reboot")
# wait until every node is back up
def wait_start_up():
    while True:
        all_str = ""
        try:
            for l in servers:
                rs = ssh(l[0], l[1], l[2], "ls /etc/hosts")
                if rs == '':
                    rs = 'ERROR'
                all_str += rs
        except Exception:
            continue
        else:
            if all_str.find("ERROR") != -1:
                continue
            else:
                break

# run one command on every node
def execute_cmd(cmd):
    for l in servers:
        ssh(l[0], l[1], l[2], cmd)

# install the jdk: yum -y install java-1.6.0-openjdk.x86_64
# plus: yum install -y rsync
def install_software():
    execute_cmd("yum -y install rsync")
    jdk_version = read_ini("./config.ini", "version", "openjdk")
    execute_cmd("yum -y install java-" + jdk_version + "-openjdk.x86_64")
# passwordless login: the nn node must be able to log in to every other node
def silent_login():
    ColorPrint.print_info("data nodes: " + str(data_nodes))
    ColorPrint.print_info("NameNode: " + str(name_node))
    ColorPrint.print_info("SecondNameNode: " + str(second_name_node))
    gen_pubkey()
    for l in servers:
        if name_node[0] == l[0]:
            continue
        append_auth_keys(name_node, l)

# copy source's public key to the des server so that source can log in to des without a password
def append_auth_keys(source, des):
    ColorPrint.print_info("copying " + str(source[0]) + "'s public key to: " + str(des[0]))
    local_pub = source[0] + "id_rsa.pub"
    download(source[0], source[1], source[2], "/" + source[1] + "/.ssh/id_rsa.pub", local_pub)
    up_pub = "/" + source[1] + "/.ssh/" + source[0] + "id_rsa.pub"
    upload(des[0], des[1], des[2], local_pub, up_pub)
    append_keys = "cat " + up_pub + " >> ~/.ssh/authorized_keys"
    ssh(des[0], des[1], des[2], append_keys)
    ColorPrint.print_info("copied " + str(source[0]) + "'s public key to: " + str(des[0]) + " successfully!")
    os.remove(local_pub)
    ssh(des[0], des[1], des[2], "rm -f " + up_pub)

# generate a key pair on every node
def gen_pubkey():
    ColorPrint.print_info("generating key pairs on all nodes....")
    for l in servers:
        ssh(l[0], l[1], l[2], "rm -f /" + l[1] + "/.ssh/id_rsa*")
    for l in servers:
        ssh(l[0], l[1], l[2], "ssh-keygen -t rsa -P '' -f /" + l[1] + "/.ssh/id_rsa")
    ColorPrint.print_info("key pairs generated on all nodes....")
    ColorPrint.print_info("appending each node's own public key to its authorized_keys....")
    for l in servers:
        ssh(l[0], l[1], l[2], "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys")
    ColorPrint.print_info("each node's own public key appended....")
# classify the nodes by role
def analysis_nodes():
    global name_node
    global second_name_node
    for l in servers:
        if 'NN' == l[4]:
            name_node = l
        elif 'SNN' == l[4]:
            second_name_node = l
        else:
            data_nodes.append(l)

# locate the jdk home on a node
def find_jdk(dest):
    find = "find / -name 'java' -type f -perm -111 |awk -F'bin' '{print $1}'"
    rs = ssh(dest[0], dest[1], dest[2], find)
    return rs.strip()   # strip the trailing newline from the remote output

# download the hadoop tarball
def fetch_hadoop():
    global hadoop_name
    global hadoop_file_name
    base_url = "http://mirrors.hust.edu.cn/apache/hadoop/common/"
    hadoop_version = read_ini("./config.ini", "version", "hadoop")
    base_name = "hadoop-" + hadoop_version
    hadoop_name = base_name
    base_url += base_name + "/"
    file_name = base_name + ".tar.gz"
    hadoop_file_name = file_name
    base_url += file_name
    with tqdm(unit='B', unit_scale=True, leave=True, miniters=1, desc=file_name) as t:
        urllib.request.urlretrieve(base_url, filename=file_name, reporthook=process_hook(t), data=None)

# extract the hadoop tarball
def un_tar_hadoop(file_name, base_dir="./"):
    with tarfile.open(file_name) as tar:
        tar.extractall(path=base_dir)
    os.remove(file_name)
# edit the hadoop config file hadoop-env.sh, e.g.:
# export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre
def edit_hadoop_env(file_name, java_home):
    insert_first_line(file_name, "export JAVA_HOME=" + java_home)

# configure core-site.xml:
# <configuration>
#     <!-- HDFS endpoint address, i.e. the name node's address -->
#     <property>
#         <name>fs.default.name</name>
#         <value>hdfs://nn:9000</value>
#     </property>
#     <!-- base location for hadoop working files, including the namenode's working directory -->
#     <property>
#         <name>hadoop.tmp.dir</name>
#         <value>/opt/hadoop</value>
#     </property>
# </configuration>
def edit_core_site(file_path):
    tree = xml_parser(file_path)
    root = tree.getroot()
    name_property_node = gen_xml_node(root, 'property')
    dir_property_node = gen_xml_node(root, 'property')
    # fs.default.name: the name node's address
    gen_xml_node(name_property_node, 'name', 'fs.default.name')
    gen_xml_node(name_property_node, 'value', 'hdfs://nn:9000')
    # hadoop.tmp.dir: base location for working files
    gen_xml_node(dir_property_node, 'name', 'hadoop.tmp.dir')
    gen_xml_node(dir_property_node, 'value', '/opt/hadoop')
    tree.write(file_path, pretty_print=True, xml_declaration=True, encoding='utf8')

def xml_parser(file_path):
    parser = etree.XMLParser(encoding="utf8", remove_blank_text=True)
    return etree.parse(file_path, parser)

def gen_xml_node(parent_node, name, value=''):
    node = etree.Element(name)
    node.text = value
    parent_node.append(node)
    return node
# configure hdfs-site.xml
# <configuration>
#     <property>
#         <!-- HDFS replication factor; <= the number of datanodes -->
#         <name>dfs.replication</name>
#         <value>3</value>
#     </property>
# </configuration>
def edit_hdfs_site(file_path):
    tree = xml_parser(file_path)
    root = tree.getroot()
    replication_node = gen_xml_node(root, 'property')
    gen_xml_node(replication_node, 'name', 'dfs.replication')
    gen_xml_node(replication_node, 'value', '3')
    tree.write(file_path, pretty_print=True, xml_declaration=True, encoding='utf8')

# write the data nodes (IPs or hostnames) into slaves:
#     dn1
#     dn2
#     dn3
def edit_slaves(file_path, data_nodes):
    slaves = [str(d[3]) for d in data_nodes]
    write_lines(file_path, slaves)

# write the secondary name node (IP or hostname) into masters:
#     snn
def edit_masters(file_path, second_name_node):
    write_lines(file_path, [str(second_name_node[3])])
# re-pack the edited hadoop tree
def tar_hadoop(file_path, tar_name):
    tar_file(file_path, tar_name)

# count the files under a folder
def file_count(folder):
    file_counter = 0
    for dir_path, dirs, files in os.walk(folder):
        file_counter += len(files)
    return file_counter

# pack a directory into a .tar.gz
def tar_file(input_path, tar_file_name):
    tar = tarfile.open(tar_file_name, "w:gz")
    count = file_count(input_path)
    with tqdm(total=count, unit='file', desc="packing into " + tar_file_name) as pbar:
        for dir_path, dirs, files in os.walk(input_path):
            for filename in files:
                tar.add(os.path.join(dir_path, filename))
                pbar.update(1)
    tar.close()

######################################################################## untested
# upload the packed hadoop to every server
def upload_hadoop(local_path):
    global hadoop_home
    hadoop_prefix = read_ini("./config.ini", "path", "hadoop")
    sep = local_path.rfind("/")
    if sep != -1:
        hadoop_home = hadoop_prefix + local_path[sep + 1:]
    else:
        hadoop_home = hadoop_prefix + local_path
    for l in servers:
        upload(l[0], l[1], l[2], local_path, hadoop_home)

# extract hadoop on every server
def extract_server_hadoop(hadoop_tar_path):
    hadoop_prefix = read_ini("./config.ini", "path", "hadoop")
    execute_cmd("tar -C " + hadoop_prefix + " -xvf " + hadoop_tar_path)

# add hadoop to the environment variables:  sed -i '$a\...' /etc/profile
def add_hadoop_home(hadoop_home):
    hadoop = hadoop_home.replace(".tar.gz", "")
    hadoop_ = "export HADOOP_HOME=" + hadoop
    path = "export PATH=$HADOOP_HOME/bin:$PATH"
    execute_cmd("sed -i '$a\\" + hadoop_ + "' /etc/profile")
    execute_cmd("sed -i '$a\\" + path + "' /etc/profile")
    execute_cmd("source /etc/profile")
    execute_cmd("chmod -R +x " + hadoop + "/bin/")

# format the hdfs file system:  hadoop namenode -format
def format_hdfs(name_nodes):
    ssh(name_nodes[0], name_nodes[1], name_nodes[2], "hadoop namenode -format")

# start hdfs:  start-dfs.sh
def start_hdfs(name_nodes):
    ssh(name_nodes[0], name_nodes[1], name_nodes[2], "start-dfs.sh")
if __name__ == "__main__":
    # # stop the firewall
    # stop_firewall()
    # # disable the firewall on boot
    # disable_firewall()
    # # disable selinux in the config
    # edit_selinux_config()
    # # make the selinux setting effective
    # selinux_config_effective()
    # # edit the hosts file
    # edit_host_file()
    # # reboot
    # reboot()
    # # wait for the nodes to come back up
    # wait_start_up()
    # # install software
    # install_software()
    # # let the NameNode log in to the other nodes without a password
    # silent_login()
    # classify the node roles
    analysis_nodes()
    ColorPrint.print_info("data nodes: " + str(data_nodes))
    ColorPrint.print_info("NameNode: " + str(name_node))
    ColorPrint.print_info("SecondNameNode: " + str(second_name_node))
    # find the jdk home; every node runs the same openjdk, so the path is identical
    jdk_home = find_jdk(name_node)
    print("jdk", jdk_home)
    # download hadoop
    fetch_hadoop()
    # extract hadoop
    un_tar_hadoop(hadoop_file_name)
    # edit_hadoop_env("./testData/hadoop-env.sh", jdk_home)
    # edit_core_site("./testData/core-site.xml")
    # edit_hdfs_site("./testData/hdfs-site.xml")
    # edit_slaves("./testData/slaves", data_nodes)
    # edit_masters("./testData/masters", second_name_node)
    tar_hadoop(hadoop_name + "/", hadoop_file_name)
    upload_hadoop(hadoop_file_name)
    extract_server_hadoop(hadoop_home)
    add_hadoop_home(hadoop_home)
    format_hdfs(name_node)
    start_hdfs(name_node)