Hadoop (Version 1.2.1, JDK 6) HDFS Installation

  • The two core pieces of the Hadoop framework are HDFS (Hadoop Distributed File System) and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides computation over them.
  • Hadoop documentation for each version
  • Preparation
    • The hadoop-1.2.1 release package
    • Upload the package to one of the machines
    • Prepare 5 machines (VMs; here 192.168.10.215~192.168.10.219)
    • Turn off the firewall on all 5 machines:
    [root@dn3 ~]# systemctl stop firewalld.service
    [root@dn3 ~]# systemctl disable firewalld.service
    Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
    [root@dn3 ~]# iptables -F
    [root@dn3 ~]# vi /etc/selinux/config
    #SELINUX=enforcing # comment this out
    #SELINUXTYPE=targeted # comment this out
    SELINUX=disabled # add this line
    # Never write SELINUXTYPE=disabled; if you do,
    # you may need this: http://www.mamicode.com/info-detail-1847013.html
    [root@dn3 ~]# setenforce 0
    # reboot all the VMs
    [root@dn3 ~]# reboot
    [root@dn3 ~]# iptables -F
    
    • Change the hostnames (the command here is for CentOS 7; other Linux versions differ): hostnamectl set-hostname <hostname>. Here the NN node's hostname is set to nn, the SNN node's to snn, and the three DN nodes' to dn1, dn2 and dn3.
    • Edit the hosts file on every machine:
    127.0.0.1       localhost       localhost
    192.168.10.219  nn              nn
    192.168.10.218  snn             snn
    192.168.10.217  dn1             dn1
    192.168.10.216  dn2             dn2
    192.168.10.215  dn3             dn3
    
    • Check which JDK versions are compatible with your Hadoop release. For Hadoop 1.2.1 that is JDK 6, and the official wiki notes "It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.", so you can simply run yum -y install java-1.6.0-openjdk.x86_64 on all 5 machines. To find where the JDK was installed:
      [root@localhost hadoop]# rpm -ql java-1.6.0-openjdk.x86_64
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/java
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/keytool
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/orbd
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/pack200
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/policytool
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/rmid
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/rmiregistry
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/servertool
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/tnameserv
      /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin/unpack200
      # So the JDK lives under /usr/lib/jvm
      [root@localhost hadoop]# ls /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre/bin
      java  keytool  orbd  pack200  policytool  rmid  rmiregistry  servertool  tnameserv  unpack200
      
      • About passwordless login:
        • Passwordless login still involves an exchange behind the scenes; it is not that nothing gets checked. Suppose A wants to log in to B without a password:
        • A's request names the public key it wants to use; B looks that key up in its authorized_keys file. If it is found, B challenges A to prove that it holds the matching private key.
        • A answers the challenge by signing with its private key; B verifies the signature against the stored public key and the login succeeds, with no password ever sent over the wire.
        • In other words, for A to log in to B without a password, B must have A's public key. (A paramiko sketch that verifies this setup follows the transcript below.)
      • Pick one machine that can log in to the other 4 without a password. It should also be able to log in to itself; this is optional, but if you skip it the HDFS startup will pause to ask for a password before continuing. Here the nn node is chosen (any node works; it does not have to be the name node): it can log in to itself and to the other four machines without a password.
          # Install ssh (my system already ships with ssh, hence "Nothing to do") #########
          [root@localhost yum.repos.d]# yum install -y ssh
          Loaded plugins: fastestmirror
          Loading mirror speeds from cached hostfile
           * base: mirrors.aliyun.com
           * extras: mirrors.tuna.tsinghua.edu.cn
           * updates: mirrors.aliyun.com
          No package ssh available.
          Error: Nothing to do
          # Install rsync #################################################################
          [root@localhost yum.repos.d]# yum install -y rsync
          Loaded plugins: fastestmirror
          Loading mirror speeds from cached hostfile
           * base: mirrors.aliyun.com
           * extras: mirrors.tuna.tsinghua.edu.cn
           * updates: mirrors.aliyun.com
          Resolving Dependencies
          --> Running transaction check
          ---> Package rsync.x86_64 0:3.0.9-18.el7 will be installed
          --> Finished Dependency Resolution
        
          Dependencies Resolved
        
          =========================================================================================
           Package           Arch               Version                     Repository        Size
          =========================================================================================
          Installing:
           rsync             x86_64             3.0.9-18.el7                base             360 k
        
          Transaction Summary
          =========================================================================================
          Install  1 Package
        
          Total download size: 360 k
          Installed size: 732 k
          Downloading packages:
          rsync-3.0.9-18.el7.x86_64.rpm                                     | 360 kB  00:00:00     
          Running transaction check
          Running transaction test
          Transaction test succeeded
          Running transaction
            Installing : rsync-3.0.9-18.el7.x86_64                                             1/1 
            Verifying  : rsync-3.0.9-18.el7.x86_64                                             1/1 
        
          Installed:
            rsync.x86_64 0:3.0.9-18.el7                                                            
        
          Complete!
          [root@localhost yum.repos.d]# ssh localhost
          The authenticity of host 'localhost (::1)' can't be established.
          ECDSA key fingerprint is SHA256:3h7izAi6QdCeHwDrb8PdeeoMzaJH0zP4n75SQBxlSr8.
          ECDSA key fingerprint is MD5:3a:e3:ca:15:c7:24:cf:56:37:27:31:70:14:70:d5:01.
          Are you sure you want to continue connecting (yes/no)? yes
          Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
          root@localhost's password: 
          # Generate the key pair ########################################################
          [root@localhost yum.repos.d]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
          Generating public/private rsa key pair.
          Your identification has been saved in /root/.ssh/id_rsa.
          Your public key has been saved in /root/.ssh/id_rsa.pub.
          The key fingerprint is:
          SHA256:0kVvd1pHxg8NSCq84SRl2Yi0Zr48LjA8LNz4pxx3Cms root@localhost.localdomain
          The key's randomart image is:
          +---[RSA 2048]----+
          |     ...o+.....+o|
          |      .=o..o. .oo|
          |      = = o o ..=|
          |     + = = . . +o|
          |.oo   o S     .  |
          |.o*. . o         |
          | ..* .+.         |
          |  .E*oo.         |
          |  .+oo.          |
          +----[SHA256]-----+
          # Append the public key to the local authorized_keys file #####################
          [root@localhost yum.repos.d]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
          # Fix the permissions ##########################################################
          [root@localhost yum.repos.d]# chmod 0600 ~/.ssh/authorized_keys
          # Test passwordless login to self ##############################################
          [root@localhost yum.repos.d]# ssh localhost
          Last login: Wed Dec 20 22:36:50 2017 from 192.168.10.211
          # Log out ######################################################################
          [root@localhost ~]# exit
          logout
          Connection to localhost closed.
          [root@localhost yum.repos.d]# 
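      • Verifying the setup (an illustration, not part of the original walkthrough): the paramiko sketch below connects to every node with the private key instead of a password. It assumes paramiko is installed and that /root/.ssh/id_rsa is the key whose public half was appended to each node's authorized_keys.
          #!/usr/bin/env python
          # Verify key-based SSH from this host to every node in the cluster.
          import paramiko

          NODES = ["nn", "snn", "dn1", "dn2", "dn3"]  # hostnames from /etc/hosts

          def can_login_with_key(host, user="root", key_path="/root/.ssh/id_rsa"):
              client = paramiko.SSHClient()
              client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
              try:
                  # key_filename tells paramiko to authenticate with the private key;
                  # the server checks authorized_keys, so no password is sent
                  client.connect(host, 22, username=user, key_filename=key_path, timeout=5)
                  _, stdout, _ = client.exec_command("hostname")
                  return stdout.read().decode().strip()
              finally:
                  client.close()

          if __name__ == "__main__":
              for node in NODES:
                  print(node, "->", can_login_with_key(node))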
        
    • Set up the other machines so that 219 can log in to them without a password (copy 219's public key to each machine and append it to that machine's authorized_keys; 219 should also allow passwordless login to itself, otherwise you will see warnings)
      • Example
      [root@snn ~]# scp nn:/root/.ssh/id_rsa.pub /root/219_id_rsa.pub
      The authenticity of host 'nn (192.168.10.219)' can't be established.
      ECDSA key fingerprint is SHA256:3h7izAi6QdCeHwDrb8PdeeoMzaJH0zP4n75SQBxlSr8.
      ECDSA key fingerprint is MD5:3a:e3:ca:15:c7:24:cf:56:37:27:31:70:14:70:d5:01.
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added 'nn,192.168.10.219' (ECDSA) to the list of known hosts.
      root@nn's password: 
      id_rsa.pub                                                                      100%  389    80.7KB/s   00:00
      [root@dn1 ~]# cat /root/219_id_rsa.pub >> ~/.ssh/authorized_keys    
      
  • Configure core-site.xml under conf/ in the Hadoop root directory
<configuration>
    <!-- HDFS access address, i.e. the name node's address -->
     <property>
         <name>fs.default.name</name>
         <value>hdfs://nn:9000</value>
     </property>
    <!-- Base directory for Hadoop's working files, including the namenode's working directory -->
     <property>
         <name>hadoop.tmp.dir</name>
         <value>/opt/hadoop</value>
     </property>
</configuration>
  • Configure hdfs-site.xml under conf/ in the Hadoop root directory
<configuration>
     <property>
        <!-- HDFS replication factor; must not exceed the number of datanodes -->
         <name>dfs.replication</name>
         <value>3</value>
     </property>
</configuration>
  • Configure slaves under conf/ in the Hadoop root directory
# Data nodes, one IP or hostname per line
dn1
dn2
dn3
  • Configure masters under conf/ in the Hadoop root directory
# Secondary name node, IP or hostname
snn
  • So where is the name node host itself configured? In core-site.xml, which holds the nn node's access address.
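  • Configure JAVA_HOME in conf/hadoop-env.sh on every node (the Python script below does this via edit_hadoop_env); with the OpenJDK installed earlier the line is: export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre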
  • Add HADOOP_HOME to /etc/profile
    # use your own hadoop install path
    export HADOOP_HOME=/root/hadoop-1.2.1
    export PATH=$HADOOP_HOME/bin:$PATH
    
  • source /etc/profile
  • hadoop namenode -format
  • start-dfs.sh
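  • After start-dfs.sh returns, it is worth checking that all the daemons actually came up. A minimal sketch (an illustration; it assumes the Hadoop 1.x default web UI ports: 50070 for the NameNode, 50090 for the SecondaryNameNode, 50075 for each DataNode):
    #!/usr/bin/env python
    # Probe each daemon's built-in web UI to confirm it is listening.
    import urllib.request

    checks = [("namenode", "http://nn:50070"),
              ("secondarynamenode", "http://snn:50090"),
              ("datanode dn1", "http://dn1:50075"),
              ("datanode dn2", "http://dn2:50075"),
              ("datanode dn3", "http://dn3:50075")]

    for name, url in checks:
        try:
            urllib.request.urlopen(url, timeout=5)
            print(name, "OK")
        except Exception as e:
            print(name, "DOWN:", e)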
  • The Python installation script
  • Server configuration file (server.csv); a sketch of how the script reads it follows the listing
ip,user,pwd,hostname,nodes
192.168.10.215,root,mico,dn1,DN
192.168.10.216,root,mico,dn2,DN
192.168.10.217,root,mico,dn3,DN
192.168.10.218,root,mico,snn,SNN
192.168.10.219,root,mico,nn,NN
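  How the script below consumes this file (a small illustration): each data row becomes a list [ip, user, pwd, hostname, nodes], which is why the code indexes server rows as l[0] (ip), l[1] (user), l[2] (password), l[3] (hostname) and l[4] (role):
    import csv
    with open("./server.csv", encoding="utf8") as f:
        rows = list(csv.reader(f))[1:]  # skip the header row
    for ip, user, pwd, hostname, role in rows:
        print(hostname, role, ip)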
  • Software configuration file
software
java-1.6.0-openjdk.x86_64
rsync
  • Version and path configuration file (config.ini)
; software versions
[version]
; hadoop version
hadoop=1.2.1
; openjdk version
openjdk=1.6.0

; server-side paths
[path]
hadoop=/opt/
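  The script reads this file through Python's configparser, whose default comment prefixes already cover the ; comments above. A quick check (standard library only, assuming the file sits in the working directory):
    import configparser
    conf = configparser.ConfigParser()
    conf.read("./config.ini", encoding="utf8")
    print(conf.get("version", "hadoop"))  # -> 1.2.1
    print(conf.get("path", "hadoop"))     # -> /opt/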
  • The Python script
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Stop the firewall: systemctl stop firewalld.service
# Disable the firewall at boot: systemctl disable firewalld.service
# Disable selinux: vi /etc/selinux/config
# Make the selinux change effective: setenforce 0
# Set the hostnames
# Edit the hosts file
# Reboot: reboot
# Install the JDK
# Passwordless login:
# yum install -y ssh
# yum install -y rsync
# generate the key pair: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# append the public key to the local authorized_keys
# test passwordless login

# Extract the hadoop tarball
# Configure JAVA_HOME in hadoop-env.sh
# Configure core-site.xml:
#     <configuration>
#         <!-- HDFS access address, i.e. the name node's address -->
#          <property>
#              <name>fs.default.name</name>
#              <value>hdfs://nn:9000</value>
#          </property>
#         <!-- Base directory for Hadoop's working files, including the namenode's working directory -->
#          <property>
#              <name>hadoop.tmp.dir</name>
#              <value>/opt/hadoop</value>
#          </property>
#     </configuration>
# Configure hdfs-site.xml
#     <configuration>
#          <property>
#             <!-- HDFS replication factor; must not exceed the number of datanodes -->
#              <name>dfs.replication</name>
#              <value>3</value>
#          </property>
#     </configuration>
# Write the data nodes into slaves, IPs or hostnames
#     dn1
#     dn2
#     dn3
# Write the secondary name node into masters, IP or hostname
#     snn
# Add hadoop to the system environment variables
# hadoop namenode -format
# start-dfs.sh

import csv
import paramiko
import os
from www.colorprint.color_print import ColorPrint
import configparser
import urllib.request
from tqdm import tqdm
import tarfile
from lxml import etree


# Insert text as the first line of a file
def insert_first_line(file, text):
    with open(file, 'r+', encoding="utf8") as f:
        content = f.read()
        f.seek(0, 0)
        # newline keeps the inserted text from merging with the old first line
        f.write(text + "\n" + content)


# Write a list of lines to a file, newline-separated
def write_lines(file, lines):
    with open(file, 'w', encoding='utf8') as f:
        f.write("\n".join(lines))


# Progress-bar hook for urllib downloads
def process_hook(t):
    last_b = [0]

    def inner(b=1, bsize=1, tsize=None):
        """
        b  : int, optional
            Number of blocks just transferred [default: 1].
        bsize  : int, optional
            Size of each block (in tqdm units) [default: 1].
        tsize  : int, optional
            Total size (in tqdm units). If [default: None] remains unchanged.
        """
        if tsize is not None:
            t.total = tsize
        t.update((b - last_b[0]) * bsize)
        last_b[0] = b

    return inner


conf = configparser.ConfigParser()


# Read a value from the ini configuration file
def read_ini(file, section, _name_):
    conf.read(file,encoding="utf8")
    return conf.get(section, _name_)


# Read the server csv file, skipping the header row
def read(file):
    config_ = []
    with open(file, encoding='utf8') as f:
        f_csv = csv.reader(f)
        next(f_csv)
        for row in f_csv:
            config_.append(row)
    return config_

servers = read("./server.csv")
data_nodes = []
name_node = []
second_name_node = []
hadoop_home = ""
hadoop_name = ""
hadoop_file_name = ""


# Run a single command over SSH; returns the command's stdout as one string
def ssh(ip, user_name, pass_wd, cmd):
    result_str = ""
    err_str = ""
    try:
        ssh_client = paramiko.SSHClient()
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh_client.connect(ip, 22, user_name, pass_wd, timeout=5)
        # print("running remote command: server ip: %s, command: %s" % (ip, cmd))
        std_in, stdout, stderr = ssh_client.exec_command(cmd)
        # std_in.write("Y")   # simple interaction: send 'Y' to a prompt
        out = stdout.readlines()
        err = stderr.readlines()
        # echo the output
        for o in out:
            result_str += o
            print(o)
        for e in err:
            err_str += e
        if len(err_str) != 0:
            print(err_str)
        # print('%s\tdone\n' % ip)
        # print("result_str:" + result_str)
        ssh_client.close()
    except Exception as e:
        print('%s\tError:%s\n' % (ip, e))

    return result_str


def upload(host_ip, username, password, local_path, remote_path):
    t = paramiko.Transport((host_ip, 22))
    t.connect(username=username, password=password)  # log in to the remote server
    sftp = paramiko.SFTPClient.from_transport(t)  # SFTP transport
    sftp.put(local_path, remote_path)
    t.close()


def download(host_ip, username, password, remote_path, local_path):
    t = paramiko.Transport((host_ip, 22))
    t.connect(username=username, password=password)  # log in to the remote server
    sftp = paramiko.SFTPClient.from_transport(t)  # SFTP transport
    sftp.get(remote_path, local_path)
    t.close()


# Stop the firewall on every server
def stop_firewall():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "systemctl stop firewalld.service")
        print(rs)


# Disable the firewall at boot: systemctl disable firewalld.service
def disable_firewall():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "systemctl disable firewalld.service")
        print(rs)


# Disable selinux in /etc/selinux/config via sed
def edit_selinux_config():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config")
        print(rs)


# Make the selinux change effective: setenforce 0
def selinux_config_effective():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "setenforce 0")
        print(rs)


# Set each server's hostname
def set_host_name():
    for l in servers:
        rs = ssh(l[0], l[1], l[2], "hostnamectl set-hostname "+ l[3])
        print(rs)


# Append every server to /etc/hosts
def edit_host_file():
    for l in servers:
        cmd = "echo \'"+l[0]+"  "+l[3]+"    "+l[3]+"'>>/etc/hosts"
        rs = ssh(l[0], l[1], l[2], cmd)
        print(rs)


# Reboot all servers
def reboot():
    for l in servers:
        cmd = "reboot"
        ssh(l[0], l[1], l[2], cmd)


# Wait until every server is reachable again after the reboot
def wait_start_up():
    while(True):
        all_str = ""
        try:
            for l in servers:
                rs = ssh(l[0], l[1], l[2], "ls /etc/hosts")
                if rs == '':
                    rs = 'ERROR'
                all_str += rs
        except Exception as e:
            continue
        else:
            if all_str.find("ERROR")!=-1:
                continue
            else:
                break


def execute_cmd(cmd):
    for l in servers:
        ssh(l[0], l[1], l[2], cmd)


# Install the JDK: yum -y install java-1.6.0-openjdk.x86_64
# yum install -y ssh
# yum install -y rsync
def install_software():
    execute_cmd("yum -y install rsync")
    jdk_version = read_ini("./config.ini", "version", "openjdk")
    execute_cmd("yum -y install java-" + jdk_version + "-openjdk.x86_64")


# Passwordless login: let the nn node log in to every other machine without a password
def silent_login():
    ColorPrint.print_info("Data nodes: " + str(data_nodes))
    ColorPrint.print_info("NameNode: " + str(name_node))
    ColorPrint.print_info("SecondNameNode: " + str(second_name_node))
    gen_pubkey()
    for l in servers:
        if name_node[0] == l[0]:
            continue
        else:
            append_auth_keys(name_node,l)


# Copy source's public key to the des server so source can log in to des without a password
def append_auth_keys(source, des):
    ColorPrint.print_info(str(source[0]) + " public key copied to: " + str(des[0]))
    download(source[0], source[1], source[2], "/" + source[1] + "/.ssh/id_rsa.pub",source[0] + "id_rsa.pub")
    local_pub = source[0] + "id_rsa.pub"
    up_pub = "/" + source[1] + "/.ssh/" + source[0] + "id_rsa.pub"
    upload(des[0], des[1], des[2], local_pub, up_pub)
    append_keys = "cat /" + source[1] + "/.ssh/" + source[0] + "id_rsa.pub >> ~/.ssh/authorized_keys"
    ssh(des[0], des[1], des[2], append_keys)
    ColorPrint.print_info(str(source[0]) + " public key copied to: " + str(des[0]) + " done!")
    os.remove(local_pub)
    remove_des_keys = "rm -f " + up_pub
    ssh(des[0], des[1], des[2], remove_des_keys)


# Generate a key pair on every node
def gen_pubkey():
    ColorPrint.print_info("Generating key pairs on all nodes....")
    for l in servers:
        rm_rsa = "rm -f /" + l[1] + "/.ssh/id_rsa*"
        ssh(l[0], l[1], l[2], rm_rsa)
    for l in servers:
        gen_rsa = "ssh-keygen -t rsa -P '' -f /" + l[1] + "/.ssh/id_rsa"
        ssh(l[0], l[1], l[2], gen_rsa)
    ColorPrint.print_info("所有節(jié)點生成公鑰完成....")
    ColorPrint.print_info("所有節(jié)點自身公鑰添加到信任文件....")
    for l in servers:
        append_auth = "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
        ssh(l[0], l[1], l[2], append_auth)
    ColorPrint.print_info("所有節(jié)點自身公鑰添加到信任文件完成....")


def analysis_nodes():
    global name_node
    global second_name_node
    for l in servers:
        if 'NN' == l[4]:
            name_node = l
        elif 'SNN' == l[4]:
            second_name_node = l
        else:
            data_nodes.append(l)


def find_jdk(dest):
    find = "find / -name 'java' -type f -perm -111 |awk -F'bin' '{print $1}'"
    rs = ssh(dest[0],dest[1],dest[2],find)
    return rs


def fetch_hadoop():
    global hadoop_name
    global hadoop_file_name
    base_url = "http://mirrors.hust.edu.cn/apache/hadoop/common/"
    hadoop_version = read_ini("./config.ini", "version", "hadoop")
    base_name = "hadoop-"+hadoop_version
    hadoop_name = base_name
    base_name_dir = base_name + "/"
    base_url += base_name_dir
    file_name = base_name + ".tar.gz"
    hadoop_file_name = file_name
    base_url += file_name

    with tqdm(unit='B', unit_scale=True, leave=True, miniters=1,desc=file_name) as t:
        urllib.request.urlretrieve(base_url, filename=file_name, reporthook=process_hook(t), data=None)


# Extract the hadoop tarball, then delete it
def un_tar_hadoop(file_name, base_dir="./"):
    with tarfile.open(file_name) as tar:
        tar.extractall(path=base_dir)
    os.remove(file_name)


# Edit the hadoop config file hadoop-env.sh, e.g.
# export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.41.x86_64/jre
def edit_hadoop_env(file_name,java_home):
    insert_first_line(file_name,"export JAVA_HOME="+java_home)


# Configure core-site.xml:
#     <configuration>
#         <!-- HDFS access address, i.e. the name node's address -->
#          <property>
#              <name>fs.default.name</name>
#              <value>hdfs://nn:9000</value>
#          </property>
#         <!-- Base directory for Hadoop's working files, including the namenode's working directory -->
#          <property>
#              <name>hadoop.tmp.dir</name>
#              <value>/opt/hadoop</value>
#          </property>
#     </configuration>
def edit_core_site(file_path):
    tree = xml_parser(file_path)
    root = tree.getroot()
    name_property_node = gen_xml_node(root,'property')
    dir_property_node = gen_xml_node(root,'property')

    #         <!-- HDFS access address, i.e. the name node's address -->
    #          <property>
    #              <name>fs.default.name</name>
    #              <value>hdfs://nn:9000</value>
    #          </property>
    gen_xml_node(name_property_node,'name','fs.default.name')
    gen_xml_node(name_property_node,'value','hdfs://nn:9000')
    #         <!-- Base directory for Hadoop's working files, including the namenode's working directory -->
    #          <property>
    #              <name>hadoop.tmp.dir</name>
    #              <value>/opt/hadoop</value>
    #          </property>
    gen_xml_node(dir_property_node,'name','hadoop.tmp.dir')
    gen_xml_node(dir_property_node,'value','/opt/hadoop')
    tree.write(file_path,pretty_print=True,xml_declaration = True, encoding='utf8')


def xml_parser(file_path):
    parser = etree.XMLParser(encoding="utf8", remove_blank_text=True)
    tree = etree.parse(file_path,parser)
    return tree


def gen_xml_node(parent_node, name, value=''):
    node = etree.Element(name)
    node.text = value
    parent_node.append(node)
    return node


# Configure hdfs-site.xml
#     <configuration>
#          <property>
#             <!-- HDFS replication factor; must not exceed the number of datanodes -->
#              <name>dfs.replication</name>
#              <value>3</value>
#          </property>
#     </configuration>
def edit_hdfs_site(file_path):
    tree = xml_parser(file_path)
    root = tree.getroot()
    replication_node = gen_xml_node(root, 'property')
    #          <property>
    #             <!-- HDFS replication factor; must not exceed the number of datanodes -->
    #              <name>dfs.replication</name>
    #              <value>3</value>
    #          </property>
    gen_xml_node(replication_node, 'name', 'dfs.replication')
    gen_xml_node(replication_node, 'value', '3')
    tree.write(file_path, pretty_print=True, xml_declaration=True, encoding='utf8')


# Write the data nodes into slaves, IPs or hostnames
#     dn1
#     dn2
#     dn3
def edit_slaves(file_path, data_nodes):
    slaves = []
    for d in data_nodes:
        slaves.append(str(d[3]))
    write_lines(file_path,slaves)


# Write the secondary name node into masters, IP or hostname
#     snn
def edit_masters(file_path, second_name_node):
    masters = [str(second_name_node[3])]
    write_lines(file_path,masters)


# Re-pack the edited hadoop directory
def tar_hadoop(file_path,tar_name):
    tar_file(file_path,tar_name)


# Count the files under a folder
def file_count(folder):
    file_counter = 0
    for dir_path, dirs, files in (os.walk(folder)):
        file_counter += len(files)
    return file_counter


# Tar and gzip a directory, with a progress bar
def tar_file(input_path, tar_file_name):
    tar = tarfile.open(tar_file_name, "w:gz")
    count = file_count(input_path)
    with tqdm(total=count, unit='file', desc="packing " + tar_file_name) as pbar:
        for dir_path, dirs, files in os.walk(input_path):
            for filename in files:
                full_path = os.path.join(dir_path, filename)
                tar.add(full_path)
                pbar.update(1)
    tar.close()


######################################################################## untested below
# Upload the packed hadoop tarball to every server
def upload_hadoop(local_path):
    global hadoop_home
    hadoop_prefix = read_ini("./config.ini", "path", "hadoop")
    sep = local_path.rfind("/")
    if sep != -1:
        file_name = local_path[sep+1:]
        hadoop_home = hadoop_prefix + file_name
    else:
        hadoop_home = hadoop_prefix + local_path
    for l in servers:
        upload(l[0], l[1], l[2], local_path, hadoop_home)


# Extract hadoop on each server
def extract_server_hadoop(hadoop_tar_path):
    hadoop_prefix = read_ini("./config.ini", "path", "hadoop")
    for l in servers:
        cmd = "tar -C " + hadoop_prefix + " -xvf " + hadoop_tar_path
        ssh(l[0], l[1], l[2], cmd)


# Add hadoop to the system environment variables: sed -i '$a\hadoop_home' /etc/profile
def add_hadoop_home(hadoop_home):
    hadoop = hadoop_home.replace(".tar.gz", "")
    hadoop_ = "export HADOOP_HOME=" + hadoop
    path = "export PATH=$HADOOP_HOME/bin:$PATH"
    execute_cmd("sed -i '$a\\" + hadoop_ + "' /etc/profile")
    execute_cmd("sed -i '$a\\" + path + "' /etc/profile")
    # note: this 'source' only affects the one-off SSH shell it runs in;
    # later logins pick up /etc/profile on their own
    execute_cmd("source /etc/profile")
    bin_ = "chmod -R +x " + hadoop + "/bin/"
    execute_cmd(bin_)


# Format the hdfs file system: hadoop namenode -format
def format_hdfs(name_nodes):
    ssh(name_nodes[0], name_nodes[1], name_nodes[2], "hadoop namenode -format")


# Start hdfs: start-dfs.sh
def start_hdfs(name_nodes):
    ssh(name_nodes[0], name_nodes[1], name_nodes[2], "start-dfs.sh")

if __name__ == "__main__":
    # # stop the firewall
    # stop_firewall()
    # # disable the firewall at boot
    # disable_firewall()
    # # disable selinux in the config file
    # edit_selinux_config()
    # # make the selinux change take effect
    # selinux_config_effective()
    # # edit the hosts file
    # edit_host_file()
    # # reboot
    # reboot()
    # # wait for the reboot to finish
    # wait_start_up()
    # # install the software
    # install_software()
    # # let the NameNode log in to the other nodes without a password
    # silent_login()
    # # work out each node's role
    analysis_nodes()
    ColorPrint.print_info("Data nodes: " + str(data_nodes))
    ColorPrint.print_info("NameNode: " + str(name_node))
    ColorPrint.print_info("SecondNameNode: " + str(second_name_node))
    # find the JDK path (every node has the same OpenJDK, so the path is identical)
    jdk_home = find_jdk(name_node)
    print("jdk", jdk_home)
    # # download hadoop
    fetch_hadoop()
    # # extract hadoop
    un_tar_hadoop(hadoop_file_name)
    # edit_hadoop_env("./testData/hadoop-env.sh",jdk_home)
    # edit_core_site("./testData/core-site.xml")
    # edit_hdfs_site("./testData/hdfs-site.xml")
    # edit_slaves("./testData/slaves",data_nodes)
    # edit_masters("./testData/masters",second_name_node)
    tar_hadoop(hadoop_name + "/",hadoop_file_name)
    upload_hadoop(hadoop_file_name)
    extract_server_hadoop(hadoop_home)
    add_hadoop_home(hadoop_home)
    format_hdfs(name_node)
    start_hdfs(name_node)
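  • A note on running the script: the third-party imports are paramiko, tqdm and lxml (www.colorprint.color_print is the author's own helper module), and server.csv plus config.ini are expected in the working directory. On a first run, the commented-out setup steps at the top of the main block (firewall, selinux, hosts, reboot, software, passwordless login) need to be uncommented; as published, the script starts from analysis_nodes().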