快速入手
本節(jié)將通過(guò)RPM安裝物理機(jī)版本的一個(gè)Centos/Redhat 7.x單節(jié)點(diǎn)集群葫哗。假設(shè)我們安裝的服務(wù)器hostname為oushu(可以通過(guò)命令:hostname 直接獲取鳞疲,請(qǐng)將文中所有出現(xiàn)的oushu替換為實(shí)際的hostname)烛芬。此次部署大約需要您30分鐘時(shí)間。
安裝準(zhǔn)備
首先使用root登錄大脉。 查看有無(wú)avx指令:
cat /proc/cpuinfo | grep avx
安裝oushu yum源:
#Redhat/CentOS 7.0, 7.1, 7.2系統(tǒng)并且包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
#Redhat/CentOS 7.0, 7.1, 7.2系統(tǒng)但不包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
#Redhat/CentOS 7.3系統(tǒng)并且包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
#Redhat/CentOS 7.3系統(tǒng)但不包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
#Redhat/CentOS 7.4系統(tǒng)并且包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
#Redhat/CentOS 7.4系統(tǒng)但不包含avx指令請(qǐng)配置以下YUM源:
wget -P /etc/yum.repos.d/ $獲取的repo url
禁用selinux:
sed -i "s/^SELINUX\=enforcing/SELINUX\=disabled/g" /etc/selinux/configsetenforce 0
關(guān)閉防火墻:
systemctl stop iptablessystemctl disable iptablessystemctl stop firewalldsystemctl disable firewalld
安裝Java:
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-develmkdir -p /usr/java//注意查看本機(jī)的java版本ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el7_4.x86_64? /usr/java/default
安裝HDFS
安裝HDFS并且創(chuàng)建其使用的目錄,這里我們假設(shè)我們的機(jī)器上有兩個(gè)數(shù)據(jù)盤(pán),分別mount在/data1和/data2目錄太惠,如果您有多塊盤(pán),下面的目錄創(chuàng)建以及配置文件需要做相應(yīng)的更改疲憋。尤其對(duì)HDFS的數(shù)據(jù)目錄以及OushuDB的臨時(shí)文件目錄位置凿渊。
#由于hadoop依賴于特定版本的snappy,請(qǐng)先卸載snappy確保安裝的順利進(jìn)行yum -y remove snappy#安裝HDFS RPM,RPM安裝會(huì)自動(dòng)創(chuàng)建hdfs用戶yum install -y hadoop hadoop-hdfs#在/data1上創(chuàng)建NameNode目錄mkdir -p /data1/hdfs/namenode#在每塊盤(pán)上創(chuàng)建DataNode數(shù)據(jù)目錄埃脏,并更改權(quán)限mkdir -p /data1/hdfs/datanodechmod -R 755 /data1/hdfschown -R hdfs:hadoop /data1/hdfsmkdir -p /data2/hdfs/datanodechmod -R 755 /data2/hdfschown -R hdfs:hadoop /data2/hdfs
復(fù)制下列文件到/etc/hadoop/conf/中(遇到覆蓋提示搪锣,請(qǐng)輸入y,表示確認(rèn)覆蓋)
http://www.oushu.com/docs/ch/_downloads/908bee114673dff44292d2b51ed5a1ce/core-site.xml.
http://www.oushu.com/docs/ch/_downloads/a57b214c41f418570548204fdf5089b3/hdfs-site.xml.
http://www.oushu.com/docs/ch/_downloads/5caeda7d6d35f2ab18438c8994e855c1/hadoop-env.sh.
編輯/etc/hadoop/conf/core-site.xml文件中的fs.defaultFS屬性彩掐,其他系統(tǒng)通過(guò)這個(gè)url來(lái)訪問(wèn)HDFS,注:在做format之前构舟,請(qǐng)確認(rèn)已經(jīng)將core-site.xml中fs.defaultFS的值由oushu替換成hostname。:
<property><name>fs.defaultFS</name><value>hdfs://oushu:9000</value></property>
編輯 /etc/hadoop/conf/hadoop-env.sh堵幽,加入下面參數(shù)狗超。這些參數(shù)配置了Java Home,Hadoop配置文件朴下,日志文件目錄努咐,以及JVM選項(xiàng)。根據(jù)存儲(chǔ)的HDFS數(shù)據(jù)量大小殴胧,需要適當(dāng)修改NameNode的-Xmx值渗稍。HDFS數(shù)據(jù)量越大,-Xmx值應(yīng)該設(shè)的越大团滥。
export JAVA_HOME="/usr/java/default"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
export HADOOP_LOG_DIR=/var/log/hadoop/$USER
因?yàn)?etc/hadoop/conf/hdfs-site.xml中默認(rèn)使用/data1和/data2兩塊盤(pán)竿屹,如果你有多塊盤(pán),你需要更改dfs.data.dir屬性惫撰,使得HDFS用到所有盤(pán):
<property><name>dfs.data.dir</name><value>/data1/hdfs/datanode,/data2/hdfs/datanode</value><final>true</final></property>
格式化NameNode羔沙,并啟動(dòng)NameNode和DataNode。
注:在format過(guò)程中如果詢問(wèn)是否要format厨钻,請(qǐng)輸入y扼雏,表示確認(rèn)。
sudo -u -E hdfs hdfs namenode -formatsudo -u -E hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenodesudo -u -E hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode
HDFS日志在/var/log/hadoop/hdfs/中夯膀。如果因?yàn)榕渲贸鲥e(cuò)诗充,可以查看錯(cuò)誤日志,并依據(jù)改正诱建。
檢查hdfs是否成功運(yùn)行:
su - hdfshdfs dfsadmin -reporthdfs dfs -mkdir /testnodehdfs dfs -put /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh /testnode/hdfs dfs -ls -R /
你也可以查看HDFS web界面:http://oushu:50070/
安裝OushuDB
安裝OushuDB RPM蝴蜓,OushuDB RPM安裝會(huì)自動(dòng)創(chuàng)建gpadmin用戶。
yum install -y hawq
在配置文件/etc/sysctl.conf添加內(nèi)容
kernel.shmmax = 1000000000kernel.shmmni = 4096kernel.shmall = 4000000000kernel.sem = 250 512000 100 2048kernel.sysrq = 1kernel.core_uses_pid = 1kernel.msgmnb = 65536kernel.msgmax = 65536kernel.msgmni = 2048net.ipv4.tcp_syncookies = 0net.ipv4.conf.default.accept_source_route = 0net.ipv4.tcp_tw_recycle = 1net.ipv4.tcp_max_syn_backlog = 200000net.ipv4.conf.all.arp_filter = 1net.ipv4.ip_local_port_range = 10000 65535net.core.netdev_max_backlog = 200000fs.nr_open = 3000000kernel.threads-max = 798720kernel.pid_max = 798720# increase networknet.core.rmem_max=2097152net.core.wmem_max=2097152net.core.somaxconn=4096
使系統(tǒng)配置生效:
sysctl -p
創(chuàng)建OushuDB本地元數(shù)據(jù)目錄和臨時(shí)文件目錄:
#創(chuàng)建OushuDB本地元數(shù)據(jù)目錄俺猿,下面兩個(gè)目錄分別為master和segment使用mkdir -p /data1/hawq/masterddmkdir -p /data1/hawq/segmentdd#創(chuàng)建OushuDB臨時(shí)文件目錄茎匠,每塊盤(pán)需要?jiǎng)?chuàng)建一個(gè)臨時(shí)文件目錄,這樣可以讓OushuDB使用到所有盤(pán)押袍。mkdir -p /data1/hawq/tmpchmod -R 755 /data1/hawqchown -R gpadmin:gpadmin /data1/hawqmkdir -p /data2/hawq/tmpchmod -R 755 /data2/hawqchown -R gpadmin:gpadmin /data2/hawq
在HDFS上創(chuàng)建OushuDB數(shù)據(jù)目錄:
sudo -u hdfs hdfs dfs -mkdir -p /hawq_defaultsudo -u hdfs hdfs dfs -chown -R gpadmin /hawq_default
編輯/usr/local/hawq/etc/slaves诵冒,去掉文件中的localhost,并加入oushu谊惭。slaves文件中存放所有slave節(jié)點(diǎn)的地址汽馋,每行一個(gè)節(jié)點(diǎn)侮东。修改后文件為:
oushu
編輯/usr/local/hawq/etc/hawq-site.xml, 因?yàn)?usr/local/hawq/etc/hawq-site.xml中默認(rèn)使用/data1和/data2兩塊盤(pán),如果你有多塊盤(pán)豹芯,你需要更改hawq_master_temp_directory和hawq_segment_temp_directory值悄雅,用到所有盤(pán):
<property><name>hawq_master_address_host</name><value>oushu</value><description>The host name of hawq master.</description></property><property><name>hawq_dfs_url</name><value>oushu:9000/hawq_default</value><description>URL for accessing HDFS.</description></property><property><name>magma_nodes_url</name><value>localhost:6666</value><description>urls for accessing magma.</description></property><property><name>hawq_master_directory</name><value>/data1/hawq/masterdd</value><description>The directory of hawq master.</description></property><property><name>hawq_segment_directory</name><value>/data1/hawq/segmentdd</value><description>The directory of hawq segment.</description></property><property><name>hawq_master_temp_directory</name><value>/data1/hawq/tmp,/data2/hawq/tmp</value><description>The temporary directory reserved for hawq master. Note: please DONOT add " " between directries.</description></property><property><name>hawq_segment_temp_directory</name><value>/data1/hawq/tmp,/data2/hawq/tmp</value><description>The temporary directory reserved for hawq segment. Note: please DONOT add " " between directories.</description></property>
OushuDB4.0版本新增Magma的單獨(dú)配置和啟停功能,使用magam服務(wù)時(shí)铁蹈,首先創(chuàng)建magma node數(shù)據(jù)目錄:
# 創(chuàng)建mamga node數(shù)據(jù)目錄mkdir -p /data1/hawq/magma_segmentddmkdir -p /data2/hawq/magma_segmentddchown -R gpadmin:gpadmin /data1/hawqchown -R gpadmin:gpadmin /data2/hawq
然后編輯配置/usr/local/hawq/etc/magma-site.xml:
<property><name>nodes_file</name><value>slaves</value><description>The magma nodes file name at GPHOME/etc</description></property><property><name>node_data_directory</name><value>file:///data1/hawq/magma_segmentdd,file:///data2/hawq/magma_segmentdd</value><description>The data directory for magma node</description></property><property><name>node_log_directory</name><value>~/hawq-data-directory/segmentdd/pg_log</value><description>The log directory for magma node</description></property><property><name>node_address_port</name><value>6666</value><description>The port magma node listening</description></property><property><name>magma_range_number</name><value>2</value></property><property><name>magma_replica_number</name><value>3</value></property><property><name>magma_datadir_capacity</name><value>3</value></property><property><name>compact_trigger_ap_ratio_limit</name><value>0.2</value><description>The threshold of triggering compact in MAGMAAP format.</description></property><property><name>compact_trigger_tp_ratio_limit</name><value>0.5</value><description>The threshold of triggering compact in MAGMAAP catalog</description></property>
以gpadmin用戶登錄:
su - gpadmin
設(shè)置免密碼ssh:
source /usr/local/hawq/greenplum_path.shhawq ssh-exkeys -h oushu
初始化OushuDB宽闲,在詢問(wèn)是否初始化時(shí),請(qǐng)輸入y握牧,表示確認(rèn)初始化便锨。
hawq init cluster? //OushuDB4.0 默認(rèn)不啟動(dòng)magma服務(wù)
hawq init cluster --with_magma? //OushuDB4.0新增,3.X版本不支持該選項(xiàng)
// OushuDB4.0版本新增--with_magma選項(xiàng)我碟,但只有hawq init|start|stop cluster命令可以帶--with_magma選項(xiàng)。
OushuDB管理工具日志在/home/gpadmin/hawqAdminLogs/中姚建,OushuDB master日志和segment日志分別在/data1/hawq/masterdd/pg_log/ 和/data1/hawq/segmentdd/pg_log/中矫俺。如果因?yàn)榕渲贸鲥e(cuò),可以查看錯(cuò)誤日志掸冤,并依據(jù)改正厘托。
檢查OushuDB是否運(yùn)行正常:
su - gpadminsource /usr/local/hawq/greenplum_path.shpsql -d postgresselect * from gp_segment_configuration;? //確定所有節(jié)點(diǎn)是up狀態(tài)create table t(i int);insert into t select generate_series(1,1000);select count(*) from t;
體驗(yàn)新執(zhí)行器
本章節(jié)通過(guò)TPCH lineitem 表來(lái)展示新執(zhí)行器的使用。
建立e_lineitem外部表用來(lái)生成TPCH lineitem 數(shù)據(jù),
CREATE EXTERNAL WEB TABLE E_LINEITEM ( L_ORDERKEY? ? INT8 ,L_PARTKEY? ? INTEGER ,L_SUPPKEY? ? INTEGER ,L_LINENUMBER? INTEGER ,L_QUANTITY? ? FLOAT ,L_EXTENDEDPRICE? FLOAT ,L_DISCOUNT? ? FLOAT ,L_TAX? ? ? ? FLOAT ,L_RETURNFLAG? VARCHAR(1) ,L_LINESTATUS? VARCHAR(1) ,L_SHIPDATE? ? TEXT ,L_COMMITDATE? TEXT ,L_RECEIPTDATE TEXT ,L_SHIPINSTRUCT CHAR(25) ,L_SHIPMODE? ? VARCHAR(10) ,L_COMMENT? ? ? VARCHAR(44) )EXECUTE 'bash -c "$GPHOME/bin/dbgen -b $GPHOME/bin/dists.dss -T L -s 1 -N 6 -n $((GP_SEGMENT_ID + 1))"'on 6 format 'text' (delimiter '|');
創(chuàng)建ORC 表
CREATE TABLE lineitem(? L_ORDERKEY? ? INT8,L_PARTKEY? ? INTEGER,L_SUPPKEY? ? INTEGER,L_LINENUMBER? INTEGER,L_QUANTITY? ? FLOAT,L_EXTENDEDPRICE? FLOAT,L_DISCOUNT? ? FLOAT,L_TAX? ? ? ? FLOAT,L_RETURNFLAG? TEXT,L_LINESTATUS? TEXT,L_SHIPDATE? ? TEXT,L_COMMITDATE? TEXT,L_RECEIPTDATE TEXT,L_SHIPINSTRUCT TEXT,L_SHIPMODE? ? TEXT,L_COMMENT? ? ? TEXT)WITH (APPENDONLY = true, OIDS = FALSE, ORIENTATION = orc);
插入數(shù)據(jù)
INSERT INTO lineitem SELECT * FROM e_lineitem;
從下面的例子可以看到新執(zhí)行器對(duì)于性能的大幅改進(jìn)稿湿。
-----獲取表行數(shù)------postgres=# set new_executor = on;SETpostgres=# SELECT COUNT(*) FROM lineitem;count---------6001215(1 row)Time: 17.006 mspostgres=# set new_executor = off;SETpostgres=# SELECT COUNT(*) FROM lineitem;count---------6001215(1 row)Time: 213.248 ms-----TPCH 查詢 1 ------postgres=# set new_executor = on;SETpostgres=#? SELECTl_returnflag,l_linestatus,sum(l_quantity)::bigint as sum_qty,sum(l_extendedprice)::bigint as sum_base_price,sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,avg(l_quantity)::bigint as avg_qty,avg(l_extendedprice)::bigint as avg_price,avg(l_discount)::bigint as avg_disc,count(*) as count_orderFROMlineitemWHEREl_shipdate <= '1998-08-20'GROUP BYl_returnflag,l_linestatus;l_returnflag | l_linestatus | sum_qty? | sum_base_price | sum_disc_price |? sum_charge? | avg_qty | avg_price | avg_disc | count_order--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------R? ? ? ? ? ? | F? ? ? ? ? ? | 37719753 |? ? 56568041381 |? ? 53741292685 |? 55889619120 |? ? ? 26 |? ? 38251 |? ? ? ? 0 |? ? 1478870N? ? ? ? ? ? | F? ? ? ? ? ? |? 991417 |? ? 1487504710 |? ? 1413082168 |? 1469649223 |? ? ? 26 |? ? 38284 |? ? ? ? 0 |? ? ? 38854A? ? ? ? ? ? | F? ? ? ? ? ? | 37734107 |? ? 56586554401 |? ? 53758257135 |? 55909065223 |? ? ? 26 |? ? 38273 |? ? ? ? 0 |? ? 1478493N? ? ? ? ? ? | O? ? ? ? ? ? | 73808911 |? 110700990251 |? 105167436999 | 109377979031 |? ? ? 26 |? ? 38248 |? ? ? ? 0 |? ? 2894278(4 rows)Time: 234.376 mspostgres=# set new_executor = off;SETpostgres=#? SELECTl_returnflag,l_linestatus,sum(l_quantity)::bigint as sum_qty,sum(l_extendedprice)::bigint as sum_base_price,sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,avg(l_quantity)::bigint as avg_qty,avg(l_extendedprice)::bigint as avg_price,avg(l_discount)::bigint as avg_disc,count(*) as count_orderFROMlineitemWHEREl_shipdate <= '1998-08-20'GROUP BYl_returnflag,l_linestatus;l_returnflag | l_linestatus | sum_qty? | sum_base_price | sum_disc_price |? sum_charge? | avg_qty | avg_price | avg_disc | count_order--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------R? ? ? ? ? ? | F? ? ? ? ? ? | 37719753 |? ? 56568041381 |? ? 53741292685 |? 55889619120 |? ? ? 26 |? ? 38251 |? ? ? ? 0 |? ? 1478870N? ? ? ? ? ? | F? ? ? ? ? ? |? 991417 |? ? 1487504710 |? ? 1413082168 |? 1469649223 |? ? ? 26 |? ? 38284 |? ? ? ? 0 |? ? ? 38854A? ? ? ? ? ? | F? ? ? ? ? ? | 37734107 |? ? 56586554401 |? ? 53758257135 |? 55909065223 |? ? ? 26 |? ? 38273 |? ? ? ? 0 |? ? 1478493N? ? ? ? ? ? | O? ? ? ? ? ? | 73808911 |? 110700990251 |? 105167436999 | 109377979031 |? ? ? 26 |? ? 38248 |? ? ? ? 0 |? ? 2894278(4 rows)Time: 2341.147 ms