External Partitioned Tables in Hive

This post shows how to create an external partitioned table in Hive and load data into it.

1. Create the table

-- Switch to the blog database
use blog;

-- Create the external partitioned table
create external table external_blog_record(
    host string comment "host",
    app string comment "application",
    source string comment "source",
    remote_addr string comment "client IP",
    time_iso6401 string comment "access time",
    http_host string comment "domain name",
    request_method string comment "request method",
    request_url string comment "request URL",
    request_protocol string comment "request protocol",
    request_time string comment "request duration",
    status string comment "response status",
    body_byte_sents string comment "response body size",
    upstream_addr string comment "upstream server address",
    upstream_response_time string comment "upstream response time",
    upstream_status string comment "upstream status",
    http_referer string comment "referrer URL",
    http_user_agent string comment "browser user agent",
    res_type string comment "resource type: home page, article, category, other"
) 
comment "external partitioned table of raw log records"
partitioned by (day string) 
row format delimited fields terminated by '\t' 
location '/log/blog';

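This creates a table named external_blog_record, partitioned by the day column. Fields in the data files are separated by '\t', and the data is stored under the HDFS directory /log/blog. Because the table is EXTERNAL, dropping it removes only Hive's metadata; the files under /log/blog stay in place.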
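The `row format delimited fields terminated by '\t'` clause means Hive maps each line of a data file onto the columns purely by position. As a rough model of that mapping (not Hive's actual SerDe code), the Java sketch below splits a line on tabs and pairs the fields with the column names in declaration order. The class name and the sample line are made up, and treating a missing trailing field as null is an assumption meant to mirror how the delimited format behaves.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BlogRecordParser {

    // Column order as declared in CREATE TABLE external_blog_record.
    // The partition column "day" is not in the file: its value comes from the directory name.
    static final List<String> COLUMNS = Arrays.asList(
            "host", "app", "source", "remote_addr", "time_iso6401", "http_host",
            "request_method", "request_url", "request_protocol", "request_time", "status",
            "body_byte_sents", "upstream_addr", "upstream_response_time", "upstream_status",
            "http_referer", "http_user_agent", "res_type");

    /** Splits one line on '\t' and pairs the fields with the columns by position. */
    static Map<String, String> parse(String line) {
        // limit -1 keeps trailing empty fields, which split("\t") would silently drop
        String[] fields = line.split("\t", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.size(); i++) {
            // Assumption: a field missing at the end of the line is treated as null
            row.put(COLUMNS.get(i), i < fields.length ? fields[i] : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical log line with only the first 12 fields
        String line = "web1\tblog\tnginx\t10.0.0.1\t2018-11-22T10:00:00+08:00\texample.com"
                + "\tGET\t/\tHTTP/1.1\t0.005\t200\t512";
        Map<String, String> row = parse(line);
        System.out.println(row.get("status"));    // 200
        System.out.println(row.get("res_type"));  // null
    }
}
```

Keeping the `-1` limit matters for logs like these, where an empty last field (for example an empty referrer) would otherwise shift nothing but silently disappear.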

2. View the partitions

-- List the table's partitions
show partitions external_blog_record;

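(Figure: partition list)

As the output shows, the table already has many partitions. Now look at the HDFS directory:

(Figure: partitions in HDFS)

Each partition directory holds its own log files.

(Figure: log files inside a partition)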
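Hive stores each partition of this table as a sub-directory named `day=<value>` under the table location, which is what the HDFS listing shows. The small Java sketch below, with hypothetical helper names, builds such a path for a given day and recovers the value from a directory name. It assumes the location `/log/blog` from the CREATE TABLE statement.

```java
import java.util.Optional;

final class PartitionDirs {

    // LOCATION from the CREATE TABLE statement
    static final String TABLE_LOCATION = "/log/blog";

    /** Directory for one day partition, e.g. /log/blog/day=20181122 */
    static String partitionDir(String day) {
        return TABLE_LOCATION + "/day=" + day;
    }

    /** Extracts the partition value from a directory name such as "day=20181122". */
    static Optional<String> dayOf(String dirName) {
        String prefix = "day=";
        return dirName.startsWith(prefix) && dirName.length() > prefix.length()
                ? Optional.of(dirName.substring(prefix.length()))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(partitionDir("20181122"));           // /log/blog/day=20181122
        System.out.println(dayOf("day=20181122").orElse("-"));  // 20181122
        System.out.println(dayOf("temp").orElse("-"));          // - (not a partition directory)
    }
}
```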

3. Add a partition

To add a partition, simply create a day=XXX directory under /log/blog. A partition created this way is not yet registered with Hive, though; run the following command so that Hive recognizes it:

msck repair table external_blog_record;

Then run the show partitions command from step 2 to confirm that the new partition exists.
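Conceptually, `msck repair table` compares the `day=...` directories under the table location with the partitions recorded in the metastore and adds the missing ones. When you know which day you just created, HiveQL also accepts an explicit `ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION '...'` statement. The Java sketch below models that comparison on hard-coded lists (stand-ins for an HDFS listing and the output of `show partitions`) and prints the equivalent ALTER TABLE statements. The class and method names are hypothetical; this is not how Hive implements msck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MissingPartitions {

    /** Partition directories ("day=...") present on disk but not yet known to the metastore. */
    static List<String> missing(Collection<String> dirsOnDisk, Set<String> knownPartitions) {
        List<String> result = new ArrayList<>();
        for (String dir : new TreeSet<>(dirsOnDisk)) {   // sorted for stable output
            if (dir.startsWith("day=") && !knownPartitions.contains(dir)) {
                result.add(dir);
            }
        }
        return result;
    }

    /** HiveQL statement that registers one day partition explicitly. */
    static String addPartitionSql(String table, String location, String dir) {
        String day = dir.substring("day=".length());
        return "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION (day='" + day
                + "') LOCATION '" + location + "/" + dir + "';";
    }

    public static void main(String[] args) {
        // Hypothetical inputs: directory names under /log/blog and the output of "show partitions"
        List<String> onDisk = Arrays.asList("day=20181122", "day=20181123", "temp");
        Set<String> known = new TreeSet<>(Arrays.asList("day=20181122"));
        for (String dir : missing(onDisk, known)) {
            // Prints the statement for day=20181123 only
            System.out.println(addPartitionSql("external_blog_record", "/log/blog", dir));
        }
    }
}
```

The explicit form touches only the named partition, while msck scans the whole table location; which one fits better depends on how many directories you add at once.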

4. Query the records in a partition

hive> select count(*) from external_blog_record where day=20181122;
Query ID = hadoop_20181123144713_2b8b197a-c09b-4bf6-8ad3-b88cbd1ee4ca
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1542348923310_0146, Tracking URL = http://hadoop1:8088/proxy/application_1542348923310_0146/
Kill Command = /opt/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1542348923310_0146
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-11-23 14:47:25,697 Stage-1 map = 0%,  reduce = 0%
2018-11-23 14:47:32,237 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.49 sec
2018-11-23 14:47:39,923 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.62 sec
MapReduce Total cumulative CPU time: 2 seconds 620 msec
Ended Job = job_1542348923310_0146
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.62 sec   HDFS Read: 414396 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
1561
Time taken: 27.642 seconds, Fetched: 1 row(s)
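The `where day=...` filter is on the partition column, so Hive only needs the files under the matching `day=` directory rather than the whole table (partition pruning). The Java sketch below is a toy model of that selection over a hypothetical in-memory file listing, not Hive's query planner. Since `day` is declared as string, writing the literal as `day='20181122'` keeps the comparison a plain string comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PartitionPruning {

    /**
     * Sums the sizes of files that lie inside the requested partition directory,
     * mimicking how a partition filter restricts the input to one day=... directory.
     */
    static long bytesToScan(Map<String, Long> fileSizes, String tableLocation, String day) {
        // Trailing "/" so that day=20181122 does not also match day=201811220
        String partitionPrefix = tableLocation + "/day=" + day + "/";
        long total = 0;
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            if (e.getKey().startsWith(partitionPrefix)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical listing: file path -> size in bytes
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("/log/blog/day=20181121/a.log", 1000L);
        files.put("/log/blog/day=20181122/b.log", 2000L);
        files.put("/log/blog/day=20181122/c.log", 500L);
        System.out.println(bytesToScan(files, "/log/blog", "20181122"));  // 2500
    }
}
```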

5. Appendix

Using MapReduce to periodically merge small files and load them into the Hive partitioned table

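package me.jinkun.mr.merge;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Merges small log files and loads them into the Hive partitioned table.
 */
public class MergeSmallFileAndLoadIntoHive {

    private static final Logger LOG = LoggerFactory.getLogger(MergeSmallFileAndLoadIntoHive.class);

    /** Pass-through mapper: emits every input line as the key with an empty (NullWritable) value. */
    static class SmallFileCombinerMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final NullWritable v = NullWritable.get();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(value, v);
        }
    }

    public static void main(String[] args) throws Exception {

        boolean test = false;
        String logPath;
        String partition;

        if (test) {
            partition = "day=20181114";

            // Linux
            logPath = "/log/blog";

            // Windows: uncomment to use a local directory instead
            // logPath = "D:/hadoop/blog";
        } else {
            if (args == null || args.length < 2) {
                throw new IllegalArgumentException("Wrong number of arguments. Usage: hadoop jar xxxx.jar me.jinkun.mr.merge.MergeSmallFileAndLoadIntoHive /log/blog day=20181116");
            }

            logPath = args[0];
            partition = args[1];
        }

        // Hadoop paths always use "/", so build them with Path instead of File.separator
        Path partitionDir = new Path(logPath, partition);
        Path tempInPath = new Path(logPath, "temp/" + partition + "/in");
        Path tempResultPath = new Path(logPath, "temp/" + partition + "/out");

        // Act as the "hadoop" user to avoid HDFS permission errors
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        Configuration conf = new Configuration();
        if (!test) {
            conf.set("fs.defaultFS", "hdfs://hadoop1:9000");
        }

        // 1. Collect the folders of logs saved temporarily for this day.
        //    Folder names are expected to be millisecond timestamps; folders that were
        //    already merged carry a "delete_" prefix and are skipped.
        List<Path> paths = new ArrayList<>();
        long currentTimeMillis = System.currentTimeMillis();
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(tempInPath)) {
            for (FileStatus fileStatus : fs.listStatus(tempInPath)) {
                if (fileStatus.isDirectory()) {
                    Path path = fileStatus.getPath();
                    String name = path.getName();
                    if (!name.startsWith("delete") &&
                            name.compareTo(String.valueOf(currentTimeMillis)) < 0) {
                        paths.add(path);
                    }
                    LOG.info("Folder name: " + name);
                }
            }
        }

        if (paths.isEmpty()) {
            LOG.info("No folders to merge; the job is not submitted.");
            fs.close();
            System.exit(0);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(MergeSmallFileAndLoadIntoHive.class);
        job.setMapperClass(SmallFileCombinerMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // 2. Merge the small files into a temporary folder.
        //    CombineTextInputFormat packs many small files into splits of at most 128 MB.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        CombineTextInputFormat.setInputPaths(job, paths.toArray(new Path[0]));
        // The job fails if the output folder already exists (e.g. left over from a failed run)
        fs.delete(tempResultPath, true);
        FileOutputFormat.setOutputPath(job, tempResultPath);

        // Map-only job: each map task writes one part-m-NNNNN file
        job.setNumReduceTasks(0);

        boolean flag = job.waitForCompletion(true);

        // If the job succeeded
        if (flag) {

            // 3. Move the merged files into the Hive partition directory.
            //    On HDFS, rename returns false when the target's parent directory is missing,
            //    so create the partition directory first.
            fs.mkdirs(partitionDir);
            int index = 0;
            for (FileStatus fileStatus : fs.listStatus(tempResultPath)) {
                Path path = fileStatus.getPath();
                if (path.getName().startsWith("part")) {
                    Path target = new Path(partitionDir, currentTimeMillis + "." + index + ".log");
                    if (!fs.rename(path, target)) {
                        LOG.warn("Could not move " + path + " to " + target);
                    }
                    index++;
                }
            }
            fs.delete(tempResultPath, true);
            // If this day=... directory is a new partition, Hive still needs
            // "msck repair table external_blog_record;" (step 3) to see it.

            // 4. Mark the merged folders as processed by adding a "delete_" prefix
            for (Path path : paths) {
                fs.rename(path, new Path(path.getParent(), "delete_" + path.getName()));
            }
        }

        fs.close();
        System.exit(flag ? 0 : 1);
    }
}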
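Step 1 of MergeSmallFileAndLoadIntoHive depends on a naming convention that the post does not show being produced: each batch of temporary logs is assumed to sit in a folder named by its millisecond timestamp, and merged folders get a `delete_` prefix. Comparing such names as strings is only equivalent to comparing the timestamps while all names have the same number of digits. The sketch below isolates the original predicate and, as an alternative, compares the parsed numbers; the class name and sample folder names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeCandidates {

    /** Rule from step 1: skip "delete..." folders, keep names that sort before "now" as strings. */
    static boolean selectedByString(String name, long now) {
        return !name.startsWith("delete") && name.compareTo(String.valueOf(now)) < 0;
    }

    /** Alternative: parse the folder name as a timestamp; non-numeric names are skipped. */
    static boolean selectedByNumber(String name, long now) {
        if (name.startsWith("delete")) {
            return false;
        }
        try {
            return Long.parseLong(name) < now;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long now = 1542959233000L;  // a 13-digit millisecond timestamp
        List<String> names = Arrays.asList(
                "1542959000000", "delete_1542958000000", "999", "1542959999999");
        List<String> byString = new ArrayList<>();
        List<String> byNumber = new ArrayList<>();
        for (String n : names) {
            if (selectedByString(n, now)) byString.add(n);
            if (selectedByNumber(n, now)) byNumber.add(n);
        }
        System.out.println(byString);  // [1542959000000]
        System.out.println(byNumber);  // [1542959000000, 999]
    }
}
```

The string rule ignores "999" because '9' sorts after '1'; with the 13-digit timestamps the original code expects, both rules agree.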

Submission script

#!/bin/sh
day=$(date '+%Y%m%d')
echo "Submitting merge job for $day"
nohup /opt/soft/hadoop-2.7.3/bin/hadoop jar /opt/soft-install/schedule/mapreduce-1.0.jar me.jinkun.mr.merge.MergeSmallFileAndLoadIntoHive /log/blog day=$day > nohup.log 2>&1 &
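The script derives the partition argument from the current date with `date '+%Y%m%d'`, so each run merges into today's partition. If you prefer to compute the argument in Java (for example when launching the job from another Java program, an assumption not covered in the post), `java.time` produces the same `day=yyyyMMdd` string. The class and method names below are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class PartitionArg {

    /** Same format as the shell's date '+%Y%m%d', prefixed with the partition column name. */
    static String forDate(LocalDate date) {
        // BASIC_ISO_DATE prints yyyyMMdd; a LocalDate carries no offset, so none is appended
        return "day=" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** Today's partition in the given zone (the shell script implicitly uses the system zone). */
    static String today(ZoneId zone) {
        return forDate(LocalDate.now(zone));
    }

    public static void main(String[] args) {
        System.out.println(forDate(LocalDate.of(2018, 11, 22)));  // day=20181122
        System.out.println(today(ZoneId.systemDefault()));
    }
}
```

Both the shell command and `LocalDate.now(zone)` depend on the time zone, so a run just after midnight targets the new day's partition.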

© Copyright belongs to the author. Please contact the author before reposting or for content collaboration.
