TimescaleDB: A Brief Summary

Time-series database analysis: an introduction to TimescaleDB

A hypertable can be partitioned by additional columns as well -- such as a device identifier, server or container id, user or customer id, location, stock ticker symbol, and so forth. Such partitioning on this additional column typically employs hashing (mapping all devices or servers into a specific number of hash buckets), although interval-based partitioning can be employed here as well. We sometimes refer to hypertables partitioned by both time and this additional dimension as "time and space" partitions.

This time-and-space partitioning is primarily used for distributed hypertables. With such two-dimensional partitioning, each time interval will also be partitioned across multiple nodes comprising the distributed hypertables. In such cases, for the same hour, information about some portion of the devices will be stored on each node. This allows multi-node TimescaleDB to parallelize inserts and queries for data during that time interval.

Each chunk is implemented using a standard database table. (In PostgreSQL internals, the chunk is actually a "child table" of the "parent" hypertable.) A chunk includes constraints that specify and enforce its partitioning ranges, e.g., that the time interval of the chunk covers ['2020-07-01 00:00:00+00', '2020-07-02 00:00:00+00'), and all rows included in the chunk must have a time value within that range. Any space partitions will be reflected as chunk constraints as well. As these ranges and partitions are non-overlapping, all chunks in a hypertable are disjoint in their partitioning dimensional space.
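As a rough sketch of what this looks like in PostgreSQL terms (the internal schema and chunk name below are illustrative, following TimescaleDB's `_timescaledb_internal` naming convention, and the parent table is assumed to be called `conditions`):

```sql
-- Illustrative only: a chunk is a child table of the hypertable, carrying
-- CHECK constraints that pin it to its slice of the partitioning space.
CREATE TABLE _timescaledb_internal._hyper_1_1_chunk (
    CHECK ("time" >= '2020-07-01 00:00:00+00'
       AND "time" <  '2020-07-02 00:00:00+00')
) INHERITS (conditions);
```

Because every chunk carries such constraints, the planner can exclude chunks whose ranges cannot match a query's predicates.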

Local indexes. Indexes are built on each chunk independently, rather than as a global index across all data. This similarly ensures that both data and indexes from the latest chunks typically reside in memory, so that updating indexes when inserting data remains fast. TimescaleDB can still ensure global uniqueness for keys that include all partitioning keys, given the disjoint nature of its chunks: for a unique (device_id, timestamp) primary key, it first identifies the corresponding chunk from the constraints, then uses one of that chunk's indexes to enforce uniqueness. This remains simple to use with TimescaleDB's hypertable abstraction: users simply create an index on the hypertable, and the operation (and its configuration) is pushed down to both existing and new chunks.
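For example, assuming a hypertable `conditions` partitioned on `time` with `device_id` as a space dimension, a unique index is valid only if it covers both partitioning columns:

```sql
-- Valid: the unique index includes both partitioning dimensions, so
-- uniqueness can be enforced locally within a single chunk.
CREATE UNIQUE INDEX idx_device_time ON conditions (device_id, "time");

-- Invalid on this hypertable: it omits the time column, so uniqueness
-- could not be checked chunk-locally.
-- CREATE UNIQUE INDEX idx_device_only ON conditions (device_id);
```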

Distributed hypertables

TimescaleDB supports distributing hypertables across multiple nodes (i.e., a cluster) by leveraging the same hypertable and chunk primitives as described above. This allows TimescaleDB to scale inserts and queries beyond the capabilities of a single TimescaleDB instance.

Distributed hypertables and regular hypertables look very similar, with the main difference being that distributed chunks are not stored locally. There are also some features of regular hypertables that distributed hypertables do not support (see section on current limitations).

Scaling distributed hypertables

As time-series data grows, a common use case is to add data nodes to expand the storage and compute capacity of distributed hypertables. Thus, TimescaleDB can be elastically scaled out by simply adding data nodes to a distributed database.

As mentioned earlier, TimescaleDB can (and will) adjust the number of space partitions as new data nodes are added. Existing chunks keep their original space partitions, but the new settings apply to newly created chunks. Because of this behavior, there is no need to move data between data nodes when the cluster grows; TimescaleDB simply updates how data is distributed for the next time interval. Writes of new incoming data use the new partitioning settings, while the access node can still serve queries across all chunks, including those created under the old settings. Note that although the number of space partitions can be changed, the column on which the data is partitioned cannot.
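A sketch of this workflow using TimescaleDB's multi-node functions (the node name, hostname, and partition count below are assumptions for illustration):

```sql
-- Attach a new data node to the distributed database.
SELECT add_data_node('node3', host => 'node3.example.com');

-- Raise the number of space partitions so that newly created chunks
-- spread across the larger cluster; existing chunks keep their old layout.
SELECT set_number_partitions('conditions', 6);
```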

Data Retention

An intrinsic property of time-series workloads is that new data accumulates while old data is rarely, if ever, updated, and the relevance of the data diminishes over time. It is therefore often desirable to delete old data to save disk space.
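TimescaleDB supports this directly with drop_chunks, which removes whole chunks whose data falls entirely before a cutoff. This is far cheaper than a row-by-row DELETE, since it drops entire child tables. The interval below is an example value:

```sql
-- Drop all chunks of 'conditions' containing only data older than 24 months.
SELECT drop_chunks('conditions', INTERVAL '24 months');

-- Or schedule it as a recurring retention policy.
SELECT add_retention_policy('conditions', INTERVAL '24 months');
```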

Hypertable limitations

Foreign key constraints referencing a hypertable are not supported.
Time dimensions (columns) used for partitioning cannot have NULL values.
Unique indexes must include all columns that are partitioning dimensions.
UPDATE statements that move values between partitions (chunks) are not supported. This includes upserts (INSERT ... ON CONFLICT UPDATE).

Create your first hypertable

Creating a hypertable is a two-step process. First, execute a CREATE TABLE statement to create a regular relational table. Second, execute a SELECT statement that calls the function create_hypertable, specifying the name of the table to convert into a hypertable and the name of the time column in that table (a required parameter).
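Concretely, the two steps might look like this (the table name `conditions` matches the examples later in this post; the columns are illustrative):

```sql
-- Step 1: a regular PostgreSQL table.
CREATE TABLE conditions (
    "time"      TIMESTAMPTZ NOT NULL,
    location    TEXT        NOT NULL,
    temperature DOUBLE PRECISION
);

-- Step 2: convert it into a hypertable, partitioned on the time column.
SELECT create_hypertable('conditions', 'time');
```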

How hypertables help with time-series data

Hypertables help speed up ingest rates, since data is only inserted into the current chunk, leaving data in the other chunks untouched. Contrast this with inserting data into a single table, which will become bigger and more bloated as more data is ingested.

Hypertables help speed up queries, since only specific chunks are queried thanks to the automatic indexing by time and/or space.
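For example, a query constrained to a narrow time range only has to touch the chunks whose intervals overlap it. Using the weather_metrics table loaded later in this post:

```sql
-- Only chunks overlapping the last 7 days are scanned; all older chunks
-- are excluded up front via their time-range constraints.
SELECT time, city_name, temp_c
FROM weather_metrics
WHERE time > now() - INTERVAL '7 days'
ORDER BY time DESC;
```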

Accessing the dataset

-- copy data from weather_data.csv into weather_metrics
\copy weather_metrics (time, timezone_shift, city_name, temp_c, feels_like_c, temp_min_c, temp_max_c, pressure_hpa, humidity_percent, wind_speed_ms, wind_deg, rain_1h_mm, rain_3h_mm, snow_1h_mm, snow_3h_mm, clouds_percent, weather_type_id) from './weather_data.csv' CSV HEADER;
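The \copy above assumes the weather_metrics hypertable already exists. A plausible definition (the column types here are assumptions inferred from the column names) would be:

```sql
CREATE TABLE weather_metrics (
    "time"           TIMESTAMPTZ NOT NULL,
    timezone_shift   INT,
    city_name        TEXT,
    temp_c           DOUBLE PRECISION,
    feels_like_c     DOUBLE PRECISION,
    temp_min_c       DOUBLE PRECISION,
    temp_max_c       DOUBLE PRECISION,
    pressure_hpa     DOUBLE PRECISION,
    humidity_percent DOUBLE PRECISION,
    wind_speed_ms    DOUBLE PRECISION,
    wind_deg         DOUBLE PRECISION,
    rain_1h_mm       DOUBLE PRECISION,
    rain_3h_mm       DOUBLE PRECISION,
    snow_1h_mm       DOUBLE PRECISION,
    snow_3h_mm       DOUBLE PRECISION,
    clouds_percent   DOUBLE PRECISION,
    weather_type_id  INT
);

SELECT create_hypertable('weather_metrics', 'time');
```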

create_hypertable()

-- Partition 'conditions' on the time column, with 4 hash partitions on 'location'
SELECT create_hypertable('conditions', 'time', 'location', 4);


-- The same, but using a custom partitioning function instead of the default hash
SELECT create_hypertable('conditions', 'time', 'location', 4, partitioning_func => 'location_hash');
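A custom partitioning function must accept a single anyelement argument and return an integer bucket value. A minimal sketch of the location_hash function assumed by the call above, using PostgreSQL's built-in hashtext:

```sql
-- Hash the value's text form. IMMUTABLE is required so the bucket
-- for a given value never changes between calls.
CREATE OR REPLACE FUNCTION location_hash(val anyelement)
RETURNS integer LANGUAGE sql IMMUTABLE AS $$
    SELECT hashtext(val::text);
$$;
```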

Reading data
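Queries against a hypertable are plain SQL. For instance, a daily average over the last month, using the weather_metrics table from above together with TimescaleDB's time_bucket function:

```sql
SELECT time_bucket('1 day', time) AS day,
       city_name,
       avg(temp_c) AS avg_temp_c
FROM weather_metrics
WHERE time > now() - INTERVAL '1 month'
GROUP BY day, city_name
ORDER BY day DESC;
```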
