Optimizing a database for 2 billion records per day

I've written an application with a SQL Server backend that collects and stores an extremely large number of records. I've calculated that, at the peak, the average volume is somewhere in the neighborhood of 3-4 billion records per day (20 hours of operation).

My original solution (before I'd done the actual calculation of the data volume) was to have my application insert records into the same table that is queried by my clients. That crashed and burned fairly quickly, obviously, because it's impossible to query a table while that many records are being inserted into it.

My second solution was to use 2 databases, one for data received by the application and one for client-ready data.

My application would receive data, chunk it into batches of ~100k records, and bulk-insert them into the staging table. After ~100k records, the application would create, on the fly, another staging table with the same schema as before and begin inserting into that one. It would create a record in a jobs table with the name of the table holding the 100k records, and a stored procedure on the SQL Server side would move the data from the staging table(s) to the client-ready production table, then drop the temporary table created by my application.
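For reference, a minimal sketch of what that handoff could look like on the SQL side. Everything here beyond the three columns the stored procedure actually reads (JobId, ProcessingComplete, SignalValueStagingTableName) is an assumption, including how the application clones the staging schema:

-- Hypothetical jobs table; only the three columns referenced by the
-- stored procedure further down are taken from the post.
CREATE TABLE SignalValueJobs
(
    JobId                       UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    SignalValueStagingTableName NVARCHAR(128)    NOT NULL,
    ProcessingComplete          BIT              NOT NULL DEFAULT 0
);

-- One way the application could create each ~100k-record staging table on
-- the fly: clone the base staging table's schema with an empty SELECT INTO.
DECLARE @newTable NVARCHAR(128) = N'SignalValues_staging_' + CONVERT(NVARCHAR(36), NEWID());
DECLARE @ddl NVARCHAR(MAX) =
    N'SELECT * INTO dbo.' + QUOTENAME(@newTable) + N' FROM dbo.SignalValues_staging WHERE 1 = 0;';
EXEC sp_executesql @ddl;

-- Register the table as a job for the mover procedure to pick up.
INSERT INTO SignalValueJobs (SignalValueStagingTableName) VALUES (@newTable);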

Both databases have the same set of 5 tables with the same schema, except that the staging database also has the jobs table. The staging database has no integrity constraints, keys, indexes, etc. on the table where the bulk of the records will reside; shown below, that table is named SignalValues_staging. The goal was to have my application slam the data into SQL Server as quickly as possible. The workflow of creating tables on the fly so they can easily be migrated works pretty well.
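Since the schema screenshot didn't survive, here is an assumed shape for SignalValues_staging; only SignalId and [Timestamp] are implied by the queries later in the post, and the payload column is a pure guess:

-- A bare heap: no PK, no indexes, no constraints, so bulk inserts pay
-- only the cost of writing data pages.
CREATE TABLE SignalValues_staging
(
    SignalId    UNIQUEIDENTIFIER NOT NULL,  -- joined to SignalMetaData.SignalId later
    [Timestamp] DATETIME2(7)     NOT NULL,  -- the partitioning candidate mentioned below
    [Value]     FLOAT            NULL       -- hypothetical payload column
);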

The following are the 5 relevant tables from my staging database, plus my jobs table:

[screenshot: staging tables]

The stored procedure I have written handles moving the data from all of the staging tables and inserting it into production. Below is the part of my stored procedure that inserts into production from the staging tables:

-- SignalValues jobs table.
SELECT *
      ,ROW_NUMBER() OVER (ORDER BY JobId) AS 'RowIndex'
INTO #JobsToProcess
FROM
(
    SELECT JobId
          ,ProcessingComplete
          ,SignalValueStagingTableName AS 'TableName'
          ,(DATEDIFF(SECOND, (SELECT last_user_update
                              FROM sys.dm_db_index_usage_stats
                              WHERE database_id = DB_ID(DB_NAME())
                                AND OBJECT_ID = OBJECT_ID(SignalValueStagingTableName))
                    ,GETUTCDATE())) SecondsSinceLastUpdate
    FROM SignalValueJobs
) cte
WHERE cte.ProcessingComplete = 1
   OR cte.SecondsSinceLastUpdate >= 120

DECLARE @i INT = (SELECT COUNT(*) FROM #JobsToProcess)
DECLARE @jobParam UNIQUEIDENTIFIER
DECLARE @currentTable NVARCHAR(128)
DECLARE @processingParam BIT
DECLARE @sqlStatement NVARCHAR(2048)
DECLARE @paramDefinitions NVARCHAR(500) = N'@currentJob UNIQUEIDENTIFIER, @processingComplete BIT'
DECLARE @qualifiedTableName NVARCHAR(128)

WHILE @i > 0
BEGIN
    SELECT @jobParam = JobId, @currentTable = TableName, @processingParam = ProcessingComplete
    FROM #JobsToProcess
    WHERE RowIndex = @i

    SET @qualifiedTableName = '[Database_Staging].[dbo].['+@currentTable+']'

    SET @sqlStatement = N'
        -- Signal values staging table.
        SELECT svs.* INTO #sValues
        FROM '+ @qualifiedTableName +' svs
        INNER JOIN SignalMetaData smd
            ON smd.SignalId = svs.SignalId

        INSERT INTO SignalValues SELECT * FROM #sValues

        SELECT DISTINCT SignalId INTO #uniqueIdentifiers FROM #sValues

        DELETE c FROM '+ @qualifiedTableName +' c INNER JOIN #uniqueIdentifiers u ON c.SignalId = u.SignalId

        DROP TABLE #sValues
        DROP TABLE #uniqueIdentifiers

        IF NOT EXISTS (SELECT TOP 1 1 FROM '+ @qualifiedTableName +') -- table is empty
        BEGIN
            -- Processing is complete, so drop the table and remove the jobs entry.
            IF @processingComplete = 1
            BEGIN
                DELETE FROM SignalValueJobs WHERE JobId = @currentJob

                IF '''+@currentTable+''' <> ''SignalValues_staging''
                BEGIN
                    DROP TABLE '+ @qualifiedTableName +'
                END
            END
        END
    '

    EXEC sp_executesql @sqlStatement, @paramDefinitions, @currentJob = @jobParam, @processingComplete = @processingParam;

    SET @i = @i - 1
END

DROP TABLE #JobsToProcess

I use sp_executesql because the table names for the staging tables come as text from the records in the jobs table.
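One hardening note, since @currentTable comes from table data: wrapping it with QUOTENAME instead of concatenating brackets by hand protects against a malformed name (QUOTENAME escapes embedded ] characters that manual '[' + ... + ']' concatenation does not). A sketch:

-- Safer construction of the qualified name.
SET @qualifiedTableName = N'[Database_Staging].[dbo].' + QUOTENAME(@currentTable);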

This stored procedure runs every 2 seconds using the trick I learned from this dba.stackexchange.com post.
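For readers who don't want to chase the link, the usual form of that trick (and presumably what's meant here) is a SQL Agent job scheduled once per minute whose single step loops for the whole minute:

-- Presumed shape of the "every 2 seconds" trick; MoveStagingToProduction
-- is a hypothetical name for the stored procedure above.
DECLARE @loop INT = 0;
WHILE @loop < 30
BEGIN
    EXEC dbo.MoveStagingToProduction;
    WAITFOR DELAY '00:00:02';
    SET @loop += 1;
END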

The problem I cannot for the life of me resolve is the speed at which the inserts into production are performed. My application creates temporary staging tables and fills them with records incredibly quickly. The insert into production cannot keep up with the number of tables, and eventually there's a backlog of tables into the thousands. The only way I've ever been able to keep up with the incoming data is to remove all keys, indexes, constraints, etc. on the production SignalValues table. The problem I then face is that the table ends up with so many records that it becomes impossible to query.

I've tried partitioning the table using [Timestamp] as the partitioning column, to no avail. Any form of indexing slows the inserts so much that they can't keep up. In addition, I'd need to create thousands of partitions (one every minute? hour?) years in advance, and I couldn't figure out how to create them on the fly.
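For what it's worth, partition boundaries don't have to exist years in advance; they can be split in just ahead of the data by a scheduled job. A sketch of how on-the-fly creation usually works, assuming a RANGE RIGHT partition function pf_SignalValues on [Timestamp] and a scheme ps_SignalValues (both hypothetical names):

-- Rolling-window maintenance: add tomorrow's boundary before it's needed
-- so inserts never land in an ever-growing rightmost partition.
DECLARE @nextBoundary DATETIME2(7) = DATEADD(DAY, 1, CAST(CAST(GETUTCDATE() AS DATE) AS DATETIME2(7)));

ALTER PARTITION SCHEME ps_SignalValues NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_SignalValues() SPLIT RANGE (@nextBoundary);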

I tried partitioning by adding a computed column to the table called TimestampMinute whose value, on INSERT, was DATEPART(MINUTE, GETUTCDATE()). Still too slow.
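As an aside: a column whose value is captured at INSERT time via GETUTCDATE() is really a default constraint rather than a computed column, and a persisted computed column (which is what a partitioning column would have to be) must be deterministic and therefore can't reference GETUTCDATE(). Both readings, with hypothetical column and constraint names:

-- Reading 1: the insert-time default the description suggests.
ALTER TABLE SignalValues ADD TimestampMinute TINYINT NOT NULL
    CONSTRAINT DF_SignalValues_TimestampMinute DEFAULT (DATEPART(MINUTE, GETUTCDATE()));

-- Reading 2: a partition-friendly persisted computed column derived from
-- the row's own [Timestamp], which is deterministic.
ALTER TABLE SignalValues ADD TimestampMinuteComputed AS DATEPART(MINUTE, [Timestamp]) PERSISTED;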

I've tried making it a Memory-Optimized Table as per this Microsoft article. Maybe I don't understand how to do it, but the MOT made the inserts slower somehow.
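For anyone retrying the in-memory route: for a pass-through staging buffer, the variant that avoids log and data-file I/O entirely is a non-durable (SCHEMA_ONLY) memory-optimized table. A minimal sketch, reusing the assumed column list from above; it requires a MEMORY_OPTIMIZED_DATA filegroup on the database:

-- Non-durable memory-optimized staging table: contents are lost on
-- restart, which is acceptable for a buffer that's drained every cycle.
CREATE TABLE SignalValues_staging_mot
(
    SignalId    UNIQUEIDENTIFIER NOT NULL,
    [Timestamp] DATETIME2(7)     NOT NULL,
    [Value]     FLOAT            NULL,
    INDEX ix_SignalId NONCLUSTERED HASH (SignalId) WITH (BUCKET_COUNT = 1048576)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);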

I've checked the Execution Plan of the stored procedure and found that (I think?) the most intensive operation is

SELECT svs.* INTO #sValues
FROM '+ @qualifiedTableName +' svs
INNER JOIN SignalMetaData smd
    ON smd.SignalId = svs.SignalId

To me this doesn't make sense, and the wall-clock logging I added to the stored procedure proved otherwise.

In terms of time-logging, that particular statement above executes in ~300ms on 100k records.

The statement

INSERT INTO SignalValues SELECT * FROM #sValues

executes in 2500-3000ms on 100k records. Deleting the affected records from the staging table, per:

DELETE c FROM '+ @qualifiedTableName +' c INNER JOIN #uniqueIdentifiers u ON c.SignalId = u.SignalId

takes another 300ms.
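One variant worth noting for that 2500-3000ms INSERT, since the production table is being kept index-free anyway: on SQL Server 2014, an INSERT ... SELECT into a heap with a TABLOCK hint qualifies for minimal logging under the SIMPLE or BULK_LOGGED recovery model, which attacks exactly the log-write cost that tends to dominate this statement:

-- Minimally logged form of the slow statement (requires: heap target,
-- TABLOCK hint, and SIMPLE or BULK_LOGGED recovery model).
INSERT INTO SignalValues WITH (TABLOCK)
SELECT * FROM #sValues;

The trade-off is an exclusive table lock for the duration of the insert, so concurrent client SELECTs against SignalValues will wait behind each batch.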

How can I make this faster? Can SQL Server really handle billions of records per day?

If it's relevant, this is SQL Server 2014 Enterprise x64.

Hardware Configuration:

I forgot to include hardware in the first pass of this question. My bad.

I'll preface this with these statements: I know I am losing some performance because of my hardware configuration. I've tried many times, but because of budget, C-level priorities, the alignment of the planets, etc., there's nothing I can do to get a better setup, unfortunately. The server is running on a virtual machine, and I can't even increase the memory because we simply don't have any more.

Here's my system information:

[screenshot: system information]

The storage is attached to the VM server via an iSCSI interface to a NAS box (this will degrade performance). The NAS box has 4 drives in a RAID 10 configuration. They're 4 TB WD WD4000FYYZ spinning disks with a 6 Gb/s SATA interface. The server only has one datastore configured, so tempdb and my database are on the same datastore.

MAXDOP is zero. Should I change this to a constant value, or just let SQL Server handle it? I've read up on RCSI: am I correct in assuming that the only benefit from RCSI comes with row updates? There will never be updates to any of these particular records; they'll be INSERTed and SELECTed. Will RCSI still benefit me?
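For concreteness, these are the two knobs in question (the values and database name are placeholders, not recommendations). On RCSI specifically: its benefit isn't limited to updates; under RCSI, readers take no shared locks, so client SELECTs stop blocking (and being blocked by) the bulk inserts, which matters even in a pure INSERT-and-SELECT workload.

-- Cap parallelism at a fixed value instead of 0 (0 = let SQL Server decide).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Enable read-committed snapshot; needs a moment with no other active
-- connections in the database ([MyProductionDb] is a placeholder name).
ALTER DATABASE [MyProductionDb] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;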

My tempdb is 8 MB. Based on the answer below from jyao, I changed #sValues to a regular table to avoid tempdb altogether. Performance was about the same, though. I will try increasing the size and growth of tempdb, but given that the size of #sValues will more or less always be the same, I don't anticipate much gain.
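For reference, pre-growing tempdb would look something like this; the sizes are placeholder guesses, and the file names follow the default tempdev/templog convention:

-- Pre-size tempdb so 100k-row #temp tables never trigger autogrowth
-- mid-statement; 8 MB with default growth is far too small for this load.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 256MB);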

I have taken an execution plan, attached below. This execution plan is one iteration of a staging table: 100k records. The execution of the query was fairly quick, around 2 seconds, but keep in mind that this is without indexes on the SignalValues table, and the SignalValues table, the target of the INSERT, has no records in it.

[attachment: execution plan]
