1.Download StackExchange's open data
https://archive.org/details/stackexchange
2.Importing and Process data from XML files into SQL Server tables
https://www.mssqltips.com/sqlservertip/2899/importing-and-processing-data-from-xml-files-into-sql-server-tables/
step1.Importing XML data from XML file using OPENROWSET
對于第一個腳本稽亏,第一個創(chuàng)建數(shù)據(jù)庫的命令我選擇了注釋掉改為手動創(chuàng)建該數(shù)據(jù)庫,因為使用該命令創(chuàng)建的權限方面貌似有點問題。這個階段遇到的第二個問題是導入的xml文件過大聪廉,sql server默認的配置是導入的xml文件有2M限制旁振,在工具-選項里可以設置一下囱淋,設置完畢記得重啟sql server嗜闻。雖然這個問題解決了斩芭,但在導入400M左右的xml文件時乏冀,sql server報“System.OutOfMemoryException”的異常蝶糯,猜測可能跟xml文件有關,因為在跑普通的table時辆沦,4G左右的表也沒有出現(xiàn)這個問題昼捍,暫時先不解決了。
step2.Process XML data using OPENXML function
First call the sp_xml_preparedocument stored procedure by specifying the XML data which will then output the handle of the XML data that it has prepared and stored in internal cache.Then we will use the handle returned by the sp_xml_preparedocument stored procedure in the OPENXML function to open the XML data and read it.
這個過程按照XML層級的不同以及自己數(shù)據(jù)提取需求的不同肢扯,要針對性的寫屬性妒茬,外部架構都是一樣的,關鍵是屬性提取那里蔚晨,目錄都是寫到需要提取的最深層級乍钻,然后用'../'來返回上一級(父親節(jié)點,對這里是以樹的形式存儲臨時數(shù)據(jù)的)铭腕。"EXEC sp_xml_removedocument @hDoc"語句用來釋放內(nèi)存银择。
step3.把查詢結果存到新表中,以下是代碼示例
USE Badges
GO
DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)
SELECT @XML = XMLData FROM XMLwithOpenXMLBadges
EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML
CREATE TABLE Badges
(
UserId varchar(50),
Name varchar(100),
Date datetime
);
INSERT INTO Badges
SELECT UserId,Name,Date
FROM OPENXML(@hDoc,'badges/row')
WITH
(
UserId varchar '@UserId',
Name varchar '@Name',
Date datetime '@Date'
)
EXEC sp_xml_removedocument @hDoc
GO
谨履、