Reducing Memory Pressure: Using SSCursor in pymysql for Queries with Large Result Sets

Preface

By default, pymysql runs queries with the Cursor class. For example:

import pymysql.cursors

# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             db='db',
                             charset='utf8mb4')

try:
    with connection.cursor() as cursor:
        # Fetch all rows at once
        sql = "SELECT `id`, `password` FROM `users` WHERE `email`=%s"
        cursor.execute(sql, ('webmaster@python.org',))
        result = cursor.fetchall()
        print(result)
finally:
    connection.close()

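This approach loads every row of the query result into memory. With a large result set, that puts heavy pressure on memory. Fortunately, pymysql provides the SSCursor cursor class, which returns rows on demand instead of returning everything at once and driving memory usage up.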

SSCursor

The official documentation describes it as follows:

Unbuffered Cursor, mainly useful for queries that return a lot of data,
or for connections to remote servers over a slow network.

Instead of copying every row of data into a buffer, this will fetch
rows as needed. The upside of this is the client uses much less memory,
and rows are returned much faster when traveling over a slow network
or if the result set is very big.

There are limitations, though. The MySQL protocol doesn't support
returning the total number of rows, so the only way to tell how many rows
there are is to iterate over every row returned. Also, it currently isn't
possible to scroll backwards, as only the current row is held in memory.

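In short: this is an unbuffered cursor, mainly useful for queries that return a large result set or for slow network connections. Unlike the ordinary cursor class, which writes every row into a buffer, it reads rows as needed. The advantage is that the client uses less memory, and rows come back sooner when the network is slow or the result set is large. The drawbacks are that it cannot report the number of rows in the result (the rowcount attribute will not give the correct value, so the total is known only after iterating over every row) and that it cannot scroll backwards (which makes sense, since it behaves like a generator).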
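Since the row count is only known after the whole result has been consumed, one option is to count rows while processing them. The following is a minimal sketch, not pymysql code: `process_and_count` and `fake_stream` are hypothetical names, and the generator only stands in for an unbuffered cursor that can be read forward once.

```python
def process_and_count(rows, handle):
    """Consume an iterable of rows exactly once and return how many were seen."""
    count = 0
    for row in rows:
        handle(row)
        count += 1
    return count


def fake_stream():
    """Stand-in for an unbuffered cursor (assumption: forward-only iteration)."""
    for i in range(5):
        yield (i, f"name-{i}")


seen = []
total = process_and_count(fake_stream(), seen.append)
print(total)  # 5
```

With a real SSCursor, the cursor itself would be passed as `rows` after calling `execute`.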

It is used like this:

import pymysql
from pymysql.cursors import SSCursor

connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             db='db',
                             charset='utf8mb4')

# Create the cursor
cur = connection.cursor(SSCursor)
cur.execute('SELECT * FROM test_table')

# Read the data
# At this point cur uses far less memory than the Cursor class would
for data in cur:
    print(data)
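Row-by-row iteration is not the only option; rows can also be pulled in fixed-size batches with the DB-API `fetchmany` method, which keeps memory bounded by the batch size. The sketch below is an illustration under assumptions: `iter_batches` is a hypothetical helper, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a server. With pymysql, the cursor returned by `connection.cursor(SSCursor)` would be passed in instead.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield sequences of at most batch_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # empty result means no more rows
            break
        yield rows


# Stand-in database (assumption: sqlite3 instead of MySQL, for a runnable demo)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO test_table (name) VALUES (?)",
    [(f"n{i}",) for i in range(10)],
)

cur = conn.cursor()
cur.execute("SELECT id, name FROM test_table ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
conn.close()
```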

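In essence, iterating over any cursor class just calls the fetchone method repeatedly; what differs is how SSCursor implements fetchone. A look at the source code makes this clear:
Source of Cursor.fetchone (it returns a row from a list by index):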

[Image: source code of Cursor.fetchone]

Source of SSCursor.fetchone (it reads a row without buffering it):

[Image: source code of SSCursor.fetchone]
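The difference between the two fetchone strategies can be illustrated with a toy model. This is a simplified sketch, not the pymysql source: `BufferedCursorModel` and `UnbufferedCursorModel` are hypothetical classes, and a generator stands in for rows arriving from the server.

```python
class BufferedCursorModel:
    """Toy model of a buffered cursor: the whole result is copied into a list."""

    def __init__(self, source):
        self._rows = list(source)  # every row is held in memory up front
        self.rownumber = 0

    def fetchone(self):
        if self.rownumber >= len(self._rows):
            return None
        row = self._rows[self.rownumber]  # look up the row by index
        self.rownumber += 1
        return row


class UnbufferedCursorModel:
    """Toy model of an unbuffered cursor: rows are pulled one at a time."""

    def __init__(self, source):
        self._source = iter(source)  # nothing is read yet
        self.rownumber = 0

    def fetchone(self):
        row = next(self._source, None)  # read the next row, keep no copy
        if row is None:
            return None
        self.rownumber += 1
        return row


def rows():
    """Stand-in for rows arriving from the server."""
    for i in range(3):
        yield (i,)


buffered = BufferedCursorModel(rows())
unbuffered = UnbufferedCursorModel(rows())
print(buffered.fetchone(), unbuffered.fetchone())  # (0,) (0,)
print(len(buffered._rows))  # 3
```

Both models return the same rows in the same order; only the buffered one can look back at earlier rows, because only it still holds them.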

Pitfalls

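Of course, if there were no pitfalls there would be no need to write an article about this; everyone would just happily use it. After using it many times, I found two issues that need particular attention when using the SSCursor class (and its subclass SSDictCursor):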

1. Interval between reads

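If the interval between reading consecutive rows exceeds 60 s, an exception may be raised. This is caused by MySQL's NET_WRITE_TIMEOUT setting, whose default value is 60 (seconds). If processing some rows takes a long time, consider changing that MySQL setting. (Tip: change it with the SQL statement SET NET_WRITE_TIMEOUT = xx, or edit the MySQL configuration file.)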
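One way to apply the setting per connection is the `init_command` argument of `pymysql.connect`, which runs a SQL statement right after the connection is established. The sketch below only builds the keyword arguments so that it runs without a MySQL server; `connect_kwargs_with_write_timeout` is a hypothetical helper, and the value 600 is an arbitrary example, not a recommendation.

```python
def connect_kwargs_with_write_timeout(timeout_seconds, **base):
    """Return connect() keyword arguments that raise net_write_timeout
    for the session as soon as the connection is opened."""
    seconds = int(timeout_seconds)
    if seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    kwargs = dict(base)
    # Session scope: only this connection is affected, not the whole server
    kwargs["init_command"] = f"SET SESSION net_write_timeout = {seconds}"
    return kwargs


kwargs = connect_kwargs_with_write_timeout(
    600,
    host="localhost",
    user="user",
    password="passwd",
    db="db",
    charset="utf8mb4",
)
print(kwargs["init_command"])  # SET SESSION net_write_timeout = 600

# With pymysql installed and a server available (assumption):
# connection = pymysql.connect(**kwargs)
```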

2. Other database operations while reading

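Because SSCursor is unbuffered, the connection bound to that cursor cannot be used for any other database operation (including through a new cursor created on the same connection) until the whole result set has been read. If you need to do something else in the meantime, use a new connection. For example: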

import pymysql
from pymysql.cursors import SSCursor

def connect():
    connection = pymysql.connect(host='localhost',
                                 user='user',
                                 password='passwd',
                                 db='db',
                                 charset='utf8mb4')
    return connection

# conn1 streams the result set; conn2 handles every other operation
conn1 = connect()
conn2 = connect()

try:
    with conn1.cursor(SSCursor) as ss_cur, conn2.cursor() as cur:
        ss_cur.execute('SELECT id, name FROM test_table')

        for data in ss_cur:
            # Update the row through conn2's cursor, not conn1's
            if data[0] == 15:
                cur.execute('UPDATE test_table SET name="kingron" WHERE id=%s', args=[data[0]])

            print(data)

    # pymysql does not autocommit by default, so commit the update
    conn2.commit()
finally:
    conn1.close()
    conn2.close()

References

Cursor Objects — PyMySQL 0.7.2 documentation
Using SSCursor (streaming cursor) to solve Python using pymysql to query large amounts of data leads to memory usage is too high

Last edited: 2020.10.25 21:07:29
