Problem description
How can I do a bulk upsert in pymongo? I want to update a bunch of entries, and doing them one at a time is very slow.
The answer to an almost identical question is here: Bulk update/upsert in MongoDB?
The accepted answer doesn't actually answer the question. It simply gives a link to the mongo CLI for doing imports/exports.
I would also be open to someone explaining why a bulk upsert is not possible or not a best practice, but please explain what the preferred solution to this sort of problem is.
Recommended answer
Modern releases of pymongo (greater than 3.x) wrap bulk operations in a consistent interface that downgrades where the server release does not support bulk operations. This is now consistent across the officially supported MongoDB drivers.
So the preferred method for coding is to use bulk_write() instead, where you use an UpdateOne or other appropriate operation action. And now of course it is preferred to use natural language lists rather than a specific builder.
A direct translation of the old documentation:
from pymongo import UpdateOne

operations = [
    UpdateOne({"field1": 1}, {"$push": {"vals": 1}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 2}}, upsert=True),
    UpdateOne({"field1": 1}, {"$push": {"vals": 3}}, upsert=True),
]
result = collection.bulk_write(operations)
Or the classic document transformation loop:
import random
from pymongo import UpdateOne

random.seed()

operations = []
for doc in collection.find():
    # Set a random number on every document update
    operations.append(
        UpdateOne({"_id": doc["_id"]}, {"$set": {"random": random.randint(0, 10)}})
    )

    # Send once every 1000 in batch
    if len(operations) == 1000:
        collection.bulk_write(operations, ordered=False)
        operations = []

if len(operations) > 0:
    collection.bulk_write(operations, ordered=False)
The returned result is a BulkWriteResult, which will contain counters of matched and updated documents as well as the returned _id values for any "upserts" that occur.
There is a bit of a misconception about the size of the bulk operations array. The actual request as sent to the server cannot exceed the 16MB BSON limit since that limit also applies to the "request" sent to the server which is using BSON format as well.
However, that does not govern the size of the request array that you can build, as the actual operations will only be sent and processed in batches of 1000 anyway. The only real restriction is that those 1000 operation instructions themselves do not actually create a BSON document greater than 16MB, which is indeed a pretty tall order.
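If you would rather cap memory yourself instead of accumulating one huge list, slicing the operations into fixed-size batches is straightforward. This is a pure-Python sketch, not a pymongo API: `batched` is a hypothetical helper name, and 1000 simply mirrors the driver's internal batch size; plain integers stand in for UpdateOne objects so the slicing is easy to see:

```python
def batched(operations, size=1000):
    """Yield successive slices of at most `size` operations."""
    for start in range(0, len(operations), size):
        yield operations[start:start + size]

# Plain integers standing in for UpdateOne objects:
ops = list(range(2500))
batch_sizes = [len(batch) for batch in batched(ops)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each slice would then be passed to collection.bulk_write(batch, ordered=False) in turn.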
The general concept of bulk methods is "less traffic", as a result of sending many things at once and only dealing with one server response. The reduction of that overhead attached to every single update request saves lots of time.
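As a back-of-the-envelope illustration of that saving (the figures below are assumptions chosen only to make the arithmetic visible): with 100,000 updates, one request per document means 100,000 round trips, while sending them through the driver's 1000-operation batches needs only 100:

```python
import math

n_updates = 100_000   # hypothetical workload size
batch_size = 1000     # the driver's internal bulk batch size

one_at_a_time = n_updates                    # one round trip per update
bulk_round_trips = math.ceil(n_updates / batch_size)

print(one_at_a_time, bulk_round_trips)  # 100000 100
```

The per-request overhead (network latency, request parsing) is paid 1000 times less often, which is where the bulk of the speedup comes from.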
This concludes this article on fast or bulk updates in pymongo; hopefully the recommended answer above is helpful.