Problem description
I have an ordinary Python list that contains (multidimensional) NumPy arrays, all of the same shape and with the same number of values. Some of the arrays in the list are duplicates of earlier ones.
I want to remove all the duplicates, but the fact that the elements are NumPy arrays complicates this a bit...
- I can't use set(), as NumPy arrays are not hashable.
- I can't check for duplicates during insertion, as the arrays are generated in batches by a function and added to the list with .extend().
- NumPy arrays aren't directly comparable without resorting to one of NumPy's own functions (== compares element-wise and returns an array, not a single bool), so I can't just do something that uses "if x in list"...
- The contents of the list need to remain NumPy arrays at the end of the process; I could compare copies of the arrays converted to nested lists, but I can't convert the arrays to plain Python lists permanently.
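A quick way to see the first and third constraints in action is the short script below. It is a sketch written for this explanation (the variable names are illustrative, not from the question): set() rejects the arrays as unhashable, `in` raises because `==` yields a boolean array whose truth value is ambiguous, and an explicit np.array_equal check works.

```python
import numpy as np

arrays = [np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4])]

# set() needs hashable elements; ndarray compares element-wise and is unhashable
try:
    set(arrays)
    set_error = None
except TypeError as exc:
    set_error = exc

# `in` falls back to ==, which returns a boolean array; turning that array
# into a single bool fails when it has more than one element
probe = np.array([1, 2, 3, 4])  # equal to arrays[0], but a different object
try:
    probe in arrays
    in_error = None
except ValueError as exc:
    in_error = exc

# An explicit NumPy comparison gives a single True/False per pair
found = any(np.array_equal(probe, a) for a in arrays)

print(type(set_error).__name__)  # TypeError
print(type(in_error).__name__)   # ValueError
print(found)                     # True
```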
Any suggestions on how I can remove duplicates efficiently here?
Recommended answer
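Using the solutions from "Most efficient property to hash for numpy array", we see that hashing works best with a.tobytes() if a is a NumPy array (a.tostring() is the same method under an older name, deprecated since NumPy 1.19). The raw bytes don't record shape or dtype, which is fine here because all the arrays share the same shape (and, presumably, dtype). Dictionary keys are unique, so keying a dict on those bytes drops the duplicates: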
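import numpy as np

arraylist = [np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4]), np.array([1, 3, 2, 4])]
# tobytes() is the current name for the deprecated tostring().
# Equal arrays produce equal bytes, so they collapse onto one key;
# a later duplicate overwrites the earlier value (the last occurrence is kept).
unique = {arr.tobytes(): arr for arr in arraylist}
list(unique.values())  # [array([1, 2, 3, 4]), array([1, 3, 2, 4])]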
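If you want the first occurrence of each array kept, in its original order, and would rather not rely on every array sharing a dtype and shape, the same byte-hashing idea can be sketched as a small loop. `dedupe_arrays` is a hypothetical helper name, and the key (dtype string, shape, raw bytes) is an assumption chosen so that arrays with identical bytes but a different dtype or shape stay distinct.

```python
import numpy as np


def dedupe_arrays(arrays):
    """Return the arrays without duplicates, keeping first occurrences in order.

    Hypothetical helper: the key combines dtype, shape and raw bytes, so arrays
    whose bytes match but whose dtype or shape differ are not merged.
    """
    seen = set()
    unique = []
    for arr in arrays:
        # tobytes() defaults to C order, so a non-contiguous view yields the
        # same bytes as an equal contiguous array
        key = (arr.dtype.str, arr.shape, arr.tobytes())
        if key not in seen:
            seen.add(key)
            unique.append(arr)  # the original array object is kept, not a copy
    return unique


arraylist = [np.array([1, 2, 3, 4]), np.array([1, 2, 3, 4]), np.array([1, 3, 2, 4])]
print(dedupe_arrays(arraylist))  # [array([1, 2, 3, 4]), array([1, 3, 2, 4])]
```

Note that this is byte equality, not numeric equality: for float arrays, 0.0 and -0.0 compare equal but have different bytes, so they are kept as distinct, while NaNs with the same bit pattern are merged even though NaN != NaN.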