

Scalable way to access every element of ConcurrentHashMap&lt;Element, Boolean&gt; exactly once

Question

I have 32 machine threads and one ConcurrentHashMap&lt;Key,Value&gt; map, which contains a lot of keys. Key defines a public method visit(). I want to visit() every element of map exactly once, using the processing power I have available and possibly some sort of thread pooling.

Things I could try:

  • I could use the method map.keys(). The resulting Enumeration&lt;Key&gt; could be iterated over using nextElement(), but since a call to key.visit() is very brief, I won't manage to keep the threads busy. The Enumeration is inherently single-threaded.
  • I could use a synchronised HashSet&lt;Key&gt; instead, invoke the method toArray() and split the work on the array across all 32 threads. I have serious doubts about this solution, since the method toArray() is likely to be a single-threaded bottleneck.
  • I could try to inherit from ConcurrentHashMap, get my hands on the instances of its inner Segment&lt;K,V&gt;, try to group them into 32 groups and work on each group separately. This sounds like a hardcore approach, though.
  • Or similar magic with Enumeration&lt;Key&gt;.
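The second idea above can be sketched as follows: snapshot the keys with toArray() once (the suspected single-threaded step) and split the array into one chunk per worker. This is a minimal illustration, not the author's code; the class name and the counter standing in for key.visit() are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SnapshotSplitDemo {
    public static int run(int threads, int keys) {
        ConcurrentHashMap<Integer, Boolean> map = new ConcurrentHashMap<>();
        for (int i = 0; i < keys; i++) map.put(i, Boolean.TRUE);

        Object[] snapshot = map.keySet().toArray();   // the single-threaded step
        AtomicInteger visited = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        int chunk = (snapshot.length + threads - 1) / threads;
        for (int t = 0; t < threads; t++) {
            int from = t * chunk;
            int to = Math.min(snapshot.length, from + chunk);
            futures.add(pool.submit(() -> {
                for (int i = from; i < to; i++)
                    visited.incrementAndGet();        // stand-in for key.visit()
            }));
        }
        try {
            for (Future<?> f : futures) f.get();      // wait for every chunk
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return visited.get();
    }
}
```

Note that the chunked visiting parallelises well here; the objection in the text is specifically about the cost of building the snapshot array.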

Ideally:

  • Ideally, ConcurrentHashMap&lt;Key, Value&gt; would define a method keysEnumerator(int approximatePosition), which could give me an enumerator missing approximately the first 1/32 of elements, i.e. map.keysEnumerator(map.size()/32).

Am I missing anything obvious? Has anybody run into a similar problem before?

EDIT

我已經(jīng)進行了分析,看看這個問題是否真的會影響實踐中的性能.由于目前我無法訪問集群,因此我使用筆記本電腦并嘗試將結(jié)果外推到更大的數(shù)據(jù)集.在我的機器上,我可以創(chuàng)建一個 200 萬個鍵 ConcurrentHashMap,并且在每個鍵上調(diào)用 visit() 方法來迭代它大約需要 1 秒.該程序應(yīng)該擴展到 8500 萬鍵(及以上).集群的處理器稍微快一些,但它仍然需要大約 40 秒來迭代整個地圖.現(xiàn)在談?wù)劤绦虻倪壿嬃鞒?呈現(xiàn)的邏輯是順序的,即在上一步中的所有線程都完成之前,不允許任何線程進行下一步:

I've had a go at profiling to see whether this problem is actually going to affect the performance in practice. As I don't have access to the cluster at the moment I used my laptop and tried to extrapolate the results to a bigger dataset. On my machine I can create a 2 million keys ConcurrentHashMap and it takes about 1 second to iterate over it invoking the visit() method on every key. The program is supposed to scale to 85 million keys (and over). The cluster's processor is slightly faster, but it still should take about 40 seconds to iterate over entire map. Now a few words about the logic flow of the program. The logic presented is sequential, i.e. it is not allowed for any thread to proceed to the next step until all the threads in the previous step are finished:

  1. 創(chuàng)建哈希映射,創(chuàng)建鍵并填充哈希映射
  2. 遍歷訪問所有鍵的整個哈希映射.
  3. 進行一些數(shù)據(jù)混洗,即并行插入和刪除.
  4. 將第 2 步和第 3 步重復(fù)數(shù)百次.
  1. Create the hash map, create the keys and populate the hash map
  2. Iterate over entire hash map visiting all the keys.
  3. Do some data shuffling which is parallel insertions and deletions.
  4. Repeat step 2 and 3 a few hundred times.
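The barrier structure of the steps above can be sketched with ExecutorService.invokeAll(), which only returns once every task of a phase has completed. The class name and the counter standing in for the real visiting/shuffling work are illustrative assumptions, not the author's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class PhasedRoundsDemo {
    public static long run(int threads, int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong work = new AtomicLong();
        try {
            for (int round = 0; round < rounds; round++) {
                // invokeAll blocks until all tasks of the phase finish,
                // so no thread starts the next step early.
                pool.invokeAll(tasks(threads, work)); // step 2: visit keys
                pool.invokeAll(tasks(threads, work)); // step 3: shuffle data
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return work.get();
    }

    private static List<Callable<Void>> tasks(int n, AtomicLong counter) {
        List<Callable<Void>> out = new ArrayList<>();
        for (int i = 0; i < n; i++)
            out.add(() -> { counter.incrementAndGet(); return null; });
        return out;
    }
}
```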

That logic flow means that a 40-second iteration is going to be repeated a few hundred times, say 100, which gives us a bit over an hour spent just visiting the nodes. With a set of 32 parallel iterators it could go down to just a few minutes, which is a significant performance improvement.

現(xiàn)在談?wù)?ConcurrentHashMap 是如何工作的(或者我認為它是如何工作的).每個 ConcurrentHashMap 都由段組成(默認為 16 個).對哈希映射的每次寫入都會在相關(guān)段上同步.假設(shè)我們正在嘗試將兩個新鍵 k1 和 k2 寫入哈希映射,并且它們將被解析為屬于同一段,例如 s1.如果嘗試同時寫入它們,則其中一個將首先獲取鎖,然后再添加另一個.兩個元素被解析為屬于同一段的機會是多少?如果我們有一個好的散列函數(shù)和 16 個段,那么它就是 1/16.

Now a few words on how ConcurrentHashMap works (Or how I believe it works). Every ConcurrentHashMap consists of segments (by default 16). Every write to a hash map is synchronised on a relevant segment. So say we're trying to write two new keys k1 and k2 to the hash map and that they would be resolved to belong to the same segment, say s1. If they are attempted to be written simultaneously, one of them is going to acquire the lock first and be added earlier then the other. What is the chance of two elements to be resolved to belong to the same segment? In case we have got a good hash function and 16 segements it is 1/16.
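One knob the pre-Java-8 implementation does expose for this is the concurrencyLevel constructor argument, which controls how many segments (and therefore independent locks) the map is built with. The capacity and load-factor values below are arbitrary illustrations.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyLevelDemo {
    public static ConcurrentHashMap<String, Boolean> make() {
        // initialCapacity = 1024, loadFactor = 0.75f, concurrencyLevel = 32:
        // requests roughly 32 segments instead of the default 16, halving
        // the chance of two simultaneous writes colliding on one lock.
        return new ConcurrentHashMap<>(1024, 0.75f, 32);
    }
}
```

This helps write contention but still gives no public handle on the per-segment iterators, which is the gap the text goes on to describe.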

I believe that ConcurrentHashMap should have a method concurrentKeys(), which would return an array of Enumerations, one per segment. I have a few ideas on how to add it to ConcurrentHashMap through inheritance and I'll let you know if I succeed. For now the solution seems to be to create an array of ConcurrentHashMaps and pre-hash every key to resolve to one member of that array. I'll share that code as well, once it's ready.

EDIT

This is the same problem in a different language:

并行迭代器

Recommended Answer

The solution I will eventually go for is an array of ConcurrentHashMaps instead of one ConcurrentHashMap. This is ad hoc, but seems relevant for my use case. I don't care about the second step being slow, as it doesn't affect my code's performance. The solution is:

對象創(chuàng)建:

  1. 創(chuàng)建一個大小為 t 的 ConcurrentHashMaps 數(shù)組,其中 t 是線程數(shù).
  2. 創(chuàng)建一個大小為 t 的 Runnables 數(shù)組.
  1. Create an array of size t of ConcurrentHashMaps, where t is a number of threads.
  2. Create an array of Runnables, also of size t.

數(shù)組填充(單線程,不是問題):

Array Population (single threaded, not an issue):

  1. 創(chuàng)建鍵并應(yīng)用 pre-hash 函數(shù),它將返回 0 ... t-1 范圍內(nèi)的 int.在我的情況下,只需模 t.
  2. 通過訪問數(shù)組中的適當(dāng)條目,將鍵放入哈希圖中.例如.如果預(yù)哈希導(dǎo)致索引為 4,則使用 hashArray[4].put(key)
  1. Create the keys and apply pre-hash function, which will return an int in the range 0 ... t-1. In my case simply modulo t.
  2. Put the key in the hashmap, by accessing appropriate entry in the array. E.g. if the pre-hash has resulted in index 4, then go for hashArray[4].put(key)
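The population steps above can be sketched as follows. The class name, the preHash() helper and the Boolean payload are illustrative assumptions; only the hashArray name and the modulo-t pre-hash come from the text.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ShardedPutDemo {
    final ConcurrentHashMap<Integer, Boolean>[] hashArray;
    final int t;

    @SuppressWarnings("unchecked")
    public ShardedPutDemo(int t) {
        this.t = t;
        hashArray = new ConcurrentHashMap[t];   // one map per thread
        for (int i = 0; i < t; i++) hashArray[i] = new ConcurrentHashMap<>();
    }

    // Pre-hash: simply modulo t, yielding an index in 0 .. t-1.
    public int preHash(int key) { return Math.floorMod(key, t); }

    public void put(int key) {
        // E.g. for a pre-hash of 4 this is hashArray[4].put(key, ...).
        hashArray[preHash(key)].put(key, Boolean.TRUE);
    }
}
```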

數(shù)組迭代(很好的多線程,性能提升):

Array Iteration (nicely multithreaded, performance gain):

  1. Assign every thread from the Runnables array the job of iterating over the hash map with the corresponding index. This should give a t-times-shorter iteration compared to single-threaded.
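The iteration step can be sketched like this: worker i walks only hashArray[i], so the t iterations proceed fully in parallel. The class name, the latch-based join and the counter standing in for key.visit() are illustrative assumptions, not the author's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardedIterateDemo {
    @SuppressWarnings("unchecked")
    public static int run(int t, int keys) {
        ConcurrentHashMap<Integer, Boolean>[] hashArray = new ConcurrentHashMap[t];
        for (int i = 0; i < t; i++) hashArray[i] = new ConcurrentHashMap<>();
        for (int k = 0; k < keys; k++)
            hashArray[Math.floorMod(k, t)].put(k, Boolean.TRUE);

        AtomicInteger visited = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(t);
        CountDownLatch done = new CountDownLatch(t);
        for (int i = 0; i < t; i++) {
            final int idx = i;                       // Runnable #idx owns shard #idx
            pool.execute(() -> {
                for (Integer key : hashArray[idx].keySet())
                    visited.incrementAndGet();       // stand-in for key.visit()
                done.countDown();
            });
        }
        try {
            done.await();                            // wait for all t iterators
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        return visited.get();
    }
}
```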

To see the proof-of-concept code (it has some dependencies from the project, so I can't post it here), head to my project on github.

EDIT

Actually, implementing the above proof of concept for my system has proven to be time-consuming, bug-prone and grossly disappointing. Additionally, I've discovered I would miss many features of the standard-library ConcurrentHashMap. The solution I have been exploring recently, which looks much less ad hoc and much more promising, is to use Scala, which produces bytecode that is fully interoperable with Java. The proof of concept relies on the stunning library described in this paper, and AFAIK it is currently IMPOSSIBLE to achieve a corresponding solution in vanilla Java without writing thousands of lines of code, given the current state of the standard library and the corresponding third-party libraries.

import scala.collection.parallel.mutable.ParHashMap

class Node(value: Int, id: Int){
    var v = value   // mutable payload, incremented by visit()
    var i = id      // key under which the node is stored
    override def toString(): String = v.toString
}

object testParHashMap{
    def visit(entry: Tuple2[Int, Node]){
        entry._2.v += 1
    }
    def main(args: Array[String]){
        val hm = new ParHashMap[Int, Node]()
        for (i <- 1 to 10){
            var node = new Node(0, i)
            hm.put(node.i, node)
        }

        println("========== BEFORE ==========")
        hm.foreach{println}

        hm.foreach{visit}

        println("========== AFTER ==========")
        hm.foreach{println}

    }
}

這篇關(guān)于訪問 ConcurrentHashMap&lt;Element, Boolean&gt; 的每個元素的可擴展方式恰好一次的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!

【網(wǎng)站聲明】本站部分內(nèi)容來源于互聯(lián)網(wǎng),旨在幫助大家更快的解決問題,如果有圖片或者內(nèi)容侵犯了您的權(quán)益,請聯(lián)系我們刪除處理,感謝您的支持!

相關(guān)文檔推薦

Convert List of Strings into Map using Java-8 Streams API(使用 Java-8 Streams API 將字符串列表轉(zhuǎn)換為 Map)
Getting data from JSON(從 JSON 獲取數(shù)據(jù))
java linkedhashmap iteration(javalinkedhashmap迭代)
Converting a list of objects to Map(將對象列表轉(zhuǎn)換為 Map)
Create a HashMap with a fixed Key corresponding to a HashSet. point of departure(用一個固定的Key對應(yīng)一個HashSet創(chuàng)建一個HashMap.出發(fā)點)
HttpMessageConverter exception : RestClientException: Could not write request: no suitable HttpMessageConverter found(HttpMessageConverter 異常:RestClientException:無法寫入請求:找不到合適的 HttpMessageConverter) - IT屋-程序員
主站蜘蛛池模板: 长沙中央空调维修,中央空调清洗维保,空气能热水工程,价格,公司就找维小保-湖南维小保环保科技有限公司 | 中图网(原中国图书网):网上书店,尾货特色书店,30万种特价书低至2折! | 禹城彩钢厂_钢结构板房_彩钢复合板-禹城泰瑞彩钢复合板加工厂 | 网带通过式抛丸机,,网带式打砂机,吊钩式,抛丸机,中山抛丸机生产厂家,江门抛丸机,佛山吊钩式,东莞抛丸机,中山市泰达自动化设备有限公司 | 手机游戏_热门软件app下载_好玩的安卓游戏下载基地-吾爱下载站 | 非小号行情 - 专业的区块链、数字藏品行情APP、金色财经官网 | 苏州伊诺尔拆除公司_专业酒店厂房拆除_商场学校拆除_办公楼房屋拆除_家工装拆除拆旧 | 复合土工膜厂家|hdpe防渗土工膜|复合防渗土工布|玻璃纤维|双向塑料土工格栅-安徽路建新材料有限公司 | 二手Sciex液质联用仪-岛津气质联用仪-二手安捷伦气质联用仪-上海隐智科学仪器有限公司 | 北京成考网-北京成人高考网| 杭州标识标牌|文化墙|展厅|导视|户内外广告|发光字|灯箱|铭阳制作公司 - 杭州标识标牌|文化墙|展厅|导视|户内外广告|发光字|灯箱|铭阳制作公司 | 复合肥,化肥厂,复合肥批发,化肥代理,复合肥品牌-红四方 | 会议会展活动拍摄_年会庆典演出跟拍_摄影摄像直播-艾木传媒 | 河南包装袋厂家_河南真空袋批发价格_河南服装袋定制-恒源达包装制品 | 定时排水阀/排气阀-仪表三通旋塞阀-直角式脉冲电磁阀-永嘉良科阀门有限公司 | 耐腐蚀泵,耐腐蚀真空泵,玻璃钢真空泵-淄博华舜耐腐蚀真空泵有限公司 | 钢骨架轻型板_膨石轻型板_钢骨架轻型板价格_恒道新材料 | 交变/复合盐雾试验箱-高低温冲击试验箱_安奈设备产品供应杭州/江苏南京/安徽马鞍山合肥等全国各地 | 招商帮-一站式网络营销服务|搜索营销推广|信息流推广|短视视频营销推广|互联网整合营销|网络推广代运营|招商帮企业招商好帮手 | 承插管件_不锈钢承插管件_锻钢高压管件-温州科正阀门管件有限公司 | 美国PARKER齿轮泵,美国PARKER柱塞泵,美国PARKER叶片泵,美国PARKER电磁阀,美国PARKER比例阀-上海维特锐实业发展有限公司二部 | 烟台金蝶财务软件,烟台网站建设,烟台网络推广 | 兰州牛肉面加盟,兰州牛肉拉面加盟-京穆兰牛肉面 | 机器视觉检测系统-视觉检测系统-机器视觉系统-ccd检测系统-视觉控制器-视控一体机 -海克易邦 | 钢托盘,铁托盘,钢制托盘,镀锌托盘,饲料托盘,钢托盘制造商-南京飞天金属13260753852 | 四川成都干燥设备_回转筒干燥机_脉冲除尘器_输送设备_热风炉_成都川工星科机电设备有限公司 | 艺术生文化课培训|艺术生文化课辅导冲刺-济南启迪学校 | 法兰螺母 - 不锈钢螺母制造厂家 - 万千紧固件--螺母街 | 展厅设计公司,展厅公司,展厅设计,展厅施工,展厅装修,企业展厅,展馆设计公司-深圳广州展厅设计公司 | 「安徽双凯」自动售货机-无人售货机-成人用品-自动饮料食品零食售货机 | 螺钉式热电偶_便携式温度传感器_压簧式热电偶|无锡联泰仪表有限公司|首页 | PAS糖原染色-CBA流式多因子-明胶酶谱MMP-上海研谨生物科技有限公司 | atcc网站,sigma试剂价格,肿瘤细胞现货,人结肠癌细胞株购买-南京科佰生物 | 称重传感器,测力传感器,拉压力传感器,压力变送器,扭矩传感器,南京凯基特电气有限公司 | 清管器,管道清管器,聚氨酯发泡球,清管球 - 承德嘉拓设备 | 微信聊天记录恢复_手机短信删除怎么恢复_通讯录恢复软件下载-快易数据恢复 | 土壤检测仪器_行星式球磨仪_土壤团粒分析仪厂家_山东莱恩德智能科技有限公司 | 瓶盖扭矩仪(扭力值检测)-百科 | 长沙印刷厂-包装印刷-画册印刷厂家-湖南省日大彩色印务有限公司 青州搬家公司电话_青州搬家公司哪家好「鸿喜」青州搬家 | 成都治疗尖锐湿疣比较好的医院-成都治疗尖锐湿疣那家医院好-成都西南皮肤病医院 | 船用锚链|专业锚链生产厂家|安徽亚太锚链制造有限公司 |