Python multiprocessing performance only improves with the square root of the number of cores used


Problem description



                  I am attempting to implement multiprocessing in Python (Windows Server 2012) and am having trouble achieving the degree of performance improvement that I expect. In particular, for a set of tasks which are almost entirely independent, I would expect a linear improvement with additional cores.


                  I understand that--especially on Windows--there is overhead involved in opening new processes [1], and that many quirks of the underlying code can get in the way of a clean trend. But in theory the trend should ultimately still be close to linear for a fully parallelized task [2]; or perhaps logistic if I were dealing with a partially serial task [3].

                  However, when I run multiprocessing.Pool on a prime-checking test function (code below), I get a nearly perfect square-root relationship up to N_cores=36 (the number of physical cores on my server) before the expected performance hit when I get into the additional logical cores.


Here is a plot of my performance test results:
                  ( "Normalized Performance" is [ a run time with 1 CPU-core ] divided by [ a run time with N CPU-cores ] ).

                  Is it normal to have this dramatic diminishing of returns with multiprocessing? Or am I missing something with my implementation?


import numpy as np
from multiprocessing import Pool, cpu_count, Manager
import math as m
from functools import partial
from time import time

def check_prime(num):

    # Assert positive integer value
    if num != m.floor(num) or num < 1:
        print("Input must be a positive integer")
        return None

    # Check divisibility for all possible factors
    prime = True
    for i in range(2, num):
        if num % i == 0:
            prime = False
    return prime

def cp_worker(num, L):
    prime = check_prime(num)
    L.append((num, prime))


def mp_primes(omag, mp=cpu_count()):
    with Manager() as manager:
        np.random.seed(0)
        numlist = np.random.randint(10**omag, 10**(omag+1), 100)

        L = manager.list()
        cp_worker_ptl = partial(cp_worker, L=L)

        pool = Pool(processes=mp)  # created before try, so finally cannot hit an unbound name
        try:
            list(pool.imap(cp_worker_ptl, numlist))
        except Exception as e:
            print(e)
        finally:
            pool.close()  # no more tasks
            pool.join()

        # Materialize the proxy: the manager (and its list) shuts down on context exit
        return list(L)


if __name__ == '__main__':
    rt = []
    for i in range(cpu_count()):
        t0 = time()
        mp_result = mp_primes(6, mp=i+1)
        t1 = time()
        rt.append(t1 - t0)
        print("Using %i core(s), run time is %.2fs" % (i+1, rt[-1]))
                  

Note: I am aware that for this task it would likely be more efficient to use multithreading, but the actual script for which this one is a simplified analog is incompatible with Python multithreading due to the GIL.

Solution

                  @KellanM deserved [+1] for quantitative performance monitoring

                  am I missing something with my implementation?

Yes: your expectation abstracts away all the add-on costs of process management.

While you have expressed an expectation of "a linear improvement with additional cores", this would hardly ever appear in practice, for several reasons (even the hype of communism failed to deliver anything for free).

Gene AMDAHL formulated the initial law of diminishing returns.
A more recent, re-formulated version also takes into account the effects of process-management {setup|terminate} add-on overhead costs, and tries to cope with the atomicity of processing (large work-package payloads cannot easily be re-located / re-distributed over the available pool of free CPU-cores in most common programming systems, except for some indeed specific micro-scheduling art, like the one demonstrated so colourfully in Semantic Designs' PARLANSE or LLNL's SISAL in the past).
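Both forms can be sketched numerically. The following is a minimal illustration only: the function names and the shape of the per-process overhead term are my own simplification, not the exact re-formulated law.

```python
def amdahl_speedup(p, n):
    """Classical Amdahl's Law: p is the parallelizable fraction
    of the work, n is the number of processing units."""
    return 1.0 / ((1.0 - p) + p / n)

def overhead_aware_speedup(p, n, setup=0.0, terminate=0.0):
    """Overhead-aware variant (simplified): each of the n processes
    adds setup/terminate costs, expressed here as fractions of the
    single-core run time, which the classical law ignores."""
    return 1.0 / ((1.0 - p) + p / n + n * (setup + terminate))

# A fully parallel task with zero overhead scales linearly ...
print(amdahl_speedup(1.0, 36))  # → 36.0
# ... but even a tiny per-process overhead caps the speedup well below that:
print(overhead_aware_speedup(1.0, 36, setup=0.001))
```

The second call shows why a "fully parallelizable" workload can still fall far short of linear scaling once process setup costs enter the denominator.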


                  A best next step?

If you are indeed interested in this domain, you can always experimentally measure and compare the real costs of process management (plus data-flow costs, plus memory-allocation costs, ... up until the process termination and results re-assembly in the main process), so as to record and evaluate, quantitatively and fairly, the add-on cost/benefit ratio of using more CPU-cores (which will, in python, re-instantiate the whole python-interpreter state, including all its memory state, before a first useful operation is carried out in the first spawned and set-up process).
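One way to take such a measurement is to time a pool that does no useful work at all, so that only the management overhead remains. This is a rough sketch under my own naming, not a rigorous benchmark:

```python
from multiprocessing import Pool
from time import time

def _noop(x):
    # no useful work at all: any time measured is management overhead
    return x

def pool_overhead(n_procs, n_tasks=100):
    """Wall-clock cost of creating a Pool, pushing trivial tasks
    through it, and tearing it down again."""
    t0 = time()
    with Pool(processes=n_procs) as pool:
        pool.map(_noop, range(n_tasks))
    return time() - t0

if __name__ == "__main__":
    for n in (1, 2, 4):
        print("%d process(es): %.3fs of pure overhead" % (n, pool_overhead(n)))
```

Subtracting such a baseline from the timings of the real workload gives a first quantitative estimate of how much of each run is spent on process management rather than on computing.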

Underperformance (in the former case below), if not disastrous effects (in the latter case below), of an ill-engineered resource-mapping policy, be it an "under-booking" of resources from a pool of CPU-cores or an "over-booking" of resources from a pool of RAM-space, is also discussed here.

The link to the re-formulated Amdahl's Law above will help you evaluate the point of diminishing returns, so that you do not pay more than you will ever receive.
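To see what such an evaluation might look like, here is a toy sweep that locates the core count after which adding one more core stops paying off. The overhead numbers are invented purely for illustration:

```python
def net_speedup(p, n, per_proc_overhead):
    # Amdahl's Law extended with a per-process add-on cost,
    # expressed as a fraction of the single-core run time
    return 1.0 / ((1.0 - p) + p / n + n * per_proc_overhead)

def point_of_diminishing_returns(p, per_proc_overhead, max_n=128):
    """First core count after which adding one more core
    no longer increases the net speedup."""
    best_n = 1
    for n in range(2, max_n + 1):
        if net_speedup(p, n, per_proc_overhead) <= net_speedup(p, best_n, per_proc_overhead):
            break
        best_n = n
    return best_n

# With zero overhead there is no such point inside the swept range ...
print(point_of_diminishing_returns(1.0, 0.0))    # → 128 (hits max_n)
# ... while a small per-process cost makes the optimum finite:
print(point_of_diminishing_returns(1.0, 0.002))
```

For a fully parallel task the optimum sits near 1/sqrt(overhead), which is why even modest per-process costs pull the break-even point down into the range of a few dozen cores.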

The experiments of Hoefinger and Haunschmid may serve as good practical evidence of how a growing number of processing nodes (be it a local O/S-managed CPU-core or a node in a distributed NUMA architecture) will start decreasing the resulting performance,
where the point of diminishing returns (demonstrated in the overhead-agnostic Amdahl's Law)
actually becomes the point after which you pay more than you receive.

Good luck in this interesting field!


                  Last, but not least,

NUMA / non-locality issues get their voice heard in discussions of scaling for HPC-grade tuning (in-Cache / in-RAM computing strategies), and may, as a side-effect, help detect flaws (as reported by @eryksun above). You can review your platform's actual NUMA topology with the lstopo tool, to see the abstraction that your operating system tries to work with when scheduling the "just"-[CONCURRENT] task execution over such a NUMA resources topology.
