Why don't compilers merge redundant std::atomic writes?

Question

I'm wondering why no compilers are prepared to merge consecutive writes of the same value to a single atomic variable, e.g.:

#include <atomic>
std::atomic<int> y(0);
void f() {
  auto order = std::memory_order_relaxed;
  y.store(1, order);
  y.store(1, order);
  y.store(1, order);
}

Every compiler I've tried will issue the above write three times. What legitimate, race-free observer could see a difference between the above code and an optimized version with a single write (i.e. doesn't the 'as-if' rule apply)?

If the variable had been volatile, then obviously no optimization is applicable. What's preventing it in my case?

Here's the code in Compiler Explorer.

Recommended answer

The C++11 / C++14 standards as written do allow the three stores to be folded/coalesced into one store of the final value. Even in a case like this:

  y.store(1, order);
  y.store(2, order);
  y.store(3, order); // inlining + constant-folding could produce this in real code

The standard does not guarantee that an observer spinning on y (with an atomic load or CAS) will ever see y == 2. A program that depended on this would have a race-condition bug, but only the garden-variety kind of race, not the C++ Undefined Behaviour kind of data race. (A data race is UB only with non-atomic variables.) A program that expects to sometimes see it is not necessarily even buggy. (See below re: progress bars.)

Any ordering that's possible on the C++ abstract machine can be picked (at compile time) as the ordering that will always happen. This is the as-if rule in action. In this case, it's as if all three stores happened back-to-back in the global order, with no loads or stores from other threads happening between the y=1 and y=3.
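To make the as-if argument concrete, here is a minimal sketch of the transformation the standard would permit. The names f_as_written, f_coalesced and final_value_after are made up for illustration, and f_coalesced is a hand-written stand-in for what a compiler could emit, not the output of any real compiler.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> y(0);

// The source as the programmer wrote it: three relaxed stores in a row.
void f_as_written() {
    auto order = std::memory_order_relaxed;
    y.store(1, order);
    y.store(2, order);
    y.store(3, order);
}

// Hand-written equivalent of a coalesced version: only the final value is
// stored. This corresponds to the execution in which no other thread
// touches y between the first and the last store.
void f_coalesced() {
    y.store(3, std::memory_order_relaxed);
}

// Hypothetical helper: reset y, run one of the functions, report the result.
int final_value_after(void (*fn)()) {
    y.store(0, std::memory_order_relaxed);
    fn();
    return y.load(std::memory_order_relaxed);
}
```

Both versions leave y holding 3, so a check such as assert(final_value_after(f_as_written) == final_value_after(f_coalesced)) passes. An observer in another thread cannot prove that the intermediate values were skipped rather than simply missed.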

It doesn't depend on the target architecture or hardware, just as compile-time reordering of relaxed atomic operations is allowed even when targeting strongly-ordered x86. The compiler doesn't have to preserve anything you might expect from thinking about the hardware you're compiling for, so you need barriers. The barriers may compile into zero asm instructions.

It's a quality-of-implementation issue, and can change observed performance / behaviour on real hardware.

The most obvious case where it's a problem is a progress bar. Sinking the stores out of a loop (that contains no other atomic operations) and folding them all into one would result in a progress bar staying at 0 and then going to 100% right at the end.
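A rough sketch of the progress-bar pattern, assuming a worker thread and a polling observer. The names progress, do_work, watch_progress and run_demo are hypothetical and not from the original answer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> progress(0);  // percent complete

// Worker: publishes progress after each chunk of (simulated) work.
// The loop contains no other atomic operations, so the standard as written
// would allow sinking the stores out and doing a single store of 100.
void do_work() {
    for (int i = 1; i <= 100; ++i) {
        // ... one chunk of non-atomic work would go here ...
        progress.store(i, std::memory_order_relaxed);
    }
}

// Observer: polls until the work is finished and records each distinct
// value it happens to see. It may legitimately miss intermediate values.
std::vector<int> watch_progress() {
    std::vector<int> seen;
    int last = -1;
    while (last != 100) {
        int now = progress.load(std::memory_order_relaxed);
        if (now != last) {
            seen.push_back(now);
            last = now;
        }
    }
    return seen;
}

// Runs the worker in a second thread while this thread observes.
std::vector<int> run_demo() {
    progress.store(0, std::memory_order_relaxed);
    std::thread worker(do_work);
    std::vector<int> seen = watch_progress();
    worker.join();
    return seen;
}
```

With current compilers the observer usually sees several intermediate values. If the stores were coalesced, the only values it could ever record would be 0 and 100, which is still a valid execution of this program.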

There's no C++11 std::atomic way to stop them from doing it in cases where you don't want it, so for now compilers simply choose never to coalesce multiple atomic operations into one. (Coalescing them all into one operation doesn't change their order relative to each other.)

Compiler-writers have correctly noticed that programmers expect an atomic store to actually happen to memory every time the source does y.store(). (See most of the other answers to this question, which claim the stores are required to happen separately because of possible readers waiting to see an intermediate value.) That is, coalescing would violate the principle of least surprise.

However, there are cases where it would be very helpful, for example avoiding useless shared_ptr ref count inc/dec in a loop.
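As an illustration of the shared_ptr case, the sketch below contrasts a loop that copies the pointer on every iteration with one that avoids the copies by hand. The function names are invented for this example. Whether the reference-count updates are really atomic depends on the standard library and on whether the program is multithreaded.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Copies the shared_ptr on every iteration. Each copy increments the
// reference count and each destructor decrements it again; since compilers
// do not optimize atomics, these paired updates are not removed.
long sum_with_copies(const std::shared_ptr<std::vector<int>>& data) {
    long total = 0;
    for (std::size_t i = 0; i < data->size(); ++i) {
        std::shared_ptr<std::vector<int>> local = data;  // ref count inc
        total += (*local)[i];
    }                                                    // ref count dec
    return total;
}

// Manual workaround: borrow a plain reference once, so the loop performs
// no reference-count traffic at all.
long sum_without_copies(const std::shared_ptr<std::vector<int>>& data) {
    const std::vector<int>& values = *data;
    long total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Both functions compute the same sum and leave the reference count unchanged. The second is the kind of rewrite a programmer currently has to do by hand, because the compiler will not cancel the paired increment and decrement.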

Obviously any reordering or coalescing can't violate any other ordering rules. For example, num++; num--; would still have to be a full barrier to runtime and compile-time reordering, even if it no longer touched the memory at num.

Discussion is under way to extend the std::atomic API to give programmers control of such optimizations, at which point compilers will be able to optimize when useful, which can happen even in carefully-written code that isn't intentionally inefficient. Some examples of useful cases for optimization are mentioned in the following working-group discussion / proposal links:

  • http://wg21.link/n4455: N4455 No Sane Compiler Would Optimize Atomics
  • http://wg21.link/p0062: WG21/P0062R1: When should compilers optimize atomics?

See also discussion about this same topic on Richard Hodges' answer to Can num++ be atomic for 'int num'? (see the comments). See also the last section of my answer to the same question, where I argue in more detail that this optimization is allowed. (Leaving it short here, because those C++ working-group links already acknowledge that the current standard as written does allow it, and that current compilers just don't optimize on purpose.)

Within the current standard, volatile atomic<int> y would be one way to ensure that stores to it are not allowed to be optimized away. (As Herb Sutter points out in an SO answer, volatile and atomic already share some requirements, but they are different). See also std::memory_order's relationship with volatile on cppreference.

Accesses to volatile objects are not allowed to be optimized away (because they could be memory-mapped IO registers, for example).

Using volatile atomic<T> mostly fixes the progress-bar problem, but it's kind of ugly and might look silly in a few years if/when C++ decides on different syntax for controlling optimization so compilers can start doing it in practice.
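A minimal sketch of the volatile atomic<T> workaround, using hypothetical names (progress_v, report_three_steps, current_progress). It relies on the volatile-qualified overloads of store and load that std::atomic provides.

```cpp
#include <atomic>
#include <cassert>

// volatile on top of atomic: each access is an observable side effect,
// so the compiler must perform every store separately.
volatile std::atomic<int> progress_v(0);

// Three back-to-back stores. With the volatile qualifier none of them may
// be dropped or merged, even though only the last value survives.
void report_three_steps() {
    progress_v.store(1, std::memory_order_relaxed);
    progress_v.store(2, std::memory_order_relaxed);
    progress_v.store(3, std::memory_order_relaxed);
}

// Reads the current value through the volatile-qualified load overload.
int current_progress() {
    return progress_v.load(std::memory_order_relaxed);
}
```

After report_three_steps() returns, current_progress() yields 3. The difference from the non-volatile version only shows up in the generated code, where all three stores have to be present.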

I think we can be confident that compilers won't start doing this optimization until there's a way to control it. Hopefully it will be some kind of opt-in (like a memory_order_release_coalesce) that doesn't change the behaviour of existing C++11/14 code when compiled as C++whatever. But it could be like the proposal in wg21/p0062: tag don't-optimize cases with [[brittle_atomic]].

wg21/p0062 warns that even volatile atomic doesn't solve everything, and discourages its use for this purpose. It gives this example:

if(x) {
    foo();
    y.store(0);
} else {
    bar();
    y.store(0);  // release a lock before a long-running loop
    for() {...} // loop contains no atomics or volatiles
}
// A compiler can merge the stores into a y.store(0) here.

Even with volatile atomic<int> y, a compiler is allowed to sink the y.store() out of the if/else and just do it once, because it's still doing exactly 1 store with the same value. (Which would be after the long loop in the else branch). Especially if the store is only relaxed or release instead of seq_cst.
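The sketch below spells out that transformation by hand. Everything in it (lock_word, work_done, foo, bar, long_loop, run_and_count) is a made-up stand-in for the elided parts of the paper's example, and after_sinking is written manually to show the permitted result, not taken from a real compiler.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> lock_word(1);  // hypothetical lock: 1 = held, 0 = free
int work_done = 0;              // plain counter standing in for real work

void foo() { ++work_done; }
void bar() { ++work_done; }
void long_loop() {              // contains no atomics or volatiles
    for (int i = 0; i < 1000; ++i) ++work_done;
}

// As written: the else branch releases the lock before the long loop.
void as_written(bool x) {
    if (x) {
        foo();
        lock_word.store(0, std::memory_order_release);
    } else {
        bar();
        lock_word.store(0, std::memory_order_release);
        long_loop();
    }
}

// After sinking the store out of both branches: still exactly one store of
// the same value, but now the lock is held for the whole long loop.
void after_sinking(bool x) {
    if (x) {
        foo();
    } else {
        bar();
        long_loop();
    }
    lock_word.store(0, std::memory_order_release);
}

// Hypothetical helper: take the lock, run one variant, return work done.
int run_and_count(void (*fn)(bool), bool x) {
    lock_word.store(1, std::memory_order_relaxed);
    work_done = 0;
    fn(x);
    assert(lock_word.load(std::memory_order_acquire) == 0);
    return work_done;
}
```

Both variants end in the same state, which is why the transformation satisfies the as-if rule. The cost is visible only to other threads, which would wait for the lock much longer in the second version.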

volatile does stop the coalescing discussed in the question, but this points out that other optimizations on atomic<> can also be problematic for real performance.

Other suggested reasons for not optimizing include: nobody has written the complicated code that would allow the compiler to do these optimizations safely (without ever getting it wrong). That explanation is not sufficient on its own, because N4455 says LLVM already implements or could easily implement several of the optimizations it mentioned.

The confusing-for-programmers reason is certainly plausible, though. Lock-free code is hard enough to write correctly in the first place.

Don't be casual in your use of atomic weapons: they aren't cheap and don't optimize much (currently not at all). It's not always easy to avoid redundant atomic operations with std::shared_ptr<T>, though, since there's no non-atomic version of it (although one of the answers here gives an easy way to define a shared_ptr_unsynchronized<T> for gcc).
