Limited number of strand implementations leads to horrible bottleneck
2016-10-28 11:49:21 UTC
boost::asio and the master version of asio suffer from a design choice in the original strand_service: the limited number of strand implementations (num_implementations).
In the master branch of asio I see great progress in strand_executor_service::strand, where a different strategy is used: a limited number of mutexes instead of a limited number of strand implementations. My question is: why didn't you update the strategy of the original strand_service::strand as well?
The original strategy leads to terrible consequences when your application has two properties:
    * it creates new strands constantly
    * it may perform some long work within a strand
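To make the bottleneck concrete, here is a toy model of the old allocation scheme (a simplification for illustration, not asio's actual code; the slot-picking logic is assumed, though 193 does match asio's default pool size, configurable via BOOST_ASIO_STRAND_IMPLEMENTATIONS):

#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <set>

// The old strand_service hands every strand object one of a FIXED pool
// of shared implementations, so "distinct" strands collide.
constexpr std::size_t num_implementations = 193;

std::size_t pick_implementation(std::size_t strand_id)
{
    // Simplified stand-in for asio's allocation: map the strand onto
    // one of the shared slots.
    return std::hash<std::size_t>{}(strand_id) % num_implementations;
}

int main()
{
    constexpr std::size_t num_strands = 1000;
    std::set<std::size_t> used_slots;
    for (std::size_t i = 0; i < num_strands; ++i) {
        used_slots.insert(pick_implementation(i));
    }
    // 1000 strands cannot occupy more than 193 slots, so on average
    // ~5 strands share each implementation. A handler that blocks
    // inside one strand stalls every strand mapped to the same slot.
    assert(used_slots.size() <= num_implementations);
    std::cout << "strands: " << num_strands
              << ", distinct implementations: " << used_slots.size()
              << std::endl;
    return 0;
}

With an application that keeps creating strands, the pool is saturated almost immediately, which is exactly the situation the test case below reproduces.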

Here is a test case that shows the issue:

#include <asio/io_service.hpp>
#include <asio/strand.hpp>
#include <atomic>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>
#include <unistd.h>  // for sleep()

std::atomic<bool> running{true};
std::atomic<int> counter{0};

struct Work {
    Work(asio::io_service & io_service)
        : _strand(io_service)
    { }

    static void start_the_work(asio::io_service & io_service)
    {
        std::shared_ptr<Work> _this(new Work(io_service));
        _this->_strand.get_io_service().post(
            _this->_strand.wrap(std::bind(do_the_work, _this)));
    }

    static void do_the_work(std::shared_ptr<Work> _this)
    {
        counter.fetch_add(1, std::memory_order_relaxed);
        if (running.load(std::memory_order_relaxed)) {
            start_the_work(_this->_strand.get_io_service());
        }
    }

    asio::io_service::strand _strand;
};

struct BlockingWork {
    BlockingWork(asio::io_service & io_service)
        : _strand(io_service)
    { }

    static void start_the_work(asio::io_service & io_service)
    {
        std::shared_ptr<BlockingWork> _this(new BlockingWork(io_service));
        _this->_strand.get_io_service().post(
            _this->_strand.wrap(std::bind(do_the_work, _this)));
    }

    static void do_the_work(std::shared_ptr<BlockingWork> _this)
    {
        sleep(5);
    }

    asio::io_service::strand _strand;
};

int main(int argc, char ** argv)
{
    asio::io_service io_service;
    std::unique_ptr<asio::io_service::work> work{
        new asio::io_service::work(io_service)};

    for (std::size_t i = 0; i < 1000; ++i) {
        Work::start_the_work(io_service);
    }

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < 8; ++i) {
        workers.push_back(std::thread([&io_service] {
            io_service.run();
        }));
    }

    if (argc > 1) {
        std::cout << "Spawning a blocking work" << std::endl;
        workers.push_back(std::thread([&io_service] {
            io_service.run();
        }));
        BlockingWork::start_the_work(io_service);
    }

    sleep(5);
    running = false;
    work.reset();

    for (auto && worker : workers) {
        worker.join();
    }

    std::cout << "Work performed:" << counter.load() << std::endl;
    return 0;
}

Test run in a usual way:
time ./asio_strand_test_case
Work performed:3183957

real    0m5.008s
user    0m15.224s
sys     0m3.332s

Test run with a long blocking work:
time ./asio_strand_test_case 1
Spawning a blocking work
Work performed:195189

real    0m5.024s
user    0m0.920s
sys     0m0.164s

I have updated the strategy of the former strand_service::strand to match strand_executor_service::strand, and it removed the bottleneck completely.
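The essence of the strand_executor_service strategy can be sketched like this (my reading of the approach, not asio's actual source): each strand owns its own handler queue, and only the mutex guarding that queue comes from a fixed shared pool. Because the shared mutex is held only while touching the queue, never while a handler runs, a long-running handler delays only handlers posted to the same strand:

#include <cstddef>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>

// Fixed pool of mutexes, shared by all strands (hypothetical size).
constexpr std::size_t num_mutexes = 193;
std::mutex mutex_pool[num_mutexes];

struct Strand {
    explicit Strand(std::size_t id)
        : _mutex(&mutex_pool[id % num_mutexes])
    { }

    void post(std::function<void()> f)
    {
        std::lock_guard<std::mutex> lock(*_mutex);
        _queue.push_back(std::move(f));
    }

    // Drain the queue. The shared mutex is held only while taking a
    // handler off the queue, NOT while the handler executes, so even
    // strands that share a mutex do not block each other's work.
    void run_pending()
    {
        for (;;) {
            std::function<void()> f;
            {
                std::lock_guard<std::mutex> lock(*_mutex);
                if (_queue.empty()) return;
                f = std::move(_queue.front());
                _queue.pop_front();
            }
            f();  // long work runs without holding any shared resource
        }
    }

    std::mutex * _mutex;                       // shared, from the pool
    std::deque<std::function<void()>> _queue;  // per-strand, unshared
};

int main()
{
    Strand a(0), b(num_mutexes);  // deliberately share one pool mutex
    int ran = 0;
    a.post([&ran] { ++ran; });
    b.post([&ran] { ++ran; });
    a.run_pending();
    b.run_pending();
    std::cout << "handlers run: " << ran << std::endl;
    return 0;
}

Contrast this with the old scheme, where the whole implementation (queue and all) is the shared object, so a blocking handler pins every colliding strand.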
Can I hope for this update to strand_service::strand in the nearest release?