ostream::operator<<() and sputn()

Jonathan Wakely jwakely.gcc@gmail.com
Wed Jul 14 21:30:48 GMT 2021


On Wed, 14 Jul 2021 at 22:26, Lewis Hyatt via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> Hello-
>
> I noticed that libstdc++'s implementation of ostream::operator<<() prefers
> to call sputn() on the underlying streambuf for all char, char*, and string
> output operations, including single characters, rather than manipulate the
> buffer directly. I am curious why it works this way; it feels somewhat
> suboptimal to me, because sputn() is mandated to call the virtual function
> xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
> only needs a virtual call when the buffer is full. I always thought that the
> buffer abstraction, and the resulting avoidance of virtual calls for the
> majority of operations, was the main point of streambuf's design, and that
> sputn() was meant for cases where the output would be large enough to
> overflow the buffer anyway, in which case it may be possible to skip the
> buffer and flush directly instead.
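>
> For concreteness, here is a small self-contained illustration of that
> difference (just a sketch with invented names, not libstdc++'s actual
> code): a streambuf that counts which virtual hooks get invoked.
>
> ---------
> #include <iostream>
> #include <streambuf>
>
> struct counting_buf : std::streambuf {
>     int overflow_calls = 0;
>     int xsputn_calls = 0;
>     char buf[64];
>
>     counting_buf() { setp(buf, buf + sizeof(buf)); }
>
> protected:
>     int_type overflow(int_type c) override {
>         ++overflow_calls;
>         setp(buf, buf + sizeof(buf));               // "flush" by discarding
>         if (!traits_type::eq_int_type(c, traits_type::eof()))
>             return sputc(traits_type::to_char_type(c));
>         return traits_type::not_eof(c);
>     }
>
>     std::streamsize xsputn(const char* s, std::streamsize n) override {
>         ++xsputn_calls;
>         return std::streambuf::xsputn(s, n);        // default implementation
>     }
> };
>
> int main() {
>     counting_buf a, b;
>     for (int i = 0; i < 1000; ++i) a.sputc('x');    // buffer fast path
>     for (int i = 0; i < 1000; ++i) b.sputn("x", 1); // mandated xsputn() call
>
>     std::cout << "sputc(): overflow=" << a.overflow_calls
>               << " xsputn=" << a.xsputn_calls << '\n'
>               << "sputn(): overflow=" << b.overflow_calls
>               << " xsputn=" << b.xsputn_calls << '\n';
> }
> ---------
>
> The first loop never reaches xsputn() at all (only the occasional
> overflow() when the 64-byte put area fills up), while the second loop
> makes a virtual xsputn() call for every single character.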
>
> It seems to me that for most typical use cases, xsputn() is still going to
> want to use the buffer if the output fits into it; libstdc++ does this in
> basic_filebuf, for example. So it would seem beneficial to try the buffer
> before making the virtual function call, instead of after --
> especially because the typical char instantiation of __ostream_insert that
> makes this call for operator<<() is hidden inside the .so, and is not
> inlined or eligible for devirtualization optimizations.
>
> FWIW, here is a small test case.
>
> ---------
> #include <ostream>
> #include <iostream>
> #include <fstream>
> #include <sstream>
> #include <chrono>
> #include <random>
> using namespace std;
>
> int main() {
>     constexpr size_t N = 500000000;
>     string s(N, 'x');
>
>     ofstream of{"/dev/null"};
>     ostringstream os;
>     ostream* streams[] = {&of, &os};
>     mt19937 rng{random_device{}()};
>
>     const auto timed_run = [&](const char* label, auto&& callback) {
>         const auto t1 = chrono::steady_clock::now();
>         for(char c: s) callback(*streams[rng() % 2], c);
>         const auto t2 = chrono::steady_clock::now();
>         cout << label << " took: "
>              << chrono::duration<double>(t2-t1).count()
>              << " seconds" << endl;
>     };
>
>     timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
>     timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
> }
> ---------
>
> This is what I get with the current trunk:
> ---------
> insert with put() took: 6.12152 seconds
> insert with op<<  took: 13.4437 seconds
> ---------
>
> And this is what I get with the attached patch:
> ---------
> insert with put() took: 6.08313 seconds
> insert with op<<  took: 8.24565 seconds
> ---------
>
> So the overhead of calling operator<< vs calling put() was reduced by more
> than 3X.
>
> The prototype patch calls an internal alternate to sputn(), which tries the
> buffer prior to calling xsputn().
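>
> The core check looks roughly like this (an illustrative sketch with an
> invented name, written as a derived class only because the put-area
> pointers are protected; the prototype patch adds the member to
> basic_streambuf itself):
>
> ---------
> #include <cstring>
> #include <streambuf>
>
> struct demo_buf : std::streambuf {
>     // Use the put area directly when the data fits, and only fall back
>     // to the virtual xsputn() when it does not.
>     std::streamsize sputn_try_buffer_first(const char* s, std::streamsize n) {
>         if (0 < n && n <= epptr() - pptr()) {      // fits: no virtual call
>             std::memcpy(pptr(), s, static_cast<std::size_t>(n));
>             pbump(static_cast<int>(n));
>             return n;
>         }
>         return xsputn(s, n);                       // doesn't fit: virtual call
>     }
> };
> ---------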

This won't work if a user provides an explicit specialization of
basic_streambuf<char, MyTraits>. std::basic_ostream<char, MyTraits>
will still try to call your new function, but it won't be present in
the user's specialization, so it will fail to compile. The basic_ostream
primary template can only use the standard API of basic_streambuf. The
std::basic_ostream<char> specialization can use non-standard members
of std::basic_streambuf<char> because we know users can't specialize
that.
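
For example, a user can legitimately write something like this (a sketch
only, reusing the MyTraits name from above), and the specialization need
only provide the standard interface:

#include <iosfwd>
#include <string>   // complete definition of std::char_traits<char>

// Program-defined traits type; its presence is what makes the explicit
// specialization below legal for a user program to provide.
struct MyTraits : std::char_traits<char> { };

namespace std {
  template <>
  class basic_streambuf<char, ::MyTraits> {
    // ...the standard basic_streambuf members only; no libstdc++-internal
    // helper, so basic_ostream<char, ::MyTraits> cannot assume one...
  };
}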

