ostream::operator<<() and sputn()

Lewis Hyatt lhyatt@gmail.com
Wed Jul 14 21:26:09 GMT 2021


Hello-

I noticed that libstdc++'s implementation of ostream::operator<<() prefers
to call sputn() on the underlying streambuf for all char, char*, and string
output operations, including single characters, rather than manipulate the
buffer directly. I am curious why it works this way, it feels perhaps
suboptimal to me because sputn() is mandated to call the virtual function
xsputn() on every call, while e.g. sputc() simply manipulates the buffer and
only needs a virtual call when the buffer is full. I always thought that the
buffer abstraction and the resulting avoidance of virtual calls for the
majority of operations was the main point of streambuf's design, and that
sputn() was meant for cases when the output would be large enough to
overflow the buffer anyway, if it may be possible to skip the buffer and
flush directly instead?

It seems to me that for most typical use cases, xsputn() is still going to
want to use the buffer if the output fits into it; libstdc++ does this in
basic_filebuf, for example. So then it would seem to be beneficial to try
the buffer prior to making the virtual function call, instead of after --
especially because the typical char instantiation of __ostream_insert that
makes this call for operator<<() is hidden inside the .so, and is not
inlined or eligible for devirtualization optimizations.

FWIW, here is a small test case.

---------
#include <ostream>
#include <iostream>
#include <fstream>
#include <sstream>
#include <chrono>
#include <random>
using namespace std;

int main() {
    constexpr size_t N = 500000000;
    string s(N, 'x');

    ofstream of{"/dev/null"};
    ostringstream os;
    ostream* streams[] = {&of, &os};
    mt19937 rng{random_device{}()};

    const auto timed_run = [&](const char* label, auto&& callback) {
        const auto t1 = chrono::steady_clock::now();
        for(char c: s) callback(*streams[rng() % 2], c);
        const auto t2 = chrono::steady_clock::now();
        cout << label << " took: "
             << chrono::duration<double>(t2-t1).count()
             << " seconds" << endl;
    };

    timed_run("insert with put()", [](ostream& o, char c) {o.put(c);});
    timed_run("insert with op<< ", [](ostream& o, char c) {o << c;});
}
---------

This is what I get with the current trunk:
---------
insert with put() took: 6.12152 seconds
insert with op<<  took: 13.4437 seconds
---------

And this is what I get with the attached patch:
---------
insert with put() took: 6.08313 seconds
insert with op<<  took: 8.24565 seconds
---------

So the overhead of calling operator<< vs calling put() was reduced by more
than 3X.

The prototype patch calls an internal alternate to sputn(), which tries the
buffer prior to calling xsputn(). All the tests still pass with this patch
in place, but I think it would not be suitable as-is, because it affects the
overload for std::string_view, and I think the standard may actually mandate
that xsputn() be called always for this case. (It says it should be output
as-if sputn() were called, which I think makes xsputn() should be an
observable side-effect.) Still it would work fine outside the string case
AFAIK, and I thought it was maybe an interesting result worth sharing, and
am curious what others think here, or if there is a reason I am missing why
it's better the current way. Thanks!

-Lewis

-------------- next part --------------
diff --git a/libstdc++-v3/include/bits/ostream.tcc b/libstdc++-v3/include/bits/ostream.tcc
index 06b2217d216..614b347736b 100644
--- a/libstdc++-v3/include/bits/ostream.tcc
+++ b/libstdc++-v3/include/bits/ostream.tcc
@@ -195,7 +195,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  ios_base::iostate __err = ios_base::goodbit;
 	  __try
 	    {
-	      if (this->rdbuf()->sputn(__s, __n) != __n)
+	      if (this->rdbuf()->__buffered_sputn(__s, __n) != __n)
 		__err = ios_base::badbit;
 	    }
 	  __catch(__cxxabiv1::__forced_unwind&)
diff --git a/libstdc++-v3/include/bits/ostream_insert.h b/libstdc++-v3/include/bits/ostream_insert.h
index 72eea8b1d05..650ea1afd9e 100644
--- a/libstdc++-v3/include/bits/ostream_insert.h
+++ b/libstdc++-v3/include/bits/ostream_insert.h
@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       typedef basic_ostream<_CharT, _Traits>       __ostream_type;      
       typedef typename __ostream_type::ios_base    __ios_base;
 
-      const streamsize __put = __out.rdbuf()->sputn(__s, __n);
+      const streamsize __put = __out.rdbuf()->__buffered_sputn(__s, __n);
       if (__put != __n)
 	__out.setstate(__ios_base::badbit);
     }
diff --git a/libstdc++-v3/include/bits/streambuf.tcc b/libstdc++-v3/include/bits/streambuf.tcc
index 22464c4401c..7e803e4fc08 100644
--- a/libstdc++-v3/include/bits/streambuf.tcc
+++ b/libstdc++-v3/include/bits/streambuf.tcc
@@ -108,6 +108,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __ret;
     }
 
+  template<typename _CharT, typename _Traits>
+    streamsize
+    basic_streambuf<_CharT, _Traits>::
+    __buffered_sputn(const char_type* __s, streamsize __n)
+  {
+    const streamsize __buf_len = this->epptr() - this->pptr();
+    if (__buf_len >= __n)
+      {
+	traits_type::copy(this->pptr(), __s, __n);
+	this->__safe_pbump(__n);
+	return __n;
+      }
+    else
+      return this->xsputn(__s, __n);
+  }
+
   // Conceivably, this could be used to implement buffer-to-buffer
   // copies, if this was ever desired in an un-ambiguous way by the
   // standard.
diff --git a/libstdc++-v3/include/std/streambuf b/libstdc++-v3/include/std/streambuf
index e213104c4f4..a1a2277adc7 100644
--- a/libstdc++-v3/include/std/streambuf
+++ b/libstdc++-v3/include/std/streambuf
@@ -455,6 +455,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       sputn(const char_type* __s, streamsize __n)
       { return this->xsputn(__s, __n); }
 
+      /**
+       *  @brief  Entry point for all single-character output functions.
+       *  @param  __s  A buffer read area.
+       *  @param  __n  A count.
+       *
+       *  Internal output function.
+       *
+       *
+       *  Has the same effect as sputn(__s,__n), but attempts to use the
+       *  existing buffer space rather than make a virtual call, if possible.
+      */
+      streamsize
+      __buffered_sputn(const char_type* __s, streamsize __n);
+
     protected:
       /**
        *  @brief  Base constructor.


More information about the Libstdc++ mailing list