Sv: Sv: Sv: Sv: Sv: Sv: Named pipes and multiple writers

Ken Brown kbrown@cornell.edu
Wed Apr 1 16:15:28 GMT 2020


On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>>> The ENXIO occurs when parallel child processes simultaneously
>>>>>>>>>>>>> open the descriptor using O_NONBLOCK.
>>>>>>>>>>>>
>>>>>>>>>>>> This is consistent with my guess that the error is generated
>>>>>>>>>>>> by fhandler_fifo::wait.  I have a feeling that read_ready
>>>>>>>>>>>> should have been created as a manual-reset event, and that
>>>>>>>>>>>> more care is needed to make sure it's set when it should be.
>>>>>>>>>>>>
>>>>>>>>>>>>> I could provide a code-snippet to reproduce it if wanted ?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, please!
>>>>>>>>>>>
>>>>>>>>>>> That might not be necessary.  If you're able to build the git
>>>>>>>>>>> repo master branch, please try the attached patch.
>>>>>>>>>
>>>>>>>>>> Here's a better patch.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I finally succeeded in building the latest master (make is not my
>>>>>>>>> favourite tool) and applied the patch, but still no success with my
>>>>>>>>> little test program (see attachment) when creating a write file
>>>>>>>>> descriptor with O_NONBLOCK
>>>>>>>
>>>>>>>> Your test program fails for me on Linux too.  Here's the output
>>>>>>>> from one run:
>>>>>>>
>>>>>>> You're right. That was extremely careless of me to not test this
>>>>>>> in Linux first :-)
>>>>>>
>>>>>> No problem.
>>>>>>
>>>>>>> I can assure you that we have a use case that works on Linux but
>>>>>>> not in Cygwin, but it seems I narrowed it down in the wrong way
>>>>>>>
>>>>>>> I'll try to rearrange my code (that works in Linux) to mimic our
>>>>>>> application but in a simple way (I'll be back)
>>>>>>
>>>>>> OK, I'll be waiting for you.  BTW, if it's not too hard to write
>>>>>> your test case in plain C, or at least less modern C++, that would
>>>>>> simplify things for me.  For example, your pipe.cpp failed to
>>>>>> compile on one Linux machine I wanted to test it on, presumably
>>>>>> because that machine had an older C++ compiler.
>>>>>
>>>>> Never mind.  I was able to reproduce the problem and find the cause.
>>>>> What happens is that when the first subprocess exits,
>>>>> fhandler_fifo::close resets read_ready.  That causes the second and
>>>>> subsequent subprocesses to think that there's no reader open, so
>>>>> their attempts to open a writer with O_NONBLOCK fail with ENXIO.
>>>>>
>>>>> I should be able to fix this tomorrow.
>>>
>>>> I've pushed what I think is a fix to the topic/fifo branch.  I tested
>>>> it with the attached program, which is a variant of the test case you
>>>> sent last week.  Please test it in your use case.
>>>
>>>> Note: If you've previously pulled the topic/fifo branch, then you
>>>> will probably get a lot of conflicts when you pull again, because I
>>>> did a forced push a few days ago.  If that happens, just do
>>>
>>>>     git reset --hard origin/topic/fifo
>>>
>>>> It turned out that the fix required some of the ideas that I've been
>>>> working on in connection with allowing multiple readers.  Even though
>>>> the code allows a FIFO to be *explicitly* opened for reading only
>>>> once, there can still be several open file descriptors for readers
>>>> because of dup and fork.  The existing code on git master doesn't
>>>> handle those situations properly.
>>>
>>>> The code on topic/fifo doesn't completely fix that yet, but I think
>>>> it should work under the following assumptions:
>>>
>>>> 1. The FIFO is opened only once for reading.
>>>
>>>> 2. The file descriptor obtained from this is the only one on which a
>>>>    read is attempted.
>>>
>>>> I'm working on removing both of these restrictions.
>>>
>>>> Ken
>>>
>>> We finally took the time to make a simplified "hack" that works on
>>> Ubuntu and BSD/OSX, but with the latest newlib-cygwin master it gave
>>> "ENXIO" now and then.  With your previous patch applied there was no
>>> ENXIO, but ::read returns EAGAIN (until exhausted) on Cygwin almost
>>> every run.
>>>
>>> I will try your newest things tomorrow
>>>
>>> See the latest attached test program (it is starting to get bloated,
>>> but this time it is more C-compatible :-)
>>
>> Thanks.  This runs fine with the current HEAD of topic/fifo.
> 
> I wrote in a previous mail in this topic that it seemed to work fine for me
> as well, but when I bumped up the numbers of writers and/or the number of
> messages (e.g. 25/25) it starts to fail again
> 
> The initial thought is that we're bumping into some kind of system resource
> limit, but I haven't had the time to dig into details (yet) (I'm sorry for
> that)

Yes, it is a resource issue.  There is a limit on the number of writers that can 
be open at one time, currently 64.  I chose that number arbitrarily, with no 
idea what might actually be needed in practice, and it can easily be changed.

In addition, a writer isn't recognized as closed until a reader tries to read 
and gets an error.  In your example with 25/25, the list of writers quickly gets 
to 64 before the parent ever tries to read.
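
The scenario discussed in this thread can be sketched as follows.  This is a
minimal illustration in Python, not the attached test program: the names
errno_without_reader, run_writers and RECORD are hypothetical, and it relies
only on the POSIX rule that opening a FIFO write-only with O_NONBLOCK fails
with ENXIO when no reader has it open.  The parent opens the single reader,
forks the writers, and reads only after all of them have exited, which mirrors
the case where the writers pile up before the parent ever tries to read.

```python
import errno
import os
import tempfile

RECORD = b"x" * 16  # hypothetical fixed-size message, far below PIPE_BUF


def errno_without_reader():
    """Open a writer with O_NONBLOCK on a FIFO that has no reader."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)
        try:
            os.close(os.open(path, os.O_WRONLY | os.O_NONBLOCK))
        except OSError as exc:
            return exc.errno  # POSIX specifies ENXIO for this case
        return 0


def run_writers(writers, messages):
    """Return (records received, number of writers that failed with ENXIO)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)
        # A nonblocking read-only open returns at once, even with no writer.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        pids = []
        for _ in range(writers):
            pid = os.fork()
            if pid == 0:
                # Child: inherits rfd across fork, but only writes.
                status = 2  # unexpected failure unless overwritten below
                try:
                    wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
                    for _ in range(messages):
                        os.write(wfd, RECORD)
                    os.close(wfd)
                    status = 0
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        status = 1
                finally:
                    os._exit(status)
            pids.append(pid)
        # Reap every writer before the first read is attempted.
        enxio = 0
        for pid in pids:
            _, wstatus = os.waitpid(pid, 0)
            if os.WEXITSTATUS(wstatus) == 1:
                enxio += 1
        data = b""
        while True:
            chunk = os.read(rfd, 4096)
            if not chunk:  # no writers left and buffer drained: end of file
                break
            data += chunk
        os.close(rfd)
        return len(data) // len(RECORD), enxio


if __name__ == "__main__":
    print("errno without reader:", errno.errorcode.get(errno_without_reader()))
    print("received, ENXIO failures:", run_writers(25, 25))
```

With 25 writers and 25 messages each, the writers produce 10000 bytes in
total, which is assumed to fit in the pipe buffer (64 KiB by default on
Linux).  Larger counts would need the parent to drain the FIFO while the
writers run; otherwise the nonblocking writes fail with EAGAIN.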

I'll see if I can find a better way to manage this.

Ken

