Re: Named pipes and multiple writers
Ken Brown
kbrown@cornell.edu
Wed Apr 1 18:34:12 GMT 2020
On 4/1/2020 1:14 PM, sten.kristian.ivarsson@gmail.com wrote:
>> On 4/1/2020 4:52 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>> On 3/31/2020 5:10 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivarsson@gmail.com wrote:
>>>>>>>>>>>>>>> The ENXIO occurs when parallel child processes
>>>>>>>>>>>>>>> simultaneously open the descriptor using O_NONBLOCK.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is consistent with my guess that the error is
>>>>>>>>>>>>>> generated by fhandler_fifo::wait. I have a feeling that
>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
>>>>>>>>>>>>>> event, and that more care is needed to make sure it's set
>>>>>>>>>>>>>> when it should be.
>
> [snip]
>
>>>>>>> Never mind. I was able to reproduce the problem and find the cause.
>>>>>>> What happens is that when the first subprocess exits,
>>>>>>> fhandler_fifo::close resets read_ready. That causes the second
>>>>>>> and subsequent subprocesses to think that there's no reader open,
>>>>>>> so their attempts to open a writer with O_NONBLOCK fail with ENXIO.
>
> [snip]
>
>>> I wrote in a previous mail in this thread that it seemed to work fine
>>> for me as well, but when I bumped up the number of writers and/or the
>>> number of messages (e.g. 25/25) it started to fail again.
>
> [snip]
>
>> Yes, it is a resource issue. There is a limit on the number of writers
>> that can be open at one time, currently 64. I chose that number
>> arbitrarily, with no idea what might actually be needed in practice,
>> and it can easily be changed.
>
> Does there have to be a limit at all? We would rather see the application
> decide how many resources it would like to use. In our particular case there
> will be a process manager with an incoming pipe that possibly several
> thousand processes will write to.
I agree.
> Just for fiddling around (to figure out whether this is the limit that makes
> other things work a bit oddly), where is this limit of 64 defined now?
It's MAX_CLIENTS, defined in fhandler.h. But there seem to be other resource
issues also; simply increasing MAX_CLIENTS doesn't solve the problem. I think
there are also problems with the number of threads, for example. Each time your
program forks, the subprocess inherits the rfd file descriptor and its
"fifo_reader_thread" starts up. This is unnecessary for your application, so I
tried disabling it (in fhandler_fifo::fixup_after_fork), just as an experiment.
But then I ran into some deadlocks, suggesting that one of the locks I'm using
isn't robust enough. So I've got a lot of things to work on.
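For anyone who wants to experiment with the scenario described above, here is a minimal sketch of it. It is not the original test program from this thread; the function name run_writers, the message size, and the FIFO path passed in are all assumptions made for illustration. The parent opens the read end, forks the writers (each child inherits rfd, as noted above), and each child opens the FIFO with O_WRONLY | O_NONBLOCK and writes one message. On a system that follows POSIX here, every open should succeed because a reader is already open.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 8  /* fixed-size messages, far below PIPE_BUF, so writes are atomic */

/* Hypothetical helper (not from the original test case).
   Returns the number of messages the parent read, or -1 on setup failure. */
int run_writers(const char *path, int nwriters)
{
    unlink(path);                          /* ignore failure: path may not exist */
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* The parent holds the read end open for the whole run. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) {
        unlink(path);
        return -1;
    }

    for (int i = 0; i < nwriters; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                         /* stop forking, still reap below */
        if (pid == 0) {
            /* Child: rfd is inherited here; only the write end is used. */
            int wfd = open(path, O_WRONLY | O_NONBLOCK);
            if (wfd < 0) {
                perror("open writer");     /* ENXIO would show up here */
                _exit(1);
            }
            char msg[MSG_LEN] = {0};
            snprintf(msg, sizeof msg, "w%d", i);
            ssize_t n = write(wfd, msg, sizeof msg);
            close(wfd);
            _exit(n == MSG_LEN ? 0 : 1);
        }
    }

    while (wait(NULL) > 0)                 /* reap every child before reading */
        ;

    /* Drain the FIFO; read returns 0 once it is empty and no writer is open. */
    int count = 0;
    char buf[MSG_LEN];
    while (read(rfd, buf, sizeof buf) == MSG_LEN)
        count++;

    close(rfd);
    unlink(path);
    return count;
}
```

Note that the parent does not read until all children have exited, which matches the situation where the list of writers fills up before the reader ever runs. Calling it with 25 writers, e.g. run_writers("/tmp/fifo_demo", 25), gets close to the 25/25 case mentioned earlier in the thread.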
>> In addition, a writer isn't recognized as closed until a reader tries to
>> read and gets an error. In your example with 25/25, the list of writers
>> quickly gets to 64 before the parent ever tries to read.
>
> That explains the behaviour, but shouldn't some error be returned from
> open/write (maybe there is one but I'm missing it)?
The error is discovered in add_client_handler, called from thread_func. I think
you'll only see it if you run the program under strace. I'll see if I can find
a way to report it. Currently, there's a retry loop in fhandler_fifo::open when
a writer tries to open, and I think I need to limit the number of retries and
then error out.
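To illustrate the general idea of "limit the number of retries and then error out", here is a user-level sketch. It is not the code in fhandler_fifo::open and says nothing about how the fix will actually look inside Cygwin; the names open_writer_bounded and example, the retry count, and the delay are assumptions chosen for the illustration. The helper retries a non-blocking open only while it fails with ENXIO, and gives up with ENXIO once the budget is used.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for writing without blocking.
   Retries at most max_tries times while open() fails with ENXIO
   (no reader), sleeping delay_ms between attempts.
   Returns the fd, or -1 with errno set. */
int open_writer_bounded(const char *path, int max_tries, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < max_tries; i++) {
        int fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd >= 0)
            return fd;
        if (errno != ENXIO)
            return -1;                 /* some other error: report it as is */
        nanosleep(&delay, NULL);
    }
    errno = ENXIO;                     /* retry budget exhausted */
    return -1;
}

/* Usage sketch: returns 0 if both attempts behave as expected. */
int example(const char *path)
{
    unlink(path);                      /* ignore failure: path may not exist */
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* No reader yet: the helper should give up with ENXIO. */
    int wfd = open_writer_bounded(path, 3, 10);
    int no_reader_ok = (wfd < 0 && errno == ENXIO);
    if (wfd >= 0)
        close(wfd);

    /* With a reader open, the first attempt should succeed. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    wfd = open_writer_bounded(path, 3, 10);
    int with_reader_ok = (rfd >= 0 && wfd >= 0);

    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    unlink(path);
    return (no_reader_ok && with_reader_ok) ? 0 : 1;
}
```

The point of the bound is that the caller gets a definite error instead of looping indefinitely when no reader can accept the connection.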
Ken