This is the mail archive of the
mailing list for the Cygwin project.
Re: gawk: Bad File Descriptor error with concurrent readonly access to a network file
- From: Vermessung AVT - Wolfgang Rieger <w dot rieger at avt dot at>
- To: "cygwin at cygwin dot com" <cygwin at cygwin dot com>
- Date: Thu, 22 Oct 2015 11:37:20 +0000
- Subject: Re: gawk: Bad File Descriptor error with concurrent readonly access to a network file
- Authentication-results: sourceware.org; auth=none
- Reply-to: "cygwin at cygwin dot com" <cygwin at cygwin dot com>
Marco suggested I wait until Corinna is back to look at this issue. Corinna, did you ever get a chance to look into it? For your information, my first mail in this thread contains a description and a test case.
From: Vermessung AVT - Wolfgang Rieger
Sent: Sat, 26 September 2015 12:01
Subject: Re: gawk: Bad File Descriptor error with concurrent readonly access to a network file
On Fri, 25 Sep 2015 18:58:57 +0200, Marco Atzeri wrote:
>> "Bad file descriptor" just arose recently in another problem
I don't think this applies to our case. We use massively parallel processing, and the problem is related to that, as the test case shows in our environment. In single-threaded operation we don't have any problems at all. I don't use fork or any of the other tools mentioned. We don't have Chrome, Comodo, or anything similar installed. We run an encapsulated environment without even anti-virus software on the power workstations, and with as little else installed as possible, because computing speed is our main concern.
>> Have you by chance some potential suspect like usual ones
I did not find anything there that seems related to our problem.
>> On your cygcheck output I notice nothing strange.
I do not think there is anything strange. I have been using Cygwin for 15+ years now. We started parallelizing our jobs some 12 years ago. Of course, the hardware then was not comparable to what we have today. But the Bad File Descriptor issues only started some 3 or 4 years ago with an update of Cygwin (I really don't remember when; there must have been some major change in the Cygwin DLL: e.g., since then the type-ahead buffer of cmd.exe is no longer usable when Cygwin programs run in the shell). Since these errors were fairly rare (say, 1 in >1000 tiles), we did not dig deeper into it. However, it is an ongoing issue.
With rising workload on the file server and new workstations with more cores (allowing for more parallel processes), it has become more frequent over the last few years. A server upgrade last winter reduced the problem, but with the recent massive increase in workload it has risen again.
>> Can you provide the type of network disk with
>> /usr/lib/csih/getVolInfo <volumename>
I am sorry, I have a very small installation of Cygwin running with no getVolInfo. In which package can I find that? We have MS Windows Server 2008 that provides network shares.
Again I want to stress: running the jobs single-threaded, we have never experienced any such problems at all. We only get these errors with several jobs running in parallel (the same batch job is started independently in several cmd shell windows). The cause is obviously that, by chance, two processes try to access the same file at the same time, which does not happen often, but it does happen. I assume access to local files is better synchronized by the CPU, whereas such conflicts may arise at the server.
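For illustration only, the access pattern described above can be sketched as follows. This is not the test case from the first mail in the thread, and it does not use gawk: the names `run_parallel_readers`, `failures`, and `READER` are hypothetical, and Python serves merely as a neutral way to start several independent processes that open the same file read-only at the same time.

```python
import subprocess
import sys

# Hypothetical one-line reader: opens the given file read-only and reads it fully.
READER = "import sys; open(sys.argv[1], 'rb').read()"


def run_parallel_readers(path, count=8):
    """Start `count` independent processes that each read `path` read-only.

    Returns a list of (exit_code, stderr_text) pairs, one per process.
    """
    # Start all processes first so that their reads overlap in time.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", READER, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
        )
        for _ in range(count)
    ]
    results = []
    for proc in procs:
        _, err = proc.communicate()  # wait for exit and collect stderr
        results.append((proc.returncode, err))
    return results


def failures(results):
    """Keep only the processes that exited with a non-zero status."""
    return [(code, err) for code, err in results if code != 0]
```

Calling `failures(run_parallel_readers(path))` with the path of a file on the network share would list any reader that failed, together with its error text. Whether this reproduces the "Bad file descriptor" error on a given share is an open assumption; on a local disk it is expected to report no failures.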
The major question is, what is the underlying access problem within Cygwin? As mentioned, the MS programs (e. g. copy) never show a similar problem.
Thanks for your help,
Problem reports: http://cygwin.com/problems.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple