[win32] Fix suspend count handling

Lerele lerele@champenstudios.com
Sat Nov 24 14:18:00 GMT 2007


Christopher Faylor escribió:
> On Fri, Nov 23, 2007 at 11:19:14AM +0100, Lerele wrote:
>   
>> Christopher Faylor escribi?:
>>     
>>> On Wed, Nov 21, 2007 at 11:18:38PM +0000, Pedro Alves wrote:
>>>   
>>>       
>>>> Pedro Alves wrote:
>>>>     
>>>>         
>>>>> Pedro Alves wrote:
>>>>>       
>>>>>           
>>>>>> On Nov 21, 2007 2:13 PM, Pierre Muller wrote:
>>>>>> That's not what I see here.  Can you show me a run where you get 4
>>>>>> only this patch applied?
>>>>>>
>>>>>>         
>>>>>>             
>>>>> I did try that, but posted a log of not doing it :-).
>>>>> I've just tried about 30 times, and only once I did see
>>>>> a 4 coming out ...  oh, well, one of those things.
>>>>>       
>>>>>           
>>>> OK.  Back at my home laptop, I can reproduce that with
>>>> no problems.  Let me clarify what the 4 problem really is.
>>>> It's a race between gdb and the inferior.
>>>>
>>>> Take this slightly changed test case.  The only difference
>>>> to the original version is the extra Sleep call.
>>>>
>>>> #include <windows.h>
>>>>
>>>> HANDLE started;
>>>> HANDLE stop;
>>>>
>>>> DWORD WINAPI
>>>> thread_start (void *arg)
>>>> {
>>>>     SetEvent (started);
>>>>     WaitForSingleObject (stop, INFINITE);
>>>>     return 0;
>>>> }
>>>>
>>>> int
>>>> main (int argc, char **argv)
>>>> {
>>>>     int i;
>>>>     DWORD suspend_count;
>>>>     started = CreateEvent (NULL, TRUE, FALSE, NULL);
>>>>     stop = CreateEvent (NULL, TRUE, FALSE, NULL);
>>>>
>>>>     HANDLE h = CreateThread (NULL, 0, thread_start, NULL,
>>>> 			   0, NULL);
>>>>
>>>>     WaitForSingleObject (started, INFINITE);
>>>>
>>>>     for (i = 0; i < 3; i++)
>>>>       if (SuspendThread (h) == (DWORD) -1)
>>>>         {
>>>> 	printf ("SuspendThreadFailed\n");
>>>> 	return 1;
>>>>         }
>>>>
>>>>     Sleep (300);
>>>>
>>>>     suspend_count = ResumeThread (h); /* set breakpoint here */
>>>>
>>>>     printf ("%lu\n", suspend_count); /* should be 3 */
>>>>
>>>>     while ((suspend_count = ResumeThread (h)) != 0
>>>> 	 && suspend_count != -1)
>>>>       ;
>>>>     SetEvent (stop);
>>>>     WaitForSingleObject (h, INFINITE);
>>>>     CloseHandle (h);
>>>>     CloseHandle (started);
>>>>     CloseHandle (stop);
>>>>     return 0;
>>>> }
>>>>
>>>> If you do the "break at ...", "run", "thread 3", "continue"
>>>> sequence, and "..." is the "Sleep" line, you'll get 3,
>>>> but if you put the break at the /* set breakpoint here */
>>>> line, you'll get 4 (if you're (un)lucky).
>>>>
>>>> The race happens due to the fact that gdb is
>>>> doing something similar to this:
>>>>
>>>> win32_continue()
>>>> {
>>>>    ContinueDebugEvent (...);  /* Resumes all non suspended
>>>>                                  threads of the process.  */
>>>>
>>>>    /* At this point, gdb is running concurrently with
>>>>       the inferior threads that were not suspended - which
>>>>       included the main thread of the testcase.  */
>>>>    foreach t in threads do
>>>>       if t is suspended
>>>>          ResumeThread t
>>>>       fi
>>>>    done
>>>> }
>>>>
>>>> If you break at the Sleep call, when we resume, gdb will
>>>> have a bit of time to call ResumeThread on the suspended
>>>> thread of the testcase.  If you instead break at the
>>>> ResumeThread line, you'll have a good chance that the
>>>> inferior wins the race, hence the "4" result (remember
>>>> that ResumeThread returns the previous suspend count).
>>>> If we put something like this after the ResumeThread call:
>>>>
>>>>     (...)
>>>>     suspend_count = ResumeThread (h); /* set breakpoint here */
>>>>
>>>> +  Sleep (300);
>>>> +  SuspendThread (h);
>>>> +  suspend_count = ResumeThread (h);
>>>>
>>>>     printf ("%lu\n", suspend_count); /* should be 3 */
>>>>     (...)
>>>>
>>>> ... you'll see that eventually gdb will bring the
>>>> suspend count back to 3.  (A SuspendThread, ResumeThread
>>>> pair is the way to get at the suspend count.)
>>>>
>>>>     
>>>>         
>>>>> Since the watchpoint patch should fix this, what shall I do?  Shall I 
>>>>> merge
>>>>> the two and resubmit, or leave it at that ?  They've already been tested
>>>>> together without regressions.
>>>>>       
>>>>>           
>>>> Here is the merge from the patch I posted at the start of the
>>>> thread with this patch:
>>>>    [win32] Fix watchpoint support
>>>>    http://sourceware.org/ml/gdb-patches/2007-11/msg00390.html
>>>>
>>>> This patch fixes both the suspend_count
>>>> handling, and the watchpoint support.
>>>>
>>>> Thanks Pierre, for looking at it.
>>>>
>>>> OK ?
>>>>
>>>> -- 
>>>> Pedro Alves
>>>>     
>>>>         
>>>   
>>>       
>>>> 2007-11-21  Pedro Alves  <pedro_alves@portugalmail.pt>
>>>>
>>>> 	* win32-nat.c (thread_info_struct): Rename suspend_count to
>>>> 	suspended, to be used as a flag.
>>>> 	(thread_rec): Only suspend the thread if it wasn't suspended by
>>>> 	gdb before.  Warn if suspending failed.
>>>> 	(win32_add_thread): Set Dr6 to 0xffff0ff0.
>>>> 	(win32_resume): Likewise.
>>>> 	(win32_continue): Set Dr6 to 0xffff0ff0.  Update usage of the
>>>> 	`suspended' flag.  Do ContinueDebugEvent after resuming the
>>>> 	suspended threads, not before.  Set threads' contexts before
>>>> 	resuming them, not after.
>>>>     
>>>>         
>>>   
>>>       
>>>> ---
>>>> gdb/win32-nat.c |   80 
>>>> +++++++++++++++++++++++++++++++-------------------------
>>>> 1 file changed, 45 insertions(+), 35 deletions(-)
>>>>
>>>> Index: src/gdb/win32-nat.c
>>>> ===================================================================
>>>> --- src.orig/gdb/win32-nat.c	2007-11-11 23:13:04.000000000 +0000
>>>> +++ src/gdb/win32-nat.c	2007-11-21 22:39:56.000000000 +0000
>>>> @@ -112,14 +112,14 @@ static enum target_signal last_sig = TAR
>>>> /* Set if a signal was received from the debugged process */
>>>>
>>>> /* Thread information structure used to track information that is
>>>> -   not available in gdb's thread structure. */
>>>> +   not available in gdb's thread structure.  */
>>>> typedef struct thread_info_struct
>>>>   {
>>>>     struct thread_info_struct *next;
>>>>     DWORD id;
>>>>     HANDLE h;
>>>>     char *name;
>>>> -    int suspend_count;
>>>> +    int suspended;
>>>>     int reload_context;
>>>>     CONTEXT context;
>>>>     STACKFRAME sf;
>>>> @@ -244,9 +244,10 @@ check (BOOL ok, const char *file, int li
>>>> 		     GetLastError ());
>>>> }
>>>>
>>>> -/* Find a thread record given a thread id.
>>>> -   If get_context then also retrieve the context for this
>>>> -   thread. */
>>>> +/* Find a thread record given a thread id passed in ID.  If
>>>> +   GET_CONTEXT is not 0, then also retrieve the context for this
>>>> +   thread.  If GET_CONTEXT is negative, then don't suspend the
>>>> +   thread.  */
>>>>     
>>>>         
>>> I don't see any reason to capitalize get_context in the comment.
>>>
>>>   
>>>       
>>>> static thread_info *
>>>> thread_rec (DWORD id, int get_context)
>>>> {
>>>> @@ -255,12 +256,21 @@ thread_rec (DWORD id, int get_context)
>>>>   for (th = &thread_head; (th = th->next) != NULL;)
>>>>     if (th->id == id)
>>>>       {
>>>> -	if (!th->suspend_count && get_context)
>>>> +	if (!th->suspended && get_context)
>>>> 	  {
>>>> 	    if (get_context > 0 && id != current_event.dwThreadId)
>>>> -	      th->suspend_count = SuspendThread (th->h) + 1;
>>>> +	      {
>>>> +		if (SuspendThread (th->h) == (DWORD) -1)
>>>> +		  {
>>>> +		    DWORD err = GetLastError ();
>>>> +		    warning (_("SuspendThread failed. (winerr %d)"),
>>>> +			     (int) err);
>>>> +		    return NULL;
>>>> +		  }
>>>> +		th->suspended = 1;
>>>> +	      }
>>>> 	    else if (get_context < 0)
>>>> -	      th->suspend_count = -1;
>>>> +	      th->suspended = -1;
>>>> 	    th->reload_context = 1;
>>>> 	  }
>>>> 	return th;
>>>> @@ -294,8 +304,7 @@ win32_add_thread (DWORD id, HANDLE h)
>>>>       th->context.Dr1 = dr[1];
>>>>       th->context.Dr2 = dr[2];
>>>>       th->context.Dr3 = dr[3];
>>>> -      /* th->context.Dr6 = dr[6];
>>>> -      FIXME: should we set dr6 also ?? */
>>>> +      th->context.Dr6 = 0xffff0ff0;
>>>>     
>>>>         
>>> This, and similar cases, needs to use a #define with an explanatory 
>>> comment.
>>>
>>> With the above minor changes, this looks ok.
>>>
>>> I have to ask, however, if the SuspendThread's are even needed at all.
>>> When I was writing this code, I wasn't entirely sure that gdb needed to
>>> do anything like this but I erred on the side of caution.  So, I'm
>>> wondering if things would still work ok if the
>>> SuspendThread/ResumeThread stuff was gone.
>>>       
>> I think they are needed.  They were anyway with the new gdbserver based
>> (vs.  Win32 API based) interrupt code I sent several days ago, and that
>> so very kindly Pedro prepared for commitment, but that I still haven't
>> found the time to sit down and look at them (however I'm absolutely
>> sure they're just fine), I guess his changes must be similar to what I
>> sent in the first place.
>>     
>
> *Why* did you think you needed them in gdbserver?  Did you actually try
> not using them or did you just copy my code from win32-nat.c?
>
>   

I thought it natural to implement win32 functionality into gdbserver not 
starting from scratch, but from an already working implementation, such 
as win32-nat.c, however *truly* one drawback of this is that original 
code bugs got inherited into new implementation (such as the subject in 
question), and the porting process however was not really as 
straightforward as copy/paste; certain amount of work was involved (a 
lot of debugging gdbserver and gdb themselves), however I did not stop 
to think that SuspendThread was not actually needed, as it was already 
there.

I like your previous statement "I wasn't entirely sure that gdb needed 
to do anything like this but I erred on the side of caution.", except 
for the "erred" part.
I think its not bad to, in order to access thread data, lower memory, 
etc., first make sure it is actually paused, for whatever operation you 
want to do with it, as to not rely on a debug exception being received, 
or gdbserver itself stopping the low.  Also you may even need this 
functionality for any future new requirement.

If it works without problems and it's safer and more flexible, I'd just 
leave it. :-)



>> Apart from this, there's also the case where (at least for gdbserver) 
>> socket data is received asynchronously while the child is running. This 
>> socket data could indicate gdbserver to set/enable/disable a breakpoint, 
>> read thread registers, etc., and this kind of things may require to stop 
>> the child using SuspendThread.
>> Right?
>>     
>
> I don't know exactly what you are referring to but if you are proposing
> using SuspendThread/ResumeThread to work around non UNIX-like handling
> of signals, I've previously mentioned that the use of SuspendThread is
> dangerous since, the last I checked, Windows wasn't very good about not
> avoiding reentracy problems if you suspend a thread and then use another
> funcion from the Windows API.  This used to cause strange random
> behavior and even blue screens in older versions of Windows NT.  I guess
> it's possible that it's better these days but there is no way to really
> know for sure.
>
> cgf
>
>   


So then that's the reason.
Yes, I was referring to what you understood.

I do use win32 gdbserver quite often, and I have not encountered *any* 
kind of problem yet, however I do have to say I do use XP almost 
exclusively to run gdbserver.
If there is a problem with some other version of Windows, maybe you're 
right and SuspendThread should be removed after all.
It is a dilemma.

However, I guess you may have stumbled upon the fact that say for 
example msvcmon inferior *pausing* does actually work quite well, no 
matter what Windows version it's running on.  It's not that I want to 
start a discussion for mentioning that specific debugger nor writing ms 
here, but my opinion is that I think people would expect that kind of 
working from gdb/gdbserver on windows.  I would anyway.  This paragraph 
makes me ask, maybe the problems you talk about were due to some other 
reason, and not just about using SuspendThread only? Or are you 100% 
sure it's because of that specific function?

Leo.




More information about the Gdb-patches mailing list