This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Making the transport layer more robust


Hi,

>Indeed. That is the first pass, if there is anything to read,
>we read it, but then (if select is supported) we call select()
>on the control_channel

Mmmmh, we are probably not talking about the same thing ;-) I was stating "before changes/after changes" in 1st mail to mention introduction of select() in the code. In fact I was doing a comparison between v1.3 and latest mainline.
To explain my point, I then re-explained the "before changes" but did not emphasize it was the case. And in "before changes", code states: "The runtime does not implement select() on the command filehandle"

No select() means kernel polling is useless thanks to userspace polling. But that is old story !


>Yeah, you are right, that second bit is a little strange, we
>kind of depend on user space always calling us with
>O_NONBLOCK. But if it doesn't, then at least we do something
>somewhat sensible.

In the past, I had wondered if userspace could not make a blocking read to remove userspace polling. But now we have select() so we could probably remove any (dead) code related to blocking read.


>> Control channel userpace polling (well, I consider control
>everything that is not trace/data output from script)
>> -      usleep (250*1000); /* sleep 250ms between polls */
>> +      usleep (2000*1000); /* sleep 250ms between polls */
>
>So with current git trunk, you get pselect support, so don't
>need to tweak this anymore.

Aligned

>yeah, that is somewhat extreme, if possible try just
>-DSTP_WORK_TIMER=... to get a good balance. I am interested in
>what value works well for you.

Acceptable on v1.3 ;-)
On v1.5, we kept polling and went for 1s and it seemed to work well. This is giving latency I imagine but 1s is not annoying for us. I had to stop testing to have board repaired.

>
>> Data channel userspace timeout of select()
>> -       struct timespec tim = {.tv_sec=0,
>.tv_nsec=200000000}, *timeout = &tim;
>> +       struct timespec tim = {.tv_sec=5, .tv_nsec=0}, *timeout =
>> + &tim;
>
>With bulk mode this shouldn't be necessary. But it might be
>nice to make this tunable too with some -D option.

The key point here and above is that we can use -D as anyone has to recompile kernel module. This would be the customization we are looking for (slow polling for PM cases, fast polling for cases with lot of trace).
Userspace would need to be frozen (thanks to select()/ppoll()) as we want to deliver it pre-compiled to other developers. So -D would not really help.
What is the rationale behind 200ms timeout ? Bulkmode code explains that there is a potential race condition but then sets a timeout of 10s.

Well, we are close to close on these Power Management topics, great...

Regards
Fred

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]