This is the mail archive of the ecos-patches@sources.redhat.com mailing list for the eCos project.



Re: RFC, fix for bogus timeouts in select()


> Unix is not a realtime OS like eCos.
> 
> How do other real-time operating systems implement select()?
> 
> Hmmm... I wonder if Linux actually does an extra check after it wakes up
> after a timeout...

It might, but that's not the issue here. Even if it did, it could
still get into the situation where the select() function has returned
but there is data in the socket to be read. The other issue is that
TCP/IP networking is by its very nature not real time, so whether it's
eCos, Linux, or M$, it makes no difference!

> What if the Linux implementation actually performs a check for more data
> after a timeout, would that swing your opinion?

It probably does, which is why I said I would look at your patch. But
even with your patch, select() can still return with a timeout while
there is data to be read on the socket.
 
> > The select() system call exited on a timeout and you are back
> > in the libc select() function when you get time sliced. While some
> > other process is running, the Ethernet device interrupt goes off and
> > the stack puts new data into the socket, ready for userspace to
> > read sometime in the future. Your process then gets the CPU back and
> > the libc select() function returns to your application. select() tells
> > you it has timed out, but there is in fact data to be read on the
> > socket.
> 
> So what is the correct use of select then?

It all depends on what you actually want to know. If you really want
to cut out the race condition between select() returning after a timeout
and there being data to read on the socket, you need to do something
like:

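        fd_set saved = readfds;            // select() overwrites readfds, so keep a copy
        if (select(n, &readfds, NULL, NULL, &tv) == 0) {
            cyg_scheduler_lock();          // stop other threads, including the stack, running
            readfds = saved;               // restore the set cleared by the timeout
            tv.tv_sec = tv.tv_usec = 0;    // zero timeout: just poll
            if (select(n, &readfds, NULL, NULL, &tv) == 0) {
                // Here we know there is nothing to read
            }
            cyg_scheduler_unlock();
        }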

That is, do a select() with a timeout. If it times out, lock the
scheduler so the network stack cannot run, then do a second select()
with a zero timeout, which just polls to see if anything has arrived.
As long as the scheduler is locked, you know the stack has not put
anything into the socket.
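On a POSIX host without eCos's cyg_scheduler_lock(), the same two-step check can be sketched as follows. The name select_then_repoll() is hypothetical and the code is written against the plain POSIX select() interface (header names may differ on eCos). It shows the timed select() followed by a zero-timeout poll, and that the fd_set has to be rebuilt between the two calls. Without the scheduler lock this only narrows the race window; it cannot close it.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe()/write(), as in the usage note below */

/* Hypothetical helper: wait up to *tv for fd to become readable. If
 * select() reports a timeout, poll once more with a zero timeout.
 * Returns >0 if readable, 0 if still nothing, -1 on error.
 * Without a scheduler lock, data can still arrive after the second
 * call returns, so a 0 result is only "nothing had arrived yet". */
int select_then_repoll(int fd, struct timeval *tv)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    int rc = select(fd + 1, &readfds, NULL, NULL, tv);
    if (rc != 0)
        return rc;              /* readable, or an error */

    /* select() may have cleared readfds on timeout: rebuild it. */
    struct timeval zero = { 0, 0 };
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &zero);
}
```

For example, on the read end of an empty pipe() it returns 0; after one byte has been written to the pipe it returns 1.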

But is this enough? Maybe the network stack has been preempted while
it's been processing a packet for the socket? So there is data in the
stack, but it has not made it to the socket yet? Or maybe an ISR and
DSR have run, but the network stack is running at a lower priority on a
busy system, so it has not had a chance to process the packet yet?

Basically, there is always a race and you have to live with it.

 
> > In practice this makes little difference. The next time around
> > the loop, select() will exit immediately, telling you there is data on
> > the socket.
> 
> Hmmm.... I wonder how many applications get this right. 

Most. People know that networking is full of race conditions you have
to be careful of to avoid deadlocks, live locks, or unexpected
packets.
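Living with the race usually means the loop pattern quoted above: a select() timeout means "nothing yet", not "nothing coming", and data that lands just after a timeout is picked up on the next pass. A minimal sketch, assuming plain POSIX select() and read(); read_when_ready() and its slice_usec/max_slices parameters are hypothetical names for illustration.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd, waiting in slices of
 * slice_usec microseconds for at most max_slices slices. A timeout is not
 * treated as final; the loop simply goes round again.
 * Returns bytes read, 0 if every slice timed out (or on end-of-file),
 * or -1 on error. */
ssize_t read_when_ready(int fd, void *buf, size_t len,
                        long slice_usec, int max_slices)
{
    for (int i = 0; i < max_slices; i++) {
        fd_set readfds;
        FD_ZERO(&readfds);          /* rebuild each pass: select() modifies it */
        FD_SET(fd, &readfds);
        struct timeval tv = { slice_usec / 1000000, slice_usec % 1000000 };

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return -1;
        }
        if (rc == 0)
            continue;               /* timed out: data may still be on its way */

        return read(fd, buf, len);  /* readable, so read() should not block */
    }
    return 0;
}
```

Rebuilding the fd_set and timeout on every pass keeps each iteration independent of what the previous select() call did to them.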

> I don't think the patch itself will pass muster. The mail was intended
> to bring up the issue. 

Which it has done. Like I said, I will look at the patch and think
about it. But I also suspect your application is broken in its
assumptions about select().

        Andrew

