
Re: Question about ecos server performance


On Tue, 2002-08-13 at 09:29, Gary Thomas wrote:
> On Tue, 2002-08-13 at 09:16, NavEcos wrote:
> > On Tuesday 13 August 2002 07:56, Gary Thomas wrote:
> > > On Tue, 2002-08-13 at 08:51, NavEcos wrote:
> > > > On Tuesday 13 August 2002 07:04, Gary Thomas wrote:
> > > > > On Tue, 2002-08-13 at 08:03, NavEcos wrote:
> > > > > > [SNIP]
> > > > > >
> > > > > > > > The bug is as follows:
> > > > > > > >
> > > > > > > > 1) The server (eCos app) starts,
> > > > > > > > 2) Connect to the server with telnet, port 4000
> > > > > > >
> > > > > > > Then what?  What do you have to do [from the "client" side] to
> > > > > > > evoke the crash?
> > > > > >
> > > > > > Connect.  That's all.  My crash happens in less than 3000 bytes
> > > > > > of transferred data, always.
> > > > > >
> > > > > > If you want, I can send you my entire environment but before I
> > > > > > do that I'll update CVS.  Maybe it was a bad day when I downloaded?
> > > > >
> > > > > No, I was able to duplicate this.  I just asked before trying it as
> > > > > I didn't want to waste time if there was more that was necessary.
> > > > >
> > > > > The problem is obvious and, indeed, the program tells you exactly why.
> > > > > It's reporting "too many mbufs to tx", which comes from the logical
> > > > > network layer as it tries to pack up a packet to be sent and give it
> > > > > to the physical driver.  However, in this case, the data structure
> > > > > which represents the packet has [perhaps] hundreds of little tiny
> > > > > pieces in it.  The method used by the physical layer can't handle that
> > > > > [currently].  I'll have to think a bit about how to fix this.
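
[To illustrate the failure mode described above: a driver transmit path
of roughly this shape runs out of slots when the mbuf chain is long.
This is a hypothetical sketch in C; MAX_SG_ENTRIES, sg_entry, and
driver_send are assumed names, not the actual eCos driver code.]

#include <cyg/infra/diag.h>   /* diag_printf() */
#include <sys/mbuf.h>         /* BSD mbuf definitions used by the stack */

#define MAX_SG_ENTRIES 32     /* assumed fixed scatter-gather limit */

struct sg_entry {             /* assumed per-fragment descriptor */
    void *buf;
    int   len;
};

static struct sg_entry sg_list[MAX_SG_ENTRIES];

int driver_send(struct mbuf *m0)
{
    struct mbuf *m;
    int n = 0;

    /* One scatter-gather entry per mbuf in the chain.  A packet built
       from hundreds of tiny mbufs overflows the fixed-size list. */
    for (m = m0; m != NULL; m = m->m_next) {
        if (n == MAX_SG_ENTRIES) {
            diag_printf("too many mbufs to tx\n");   /* the message seen */
            return -1;
        }
        sg_list[n].buf = mtod(m, void *);
        sg_list[n].len = m->m_len;
        n++;
    }
    /* ... hand sg_list[0..n-1] to the hardware ... */
    return 0;
}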
> > > >
> > > > Well, the documentation states that running out of mbufs will not
> > > > crash the TCP/IP layer.  Why does it?  I suspected that it was
> > > > because there were a bunch of tiny pieces but I didn't debug it.  I
> > > > did see the error message, of course, and assumed they might be linked.
> > > > I probably should have mentioned that.
> > > >
> > > > Maybe incorporating a counting semaphore to cause threads allocating
> > > > mbufs to block would do it?  I am not sure how much overhead there
> > > > would be in doing that, but it would nicely block the threads when there
> > > > were no more mbufs.
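
[For what it's worth, the blocking allocator suggested above could look
something like this.  A minimal sketch using the eCos kernel C API; the
wrapper names, the pool size, and real_mbuf_alloc()/real_mbuf_free()
are assumed, and blocking like this is only safe from thread context,
never from a DSR.]

#include <cyg/kernel/kapi.h>

struct mbuf;                           /* from the stack's headers */

#define NUM_MBUFS 256                  /* assumed size of the mbuf pool */

static cyg_sem_t mbuf_sem;

void mbuf_pool_init(void)
{
    /* one semaphore count per mbuf in the pool */
    cyg_semaphore_init(&mbuf_sem, NUM_MBUFS);
}

struct mbuf *mbuf_alloc_blocking(void)
{
    cyg_semaphore_wait(&mbuf_sem);     /* blocks while the pool is empty */
    return real_mbuf_alloc();          /* hypothetical underlying allocator */
}

void mbuf_free_blocking(struct mbuf *m)
{
    real_mbuf_free(m);                 /* hypothetical */
    cyg_semaphore_post(&mbuf_sem);     /* wakes one blocked allocator */
}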
> > >
> > > This has *nothing* to do with running out of mbufs.  That's not what
> > > the message says at all.  It says that it [currently] can't handle
> > > a data packet which is composed of so many mbufs.
> > 
> > Sorry.  As I said, I didn't do much work in debugging it.
> > 
> > > > > I would say that this is an aberrant program though and just happened to
> > > > > run into this limitation.
> > > >
> > > > Well, I agree, it's an atypical example but it's still a serious problem
> > > > when you can crash it for whatever reason.  The code is legal.
> > > >
> > > > I don't care about performance for such a program, and I do not
> > > > think ANYBODY would write awful code like that.  But what concerns
> > > > me is that the stack crashes.  There are instances in which you may
> > > > get a bunch of small packets being sent.
> > > >
> > > > For example, say you have a profiler that sends out the PC at the
> > > > time of an interrupt at regular intervals.  If you get the interval
> > > > just right, you'll crash the box.  You may do this as a low priority
> > > > thread too that sends all available data.  In a quiet system, it will
> > > > end up sending 4 bytes almost always.
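
[For concreteness, such a profiler thread might look like the sketch
below; get_sampled_pc() and the socket setup are assumed.  Each
iteration queues a 4-byte send, so the socket buffer accumulates a
long chain of tiny mbufs.]

#include <network.h>              /* eCos BSD socket API */
#include <cyg/kernel/kapi.h>

static void profiler_thread(cyg_addrword_t data)
{
    int sock = (int)data;         /* an already-connected TCP socket */

    for (;;) {
        cyg_uint32 pc = get_sampled_pc();    /* hypothetical PC sampler */
        send(sock, &pc, sizeof(pc), 0);      /* 4 bytes per call */
        cyg_thread_delay(1);                 /* one sample per system tick */
    }
}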
> > >
> > > But probably not continuously, as your example does.
> > 
> > In most cases no, but in a critical system it would be dangerous.
> > For example, a medical system.
> > 
> > > I agree that there is a problem with the stack.  It's simply not a
> > > scenario I ever imagined (nor, until today, experienced).  It's been
> > > filed as a bug and will get fixed [someday].
> > >
> > > Of course, you're free to fix it yourself.  Remember that for the
> > > most part, eCos is now a *volunteer* project.  I'm certainly not
> > > getting paid to fix this (any more). Things will get fixed if and
> > > when there is time.
> > 
> > I am well aware that it's a volunteer project.  I've contributed
> > several patches before for the XScale board and I fixed a driver
> > problem with the stack back in September of last year before it was
> > a volunteer project.  That bug also caused a crash.
> > 
> > If you could give me some advice as to what exactly is going
> > on and where, I'll submit a patch as time permits.  I'm busy too
> > but I think I can look into it before next week.  I am trying to gain
> > a mastery of this OS, so I'll be happy to try to fix it.  I just wrote the
> > list to confirm it was indeed a bug.  I don't have a huge sampling
> > of hardware here.
> 
> Plain and simple, it shouldn't crash.  The message you are getting
> tells us that the stack is trying to handle the situation.
> 
> On other hardware, it doesn't crash, or at least not the same way.
> Fixing this may take some effort.
> 

I've put in a reasonable fix - it simply makes the data structure
larger, but configurable.  It now runs on the systems I tested without
any problems.
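
[Roughly the shape of that change, continuing the earlier sketch; the
CDL option name here is assumed for illustration, not quoted from CVS.]

#ifndef CYGNUM_IO_ETH_DRIVERS_SG_LIST_SIZE     /* assumed option name */
#define CYGNUM_IO_ETH_DRIVERS_SG_LIST_SIZE 64  /* configurable default */
#endif

/* replaces the hard-wired MAX_SG_ENTRIES from the earlier sketch */
#define MAX_SG_ENTRIES CYGNUM_IO_ETH_DRIVERS_SG_LIST_SIZE

static struct sg_entry sg_list[MAX_SG_ENTRIES];

[Raising the limit in the configuration then trades a little static
RAM for headroom against long mbuf chains.]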

-- 
------------------------------------------------------------
Gary Thomas                  |
eCosCentric, Ltd.            |  
+1 (970) 229-1963            |  eCos & RedBoot experts
gthomas@ecoscentric.com      |
http://www.ecoscentric.com/  |
------------------------------------------------------------



