This is the mail archive of the ecos-discuss@sources.redhat.com mailing list for the eCos project.



Re: eCos TCP/IP Performance


> I am researching several RTOSs and processors for an ethernet
> application.  The application is a board that takes serial MPEG-2 data
> at up to 32Mbits/sec and packetizes the data for transmission over UDP
> on Fast Ethernet.  eCos looks very promising as a RTOS for our design.
> 
> I am looking for UDP or TCP performance data for the processors and/or
> platforms supported by eCos to be sure we can handle this throughput.
> 
> I searched the archive for this mailing list and found one post that
> addresses TCP/IP performance.  The author (Richard Wicks) said he was
> able to get 20 Mbits/sec on an IQ80310 board with an Intel XScale
> processor running at 600 MHz.  He also said that under VxWorks he was
> able to get 40 Mbits/sec.
> 
> Does anyone else have UDP or TCP performance numbers for the various
> platforms supported by eCos?  Is a 600 MHz processor really necessary to
> get this throughput?
> 
> Our application will be packetizing the data close to the maximum
> ethernet frame size of 1500 bytes.

Part of the inefficiency comes from the design goals. Network stacks
are by nature not real-time, and eCos tries to prevent the
non-real-time stack from upsetting real-time code. So operations that
could take a long time are done in a thread rather than a DSR. For
example, it can take a long time for the device driver to copy the
next packet to be sent from network buffer memory into the device's
memory and then update the ring buffer pointers etc. All of this is
done in a thread, which can be preempted by real-time threads running
at a higher priority. It could be that VxWorks does this sort of thing
in a DSR, which is more efficient but can have a big impact on thread
switch latency.

There are lots of different variables involved which will affect your
performance. I think the bottleneck for our 200MHz StrongARM board is
memory bandwidth. The L1 cache is quite small, so it spends a lot of
time fetching from DRAM. With a bigger L1 we could probably get
faster results. Don't just look at MHz; look at cache size and memory
bandwidth.

The ARM does not have instructions to flush the cache to DRAM. This
means we cannot transmit directly from the network buffer; we have to
copy the data into a region of memory which is not cacheable, which
costs one extra copy. The XScale may have the same problem.

The network stack in eCos is a port of a Unix stack. As such it still
has the kernel/user-space boundary concept and the associated copy. I
don't know anything about the VxWorks stack, but it may be a zero-copy
stack. I'm sure there are optimizations that could be made here,
especially with UDP.

I optimized the memcpy used to move data between normal memory and
the PCI window and got about a 3x speedup for this copy on the SA110
with an i82559. Unfortunately the code is little-endian only, so it
cannot be incorporated into the mainstream i82559 driver until there
is a big-endian equivalent.

Where are you sourcing your MPEG stream from? Can it generate packets
of the correct size? Do you actually need to do any processing on it?
If not, I would be tempted to play around with the insides of the
stack. Get your MPEG encoder to generate packets of about 1.5Kbytes,
at 2Kbyte spacing. Overlap your encoder's PCI window with the ethernet
device's window. Modify the cluster code so you can put the data part
of a cluster in the PCI window. You can then push the mbuf into the
UDP layer so it adds all the headers needed and passes it down to the
device driver. Turn off UDP checksums, which are optional, so you
don't need to touch the data. Hack the device driver so that it sees
the cluster is already in the PCI window and so does not need to copy
it. It's a lot of work with lots of hidden traps, but it's zero-copy
and so processor- and memory-efficient.

   Andrew

