This is the mail archive of the gdb@sourceware.cygnus.com mailing list for the GDB project.


Re: Standard GDB Remote Protocol

To: Steven Johnson <sbjohnson at ozemail dot com dot au>
Subject: Re: Standard GDB Remote Protocol
From: jtc at redback dot com (J.T. Conklin)
Date: 07 Dec 1999 14:12:55 -0800
Cc: gdb at sourceware dot cygnus dot com
References: <199911090706.CAA13120@zwingli.cygnus.com> <199911102246.RAA01846@mescaline.gnu.org> <npr9hi321d.fsf@zwingli.cygnus.com> <199911231303.IAA01523@mescaline.gnu.org> <npr9hg2a9t.fsf@zwingli.cygnus.com> <199911251715.MAA09225@mescaline.gnu.org> <npzovvc04o.fsf@zwingli.cygnus.com> <199912010821.DAA27130@mescaline.gnu.org> <npogca9tb8.fsf@zwingli.cygnus.com> <3845AB0E.3795D99E@ozemail.com.au> <5md7sql00o.fsf@jtc.redbacknetworks.com> <3845F45A.38EA29CF@ozemail.com.au>
Reply-To: jtc at redback dot com

>>>>> "Steven" == Steven Johnson <sbjohnson@ozemail.com.au> writes:
>> Since you're putting up your hand, would you be willing to review the
>> protocol spec and point out areas that are ambiguous, confusing, need
>> revising, etc?  

Steven> Following is a Hopefully Constructive Critique, of the GDB Remote
Steven> Protocol.

Many thanks.  

I realize I'm late in this response, but I read your message as soon
as it came in.  I was quite pleased that you found so many issues so
quickly.  

Steven> Packet Structure:
Steven> Simple structure, obviously originally designed to be able to be driven
Steven> manually from a TTY. (Hence its ASCII nature.) However, the protocol
Steven> has evolved quite significantly and I doubt it could still be used very
Steven> efficiently from a TTY. That said, it still demarks frames effectively.

I'm a bit more pessimistic.  I believe that framing was effective in
the original protocol (although I would have put the checksum inside
the packet delimiters); but recent changes have muddied the distinction
between the debug protocol and the data link layer to the point where
they are now inseparable.  I think this is most unfortunate.
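
For reference, a minimal sketch of the framing under discussion,
assuming the documented layout of '$', payload, '#', and a two-digit
hex checksum that is the modulo-256 sum of the payload bytes.  The
function names are illustrative only and are not taken from GDB or any
stub.  Note that the checksum follows the '#', i.e. it sits outside
the delimiters.

```python
def checksum(payload: bytes) -> int:
    """Modulo-256 sum of the payload bytes."""
    return sum(payload) % 256


def frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return b"$" + payload + b"#" + f"{checksum(payload):02x}".encode("ascii")


def unframe(packet: bytes) -> bytes:
    """Return the payload of a well-formed packet, or raise ValueError."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    # int() raises ValueError by itself if the trailer is not hex.
    if int(packet[-2:], 16) != checksum(payload):
        raise ValueError("checksum mismatch")
    return payload
```

With these definitions, frame(b"g") yields b"$g#67" and an empty
payload yields b"$#00".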

Steven> Sequence Numbers:
Steven> Definition of Sequence ID's needs work. Are they necessary? Are they
Steven> deprecated? What purpose do they currently serve within GDB? One would
Steven> imagine that they are used to allow GDB to handle retransmits from a
Steven> remote system. Reading between the lines, this is done to allow error
Steven> recovery when a transmission from target to host fails. Possible
Steven> sequence being:

I'm sure that you understand that we have to document the protocol as
it is, rather than how we would like it to be.  The protocol allows a
good deal of flexibility in adding new commands, which GDB can use by
probing the remote stub.  This allows the protocol to evolve without
requiring all existing stubs to be changed.  Unfortunately, there is
not a similar ability to change the data-link layer.  The cases where
we've done so (RLE, binary write) have had unfortunate consequences
that were not realized until too late.

That's a long and roundabout way of saying that sequence numbers have
limited usefulness, and it will be difficult if not impossible to fix
them.

All a stub does is append the sequence number to the ack.  GDB could
use this to ensure that it's not misinterpreting the ACK for another
packet or a '+' found within a packet, but it can't assume that stubs
will reject subsequent packets with the same sequence number
(duplicate packets).

Another problem, not with the spec but with the implementation, is
that naks to packets with sequence numbers should also append the
sequence number, but none of the sample stubs do this.
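
A minimal sketch of the stub-side behaviour described above, with the
sequence number echoed on both acks and naks.  It assumes a
sequence-id is exactly two characters followed by ':' at the start of
the payload, and that the checksum covers the id as well; both are
assumptions made for illustration, and the function names are
hypothetical.

```python
def checksum_ok(payload: bytes, trailer: bytes) -> bool:
    """True if the two hex digits in trailer match the payload sum."""
    try:
        return int(trailer, 16) == sum(payload) % 256
    except ValueError:
        return False  # trailer was not valid hex


def acknowledge(packet: bytes) -> bytes:
    """Return the ack or nak a stub would send for one received packet."""
    payload, trailer = packet[1:-3], packet[-2:]
    # Assumed layout: two-character sequence-id, then ':'.
    seq = payload[:2] if payload[2:3] == b":" else b""
    reply = b"+" if checksum_ok(payload, trailer) else b"-"
    return reply + seq  # the id is echoed on naks as well as acks
```

Nothing here remembers the last id seen, so a duplicate packet is
acked and processed again, which is the limitation noted above.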

Steven> <- $packet-data#checksum
Steven> -> +
Steven> -> $sequence-id:packet-data#checksum (checksum fails or receive timeout
Steven> halfway through packet).
Steven> <- -sequence-id
Steven> -> $sequence-id:packet-data#checksum
Steven> <- +sequence-id

Steven> When do the sequence-id's increment? Presumably on the successful
Steven> receipt of the +sequence-id acknowledgement.

Steven> If they increment on the successful acknowledgement, what happens if the
Steven> acknowledgement is in error? For example a framing error on the '+'. The
Steven> target would never see the successful acknowledgement and would not
Steven> increment its sequence number.

Steven> So what if it doesn't? The +/- Ack/Nak mechanism should be amply
Steven> sufficient to allow retransmits of missed responses. 

Steven> I can see little practical benefit in a sequence-id in the responses, as
Steven> it is currently documented. This is supported by the comment within the
Steven> document: "Beyond that its meaning is poorly defined. GDB is not known
Steven> to output sequence-ids". This tends to indicate that the mechanism has
Steven> fallen out of use, probably because it doesn't actually achieve
Steven> anything. If this is the case, it could be deprecated. However, I would
Steven> advocate not deprecating it from the protocol, because if they were sent
Steven> by GDB, a hole I believe currently exists in the protocol could be
Steven> plugged. (I will discuss this hole later in this critique.)

Steven> Ack/Nak Mechanism:
Steven> Simple Ack/Nak Mechanism, using + and - Respectively. Also
Steven> reflects the simple ASCII basis of the protocol. My main
Steven> concern with this system is there is no documentation of
Steven> timing. Usually Ack/Nak must be received within a certain time
Steven> frame, otherwise a Nak is assumed and a retransmit
Steven> proceeds. This is necessary, because it is possible for the
Steven> Ack/Nak character to be lost (however unlikely) on the line
Steven> due to a data error.  I think there should be a general timing
Steven> basis to the entire protocol to tie up some potential
Steven> communications/implementation problems.

Yes, the protocol should talk about timeouts.  IMHO the timeouts need
to be presented as variables rather than absolutes, as the acceptable
timeout will vary greatly depending on the situation.

There is a GDB variable 'remotetimeout' which is used for timeouts,
but there is more than one timeout that the protocol should
distinguish, even if GDB and the stubs do not.  Two that come to mind
are the time between a packet being sent and an ack/nak being sent in
response, and the time between an ack'd command being sent and a
response being received.
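
A minimal sketch of keeping those two timeouts as separate variables.
The class, field, and function names are hypothetical, the default
values are placeholders, and the send/read callables stand in for
whatever the transport provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RemoteTimeouts:
    """User-settable timeouts in seconds (placeholder defaults)."""
    ack: float = 2.0       # packet sent -> '+' or '-' received
    response: float = 2.0  # command acked -> reply packet received


def transact(send: Callable[[bytes], None],
             read_byte: Callable[[float], Optional[bytes]],
             read_packet: Callable[[float], Optional[bytes]],
             command: bytes,
             timeouts: RemoteTimeouts) -> Optional[bytes]:
    """Send one command; return the reply packet, or None on nak/timeout."""
    send(command)
    if read_byte(timeouts.ack) != b"+":
        return None  # nak or ack timeout; the caller may retransmit
    return read_packet(timeouts.response)
```

A slow target on a fast link could then use something like
RemoteTimeouts(ack=0.5, response=30.0) without the two waits being
tied together.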

Steven> Once the $ character for the start of a packet is transmitted, each
Steven> subsequent byte must be received within "n" byte transmission times.
Steven> (This would allow for varying comms line speeds). Or alternately a
Steven> global timeout on the whole message could be defined: once "$" (start
Steven> sentinel) is sent, the complete message must be received within "X"
Steven> time. I personally favour the inter character time as opposed to
Steven> complete message time as it will work with any size message, 

As above, rather than come up with a heuristic like n * bit rate, or n
ticks (in whatever unit) per packet, I believe that GDB must provide a
variable (or variables) for timeouts.  For example, both of the above
would fail with a high latency connection, perhaps commands to a Mars
probe...

Steven> One possible timeout that would be easy to work with could be:
Steven> Timeout occurs 1 second after the last received byte.

This might lose for very low bandwidth connections, perhaps commands
to submerged submarines.

Steven> For ACK/NAK I propose that something needs to be defined along
Steven> the lines: ACK/NAK must be received within X Seconds from
Steven> transmission of the end of the message, otherwise a NAK must
Steven> be assumed.

Steven> There is no documentation of the recovery procedure, Does GDB
Steven> retransmit if its message is responded to with a NAK? If not,
Steven> what does it do? How is the target supposed to identify and
Steven> handle retransmits from GDB.

GDB retransmits commands 3 times and gives up.  This is a hardcoded
constant --- there is no 'remoteretries' variable.  
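
A minimal sketch of a bounded retransmit loop of this kind.  It
assumes the limit counts total transmissions rather than
retransmissions after the first attempt; MAX_TRIES and the callables
are illustrative names, not GDB's.

```python
from typing import Callable, Optional

MAX_TRIES = 3  # fixed in this sketch, mirroring the hardcoded limit


def send_with_retries(send: Callable[[bytes], None],
                      wait_for_ack: Callable[[], Optional[bytes]],
                      packet: bytes) -> bool:
    """Transmit packet until it is acked or MAX_TRIES is exhausted."""
    for _ in range(MAX_TRIES):
        send(packet)
        if wait_for_ack() == b"+":
            return True
        # A nak (b"-") or a timeout (None) falls through to a resend.
    return False
```

Turning MAX_TRIES into a parameter would be the obvious step towards
a user-settable retry count.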

Steven> What happens if something other than + or - is received when
Steven> ACK/NAK is expected. (For example $).

A '$' is assumed to be an old response.  GDB then calls getpkt() to eat
up the packet.  Even if it is a packet, getpkt() is probably going to
fail because the '$' has already been read.  

Any other character is considered to be junk and is discarded.  There
is no timeout so if the stub continuously spews junk, the connection
will hang.
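
A minimal sketch of an ack-wait loop along those lines, written over
an iterator of bytes so that it can be exercised without a serial
port.  The names are hypothetical.  Unlike the behaviour described
above, the skip routine here is written knowing that the '$' has
already been consumed.

```python
from typing import Iterator


def skip_packet(stream: Iterator[int]) -> None:
    """Discard bytes through '#' and the two checksum characters."""
    for byte in stream:
        if byte == ord("#"):
            next(stream, None)
            next(stream, None)
            return


def read_ack(stream: Iterator[int]) -> str:
    """Return '+' or '-', skipping stale packets and junk bytes."""
    for byte in stream:
        ch = chr(byte)
        if ch in "+-":
            return ch
        if ch == "$":
            skip_packet(stream)  # assumed to be an old response
        # Any other byte is junk and is dropped; nothing bounds this.
    raise EOFError("stream ended before an ack or nak arrived")
```

On a live link the loop only ends when a '+' or '-' arrives, so a
stub that keeps sending junk stalls it, as noted above.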

Steven> If this is the intent of sequence-id and it has fallen into
Steven> disuse, then to allow its re-introduction at a later date, it
Steven> could be documented that if GDB sends a sequence-id, then the
Steven> retransmit processing I've documented here operates, otherwise
Steven> the currently defined behaviour operates, and that sequence-id
Steven> is only sent by the target in responses where they are present
Steven> in the original GDB message. This would allow GDB to probe if
Steven> the target supports secure and recoverable message delivery or
Steven> not.

I'm a bit fuzzy here.  It's not clear how GDB can probe whether the
stub supports a reliable transport, since there are so many stubs in
the field that respond with an ack with an appended sequence number
after receiving a packet with a sequence number.

Steven> Run Length Encoding:
Steven> Is run length encoding supported in all packets, or just some packets?
Steven> (For example, not binary packets)

It's supported on all packets from the target to GDB.  It's unfortunate
that this isn't symmetric, because it would be most useful in one of the
most frequently issued commands: write all registers.

Steven> Why not allow lengths greater than 126? Or does this mean lengths
Steven> greater than 97 (as in 126-29)

The run length is encoded as a printable ASCII value, so the run
length should not be greater than 97.
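
A minimal sketch of a decoder for this encoding, assuming the
documented form in which '*' is followed by a single character whose
ASCII value minus 29 is the number of additional copies of the
preceding character.  The function name is illustrative.

```python
def rle_decode(data: str) -> str:
    """Expand runs of the form <char>*<count char> in a payload."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "*" and out and i + 1 < len(data):
            repeat = ord(data[i + 1]) - 29  # printable count character
            out.extend(out[-1] * repeat)    # additional copies only
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, "0* " decodes to "0000" (space is 32, and 32 - 29 = 3
extra copies), and the largest printable count character, '~' (126),
gives 97 extra copies.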

Steven> If binary packets with 8 bit data can be sent, why not allow
Steven> RLE to use length also greater than 97. If the length maximum
Steven> is really 126, then this yields the character 0x9B which is 8
Steven> bits, wouldn't the maximum length in this case be 226. Or is
Steven> this a misprint?

Packets with 8 bit values are still problematic.  Even the probe GDB
does to determine whether the binary memory write command exists is
no guarantee that all possible values can be transmitted.

Steven> Why are there 2 methods of RLE? Is it important for a Remote
Steven> Target to understand and process both, or is the "cisco
Steven> encoding" a proprietary extension of the GDB Remote protocol,
Steven> and not part of the standard implementation. The documentation
Steven> of "cisco encoding" is confusing and seems to conflict with
Steven> standard RLE encoding. They appear to be mutually
Steven> exclusive. If they are both part of the protocol, how are they
Steven> distinguished when used?

Some companies have made their own modifications to the remote protocol
without consulting or working with the GDB maintainers.  In some cases,
they later decide to contribute their extensions.  This is the case with
the Cisco remote protocol extensions.  GDB distinguishes the remote
protocol and the Cisco variant by treating them as separate protocols
that happen to share a lot of code.

I still think the integration of Cisco's changes was not in the best
interest of GDB.  But one thing I think they did right was their
implementation of RLE.  At the expense of one extra character to
express the run length, 256 bytes can be represented.

Steven> Deprecated Messages:
Steven> Should an implementation of the protocol implement the deprecated
Steven> messages or not? What is the significance of the deprecated messages 
Steven> to the current implementation?

I think not.  Commands have been deprecated because:
        * they were insufficiently specified
        * they were never implemented in a released GDB and/or sample
          debug stubs
        * they were replaced by a better way of doing things
        * etc.

In the table of commands, I'd prefer that instead of some commands
being marked 'optional', the required commands (g/G/m/M/c) be marked
required.

Steven> Character Escaping:
Steven> The mechanism of Escaping the characters is not
Steven> defined. Further it is only defined as used by write mem
Steven> binary. Wouldn't it be useful for future expansion of the
Steven> protocol to define Character Escaping as a global feature of
Steven> the protocol, so that if any control characters were required
Steven> to be sent, they could be escaped in a consistent manner
Steven> across all messages. Also, wouldn't the full list of escape
Steven> characters be $,#,+,-,*,0x7d. Otherwise, + & - might be
Steven> processed inadvertently as ACK or NAK. If this can't happen,
Steven> then why must they be avoided in RLE? If they are escaped
Steven> across all messages, then that means they could be used in RLE
Steven> and not treated specially.

IMO the problem with character stuffing is that it's happening at the
protocol layer instead of the data link layer.  You're right that the
full list of characters should include '$', '#', '+', and '-'.
Although GDB -> target RLE is not performed, '*' should probably be
escaped as well, in case it (or binary memory read) is implemented.

In addition, I'd include characters with values 0-31 and 128-159 to
avoid problems with serial links that don't handle control characters.
Perhaps with some sort of data link negotiation that would open up 
larger windows.
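
A minimal sketch of stuffing applied to a whole payload, assuming the
convention documented for binary write: the escape byte 0x7d followed
by the original byte XORed with 0x20.  The escape set is the one
proposed in this thread ('$', '#', '+', '-', '*' and 0x7d itself); the
names are illustrative.

```python
ESCAPE = 0x7D
# Escape set proposed in the thread, plus the escape byte itself.
MUST_ESCAPE = frozenset(b"$#+-*") | {ESCAPE}


def escape(data: bytes) -> bytes:
    """Stuff every byte in MUST_ESCAPE as ESCAPE, byte ^ 0x20."""
    out = bytearray()
    for byte in data:
        if byte in MUST_ESCAPE:
            out += bytes([ESCAPE, byte ^ 0x20])
        else:
            out.append(byte)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Reverse escape(); raise ValueError on a dangling escape byte."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte == ESCAPE:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("escape byte at end of data")
            byte = nxt ^ 0x20
        out.append(byte)
    return bytes(out)
```

XOR with 0x20 maps these punctuation characters into the range
0x03-0x0d, so a set that also covers 0-31 would need a different
transform than the one sketched here.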

Another benefit with doing character stuffing at the data link layer
is that it avoids dealing with variable length packets at the protocol
layer.  

But I don't know how we could fix this now.

Steven> 8/7 Bit protocol.
Steven> With the documentation of RAW Binary transfers, the protocol moves from
Steven> being a strictly 7 bit affair into being a 8 bit capable protocol. If
Steven> this is so, then shouldn't all the restrictions that are placed from the
Steven> 7 bit protocol days be lifted to take advantage of the capabilities of
Steven> an 8 bit message stream. (RLE limitations, for example). Would anyone
Steven> seriously be using a computer that had a 7 bit limitation anymore
Steven> anyway? (At least a computer that would run GDB with remote debugging).

I'm more concerned with the other side than the GDB side.  I can
imagine a target that could only reasonably support a 7 bit channel.

Steven> Thoughts on consistency and future growth:
Steven> Apply RLE as a feature of All messages. (Including binary messages, as
Steven> these can probably benefit significantly from it).

Steven> Apply the Binary Escaping mechanism as a feature of the packet that is
Steven> performed on all messages prior to transmission and immediately after
Steven> reception. Define an exhaustive set of "Characters to be escaped".

Steven> Introduce message timing constraints.
Steven> Properly define sequence-id and allow it to be used from GDB to make
Steven> communications secure and reliable.

These are all great, but I'm not sure how much can be done within the
confines of the existing protocol.

        --jtc

-- 
J.T. Conklin
RedBack Networks
