This is the mail archive of the archer@sourceware.org mailing list for the Archer project.

Stop the Insanity! Linespec Rewrite

From: Keith Seitz <keiths at redhat dot com>
To: archer at sourceware dot org
Date: Thu, 01 Mar 2012 17:13:37 -0800
Subject: Stop the Insanity! Linespec Rewrite

Hi,

I think that most of us know the pain that is linespec.c. Well, Tom gave me the green light on this a while back, and I played with an initial design/implementation in September.

Earlier this month, I dug this out of mothballs, and it is now time to submit the work-in-progress for comment. This is *not* a finished design and/or implementation! It is simply a place to stop along the road and ask for advice, for a fresh set of eyes. Consider this akin to an RFC.

Let me recap what some of the more important requirements for this rewrite were:

- More robust parser. As several of us can attest to, adding just about anything to linespec.c is PAINFUL. Really painful. We need a more robust linespec-to-sal infrastructure.

- Consolidate similar functionality. There are several functions which pretty much do the same thing (e.g., decode_variable vs decode_compound vs the remaining bits of decode_internal).

- Rain on the quoting parade. This is actually pretty tightly coupled to the first requirement, but it is such a nightmare as it is now, that it's just worth calling out separately. I doubt I need say more.

- Enable a way for "explicit" linespecs (as I call them, for lack of a better term). I would like to add the ability to do, e.g., "break -sourcefile foo.c -function baz::doit(char*) -offset 3" and the like. Probably most useful to MI (and python), but not altogether useless to CLI users.

- Increase the maintainability and reduce the fragility of this code. If you haven't had the pleasure of hacking on linespecs, consider yourself lucky. You've never lost sleep over the prospect of digging into this code.

I'm sure there are some other requirements/wish-list items which I've forgotten. These are just basically mine.

If you'll pardon my rambling, here are some notes about what the design/implementation does/does not do:

o We now have a (trivial) parser and a lexer. The lexer "word" breaks the input on ':', but it does know about "::" as a scope operator for C++ (but nothing on ObjC). The whole thing attempts to integrate all languages into one design/implementation. You will not see any references to current_language anywhere, except one tiny place where canonicalization is done, and that's cut-n-paste from current sources.

o Speaking of canonicalization: that is now done in only one place in the parser.

o Quoting is greatly minimized. Got spaces in a filename? Doesn't matter anymore. The only thing that matters is ':'. Everything else is simply text. When a quote character (either " or ') is seen, everything is skipped to the next quote character. [No, you cannot mix quote characters in the same lexeme, e.g., "foo'bar'" will return the lexeme foo'bar'.] If your filename is "main::foo.cc", then you must quote that: "break 'main::foo.cc':3".

o As previously, Objective C is short-circuited. I have not changed that at all. Selector names would need quoting to avoid being lexed incorrectly. I haven't pursued this at all (and probably won't).

o All current functionality is maintained AFAICT. I did not add any new functionality that didn't just drop out from doing it this way. [One example: you can now do "break myclass::mymethod::a_label". It "just works."]

o decode_* are almost all gone. All symbol lookups are essentially done by one function (although there is one helper function for it). [Notable exceptions: decode_dollar and decode_objc.]

o Canonicalization (of the linespec) is now pretty trivial and isolated to one function. It is no longer scattered all over the place.

o Error reporting could be greatly expanded. The version I have committed simply maintains the status quo.

o Some linespecs don't work anymore. IMO, these existed because of the nightmare of the existing code. The only group of linespecs that won't work involves "goofy" quoting. For example, you cannot do: "break klass::'operator +'" anymore. Nor can you do "break 'foo.c:static_function'". IMO a small price to pay for sanity.

o I have not written tests for this yet.

o I have not checked memory allocation or other such common problems. My goal was to get something that works.

o There is probably some dead code that can/could now be removed. [cp_validate_operator comes to mind] I have not even begun to look for this.

o Probably lots of cleanup to do. Probably goofed/thinko'd a few things, too, especially in the area of ambiguous linespecs. This is part of the reason for exposing this less-than-beta quality code to other developers and maintainers. I still have comments to myself all over the place (comments with "!!" and #if WHATS_THIS_FOR). Who needs a stinkin' notebook!? I take notes in the code.
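
To make the lexing rules in the notes above concrete, here is a minimal sketch in C. It is not the code on the branch: next_lexeme_sketch and put_char are hypothetical names invented for illustration, and the sketch assumes only what is stated above (a single ':' separates lexemes, "::" stays inside a lexeme, and a quote character skips everything up to the next matching quote character).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append C to OUT unless the buffer is full
   (one byte is always kept free for the terminating NUL).  */
static void
put_char (char *out, size_t out_size, size_t *n, char c)
{
  if (*n + 1 < out_size)
    out[(*n)++] = c;
}

/* Hypothetical sketch of the word-breaking rule, not the branch's lexer.
   Copy the next lexeme of *INPUT into OUT and advance *INPUT past it.
   Return 1 if a lexeme was produced, 0 at end of input.  */
static int
next_lexeme_sketch (const char **input, char *out, size_t out_size)
{
  const char *p = *input;
  size_t n = 0;

  assert (out != NULL && out_size > 0);

  /* Step over the single ':' that ended the previous lexeme.  */
  if (p[0] == ':' && p[1] != ':')
    ++p;

  if (*p == '\0')
    {
      *input = p;
      return 0;
    }

  while (*p != '\0')
    {
      if (strchr ("\"'", *p) != NULL)
        {
          /* Quoted run: copy everything up to the matching quote.
             The other quote character is ordinary text in here.  */
          char quote = *p++;

          while (*p != '\0' && *p != quote)
            put_char (out, out_size, &n, *p++);
          if (*p == quote)
            ++p;
        }
      else if (p[0] == ':' && p[1] == ':')
        {
          /* Scope operator: stays inside the current lexeme.  */
          put_char (out, out_size, &n, ':');
          put_char (out, out_size, &n, ':');
          p += 2;
        }
      else if (*p == ':')
        break;                  /* A lone ':' ends the lexeme.  */
      else
        put_char (out, out_size, &n, *p++);
    }

  out[n] = '\0';
  *input = p;
  return 1;
}
```

Under these assumptions, 'main::foo.cc':3 yields the lexemes main::foo.cc and 3, a filename containing spaces needs no quoting, and "foo'bar'" yields foo'bar'. The real lexer may well differ in how it reports quoting to the parser.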

Ok, so enough of that. You probably are all anxious to see what I've actually done...

If you will permit me a few more paragraphs of your time, please allow me to very briefly describe the underlying idea. Basically, I've split linespec decoding into several main areas: symtabs, symbols, labels, line offsets, SALs, and canonicalization. The first four are pretty obvious. We have to find those things. The last two are a rather large departure from the current code. [I won't go so far as to call it a design.]

Previously we constructed SALs as we found symbols. At the same time, we also constructed canonical linespec representations. We now take in a list of parameters and convert them to SALs in one swoop. Then we compute any necessary canonicalizations.

The main routine dealing with SALs is convert_linespec_to_sals. This converts a structure into a list of SALs, which is then returned to the caller. Canonicalization is done by canonicalize_linespec.

The structure which must be "filled-in" for convert_linespec_to_sals is "typedef struct linespec *linespec_t". Start with that. Right now, the only way to fill-in this structure is via the parser, but another function for "explicit" linespecs could fill it in, too, by using some of the new functions that I've introduced.
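
As a rough illustration of how an "explicit" linespec could fill in such a structure without going through the parser, here is a small sketch in C. Everything in it is an assumption: struct explicit_linespec_sketch and fill_explicit_sketch are hypothetical names, the fields do not reflect the actual layout of struct linespec on the branch, and only the three options mentioned earlier (-sourcefile, -function, -offset) are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parsed form of a linespec.  The real
   struct linespec on the branch may look nothing like this.  */
struct explicit_linespec_sketch
{
  const char *source_file;   /* NULL if not given.  */
  const char *function;      /* NULL if not given.  */
  int has_offset;            /* Nonzero if OFFSET is valid.  */
  int offset;
};

/* Fill LS from ARGC option/value strings in ARGV, e.g.
   "-sourcefile", "foo.c", "-function", "baz::doit(char*)", "-offset", "3".
   Return 0 on success, -1 on a malformed or unknown option.  */
static int
fill_explicit_sketch (struct explicit_linespec_sketch *ls,
                      const char *const *argv, int argc)
{
  int i;

  assert (ls != NULL);
  ls->source_file = NULL;
  ls->function = NULL;
  ls->has_offset = 0;
  ls->offset = 0;

  /* Every option must be followed by a value.  */
  if (argc < 0 || argc % 2 != 0)
    return -1;

  for (i = 0; i < argc; i += 2)
    {
      const char *opt = argv[i];
      const char *val = argv[i + 1];

      if (strcmp (opt, "-sourcefile") == 0)
        ls->source_file = val;
      else if (strcmp (opt, "-function") == 0)
        ls->function = val;
      else if (strcmp (opt, "-offset") == 0)
        {
          char *end;
          long v = strtol (val, &end, 10);

          /* Reject empty values and trailing junk.  */
          if (end == val || *end != '\0')
            return -1;
          ls->offset = (int) v;
          ls->has_offset = 1;
        }
      else
        return -1;
    }

  return 0;
}
```

The point of the sketch is only that no lexing or quoting is involved: each piece arrives already separated, so a value such as baz::doit(char*) is stored verbatim. Turning the filled-in structure into SALs would then be the job of convert_linespec_to_sals.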

Ok, so enough chatter. You want to see the actual code.

Here you go: Archer branch "archer-keiths-linespec-rewrite" [duh!]

Please share your comments, concerns, and suggestions. Please help me stop the [linespec] insanity!

Keith

Follow-Ups:
- Re: Stop the Insanity! Linespec Rewrite
  - From: Tom Tromey
- Re: [Archer] Stop the Insanity! Linespec Rewrite
  - From: Joel Brobecker
