This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Re: [PATCH 0/7] [python] API for macros


On Tue, Aug 30, 2011 at 2:32 AM, Phil Muldoon <pmuldoon@redhat.com> wrote:
> matt rice <ratmice@gmail.com> writes:
>
>> take 2 on the python macro API...
>> In addition to the testsuite, I tested with the last release of
>> gcc following: http://gcc.gnu.org/wiki/DebuggingGCC
>>
>> using variations of the do_stuff function in following script...
>> I didn't save exact timing #'s but it was something like
>> 2 mins to count all 600+ million macros
>> 5 mins to filter all by name with macro.name().startswith()
>> 15 mins to call the str() method (which calls all other methods).
>
> I'm more concerned about having 600 million Python objects lying around
> indefinitely ;) (When a macro is invalidated, the Python object still
> has to exist so that the user can still call the APIs (to figure out,
> hey this macro is now invalid.))

Yeah... I can't store 600 million macros, valid or invalid, in memory;
that's about 31.2 gigs of macro objects on x86_64.  A macro_object is 56
bytes (16 for Python object overhead + 36 bytes for the macro fields + 4
for structure alignment), and that's *not* including any of the strings
for the actual macro contents, just the top-level structure.  It could
possibly be shrunk by 8 bytes.
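
A quick back-of-the-envelope check of those numbers, using the figures
above:

n_macros = 600 * 10**6            # ~600 million macros in the gcc test
sizeof_macro_object = 56          # 16 PyObject overhead + 36 macro + 4 padding
print(n_macros * sizeof_macro_object / 2.0**30)   # -> ~31.3 GiB of structs alone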

Thus my scripts, at least with gcc as the test case, have to iterate
over each symtab's macros and let Python deallocate a good portion of
the macros as they go.  E.g. when adding all the TREE_* macros to
`a_set', I only hold on to a total of about 300k macros, or 16 megs or
so...
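
Roughly, that per-symtab loop looks like the sketch below (the symtab
iteration and the macros()/name() methods are just illustrative names
for the proposed API, not final):

a_set = set()
for symtab in all_symtabs():                  # hypothetical symtab iterator
    for macro in symtab.macros():             # proposed per-symtab macro list
        if macro.name().startswith("TREE_"):
            a_set.add(macro)
    # macros not added to a_set become unreferenced once the inner loop
    # finishes, so Python can free them before the next symtab is processed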

> But I looked at your code, and to be honest, this area of GDB is brand
> new to me, so I don't feel very qualified to review it.  I will try.  My
> main overall concern is removing the macro cache.

I actually didn't remove the macro cache; it's still there.  We replaced
the explicit obstack/bcache arguments with a pointer to the objfile,
e.g. new_macro_table(objfile) where we previously had
new_macro_table(&objfile->obstack, objfile->macro_cache);

This is because we rely on being able to use the objfile->data free
callbacks for macro invalidation, and because we don't want people
creating objfile-less macros with bcaches and obstacks.

> Given that we cannot
> count the numbers before your change (well not easily, as there is no
> real way to script them), I'm a little bit concerned if the above
> numbers are significantly impacted by the removal of the macro cache.

See above about the macro cache.

>> I'd tried doing a deep copy in the gdb.Macro class,
>> to avoid all the objfile/obstack/bcache horse pucky evident in this series,
>> but i killed it before it completed when working with gcc...
>
> Given that macros can be extremely prevalent in some projects, I think a
> deep copy would not be the way to proceed anyway.

Yep... it was a naive hope, because that's the `cleanest' way to get a
macro object implementation that works with both 'user-defined' and
'from-debuginfo' macros.

>> it's not really timing equivalent to that last 15 minutes case since
>> we use lots more memory having a deep copy of all the macros in a symtab in a
>> list.  Where the 15 minute version does a deep copy, with only one macro's
>> contents in memory at a time.
>>
>> thus, I think it is useful even for large projects (if used with care.)
>> this will fall over if someone has way too many
>> macros in a single symtab.  should that happen we can add a macro_map()
>> that behaves sort of like the python map function.
>
> We should add it now, IMO, instead of waiting for it to fail later.  I'm
> not sure of the exact requirements for the number of macros in symtab to
> qualify for this case, but given how widely used GDB is, it will fail,
> sooner or later.

OK, I'll probably add this as something like
macros(filter=some_filter_func), with filter=None by default.  If the
filter returns None, nothing gets appended to the result (list/tuple/to
be determined); if it returns an object, that object is what gets
added.
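
From the script side that would look something like this (the method
and parameter names are provisional):

def keep_tree_macros(macro):
    if macro.name().startswith("TREE_"):
        return macro        # the returned object is what gets stored
    return None             # None means: skip this macro entirely

tree_macros = symtab.macros(filter=keep_tree_macros)   # symtab obtained elsewhere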

After thinking about this some more, it needs to be done with extreme
care, to avoid the possibility of the user causing any macro lookups
while iterating.

As well as not being reentrant, the iterators can be foiled by any
function which causes a macro lookup in the same table... the current
approach of returning a complete list doesn't suffer from this issue.

Even if we can avoid this in the gdb.Macro methods, if
'some_filter_func' calls macros(some_other_filter_func), calls
gdb.execute("info macro"), or expands a macro, those will mess up the
in-progress call to macro_foreach*, causing either incorrect results or
possibly an infinite loop, because a splay tree lookup by its nature
modifies the splay tree itself.
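
To make the hazard concrete, a filter along these lines is the problem
case (same provisional names as above):

def bad_filter(macro):
    # this performs a lookup in the very table we are iterating over,
    # which re-splays the tree underneath macro_foreach*
    gdb.execute("info macro %s" % macro.name())
    return macro

symtab.macros(filter=bad_filter)   # wrong results or an infinite loop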

>> I think a list is the most straight forward approach for general usage,
>> and has been shown to work even with projects that use macros extensively.
>
> You did not note your machine specifications, btw.

amd64 Phenom II 3GHz (3 cores... but it's single-threaded so that
doesn't matter), 4G of cheap RAM, nothing spectacular.

>> With regards to the hash/compare methods, the implementation of those
>> is up for debate, I see at least 3 valid ways to compare them and have only one
>> comparison function.  right now I have it compare deeply e.g. including the whole include_trail
>> other options that seem valid, compare the name,args,definition,
>> and compare the name,args,definition and partial include_trail
>> (just definition location) since they pretty much are equal before expansion,
>> and expansion is an 'expression' thing.
>
> I'd rather correct hash uniqueness over comparable performance here.

Ahh... the problem is that 'correct hash uniqueness' is situation
dependent; the options above have little to do with performance.

I've attached a script which shows the only way I could figure out to
do a custom macro comparison with the current gdb.Macro.  nomacros.c is
just a file with a single function named 'gimmie_a_text_section': no
#includes, no macros defined.

Because the compiler itself defines a bunch of macros, I use that file
with no #includes and no macros as a baseline, and put the macros for
'nomacros.c' into the baseline_macros set.  Then I subtract that set
from the current list of macros, getting only the actual #included and
#define'd macros... including the include_trail in this comparison
causes those macros to compare as !=.

Thus the script needs to provide a comparison that only looks at the
macro name/args/contents.  I wasn't able to figure out how to either
a) subclass gdb.Macro, or b) monkey-patch the gdb.Macro comparison
function.
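
One way to express that kind of comparison from a script, without
subclassing (not necessarily what the attached macro.gdb does; the
name()/args()/definition() accessors are illustrative), is to key the
sets on a tuple of the fields you actually care about:

def macro_key(m):
    # compare only on name/args/definition, ignoring the include_trail
    return (m.name(), m.args(), m.definition())

def subtract_baseline(current_macros, baseline_macros):
    baseline_keys = set(macro_key(m) for m in baseline_macros)
    return [m for m in current_macros if macro_key(m) not in baseline_keys]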

>> There are some implementation quirks described in the documentation,
>> some of these are so in the future we can add a gdb.UserMacro which does
>> a deep-copy on initialization, I wasn't going to add that unless someone
>> requests it.  Python doesn't seem to have any form of java's `interface',
>> or abstract base classing at least within our version range,
>> we could also hack up something ugly in the future to avoid this documentation,
>> (read union'ify the class implementation,
>>  or make gdb.Macro have a pointer to a gdb.ObjfileMacro
>>  and do the lambda macro: macro.method() inside gdb.Macro)
>> I'd personally just leave it there to give us future choice on the matter.
>> we can always remove that from the docs if we implement it in a way
>
>
> I really think we ought to do this now, not what you wanted to hear, I
> know.  But I think it would be genuinely useful.  No hacks, though! ;)

Hrm, I'll have to think about how to do it then; thinking about it
quickly, I realized I'd only considered user-defined macros from the
angle of 'how user-defined macros affect the from-debuginfo macros
API'.

I don't think there is a way to do it with the 'No Hacks'
qualification, without rewriting how gdb/macrotab does stuff.

Again, the choice currently is:

a) make user-defined and from-debuginfo objects two different kinds of
objects with separate implementations, and document them.

b) do a), but add an interim object to rid ourselves of the Python API
limitations of a):

struct macro_object {
  PyObject_HEAD
  PyObject *user_defined_or_debuginfo_macro_object;
};

This adds 16 + 8 = 24 bytes to each macro, roughly half the size of
the current macro_object.

c) unionize the implementation, and deal with the consequences in the
macro_object methods.
struct macro_object {
    union user_defined_or_from_debuginfo_macro_object_contents *macro;
    int macro_kind:1;
};

I'm not sure yet what effect this has on the size of 'from-debuginfo'
macro objects, but from the code side it means we have to
dual-implement each method based on 'kind'.

d) fix macrotab.c so we can use the same API for user-defined and
from-debuginfo macros.
(this then affects all macrotab usage everywhere, and may not really
be possible)...

---

The latter option here is obviously the way to go; it's also the
largest and most intrusive of the options.  In my current patch I've
left all of these options open and documented 'a)' as a possibility,
but nowhere do we actually exercise it, so we aren't stuck with that
API limitation forever (on the Python script-writing side).

a) is an option I'd prefer to avoid at all costs, but it has the least
effect on the GDB implementation side.
b) is the cleanest to implement, but I hate to increase the size of
each macro object by that much; e.g. our 31G for 600 million macros
becomes about 44.7G (see the arithmetic after this list).
c) is going to be ugly in the Python macro object implementation, and
may or may not also increase the macro object size; the `ugly' bit has
kept me from figuring out the effect on the macro_object structure
size.  I'd hazard a guess that we'd break even, add a pointer, or fill
up the unused space currently lost to structure alignment...
d) is just going to be a large project...
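
A quick check of the b) figure, using the same 600 million macros and
56-byte macro_object as before:

n_macros = 600 * 10**6
print(n_macros * 56 / 2.0**30)          # current layout:            ~31.3 GiB
print(n_macros * (56 + 24) / 2.0**30)   # with b)'s 24-byte object:  ~44.7 GiB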

I'd prefer to avoid a, b, and c, and the only way to avoid them is to
implement d...
I'll probably spend a little time seeing if we can't implement c) in a
moderately acceptable way, and whether that gets us a smaller
'from-debuginfo' macro object than b) would.

---
more about d)....

The problem with d) is figuring out a way to do macro invalidation on
user-defined macros.  gdb.Macro does validation on a per-macro-table
basis; all macros in the same macro table are invalidated at the same
time.

With user-defined macros, any macro can be invalidated at any time.
Currently there's no code in place to do invalidation of user-defined
macros; we don't have anything like the objfile->data free callback....
Thus the short-term plan was to do a deep copy, which keeps the macro
from being xfree'd...  To check whether it is still valid, do a lookup
of the macro by its name and see whether the macro and the looked-up
macro compare equal: == means the macro is valid, != means it is
invalid.
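
In script terms the check would look something like this
(lookup_user_macro() is a placeholder for whatever by-name lookup the
user-defined macro API ends up providing):

def is_still_valid(saved_macro):
    current = lookup_user_macro(saved_macro.name())   # hypothetical lookup
    # == means the definition is unchanged, so the saved copy is still
    # valid; a missing or differing macro means it was undefined/redefined
    return current is not None and current == saved_macro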

User-defined macros also need additional API for
defining/redefining/undefining, where 'from-debuginfo' macros do not...
I imagine these will not be methods of a gdb.Macro object, though.

Anyhow... figuring out the best way to unify (or whether the best way
is to not unify) these two similar-but-different things from the Python
API perspective is something I'd really hoped to avoid... not because
it requires work, but because I don't want to set in stone any
limitations which we might be able to get rid of in the future, but
would still be stuck with from a Python perspective.

The very best we can hope to achieve is a gdb.Macro object which is
the same for user-defined and from-debuginfo macros, plus separate
implementations of gdb.MacroValidator and
gdb.UserDefinedMacroValidator.  So I need to figure out the best way to
achieve this in macrotab.c without adversely affecting from-debuginfo
macros.

Sorry for the length.

Attachment: macro.gdb
Description: Binary data

