This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: PR11334 progress update
- From: Serguei Makarov <smakarov at redhat dot com>
- To: systemtap at sourceware dot org
- Date: Fri, 19 Oct 2012 17:44:52 -0400 (EDT)
- Subject: Re: PR11334 progress update
As an add-on to the notes I just posted, here is a brief 20,000-foot overview of how the final regex feature should work internally:
- We traverse all uses of the =~ operator, building a global table of regexes for the script. (The same regex being used in more than one place is represented by just one table entry.)
- A slightly tweaked version of re2c is run to emit a state machine (in the form of a C helper function with a bunch of switch()es and gotos) that matches the expression. The name of the function is recorded in the table under the corresponding regex.
- When translating uses of =~, emit a call to the corresponding helper function.
- If we need to grab subexpressions of a match, the helper function saves a regmatch-like array in the probe context. A tapset defines the matched() functions, which are written in embedded-C and query this array in straightforward fashion for coordinates, then extract the appropriate substring.
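To make the steps above concrete, here is a rough sketch in plain C of what one helper and the substring lookup might look like. Everything in it is an assumption for illustration: the regex "a(b+)c", the names regex_helper_0, struct probe_context and struct match_span, and the signature given to matched() are hypothetical, the state machine is written by hand rather than emitted by re2c, and the real tapset function would be embedded-C rather than a standalone function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the regmatch-like array in the probe context. */
#define MAX_GROUPS 2                 /* group 0 = whole match, group 1 = (b+) */

struct match_span { int start; int end; };  /* end is exclusive; -1 = unset */

struct probe_context {
    const char *subject;             /* string the last match ran against */
    struct match_span groups[MAX_GROUPS];
};

/* Hand-written matcher for the assumed regex "a(b+)c" (unanchored search).
 * It imitates the switch/goto style of a generated state machine.
 * Returns 1 on a match and records the spans in the context, else 0. */
int regex_helper_0(struct probe_context *c, const char *s)
{
    const char *start, *p;
    int g1_start = 0;
    int i;

    c->subject = s;
    for (i = 0; i < MAX_GROUPS; i++)
        c->groups[i].start = c->groups[i].end = -1;

    for (start = s; *start; start++) {
        p = start;

        /* state 0: expect 'a' */
        switch (*p) {
        case 'a': p++; goto state1;
        default:  goto fail;
        }
    state1: /* group 1 opens here; expect the first 'b' */
        g1_start = (int)(p - s);
        switch (*p) {
        case 'b': p++; goto state2;
        default:  goto fail;
        }
    state2: /* more 'b's, or the closing 'c' */
        switch (*p) {
        case 'b': p++; goto state2;
        case 'c': goto accept;
        default:  goto fail;
        }
    accept:
        c->groups[1].start = g1_start;
        c->groups[1].end = (int)(p - s);      /* 'c' is outside group 1 */
        p++;                                  /* consume 'c' */
        c->groups[0].start = (int)(start - s);
        c->groups[0].end = (int)(p - s);
        return 1;
    fail:
        continue;                             /* retry at the next offset */
    }
    return 0;
}

/* Sketch of a matched()-style accessor: look up the coordinates of group n
 * and copy that substring into out. Returns the number of bytes copied. */
size_t matched(const struct probe_context *c, int n, char *out, size_t outlen)
{
    size_t len;

    if (outlen == 0)
        return 0;
    out[0] = '\0';
    if (n < 0 || n >= MAX_GROUPS || c->groups[n].start < 0)
        return 0;
    len = (size_t)(c->groups[n].end - c->groups[n].start);
    if (len >= outlen)
        len = outlen - 1;                     /* truncate to fit the buffer */
    memcpy(out, c->subject + c->groups[n].start, len);
    out[len] = '\0';
    return len;
}
```

In this sketch, a translated use of =~ would reduce to a call such as regex_helper_0(&ctx, str), and a later matched(&ctx, 1, buf, sizeof buf) would copy out the text captured by the first subexpression.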
- Serguei