This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: sort a foreach on a stat value?
Hi -
joshua.i.stone wrote:
> [...] One thing I've noticed is that our foreach syntax has
> different semantics than other languages [...]
Indeed, just like in awk, we iterate over indexes rather than values.
> [...]
> foreach ([tid, c=@count-, a=@avg++, h=@hist_log] in mystats)
> [...]
That sort of thing has some promise for abbreviating the excessive
duplication that hunt made an example of in bug #2115.
While this does not address sorting, another related syntactical
possibility is to infer a "[idx1, idx2]" suffix on undecorated
occurrences of the indexed array within the body of a foreach:
    foreach ([x,y] in thingie)
      total += thingie # implied [x,y]
    foreach ([x,y,z] in mystats)
      printf("%d %d %d", @count(mystats), @sum(mystats), @min(mystats))
The latter could be abbreviated further to "@count, @sum, @min", to
infer the innermost-looped array itself, plus its index tuple.
A later independent optimization could make sure that the translator
does not emit duplicate array-lookup operations within loops.
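
For comparison, here is a minimal sketch of what those two loops have to
look like with the index tuple spelled out by hand. The sample data
loaded in the begin probe is made up purely for illustration; only the
array names come from the examples above.

```systemtap
global thingie, mystats, total

probe begin {
  # Hypothetical sample data, for illustration only.
  thingie[1,2] = 10
  thingie[3,4] = 20
  mystats[1,2,3] <<< 5
  mystats[1,2,3] <<< 7

  # Explicit form of "total += thingie": the [x,y] suffix is written out.
  foreach ([x,y] in thingie)
    total += thingie[x,y]

  # Explicit form of the statistics loop: each extractor repeats
  # the array name and the full index tuple.
  foreach ([x,y,z] in mystats)
    printf("%d %d %d\n", @count(mystats[x,y,z]),
           @sum(mystats[x,y,z]), @min(mystats[x,y,z]))

  printf("total=%d\n", total)
  exit()
}
```

Each repeated "mystats[x,y,z]" here is exactly the text that the inferred
suffix would let one drop.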
> [...]
> > foreach (tid in stat) // sort by value -> ???
> > stat_counts[tid] = @count(stat[tid])
> > foreach (tid in stat_counts-)
> > printf("%d: %d\n", tid, stat_counts[tid]) # and/or
> > @avg(stat[tid])) etc. }
> [...]
> This is a passable workaround, yes. The downside is that if stat were
> very large, I would have to fudge with the maxaction counter. If I was
> only interested in maybe the top 20, then with a single loop construct
> it's easy to break out after 20 and not hit the MAXACTION boundary.
Unless I'm mistaken, the current runtime aggregates the whole pmap for
loops/sorting, even if you want just the top 20. This cost will be
fully reflected in the activity count (bug #1885) at some point. It is
unlikely to cost much less than the explicit copying loop above.
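
To make the comparison concrete, the open-coded workaround with an early
exit might look roughly like the sketch below. The syscall.read probe,
its "count" variable, the 5-second timer, and the counter "n" are
assumptions chosen for illustration; they are not part of the quoted
script.

```systemtap
global stat, stat_counts

# Assumed data source: record the requested size of each read() per thread.
probe syscall.read {
  stat[tid()] <<< count
}

probe timer.s(5) {
  # Pass 1: copy the statistic of interest into a plain integer array.
  foreach (tid in stat)
    stat_counts[tid] = @count(stat[tid])

  # Pass 2: iterate in descending order of value and stop after 20 entries.
  n = 0
  foreach (tid in stat_counts-) {
    printf("%d: %d (avg %d)\n", tid, stat_counts[tid], @avg(stat[tid]))
    if (++n >= 20)
      break
  }
  exit()
}
```

Note that the first pass still visits every element of stat, so the
early break only saves work in the second loop.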
I wonder whether this behavior makes sorting on statistical values
inefficient enough that special syntax is not justified at this
point, given that open-coding is possible.
> >> Along the same lines, it would be extremely useful to be able to do
> >> "cascading" sort - i.e. sort by more than one field.
> >
> > [...]
> > foreach ([x1+, x2--, y2+++] in array----) { ... }
>
> That's not a bad suggestion, though I think it's not obvious in which
> order the cascading happens. [...]
I guess we'd pick and document one of the two interpretations.
- FChE