[SCRIPT] NUMA page fault accounting.
Jose R. Santos
jrs@us.ibm.com
Tue Mar 21 20:37:00 GMT 2006
Stone, Joshua I wrote:
>Jose R. Santos wrote:
>> page_faults [pid(), $write_access ? 1 : 0] ++
>> node_faults [pid(), addr_to_node($address)] ++
>
>You could improve scalability of this script by using statistics to
>maintain your count, e.g.:
>
> page_faults [pid(), $write_access ? 1 : 0] <<< 1
> node_faults [pid(), addr_to_node($address)] <<< 1
>
>And then access the values with @count(page_faults[...]).
>
>
OK, I will play with this and send a revised script.
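
For reference, a minimal sketch of how the statistics-based version
might look. The two "<<<" lines and @count() come from the suggestion
above; the vm.pagefault probe point (with its address and write_access
variables) and the reporting loop in "probe end" are assumptions for
illustration, since the original script's probe is not shown here.
Statistics aggregates are accumulated per CPU and only merged when
read, which is where the scalability gain over "++" on a shared array
comes from.

 global page_faults, node_faults

 # Assumed probe point: the vm.pagefault tapset alias.
 probe vm.pagefault {
   # Record one sample per fault instead of incrementing a counter.
   page_faults[pid(), write_access ? 1 : 0] <<< 1
   node_faults[pid(), addr_to_node(address)] <<< 1
 }

 # Hypothetical report: print the per-pid counts when the script exits.
 probe end {
   foreach ([p, w] in page_faults)
     printf("pid %d %s faults: %d\n", p, w ? "write" : "read",
            @count(page_faults[p, w]))
   foreach ([p, n] in node_faults)
     printf("pid %d node %d faults: %d\n", p, n,
            @count(node_faults[p, n]))
 }
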
>Other than that, this looks good. It might be nice to start publishing
>case studies on the website, so if you have a real problem that you
>solved with this, please share!
>
>
>Josh
>
>
This script does not solve one particular problem; it is meant to help
narrow down common issues that we have seen in some of our customer
workloads. I've been thinking of ways to use SystemTap in our
performance area, and one of the possibilities that I'm currently
working on is to come up with small scripts which are designed to
narrow down performance-related problems. The inspiration behind this
script is that we have had multiple cases where a customer has brought
us performance issues with their application on our servers, and these
are sometimes hard to narrow down. NUMA-related issues caused by bad
compiler optimization, non-optimal code design or Linux kernel issues
have appeared more than once. This is the first of what I hope will be
many scripts that are designed with this purpose in mind.
I will also be working on a script called why_idle.stp, which will be
used to determine the reasons why a large system is not able to run at
100% CPU capacity. This is another common problem that we have seen
here.
Thanks for the comments
-JRS