[SCRIPT] NUMA page fault accounting.

Jose R. Santos jrs@us.ibm.com
Tue Mar 21 20:37:00 GMT 2006


Stone, Joshua I wrote:

>Jose R. Santos wrote:
>>         page_faults [pid(), $write_access ? 1 : 0] ++
>>         node_faults [pid(), addr_to_node($address)] ++
>
>You could improve scalability of this script by using statistics to
>maintain your count, e.g.:
>
>         page_faults [pid(), $write_access ? 1 : 0] <<< 1
>         node_faults [pid(), addr_to_node($address)] <<< 1
>
>And then access the values with @count(page_faults[...]).
>  
>
OK, I will play with this and send a revised script.

>Other than that, this looks good.  It might be nice to start publishing
>case studies on the website, so if you have a real problem that you
>solved with this, please share!
>
>
>Josh

This script does not solve a particular problem; it is meant to narrow 
down common issues that we have seen in some of our customer workloads.  
I've been thinking of ways to use SystemTap in our performance area, and 
one of the possibilities I'm currently working on is a set of small 
scripts designed to narrow down performance-related problems.  The 
inspiration behind this script is that we have had multiple cases where 
a customer has brought performance issues with their application on our 
servers that were sometimes hard to narrow down.  NUMA-related issues 
caused by bad compiler optimization, suboptimal code design, or Linux 
kernel issues have appeared more than once.  This is the first of what 
I hope will be many scripts designed with this purpose in mind.

I will also be working on a script called why_idle.stp, which will be 
used to determine why a large system is not able to run at 100% CPU 
capacity.  This is another common problem that we have seen here.

Thanks for the comments.

-JRS