This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/2] Improve strcpy: Faster unaligned loads.


On Mon, Sep 09, 2013 at 04:59:21PM -0400, Carlos O'Donell wrote:
> On 09/09/2013 12:11 PM, Ondřej Bílka wrote:
> > This is the actual implementation. We use an optimized header that makes
> > calls around 50 cycles faster on Nehalem and Ivy Bridge.
> > 
> > Currently this improves strcpy, stpcpy and strcat; I keep the old
> > implementation of strncpy/strncat.
> > 
> > The header I use improves speed by about 10% on most processors for the
> > gcc workload. Separate loops that use ssse3/shifts are still needed, as
> > this implementation is slower on large sizes on processors without fast
> > unaligned loads.
> >
> > Results were obtained with the following benchmark:
> > 
> > http://kam.mff.cuni.cz/~ondra/benchmark_string/strcpy_profile.html
> > http://kam.mff.cuni.cz/~ondra/benchmark_string/strcpy_profile90913.tar.bz2
>  
> The benchmark numbers are great. I appreciate you running the various
> tests against the old, new, and sse3 implementations.
> 
> Does the glibc microbenchmark show a performance increase also or are
> we still lacking the requisite framework to measure these changes?
>
There are several areas where it is lacking. One is that it calls the
function in a tight loop, so the effect of branch prediction is not taken
into account. Numbers from benchtests tend to be off by a large amount
due to the lack of randomization.
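To illustrate the randomization point, here is a minimal sketch, assuming POSIX clock_gettime and using the C library's strcpy as the function under test. Each call gets a different pseudo-random length, so the branch predictor cannot learn a single repeated size. The helpers make_sizes and bench_strcpy, and the constants NCALLS and MAXLEN, are hypothetical names and not part of glibc's benchtests.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <string.h>
#include <time.h>

#define NCALLS 4096  /* hypothetical: number of calls per measurement */
#define MAXLEN 64    /* hypothetical: longest string length tested */

/* Fill sizes[] with pseudo-random lengths in [0, maxlen).  A fixed-seed
   LCG keeps runs reproducible while still varying the length from one
   call to the next. */
static void make_sizes(size_t *sizes, size_t n, size_t maxlen, unsigned seed)
{
  unsigned state = seed;
  assert(maxlen > 0);
  for (size_t i = 0; i < n; i++)
    {
      state = state * 1103515245u + 12345u;
      sizes[i] = (state >> 16) % maxlen;
    }
}

/* Time n calls of fn, each on a source string of a different length, so
   the branches inside fn see a varying pattern instead of one size.
   Returns the elapsed time in nanoseconds. */
static double bench_strcpy(char *(*fn)(char *, const char *),
                           const size_t *sizes, size_t n)
{
  static char src[NCALLS][MAXLEN + 1];
  static char dst[MAXLEN + 1];
  struct timespec t0, t1;

  assert(n <= NCALLS);
  /* Build all inputs up front so only the copies are timed. */
  for (size_t i = 0; i < n; i++)
    {
      assert(sizes[i] <= MAXLEN);
      memset(src[i], 'a', sizes[i]);
      src[i][sizes[i]] = '\0';
    }

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (size_t i = 0; i < n; i++)
    fn(dst, src[i]);
  clock_gettime(CLOCK_MONOTONIC, &t1);

  return (double) (t1.tv_sec - t0.tv_sec) * 1e9
         + (double) (t1.tv_nsec - t0.tv_nsec);
}
```

Filling sizes[] with a single constant instead reproduces the tight-loop, fixed-size case; comparing the two runs gives a rough idea of how much branch prediction flatters the fixed-size numbers.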

For example, strcpy-ssse3.S handles the first 16 bytes with the following code:

  cmpb  $0, (%rcx)
  jz  L(Exit1)
  cmpb  $0, 1(%rcx)
  jz  L(Exit2)
  cmpb  $0, 2(%rcx)
  jz  L(Exit3)
  cmpb  $0, 3(%rcx)
  jz  L(Exit4)
  cmpb  $0, 4(%rcx)
  jz  L(Exit5)
  cmpb  $0, 5(%rcx)
  jz  L(Exit6)
  cmpb  $0, 6(%rcx)
  jz  L(Exit7)
  cmpb  $0, 7(%rcx)
  jz  L(Exit8)
 ...

When the size varies, this chain of branches degrades performance, but
the benchmarks do not catch this.
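For contrast with the byte-at-a-time chain above, here is a minimal C sketch of the general unaligned-load technique for the first 16 bytes, assuming x86-64 with SSE2 and GCC/Clang builtins. It does one unaligned 16-byte load, compares it against zero and extracts a bitmask, so there is one data-dependent branch instead of up to 16. The page-offset guard keeps the load from crossing into a possibly unmapped page. This is not the code from the patch; copy_head16 is a hypothetical name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* If the terminating NUL lies within the first 16 bytes of src, copy the
   string (including the NUL) to dst and return its length.  Otherwise
   return -1 so a bulk loop can continue; dst may already hold a partial
   copy in that case. */
static int copy_head16(char *dst, const char *src)
{
  /* A 16-byte load starting at page offset > 4080 would cross into the
     next page, which might be unmapped, so use a byte loop there. */
  if (((uintptr_t) src & 4095) > 4096 - 16)
    {
      for (int i = 0; i < 16; i++)
        {
          dst[i] = src[i];
          if (src[i] == '\0')
            return i;
        }
      return -1;
    }

  __m128i chunk = _mm_loadu_si128((const __m128i *) src);
  __m128i zeros = _mm_cmpeq_epi8(chunk, _mm_setzero_si128());
  unsigned mask = (unsigned) _mm_movemask_epi8(zeros); /* bit i set if src[i] == 0 */
  if (mask == 0)
    return -1;                     /* no NUL in the first 16 bytes */

  int len = __builtin_ctz(mask);   /* index of the first NUL byte */
  memcpy(dst, src, (size_t) len + 1);
  return len;
}
```

The load may read bytes past the NUL. Because it stays within one page it cannot fault, but memory checkers such as AddressSanitizer may still report the over-read on short buffers.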

There is another problem: how to compare the old and new implementations.
You have two tables of results, before and after, and now you want to
compare them.
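Comparing the two tables by eye is error-prone. A minimal sketch, assuming the row layout of the tables in this message (a "Length N, alignments in bytes A/ B:" prefix followed by four tab-separated timings, where lower is faster), parses an old row and the corresponding new row and reports new/old per column. A ratio below 1.0 means the new run was faster. Rows have to be paired by position, because the tables repeat some length/alignment keys. parse_row, row_ratio and struct row are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Columns: simple_strcpy, __strcpy_ssse3, __strcpy_sse2_unaligned,
   __strcpy_sse2. */
#define NIMPL 4

struct row
{
  unsigned len, align1, align2;
  double t[NIMPL];
};

/* Parse one result line.  Returns 1 on success, 0 on a malformed line. */
static int parse_row(const char *line, struct row *r)
{
  int n = sscanf(line,
                 "Length %u, alignments in bytes %u/ %u: %lf %lf %lf %lf",
                 &r->len, &r->align1, &r->align2,
                 &r->t[0], &r->t[1], &r->t[2], &r->t[3]);
  return n == 3 + NIMPL;
}

/* Fill ratio[] with new/old for each implementation.  Returns 0 if the
   two rows do not describe the same length and alignments. */
static int row_ratio(const struct row *old_r, const struct row *new_r,
                     double ratio[NIMPL])
{
  if (old_r->len != new_r->len || old_r->align1 != new_r->align1
      || old_r->align2 != new_r->align2)
    return 0;
  for (int i = 0; i < NIMPL; i++)
    ratio[i] = new_r->t[i] / old_r->t[i];
  return 1;
}
```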

This is a design problem. The main use case of benchmarks is comparing
implementations, and that should be easy to do without tedious tasks
like adding functions to the Makefile and ifunc-impl-list, renaming them,
recompiling libc and rerunning the benchmarks.
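One possible direction, sketched under the assumption that candidate implementations can simply be linked into one test binary, is to keep a table of name/function-pointer pairs and run every entry on the same inputs. Adding a variant then means adding one table entry. This sketch only checks correctness over lengths and source/destination offsets; a timing loop could iterate over the same table. struct impl, impls, my_strcpy_bytewise and check_all are hypothetical names, not part of glibc's benchtests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef char *(*strcpy_fn)(char *, const char *);

/* Hypothetical candidate: plain byte-at-a-time copy. */
static char *my_strcpy_bytewise(char *dst, const char *src)
{
  char *d = dst;
  while ((*d++ = *src++) != '\0')
    ;
  return dst;
}

struct impl
{
  const char *name;
  strcpy_fn fn;
};

/* Registering a new implementation is one line here. */
static const struct impl impls[] = {
  { "libc_strcpy", strcpy },
  { "bytewise", my_strcpy_bytewise },
};
#define NIMPLS (sizeof impls / sizeof impls[0])

/* Run every registered implementation on strings of length 0..maxlen at
   source/destination offsets 0..7 and check the copy and return value.
   Returns the number of mismatches. */
static int check_all(size_t maxlen)
{
  char src[512], dst[512];
  int errors = 0;

  assert(maxlen + 16 <= sizeof src);
  for (size_t k = 0; k < NIMPLS; k++)
    for (size_t len = 0; len <= maxlen; len++)
      for (size_t a1 = 0; a1 < 8; a1++)
        for (size_t a2 = 0; a2 < 8; a2++)
          {
            memset(src, 0, sizeof src);       /* NUL after the 'x' run */
            memset(dst, 0x55, sizeof dst);    /* poison the destination */
            memset(src + a1, 'x', len);
            char *ret = impls[k].fn(dst + a2, src + a1);
            if (ret != dst + a2 || strcmp(dst + a2, src + a1) != 0)
              {
                fprintf(stderr, "%s: len %zu, align %zu/%zu failed\n",
                        impls[k].name, len, a1, a2);
                errors++;
              }
          }
  return errors;
}
```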

old

                       	simple_strcpy	__strcpy_ssse3	__strcpy_sse2_unaligned	__strcpy_sse2
Length    0, alignments in bytes  0/ 0:	10.9062	11.1562	16.6719	15.5312
Length    0, alignments in bytes  0/ 0:	10.0156	10.2969	17.7031	14.7812
Length    0, alignments in bytes  0/ 0:	9.8125	10.3906	16.5781	14.5938
Length    0, alignments in bytes  0/ 0:	9.67188	9.25	20.4531	14.7812
Length    1, alignments in bytes  0/ 0:	11.375	11.0469	15.625	19.7812
Length    1, alignments in bytes  0/ 0:	13.8281	10.3906	15.875	15.2969
Length    1, alignments in bytes  0/ 1:	12.7969	9.96875	15.875	15.7812
Length    1, alignments in bytes  1/ 0:	17.2812	11.0938	15.9219	16.625
Length    2, alignments in bytes  0/ 0:	14.6406	13.9219	16.9062	20.6875
Length    2, alignments in bytes  0/ 0:	15.7812	13.7812	16.5312	19.5469
Length    2, alignments in bytes  0/ 2:	14.6875	13.375	16	19.3125
Length    2, alignments in bytes  2/ 0:	18.1875	16.4219	15.5781	20.2031
Length    3, alignments in bytes  0/ 0:	17.9375	13.4062	16	22.4844
Length    3, alignments in bytes  0/ 0:	18.9844	12.9375	16.375	20.7812
Length    3, alignments in bytes  0/ 3:	16.625	12.375	16.7188	20.6719
Length    3, alignments in bytes  3/ 0:	18.2812	12.125	16.2344	23.1406
Length    4, alignments in bytes  0/ 0:	18.8906	16.8125	17.5312	25.5469
Length    4, alignments in bytes  0/ 0:	20.0625	16.4375	17.4219	23.7969
Length    4, alignments in bytes  0/ 4:	22.4375	16.1875	17.2812	24.125
Length    4, alignments in bytes  4/ 0:	21.4844	15.9219	16.1406	30.7969
Length    5, alignments in bytes  0/ 0:	20.3906	19.7812	18.3281	27.7656
Length    5, alignments in bytes  0/ 0:	22.8125	17.4688	20.7188	26.125
Length    5, alignments in bytes  0/ 5:	23.3281	17.7969	20.9219	25.6406
Length    5, alignments in bytes  5/ 0:	22.6094	18.0781	16.3906	31.1094
Length    6, alignments in bytes  0/ 0:	23.4688	19.7969	20.9688	32.1094
Length    6, alignments in bytes  0/ 0:	24.3281	18.7031	20.9688	28.7031
Length    6, alignments in bytes  0/ 6:	24.8906	22.2344	18	29.2812
Length    6, alignments in bytes  6/ 0:	24.6406	19.3594	20.7812	33.5312
Length    7, alignments in bytes  0/ 0:	25.3594	22.3438	16.5312	30.6406
Length    7, alignments in bytes  0/ 0:	27.1562	19.9219	19.7344	28.7188
Length    7, alignments in bytes  0/ 7:	26.6719	19.7344	16.1406	29.0469
Length    7, alignments in bytes  7/ 0:	27.1094	19.3125	15.625	34.3281
Length    8, alignments in bytes  0/ 0:	26.2031	23.5156	16.6719	21.0156
Length    8, alignments in bytes  0/ 0:	27.6719	23.1406	16.2969	19.6406
Length    8, alignments in bytes  0/ 0:	28.2031	21.9531	17.8438	19.2656
Length    8, alignments in bytes  0/ 0:	28	21.9531	19.4062	19.2188
Length    9, alignments in bytes  0/ 0:	28.25	23.9375	17.8906	21.3906
Length    9, alignments in bytes  0/ 0:	30.2656	24.0781	20.5938	21.3906
Length    9, alignments in bytes  0/ 1:	30.125	23.8906	18.7031	20.2969
Length    9, alignments in bytes  1/ 0:	29.375	23.8906	17.7969	44.5312
Length   10, alignments in bytes  0/ 0:	31.3125	25.5938	17.8438	33
Length   10, alignments in bytes  0/ 0:	30.4062	25.5	18.4219	24.8438
Length   10, alignments in bytes  0/ 2:	31.6406	25.7344	17.9375	23.4219
Length   10, alignments in bytes  2/ 0:	30.3125	25.0312	18.7031	46.8906
Length   11, alignments in bytes  0/ 0:	33.7656	27.5312	18.375	27.1562
Length   11, alignments in bytes  0/ 0:	33.3438	27.1562	18.8438	25.9844
Length   11, alignments in bytes  0/ 3:	34.0938	27.2031	19.75	32.5938
Length   11, alignments in bytes  3/ 0:	34.0938	26.2969	17.375	48.1094
Length   12, alignments in bytes  0/ 0:	35.3281	30.1719	16.6719	32.9219
Length   12, alignments in bytes  0/ 0:	35.125	29.9844	17.4688	29.1875
Length   12, alignments in bytes  0/ 4:	58.6562	31.1719	17.625	28.6719
Length   12, alignments in bytes  4/ 0:	34.4219	30.8906	17.5312	34.1406
Length   13, alignments in bytes  0/ 0:	37.8281	30.8438	17.6094	31.5
Length   13, alignments in bytes  0/ 0:	36.4531	31.125	17.9531	30.3125
Length   13, alignments in bytes  0/ 5:	37.3594	30.8281	17.8438	30.2188
Length   13, alignments in bytes  5/ 0:	36.3125	30.7969	18.4062	37.1719
Length   14, alignments in bytes  0/ 0:	40.3281	34.6562	17.2344	38.0625
Length   14, alignments in bytes  0/ 0:	38.7188	33.7656	18.3281	33.8125
Length   14, alignments in bytes  0/ 6:	37.7344	34.3906	17.4688	33.9531
Length   14, alignments in bytes  6/ 0:	38.4375	33.9531	18.4219	47.2188
Length   15, alignments in bytes  0/ 0:	72.0625	38.9531	16.4844	35.9375
Length   15, alignments in bytes  0/ 0:	41.7031	34.1875	16.1406	34.1406
Length   15, alignments in bytes  0/ 7:	40.9375	34.1875	16.625	33.375
Length   15, alignments in bytes  7/ 0:	41.4219	34.3281	16.5781	40.375
Length   16, alignments in bytes  0/ 0:	42.8281	43.9688	18.7031	25.6406
Length   16, alignments in bytes  7/ 2:	41.8906	51.5625	18	40.3281
Length   32, alignments in bytes  0/ 0:	111.453	61.625	24.4219	33.7188
Length   32, alignments in bytes  6/ 4:	109.797	65.7812	30.6406	50.5312
Length   64, alignments in bytes  0/ 0:	168.453	54.6875	28.4844	77.25
Length   64, alignments in bytes  5/ 6:	188.125	114.562	28.75	64.7969
Length  128, alignments in bytes  0/ 0:	327.062	73.0938	50.0938	79.5156
Length  128, alignments in bytes  4/ 0:	326.406	103.078	48.3125	101.391
Length  256, alignments in bytes  0/ 0:	642.516	88.6875	65.5469	240.547
Length  256, alignments in bytes  3/ 2:	642.469	118.109	81.7031	163.25
Length  512, alignments in bytes  0/ 0:	1271.03	121.078	105.812	271.234
Length  512, alignments in bytes  2/ 4:	1307.97	172.875	105.969	298.734
Length 1024, alignments in bytes  0/ 0:	2530.12	184.406	167.969	528.172
Length 1024, alignments in bytes  1/ 6:	2684.86	254.672	170.094	593.922
Length   16, alignments in bytes  1/ 2:	45.1406	50.4375	18.6094	46.2812
Length   16, alignments in bytes  2/ 1:	42.2188	47.375	17.5625	47.0312
Length   16, alignments in bytes  1/ 1:	42.7812	51.6094	16.625	45.9062
Length   16, alignments in bytes  1/ 1:	43.2969	42.9219	16.5781	45.1875
Length   32, alignments in bytes  2/ 4:	125	72.7656	28.6094	54.2188
Length   32, alignments in bytes  4/ 2:	110.547	63.9375	27.1562	50.7656
Length   32, alignments in bytes  2/ 2:	109.359	52.6562	28.0938	52.1719
Length   32, alignments in bytes  2/ 2:	108.984	51.9062	27.5312	51.4688
Length   64, alignments in bytes  3/ 6:	182.375	81.75	30.0312	69.5625
Length   64, alignments in bytes  6/ 3:	263.078	79.2812	29.75	66.6719
Length   64, alignments in bytes  3/ 3:	169.625	59.9219	28.0469	67.4375
Length   64, alignments in bytes  3/ 3:	263.312	57.9375	28.3438	66.5781
Length  128, alignments in bytes  4/ 0:	324.844	101.672	48.2188	97.3125
Length  128, alignments in bytes  0/ 4:	345.375	97.6094	65.6875	82.0156
Length  128, alignments in bytes  4/ 4:	538.766	80.1406	48.4062	100.109
Length  128, alignments in bytes  4/ 4:	327.578	78.2969	64.7969	98.0312
Length  256, alignments in bytes  5/ 2:	1040.11	126.641	66.3438	163.812
Length  256, alignments in bytes  2/ 5:	667.625	125	71.6719	169.203
Length  256, alignments in bytes  5/ 5:	641.609	99.3594	81.5156	260.047
Length  256, alignments in bytes  5/ 5:	640.562	96.4219	70.4531	160.031
Length  512, alignments in bytes  6/ 4:	1270.94	169.25	107.094	292.078
Length  512, alignments in bytes  4/ 6:	1308.86	174.625	120.422	296.234
Length  512, alignments in bytes  6/ 6:	1270.33	134.203	97.1875	289.094
Length  512, alignments in bytes  6/ 6:	1270.38	129.953	102.328	472.172
Length 1024, alignments in bytes  7/ 6:	2530.78	254.859	168.594	543.109
Length 1024, alignments in bytes  6/ 7:	2709.52	267.703	166.172	964.609
Length 1024, alignments in bytes  7/ 7:	2529.83	194.406	169.719	883.391
Length 1024, alignments in bytes  7/ 7:	2529.64	192.812	159.281	539.609

new

                       	simple_strcpy	__strcpy_ssse3	__strcpy_sse2_unaligned	__strcpy_sse2
Length    0, alignments in bytes  0/ 0:	10.8125	22.1406	21.8125	19.2656
Length    0, alignments in bytes  0/ 0:	9.96875	21.0156	21.0156	17.5156
Length    0, alignments in bytes  0/ 0:	10.4844	20.9688	25.0781	14.5938
Length    0, alignments in bytes  0/ 0:	9.96875	20.6406	25.125	15.4531
Length    1, alignments in bytes  0/ 0:	17.6094	27.3438	27.0156	20.4062
Length    1, alignments in bytes  0/ 0:	13.5156	25.2656	24.5938	16.2969
Length    1, alignments in bytes  0/ 1:	16.9062	24.5	25.4062	19.6875
Length    1, alignments in bytes  1/ 0:	14.8281	24.5625	25.125	20.5469
Length    2, alignments in bytes  0/ 0:	14.9219	24.3594	24.3594	26.7812
Length    2, alignments in bytes  0/ 0:	15.7656	24.8906	24.4219	25.4062
Length    2, alignments in bytes  0/ 2:	14.9688	24.7031	25.5469	19.125
Length    2, alignments in bytes  2/ 0:	20.9219	24.7031	25.0156	28.2344
Length    3, alignments in bytes  0/ 0:	17.1406	26.4844	26.1562	30.7969
Length    3, alignments in bytes  0/ 0:	16.7656	25.0312	25.4062	20.6875
Length    3, alignments in bytes  0/ 3:	17.5625	25.2188	25.9688	27.6719
Length    3, alignments in bytes  3/ 0:	16.7656	25.0312	25.4062	31.5469
Length    4, alignments in bytes  0/ 0:	18.4688	24.8906	24.3125	24.9375
Length    4, alignments in bytes  0/ 0:	20.4062	24.5156	24.75	32.8281
Length    4, alignments in bytes  0/ 4:	17.7188	24.5625	24.7812	24.4531
Length    4, alignments in bytes  4/ 0:	21.4375	25.3594	24.3125	38.8125
Length    5, alignments in bytes  0/ 0:	21.2031	25.125	24.7344	27.1094
Length    5, alignments in bytes  0/ 0:	23.0469	24.5625	25.0312	32.5938
Length    5, alignments in bytes  0/ 5:	32.4844	24.8438	24.7969	25.7344
Length    5, alignments in bytes  5/ 0:	22.3906	24.6562	25.2188	30.5938
Length    6, alignments in bytes  0/ 0:	23.8906	25.6406	24.5938	29.9844
Length    6, alignments in bytes  0/ 0:	37.875	25.1562	24.9375	29.375
Length    6, alignments in bytes  0/ 6:	25.0312	24.8438	24.7969	29.7031
Length    6, alignments in bytes  6/ 0:	37.4375	25.2656	25.2656	33.9062
Length    7, alignments in bytes  0/ 0:	25.3125	27.5781	27.2031	38.1562
Length    7, alignments in bytes  0/ 0:	26.25	25.2188	25.4531	28.9844
Length    7, alignments in bytes  0/ 7:	42.7812	25.0312	25.2656	28.75
Length    7, alignments in bytes  7/ 0:	26.6719	25.0781	24.7812	35.5156
Length    8, alignments in bytes  0/ 0:	28.1406	24.8906	25.125	27.0625
Length    8, alignments in bytes  0/ 0:	28.3906	24.75	25.2656	25.4062
Length    8, alignments in bytes  0/ 0:	28.4844	24.7344	24.6562	19.6406
Length    8, alignments in bytes  0/ 0:	26.5	24.8906	24.8438	20.0312
Length    9, alignments in bytes  0/ 0:	51.5156	24.6875	24.8906	33.0625
Length    9, alignments in bytes  0/ 0:	28.9844	24.75	25.2188	20.875
Length    9, alignments in bytes  0/ 1:	29.9844	24.8438	24.9375	20.8281
Length    9, alignments in bytes  1/ 0:	28.9062	25.0781	25.0625	60.8281
Length   10, alignments in bytes  0/ 0:	30.7969	25.4062	24.5469	32.3594
Length   10, alignments in bytes  0/ 0:	31.9688	24.9375	24.5	23.75
Length   10, alignments in bytes  0/ 2:	32.5469	24.9375	25.3125	23.5625
Length   10, alignments in bytes  2/ 0:	31.3438	24.8906	25.3594	61.8125
Length   11, alignments in bytes  0/ 0:	32.5938	24.7969	24.8906	28.2344
Length   11, alignments in bytes  0/ 0:	34.2344	24.9375	25.0781	25.7812
Length   11, alignments in bytes  0/ 3:	34.2344	24.8438	24.5938	25.8281
Length   11, alignments in bytes  3/ 0:	33.4375	24.9844	25.5	61.4844
Length   12, alignments in bytes  0/ 0:	35.8438	24.7031	24.5938	36.7344
Length   12, alignments in bytes  0/ 0:	33.6719	25.2656	24.9844	36.8906
Length   12, alignments in bytes  0/ 4:	34.6562	27.3438	26.625	34.0938
Length   12, alignments in bytes  4/ 0:	34.7969	26.0625	25.9688	44.3438
Length   13, alignments in bytes  0/ 0:	40	24.9844	24.9375	40.1875
Length   13, alignments in bytes  0/ 0:	37.4531	24.9375	25.4531	30.5
Length   13, alignments in bytes  0/ 5:	36.9844	27.4375	26.5312	31.875
Length   13, alignments in bytes  5/ 0:	62	25.5938	25.9688	37.5469
Length   14, alignments in bytes  0/ 0:	40.9531	24.75	25.2188	35.8438
Length   14, alignments in bytes  0/ 0:	38.3438	24.3594	25.0312	33.5156
Length   14, alignments in bytes  0/ 6:	38.8125	27.4375	26.6406	34.6562
Length   14, alignments in bytes  6/ 0:	39.625	25.9688	25.4531	38.5781
Length   15, alignments in bytes  0/ 0:	42.4062	26.4531	26.1094	35.2188
Length   15, alignments in bytes  0/ 0:	39.7031	25.8906	24.7812	33.7656
Length   15, alignments in bytes  0/ 7:	41.1719	28.2344	27.25	33.1406
Length   15, alignments in bytes  7/ 0:	39.1406	25.2656	24.9375	40.3281
Length   16, alignments in bytes  0/ 0:	43.6406	27.5781	28.1094	26.1562
Length   16, alignments in bytes  7/ 2:	42.2188	25.9219	25.3125	39.625
Length   32, alignments in bytes  0/ 0:	113.094	29.7969	27.7656	47.4062
Length   32, alignments in bytes  6/ 4:	111.25	26.3906	26.6875	50.2031
Length   64, alignments in bytes  0/ 0:	168.969	56.0469	42.2656	49.1562
Length   64, alignments in bytes  5/ 6:	262.359	43.7344	44.0156	96.4688
Length  128, alignments in bytes  0/ 0:	539.891	57.9844	65.7344	80.2344
Length  128, alignments in bytes  4/ 0:	325.312	56.2969	50.1875	160.422
Length  256, alignments in bytes  0/ 0:	641.75	73.7031	65.4531	145.016
Length  256, alignments in bytes  3/ 2:	639.953	81.0312	63.7969	268.312
Length  512, alignments in bytes  0/ 0:	1271.36	106.719	94.7656	442.656
Length  512, alignments in bytes  2/ 4:	2048.22	175.562	128.969	301.703
Length 1024, alignments in bytes  0/ 0:	2528.8	169.469	189.734	528.031
Length 1024, alignments in bytes  1/ 6:	4054.08	207.219	234.984	991.812
Length   16, alignments in bytes  1/ 2:	45.6719	29.1875	28.7656	45.0469
Length   16, alignments in bytes  2/ 1:	41.1719	25.9219	25.2188	47.0781
Length   16, alignments in bytes  1/ 1:	42.6875	27.9062	26.6719	46
Length   16, alignments in bytes  1/ 1:	43.3594	26.8281	27.3438	46
Length   32, alignments in bytes  2/ 4:	170.047	30.9219	30.75	74.9531
Length   32, alignments in bytes  4/ 2:	92.3594	27.1406	25.6875	51.6562
Length   32, alignments in bytes  2/ 2:	93.5	28.6719	27.625	51.75
Length   32, alignments in bytes  2/ 2:	160.938	27.9531	27.7188	74.2344
Length   64, alignments in bytes  3/ 6:	181.391	42.8281	54.4062	67.5312
Length   64, alignments in bytes  6/ 3:	169.016	54.9688	43.0625	97.75
Length   64, alignments in bytes  3/ 3:	169.531	41.7812	42.5938	68.5625
Length   64, alignments in bytes  3/ 3:	268.641	53.2656	41.5	69.4688
Length  128, alignments in bytes  4/ 0:	325.5	56.625	60.4375	99.3125
Length  128, alignments in bytes  0/ 4:	346.656	67.8594	59.4062	81.2188
Length  128, alignments in bytes  4/ 4:	325.969	81.2188	49.875	99.8281
Length  128, alignments in bytes  4/ 4:	325.641	78.6719	48.7812	97.8906
Length  256, alignments in bytes  5/ 2:	640.562	113.766	65.125	161.547
Length  256, alignments in bytes  2/ 5:	663.469	90.9531	83.8125	272.234
Length  256, alignments in bytes  5/ 5:	641.562	76.4062	63.9844	161.453
Length  256, alignments in bytes  5/ 5:	640.469	75.375	63.1875	256.703
Length  512, alignments in bytes  6/ 4:	1270.47	118.625	97.6562	290.422
Length  512, alignments in bytes  4/ 6:	1307.3	132.281	129.672	296.125
Length  512, alignments in bytes  6/ 6:	1270.56	142.844	96.3438	472.641
Length  512, alignments in bytes  6/ 6:	1270.17	142.938	96.0469	288.047
Length 1024, alignments in bytes  7/ 6:	2529.41	194.453	154.984	544.938
Length 1024, alignments in bytes  6/ 7:	2715.66	205.328	235.453	584.797
Length 1024, alignments in bytes  7/ 7:	4055.83	170.328	149.453	542.391
Length 1024, alignments in bytes  7/ 7:	2529.73	221.656	150.031	881.031

