This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v1.2] Improve unaligned memcpy and memmove.


On Fri, Oct 04, 2013 at 03:14:04PM +0400, Liubov Dmitrieva wrote:
>    I don't understand why you use HAS_SLOW_SSE4_2 flag for Silvermont
>    version. It is supposed to be named as "Fast_Rep" or something like that
>    to make the core feature of the version be clear.
>    There is already HAS_FAST_REP_STRING, maybe it can be reused.
>    --
>    Liubov
> 
It was the simplest way to identify Silvermont. Silvermont is exceptional
in that rep movsq is faster when data are in L1 cache for sizes above
4096 bytes. On Core2 the situation is the opposite: rep movsq looks
fastest for small sizes (up to 256 bytes), until the ssse3 loop pays for
itself.

It might make sense to special-case Silvermont as below.

A second possibility is to drive the switch to rep from a
processor-specific table. For Silvermont the threshold would be 4096
bytes.
On Nehalem and Ivy Bridge the loop is faster when data are in L1 cache,
the two are nearly identical for L2 cache, and rep is by far the best
for L3 cache and beyond, so we could use a threshold of 65636. On fx10
the rep implementation is always slower, so we would need to disable it
there.
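
A rough sketch of what such a table could look like, in plain C rather
than in the assembly the real memcpy uses. The enum, the table, the
REP_NEVER sentinel and use_rep_movsq are all hypothetical names made up
for illustration; none of them exist in glibc. The thresholds are the
ones quoted above, taken as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical processor kinds, for illustration only.  */
enum cpu_kind
{
  CPU_SILVERMONT,
  CPU_NEHALEM,
  CPU_IVY_BRIDGE,
  CPU_FX10,
  CPU_KIND_COUNT
};

/* Sentinel: no size is ever larger than this, so rep is never chosen.  */
#define REP_NEVER SIZE_MAX

/* Copy size above which rep movsq would be preferred over the loop.
   Values are the ones given in this message.  */
static const size_t rep_threshold[CPU_KIND_COUNT] =
{
  [CPU_SILVERMONT] = 4096,
  [CPU_NEHALEM]    = 65636,
  [CPU_IVY_BRIDGE] = 65636,
  [CPU_FX10]       = REP_NEVER,
};

/* Return nonzero when a copy of N bytes should switch to rep movsq.  */
static int
use_rep_movsq (enum cpu_kind cpu, size_t n)
{
  assert ((unsigned int) cpu < CPU_KIND_COUNT);
  return n > rep_threshold[cpu];
}
```

With this layout disabling rep for a processor needs no extra flag: the
sentinel makes the comparison fail for every size, so a call such as
use_rep_movsq (CPU_FX10, n) is zero for any n.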



---
 sysdeps/x86_64/multiarch/init-arch.c | 3 ++-
 sysdeps/x86_64/multiarch/init-arch.h | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
index 5583961..b80d9f2 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -90,7 +90,8 @@ __init_cpu_features (void)
 	      __cpu_features.feature[index_Fast_Unaligned_Load]
 		|= (bit_Fast_Unaligned_Load
 		    | bit_Prefer_PMINUB_for_stringop
-		    | bit_Slow_SSE4_2);
+		    | bit_Slow_SSE4_2
+		    | bit_Is_Silvermont);
 	      break;
 
 	    default:
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
index 0cb5f5b..36ec445 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86_64/multiarch/init-arch.h
@@ -24,6 +24,8 @@
 #define bit_FMA_Usable			(1 << 7)
 #define bit_FMA4_Usable			(1 << 8)
 #define bit_Slow_SSE4_2			(1 << 9)
+#define bit_Is_Silvermont		(1 << 10)
+
 
 /* CPUID Feature flags.  */
 
@@ -64,6 +66,7 @@
 # define index_FMA_Usable		FEATURE_INDEX_1*FEATURE_SIZE
 # define index_FMA4_Usable		FEATURE_INDEX_1*FEATURE_SIZE
 # define index_Slow_SSE4_2		FEATURE_INDEX_1*FEATURE_SIZE
+# define index_Is_Silvermont		FEATURE_INDEX_1*FEATURE_SIZE
 
 #else	/* __ASSEMBLER__ */
 
@@ -163,6 +166,8 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define index_FMA_Usable		FEATURE_INDEX_1
 # define index_FMA4_Usable		FEATURE_INDEX_1
 # define index_Slow_SSE4_2		FEATURE_INDEX_1
+# define index_Is_Silvermont		FEATURE_INDEX_1
+
 
 # define HAS_ARCH_FEATURE(name) \
   ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
@@ -174,5 +179,6 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define HAS_AVX			HAS_ARCH_FEATURE (AVX_Usable)
 # define HAS_FMA			HAS_ARCH_FEATURE (FMA_Usable)
 # define HAS_FMA4			HAS_ARCH_FEATURE (FMA4_Usable)
+# define IS_SILVERMONT			HAS_ARCH_FEATURE (Is_Silvermont)
 
 #endif	/* __ASSEMBLER__ */
-- 
1.8.4.rc3
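
For readers unfamiliar with the macro pattern the patch extends, here is
a standalone mock of how IS_SILVERMONT would resolve through
HAS_ARCH_FEATURE. The bit_ and index_ values and the macro bodies follow
the patch; struct mock_cpu_features, mock_get_cpu_features,
mock_init_silvermont and the value chosen for FEATURE_INDEX_1 are
stand-ins assumed for this sketch, not the real glibc definitions.

```c
#include <assert.h>

/* Assumed stand-ins for the glibc definitions.  */
#define FEATURE_INDEX_1   0
#define FEATURE_INDEX_MAX 1

struct mock_cpu_features
{
  unsigned int feature[FEATURE_INDEX_MAX];
};

static struct mock_cpu_features mock_features;

static const struct mock_cpu_features *
mock_get_cpu_features (void)
{
  return &mock_features;
}

/* Same bit and index assignments as in the patch.  */
#define bit_Slow_SSE4_2      (1 << 9)
#define bit_Is_Silvermont    (1 << 10)
#define index_Slow_SSE4_2    FEATURE_INDEX_1
#define index_Is_Silvermont  FEATURE_INDEX_1

/* NAME is pasted onto index_ and bit_ to pick the word and the mask.  */
#define HAS_ARCH_FEATURE(name) \
  ((mock_get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
#define IS_SILVERMONT  HAS_ARCH_FEATURE (Is_Silvermont)

/* Mimics the Silvermont case of the init code: both bits share one
   feature word, so a single |= sets them together.  */
static void
mock_init_silvermont (void)
{
  mock_features.feature[index_Is_Silvermont]
    |= (bit_Slow_SSE4_2 | bit_Is_Silvermont);
}
```

Before mock_init_silvermont runs, IS_SILVERMONT evaluates to 0; after
it, both IS_SILVERMONT and HAS_ARCH_FEATURE (Slow_SSE4_2) evaluate to 1.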

