This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v1.2] Improve unaligned memcpy and memmove.
- From: Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 4 Oct 2013 17:07:35 +0400
- Subject: Re: [PATCH v1.2] Improve unaligned memcpy and memmove.
- Authentication-results: sourceware.org; auth=none
- References: <20130819085220 dot GB19541 at domone> <20130829153829 dot GA6105 at domone dot kolej dot mff dot cuni dot cz> <20131003220926 dot GA12203 at domone dot podge> <CAHjhQ93gDTLC9jh56PPXPf0DndUBxVd371Xpw1+vPM9HVnHHfw at mail dot gmail dot com> <20131004125248 dot GA23055 at domone dot podge>
Can we clean up the "***back" versions in this patch?
Do any processors still use them?
Atom and Core 2 use the "***ssse3" versions, not the "***back" ones.
Do we need to handle the "***back" versions now?
--
Liubov
On Fri, Oct 4, 2013 at 4:52 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Fri, Oct 04, 2013 at 03:14:04PM +0400, Liubov Dmitrieva wrote:
>> I don't understand why you use the HAS_SLOW_SSE4_2 flag for the Silvermont
>> version. The flag should be named "Fast_Rep" or something similar,
>> so that the core feature of the version is clear.
>> There is already HAS_FAST_REP_STRING; maybe it can be reused.
>> --
>> Liubov
>>
> It was the simplest way to identify Silvermont. Silvermont is exceptional
> in that rep movsq is faster in L1 cache for sizes above 4096 bytes. On
> Core 2 the situation is the opposite: rep movsq looks fastest for small
> sizes (up to 256 bytes), until the ssse3 loop pays for itself.
>
> It might make sense to special-case Silvermont as below.
>
> A second possibility is to switch to rep based on a processor-specific
> table. For Silvermont the threshold would be 4096 bytes.
> On Nehalem and Ivy Bridge a loop is faster when the data are in L1 cache,
> the two are nearly identical for L2 cache, and rep is by far the best for
> L3 cache and beyond, so we could use a threshold of 65636. On fx10 a rep
> implementation is always slower, so we would need to disable it.
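
The table-driven alternative described above could look roughly like the following. This is only a minimal sketch: the enum `cpu_kind`, the array `rep_threshold`, and the function `use_rep_movsq` are hypothetical names that do not exist in glibc, and the threshold values are copied verbatim from the message rather than measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical processor identifiers; glibc defines no such enum.  */
enum cpu_kind
{
  CPU_SILVERMONT,
  CPU_NEHALEM,
  CPU_IVY_BRIDGE,
  CPU_FX10,
  CPU_KIND_COUNT
};

/* Copy size above which the rep movsq path would be chosen.
   SIZE_MAX means "never use rep" (assumed encoding for "disabled").
   The values are the ones quoted in the message.  */
static const size_t rep_threshold[CPU_KIND_COUNT] =
{
  [CPU_SILVERMONT] = 4096,
  [CPU_NEHALEM]    = 65636,
  [CPU_IVY_BRIDGE] = 65636,
  [CPU_FX10]       = SIZE_MAX
};

/* Return nonzero when a copy of N bytes should use rep movsq
   on the given processor kind.  */
static int
use_rep_movsq (enum cpu_kind kind, size_t n)
{
  return n > rep_threshold[kind];
}
```

With such a table the per-processor decision becomes a single comparison, and disabling rep on a processor needs no separate flag, since no size can exceed `SIZE_MAX`.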
>
>
>
> ---
> sysdeps/x86_64/multiarch/init-arch.c | 3 ++-
> sysdeps/x86_64/multiarch/init-arch.h | 6 ++++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
> index 5583961..b80d9f2 100644
> --- a/sysdeps/x86_64/multiarch/init-arch.c
> +++ b/sysdeps/x86_64/multiarch/init-arch.c
> @@ -90,7 +90,8 @@ __init_cpu_features (void)
> __cpu_features.feature[index_Fast_Unaligned_Load]
> |= (bit_Fast_Unaligned_Load
> | bit_Prefer_PMINUB_for_stringop
> - | bit_Slow_SSE4_2);
> + | bit_Slow_SSE4_2
> + | bit_Is_Silvermont);
> break;
>
> default:
> diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
> index 0cb5f5b..36ec445 100644
> --- a/sysdeps/x86_64/multiarch/init-arch.h
> +++ b/sysdeps/x86_64/multiarch/init-arch.h
> @@ -24,6 +24,8 @@
> #define bit_FMA_Usable (1 << 7)
> #define bit_FMA4_Usable (1 << 8)
> #define bit_Slow_SSE4_2 (1 << 9)
> +#define bit_Is_Silvermont (1 << 10)
> +
>
> /* CPUID Feature flags. */
>
> @@ -64,6 +66,7 @@
> # define index_FMA_Usable FEATURE_INDEX_1*FEATURE_SIZE
> # define index_FMA4_Usable FEATURE_INDEX_1*FEATURE_SIZE
> # define index_Slow_SSE4_2 FEATURE_INDEX_1*FEATURE_SIZE
> +# define index_Is_Silvermont FEATURE_INDEX_1*FEATURE_SIZE
>
> #else /* __ASSEMBLER__ */
>
> @@ -163,6 +166,8 @@ extern const struct cpu_features *__get_cpu_features (void)
> # define index_FMA_Usable FEATURE_INDEX_1
> # define index_FMA4_Usable FEATURE_INDEX_1
> # define index_Slow_SSE4_2 FEATURE_INDEX_1
> +# define index_Is_Silvermont FEATURE_INDEX_1
> +
>
> # define HAS_ARCH_FEATURE(name) \
> ((__get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
> @@ -174,5 +179,6 @@ extern const struct cpu_features *__get_cpu_features (void)
> # define HAS_AVX HAS_ARCH_FEATURE (AVX_Usable)
> # define HAS_FMA HAS_ARCH_FEATURE (FMA_Usable)
> # define HAS_FMA4 HAS_ARCH_FEATURE (FMA4_Usable)
> +# define IS_SILVERMONT HAS_ARCH_FEATURE (Is_Silvermont)
>
> #endif /* __ASSEMBLER__ */
> --
> 1.8.4.rc3
>
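
The bit/index pattern that the quoted patch extends can be tried in isolation with a small model. This is a sketch under stated assumptions, not glibc's code: `features`, `get_cpu_features`, and `mark_silvermont` are stand-ins for `__cpu_features`, `__get_cpu_features`, and the relevant branch of `__init_cpu_features`, and `FEATURE_INDEX_1` and `FEATURE_INDEX_MAX` are given assumed values here.

```c
/* Assumed values for this model only.  */
#define FEATURE_INDEX_1   0
#define FEATURE_INDEX_MAX 1

/* Bit and index definitions as they appear in the patch.  */
#define bit_Slow_SSE4_2     (1 << 9)
#define bit_Is_Silvermont   (1 << 10)
#define index_Slow_SSE4_2   FEATURE_INDEX_1
#define index_Is_Silvermont FEATURE_INDEX_1

/* Reduced stand-in for glibc's struct cpu_features.  */
struct cpu_features
{
  unsigned int feature[FEATURE_INDEX_MAX];
};

static struct cpu_features features;

static const struct cpu_features *
get_cpu_features (void)
{
  return &features;
}

/* Same token-pasting scheme as HAS_ARCH_FEATURE in init-arch.h:
   NAME selects both the array index and the bit mask.  */
#define HAS_ARCH_FEATURE(name) \
  ((get_cpu_features ()->feature[index_##name] & (bit_##name)) != 0)
#define IS_SILVERMONT HAS_ARCH_FEATURE (Is_Silvermont)

/* Models the flags that the patched initialization sets
   (other bits from the patch are omitted).  */
static void
mark_silvermont (void)
{
  features.feature[index_Is_Silvermont]
    |= (bit_Slow_SSE4_2 | bit_Is_Silvermont);
}
```

Because each feature name maps to its own bit, `IS_SILVERMONT` and the existing Slow_SSE4_2 check can be tested independently even though both bits live in the same array element.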