Cui,Lili (1):
      x86: Update -mtune=alderlake

H.J. Lu (53):
      op_by_pieces_d::run: Change a while loop to a do-while loop
      Generate offset adjusted operation for op_by_pieces operations
      Don't use nullptr return from simplify_gen_subreg
      Update alignment_for_piecewise_move
      Elide expand_constructor if move by pieces is preferred
      x86: Remove MAX_BITSIZE_MODE_ANY_INT
      Add a target calls hook: TARGET_PUSH_ARGUMENT
      x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
      x86: Add vec_duplicate expander
      Don't use vec_duplicate on vector in CTOR expansion
      libffi/x86: Always check __x86_64__ for x86 hosts
      x86: Don't issue vzeroupper if callee returns AVX register
      x86: Don't return hard register when LRA is in progress
      x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register
      Add QI vector mode support to by-pieces for memset
      x86: Add TARGET_GEN_MEMSET_SCRATCH_RTX
      x86: Avoid stack realignment when copying data
      x86: Update piecewise move and store
      x86: Add AVX2 tests for PR middle-end/90773
      x86: Add tests for piecewise move and store
      x86: Also pass -mno-avx to pr72839.c
      x86: Also pass -mno-avx to cold-attribute-1.c
      x86: Also pass -mno-avx to sw-1.c for ia32
      x86: Update gcc.target/i386/incoming-11.c
      x86: Also pass -mno-sse to vect8-ret.c
      x86: Use XMM31 for scratch SSE register
      by_pieces: Pass MAX_PIECES to op_by_pieces_d
      x86: Update STORE_MAX_PIECES
      x86: Avoid stack realignment when copying data with SSE register
      x86: Broadcast from integer to a pseudo vector register
      x86: Add non-destructive source to @xorsign3_1
      x86: Update -mtune=tremont
      x86: Update memcpy/memset inline strategies for -mtune=tremont
      x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
      x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY
      x86: Add gcc.target/i386/pr103205-2.c
      Check optab before transforming atomic bit test and operations
      Add a missing return when transforming atomic bit test and operations
      pr103194-5.c: Replace long with int64_t
      x86: Add testcases for PR target/80566
      x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECES
      Add TARGET_IFUNC_REF_LOCAL_OK
      x86: Add -mmove-max=bits and -mstore-max=bits
      Adjust "x86: Add -mmove-max=bits and -mstore-max=bits"
      x86: Scan leal in PR target/83782 tests for x32
      x86: Check each component of source operand for AVX_U128_DIRTY
      x86: Compile PR target/104441 tests with -march=x86-64
      x86: Add -m[no-]direct-extern-access
      x86: Update PR 35513 tests
      pieces-memset-21.c: Expect vzeroupper for ia32
      x86: Always return pseudo register in ix86_gen_scratch_sse_rtx
      x86: Disallow unsupported EH return
      x86: Also check _SOFT_FLOAT in

Hongyu Wang (2):
      PR target/103069: Relax cmpxchg loop for x86 target
      i386: Fix wrong codegen for -mrelax-cmpxchg-loop

Jakub Jelinek (7):
      i386: Punt on broadcasts from TImode integers [PR101286]
      i386: Fix up @xorsign3_1 [PR102224]
      i386: Fix up xorsign for AVX [PR89984]
      i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]
      forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]
      x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]
      i386: Use no-mmx,no-sse for LIBGCC2_UNWIND_ATTRIBUTE [PR104890]

Richard Biener (2):
      middle-end/100951 - make sure to generate VECTOR_CST in lowering
      target/104581 - compile-time regression in mode-switching

Uros Bizjak (1):
      [i386] Introduce scalar version of avx512f_vmscalef.

liuhongt (12):
      Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]
      Extend ldexp{s,d}f3 to vscalefs{s,d} when TARGET_AVX512F and TARGET_SSE_MATH.
      Fix ICE.
      Fix typo in testcase.
      Remove pass_cpb which is related to enable avx512 embedded broadcast from constant pool.
      Optimize (a & b) | (c & ~b) to vpternlog instruction.
      Enable avx512 embedde broadcast for vpternlog.
      Remove copysign post_reload splitter for scalar modes.
      Fix ICE in pass_rpad.
      Improve integer bit test on __atomic_fetch_[or|and]_* returns
      Enhance optimize_atomic_bit_test_and to handle truncation.
      Fix typo in r12-5486.
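Before the patch body, a brief illustration of one of the headline changes listed above. The PR98737 commits (see the IFN_ATOMIC_*_FETCH_CMP_0 code added to gcc/builtins.c below) let the x86 backend use the flags set by a single lock-prefixed read-modify-write when the result of an __atomic_*_fetch call is only compared against zero, instead of a lock xadd plus a separate compare, or a cmpxchg loop for the and/or/xor cases. The sketch below shows the kind of user code this targets; the function name release_ref is invented for the example and is not part of the patch.

/* With the PR98737 work, the decrement-and-test below can be expanded on
   x86 as a "lock sub" followed by a sete on the resulting ZF, rather than
   fetching the old value and comparing it separately.  */
#include <stdbool.h>

bool
release_ref (int *refcount)
{
  /* The __atomic_sub_fetch result is used only in a comparison with 0.  */
  return __atomic_sub_fetch (refcount, 1, __ATOMIC_SEQ_CST) == 0;
}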
diff --git a/gcc/builtins.c b/gcc/builtins.c index f36ac1ef4a5..41aeb373581 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -128,7 +128,6 @@ static rtx expand_builtin_va_copy (tree); static rtx inline_expand_builtin_bytecmp (tree, rtx); static rtx expand_builtin_strcmp (tree, rtx); static rtx expand_builtin_strncmp (tree, rtx, machine_mode); -static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode); static rtx expand_builtin_memchr (tree, rtx); static rtx expand_builtin_memcpy (tree, rtx); static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len, @@ -145,7 +144,6 @@ static rtx expand_builtin_stpcpy (tree, rtx, machine_mode); static rtx expand_builtin_stpncpy (tree, rtx); static rtx expand_builtin_strncat (tree, rtx); static rtx expand_builtin_strncpy (tree, rtx); -static rtx builtin_memset_gen_str (void *, HOST_WIDE_INT, scalar_int_mode); static rtx expand_builtin_memset (tree, rtx, machine_mode); static rtx expand_builtin_memset_args (tree, tree, tree, rtx, machine_mode, tree); static rtx expand_builtin_bzero (tree); @@ -3860,14 +3858,17 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode) a target constant. */ static rtx -builtin_memcpy_read_str (void *data, HOST_WIDE_INT offset, - scalar_int_mode mode) +builtin_memcpy_read_str (void *data, void *, HOST_WIDE_INT offset, + fixed_size_mode mode) { /* The REPresentation pointed to by DATA need not be a nul-terminated string but the caller guarantees it's large enough for MODE. */ const char *rep = (const char *) data; - return c_readstr (rep + offset, mode, /*nul_terminated=*/false); + /* The by-pieces infrastructure does not try to pick a vector mode + for memcpy expansion. */ + return c_readstr (rep + offset, as_a (mode), + /*nul_terminated=*/false); } /* LEN specify length of the block of memcpy/memset operation. @@ -6415,15 +6416,17 @@ expand_builtin_stpncpy (tree exp, rtx) constant. */ rtx -builtin_strncpy_read_str (void *data, HOST_WIDE_INT offset, - scalar_int_mode mode) +builtin_strncpy_read_str (void *data, void *, HOST_WIDE_INT offset, + fixed_size_mode mode) { const char *str = (const char *) data; if ((unsigned HOST_WIDE_INT) offset > strlen (str)) return const0_rtx; - return c_readstr (str + offset, mode); + /* The by-pieces infrastructure does not try to pick a vector mode + for strncpy expansion. */ + return c_readstr (str + offset, as_a (mode)); } /* Helper to check the sizes of sequences and the destination of calls @@ -6624,30 +6627,134 @@ expand_builtin_strncpy (tree exp, rtx target) return NULL_RTX; } +/* Return the RTL of a register in MODE generated from PREV in the + previous iteration. */ + +static rtx +gen_memset_value_from_prev (by_pieces_prev *prev, fixed_size_mode mode) +{ + rtx target = nullptr; + if (prev != nullptr && prev->data != nullptr) + { + /* Use the previous data in the same mode. */ + if (prev->mode == mode) + return prev->data; + + fixed_size_mode prev_mode = prev->mode; + + /* Don't use the previous data to write QImode if it is in a + vector mode.
*/ + if (VECTOR_MODE_P (prev_mode) && mode == QImode) + return target; + + rtx prev_rtx = prev->data; + + if (REG_P (prev_rtx) + && HARD_REGISTER_P (prev_rtx) + && lowpart_subreg_regno (REGNO (prev_rtx), prev_mode, mode) < 0) + { + /* This case occurs when PREV_MODE is a vector and when + MODE is too small to store using vector operations. + After register allocation, the code will need to move the + lowpart of the vector register into a non-vector register. + + Also, the target has chosen to use a hard register + instead of going with the default choice of using a + pseudo register. We should respect that choice and try to + avoid creating a pseudo register with the same mode as the + current hard register. + + In principle, we could just use a lowpart MODE subreg of + the vector register. However, the vector register mode might + be too wide for non-vector registers, and we already know + that the non-vector mode is too small for vector registers. + It's therefore likely that we'd need to spill to memory in + the vector mode and reload the non-vector value from there. + + Try to avoid that by reducing the vector register to the + smallest size that it can hold. This should increase the + chances that non-vector registers can hold both the inner + and outer modes of the subreg that we generate later. */ + machine_mode m; + fixed_size_mode candidate; + FOR_EACH_MODE_IN_CLASS (m, GET_MODE_CLASS (mode)) + if (is_a (m, &candidate)) + { + if (GET_MODE_SIZE (candidate) + >= GET_MODE_SIZE (prev_mode)) + break; + if (GET_MODE_SIZE (candidate) >= GET_MODE_SIZE (mode) + && lowpart_subreg_regno (REGNO (prev_rtx), + prev_mode, candidate) >= 0) + { + target = lowpart_subreg (candidate, prev_rtx, + prev_mode); + prev_rtx = target; + prev_mode = candidate; + break; + } + } + if (target == nullptr) + prev_rtx = copy_to_reg (prev_rtx); + } + + target = lowpart_subreg (mode, prev_rtx, prev_mode); + } + return target; +} + /* Callback routine for store_by_pieces. Read GET_MODE_BITSIZE (MODE) bytes from constant string DATA + OFFSET and return it as target - constant. */ + constant. If PREV isn't nullptr, it has the RTL info from the + previous iteration. */ rtx -builtin_memset_read_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED, - scalar_int_mode mode) +builtin_memset_read_str (void *data, void *prev, + HOST_WIDE_INT offset ATTRIBUTE_UNUSED, + fixed_size_mode mode) { const char *c = (const char *) data; - char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode)); + unsigned int size = GET_MODE_SIZE (mode); - memset (p, *c, GET_MODE_SIZE (mode)); + rtx target = gen_memset_value_from_prev ((by_pieces_prev *) prev, + mode); + if (target != nullptr) + return target; + rtx src = gen_int_mode (*c, QImode); - return c_readstr (p, mode); + if (VECTOR_MODE_P (mode)) + { + gcc_assert (GET_MODE_INNER (mode) == QImode); + + rtx const_vec = gen_const_vec_duplicate (mode, src); + if (prev == NULL) + /* Return CONST_VECTOR when called by a query function. */ + return const_vec; + + /* Use the move expander with CONST_VECTOR. */ + target = targetm.gen_memset_scratch_rtx (mode); + emit_move_insn (target, const_vec); + return target; + } + + char *p = XALLOCAVEC (char, size); + + memset (p, *c, size); + + /* Vector modes should be handled above. */ + return c_readstr (p, as_a (mode)); } /* Callback routine for store_by_pieces. Return the RTL of a register containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned char value given in the RTL register data. 
For example, if mode is - 4 bytes wide, return the RTL for 0x01010101*data. */ + 4 bytes wide, return the RTL for 0x01010101*data. If PREV isn't + nullptr, it has the RTL info from the previous iteration. */ static rtx -builtin_memset_gen_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED, - scalar_int_mode mode) +builtin_memset_gen_str (void *data, void *prev, + HOST_WIDE_INT offset ATTRIBUTE_UNUSED, + fixed_size_mode mode) { rtx target, coeff; size_t size; @@ -6657,9 +6764,33 @@ builtin_memset_gen_str (void *data, HOST_WIDE_INT offset ATTRIBUTE_UNUSED, if (size == 1) return (rtx) data; + target = gen_memset_value_from_prev ((by_pieces_prev *) prev, mode); + if (target != nullptr) + return target; + + if (VECTOR_MODE_P (mode)) + { + gcc_assert (GET_MODE_INNER (mode) == QImode); + + /* vec_duplicate_optab is a precondition to pick a vector mode for + the memset expander. */ + insn_code icode = optab_handler (vec_duplicate_optab, mode); + + target = targetm.gen_memset_scratch_rtx (mode); + class expand_operand ops[2]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], (rtx) data, QImode); + expand_insn (icode, 2, ops); + if (!rtx_equal_p (target, ops[0].value)) + emit_move_insn (target, ops[0].value); + + return target; + } + p = XALLOCAVEC (char, size); memset (p, 1, size); - coeff = c_readstr (p, mode); + /* Vector modes should be handled above. */ + coeff = c_readstr (p, as_a (mode)); target = convert_to_mode (mode, (rtx) data, 1); target = expand_mult (mode, target, coeff, NULL_RTX, 1); @@ -8993,6 +9124,93 @@ expand_ifn_atomic_bit_test_and (gcall *call) emit_move_insn (target, result); } +/* Expand IFN_ATOMIC_*_FETCH_CMP_0 internal function. */ + +void +expand_ifn_atomic_op_fetch_cmp_0 (gcall *call) +{ + tree cmp = gimple_call_arg (call, 0); + tree ptr = gimple_call_arg (call, 1); + tree arg = gimple_call_arg (call, 2); + tree lhs = gimple_call_lhs (call); + enum memmodel model = MEMMODEL_SYNC_SEQ_CST; + machine_mode mode = TYPE_MODE (TREE_TYPE (cmp)); + optab optab; + rtx_code code; + class expand_operand ops[5]; + + gcc_assert (flag_inline_atomics); + + if (gimple_call_num_args (call) == 4) + model = get_memmodel (gimple_call_arg (call, 3)); + + rtx mem = get_builtin_sync_mem (ptr, mode); + rtx op = expand_expr_force_mode (arg, mode); + + switch (gimple_call_internal_fn (call)) + { + case IFN_ATOMIC_ADD_FETCH_CMP_0: + code = PLUS; + optab = atomic_add_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_SUB_FETCH_CMP_0: + code = MINUS; + optab = atomic_sub_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_AND_FETCH_CMP_0: + code = AND; + optab = atomic_and_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_OR_FETCH_CMP_0: + code = IOR; + optab = atomic_or_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_XOR_FETCH_CMP_0: + code = XOR; + optab = atomic_xor_fetch_cmp_0_optab; + break; + default: + gcc_unreachable (); + } + + enum rtx_code comp = UNKNOWN; + switch (tree_to_uhwi (cmp)) + { + case ATOMIC_OP_FETCH_CMP_0_EQ: comp = EQ; break; + case ATOMIC_OP_FETCH_CMP_0_NE: comp = NE; break; + case ATOMIC_OP_FETCH_CMP_0_GT: comp = GT; break; + case ATOMIC_OP_FETCH_CMP_0_GE: comp = GE; break; + case ATOMIC_OP_FETCH_CMP_0_LT: comp = LT; break; + case ATOMIC_OP_FETCH_CMP_0_LE: comp = LE; break; + default: gcc_unreachable (); + } + + rtx target; + if (lhs == NULL_TREE) + target = gen_reg_rtx (TYPE_MODE (boolean_type_node)); + else + target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + enum insn_code icode = direct_optab_handler (optab, mode); + gcc_assert (icode != 
CODE_FOR_nothing); + create_output_operand (&ops[0], target, TYPE_MODE (boolean_type_node)); + create_fixed_operand (&ops[1], mem); + create_convert_operand_to (&ops[2], op, mode, true); + create_integer_operand (&ops[3], model); + create_integer_operand (&ops[4], comp); + if (maybe_expand_insn (icode, 5, ops)) + return; + + rtx result = expand_atomic_fetch_op (gen_reg_rtx (mode), mem, op, + code, model, true); + if (lhs) + { + result = emit_store_flag_force (target, comp, result, const0_rtx, mode, + 0, 1); + if (result != target) + emit_move_insn (target, result); + } +} + /* Expand an atomic clear operation. void _atomic_clear (BOOL *obj, enum memmodel) EXP is the call expression. */ diff --git a/gcc/builtins.h b/gcc/builtins.h index 307a20fbadb..a395d53ec99 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -110,8 +110,10 @@ extern void expand_builtin_update_setjmp_buf (rtx); extern tree mathfn_built_in (tree, enum built_in_function fn); extern tree mathfn_built_in (tree, combined_fn); extern tree mathfn_built_in_type (combined_fn); -extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode); -extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode); +extern rtx builtin_strncpy_read_str (void *, void *, HOST_WIDE_INT, + fixed_size_mode); +extern rtx builtin_memset_read_str (void *, void *, HOST_WIDE_INT, + fixed_size_mode); extern rtx expand_builtin_saveregs (void); extern tree std_build_builtin_va_list (void); extern tree std_fn_abi_va_list (tree); @@ -120,6 +122,7 @@ extern void std_expand_builtin_va_start (tree, rtx); extern void expand_builtin_trap (void); extern void expand_ifn_atomic_bit_test_and (gcall *); extern void expand_ifn_atomic_compare_exchange (gcall *); +extern void expand_ifn_atomic_op_fetch_cmp_0 (gcall *); extern rtx expand_builtin (tree, rtx, rtx, machine_mode, int); extern enum built_in_function builtin_mathfn_code (const_tree); extern tree fold_builtin_expect (location_t, tree, tree, tree, tree); diff --git a/gcc/calls.c b/gcc/calls.c index 7d908c6a62b..32ba27e755f 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -3731,7 +3731,7 @@ expand_call (tree exp, rtx target, int ignore) So the entire argument block must then be preallocated (i.e., we ignore PUSH_ROUNDING in that case). */ - int must_preallocate = !PUSH_ARGS; + int must_preallocate = !targetm.calls.push_argument (0); /* Size of the stack reserved for parameter registers. */ int reg_parm_stack_space = 0; @@ -3839,7 +3839,7 @@ expand_call (tree exp, rtx target, int ignore) #endif if (! OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype : TREE_TYPE (fndecl))) - && reg_parm_stack_space > 0 && PUSH_ARGS) + && reg_parm_stack_space > 0 && targetm.calls.push_argument (0)) must_preallocate = 1; /* Set up a place to return a structure. 
*/ @@ -5480,7 +5480,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, } else { - if (!PUSH_ARGS) + if (!targetm.calls.push_argument (0)) argblock = push_block (gen_int_mode (args_size.constant, Pmode), 0, 0); } diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index c2d57de6411..096de98ea79 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -1917,7 +1917,7 @@ const pta processor_alias_table[] = M_CPU_TYPE (INTEL_GOLDMONT), P_PROC_SSE4_2}, {"goldmont-plus", PROCESSOR_GOLDMONT_PLUS, CPU_GLM, PTA_GOLDMONT_PLUS, M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2}, - {"tremont", PROCESSOR_TREMONT, CPU_GLM, PTA_TREMONT, + {"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT, M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2}, {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL, M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F}, diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h index 4c5b19e262b..80195cea5b2 100644 --- a/gcc/config/bpf/bpf.h +++ b/gcc/config/bpf/bpf.h @@ -288,9 +288,6 @@ enum reg_class never used when passing arguments. However, we still have to define the constants below. */ -/* If nonzero, push insns will be used to pass outgoing arguments. */ -#define PUSH_ARGS 0 - /* If nonzero, function arguments will be evaluated from last to first, rather than from first to last. */ #define PUSH_ARGS_REVERSED 1 diff --git a/gcc/config/cr16/cr16.c b/gcc/config/cr16/cr16.c index 079706f7a91..75040fb2fa7 100644 --- a/gcc/config/cr16/cr16.c +++ b/gcc/config/cr16/cr16.c @@ -158,6 +158,8 @@ static void cr16_print_operand_address (FILE *, machine_mode, rtx); #define TARGET_CLASS_LIKELY_SPILLED_P cr16_class_likely_spilled_p /* Passing function arguments. */ +#undef TARGET_PUSH_ARGUMENT +#define TARGET_PUSH_ARGUMENT hook_bool_uint_true #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG cr16_function_arg #undef TARGET_FUNCTION_ARG_ADVANCE diff --git a/gcc/config/cr16/cr16.h b/gcc/config/cr16/cr16.h index ae90610ad80..a60d9a79b0b 100644 --- a/gcc/config/cr16/cr16.h +++ b/gcc/config/cr16/cr16.h @@ -379,8 +379,6 @@ enum reg_class #define ACCUMULATE_OUTGOING_ARGS 0 -#define PUSH_ARGS 1 - #define PUSH_ROUNDING(BYTES) cr16_push_rounding (BYTES) #ifndef CUMULATIVE_ARGS diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c index 4ba04403002..d8d53555f4d 100644 --- a/gcc/config/i386/gnu-property.c +++ b/gcc/config/i386/gnu-property.c @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see #include "tm.h" #include "output.h" #include "linux-common.h" +#include "i386-protos.h" static void emit_gnu_property (unsigned int type, unsigned int data) @@ -60,7 +61,9 @@ file_end_indicate_exec_stack_and_gnu_property (void) { file_end_indicate_exec_stack (); - if (flag_cf_protection == CF_NONE && !ix86_needed) + if (flag_cf_protection == CF_NONE + && !ix86_needed + && !ix86_has_no_direct_extern_access) return; unsigned int feature_1 = 0; @@ -121,4 +124,9 @@ file_end_indicate_exec_stack_and_gnu_property (void) /* Generate GNU_PROPERTY_X86_ISA_1_NEEDED. */ if (isa_1) emit_gnu_property (0xc0008002, isa_1); + + if (ix86_has_no_direct_extern_access) + /* Emite a GNU_PROPERTY_1_NEEDED note with + GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS. 
*/ + emit_gnu_property (0xb0008000, (1U << 0)); } diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 7721534751b..a439e42a12d 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -190,6 +190,82 @@ ix86_expand_clear (rtx dest) emit_insn (tmp); } +/* Return true if V can be broadcasted from an integer of WIDTH bits + which is returned in VAL_BROADCAST. Otherwise, return false. */ + +static bool +ix86_broadcast (HOST_WIDE_INT v, unsigned int width, + HOST_WIDE_INT &val_broadcast) +{ + wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT); + val_broadcast = wi::extract_uhwi (val, 0, width); + for (unsigned int i = width; i < HOST_BITS_PER_WIDE_INT; i += width) + { + HOST_WIDE_INT each = wi::extract_uhwi (val, i, width); + if (val_broadcast != each) + return false; + } + val_broadcast = sext_hwi (val_broadcast, width); + return true; +} + +/* Convert the CONST_WIDE_INT operand OP to broadcast in MODE. */ + +static rtx +ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op) +{ + /* Don't use integer vector broadcast if we can't move from GPR to SSE + register directly. */ + if (!TARGET_INTER_UNIT_MOVES_TO_VEC) + return nullptr; + + /* Convert CONST_WIDE_INT to a non-standard SSE constant integer + broadcast only if vector broadcast is available. */ + if (!TARGET_AVX + || !CONST_WIDE_INT_P (op) + || standard_sse_constant_p (op, mode)) + return nullptr; + + HOST_WIDE_INT val = CONST_WIDE_INT_ELT (op, 0); + HOST_WIDE_INT val_broadcast; + scalar_int_mode broadcast_mode; + if (TARGET_AVX2 + && ix86_broadcast (val, GET_MODE_BITSIZE (QImode), + val_broadcast)) + broadcast_mode = QImode; + else if (TARGET_AVX2 + && ix86_broadcast (val, GET_MODE_BITSIZE (HImode), + val_broadcast)) + broadcast_mode = HImode; + else if (ix86_broadcast (val, GET_MODE_BITSIZE (SImode), + val_broadcast)) + broadcast_mode = SImode; + else if (TARGET_64BIT + && ix86_broadcast (val, GET_MODE_BITSIZE (DImode), + val_broadcast)) + broadcast_mode = DImode; + else + return nullptr; + + /* Check if OP can be broadcasted from VAL. */ + for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++) + if (val != CONST_WIDE_INT_ELT (op, i)) + return nullptr; + + unsigned int nunits = (GET_MODE_SIZE (mode) + / GET_MODE_SIZE (broadcast_mode)); + machine_mode vector_mode; + if (!mode_for_vector (broadcast_mode, nunits).exists (&vector_mode)) + gcc_unreachable (); + rtx target = ix86_gen_scratch_sse_rtx (vector_mode); + bool ok = ix86_expand_vector_init_duplicate (false, vector_mode, + target, + GEN_INT (val_broadcast)); + gcc_assert (ok); + target = lowpart_subreg (mode, target, vector_mode); + return target; +} + void ix86_expand_move (machine_mode mode, rtx operands[]) { @@ -347,20 +423,29 @@ ix86_expand_move (machine_mode mode, rtx operands[]) && optimize) op1 = copy_to_mode_reg (mode, op1); - if (can_create_pseudo_p () - && CONST_DOUBLE_P (op1)) + if (can_create_pseudo_p ()) { - /* If we are loading a floating point constant to a register, - force the value to memory now, since we'll get better code - out the back end. */ + if (CONST_DOUBLE_P (op1)) + { + /* If we are loading a floating point constant to a + register, force the value to memory now, since we'll + get better code out the back end. 
*/ - op1 = validize_mem (force_const_mem (mode, op1)); - if (!register_operand (op0, mode)) + op1 = validize_mem (force_const_mem (mode, op1)); + if (!register_operand (op0, mode)) + { + rtx temp = gen_reg_rtx (mode); + emit_insn (gen_rtx_SET (temp, op1)); + emit_move_insn (op0, temp); + return; + } + } + else if (GET_MODE_SIZE (mode) >= 16) { - rtx temp = gen_reg_rtx (mode); - emit_insn (gen_rtx_SET (temp, op1)); - emit_move_insn (op0, temp); - return; + rtx tmp = ix86_convert_const_wide_int_to_broadcast + (GET_MODE (op0), op1); + if (tmp != nullptr) + op1 = tmp; } } } @@ -368,6 +453,70 @@ ix86_expand_move (machine_mode mode, rtx operands[]) emit_insn (gen_rtx_SET (op0, op1)); } +/* OP is a memref of CONST_VECTOR, return scalar constant mem + if CONST_VECTOR is a vec_duplicate, else return NULL. */ +static rtx +ix86_broadcast_from_constant (machine_mode mode, rtx op) +{ + int nunits = GET_MODE_NUNITS (mode); + if (nunits < 2) + return nullptr; + + /* Don't use integer vector broadcast if we can't move from GPR to SSE + register directly. */ + if (!TARGET_INTER_UNIT_MOVES_TO_VEC + && INTEGRAL_MODE_P (mode)) + return nullptr; + + /* Convert CONST_VECTOR to a non-standard SSE constant integer + broadcast only if vector broadcast is available. */ + if (!(TARGET_AVX2 + || (TARGET_AVX + && (GET_MODE_INNER (mode) == SImode + || GET_MODE_INNER (mode) == DImode)) + || FLOAT_MODE_P (mode)) + || standard_sse_constant_p (op, mode)) + return nullptr; + + /* Don't broadcast from a 64-bit integer constant in 32-bit mode. + We can still put 64-bit integer constant in memory when + avx512 embed broadcast is available. */ + if (GET_MODE_INNER (mode) == DImode && !TARGET_64BIT + && (!TARGET_AVX512F + || (GET_MODE_SIZE (mode) < 64 && !TARGET_AVX512VL))) + return nullptr; + + if (GET_MODE_INNER (mode) == TImode) + return nullptr; + + rtx constant = get_pool_constant (XEXP (op, 0)); + if (GET_CODE (constant) != CONST_VECTOR) + return nullptr; + + /* There could be some rtx like + (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1"))) + but with "*.LC1" refer to V2DI constant vector. */ + if (GET_MODE (constant) != mode) + { + constant = simplify_subreg (mode, constant, GET_MODE (constant), + 0); + if (constant == nullptr || GET_CODE (constant) != CONST_VECTOR) + return nullptr; + } + + rtx first = XVECEXP (constant, 0, 0); + + for (int i = 1; i < nunits; ++i) + { + rtx tmp = XVECEXP (constant, 0, i); + /* Vector duplicate value. */ + if (!rtx_equal_p (tmp, first)) + return nullptr; + } + + return first; +} + void ix86_expand_vector_move (machine_mode mode, rtx operands[]) { @@ -407,7 +556,39 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) op1 = simplify_gen_subreg (mode, r, imode, SUBREG_BYTE (op1)); } else - op1 = validize_mem (force_const_mem (mode, op1)); + { + machine_mode mode = GET_MODE (op0); + rtx tmp = ix86_convert_const_wide_int_to_broadcast + (mode, op1); + if (tmp == nullptr) + op1 = validize_mem (force_const_mem (mode, op1)); + else + op1 = tmp; + } + } + + if (can_create_pseudo_p () + && GET_MODE_SIZE (mode) >= 16 + && VECTOR_MODE_P (mode) + && (MEM_P (op1) + && SYMBOL_REF_P (XEXP (op1, 0)) + && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0)))) + { + rtx first = ix86_broadcast_from_constant (mode, op1); + if (first != nullptr) + { + /* Broadcast to XMM/YMM/ZMM register from an integer + constant or scalar mem. 
*/ + op1 = gen_reg_rtx (mode); + if (FLOAT_MODE_P (mode) + || (!TARGET_64BIT && GET_MODE_INNER (mode) == DImode)) + first = force_const_mem (GET_MODE_INNER (mode), first); + bool ok = ix86_expand_vector_init_duplicate (false, mode, + op1, first); + gcc_assert (ok); + emit_move_insn (op0, op1); + return; + } } /* We need to check memory alignment for SSE mode since attribute @@ -423,7 +604,11 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) arguments in memory. */ if (!register_operand (op0, mode) && !register_operand (op1, mode)) - op1 = force_reg (mode, op1); + { + rtx scratch = ix86_gen_scratch_sse_rtx (mode); + emit_move_insn (scratch, op1); + op1 = scratch; + } tmp[0] = op0; tmp[1] = op1; ix86_expand_vector_move_misalign (mode, tmp); @@ -435,7 +620,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) && !register_operand (op0, mode) && !register_operand (op1, mode)) { - emit_move_insn (op0, force_reg (GET_MODE (op0), op1)); + rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); + emit_move_insn (tmp, op1); + emit_move_insn (op0, tmp); return; } @@ -1871,13 +2058,9 @@ void ix86_expand_copysign (rtx operands[]) { machine_mode mode, vmode; - rtx dest, op0, op1, mask; - - dest = operands[0]; - op0 = operands[1]; - op1 = operands[2]; + rtx dest, op0, op1, mask, op2, op3; - mode = GET_MODE (dest); + mode = GET_MODE (operands[0]); if (mode == SFmode) vmode = V4SFmode; @@ -1888,136 +2071,40 @@ ix86_expand_copysign (rtx operands[]) else gcc_unreachable (); - mask = ix86_build_signbit_mask (vmode, 0, 0); - - if (CONST_DOUBLE_P (op0)) - { - if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0))) - op0 = simplify_unary_operation (ABS, mode, op0, mode); - - if (mode == SFmode || mode == DFmode) - { - if (op0 == CONST0_RTX (mode)) - op0 = CONST0_RTX (vmode); - else - { - rtx v = ix86_build_const_vector (vmode, false, op0); - - op0 = force_reg (vmode, v); - } - } - else if (op0 != CONST0_RTX (mode)) - op0 = force_reg (mode, op0); - - emit_insn (gen_copysign3_const (mode, dest, op0, op1, mask)); - } - else + if (rtx_equal_p (operands[1], operands[2])) { - rtx nmask = ix86_build_signbit_mask (vmode, 0, 1); - - emit_insn (gen_copysign3_var - (mode, dest, NULL_RTX, op0, op1, nmask, mask)); - } -} - -/* Deconstruct a copysign operation into bit masks. Operand 0 is known to - be a constant, and so has already been expanded into a vector constant. */ - -void -ix86_split_copysign_const (rtx operands[]) -{ - machine_mode mode, vmode; - rtx dest, op0, mask, x; - - dest = operands[0]; - op0 = operands[1]; - mask = operands[3]; - - mode = GET_MODE (dest); - vmode = GET_MODE (mask); - - dest = lowpart_subreg (vmode, dest, mode); - x = gen_rtx_AND (vmode, dest, mask); - emit_insn (gen_rtx_SET (dest, x)); - - if (op0 != CONST0_RTX (vmode)) - { - x = gen_rtx_IOR (vmode, dest, op0); - emit_insn (gen_rtx_SET (dest, x)); - } -} - -/* Deconstruct a copysign operation into bit masks. Operand 0 is variable, - so we have to do two masks. */ - -void -ix86_split_copysign_var (rtx operands[]) -{ - machine_mode mode, vmode; - rtx dest, scratch, op0, op1, mask, nmask, x; - - dest = operands[0]; - scratch = operands[1]; - op0 = operands[2]; - op1 = operands[3]; - nmask = operands[4]; - mask = operands[5]; - - mode = GET_MODE (dest); - vmode = GET_MODE (mask); - - if (rtx_equal_p (op0, op1)) - { - /* Shouldn't happen often (it's useless, obviously), but when it does - we'd generate incorrect code if we continue below. 
*/ - emit_move_insn (dest, op0); + emit_move_insn (operands[0], operands[1]); return; } - if (REG_P (mask) && REGNO (dest) == REGNO (mask)) /* alternative 0 */ - { - gcc_assert (REGNO (op1) == REGNO (scratch)); - - x = gen_rtx_AND (vmode, scratch, mask); - emit_insn (gen_rtx_SET (scratch, x)); + dest = lowpart_subreg (vmode, operands[0], mode); + op1 = lowpart_subreg (vmode, operands[2], mode); + mask = ix86_build_signbit_mask (vmode, 0, 0); - dest = mask; - op0 = lowpart_subreg (vmode, op0, mode); - x = gen_rtx_NOT (vmode, dest); - x = gen_rtx_AND (vmode, x, op0); - emit_insn (gen_rtx_SET (dest, x)); - } - else + if (CONST_DOUBLE_P (operands[1])) { - if (REGNO (op1) == REGNO (scratch)) /* alternative 1,3 */ - { - x = gen_rtx_AND (vmode, scratch, mask); - } - else /* alternative 2,4 */ + op0 = simplify_unary_operation (ABS, mode, operands[1], mode); + /* Optimize for 0, simplify b = copy_signf (0.0f, a) to b = mask & a. */ + if (op0 == CONST0_RTX (mode)) { - gcc_assert (REGNO (mask) == REGNO (scratch)); - op1 = lowpart_subreg (vmode, op1, mode); - x = gen_rtx_AND (vmode, scratch, op1); + emit_move_insn (dest, gen_rtx_AND (vmode, mask, op1)); + return; } - emit_insn (gen_rtx_SET (scratch, x)); - if (REGNO (op0) == REGNO (dest)) /* alternative 1,2 */ - { - dest = lowpart_subreg (vmode, op0, mode); - x = gen_rtx_AND (vmode, dest, nmask); - } - else /* alternative 3,4 */ - { - gcc_assert (REGNO (nmask) == REGNO (dest)); - dest = nmask; - op0 = lowpart_subreg (vmode, op0, mode); - x = gen_rtx_AND (vmode, dest, op0); - } - emit_insn (gen_rtx_SET (dest, x)); + if (GET_MODE_SIZE (mode) < 16) + op0 = ix86_build_const_vector (vmode, false, op0); + op0 = force_reg (vmode, op0); } - - x = gen_rtx_IOR (vmode, dest, scratch); - emit_insn (gen_rtx_SET (dest, x)); + else + op0 = lowpart_subreg (vmode, operands[1], mode); + + op2 = gen_reg_rtx (vmode); + op3 = gen_reg_rtx (vmode); + emit_move_insn (op2, gen_rtx_AND (vmode, + gen_rtx_NOT (vmode, mask), + op0)); + emit_move_insn (op3, gen_rtx_AND (vmode, mask, op1)); + emit_move_insn (dest, gen_rtx_IOR (vmode, op2, op3)); } /* Expand an xorsign operation. */ @@ -2026,7 +2113,7 @@ void ix86_expand_xorsign (rtx operands[]) { machine_mode mode, vmode; - rtx dest, op0, op1, mask; + rtx dest, op0, op1, mask, x, temp; dest = operands[0]; op0 = operands[1]; @@ -2041,32 +2128,17 @@ ix86_expand_xorsign (rtx operands[]) else gcc_unreachable (); + temp = gen_reg_rtx (vmode); mask = ix86_build_signbit_mask (vmode, 0, 0); - emit_insn (gen_xorsign3_1 (mode, dest, op0, op1, mask)); -} - -/* Deconstruct an xorsign operation into bit masks. */ - -void -ix86_split_xorsign (rtx operands[]) -{ - machine_mode mode, vmode; - rtx dest, op0, mask, x; - - dest = operands[0]; - op0 = operands[1]; - mask = operands[3]; + op1 = lowpart_subreg (vmode, op1, mode); + x = gen_rtx_AND (vmode, op1, mask); + emit_insn (gen_rtx_SET (temp, x)); - mode = GET_MODE (dest); - vmode = GET_MODE (mask); + op0 = lowpart_subreg (vmode, op0, mode); + x = gen_rtx_XOR (vmode, temp, op0); dest = lowpart_subreg (vmode, dest, mode); - x = gen_rtx_AND (vmode, dest, mask); - emit_insn (gen_rtx_SET (dest, x)); - - op0 = lowpart_subreg (vmode, op0, mode); - x = gen_rtx_XOR (vmode, dest, op0); emit_insn (gen_rtx_SET (dest, x)); } @@ -8077,7 +8149,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, it an indirect call. 
*/ if (flag_pic && GET_CODE (addr) == SYMBOL_REF - && !SYMBOL_REF_LOCAL_P (addr)) + && ix86_call_use_plt_p (addr)) { if (flag_plt && (SYMBOL_REF_DECL (addr) == NULL_TREE @@ -11142,6 +11214,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, char *opts = ix86_target_string (bisa, bisa2, 0, 0, NULL, NULL, (enum fpmath_unit) 0, (enum prefer_vector_width) 0, + PVW_NONE, PVW_NONE, false, add_abi_p); if (!opts) error ("%qE needs unknown isa option", fndecl); @@ -13604,7 +13677,7 @@ static bool expand_vec_perm_1 (struct expand_vec_perm_d *d); /* A subroutine of ix86_expand_vector_init. Store into TARGET a vector with all elements equal to VAR. Return true if successful. */ -static bool +bool ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode, rtx target, rtx val) { @@ -21088,4 +21161,85 @@ ix86_expand_divmod_libfunc (rtx libfunc, machine_mode mode, *rem_p = rem; } +void ix86_expand_atomic_fetch_op_loop (rtx target, rtx mem, rtx val, + enum rtx_code code, bool after, + bool doubleword) +{ + rtx old_reg, new_reg, old_mem, success, oldval, new_mem; + rtx_code_label *loop_label, *pause_label, *done_label; + machine_mode mode = GET_MODE (target); + + old_reg = gen_reg_rtx (mode); + new_reg = old_reg; + loop_label = gen_label_rtx (); + pause_label = gen_label_rtx (); + done_label = gen_label_rtx (); + old_mem = copy_to_reg (mem); + emit_label (loop_label); + emit_move_insn (old_reg, old_mem); + + /* return value for atomic_fetch_op. */ + if (!after) + emit_move_insn (target, old_reg); + + if (code == NOT) + { + new_reg = expand_simple_binop (mode, AND, new_reg, val, NULL_RTX, + true, OPTAB_LIB_WIDEN); + new_reg = expand_simple_unop (mode, code, new_reg, NULL_RTX, true); + } + else + new_reg = expand_simple_binop (mode, code, new_reg, val, NULL_RTX, + true, OPTAB_LIB_WIDEN); + + /* return value for atomic_op_fetch. */ + if (after) + emit_move_insn (target, new_reg); + + /* Load memory again inside loop. */ + new_mem = copy_to_reg (mem); + /* Compare mem value with expected value. */ + + if (doubleword) + { + machine_mode half_mode = (mode == DImode)? SImode : DImode; + rtx low_new_mem = gen_lowpart (half_mode, new_mem); + rtx low_old_mem = gen_lowpart (half_mode, old_mem); + rtx high_new_mem = gen_highpart (half_mode, new_mem); + rtx high_old_mem = gen_highpart (half_mode, old_mem); + emit_cmp_and_jump_insns (low_new_mem, low_old_mem, NE, NULL_RTX, + half_mode, 1, pause_label, + profile_probability::guessed_never ()); + emit_cmp_and_jump_insns (high_new_mem, high_old_mem, NE, NULL_RTX, + half_mode, 1, pause_label, + profile_probability::guessed_never ()); + } + else + emit_cmp_and_jump_insns (new_mem, old_mem, NE, NULL_RTX, + GET_MODE (old_mem), 1, pause_label, + profile_probability::guessed_never ()); + + success = NULL_RTX; + oldval = old_mem; + expand_atomic_compare_and_swap (&success, &oldval, mem, old_reg, + new_reg, false, MEMMODEL_SYNC_SEQ_CST, + MEMMODEL_RELAXED); + if (oldval != old_mem) + emit_move_insn (old_mem, oldval); + + emit_cmp_and_jump_insns (success, const0_rtx, EQ, const0_rtx, + GET_MODE (success), 1, loop_label, + profile_probability::guessed_never ()); + + emit_jump_insn (gen_jump (done_label)); + emit_barrier (); + + /* If mem is not expected, pause and loop back. 
*/ + emit_label (pause_label); + emit_insn (gen_pause ()); + emit_jump_insn (gen_jump (loop_label)); + emit_barrier (); + emit_label (done_label); +} + #include "gt-i386-expand.h" diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c index 77783a154b6..47736f74a00 100644 --- a/gcc/config/i386/i386-features.c +++ b/gcc/config/i386/i386-features.c @@ -2174,83 +2174,8 @@ make_pass_insert_endbr_and_patchable_area (gcc::context *ctxt) return new pass_insert_endbr_and_patchable_area (ctxt); } -/* Replace all one-value const vector that are referenced by SYMBOL_REFs in x - with embedded broadcast. i.e.transform - - vpaddq .LC0(%rip), %zmm0, %zmm0 - ret - .LC0: - .quad 3 - .quad 3 - .quad 3 - .quad 3 - .quad 3 - .quad 3 - .quad 3 - .quad 3 - - to - - vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0 - ret - .LC0: - .quad 3 */ -static void -replace_constant_pool_with_broadcast (rtx_insn *insn) -{ - subrtx_ptr_iterator::array_type array; - FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL) - { - rtx *loc = *iter; - rtx x = *loc; - rtx broadcast_mem, vec_dup, constant, first; - machine_mode mode; - - /* Constant pool. */ - if (!MEM_P (x) - || !SYMBOL_REF_P (XEXP (x, 0)) - || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0))) - continue; - - /* Const vector. */ - mode = GET_MODE (x); - if (!VECTOR_MODE_P (mode)) - return; - constant = get_pool_constant (XEXP (x, 0)); - if (GET_CODE (constant) != CONST_VECTOR) - return; - - /* There could be some rtx like - (mem/u/c:V16QI (symbol_ref/u:DI ("*.LC1"))) - but with "*.LC1" refer to V2DI constant vector. */ - if (GET_MODE (constant) != mode) - { - constant = simplify_subreg (mode, constant, GET_MODE (constant), 0); - if (constant == NULL_RTX || GET_CODE (constant) != CONST_VECTOR) - return; - } - first = XVECEXP (constant, 0, 0); - - for (int i = 1; i < GET_MODE_NUNITS (mode); ++i) - { - rtx tmp = XVECEXP (constant, 0, i); - /* Vector duplicate value. */ - if (!rtx_equal_p (tmp, first)) - return; - } - - /* Replace with embedded broadcast. */ - broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first); - vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem); - validate_change (insn, loc, vec_dup, 0); - - /* At most 1 memory_operand in an insn. */ - return; - } -} - /* At entry of the nearest common dominator for basic blocks with - conversions, generate a single + conversions/rcp/sqrt/rsqrt/round, generate a single vxorps %xmmN, %xmmN, %xmmN for all vcvtss2sd op, %xmmN, %xmmX @@ -2287,10 +2212,6 @@ remove_partial_avx_dependency (void) if (!NONDEBUG_INSN_P (insn)) continue; - /* Handle AVX512 embedded broadcast here to save compile time. */ - if (TARGET_AVX512F) - replace_constant_pool_with_broadcast (insn); - set = single_set (insn); if (!set) continue; @@ -2299,15 +2220,51 @@ remove_partial_avx_dependency (void) != AVX_PARTIAL_XMM_UPDATE_TRUE) continue; - if (!v4sf_const0) - v4sf_const0 = gen_reg_rtx (V4SFmode); - /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF, - SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and - vec_merge with subreg. */ + SI -> SF, SI -> DF, DI -> SF, DI -> DF, sqrt, rsqrt, rcp, + round, to vec_dup and vec_merge with subreg. */ rtx src = SET_SRC (set); rtx dest = SET_DEST (set); machine_mode dest_mode = GET_MODE (dest); + bool convert_p = false; + switch (GET_CODE (src)) + { + case FLOAT: + case FLOAT_EXTEND: + case FLOAT_TRUNCATE: + case UNSIGNED_FLOAT: + convert_p = true; + break; + default: + break; + } + + /* Only hanlde conversion here. */ + machine_mode src_mode + = convert_p ? 
GET_MODE (XEXP (src, 0)) : VOIDmode; + switch (src_mode) + { + case E_SFmode: + case E_DFmode: + if (TARGET_USE_VECTOR_FP_CONVERTS + || !TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY) + continue; + break; + case E_SImode: + case E_DImode: + if (TARGET_USE_VECTOR_CONVERTS + || !TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY) + continue; + break; + case E_VOIDmode: + gcc_assert (!convert_p); + break; + default: + gcc_unreachable (); + } + + if (!v4sf_const0) + v4sf_const0 = gen_reg_rtx (V4SFmode); rtx zero; machine_mode dest_vecmode; @@ -2422,16 +2379,6 @@ remove_partial_avx_dependency (void) return 0; } -static bool -remove_partial_avx_dependency_gate () -{ - return (TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY - && TARGET_SSE_MATH - && optimize - && optimize_function_for_speed_p (cfun)); -} - namespace { const pass_data pass_data_remove_partial_avx_dependency = @@ -2457,7 +2404,11 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return remove_partial_avx_dependency_gate (); + return (TARGET_AVX + && TARGET_SSE_PARTIAL_REG_DEPENDENCY + && TARGET_SSE_MATH + && optimize + && optimize_function_for_speed_p (cfun)); } virtual unsigned int execute (function *) @@ -2474,68 +2425,6 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt) return new pass_remove_partial_avx_dependency (ctxt); } -/* For const vector having one duplicated value, there's no need to put - whole vector in the constant pool when target supports embedded broadcast. */ -static unsigned int -constant_pool_broadcast (void) -{ - timevar_push (TV_MACH_DEP); - rtx_insn *insn; - - for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) - { - if (INSN_P (insn)) - replace_constant_pool_with_broadcast (insn); - } - timevar_pop (TV_MACH_DEP); - return 0; -} - -namespace { - -const pass_data pass_data_constant_pool_broadcast = -{ - RTL_PASS, /* type */ - "cpb", /* name */ - OPTGROUP_NONE, /* optinfo_flags */ - TV_MACH_DEP, /* tv_id */ - 0, /* properties_required */ - 0, /* properties_provided */ - 0, /* properties_destroyed */ - 0, /* todo_flags_start */ - TODO_df_finish, /* todo_flags_finish */ -}; - -class pass_constant_pool_broadcast : public rtl_opt_pass -{ -public: - pass_constant_pool_broadcast (gcc::context *ctxt) - : rtl_opt_pass (pass_data_constant_pool_broadcast, ctxt) - {} - - /* opt_pass methods: */ - virtual bool gate (function *) - { - /* Return false if rpad pass gate is true. - replace_constant_pool_with_broadcast is called - from both this pass and rpad pass. */ - return (TARGET_AVX512F && !remove_partial_avx_dependency_gate ()); - } - - virtual unsigned int execute (function *) - { - return constant_pool_broadcast (); - } -}; // class pass_cpb - -} // anon namespace - -rtl_opt_pass * -make_pass_constant_pool_broadcast (gcc::context *ctxt) -{ - return new pass_constant_pool_broadcast (ctxt); -} - /* This compares the priority of target features in function DECL1 and DECL2. It returns positive value if DECL1 is higher priority, negative value if DECL2 is higher priority and 0 if they are the diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def index dbddfd8e48f..4e7014be034 100644 --- a/gcc/config/i386/i386-modes.def +++ b/gcc/config/i386/i386-modes.def @@ -107,19 +107,10 @@ INT_MODE (XI, 64); PARTIAL_INT_MODE (HI, 16, P2QI); PARTIAL_INT_MODE (SI, 32, P2HI); -/* Mode used for signed overflow checking of TImode. 
As - MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that - rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc., - so OImode is too large. For the overflow checking we actually need - just 1 or 2 bits beyond TImode precision. Use 160 bits to have - a multiple of 32. */ +/* Mode used for signed overflow checking of TImode. For the overflow + checking we actually need just 1 or 2 bits beyond TImode precision. + Use 160 bits to have a multiple of 32. */ PARTIAL_INT_MODE (OI, 160, POI); -/* Keep the OI and XI modes from confusing the compiler into thinking - that these modes could actually be used for computation. They are - only holders for vectors during data movement. Include POImode precision - though. */ -#define MAX_BITSIZE_MODE_ANY_INT (160) - /* The symbol Pmode stands for one of the above machine modes (usually SImode). The tm.h file specifies which one. It is not a distinct mode. */ diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 19632b5fd6b..9f838862ee2 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -131,7 +131,7 @@ along with GCC; see the file COPYING3. If not see | m_ICELAKE_CLIENT | m_ICELAKE_SERVER | m_CASCADELAKE \ | m_TIGERLAKE | m_COOPERLAKE | m_SAPPHIRERAPIDS \ | m_ROCKETLAKE) -#define m_CORE_AVX2 (m_HASWELL | m_SKYLAKE | m_ALDERLAKE | m_CORE_AVX512) +#define m_CORE_AVX2 (m_HASWELL | m_SKYLAKE | m_CORE_AVX512) #define m_CORE_ALL (m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2) #define m_GOLDMONT (HOST_WIDE_INT_1U<x_ix86_tune_string; enum fpmath_unit orig_fpmath_set = opts_set->x_ix86_fpmath; enum prefer_vector_width orig_pvw_set = opts_set->x_prefer_vector_width_type; + enum prefer_vector_width orig_ix86_move_max_set + = opts_set->x_ix86_move_max; + enum prefer_vector_width orig_ix86_store_max_set + = opts_set->x_ix86_store_max; int orig_tune_defaulted = ix86_tune_defaulted; int orig_arch_specified = ix86_arch_specified; char *option_strings[IX86_FUNCTION_SPECIFIC_MAX] = { NULL, NULL }; @@ -1382,6 +1408,8 @@ ix86_valid_target_attribute_tree (tree fndecl, tree args, opts->x_ix86_tune_string = orig_tune_string; opts_set->x_ix86_fpmath = orig_fpmath_set; opts_set->x_prefer_vector_width_type = orig_pvw_set; + opts_set->x_ix86_move_max = orig_ix86_move_max_set; + opts_set->x_ix86_store_max = orig_ix86_store_max_set; opts->x_ix86_excess_precision = orig_ix86_excess_precision; opts->x_ix86_unsafe_math_optimizations = orig_ix86_unsafe_math_optimizations; @@ -2882,6 +2910,48 @@ ix86_option_override_internal (bool main_args_p, && (opts_set->x_prefer_vector_width_type == PVW_NONE)) opts->x_prefer_vector_width_type = PVW_AVX256; + if (opts_set->x_ix86_move_max == PVW_NONE) + { + /* Set the maximum number of bits can be moved from memory to + memory efficiently. */ + if (ix86_tune_features[X86_TUNE_AVX512_MOVE_BY_PIECES]) + opts->x_ix86_move_max = PVW_AVX512; + else if (ix86_tune_features[X86_TUNE_AVX256_MOVE_BY_PIECES]) + opts->x_ix86_move_max = PVW_AVX256; + else + { + opts->x_ix86_move_max = opts->x_prefer_vector_width_type; + if (opts_set->x_ix86_move_max == PVW_NONE) + { + if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)) + opts->x_ix86_move_max = PVW_AVX512; + else + opts->x_ix86_move_max = PVW_AVX128; + } + } + } + + if (opts_set->x_ix86_store_max == PVW_NONE) + { + /* Set the maximum number of bits can be stored to memory + efficiently. 
*/ + if (ix86_tune_features[X86_TUNE_AVX512_STORE_BY_PIECES]) + opts->x_ix86_store_max = PVW_AVX512; + else if (ix86_tune_features[X86_TUNE_AVX256_STORE_BY_PIECES]) + opts->x_ix86_store_max = PVW_AVX256; + else + { + opts->x_ix86_store_max = opts->x_prefer_vector_width_type; + if (opts_set->x_ix86_store_max == PVW_NONE) + { + if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)) + opts->x_ix86_store_max = PVW_AVX512; + else + opts->x_ix86_store_max = PVW_AVX128; + } + } + } + if (opts->x_ix86_recip_name) { char *p = ASTRDUP (opts->x_ix86_recip_name); @@ -3894,6 +3964,36 @@ ix86_handle_fentry_name (tree *node, tree name, tree args, return NULL_TREE; } +/* Handle a "nodirect_extern_access" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_nodirect_extern_access_attribute (tree *pnode, tree name, + tree ARG_UNUSED (args), + int ARG_UNUSED (flags), + bool *no_add_attrs) +{ + tree node = *pnode; + + if (VAR_OR_FUNCTION_DECL_P (node)) + { + if ((!TREE_STATIC (node) && TREE_CODE (node) != FUNCTION_DECL + && !DECL_EXTERNAL (node)) || !TREE_PUBLIC (node)) + { + warning (OPT_Wattributes, + "%qE attribute have effect only on public objects", name); + *no_add_attrs = true; + } + } + else + { + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; + } + + return NULL_TREE; +} + /* Table of valid machine attributes. */ const struct attribute_spec ix86_attribute_table[] = { @@ -3974,6 +4074,8 @@ const struct attribute_spec ix86_attribute_table[] = ix86_handle_fentry_name, NULL }, { "cf_check", 0, 0, true, false, false, false, ix86_handle_fndecl_attribute, NULL }, + { "nodirect_extern_access", 0, 0, true, false, false, false, + handle_nodirect_extern_access_attribute, NULL }, /* End element. */ { NULL, 0, 0, false, false, false, false, NULL, NULL } diff --git a/gcc/config/i386/i386-options.h b/gcc/config/i386/i386-options.h index cdaca2644f4..e218e24d15b 100644 --- a/gcc/config/i386/i386-options.h +++ b/gcc/config/i386/i386-options.h @@ -26,8 +26,10 @@ char *ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2, int flags, int flags2, const char *arch, const char *tune, enum fpmath_unit fpmath, - enum prefer_vector_width pvw, bool add_nl_p, - bool add_abi_p); + enum prefer_vector_width pvw, + enum prefer_vector_width move_max, + enum prefer_vector_width store_max, + bool add_nl_p, bool add_abi_p); extern enum attr_cpu ix86_schedule; diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def index 44df00e94ac..29baf8acd0b 100644 --- a/gcc/config/i386/i386-passes.def +++ b/gcc/config/i386/i386-passes.def @@ -33,4 +33,3 @@ along with GCC; see the file COPYING3. 
If not see INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area); INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency); - INSERT_PASS_AFTER (pass_combine, 1, pass_constant_pool_broadcast); diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 941e91636d8..4fce0684ec8 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -50,6 +50,8 @@ extern void ix86_reset_previous_fndecl (void); extern bool ix86_using_red_zone (void); +extern rtx ix86_gen_scratch_sse_rtx (machine_mode); + extern unsigned int ix86_regmode_natural_size (machine_mode); #ifdef RTX_CODE extern int standard_80387_constant_p (rtx); @@ -77,7 +79,7 @@ extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool); extern bool constant_address_p (rtx); extern bool legitimate_pic_operand_p (rtx); extern bool legitimate_pic_address_disp_p (rtx); -extern bool ix86_force_load_from_GOT_p (rtx); +extern bool ix86_force_load_from_GOT_p (rtx, bool = false); extern void print_reg (rtx, int, FILE*); extern void ix86_print_operand (FILE *, rtx, int); @@ -132,10 +134,7 @@ extern void ix86_expand_fp_absneg_operator (enum rtx_code, machine_mode, extern void ix86_split_fp_absneg_operator (enum rtx_code, machine_mode, rtx[]); extern void ix86_expand_copysign (rtx []); -extern void ix86_split_copysign_const (rtx []); -extern void ix86_split_copysign_var (rtx []); extern void ix86_expand_xorsign (rtx []); -extern void ix86_split_xorsign (rtx []); extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[]); extern bool ix86_match_ccmode (rtx, machine_mode); extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx); @@ -152,6 +151,7 @@ extern void ix86_expand_sse_movcc (rtx, rtx, rtx, rtx); extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool); extern bool ix86_expand_int_addcc (rtx[]); extern rtx_insn *ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool); +extern bool ix86_call_use_plt_p (rtx); extern void ix86_split_call_vzeroupper (rtx, rtx); extern void x86_initialize_trampoline (rtx, rtx, rtx); extern rtx ix86_zero_extend_to_Pmode (rtx); @@ -216,6 +216,8 @@ extern rtx ix86_split_stack_guard (void); extern void ix86_move_vector_high_sse_to_mmx (rtx); extern void ix86_split_mmx_pack (rtx[], enum rtx_code); extern void ix86_split_mmx_punpck (rtx[], bool); +extern void ix86_expand_atomic_fetch_op_loop (rtx, rtx, rtx, enum rtx_code, + bool, bool); #ifdef TREE_CODE extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int); @@ -258,6 +260,8 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); extern void ix86_expand_sse2_abs (rtx, rtx); +extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx, + rtx); /* In i386-c.c */ extern void ix86_target_macros (void); @@ -393,4 +397,5 @@ extern rtl_opt_pass *make_pass_insert_endbr_and_patchable_area (gcc::context *); extern rtl_opt_pass *make_pass_remove_partial_avx_dependency (gcc::context *); -extern rtl_opt_pass *make_pass_constant_pool_broadcast (gcc::context *); + +extern bool ix86_has_no_direct_extern_access; diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 48300af9a09..5d3a5a5a43d 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -362,6 +362,9 @@ unsigned int ix86_default_incoming_stack_boundary; /* Alignment for incoming stack boundary in bits. 
*/ unsigned int ix86_incoming_stack_boundary; +/* True if there is no direct access to extern symbols. */ +bool ix86_has_no_direct_extern_access; + /* Calling abi specific va_list type nodes. */ tree sysv_va_list_type_node; tree ms_va_list_type_node; @@ -4199,6 +4202,18 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED) } } +/* Implement TARGET_PUSH_ARGUMENT. */ + +static bool +ix86_push_argument (unsigned int npush) +{ + /* If SSE2 is available, use vector move to put large argument onto + stack. NB: In 32-bit mode, use 8-byte vector move. */ + return ((!TARGET_SSE2 || npush < (TARGET_64BIT ? 16 : 8)) + && TARGET_PUSH_ARGS + && !ACCUMULATE_OUTGOING_ARGS); +} + /* Create the va_list data type. */ @@ -7953,8 +7968,17 @@ ix86_finalize_stack_frame_flags (void) assumed stack realignment might be needed or -fno-omit-frame-pointer is used, but in the end nothing that needed the stack alignment had been spilled nor stack access, clear frame_pointer_needed and say we - don't need stack realignment. */ - if ((stack_realign || (!flag_omit_frame_pointer && optimize)) + don't need stack realignment. + + When vector register is used for piecewise move and store, we don't + increase stack_alignment_needed as there is no register spill for + piecewise move and store. Since stack_realign_needed is set to true + by checking stack_alignment_estimated which is updated by pseudo + vector register usage, we also need to check stack_realign_needed to + eliminate frame pointer. */ + if ((stack_realign + || (!flag_omit_frame_pointer && optimize) + || crtl->stack_realign_needed) && frame_pointer_needed && crtl->is_leaf && crtl->sp_is_unchanging @@ -9252,12 +9276,15 @@ ix86_expand_epilogue (int style) rtx sa = EH_RETURN_STACKADJ_RTX; rtx_insn *insn; - /* %ecx can't be used for both DRAP register and eh_return. */ - if (crtl->drap_reg) - gcc_assert (REGNO (crtl->drap_reg) != CX_REG); + /* Stack realignment doesn't work with eh_return. */ + if (crtl->stack_realign_needed) + sorry ("Stack realignment not supported with " + "%<__builtin_eh_return%>"); /* regparm nested functions don't work with eh_return. */ - gcc_assert (!ix86_static_chain_on_stack); + if (ix86_static_chain_on_stack) + sorry ("regparm nested function not supported with " + "%<__builtin_eh_return%>"); if (frame_pointer_needed) { @@ -10311,13 +10338,17 @@ darwin_local_data_pic (rtx disp) } /* True if the function symbol operand X should be loaded from GOT. + If CALL_P is true, X is a call operand. + + NB: -mno-direct-extern-access doesn't force load from GOT for + call. NB: In 32-bit mode, only non-PIC is allowed in inline assembly statements, since a PIC register could not be available at the call site. 
*/ bool -ix86_force_load_from_GOT_p (rtx x) +ix86_force_load_from_GOT_p (rtx x, bool call_p) { return ((TARGET_64BIT || (!flag_pic && HAVE_AS_IX86_GOT32X)) && !TARGET_PECOFF && !TARGET_MACHO @@ -10325,11 +10356,16 @@ ix86_force_load_from_GOT_p (rtx x) && ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC && GET_CODE (x) == SYMBOL_REF - && SYMBOL_REF_FUNCTION_P (x) - && (!flag_plt - || (SYMBOL_REF_DECL (x) - && lookup_attribute ("noplt", - DECL_ATTRIBUTES (SYMBOL_REF_DECL (x))))) + && ((!call_p + && (!ix86_direct_extern_access + || (SYMBOL_REF_DECL (x) + && lookup_attribute ("nodirect_extern_access", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (x)))))) + || (SYMBOL_REF_FUNCTION_P (x) + && (!flag_plt + || (SYMBOL_REF_DECL (x) + && lookup_attribute ("noplt", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (x))))))) && !SYMBOL_REF_LOCAL_P (x)); } @@ -10417,7 +10453,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x) /* FALLTHRU */ case E_OImode: case E_XImode: - if (!standard_sse_constant_p (x, mode)) + if (!standard_sse_constant_p (x, mode) + && GET_MODE_SIZE (TARGET_AVX512F + ? XImode + : (TARGET_AVX + ? OImode + : (TARGET_SSE2 + ? TImode : DImode))) < GET_MODE_SIZE (mode)) return false; default: break; @@ -10590,7 +10632,11 @@ legitimate_pic_address_disp_p (rtx disp) } else if (!SYMBOL_REF_FAR_ADDR_P (op0) && (SYMBOL_REF_LOCAL_P (op0) - || (HAVE_LD_PIE_COPYRELOC + || ((ix86_direct_extern_access + && !(SYMBOL_REF_DECL (op0) + && lookup_attribute ("nodirect_extern_access", + DECL_ATTRIBUTES (SYMBOL_REF_DECL (op0))))) + && HAVE_LD_PIE_COPYRELOC && flag_pie && !SYMBOL_REF_WEAK (op0) && !SYMBOL_REF_FUNCTION_P (op0))) @@ -11997,7 +12043,7 @@ output_pic_addr_const (FILE *file, rtx x, int code) assemble_name (file, name); } if (!TARGET_MACHO && !(TARGET_64BIT && TARGET_PECOFF) - && code == 'P' && ! SYMBOL_REF_LOCAL_P (x)) + && code == 'P' && ix86_call_use_plt_p (x)) fputs ("@PLT", file); break; @@ -13523,7 +13569,7 @@ ix86_print_operand (FILE *file, rtx x, int code) if (code == 'P') { - if (ix86_force_load_from_GOT_p (x)) + if (ix86_force_load_from_GOT_p (x, true)) { /* For inline assembly statement, load function address from GOT with 'P' operand modifier to avoid PLT. */ @@ -14121,11 +14167,26 @@ ix86_check_avx_upper_register (const_rtx exp) && GET_MODE_BITSIZE (GET_MODE (exp)) > 128); } +/* Check if a 256bit or 512bit AVX register is referenced in stores. */ + +static void +ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data) +{ + if (ix86_check_avx_upper_register (dest)) + { + bool *used = (bool *) data; + *used = true; + } +} + /* Return needed mode for entity in optimize_mode_switching pass. */ static int ix86_avx_u128_mode_needed (rtx_insn *insn) { + if (DEBUG_INSN_P (insn)) + return AVX_U128_ANY; + if (CALL_P (insn)) { rtx link; @@ -14145,6 +14206,14 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) } } + /* Needed mode is set to AVX_U128_CLEAN if there are no 256bit + nor 512bit registers used in the function return register. */ + bool avx_upper_reg_found = false; + note_stores (insn, ix86_check_avx_upper_stores, + &avx_upper_reg_found); + if (avx_upper_reg_found) + return AVX_U128_DIRTY; + /* If the function is known to preserve some SSE registers, RA and previous passes can legitimately rely on that for modes wider than 256 bits. 
It's only safe to issue a @@ -14157,11 +14226,37 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) return AVX_U128_CLEAN; } + subrtx_iterator::array_type array; + + rtx set = single_set (insn); + if (set) + { + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + if (ix86_check_avx_upper_register (dest)) + { + /* This is an YMM/ZMM load. Return AVX_U128_DIRTY if the + source isn't zero. */ + if (standard_sse_constant_p (src, GET_MODE (dest)) != 1) + return AVX_U128_DIRTY; + else + return AVX_U128_ANY; + } + else + { + FOR_EACH_SUBRTX (iter, array, src, NONCONST) + if (ix86_check_avx_upper_register (*iter)) + return AVX_U128_DIRTY; + } + + /* This isn't YMM/ZMM load/store. */ + return AVX_U128_ANY; + } + /* Require DIRTY mode if a 256bit or 512bit AVX register is referenced. Hardware changes state only when a 256bit register is written to, but we need to prevent the compiler from moving optimal insertion point above eventual read from 256bit or 512 bit register. */ - subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, PATTERN (insn), NONCONST) if (ix86_check_avx_upper_register (*iter)) return AVX_U128_DIRTY; @@ -14245,18 +14340,6 @@ ix86_mode_needed (int entity, rtx_insn *insn) return 0; } -/* Check if a 256bit or 512bit AVX register is referenced in stores. */ - -static void -ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data) - { - if (ix86_check_avx_upper_register (dest)) - { - bool *used = (bool *) data; - *used = true; - } - } - /* Calculate mode of upper 128bit AVX registers after the insn. */ static int @@ -15705,6 +15788,26 @@ ix86_zero_extend_to_Pmode (rtx exp) return force_reg (Pmode, convert_to_mode (Pmode, exp, 1)); } +/* Return true if the function is called via PLT. */ + +bool +ix86_call_use_plt_p (rtx call_op) +{ + if (SYMBOL_REF_LOCAL_P (call_op)) + { + if (SYMBOL_REF_DECL (call_op)) + { + /* NB: All ifunc functions must be called via PLT. */ + cgraph_node *node + = cgraph_node::get (SYMBOL_REF_DECL (call_op)); + if (node && node->ifunc_resolver) + return true; + } + return false; + } + return true; +} + /* Return true if the function being called was marked with attribute "noplt" or using -fno-plt and we are compiling for non-PIC. We need to handle the non-PIC case in the backend because there is no easy @@ -16785,6 +16888,8 @@ ix86_sched_init_global (FILE *, int, int) case PROCESSOR_NEHALEM: case PROCESSOR_SANDYBRIDGE: case PROCESSOR_HASWELL: + case PROCESSOR_TREMONT: + case PROCESSOR_ALDERLAKE: case PROCESSOR_GENERIC: /* Do not perform multipass scheduling for pre-reload schedule to save compile time. */ @@ -20280,6 +20385,11 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, case UNSPEC: if (XINT (x, 1) == UNSPEC_TP) *total = 0; + else if (XINT(x, 1) == UNSPEC_VTERNLOG) + { + *total = cost->sse_op; + return true; + } return false; case VEC_SELECT: @@ -21828,10 +21938,10 @@ int asm_preferred_eh_data_format (int code, int global) { /* PE-COFF is effectively always -fPIC because of the .reloc section. 
*/ - if (flag_pic || TARGET_PECOFF) + if (flag_pic || TARGET_PECOFF || !ix86_direct_extern_access) { int type = DW_EH_PE_sdata8; - if (!TARGET_64BIT + if (ptr_mode == SImode || ix86_cmodel == CM_SMALL_PIC || (ix86_cmodel == CM_MEDIUM_PIC && (global || code))) type = DW_EH_PE_sdata4; @@ -22907,10 +23017,28 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update) static bool ix86_binds_local_p (const_tree exp) { - return default_binds_local_p_3 (exp, flag_shlib != 0, true, true, - (!flag_pic - || (TARGET_64BIT - && HAVE_LD_PIE_COPYRELOC != 0))); + bool direct_extern_access + = (ix86_direct_extern_access + && !(VAR_OR_FUNCTION_DECL_P (exp) + && lookup_attribute ("nodirect_extern_access", + DECL_ATTRIBUTES (exp)))); + if (!direct_extern_access) + ix86_has_no_direct_extern_access = true; + return default_binds_local_p_3 (exp, flag_shlib != 0, true, + direct_extern_access, + (direct_extern_access + && (!flag_pic + || (TARGET_64BIT + && HAVE_LD_PIE_COPYRELOC != 0)))); +} + +/* If flag_pic or ix86_direct_extern_access is false, then neither + local nor global relocs should be placed in readonly memory. */ + +static int +ix86_reloc_rw_mask (void) +{ + return (flag_pic || !ix86_direct_extern_access) ? 3 : 0; } #endif @@ -23046,6 +23174,15 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode, } } +/* Implement the TARGET_GEN_MEMSET_SCRATCH_RTX hook. Return a scratch + register in MODE for vector load and store. */ + +rtx +ix86_gen_scratch_sse_rtx (machine_mode mode) +{ + return gen_reg_rtx (mode); +} + /* Address space support. This is not "far pointers" in the 16-bit sense, but an easy way @@ -23576,6 +23713,9 @@ ix86_run_selftests (void) #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST ix86_address_cost +#undef TARGET_OVERLAP_OP_BY_PIECES_P +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true + #undef TARGET_FLAGS_REGNUM #define TARGET_FLAGS_REGNUM FLAGS_REG #undef TARGET_FIXED_CONDITION_CODE_REGS @@ -23625,6 +23765,8 @@ ix86_run_selftests (void) #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision #undef TARGET_PROMOTE_PROTOTYPES #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true +#undef TARGET_PUSH_ARGUMENT +#define TARGET_PUSH_ARGUMENT ix86_push_argument #undef TARGET_SETUP_INCOMING_VARARGS #define TARGET_SETUP_INCOMING_VARARGS ix86_setup_incoming_varargs #undef TARGET_MUST_PASS_IN_STACK @@ -23929,6 +24071,14 @@ ix86_run_selftests (void) #define TARGET_GET_MULTILIB_ABI_NAME \ ix86_get_multilib_abi_name +#undef TARGET_IFUNC_REF_LOCAL_OK +#define TARGET_IFUNC_REF_LOCAL_OK hook_bool_void_true + +#if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES +# undef TARGET_ASM_RELOC_RW_MASK +# define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask +#endif + static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED) { #ifdef OPTION_GLIBC @@ -23944,6 +24094,9 @@ static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED) #undef TARGET_LIBC_HAS_FAST_FUNCTION #define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function +#undef TARGET_GEN_MEMSET_SCRATCH_RTX +#define TARGET_GEN_MEMSET_SCRATCH_RTX ix86_gen_scratch_sse_rtx + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ac0e5da623c..270487337aa 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -554,6 +554,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; ix86_tune_features[X86_TUNE_PARTIAL_REG_DEPENDENCY] #define 
TARGET_SSE_PARTIAL_REG_DEPENDENCY \ ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY] +#define TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY \ + ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY] +#define TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY \ + ix86_tune_features[X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY] #define TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ ix86_tune_features[X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL] #define TARGET_SSE_UNALIGNED_STORE_OPTIMAL \ @@ -1692,13 +1696,8 @@ enum reg_class || TARGET_64BIT_MS_ABI \ || (TARGET_MACHO && crtl->profile)) -/* If defined, a C expression whose value is nonzero when we want to use PUSH - instructions to pass outgoing arguments. */ - -#define PUSH_ARGS (TARGET_PUSH_ARGS && !ACCUMULATE_OUTGOING_ARGS) - /* We want the stack and args grow in opposite directions, even if - PUSH_ARGS is 0. */ + targetm.calls.push_argument returns false. */ #define PUSH_ARGS_REVERSED 1 /* Offset of first parameter from the argument pointer register value. */ @@ -1936,6 +1935,8 @@ typedef struct ix86_args { #define LEGITIMATE_PIC_OPERAND_P(X) legitimate_pic_operand_p (X) +#define STRIP_UNARY(X) (UNARY_P (X) ? XEXP (X, 0) : X) + #define SYMBOLIC_CONST(X) \ (GET_CODE (X) == SYMBOL_REF \ || GET_CODE (X) == LABEL_REF \ @@ -1983,24 +1984,45 @@ typedef struct ix86_args { /* Define this as 1 if `char' should by default be signed; else as 0. */ #define DEFAULT_SIGNED_CHAR 1 -/* Max number of bytes we can move from memory to memory - in one reasonably fast instruction. */ -#define MOVE_MAX 16 - -/* MOVE_MAX_PIECES is the number of bytes at a time which we can - move efficiently, as opposed to MOVE_MAX which is the maximum - number of bytes we can move with a single instruction. - - ??? We should use TImode in 32-bit mode and use OImode or XImode - if they are available. But since by_pieces_ninsns determines the - widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in - 64-bit mode. */ -#define MOVE_MAX_PIECES \ - ((TARGET_64BIT \ - && TARGET_SSE2 \ - && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ - && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ - ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD) +/* The constant maximum number of bytes that a single instruction can + move quickly between memory and registers or between two memory + locations. */ +#define MAX_MOVE_MAX 64 + +/* Max number of bytes we can move from memory to memory in one + reasonably fast instruction, as opposed to MOVE_MAX_PIECES which + is the number of bytes at a time which we can move efficiently. + MOVE_MAX_PIECES defaults to MOVE_MAX. */ + +#define MOVE_MAX \ + ((TARGET_AVX512F \ + && (ix86_move_max == PVW_AVX512 \ + || ix86_store_max == PVW_AVX512)) \ + ? 64 \ + : ((TARGET_AVX \ + && (ix86_move_max >= PVW_AVX256 \ + || ix86_store_max >= PVW_AVX256)) \ + ? 32 \ + : ((TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? 16 : UNITS_PER_WORD))) + +/* STORE_MAX_PIECES is the number of bytes at a time that we can store + efficiently. Allow 16/32/64 bytes only if inter-unit move is enabled + since vec_duplicate enabled by inter-unit move is used to implement + store_by_pieces of 16/32/64 bytes. */ +#define STORE_MAX_PIECES \ + (TARGET_INTER_UNIT_MOVES_TO_VEC \ + ? ((TARGET_AVX512F && ix86_store_max == PVW_AVX512) \ + ? 64 \ + : ((TARGET_AVX \ + && ix86_store_max >= PVW_AVX256) \ + ? 32 \ + : ((TARGET_SSE2 \ + && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \ + ? 
16 : UNITS_PER_WORD))) \ + : UNITS_PER_WORD) /* If a memory-to-memory move would take MOVE_RATIO or more simple move-instruction pairs, we will do a cpymem or libcall instead. @@ -3090,6 +3112,12 @@ extern enum attr_cpu ix86_schedule; #define NUM_X86_64_MS_CLOBBERED_REGS 12 #endif +/* __builtin_eh_return can't handle stack realignment, so disable MMX/SSE + in 32-bit libgcc functions that call it. */ +#ifndef __x86_64__ +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-mmx,no-sse"))) +#endif + /* Local variables: version-control: t diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 238ce09672e..61d037da921 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -122,6 +122,9 @@ UNSPEC_RSQRT UNSPEC_PSADBW + ;; For AVX512F support + UNSPEC_SCALEF + ;; Generic math support UNSPEC_COPYSIGN UNSPEC_XORSIGN @@ -4378,7 +4381,8 @@ (float_extend:DF (match_operand:SF 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!REG_P (operands[1]) || (!TARGET_AVX && REGNO (operands[0]) != REGNO (operands[1]))) @@ -4540,7 +4544,8 @@ (float_truncate:SF (match_operand:DF 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!REG_P (operands[1]) || (!TARGET_AVX && REGNO (operands[0]) != REGNO (operands[1]))) @@ -5052,7 +5057,8 @@ [(set (match_operand:MODEF 0 "sse_reg_operand") (float:MODEF (match_operand:SWI48 1 "nonimmediate_operand")))] "!TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY && epilogue_completed + && TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY + && epilogue_completed && optimize_function_for_speed_p (cfun) && (!EXT_REX_SSE_REG_P (operands[0]) || TARGET_AVX512VL)" @@ -10475,50 +10481,6 @@ || (TARGET_SSE && (mode == TFmode))" "ix86_expand_copysign (operands); DONE;") -(define_insn_and_split "@copysign3_const" - [(set (match_operand:SSEMODEF 0 "register_operand" "=Yv") - (unspec:SSEMODEF - [(match_operand: 1 "nonimm_or_0_operand" "YvmC") - (match_operand:SSEMODEF 2 "register_operand" "0") - (match_operand: 3 "nonimmediate_operand" "Yvm")] - UNSPEC_COPYSIGN))] - "(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) - || (TARGET_SSE && (mode == TFmode))" - "#" - "&& reload_completed" - [(const_int 0)] - "ix86_split_copysign_const (operands); DONE;") - -(define_insn "@copysign3_var" - [(set (match_operand:SSEMODEF 0 "register_operand" "=Yv,Yv,Yv,Yv,Yv") - (unspec:SSEMODEF - [(match_operand:SSEMODEF 2 "register_operand" "Yv,0,0,Yv,Yv") - (match_operand:SSEMODEF 3 "register_operand" "1,1,Yv,1,Yv") - (match_operand: 4 - "nonimmediate_operand" "X,Yvm,Yvm,0,0") - (match_operand: 5 - "nonimmediate_operand" "0,Yvm,1,Yvm,1")] - UNSPEC_COPYSIGN)) - (clobber (match_scratch: 1 "=Yv,Yv,Yv,Yv,Yv"))] - "(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) - || (TARGET_SSE && (mode == TFmode))" - "#") - -(define_split - [(set (match_operand:SSEMODEF 0 "register_operand") - (unspec:SSEMODEF - [(match_operand:SSEMODEF 2 "register_operand") - (match_operand:SSEMODEF 3 "register_operand") - (match_operand: 4) - (match_operand: 5)] - UNSPEC_COPYSIGN)) - (clobber (match_scratch: 1))] - "((SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) - || (TARGET_SSE && (mode == TFmode))) - && reload_completed" - [(const_int 0)] - "ix86_split_copysign_var (operands); DONE;") - 
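
For reference, the copysign/xorsign expanders that remain after these splitters are removed only rely on simple sign-bit arithmetic. A minimal scalar C sketch of those identities follows; the names (SIGNMASK, copysign_sketch, xorsign_sketch) and the float/integer punning are illustrative only and are not the actual ix86_expand_copysign/ix86_expand_xorsign implementation.

#include <stdint.h>
#include <string.h>

static inline float
copysign_sketch (float mag, float sgn)
{
  const uint32_t SIGNMASK = UINT32_C (1) << 31;
  uint32_t m, s, r;
  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  /* copysign (x, y) = (x & ~SIGNMASK) | (y & SIGNMASK).  */
  r = (m & ~SIGNMASK) | (s & SIGNMASK);
  float out;
  memcpy (&out, &r, sizeof out);
  return out;
}

static inline float
xorsign_sketch (float x, float y)
{
  const uint32_t SIGNMASK = UINT32_C (1) << 31;
  uint32_t xi, yi, r;
  memcpy (&xi, &x, sizeof xi);
  memcpy (&yi, &y, sizeof yi);
  /* xorsign (x, y) = x ^ (y & SIGNMASK), i.e. x * copysign (1.0f, y).  */
  r = xi ^ (yi & SIGNMASK);
  float out;
  memcpy (&out, &r, sizeof out);
  return out;
}

Because both operations are pure bitwise logic on the sign bit, the expanders can emit the and/andnot/or/xor sequence directly at expand time, which is why the post-reload splitter patterns above are no longer needed.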
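
The same kind of bitwise reasoning drives the *_vpternlog_1/_2/_3 splitters added to sse.md further below: the immediate operand of VPTERNLOG is obtained by evaluating the Boolean expression with the conventional truth-table constants 0xF0, 0xCC and 0xAA substituted for the three operands. The sketch below is illustrative only (the helper names are made up, and the operand-to-constant assignment follows the usual encoding rather than quoting the splitter code).

#include <stdio.h>

/* Evaluate a three-operand Boolean function on the truth-table
   patterns; bit i of the result is the function applied to bit i of
   each pattern, which is exactly the VPTERNLOG imm8.  */
static unsigned char
ternlog_imm8 (unsigned char (*fn) (unsigned char, unsigned char,
                                   unsigned char))
{
  return fn (0xF0, 0xCC, 0xAA) & 0xFF;
}

static unsigned char
a_and_b_or_c_andnot_b (unsigned char a, unsigned char b, unsigned char c)
{
  return (a & b) | (c & ~b);
}

int
main (void)
{
  /* (a & b) | (c & ~b) maps to imm8 0xE2:
     (0xF0 & 0xCC) | (0xAA & ~0xCC) = 0xC0 | 0x22 = 0xE2.  */
  printf ("0x%02X\n", ternlog_imm8 (a_and_b_or_c_andnot_b));
  return 0;
}

Inverted inputs are handled the same way: complementing an operand simply complements its constant (as the reg1 = UNARY_P (...) ? ~reg1 : reg1 lines in the splitters do) before the expression is evaluated and masked to eight bits.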
(define_expand "xorsign3" [(match_operand:MODEF 0 "register_operand") (match_operand:MODEF 1 "register_operand") @@ -10531,19 +10493,6 @@ ix86_expand_xorsign (operands); DONE; }) - -(define_insn_and_split "@xorsign3_1" - [(set (match_operand:MODEF 0 "register_operand" "=&Yv") - (unspec:MODEF - [(match_operand:MODEF 1 "register_operand" "Yv") - (match_operand:MODEF 2 "register_operand" "0") - (match_operand: 3 "nonimmediate_operand" "Yvm")] - UNSPEC_XORSIGN))] - "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" - "#" - "&& reload_completed" - [(const_int 0)] - "ix86_split_xorsign (operands); DONE;") ;; One complement instructions @@ -17312,6 +17261,17 @@ DONE; }) +(define_insn "avx512f_scalef2" + [(set (match_operand:MODEF 0 "register_operand" "=v") + (unspec:MODEF + [(match_operand:MODEF 1 "register_operand" "v") + (match_operand:MODEF 2 "nonimmediate_operand" "vm")] + UNSPEC_SCALEF))] + "TARGET_AVX512F" + "vscalef\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_expand "ldexpxf3" [(match_operand:XF 0 "register_operand") (match_operand:XF 1 "register_operand") @@ -17332,17 +17292,30 @@ [(use (match_operand:MODEF 0 "register_operand")) (use (match_operand:MODEF 1 "general_operand")) (use (match_operand:SI 2 "register_operand"))] - "TARGET_USE_FANCY_MATH_387 - && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) - || TARGET_MIX_SSE_I387) + "((TARGET_USE_FANCY_MATH_387 + && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) + || TARGET_MIX_SSE_I387)) + || (TARGET_AVX512F && TARGET_SSE_MATH)) && flag_unsafe_math_optimizations" { - rtx op0 = gen_reg_rtx (XFmode); - rtx op1 = gen_reg_rtx (XFmode); + /* Prefer avx512f version. */ + if (TARGET_AVX512F && TARGET_SSE_MATH) + { + rtx op2 = gen_reg_rtx (mode); + operands[1] = force_reg (mode, operands[1]); - emit_insn (gen_extendxf2 (op1, operands[1])); - emit_insn (gen_ldexpxf3 (op0, op1, operands[2])); - emit_insn (gen_truncxf2 (operands[0], op0)); + emit_insn (gen_floatsi2 (op2, operands[2])); + emit_insn (gen_avx512f_scalef2 (operands[0], operands[1], op2)); + } + else + { + rtx op0 = gen_reg_rtx (XFmode); + rtx op1 = gen_reg_rtx (XFmode); + + emit_insn (gen_extendxf2 (op1, operands[1])); + emit_insn (gen_ldexpxf3 (op0, op1, operands[2])); + emit_insn (gen_truncxf2 (operands[0], op0)); + } DONE; }) diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index f62b0ebd3b4..b527281606e 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -404,6 +404,10 @@ momit-leaf-frame-pointer Target Mask(OMIT_LEAF_FRAME_POINTER) Save Omit the frame pointer in leaf functions. +mrelax-cmpxchg-loop +Target Mask(RELAX_CMPXCHG_LOOP) Save +Relax cmpxchg loop for atomic_fetch_{or,xor,and,nand} by adding load and cmp before cmpxchg, execute pause and loop back to load and compare if load value is not expected. + mpc32 Target RejectNegative Set 80387 floating-point precision to 32-bit. @@ -620,6 +624,14 @@ Enum(prefer_vector_width) String(256) Value(PVW_AVX256) EnumValue Enum(prefer_vector_width) String(512) Value(PVW_AVX512) +mmove-max= +Target RejectNegative Joined Var(ix86_move_max) Enum(prefer_vector_width) Init(PVW_NONE) Save +Maximum number of bits that can be moved from memory to memory efficiently. + +mstore-max= +Target RejectNegative Joined Var(ix86_store_max) Enum(prefer_vector_width) Init(PVW_NONE) Save +Maximum number of bits that can be stored to memory efficiently. + ;; ISA support m32 @@ -1190,3 +1202,7 @@ Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property. 
mmwait Target Mask(ISA2_MWAIT) Var(ix86_isa_flags2) Save Support MWAIT and MONITOR built-in functions and code generation. + +mdirect-extern-access +Target Var(ix86_direct_extern_access) Init(1) +Do not use GOT to access external symbols. diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index b815aca0da7..98844736b92 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1036,6 +1036,13 @@ (ior (match_test "op == const1_rtx") (match_test "op == constm1_rtx"))))) +;; True for registers, or (not: registers). Used to optimize 3-operand +;; bitwise operation. +(define_predicate "reg_or_notreg_operand" + (ior (match_operand 0 "register_operand") + (and (match_code "not") + (match_test "register_operand (XEXP (op, 0), mode)")))) + ;; True if OP is acceptable as operand of DImode shift expander. (define_predicate "shiftdi_operand" (if_then_else (match_test "TARGET_64BIT") @@ -1526,6 +1533,38 @@ (and (match_code "mem") (match_test "MEM_ALIGN (op) < GET_MODE_BITSIZE (mode)"))) +;; Return true if OP is a parallel for an mov{d,q,dqa,ps,pd} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "movq_parallel" + (match_code "parallel") +{ + unsigned nelt = XVECLEN (op, 0); + unsigned nelt2 = nelt >> 1; + unsigned i; + + if (nelt < 2) + return false; + + /* Validate that all of the elements are constants, + lower halves of permute are lower halves of the first operand, + upper halves of permute come from any of the second operand. */ + for (i = 0; i < nelt; ++i) + { + rtx er = XVECEXP (op, 0, i); + unsigned HOST_WIDE_INT ei; + + if (!CONST_INT_P (er)) + return 0; + ei = INTVAL (er); + if (i < nelt2 && ei != i) + return 0; + if (i >= nelt2 && (ei < nelt || ei >= nelt << 1)) + return 0; + } + + return 1; +}) + ;; Return true if OP is a vzeroall operation, known to be a PARALLEL. (define_predicate "vzeroall_operation" (match_code "parallel") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index fcfcba0134d..b921ea54845 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -95,7 +95,6 @@ UNSPEC_RCP14 UNSPEC_RSQRT14 UNSPEC_FIXUPIMM - UNSPEC_SCALEF UNSPEC_VTERNLOG UNSPEC_GETEXP UNSPEC_GETMANT @@ -810,19 +809,22 @@ ;; Mapping of vector modes to a vector mode of double size (define_mode_attr ssedoublevecmode - [(V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI") + [(V64QI "V128QI") (V32HI "V64HI") (V16SI "V32SI") (V8DI "V16DI") + (V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI") (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI") + (V16SF "V32SF") (V8DF "V16DF") (V8SF "V16SF") (V4DF "V8DF") (V4SF "V8SF") (V2DF "V4DF")]) ;; Mapping of vector modes to a vector mode of half size +;; instead of V1DI/V1DF, DI/DF are used for V2DI/V2DF although they are scalar. 
(define_mode_attr ssehalfvecmode [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI") (V4TI "V2TI") (V32QI "V16QI") (V16HI "V8HI") (V8SI "V4SI") (V4DI "V2DI") - (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") + (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") (V2DI "DI") (V16SF "V8SF") (V8DF "V4DF") (V8SF "V4SF") (V4DF "V2DF") - (V4SF "V2SF")]) + (V4SF "V2SF") (V2DF "DF")]) (define_mode_attr ssehalfvecmodelower [(V64QI "v32qi") (V32HI "v16hi") (V16SI "v8si") (V8DI "v4di") (V4TI "v2ti") @@ -889,7 +891,9 @@ ;; Mapping of vector modes to VPTERNLOG suffix (define_mode_attr ternlogsuffix [(V8DI "q") (V4DI "q") (V2DI "q") + (V8DF "q") (V4DF "q") (V2DF "q") (V16SI "d") (V8SI "d") (V4SI "d") + (V16SF "d") (V8SF "d") (V4SF "d") (V32HI "d") (V16HI "d") (V8HI "d") (V64QI "d") (V32QI "d") (V16QI "d")]) @@ -9752,7 +9756,7 @@ (unspec:VI48_AVX512VL [(match_operand:VI48_AVX512VL 1 "register_operand" "0") (match_operand:VI48_AVX512VL 2 "register_operand" "v") - (match_operand:VI48_AVX512VL 3 "nonimmediate_operand" "vm") + (match_operand:VI48_AVX512VL 3 "bcst_vector_operand" "vmBr") (match_operand:SI 4 "const_0_to_255_operand")] UNSPEC_VTERNLOG))] "TARGET_AVX512F" @@ -9761,13 +9765,245 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "*_vternlog_all" + [(set (match_operand:V 0 "register_operand" "=v") + (unspec:V + [(match_operand:V 1 "register_operand" "0") + (match_operand:V 2 "register_operand" "v") + (match_operand:V 3 "bcst_vector_operand" "vmBr") + (match_operand:SI 4 "const_0_to_255_operand")] + UNSPEC_VTERNLOG))] + "TARGET_AVX512F" + "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +;; There must be lots of other combinations like +;; +;; (any_logic:V +;; (any_logic:V op1 op2) +;; (any_logic:V op1 op3)) +;; +;; (any_logic:V +;; (any_logic:V +;; (any_logic:V op1, op2) +;; op3) +;; op1) +;; +;; and so on. + +(define_code_iterator any_logic1 [and ior xor]) +(define_code_iterator any_logic2 [and ior xor]) +(define_code_attr logic_op [(and "&") (ior "|") (xor "^")]) + +(define_insn_and_split "*_vpternlog_1" + [(set (match_operand:V 0 "register_operand") + (any_logic:V + (any_logic1:V + (match_operand:V 1 "reg_or_notreg_operand") + (match_operand:V 2 "reg_or_notreg_operand")) + (any_logic2:V + (match_operand:V 3 "reg_or_notreg_operand") + (match_operand:V 4 "reg_or_notreg_operand"))))] + "( == 64 || TARGET_AVX512VL) + && ix86_pre_reload_split () + && (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[4])) + || rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[4])) + || rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[3])) + || rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[3])))" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:V + [(match_dup 6) + (match_dup 2) + (match_dup 1) + (match_dup 5)] + UNSPEC_VTERNLOG))] +{ + /* VPTERNLOGD reg6, reg2, reg1, imm8. 
*/ + int reg6 = 0xF0; + int reg2 = 0xCC; + int reg1 = 0xAA; + int reg3 = 0; + int reg4 = 0; + int reg_mask, tmp1, tmp2; + if (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[4]))) + { + reg4 = reg1; + reg3 = reg6; + operands[6] = operands[3]; + } + else if (rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[4]))) + { + reg4 = reg2; + reg3 = reg6; + operands[6] = operands[3]; + } + else if (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[3]))) + { + reg4 = reg6; + reg3 = reg1; + operands[6] = operands[4]; + } + else + { + reg4 = reg6; + reg3 = reg2; + operands[6] = operands[4]; + } + + reg1 = UNARY_P (operands[1]) ? ~reg1 : reg1; + reg2 = UNARY_P (operands[2]) ? ~reg2 : reg2; + reg3 = UNARY_P (operands[3]) ? ~reg3 : reg3; + reg4 = UNARY_P (operands[4]) ? ~reg4 : reg4; + + tmp1 = reg1 reg2; + tmp2 = reg3 reg4; + reg_mask = tmp1 tmp2; + reg_mask &= 0xFF; + + operands[1] = STRIP_UNARY (operands[1]); + operands[2] = STRIP_UNARY (operands[2]); + operands[6] = STRIP_UNARY (operands[6]); + operands[5] = GEN_INT (reg_mask); +}) + +(define_insn_and_split "*_vpternlog_2" + [(set (match_operand:V 0 "register_operand") + (any_logic:V + (any_logic1:V + (any_logic2:V + (match_operand:V 1 "reg_or_notreg_operand") + (match_operand:V 2 "reg_or_notreg_operand")) + (match_operand:V 3 "reg_or_notreg_operand")) + (match_operand:V 4 "reg_or_notreg_operand")))] + "( == 64 || TARGET_AVX512VL) + && ix86_pre_reload_split () + && (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[4])) + || rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[4])) + || rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[3])) + || rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[3])))" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:V + [(match_dup 6) + (match_dup 2) + (match_dup 1) + (match_dup 5)] + UNSPEC_VTERNLOG))] +{ + /* VPTERNLOGD reg6, reg2, reg1, imm8. */ + int reg6 = 0xF0; + int reg2 = 0xCC; + int reg1 = 0xAA; + int reg3 = 0; + int reg4 = 0; + int reg_mask, tmp1, tmp2; + if (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[4]))) + { + reg4 = reg1; + reg3 = reg6; + operands[6] = operands[3]; + } + else if (rtx_equal_p (STRIP_UNARY (operands[2]), + STRIP_UNARY (operands[4]))) + { + reg4 = reg2; + reg3 = reg6; + operands[6] = operands[3]; + } + else if (rtx_equal_p (STRIP_UNARY (operands[1]), + STRIP_UNARY (operands[3]))) + { + reg4 = reg6; + reg3 = reg1; + operands[6] = operands[4]; + } + else + { + reg4 = reg6; + reg3 = reg2; + operands[6] = operands[4]; + } + + reg1 = UNARY_P (operands[1]) ? ~reg1 : reg1; + reg2 = UNARY_P (operands[2]) ? ~reg2 : reg2; + reg3 = UNARY_P (operands[3]) ? ~reg3 : reg3; + reg4 = UNARY_P (operands[4]) ? ~reg4 : reg4; + + tmp1 = reg1 reg2; + tmp2 = tmp1 reg3; + reg_mask = tmp2 reg4; + reg_mask &= 0xFF; + + operands[1] = STRIP_UNARY (operands[1]); + operands[2] = STRIP_UNARY (operands[2]); + operands[6] = STRIP_UNARY (operands[6]); + operands[5] = GEN_INT (reg_mask); +}) + +(define_insn_and_split "*_vpternlog_3" + [(set (match_operand:V 0 "register_operand") + (any_logic:V + (any_logic1:V + (match_operand:V 1 "reg_or_notreg_operand") + (match_operand:V 2 "reg_or_notreg_operand")) + (match_operand:V 3 "reg_or_notreg_operand")))] + "( == 64 || TARGET_AVX512VL) + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:V + [(match_dup 3) + (match_dup 2) + (match_dup 1) + (match_dup 4)] + UNSPEC_VTERNLOG))] +{ + /* VPTERNLOGD reg3, reg2, reg1, imm8. 
*/ + int reg3 = 0xF0; + int reg2 = 0xCC; + int reg1 = 0xAA; + int reg_mask, tmp1; + + reg1 = UNARY_P (operands[1]) ? ~reg1 : reg1; + reg2 = UNARY_P (operands[2]) ? ~reg2 : reg2; + reg3 = UNARY_P (operands[3]) ? ~reg3 : reg3; + + tmp1 = reg1 reg2; + reg_mask = tmp1 reg3; + reg_mask &= 0xFF; + + operands[1] = STRIP_UNARY (operands[1]); + operands[2] = STRIP_UNARY (operands[2]); + operands[3] = STRIP_UNARY (operands[3]); + operands[4] = GEN_INT (reg_mask); +}) + + (define_insn "_vternlog_mask" [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v") (vec_merge:VI48_AVX512VL (unspec:VI48_AVX512VL [(match_operand:VI48_AVX512VL 1 "register_operand" "0") (match_operand:VI48_AVX512VL 2 "register_operand" "v") - (match_operand:VI48_AVX512VL 3 "nonimmediate_operand" "vm") + (match_operand:VI48_AVX512VL 3 "bcst_vector_operand" "vmBr") (match_operand:SI 4 "const_0_to_255_operand")] UNSPEC_VTERNLOG) (match_dup 1) @@ -15924,11 +16160,11 @@ (set_attr "prefix" "orig,maybe_evex,orig,orig,maybe_evex") (set_attr "mode" "TI,TI,V4SF,V2SF,V2SF")]) -(define_insn "*vec_concatv4si_0" - [(set (match_operand:V4SI 0 "register_operand" "=v,x") - (vec_concat:V4SI - (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y") - (match_operand:V2SI 2 "const0_operand" " C,C")))] +(define_insn "*vec_concat_0" + [(set (match_operand:VI124_128 0 "register_operand" "=v,x") + (vec_concat:VI124_128 + (match_operand: 1 "nonimmediate_operand" "vm,?!*y") + (match_operand: 2 "const0_operand" " C,C")))] "TARGET_SSE2" "@ %vmovq\t{%1, %0|%0, %1} @@ -22115,6 +22351,24 @@ (set_attr "prefix" "maybe_evex") (set_attr "mode" "")]) +(define_insn_and_split "*vec_concat_0_1" + [(set (match_operand:V 0 "register_operand") + (vec_select:V + (vec_concat: + (match_operand:V 1 "nonimmediate_operand") + (match_operand:V 2 "const0_operand")) + (match_parallel 3 "movq_parallel" + [(match_operand 4 "const_int_operand")])))] + "ix86_pre_reload_split ()" + "#" + "&& 1" + [(set (match_dup 0) + (vec_concat:V (match_dup 1) (match_dup 5)))] +{ + operands[1] = gen_lowpart (mode, operands[1]); + operands[5] = CONST0_RTX (mode); +}) + (define_insn "vcvtph2ps" [(set (match_operand:V4SF 0 "register_operand" "=v") (vec_select:V4SF @@ -24284,3 +24538,34 @@ "TARGET_WIDEKL" "aes\t%0" [(set_attr "type" "other")]) + +;; Modes handled by broadcast patterns. NB: Allow V64QI and V32HI with +;; TARGET_AVX512F since ix86_expand_vector_init_duplicate can expand +;; without TARGET_AVX512BW which is used by memset vector broadcast +;; expander to XI with: +;; vmovd %edi, %xmm15 +;; vpbroadcastb %xmm15, %ymm15 +;; vinserti64x4 $0x1, %ymm15, %zmm15, %zmm15 + +(define_mode_iterator INT_BROADCAST_MODE + [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F && TARGET_64BIT") + (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")]) + +;; Broadcast from an integer. NB: Enable broadcast only if we can move +;; from GPR to SSE register directly. 
+(define_expand "vec_duplicate" + [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand") + (vec_duplicate:INT_BROADCAST_MODE + (match_operand: 1 "nonimmediate_operand")))] + "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC" +{ + if (!ix86_expand_vector_init_duplicate (false, + GET_MODE (operands[0]), + operands[0], + operands[1])) + gcc_unreachable (); + DONE; +}) diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 9716a0b2f2c..9c177d424ed 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -525,6 +525,123 @@ (set (reg:CCZ FLAGS_REG) (unspec_volatile:CCZ [(const_int 0)] UNSPECV_CMPXCHG))])]) +(define_expand "atomic_fetch_" + [(match_operand:SWI124 0 "register_operand") + (any_logic:SWI124 + (match_operand:SWI124 1 "memory_operand") + (match_operand:SWI124 2 "register_operand")) + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], , false, + false); + DONE; +}) + +(define_expand "atomic__fetch" + [(match_operand:SWI124 0 "register_operand") + (any_logic:SWI124 + (match_operand:SWI124 1 "memory_operand") + (match_operand:SWI124 2 "register_operand")) + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], , true, + false); + DONE; +}) + +(define_expand "atomic_fetch_nand" + [(match_operand:SWI124 0 "register_operand") + (match_operand:SWI124 1 "memory_operand") + (match_operand:SWI124 2 "register_operand") + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], NOT, false, + false); + DONE; +}) + +(define_expand "atomic_nand_fetch" + [(match_operand:SWI124 0 "register_operand") + (match_operand:SWI124 1 "memory_operand") + (match_operand:SWI124 2 "register_operand") + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], NOT, true, + false); + DONE; +}) + +(define_expand "atomic_fetch_" + [(match_operand:CASMODE 0 "register_operand") + (any_logic:CASMODE + (match_operand:CASMODE 1 "memory_operand") + (match_operand:CASMODE 2 "register_operand")) + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + bool doubleword = (mode == DImode && !TARGET_64BIT) + || (mode == TImode); + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], , false, + doubleword); + DONE; +}) + +(define_expand "atomic__fetch" + [(match_operand:CASMODE 0 "register_operand") + (any_logic:CASMODE + (match_operand:CASMODE 1 "memory_operand") + (match_operand:CASMODE 2 "register_operand")) + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + bool doubleword = (mode == DImode && !TARGET_64BIT) + || (mode == TImode); + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], , true, + doubleword); + DONE; +}) + +(define_expand "atomic_fetch_nand" + [(match_operand:CASMODE 0 "register_operand") + (match_operand:CASMODE 1 "memory_operand") + (match_operand:CASMODE 2 "register_operand") + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + bool doubleword = (mode == DImode && !TARGET_64BIT) + || (mode == TImode); + ix86_expand_atomic_fetch_op_loop (operands[0], 
operands[1], + operands[2], NOT, false, + doubleword); + DONE; +}) + +(define_expand "atomic_nand_fetch" + [(match_operand:CASMODE 0 "register_operand") + (match_operand:CASMODE 1 "memory_operand") + (match_operand:CASMODE 2 "register_operand") + (match_operand:SI 3 "const_int_operand")] + "TARGET_CMPXCHG && TARGET_RELAX_CMPXCHG_LOOP" +{ + bool doubleword = (mode == DImode && !TARGET_64BIT) + || (mode == TImode); + ix86_expand_atomic_fetch_op_loop (operands[0], operands[1], + operands[2], NOT, true, + doubleword); + DONE; +}) + + ;; For operand 2 nonmemory_operand predicate is used instead of ;; register_operand to allow combiner to better optimize atomic ;; additions of constants. @@ -821,3 +938,107 @@ (const_int 0))] "" "lock{%;} %K2btr{}\t{%1, %0|%0, %1}") + +(define_expand "atomic__fetch_cmp_0" + [(match_operand:QI 0 "register_operand") + (plusminus:SWI (match_operand:SWI 1 "memory_operand") + (match_operand:SWI 2 "nonmemory_operand")) + (match_operand:SI 3 "const_int_operand") ;; model + (match_operand:SI 4 "const_int_operand")] + "" +{ + if (INTVAL (operands[4]) == GT || INTVAL (operands[4]) == LE) + FAIL; + emit_insn (gen_atomic__fetch_cmp_0_1 (operands[1], + operands[2], + operands[3])); + ix86_expand_setcc (operands[0], (enum rtx_code) INTVAL (operands[4]), + gen_rtx_REG (CCGOCmode, FLAGS_REG), const0_rtx); + DONE; +}) + +(define_insn "atomic_add_fetch_cmp_0_1" + [(set (reg:CCGOC FLAGS_REG) + (compare:CCGOC + (plus:SWI + (unspec_volatile:SWI + [(match_operand:SWI 0 "memory_operand" "+m") + (match_operand:SI 2 "const_int_operand")] ;; model + UNSPECV_XCHG) + (match_operand:SWI 1 "nonmemory_operand" "")) + (const_int 0))) + (set (match_dup 0) + (plus:SWI (match_dup 0) (match_dup 1)))] + "" +{ + if (incdec_operand (operands[1], mode)) + { + if (operands[1] == const1_rtx) + return "lock{%;} %K2inc{}\t%0"; + else + return "lock{%;} %K2dec{}\t%0"; + } + + if (x86_maybe_negate_const_int (&operands[1], mode)) + return "lock{%;} %K2sub{}\t{%1, %0|%0, %1}"; + + return "lock{%;} %K2add{}\t{%1, %0|%0, %1}"; +}) + +(define_insn "atomic_sub_fetch_cmp_0_1" + [(set (reg:CCGOC FLAGS_REG) + (compare:CCGOC + (minus:SWI + (unspec_volatile:SWI + [(match_operand:SWI 0 "memory_operand" "+m") + (match_operand:SI 2 "const_int_operand")] ;; model + UNSPECV_XCHG) + (match_operand:SWI 1 "nonmemory_operand" "")) + (const_int 0))) + (set (match_dup 0) + (minus:SWI (match_dup 0) (match_dup 1)))] + "" +{ + if (incdec_operand (operands[1], mode)) + { + if (operands[1] != const1_rtx) + return "lock{%;} %K2inc{}\t%0"; + else + return "lock{%;} %K2dec{}\t%0"; + } + + if (x86_maybe_negate_const_int (&operands[1], mode)) + return "lock{%;} %K2add{}\t{%1, %0|%0, %1}"; + + return "lock{%;} %K2sub{}\t{%1, %0|%0, %1}"; +}) + +(define_expand "atomic__fetch_cmp_0" + [(match_operand:QI 0 "register_operand") + (any_logic:SWI (match_operand:SWI 1 "memory_operand") + (match_operand:SWI 2 "nonmemory_operand")) + (match_operand:SI 3 "const_int_operand") ;; model + (match_operand:SI 4 "const_int_operand")] + "" +{ + emit_insn (gen_atomic__fetch_cmp_0_1 (operands[1], operands[2], + operands[3])); + ix86_expand_setcc (operands[0], (enum rtx_code) INTVAL (operands[4]), + gen_rtx_REG (CCNOmode, FLAGS_REG), const0_rtx); + DONE; +}) + +(define_insn "atomic__fetch_cmp_0_1" + [(set (reg:CCNO FLAGS_REG) + (compare:CCNO + (any_logic:SWI + (unspec_volatile:SWI + [(match_operand:SWI 0 "memory_operand" "+m") + (match_operand:SI 2 "const_int_operand")] ;; model + UNSPECV_XCHG) + (match_operand:SWI 1 "nonmemory_operand" "")) + (const_int 0))) + 
(set (match_dup 0) + (any_logic:SWI (match_dup 0) (match_dup 1)))] + "" + "lock{%;} %K2{}\t{%1, %0|%0, %1}") diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h index ffe810f2bcb..dd5563d2e64 100644 --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -2070,6 +2070,126 @@ struct processor_costs icelake_cost = { "16", /* Func alignment. */ }; +/* alderlake_cost should produce code tuned for alderlake family of CPUs. */ +static stringop_algs alderlake_memcpy[2] = { + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}, + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}}; +static stringop_algs alderlake_memset[2] = { + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}, + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}}; +static const +struct processor_costs alderlake_cost = { + { + /* Start of register allocator costs. integer->integer move cost is 2. */ + 6, /* cost for loading QImode using movzbl */ + {6, 6, 6}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). */ + {6, 6, 6}, /* cost of storing integer registers */ + 4, /* cost of reg,reg fld/fst */ + {6, 6, 12}, /* cost of loading fp registers + in SFmode, DFmode and XFmode */ + {6, 6, 12}, /* cost of storing fp registers + in SFmode, DFmode and XFmode */ + 2, /* cost of moving MMX register */ + {6, 6}, /* cost of loading MMX registers + in SImode and DImode */ + {6, 6}, /* cost of storing MMX registers + in SImode and DImode */ + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register */ + {6, 6, 6, 10, 15}, /* cost of loading SSE registers + in 32,64,128,256 and 512-bit */ + {6, 6, 6, 10, 15}, /* cost of storing SSE registers + in 32,64,128,256 and 512-bit */ + 6, 6, /* SSE->integer and integer->SSE moves */ + 6, 6, /* mask->integer and integer->mask moves */ + {6, 6, 6}, /* cost of loading mask register + in QImode, HImode, SImode. */ + {6, 6, 6}, /* cost if storing mask register + in QImode, HImode, SImode. */ + 2, /* cost of moving mask register. */ + /* End of register allocator costs. */ + }, + + COSTS_N_INSNS (1), /* cost of an add instruction */ + COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */ + COSTS_N_INSNS (1), /* variable shift costs */ + COSTS_N_INSNS (1), /* constant shift costs */ + {COSTS_N_INSNS (3), /* cost of starting multiply for QI */ + COSTS_N_INSNS (4), /* HI */ + COSTS_N_INSNS (3), /* SI */ + COSTS_N_INSNS (4), /* DI */ + COSTS_N_INSNS (4)}, /* other */ + 0, /* cost of multiply per each bit set */ + {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */ + COSTS_N_INSNS (22), /* HI */ + COSTS_N_INSNS (30), /* SI */ + COSTS_N_INSNS (74), /* DI */ + COSTS_N_INSNS (74)}, /* other */ + COSTS_N_INSNS (1), /* cost of movsx */ + COSTS_N_INSNS (1), /* cost of movzx */ + 8, /* "large" insn */ + 17, /* MOVE_RATIO */ + 17, /* CLEAR_RATIO */ + {6, 6, 6}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). */ + {6, 6, 6}, /* cost of storing integer registers */ + {6, 6, 6, 10, 15}, /* cost of loading SSE register + in 32bit, 64bit, 128bit, 256bit and 512bit */ + {6, 6, 6, 10, 15}, /* cost of storing SSE register + in 32bit, 64bit, 128bit, 256bit and 512bit */ + {6, 6, 6, 10, 15}, /* cost of unaligned loads. */ + {6, 6, 6, 10, 15}, /* cost of unaligned storess. 
*/ + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register */ + 6, /* cost of moving SSE register to integer. */ + 18, 6, /* Gather load static, per_elt. */ + 18, 6, /* Gather store static, per_elt. */ + 32, /* size of l1 cache. */ + 512, /* size of l2 cache. */ + 64, /* size of prefetch block */ + 6, /* number of parallel prefetches */ + 3, /* Branch cost */ + COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */ + COSTS_N_INSNS (5), /* cost of FMUL instruction. */ + COSTS_N_INSNS (17), /* cost of FDIV instruction. */ + COSTS_N_INSNS (1), /* cost of FABS instruction. */ + COSTS_N_INSNS (1), /* cost of FCHS instruction. */ + COSTS_N_INSNS (14), /* cost of FSQRT instruction. */ + + COSTS_N_INSNS (1), /* cost of cheap SSE instruction. */ + COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns. */ + COSTS_N_INSNS (4), /* cost of MULSS instruction. */ + COSTS_N_INSNS (5), /* cost of MULSD instruction. */ + COSTS_N_INSNS (5), /* cost of FMA SS instruction. */ + COSTS_N_INSNS (5), /* cost of FMA SD instruction. */ + COSTS_N_INSNS (13), /* cost of DIVSS instruction. */ + COSTS_N_INSNS (17), /* cost of DIVSD instruction. */ + COSTS_N_INSNS (14), /* cost of SQRTSS instruction. */ + COSTS_N_INSNS (18), /* cost of SQRTSD instruction. */ + 1, 4, 3, 3, /* reassoc int, fp, vec_int, vec_fp. */ + alderlake_memcpy, + alderlake_memset, + COSTS_N_INSNS (4), /* cond_taken_branch_cost. */ + COSTS_N_INSNS (2), /* cond_not_taken_branch_cost. */ + "16:11:8", /* Loop alignment. */ + "16:11:8", /* Jump alignment. */ + "0:0:8", /* Label alignment. */ + "16", /* Func alignment. */ +}; + /* BTVER1 has optimized REP instruction for medium sized blocks, but for very small blocks it is better to use loop. For large blocks, libcall can do nontemporary accesses and beat inline considerably. */ @@ -2734,6 +2854,130 @@ struct processor_costs slm_cost = { "16", /* Func alignment. */ }; +static stringop_algs tremont_memcpy[2] = { + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}, + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}}; +static stringop_algs tremont_memset[2] = { + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}, + {libcall, + {{256, rep_prefix_1_byte, true}, + {256, loop, false}, + {-1, libcall, false}}}}; +static const +struct processor_costs tremont_cost = { + { + /* Start of register allocator costs. integer->integer move cost is 2. */ + 6, /* cost for loading QImode using movzbl */ + {6, 6, 6}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). */ + {6, 6, 6}, /* cost of storing integer registers */ + 4, /* cost of reg,reg fld/fst */ + {6, 6, 12}, /* cost of loading fp registers + in SFmode, DFmode and XFmode */ + {6, 6, 12}, /* cost of storing fp registers + in SFmode, DFmode and XFmode */ + 2, /* cost of moving MMX register */ + {6, 6}, /* cost of loading MMX registers + in SImode and DImode */ + {6, 6}, /* cost of storing MMX registers + in SImode and DImode */ + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register */ + {6, 6, 6, 10, 15}, /* cost of loading SSE registers + in 32,64,128,256 and 512-bit */ + {6, 6, 6, 10, 15}, /* cost of storing SSE registers + in 32,64,128,256 and 512-bit */ + 6, 6, /* SSE->integer and integer->SSE moves */ + 6, 6, /* mask->integer and integer->mask moves */ + {6, 6, 6}, /* cost of loading mask register + in QImode, HImode, SImode. */ + {6, 6, 6}, /* cost if storing mask register + in QImode, HImode, SImode. 
*/ + 2, /* cost of moving mask register. */ + /* End of register allocator costs. */ + }, + + COSTS_N_INSNS (1), /* cost of an add instruction */ + /* Setting cost to 2 makes our current implementation of synth_mult result in + use of unnecessary temporary registers causing regression on several + SPECfp benchmarks. */ + COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */ + COSTS_N_INSNS (1), /* variable shift costs */ + COSTS_N_INSNS (1), /* constant shift costs */ + {COSTS_N_INSNS (3), /* cost of starting multiply for QI */ + COSTS_N_INSNS (4), /* HI */ + COSTS_N_INSNS (3), /* SI */ + COSTS_N_INSNS (4), /* DI */ + COSTS_N_INSNS (4)}, /* other */ + 0, /* cost of multiply per each bit set */ + {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */ + COSTS_N_INSNS (22), /* HI */ + COSTS_N_INSNS (30), /* SI */ + COSTS_N_INSNS (74), /* DI */ + COSTS_N_INSNS (74)}, /* other */ + COSTS_N_INSNS (1), /* cost of movsx */ + COSTS_N_INSNS (1), /* cost of movzx */ + 8, /* "large" insn */ + 17, /* MOVE_RATIO */ + 17, /* CLEAR_RATIO */ + {6, 6, 6}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). */ + {6, 6, 6}, /* cost of storing integer registers */ + {6, 6, 6, 10, 15}, /* cost of loading SSE register + in 32bit, 64bit, 128bit, 256bit and 512bit */ + {6, 6, 6, 10, 15}, /* cost of storing SSE register + in 32bit, 64bit, 128bit, 256bit and 512bit */ + {6, 6, 6, 10, 15}, /* cost of unaligned loads. */ + {6, 6, 6, 10, 15}, /* cost of unaligned storess. */ + 2, 3, 4, /* cost of moving XMM,YMM,ZMM register */ + 6, /* cost of moving SSE register to integer. */ + 18, 6, /* Gather load static, per_elt. */ + 18, 6, /* Gather store static, per_elt. */ + 32, /* size of l1 cache. */ + 512, /* size of l2 cache. */ + 64, /* size of prefetch block */ + 6, /* number of parallel prefetches */ + /* Benchmarks shows large regressions on K8 sixtrack benchmark when this + value is increased to perhaps more appropriate value of 5. */ + 3, /* Branch cost */ + COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */ + COSTS_N_INSNS (5), /* cost of FMUL instruction. */ + COSTS_N_INSNS (17), /* cost of FDIV instruction. */ + COSTS_N_INSNS (1), /* cost of FABS instruction. */ + COSTS_N_INSNS (1), /* cost of FCHS instruction. */ + COSTS_N_INSNS (14), /* cost of FSQRT instruction. */ + + COSTS_N_INSNS (1), /* cost of cheap SSE instruction. */ + COSTS_N_INSNS (3), /* cost of ADDSS/SD SUBSS/SD insns. */ + COSTS_N_INSNS (4), /* cost of MULSS instruction. */ + COSTS_N_INSNS (5), /* cost of MULSD instruction. */ + COSTS_N_INSNS (5), /* cost of FMA SS instruction. */ + COSTS_N_INSNS (5), /* cost of FMA SD instruction. */ + COSTS_N_INSNS (13), /* cost of DIVSS instruction. */ + COSTS_N_INSNS (17), /* cost of DIVSD instruction. */ + COSTS_N_INSNS (14), /* cost of SQRTSS instruction. */ + COSTS_N_INSNS (18), /* cost of SQRTSD instruction. */ + 1, 4, 3, 3, /* reassoc int, fp, vec_int, vec_fp. */ + tremont_memcpy, + tremont_memset, + COSTS_N_INSNS (4), /* cond_taken_branch_cost. */ + COSTS_N_INSNS (2), /* cond_not_taken_branch_cost. */ + "16:11:8", /* Loop alignment. */ + "16:11:8", /* Jump alignment. */ + "0:0:8", /* Label alignment. */ + "16", /* Func alignment. 
*/ +}; + static stringop_algs intel_memcpy[2] = { {libcall, {{11, loop, false}, {-1, rep_prefix_4_byte, false}}}, {libcall, {{32, loop, false}, {64, rep_prefix_4_byte, false}, diff --git a/gcc/config/i386/x86-tune-sched.c b/gcc/config/i386/x86-tune-sched.c index 2bcc64b865a..74e534e9bf0 100644 --- a/gcc/config/i386/x86-tune-sched.c +++ b/gcc/config/i386/x86-tune-sched.c @@ -71,6 +71,8 @@ ix86_issue_rate (void) case PROCESSOR_NEHALEM: case PROCESSOR_SANDYBRIDGE: case PROCESSOR_HASWELL: + case PROCESSOR_TREMONT: + case PROCESSOR_ALDERLAKE: case PROCESSOR_GENERIC: return 4; @@ -430,6 +432,8 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost, case PROCESSOR_NEHALEM: case PROCESSOR_SANDYBRIDGE: case PROCESSOR_HASWELL: + case PROCESSOR_TREMONT: + case PROCESSOR_ALDERLAKE: case PROCESSOR_GENERIC: /* Stack engine allows to execute push&pop instructions in parall. */ if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP) diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index eb057a67750..7e9a61d64ba 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -42,7 +42,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see DEF_TUNE (X86_TUNE_SCHEDULE, "schedule", m_PENT | m_LAKEMONT | m_PPRO | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL | m_KNL | m_KNM | m_K6_GEODE | m_AMD_MULTIPLE | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC) + | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE |m_GENERIC) /* X86_TUNE_PARTIAL_REG_DEPENDENCY: Enable more register renaming on modern chips. Preffer stores affecting whole integer register @@ -51,7 +51,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule", DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency", m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_INTEL - | m_KNL | m_KNM | m_AMD_MULTIPLE | m_TREMONT + | m_KNL | m_KNM | m_AMD_MULTIPLE | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY: This knob promotes all store @@ -62,7 +62,22 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency", that can be partly masked by careful scheduling of moves. */ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 - | m_BDVER | m_ZNVER | m_GENERIC) + | m_BDVER | m_ZNVER | m_TREMONT | m_ALDERLAKE | m_GENERIC) + +/* X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY: This knob avoids + partial write to the destination in scalar SSE conversion from FP + to FP. */ +DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY, + "sse_partial_reg_fp_converts_dependency", + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_ZNVER | m_ALDERLAKE| m_GENERIC) + +/* X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY: This knob avoids partial + write to the destination in scalar SSE conversion from integer to FP. 
*/ +DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY, + "sse_partial_reg_converts_dependency", + m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10 + | m_BDVER | m_ZNVER | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies are resolved on SSE register parts instead of whole registers, so we may @@ -88,14 +103,14 @@ DEF_TUNE (X86_TUNE_MOVX, "movx", m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_KNL | m_KNM | m_INTEL | m_GOLDMONT_PLUS | m_GEODE | m_AMD_MULTIPLE - | m_CORE_AVX2 | m_TREMONT | m_GENERIC) + | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by full sized loads. */ DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall", m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL | m_KNL | m_KNM | m_GOLDMONT | m_GOLDMONT_PLUS | m_AMD_MULTIPLE - | m_TREMONT | m_GENERIC) + | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_FUSE_CMP_AND_BRANCH_32: Fuse compare with a subsequent conditional jump instruction for 32 bit TARGET. */ @@ -136,7 +151,7 @@ DEF_TUNE (X86_TUNE_FUSE_ALU_AND_BRANCH, "fuse_alu_and_branch", DEF_TUNE (X86_TUNE_ACCUMULATE_OUTGOING_ARGS, "accumulate_outgoing_args", m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL - | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ATHLON_K8) + | m_GOLDMONT | m_GOLDMONT_PLUS | m_ATHLON_K8) /* X86_TUNE_PROLOGUE_USING_MOVE: Do not use push/pop in prologues that are considered on critical path. */ @@ -150,14 +165,15 @@ DEF_TUNE (X86_TUNE_EPILOGUE_USING_MOVE, "epilogue_using_move", /* X86_TUNE_USE_LEAVE: Use "leave" instruction in epilogues where it fits. */ DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave", - m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_GENERIC) + m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_TREMONT + | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_PUSH_MEMORY: Enable generation of "push mem" instructions. Some chips, like 486 and Pentium works faster with separate load and push instructions. */ DEF_TUNE (X86_TUNE_PUSH_MEMORY, "push_memory", m_386 | m_P4_NOCONA | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE - | m_GENERIC) + | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_SINGLE_PUSH: Enable if single push insn is preferred over esp subtraction. */ @@ -198,8 +214,7 @@ DEF_TUNE (X86_TUNE_PAD_RETURNS, "pad_returns", than 4 branch instructions in the 16 byte window. 
*/ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit", m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM - | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_INTEL | m_ATHLON_K8 - | m_AMDFAM10) + | m_GOLDMONT | m_GOLDMONT_PLUS | m_INTEL | m_ATHLON_K8 | m_AMDFAM10) /*****************************************************************************/ /* Integer instruction selection tuning */ @@ -228,23 +243,23 @@ DEF_TUNE (X86_TUNE_READ_MODIFY, "read_modify", ~(m_PENT | m_LAKEMONT | m_PPRO)) DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec", ~(m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_BONNELL | m_SILVERMONT | m_INTEL | m_KNL | m_KNM | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)) + | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC)) /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred for DFmode copies */ DEF_TUNE (X86_TUNE_INTEGER_DFMODE_MOVES, "integer_dfmode_moves", ~(m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_GEODE | m_AMD_MULTIPLE | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)) + | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC)) /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag will impact LEA instruction selection. */ DEF_TUNE (X86_TUNE_OPT_AGU, "opt_agu", m_BONNELL | m_SILVERMONT | m_KNL - | m_KNM | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_INTEL) + | m_KNM | m_GOLDMONT | m_GOLDMONT_PLUS | m_INTEL) /* X86_TUNE_AVOID_LEA_FOR_ADDR: Avoid lea for address computation. */ DEF_TUNE (X86_TUNE_AVOID_LEA_FOR_ADDR, "avoid_lea_for_addr", - m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT + m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_KNL | m_KNM) /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is @@ -263,7 +278,7 @@ DEF_TUNE (X86_TUNE_SLOW_IMUL_IMM8, "slow_imul_imm8", a conditional move. */ DEF_TUNE (X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE, "avoid_mem_opnd_for_cmove", m_BONNELL | m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_KNL - | m_KNM | m_TREMONT | m_INTEL) + | m_KNM | m_INTEL) /* X86_TUNE_SINGLE_STRINGOP: Enable use of single string operations, such as MOVS and STOS (without a REP prefix) to move/set sequences of bytes. */ @@ -273,7 +288,7 @@ DEF_TUNE (X86_TUNE_SINGLE_STRINGOP, "single_stringop", m_386 | m_P4_NOCONA) move/set sequences of bytes with known size. */ DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB, "prefer_known_rep_movsb_stosb", - m_SKYLAKE | m_ALDERLAKE | m_CORE_AVX512) + m_SKYLAKE | m_ALDERLAKE | m_TREMONT | m_CORE_AVX512) /* X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES: Enable generation of compact prologues and epilogues by issuing a misaligned moves. This @@ -282,30 +297,31 @@ DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB, FIXME: This may actualy be a win on more targets than listed here. */ DEF_TUNE (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES, "misaligned_move_string_pro_epilogues", - m_386 | m_486 | m_CORE_ALL | m_AMD_MULTIPLE | m_GENERIC) + m_386 | m_486 | m_CORE_ALL | m_AMD_MULTIPLE | m_TREMONT + | m_ALDERLAKE |m_GENERIC) /* X86_TUNE_USE_SAHF: Controls use of SAHF. */ DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT - | m_GENERIC) + | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO instructions. 
*/ DEF_TUNE (X86_TUNE_USE_CLTD, "use_cltd", ~(m_PENT | m_LAKEMONT | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL - | m_K6 | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT)) + | m_K6 | m_GOLDMONT | m_GOLDMONT_PLUS)) /* X86_TUNE_USE_BT: Enable use of BT (bit test) instructions. */ DEF_TUNE (X86_TUNE_USE_BT, "use_bt", m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_LAKEMONT | m_AMD_MULTIPLE | m_GOLDMONT | m_GOLDMONT_PLUS - | m_TREMONT | m_GENERIC) + | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency for bit-manipulation instructions. */ DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi", - m_SANDYBRIDGE | m_CORE_AVX2 | m_GENERIC) + m_SANDYBRIDGE | m_CORE_AVX2 | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based on hardware capabilities. Bdver3 hardware has a loop buffer which makes @@ -317,18 +333,18 @@ DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4) if-converted sequence to one. */ DEF_TUNE (X86_TUNE_ONE_IF_CONV_INSN, "one_if_conv_insn", m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_CORE_ALL | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC) + | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_AVOID_MFENCE: Use lock prefixed instructions instead of mfence. */ DEF_TUNE (X86_TUNE_AVOID_MFENCE, "avoid_mfence", - m_CORE_ALL | m_BDVER | m_ZNVER | m_GENERIC) + m_CORE_ALL | m_BDVER | m_ZNVER | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_EXPAND_ABS: This enables a new abs pattern by generating instructions for abs (x) = (((signed) x >> (W-1) ^ x) - (signed) x >> (W-1)) instead of cmove or SSE max/abs instructions. */ DEF_TUNE (X86_TUNE_EXPAND_ABS, "expand_abs", m_CORE_ALL | m_SILVERMONT | m_KNL | m_KNM | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT ) + | m_GOLDMONT_PLUS) /*****************************************************************************/ /* 387 instruction selection tuning */ @@ -345,7 +361,8 @@ DEF_TUNE (X86_TUNE_USE_HIMODE_FIOP, "use_himode_fiop", DEF_TUNE (X86_TUNE_USE_SIMODE_FIOP, "use_simode_fiop", ~(m_PENT | m_LAKEMONT | m_PPRO | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_AMD_MULTIPLE - | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC)) + | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE + | m_GENERIC)) /* X86_TUNE_USE_FFREEP: Use freep instruction instead of fstp. */ DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE) @@ -354,7 +371,7 @@ DEF_TUNE (X86_TUNE_USE_FFREEP, "use_ffreep", m_AMD_MULTIPLE) DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_constants", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_ATHLON_K8 | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_GENERIC) + | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE | m_GENERIC) /*****************************************************************************/ /* SSE instruction selection tuning */ @@ -369,15 +386,15 @@ DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill", of a sequence loading registers by parts. 
*/ DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal", m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM - | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS - | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC) + | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE + | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC) /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL: Use movups for misaligned stores instead of a sequence loading registers by parts. */ DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal", m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS - | m_TREMONT | m_BDVER | m_ZNVER | m_GENERIC) + | m_TREMONT | m_ALDERLAKE | m_BDVER | m_ZNVER | m_GENERIC) /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL: Use packed single precision 128bit instructions instead of double where possible. */ @@ -386,13 +403,13 @@ DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optim /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores. */ DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores", - m_AMD_MULTIPLE | m_CORE_ALL | m_GENERIC) + m_AMD_MULTIPLE | m_CORE_ALL | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_SSE_LOAD0_BY_PXOR: Always use pxor to load0 as opposed to xorps/xorpd and other variants. */ DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor", m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER - | m_GENERIC) + | m_TREMONT | m_ALDERLAKE | m_GENERIC) /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC: Enable moves in from integer to SSE registers. If disabled, the moves will be done by storing @@ -419,7 +436,7 @@ DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSIONS, "inter_unit_conversions", fp converts to destination register. */ DEF_TUNE (X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS, "split_mem_opnd_for_fp_converts", m_SILVERMONT | m_KNL | m_KNM | m_GOLDMONT | m_GOLDMONT_PLUS - | m_TREMONT | m_INTEL) + | m_INTEL) /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion from FP to FP. This form of instructions avoids partial write to the @@ -434,15 +451,16 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10) /* X86_TUNE_SLOW_SHUFB: Indicates tunings with slow pshufb instruction. */ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb", m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_GOLDMONT - | m_GOLDMONT_PLUS | m_TREMONT | m_INTEL) + | m_GOLDMONT_PLUS | m_INTEL) /* X86_TUNE_AVOID_4BYTE_PREFIXES: Avoid instructions requiring 4+ bytes of prefixes. */ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes", - m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_INTEL) + m_SILVERMONT | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT | m_ALDERLAKE + | m_INTEL) /* X86_TUNE_USE_GATHER: Use gather instructions. */ DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather", - ~(m_ZNVER1 | m_ZNVER2 | m_GENERIC)) + ~(m_ZNVER1 | m_ZNVER2 | m_GENERIC | m_ALDERLAKE)) /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or smaller FMA chain. */ @@ -479,6 +497,27 @@ DEF_TUNE (X86_TUNE_AVX128_OPTIMAL, "avx128_optimal", m_BDVER | m_BTVER2 instructions in the auto-vectorizer. */ DEF_TUNE (X86_TUNE_AVX256_OPTIMAL, "avx256_optimal", m_CORE_AVX512) +/* X86_TUNE_AVX256_MOVE_BY_PIECES: Optimize move_by_pieces with 256-bit + AVX instructions. 
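   As an illustration only (not part of the tuning description; the exact
   expansion depends on alignment and the other move-by-pieces limits):
   with this tuning enabled, a fixed-size copy such as

     /* Hypothetical 64-byte aggregate copy.  */
     struct s { char b[64]; };
     void copy (struct s *d, const struct s *src) { *d = *src; }

   would be expected to expand to two 32-byte YMM load/store pairs
   instead of a longer sequence of 8-byte integer moves.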
*/ +DEF_TUNE (X86_TUNE_AVX256_MOVE_BY_PIECES, "avx256_move_by_pieces", + m_CORE_AVX512) + +/* X86_TUNE_AVX256_STORE_BY_PIECES: Optimize store_by_pieces with 256-bit + AVX instructions. */ +DEF_TUNE (X86_TUNE_AVX256_STORE_BY_PIECES, "avx256_store_by_pieces", + m_CORE_AVX512) + +/* X86_TUNE_AVX512_MOVE_BY_PIECES: Optimize move_by_pieces with 512-bit + AVX instructions. */ +DEF_TUNE (X86_TUNE_AVX512_MOVE_BY_PIECES, "avx512_move_by_pieces", + m_SAPPHIRERAPIDS) + +/* X86_TUNE_AVX512_STORE_BY_PIECES: Optimize store_by_pieces with 512-bit + AVX instructions. */ +DEF_TUNE (X86_TUNE_AVX512_STORE_BY_PIECES, "avx512_store_by_pieces", + m_SAPPHIRERAPIDS) + +/*****************************************************************************/ /*****************************************************************************/ /* Historical relics: tuning flags that helps a specific old CPU designs */ /*****************************************************************************/ diff --git a/gcc/config/m32c/m32c.c b/gcc/config/m32c/m32c.c index b1cb3591da6..d22bdd79c71 100644 --- a/gcc/config/m32c/m32c.c +++ b/gcc/config/m32c/m32c.c @@ -1296,6 +1296,9 @@ m32c_push_rounding (poly_int64 n) return (n + 1) & ~1; } +#undef TARGET_PUSH_ARGUMENT +#define TARGET_PUSH_ARGUMENT hook_bool_uint_true + /* Passing Arguments in Registers */ /* Implements TARGET_FUNCTION_ARG. Arguments are passed partly in diff --git a/gcc/config/m32c/m32c.h b/gcc/config/m32c/m32c.h index 635f5924c20..228a73d1c42 100644 --- a/gcc/config/m32c/m32c.h +++ b/gcc/config/m32c/m32c.h @@ -472,7 +472,6 @@ enum reg_class /* Passing Function Arguments on the Stack */ -#define PUSH_ARGS 1 #define PUSH_ROUNDING(N) m32c_push_rounding (N) #define CALL_POPS_ARGS(C) 0 diff --git a/gcc/config/nios2/nios2.h b/gcc/config/nios2/nios2.h index 1840a466f96..dfca12cc525 100644 --- a/gcc/config/nios2/nios2.h +++ b/gcc/config/nios2/nios2.h @@ -297,7 +297,6 @@ typedef struct nios2_args ((REGNO) >= FIRST_ARG_REGNO && (REGNO) <= LAST_ARG_REGNO) /* Passing function arguments on stack. */ -#define PUSH_ARGS 0 #define ACCUMULATE_OUTGOING_ARGS 1 /* We define TARGET_RETURN_IN_MEMORY, so set to zero. */ diff --git a/gcc/config/pru/pru.h b/gcc/config/pru/pru.h index 4c35a7d7ee3..9b6be323e6d 100644 --- a/gcc/config/pru/pru.h +++ b/gcc/config/pru/pru.h @@ -339,7 +339,6 @@ typedef struct pru_args ((REGNO) >= FIRST_ARG_REGNUM && (REGNO) <= LAST_ARG_REGNUM) /* Passing function arguments on stack. */ -#define PUSH_ARGS 0 #define ACCUMULATE_OUTGOING_ARGS 1 /* We define TARGET_RETURN_IN_MEMORY, so set to zero. */ diff --git a/gcc/defaults.h b/gcc/defaults.h index 91216593e75..ba79a8e48ed 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -801,15 +801,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define NEXT_OBJC_RUNTIME 0 #endif -/* Supply a default definition for PUSH_ARGS. */ -#ifndef PUSH_ARGS -#ifdef PUSH_ROUNDING -#define PUSH_ARGS !ACCUMULATE_OUTGOING_ARGS -#else -#define PUSH_ARGS 0 -#endif -#endif - /* Decide whether a function's arguments should be processed from first to last or from last to first. @@ -820,7 +811,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
If not, see #ifndef PUSH_ARGS_REVERSED #if defined (STACK_GROWS_DOWNWARD) != defined (ARGS_GROW_DOWNWARD) -#define PUSH_ARGS_REVERSED PUSH_ARGS +#define PUSH_ARGS_REVERSED targetm.calls.push_argument (0) #endif #endif diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 689ec7de4d3..d7e3bd60bd7 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7072,6 +7072,12 @@ On x86 targets, the @code{fentry_section} attribute sets the name of the section to record function entry instrumentation calls in when enabled with @option{-pg -mrecord-mcount} +@item nodirect_extern_access +@cindex @code{nodirect_extern_access} function attribute +@opindex mno-direct-extern-access +This attribute, attached to a global variable or function, is the +counterpart to option @option{-mno-direct-extern-access}. + @end table @node Xstormy16 Function Attributes diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index f3168f2ebab..ed985c20997 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1375,6 +1375,7 @@ See RS/6000 and PowerPC Options. -mcld -mcx16 -msahf -mmovbe -mcrc32 -mmwait @gol -mrecip -mrecip=@var{opt} @gol -mvzeroupper -mprefer-avx128 -mprefer-vector-width=@var{opt} @gol +-mmove-max=@var{bits} -mstore-max=@var{bits} @gol -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol -mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl @gol -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -msha -maes @gol @@ -1407,10 +1408,10 @@ See RS/6000 and PowerPC Options. -mstack-protector-guard-reg=@var{reg} @gol -mstack-protector-guard-offset=@var{offset} @gol -mstack-protector-guard-symbol=@var{symbol} @gol --mgeneral-regs-only -mcall-ms2sysv-xlogues @gol +-mgeneral-regs-only -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol -mindirect-branch-register -mharden-sls=@var{choice} @gol --mindirect-branch-cs-prefix -mneeded} +-mindirect-branch-cs-prefix -mneeded -mno-direct-extern-access} @emph{x86 Windows Options} @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol @@ -31160,6 +31161,18 @@ This option instructs GCC to use 128-bit AVX instructions instead of This option instructs GCC to use @var{opt}-bit vector width in instructions instead of default on the selected platform. +@item -mmove-max=@var{bits} +@opindex mmove-max +This option instructs GCC to set the maximum number of bits can be +moved from memory to memory efficiently to @var{bits}. The valid +@var{bits} are 128, 256 and 512. + +@item -mstore-max=@var{bits} +@opindex mstore-max +This option instructs GCC to set the maximum number of bits can be +stored to memory efficiently to @var{bits}. The valid @var{bits} are +128, 256 and 512. + @table @samp @item none No extra limitations applied to GCC other than defined by the selected platform. @@ -31682,6 +31695,13 @@ Generate code that uses only the general-purpose registers. This prevents the compiler from using floating-point, vector, mask and bound registers. +@item -mrelax-cmpxchg-loop +@opindex mrelax-cmpxchg-loop +Relax cmpxchg loop by emitting an early load and compare before cmpxchg, +execute pause if load value is not expected. This reduces excessive +cachline bouncing when and works for all atomic logic fetch builtins +that generates compare and swap loop. + @item -mindirect-branch=@var{choice} @opindex mindirect-branch Convert indirect call and jump with @var{choice}. The default is @@ -31831,6 +31851,19 @@ x32 environments. 
@opindex mneeded Emit GNU_PROPERTY_X86_ISA_1_NEEDED GNU property for Linux target to indicate the micro-architecture ISA level required to execute the binary. + +@item -mno-direct-extern-access +@opindex mno-direct-extern-access +@opindex mdirect-extern-access +Without @option{-fpic} nor @option{-fPIC}, always use the GOT pointer +to access external symbols. With @option{-fpic} or @option{-fPIC}, +treat access to protected symbols as local symbols. The default is +@option{-mdirect-extern-access}. + +@strong{Warning:} shared libraries compiled with +@option{-mno-direct-extern-access} and executable compiled with +@option{-mdirect-extern-access} may not be binary compatible if +protected symbols are used in shared libraries and executable. @end table @node x86 Windows Options diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index ac761100894..289f02c14ca 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -7805,6 +7805,30 @@ If these patterns are not defined, attempts will be made to use counterparts. If none of these are available a compare-and-swap loop will be used. +@cindex @code{atomic_add_fetch_cmp_0@var{mode}} instruction pattern +@cindex @code{atomic_sub_fetch_cmp_0@var{mode}} instruction pattern +@cindex @code{atomic_and_fetch_cmp_0@var{mode}} instruction pattern +@cindex @code{atomic_or_fetch_cmp_0@var{mode}} instruction pattern +@cindex @code{atomic_xor_fetch_cmp_0@var{mode}} instruction pattern +@item @samp{atomic_add_fetch_cmp_0@var{mode}} +@itemx @samp{atomic_sub_fetch_cmp_0@var{mode}} +@itemx @samp{atomic_and_fetch_cmp_0@var{mode}} +@itemx @samp{atomic_or_fetch_cmp_0@var{mode}} +@itemx @samp{atomic_xor_fetch_cmp_0@var{mode}} +These patterns emit code for an atomic operation on memory with memory +model semantics if the fetch result is used only in a comparison against +zero. +Operand 0 is an output operand which contains a boolean result of comparison +of the value after the operation against zero. Operand 1 is the memory on +which the atomic operation is performed. Operand 2 is the second operand +to the binary operator. Operand 3 is the memory model to be used by the +operation. Operand 4 is an integer holding the comparison code, one of +@code{EQ}, @code{NE}, @code{LT}, @code{GT}, @code{LE} or @code{GE}. + +If these patterns are not defined, attempts will be made to use separate +atomic operation and fetch pattern followed by comparison of the result +against zero. + @cindex @code{mem_thread_fence} instruction pattern @item @samp{mem_thread_fence} This pattern emits code required to implement a thread fence with diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b370bc76b25..1432aa4c080 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -3807,14 +3807,17 @@ cases of mismatch, it also makes for better code on certain machines. The default is to not promote prototypes. @end deftypefn -@defmac PUSH_ARGS -A C expression. If nonzero, push insns will be used to pass -outgoing arguments. -If the target machine does not have a push instruction, set it to zero. -That directs GCC to use an alternate strategy: to -allocate the entire argument block and then store the arguments into -it. When @code{PUSH_ARGS} is nonzero, @code{PUSH_ROUNDING} must be defined too. -@end defmac +@deftypefn {Target Hook} bool TARGET_PUSH_ARGUMENT (unsigned int @var{npush}) +This target hook returns @code{true} if push instructions will be +used to pass outgoing arguments. When the push instruction usage is +optional, @var{npush} is nonzero to indicate the number of bytes to +push. 
Otherwise, @var{npush} is zero. If the target machine does not +have a push instruction or push instruction should be avoided, +@code{false} should be returned. That directs GCC to use an alternate +strategy: to allocate the entire argument block and then store the +arguments into it. If this target hook may return @code{true}, +@code{PUSH_ROUNDING} must be defined. +@end deftypefn @defmac PUSH_ARGS_REVERSED A C expression. If nonzero, function arguments will be evaluated from @@ -6767,6 +6770,13 @@ in code size, for example where the number of insns emitted to perform a move would be greater than that of a library call. @end deftypefn +@deftypefn {Target Hook} bool TARGET_OVERLAP_OP_BY_PIECES_P (void) +This target hook should return true if when the @code{by_pieces} +infrastructure is used, an offset adjusted unaligned memory operation +in the smallest integer mode for the last piece operation of a memory +region can be generated to avoid doing more than one smaller operations. +@end deftypefn + @deftypefn {Target Hook} int TARGET_COMPARE_BY_PIECES_BRANCH_RATIO (machine_mode @var{mode}) When expanding a block comparison in MODE, gcc can try to reduce the number of branches at the expense of more memory operations. This hook @@ -11937,6 +11947,13 @@ This function prepares to emit a conditional comparison within a sequence @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the compares. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_GEN_MEMSET_SCRATCH_RTX (machine_mode @var{mode}) +This hook should return an rtx for a scratch register in @var{mode} to +be used when expanding memset calls. The backend can use a hard scratch +register to avoid stack realignment when expanding memset. The default +is @code{gen_reg_rtx}. +@end deftypefn + @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop}) This target hook returns a new value for the number of times @var{loop} should be unrolled. The parameter @var{nunroll} is the number of times @@ -12159,6 +12176,11 @@ The support includes the assembler, linker and dynamic linker. The default value of this hook is based on target's libc. @end deftypefn +@deftypefn {Target Hook} bool TARGET_IFUNC_REF_LOCAL_OK (void) +Return true if it is OK to reference indirect function resolvers +locally. The default is to return false. +@end deftypefn + @deftypefn {Target Hook} {unsigned int} TARGET_ATOMIC_ALIGN_FOR_MODE (machine_mode @var{mode}) If defined, this function returns an appropriate alignment in bits for an atomic object of machine_mode @var{mode}. If 0 is returned then the default alignment for the specified mode is used. @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 2974dae2701..1ebc966f487 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3100,14 +3100,7 @@ control passing certain arguments in registers. @hook TARGET_PROMOTE_PROTOTYPES -@defmac PUSH_ARGS -A C expression. If nonzero, push insns will be used to pass -outgoing arguments. -If the target machine does not have a push instruction, set it to zero. -That directs GCC to use an alternate strategy: to -allocate the entire argument block and then store the arguments into -it. When @code{PUSH_ARGS} is nonzero, @code{PUSH_ROUNDING} must be defined too. -@end defmac +@hook TARGET_PUSH_ARGUMENT @defmac PUSH_ARGS_REVERSED A C expression. If nonzero, function arguments will be evaluated from @@ -4588,6 +4581,8 @@ If you don't define this, a reasonable default is used. 
@hook TARGET_USE_BY_PIECES_INFRASTRUCTURE_P +@hook TARGET_OVERLAP_OP_BY_PIECES_P + @hook TARGET_COMPARE_BY_PIECES_BRANCH_RATIO @defmac MOVE_MAX_PIECES @@ -8030,6 +8025,8 @@ lists. @hook TARGET_GEN_CCMP_NEXT +@hook TARGET_GEN_MEMSET_SCRATCH_RTX + @hook TARGET_LOOP_UNROLL_ADJUST @defmac POWI_MAX_MULTS @@ -8145,6 +8142,8 @@ and the associated definitions of those functions. @hook TARGET_HAS_IFUNC_P +@hook TARGET_IFUNC_REF_LOCAL_OK + @hook TARGET_ATOMIC_ALIGN_FOR_MODE @hook TARGET_ATOMIC_ASSIGN_EXPAND_FENV diff --git a/gcc/expr.c b/gcc/expr.c index 14a25c25450..fa76dff26b4 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -746,7 +746,7 @@ static unsigned int alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align) { scalar_int_mode tmode - = int_mode_for_size (max_pieces * BITS_PER_UNIT, 1).require (); + = int_mode_for_size (max_pieces * BITS_PER_UNIT, 0).require (); if (align >= GET_MODE_ALIGNMENT (tmode)) align = GET_MODE_ALIGNMENT (tmode); @@ -769,15 +769,36 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align) return align; } -/* Return the widest integer mode that is narrower than SIZE bytes. */ +/* Return the widest QI vector, if QI_MODE is true, or integer mode + that is narrower than SIZE bytes. */ -static scalar_int_mode -widest_int_mode_for_size (unsigned int size) +static fixed_size_mode +widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector) { - scalar_int_mode result = NARROWEST_INT_MODE; + fixed_size_mode result = NARROWEST_INT_MODE; gcc_checking_assert (size > 1); + /* Use QI vector only if size is wider than a WORD. */ + if (qi_vector && size > UNITS_PER_WORD) + { + machine_mode mode; + fixed_size_mode candidate; + FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT) + if (is_a (mode, &candidate) + && GET_MODE_INNER (candidate) == QImode) + { + if (GET_MODE_SIZE (candidate) >= size) + break; + if (optab_handler (vec_duplicate_optab, candidate) + != CODE_FOR_nothing) + result = candidate; + } + + if (result != NARROWEST_INT_MODE) + return result; + } + opt_scalar_int_mode tmode; FOR_EACH_MODE_IN_CLASS (tmode, MODE_INT) if (GET_MODE_SIZE (tmode.require ()) < size) @@ -815,12 +836,29 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned int align, unsigned int max_size, by_pieces_operation op) { unsigned HOST_WIDE_INT n_insns = 0; + fixed_size_mode mode; + + if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES) + { + /* NB: Round up L and ALIGN to the widest integer mode for + MAX_SIZE. 
*/ + mode = widest_fixed_size_mode_for_size (max_size, + op == SET_BY_PIECES); + if (optab_handler (mov_optab, mode) != CODE_FOR_nothing) + { + unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode)); + if (up > l) + l = up; + align = GET_MODE_ALIGNMENT (mode); + } + } align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align); while (max_size > 1 && l > 0) { - scalar_int_mode mode = widest_int_mode_for_size (max_size); + mode = widest_fixed_size_mode_for_size (max_size, + op == SET_BY_PIECES); enum insn_code icode; unsigned int modesize = GET_MODE_SIZE (mode); @@ -888,7 +926,7 @@ class pieces_addr void *m_cfndata; public: pieces_addr (rtx, bool, by_pieces_constfn, void *); - rtx adjust (scalar_int_mode, HOST_WIDE_INT); + rtx adjust (fixed_size_mode, HOST_WIDE_INT, by_pieces_prev * = nullptr); void increment_address (HOST_WIDE_INT); void maybe_predec (HOST_WIDE_INT); void maybe_postinc (HOST_WIDE_INT); @@ -990,10 +1028,12 @@ pieces_addr::decide_autoinc (machine_mode ARG_UNUSED (mode), bool reverse, but we still modify the MEM's properties. */ rtx -pieces_addr::adjust (scalar_int_mode mode, HOST_WIDE_INT offset) +pieces_addr::adjust (fixed_size_mode mode, HOST_WIDE_INT offset, + by_pieces_prev *prev) { if (m_constfn) - return m_constfn (m_cfndata, offset, mode); + /* Pass the previous data to m_constfn. */ + return m_constfn (m_cfndata, prev, offset, mode); if (m_obj == NULL_RTX) return NULL_RTX; if (m_auto) @@ -1041,13 +1081,25 @@ pieces_addr::maybe_postinc (HOST_WIDE_INT size) class op_by_pieces_d { + private: + fixed_size_mode get_usable_mode (fixed_size_mode, unsigned int); + fixed_size_mode smallest_fixed_size_mode_for_size (unsigned int); + protected: pieces_addr m_to, m_from; - unsigned HOST_WIDE_INT m_len; + /* Make m_len read-only so that smallest_fixed_size_mode_for_size can + use it to check the valid mode size. */ + const unsigned HOST_WIDE_INT m_len; HOST_WIDE_INT m_offset; unsigned int m_align; unsigned int m_max_size; bool m_reverse; + /* True if this is a stack push. */ + bool m_push; + /* True if targetm.overlap_op_by_pieces_p () returns true. */ + bool m_overlap_op_by_pieces; + /* True if QI vector mode can be used. */ + bool m_qi_vector_mode; /* Virtual functions, overriden by derived classes for the specific operation. */ @@ -1058,8 +1110,9 @@ class op_by_pieces_d } public: - op_by_pieces_d (rtx, bool, rtx, bool, by_pieces_constfn, void *, - unsigned HOST_WIDE_INT, unsigned int); + op_by_pieces_d (unsigned int, rtx, bool, rtx, bool, by_pieces_constfn, + void *, unsigned HOST_WIDE_INT, unsigned int, bool, + bool = false); void run (); }; @@ -1067,17 +1120,21 @@ class op_by_pieces_d objects named TO and FROM, which are identified as loads or stores by TO_LOAD and FROM_LOAD. If FROM is a load, the optional FROM_CFN and its associated FROM_CFN_DATA can be used to replace loads with - constant values. LEN describes the length of the operation. */ + constant values. MAX_PIECES describes the maximum number of bytes + at a time which can be moved efficiently. LEN describes the length + of the operation. 
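   A rough worked example (illustrative only, not a contract of the
   interface): copying 13 bytes with MAX_PIECES == 8 is normally split
   into

     8 bytes @ offset 0,  4 bytes @ offset 8,  1 byte @ offset 12

   whereas, if targetm.overlap_op_by_pieces_p () is true and the
   operation is not a stack push, the tail can instead become a second
   8-byte piece at offset 5 that overlaps bytes 5-7 of the first piece,
   so only two operations are emitted.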
*/ -op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load, - rtx from, bool from_load, +op_by_pieces_d::op_by_pieces_d (unsigned int max_pieces, rtx to, + bool to_load, rtx from, bool from_load, by_pieces_constfn from_cfn, void *from_cfn_data, unsigned HOST_WIDE_INT len, - unsigned int align) + unsigned int align, bool push, + bool qi_vector_mode) : m_to (to, to_load, NULL, NULL), m_from (from, from_load, from_cfn, from_cfn_data), - m_len (len), m_max_size (MOVE_MAX_PIECES + 1) + m_len (len), m_max_size (max_pieces + 1), + m_push (push), m_qi_vector_mode (qi_vector_mode) { int toi = m_to.get_addr_inc (); int fromi = m_from.get_addr_inc (); @@ -1098,7 +1155,9 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load, if (by_pieces_ninsns (len, align, m_max_size, MOVE_BY_PIECES) > 2) { /* Find the mode of the largest comparison. */ - scalar_int_mode mode = widest_int_mode_for_size (m_max_size); + fixed_size_mode mode + = widest_fixed_size_mode_for_size (m_max_size, + m_qi_vector_mode); m_from.decide_autoinc (mode, m_reverse, len); m_to.decide_autoinc (mode, m_reverse, len); @@ -1106,6 +1165,56 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load, align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align); m_align = align; + + m_overlap_op_by_pieces = targetm.overlap_op_by_pieces_p (); +} + +/* This function returns the largest usable integer mode for LEN bytes + whose size is no bigger than size of MODE. */ + +fixed_size_mode +op_by_pieces_d::get_usable_mode (fixed_size_mode mode, unsigned int len) +{ + unsigned int size; + do + { + size = GET_MODE_SIZE (mode); + if (len >= size && prepare_mode (mode, m_align)) + break; + /* widest_fixed_size_mode_for_size checks SIZE > 1. */ + mode = widest_fixed_size_mode_for_size (size, m_qi_vector_mode); + } + while (1); + return mode; +} + +/* Return the smallest integer or QI vector mode that is not narrower + than SIZE bytes. */ + +fixed_size_mode +op_by_pieces_d::smallest_fixed_size_mode_for_size (unsigned int size) +{ + /* Use QI vector only for > size of WORD. */ + if (m_qi_vector_mode && size > UNITS_PER_WORD) + { + machine_mode mode; + fixed_size_mode candidate; + FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT) + if (is_a (mode, &candidate) + && GET_MODE_INNER (candidate) == QImode) + { + /* Don't return a mode wider than M_LEN. */ + if (GET_MODE_SIZE (candidate) > m_len) + break; + + if (GET_MODE_SIZE (candidate) >= size + && (optab_handler (vec_duplicate_optab, candidate) + != CODE_FOR_nothing)) + return candidate; + } + } + + return smallest_int_mode_for_size (size * BITS_PER_UNIT); } /* This function contains the main loop used for expanding a block @@ -1116,50 +1225,98 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load, void op_by_pieces_d::run () { - while (m_max_size > 1 && m_len > 0) + if (m_len == 0) + return; + + unsigned HOST_WIDE_INT length = m_len; + + /* widest_fixed_size_mode_for_size checks M_MAX_SIZE > 1. 
*/ + fixed_size_mode mode + = widest_fixed_size_mode_for_size (m_max_size, m_qi_vector_mode); + mode = get_usable_mode (mode, length); + + by_pieces_prev to_prev = { nullptr, mode }; + by_pieces_prev from_prev = { nullptr, mode }; + + do { - scalar_int_mode mode = widest_int_mode_for_size (m_max_size); + unsigned int size = GET_MODE_SIZE (mode); + rtx to1 = NULL_RTX, from1; - if (prepare_mode (mode, m_align)) + while (length >= size) { - unsigned int size = GET_MODE_SIZE (mode); - rtx to1 = NULL_RTX, from1; + if (m_reverse) + m_offset -= size; - while (m_len >= size) - { - if (m_reverse) - m_offset -= size; + to1 = m_to.adjust (mode, m_offset, &to_prev); + to_prev.data = to1; + to_prev.mode = mode; + from1 = m_from.adjust (mode, m_offset, &from_prev); + from_prev.data = from1; + from_prev.mode = mode; - to1 = m_to.adjust (mode, m_offset); - from1 = m_from.adjust (mode, m_offset); + m_to.maybe_predec (-(HOST_WIDE_INT)size); + m_from.maybe_predec (-(HOST_WIDE_INT)size); - m_to.maybe_predec (-(HOST_WIDE_INT)size); - m_from.maybe_predec (-(HOST_WIDE_INT)size); + generate (to1, from1, mode); - generate (to1, from1, mode); + m_to.maybe_postinc (size); + m_from.maybe_postinc (size); - m_to.maybe_postinc (size); - m_from.maybe_postinc (size); + if (!m_reverse) + m_offset += size; - if (!m_reverse) - m_offset += size; + length -= size; + } - m_len -= size; - } + finish_mode (mode); - finish_mode (mode); - } + if (length == 0) + return; - m_max_size = GET_MODE_SIZE (mode); + if (!m_push && m_overlap_op_by_pieces) + { + /* NB: Generate overlapping operations if it is not a stack + push since stack push must not overlap. Get the smallest + fixed size mode for M_LEN bytes. */ + mode = smallest_fixed_size_mode_for_size (length); + mode = get_usable_mode (mode, GET_MODE_SIZE (mode)); + int gap = GET_MODE_SIZE (mode) - length; + if (gap > 0) + { + /* If size of MODE > M_LEN, generate the last operation + in MODE for the remaining bytes with ovelapping memory + from the previois operation. */ + if (m_reverse) + m_offset += gap; + else + m_offset -= gap; + length += gap; + } + } + else + { + /* widest_fixed_size_mode_for_size checks SIZE > 1. */ + mode = widest_fixed_size_mode_for_size (size, + m_qi_vector_mode); + mode = get_usable_mode (mode, length); + } } + while (1); /* The code above should have handled everything. */ - gcc_assert (!m_len); + gcc_assert (!length); } /* Derived class from op_by_pieces_d, providing support for block move operations. 
*/ +#ifdef PUSH_ROUNDING +#define PUSHG_P(to) ((to) == nullptr) +#else +#define PUSHG_P(to) false +#endif + class move_by_pieces_d : public op_by_pieces_d { insn_gen_fn m_gen_fun; @@ -1169,7 +1326,8 @@ class move_by_pieces_d : public op_by_pieces_d public: move_by_pieces_d (rtx to, rtx from, unsigned HOST_WIDE_INT len, unsigned int align) - : op_by_pieces_d (to, false, from, true, NULL, NULL, len, align) + : op_by_pieces_d (MOVE_MAX_PIECES, to, false, from, true, NULL, + NULL, len, align, PUSHG_P (to)) { } rtx finish_retmode (memop_ret); @@ -1263,8 +1421,10 @@ class store_by_pieces_d : public op_by_pieces_d public: store_by_pieces_d (rtx to, by_pieces_constfn cfn, void *cfn_data, - unsigned HOST_WIDE_INT len, unsigned int align) - : op_by_pieces_d (to, false, NULL_RTX, true, cfn, cfn_data, len, align) + unsigned HOST_WIDE_INT len, unsigned int align, + bool qi_vector_mode) + : op_by_pieces_d (STORE_MAX_PIECES, to, false, NULL_RTX, true, cfn, + cfn_data, len, align, false, qi_vector_mode) { } rtx finish_retmode (memop_ret); @@ -1319,7 +1479,7 @@ store_by_pieces_d::finish_retmode (memop_ret retmode) int can_store_by_pieces (unsigned HOST_WIDE_INT len, - rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode), + by_pieces_constfn constfun, void *constfundata, unsigned int align, bool memsetp) { unsigned HOST_WIDE_INT l; @@ -1353,7 +1513,8 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len, max_size = STORE_MAX_PIECES + 1; while (max_size > 1 && l > 0) { - scalar_int_mode mode = widest_int_mode_for_size (max_size); + fixed_size_mode mode + = widest_fixed_size_mode_for_size (max_size, memsetp); icode = optab_handler (mov_optab, mode); if (icode != CODE_FOR_nothing @@ -1366,8 +1527,12 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len, if (reverse) offset -= size; - cst = (*constfun) (constfundata, offset, mode); - if (!targetm.legitimate_constant_p (mode, cst)) + cst = (*constfun) (constfundata, nullptr, offset, mode); + /* All CONST_VECTORs can be loaded for memset since + vec_duplicate_optab is a precondition to pick a + vector mode for the memset expander. */ + if (!((memsetp && VECTOR_MODE_P (mode)) + || targetm.legitimate_constant_p (mode, cst))) return 0; if (!reverse) @@ -1396,7 +1561,7 @@ can_store_by_pieces (unsigned HOST_WIDE_INT len, rtx store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, - rtx (*constfun) (void *, HOST_WIDE_INT, scalar_int_mode), + by_pieces_constfn constfun, void *constfundata, unsigned int align, bool memsetp, memop_ret retmode) { @@ -1411,7 +1576,8 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, memsetp ? SET_BY_PIECES : STORE_BY_PIECES, optimize_insn_for_speed_p ())); - store_by_pieces_d data (to, constfun, constfundata, len, align); + store_by_pieces_d data (to, constfun, constfundata, len, align, + memsetp); data.run (); if (retmode != RETURN_BEGIN) @@ -1420,15 +1586,6 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, return to; } -/* Callback routine for clear_by_pieces. - Return const0_rtx unconditionally. */ - -static rtx -clear_by_pieces_1 (void *, HOST_WIDE_INT, scalar_int_mode) -{ - return const0_rtx; -} - /* Generate several move instructions to clear LEN bytes of block TO. (A MEM rtx with BLKmode). ALIGN is maximum alignment we can assume. */ @@ -1438,7 +1595,10 @@ clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align) if (len == 0) return; - store_by_pieces_d data (to, clear_by_pieces_1, NULL, len, align); + /* Use builtin_memset_read_str to support vector mode broadcast. 
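   For illustration only (whether a vector mode is actually chosen
   depends on vec_duplicate support and STORE_MAX_PIECES on the target):
   clearing a 32-byte block, e.g.

     memset (p, 0, 32);

   can then be expanded as a single V32QI store of a broadcast zero
   byte instead of four 8-byte integer stores; targets without QI-vector
   support keep using the integer path as before.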
*/ + char c = 0; + store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align, + true); data.run (); } @@ -1460,7 +1620,8 @@ class compare_by_pieces_d : public op_by_pieces_d compare_by_pieces_d (rtx op0, rtx op1, by_pieces_constfn op1_cfn, void *op1_cfn_data, HOST_WIDE_INT len, int align, rtx_code_label *fail_label) - : op_by_pieces_d (op0, true, op1, true, op1_cfn, op1_cfn_data, len, align) + : op_by_pieces_d (COMPARE_MAX_PIECES, op0, true, op1, true, op1_cfn, + op1_cfn_data, len, align, false) { m_fail_label = fail_label; } @@ -1729,7 +1890,7 @@ block_move_libcall_safe_for_call_parm (void) tree fn; /* If arguments are pushed on the stack, then they're safe. */ - if (PUSH_ARGS) + if (targetm.calls.push_argument (0)) return true; /* If registers go on the stack anyway, any argument is sure to clobber @@ -4540,11 +4701,19 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, skip = (reg_parm_stack_space == 0) ? 0 : used; #ifdef PUSH_ROUNDING + /* NB: Let the backend known the number of bytes to push and + decide if push insns should be generated. */ + unsigned int push_size; + if (CONST_INT_P (size)) + push_size = INTVAL (size); + else + push_size = 0; + /* Do it with several push insns if that doesn't take lots of insns and if there is no difficulty with push insns that skip bytes on the stack for alignment purposes. */ if (args_addr == 0 - && PUSH_ARGS + && targetm.calls.push_argument (push_size) && CONST_INT_P (size) && skip == 0 && MEM_ALIGN (xinner) >= align @@ -4749,7 +4918,7 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, anti_adjust_stack (gen_int_mode (extra, Pmode)); #ifdef PUSH_ROUNDING - if (args_addr == 0 && PUSH_ARGS) + if (args_addr == 0 && targetm.calls.push_argument (0)) emit_single_push_insn (mode, x, type); else #endif @@ -5646,7 +5815,8 @@ emit_storent_insn (rtx to, rtx from) /* Helper function for store_expr storing of STRING_CST. */ static rtx -string_cst_read_str (void *data, HOST_WIDE_INT offset, scalar_int_mode mode) +string_cst_read_str (void *data, void *, HOST_WIDE_INT offset, + fixed_size_mode mode) { tree str = (tree) data; @@ -5661,10 +5831,13 @@ string_cst_read_str (void *data, HOST_WIDE_INT offset, scalar_int_mode mode) size_t l = TREE_STRING_LENGTH (str) - offset; memcpy (p, TREE_STRING_POINTER (str) + offset, l); memset (p + l, '\0', GET_MODE_SIZE (mode) - l); - return c_readstr (p, mode, false); + return c_readstr (p, as_a (mode), false); } - return c_readstr (TREE_STRING_POINTER (str) + offset, mode, false); + /* The by-pieces infrastructure does not try to pick a vector mode + for storing STRING_CST. */ + return c_readstr (TREE_STRING_POINTER (str) + offset, + as_a (mode), false); } /* Generate code for computing expression EXP, @@ -6970,7 +7143,8 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size, && eltmode == GET_MODE_INNER (mode) && ((icode = optab_handler (vec_duplicate_optab, mode)) != CODE_FOR_nothing) - && (elt = uniform_vector_p (exp))) + && (elt = uniform_vector_p (exp)) + && !VECTOR_TYPE_P (TREE_TYPE (elt))) { class expand_operand ops[2]; create_output_operand (&ops[0], target, mode); @@ -8421,6 +8595,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier, return constructor; } + /* If the CTOR is available in static storage and not mostly + zeros and we can move it by pieces prefer to do so since + that's usually more efficient than performing a series of + stores from immediates. 
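   A hypothetical example of the case this targets (the function names
   are made up):

     extern void use (const int *);
     void
     f (void)
     {
       int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };  /* constant, not mostly zeros */
       use (a);
     }

   Here the initializer can live in static storage, so a by-pieces copy
   from its read-only image is usually cheaper than eight separate
   immediate stores; when most elements are zero, clearing the object
   and storing the few nonzero elements remains preferable.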
*/ + if (avoid_temp_mem + && TREE_STATIC (exp) + && TREE_CONSTANT (exp) + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)) + && can_move_by_pieces (tree_to_uhwi (TYPE_SIZE_UNIT (type)), + TYPE_ALIGN (type)) + && ! mostly_zeros_p (exp)) + return NULL_RTX; + /* Handle calls that pass values in multiple non-contiguous locations. The Irix 6 ABI has examples of this. */ if (target == 0 || ! safe_from_p (target, exp, 1) diff --git a/gcc/expr.h b/gcc/expr.h index 1f0177a4cfa..084b26c80d8 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -107,7 +107,15 @@ enum block_op_methods BLOCK_OP_NO_LIBCALL_RET }; -typedef rtx (*by_pieces_constfn) (void *, HOST_WIDE_INT, scalar_int_mode); +typedef rtx (*by_pieces_constfn) (void *, void *, HOST_WIDE_INT, + fixed_size_mode); + +/* The second pointer passed to by_pieces_constfn. */ +struct by_pieces_prev +{ + rtx data; + fixed_size_mode mode; +}; extern rtx emit_block_move (rtx, rtx, rtx, enum block_op_methods); extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, diff --git a/gcc/hooks.c b/gcc/hooks.c index 680271f76a4..4f14abff206 100644 --- a/gcc/hooks.c +++ b/gcc/hooks.c @@ -520,6 +520,14 @@ hook_void_gcc_optionsp (struct gcc_options *) { } +/* Generic hook that takes an unsigned int and returns true. */ + +bool +hook_bool_uint_true (unsigned int) +{ + return true; +} + /* Generic hook that takes an unsigned int, an unsigned int pointer and returns false. */ diff --git a/gcc/hooks.h b/gcc/hooks.h index add9a742e41..71781c790a1 100644 --- a/gcc/hooks.h +++ b/gcc/hooks.h @@ -89,6 +89,7 @@ extern void hook_void_tree (tree); extern void hook_void_tree_treeptr (tree, tree *); extern void hook_void_int_int (int, int); extern void hook_void_gcc_optionsp (struct gcc_options *); +extern bool hook_bool_uint_true (unsigned int); extern bool hook_bool_uint_uintp_false (unsigned int, unsigned int *); extern int hook_int_uint_mode_1 (unsigned int, machine_mode); diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index d209a52f823..c158cda4fb9 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -3091,6 +3091,46 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall *call) expand_ifn_atomic_compare_exchange (call); } +/* Expand atomic add fetch and cmp with 0. */ + +static void +expand_ATOMIC_ADD_FETCH_CMP_0 (internal_fn, gcall *call) +{ + expand_ifn_atomic_op_fetch_cmp_0 (call); +} + +/* Expand atomic sub fetch and cmp with 0. */ + +static void +expand_ATOMIC_SUB_FETCH_CMP_0 (internal_fn, gcall *call) +{ + expand_ifn_atomic_op_fetch_cmp_0 (call); +} + +/* Expand atomic and fetch and cmp with 0. */ + +static void +expand_ATOMIC_AND_FETCH_CMP_0 (internal_fn, gcall *call) +{ + expand_ifn_atomic_op_fetch_cmp_0 (call); +} + +/* Expand atomic or fetch and cmp with 0. */ + +static void +expand_ATOMIC_OR_FETCH_CMP_0 (internal_fn, gcall *call) +{ + expand_ifn_atomic_op_fetch_cmp_0 (call); +} + +/* Expand atomic xor fetch and cmp with 0. */ + +static void +expand_ATOMIC_XOR_FETCH_CMP_0 (internal_fn, gcall *call) +{ + expand_ifn_atomic_op_fetch_cmp_0 (call); +} + /* Expand LAUNDER to assignment, lhs = arg0. 
*/ static void diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index daeace7a34e..e68f6557441 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -386,6 +386,11 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET, ECF_LEAF, NULL) DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF, NULL) DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF, NULL) DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF, NULL) +DEF_INTERNAL_FN (ATOMIC_ADD_FETCH_CMP_0, ECF_LEAF, NULL) +DEF_INTERNAL_FN (ATOMIC_SUB_FETCH_CMP_0, ECF_LEAF, NULL) +DEF_INTERNAL_FN (ATOMIC_AND_FETCH_CMP_0, ECF_LEAF, NULL) +DEF_INTERNAL_FN (ATOMIC_OR_FETCH_CMP_0, ECF_LEAF, NULL) +DEF_INTERNAL_FN (ATOMIC_XOR_FETCH_CMP_0, ECF_LEAF, NULL) /* To implement [[fallthrough]]. */ DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index c6599ce4894..e80707a0697 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -232,4 +232,13 @@ extern void expand_PHI (internal_fn, gcall *); extern bool vectorized_internal_fn_supported_p (internal_fn, tree); +enum { + ATOMIC_OP_FETCH_CMP_0_EQ = 0, + ATOMIC_OP_FETCH_CMP_0_NE = 1, + ATOMIC_OP_FETCH_CMP_0_LT = 2, + ATOMIC_OP_FETCH_CMP_0_LE = 3, + ATOMIC_OP_FETCH_CMP_0_GT = 4, + ATOMIC_OP_FETCH_CMP_0_GE = 5 +}; + #endif diff --git a/gcc/match.pd b/gcc/match.pd index e89601c0c14..e5e5deafef5 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -98,6 +98,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (define_operator_list COND_TERNARY IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS) +/* __atomic_fetch_or_*, __atomic_fetch_xor_*, __atomic_xor_fetch_* */ +(define_operator_list ATOMIC_FETCH_OR_XOR_N + BUILT_IN_ATOMIC_FETCH_OR_1 BUILT_IN_ATOMIC_FETCH_OR_2 + BUILT_IN_ATOMIC_FETCH_OR_4 BUILT_IN_ATOMIC_FETCH_OR_8 + BUILT_IN_ATOMIC_FETCH_OR_16 + BUILT_IN_ATOMIC_FETCH_XOR_1 BUILT_IN_ATOMIC_FETCH_XOR_2 + BUILT_IN_ATOMIC_FETCH_XOR_4 BUILT_IN_ATOMIC_FETCH_XOR_8 + BUILT_IN_ATOMIC_FETCH_XOR_16 + BUILT_IN_ATOMIC_XOR_FETCH_1 BUILT_IN_ATOMIC_XOR_FETCH_2 + BUILT_IN_ATOMIC_XOR_FETCH_4 BUILT_IN_ATOMIC_XOR_FETCH_8 + BUILT_IN_ATOMIC_XOR_FETCH_16) +/* __sync_fetch_and_or_*, __sync_fetch_and_xor_*, __sync_xor_and_fetch_* */ +(define_operator_list SYNC_FETCH_OR_XOR_N + BUILT_IN_SYNC_FETCH_AND_OR_1 BUILT_IN_SYNC_FETCH_AND_OR_2 + BUILT_IN_SYNC_FETCH_AND_OR_4 BUILT_IN_SYNC_FETCH_AND_OR_8 + BUILT_IN_SYNC_FETCH_AND_OR_16 + BUILT_IN_SYNC_FETCH_AND_XOR_1 BUILT_IN_SYNC_FETCH_AND_XOR_2 + BUILT_IN_SYNC_FETCH_AND_XOR_4 BUILT_IN_SYNC_FETCH_AND_XOR_8 + BUILT_IN_SYNC_FETCH_AND_XOR_16 + BUILT_IN_SYNC_XOR_AND_FETCH_1 BUILT_IN_SYNC_XOR_AND_FETCH_2 + BUILT_IN_SYNC_XOR_AND_FETCH_4 BUILT_IN_SYNC_XOR_AND_FETCH_8 + BUILT_IN_SYNC_XOR_AND_FETCH_16) +/* __atomic_fetch_and_*. */ +(define_operator_list ATOMIC_FETCH_AND_N + BUILT_IN_ATOMIC_FETCH_AND_1 BUILT_IN_ATOMIC_FETCH_AND_2 + BUILT_IN_ATOMIC_FETCH_AND_4 BUILT_IN_ATOMIC_FETCH_AND_8 + BUILT_IN_ATOMIC_FETCH_AND_16) +/* __sync_fetch_and_and_*. */ +(define_operator_list SYNC_FETCH_AND_AND_N + BUILT_IN_SYNC_FETCH_AND_AND_1 BUILT_IN_SYNC_FETCH_AND_AND_2 + BUILT_IN_SYNC_FETCH_AND_AND_4 BUILT_IN_SYNC_FETCH_AND_AND_8 + BUILT_IN_SYNC_FETCH_AND_AND_16) + /* With nop_convert? combine convert? and view_convert? in one pattern plus conditionalize on tree_nop_conversion_p conversions. */ (match (nop_convert @0) @@ -3672,6 +3705,84 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (vec_cond @0 (op! @3 @1) (op! 
@3 @2)))) #endif +#if GIMPLE +(match (nop_atomic_bit_test_and_p @0 @1 @4) + (bit_and (convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3)) + INTEGER_CST@1) + (with { + int ibit = tree_log2 (@0); + int ibit2 = tree_log2 (@1); + } + (if (ibit == ibit2 + && ibit >= 0 + && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))))) + +(match (nop_atomic_bit_test_and_p @0 @1 @3) + (bit_and (convert?@3 (SYNC_FETCH_OR_XOR_N @2 INTEGER_CST@0)) + INTEGER_CST@1) + (with { + int ibit = tree_log2 (@0); + int ibit2 = tree_log2 (@1); + } + (if (ibit == ibit2 + && ibit >= 0 + && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))))) + +(match (nop_atomic_bit_test_and_p @0 @0 @4) + (bit_and:c + (convert1?@4 + (ATOMIC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@5 @6)) @3)) + (convert2? @0)) + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))))) + +(match (nop_atomic_bit_test_and_p @0 @0 @4) + (bit_and:c + (convert1?@4 + (SYNC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@3 @5)))) + (convert2? @0)) + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))))) + +(match (nop_atomic_bit_test_and_p @0 @1 @3) + (bit_and@4 (convert?@3 (ATOMIC_FETCH_AND_N @2 INTEGER_CST@0 @5)) + INTEGER_CST@1) + (with { + int ibit = wi::exact_log2 (wi::zext (wi::bit_not (wi::to_wide (@0)), + TYPE_PRECISION(type))); + int ibit2 = tree_log2 (@1); + } + (if (ibit == ibit2 + && ibit >= 0 + && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))))) + +(match (nop_atomic_bit_test_and_p @0 @1 @3) + (bit_and@4 + (convert?@3 (SYNC_FETCH_AND_AND_N @2 INTEGER_CST@0)) + INTEGER_CST@1) + (with { + int ibit = wi::exact_log2 (wi::zext (wi::bit_not (wi::to_wide (@0)), + TYPE_PRECISION(type))); + int ibit2 = tree_log2 (@1); + } + (if (ibit == ibit2 + && ibit >= 0 + && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))))) + +(match (nop_atomic_bit_test_and_p @4 @0 @3) + (bit_and:c + (convert1?@3 + (ATOMIC_FETCH_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7))) @5)) + (convert2? @0)) + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4))))) + +(match (nop_atomic_bit_test_and_p @4 @0 @3) + (bit_and:c + (convert1?@3 + (SYNC_FETCH_AND_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7))))) + (convert2? @0)) + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4))))) + +#endif + /* (v ? w : 0) ? a : b is just (v & w) ? a : b Currently disabled after pass lvec because ARM understands VEC_COND_EXPR but not a plain v==w fed to BIT_IOR_EXPR. 
*/ diff --git a/gcc/optabs.def b/gcc/optabs.def index b192a9d070b..318a2ffed37 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -441,6 +441,11 @@ OPTAB_D (atomic_sub_fetch_optab, "atomic_sub_fetch$I$a") OPTAB_D (atomic_sub_optab, "atomic_sub$I$a") OPTAB_D (atomic_xor_fetch_optab, "atomic_xor_fetch$I$a") OPTAB_D (atomic_xor_optab, "atomic_xor$I$a") +OPTAB_D (atomic_add_fetch_cmp_0_optab, "atomic_add_fetch_cmp_0$I$a") +OPTAB_D (atomic_sub_fetch_cmp_0_optab, "atomic_sub_fetch_cmp_0$I$a") +OPTAB_D (atomic_and_fetch_cmp_0_optab, "atomic_and_fetch_cmp_0$I$a") +OPTAB_D (atomic_or_fetch_cmp_0_optab, "atomic_or_fetch_cmp_0$I$a") +OPTAB_D (atomic_xor_fetch_cmp_0_optab, "atomic_xor_fetch_cmp_0$I$a") OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") diff --git a/gcc/rtl.h b/gcc/rtl.h index efdaad20993..d3165e311c3 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -2455,6 +2455,8 @@ extern bool subreg_offset_representable_p (unsigned int, machine_mode, extern unsigned int subreg_regno (const_rtx); extern int simplify_subreg_regno (unsigned int, machine_mode, poly_uint64, machine_mode); +extern int lowpart_subreg_regno (unsigned int, machine_mode, + machine_mode); extern unsigned int subreg_nregs (const_rtx); extern unsigned int subreg_nregs_with_regno (unsigned int, const_rtx); extern unsigned HOST_WIDE_INT nonzero_bits (const_rtx, machine_mode); diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 67a49e65fd8..f4663b04c2e 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -4341,6 +4341,17 @@ simplify_subreg_regno (unsigned int xregno, machine_mode xmode, return (int) yregno; } +/* A wrapper around simplify_subreg_regno that uses subreg_lowpart_offset + (xmode, ymode) as the offset. */ + +int +lowpart_subreg_regno (unsigned int regno, machine_mode xmode, + machine_mode ymode) +{ + poly_uint64 offset = subreg_lowpart_offset (xmode, ymode); + return simplify_subreg_regno (regno, xmode, offset, ymode); +} + /* Return the final regno that a subreg expression refers to. */ unsigned int subreg_regno (const_rtx x) @@ -4886,7 +4897,7 @@ nonzero_bits1 (const_rtx x, scalar_int_mode mode, const_rtx known_x, /* If PUSH_ROUNDING is defined, it is possible for the stack to be momentarily aligned only to that amount, so we pick the least alignment. */ - if (x == stack_pointer_rtx && PUSH_ARGS) + if (x == stack_pointer_rtx && targetm.calls.push_argument (0)) { poly_uint64 rounded_1 = PUSH_ROUNDING (poly_int64 (1)); alignment = MIN (known_alignment (rounded_1), alignment); diff --git a/gcc/target.def b/gcc/target.def index 0ebfb58fa6f..ba947812bd4 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2692,6 +2692,15 @@ DEFHOOK rtx, (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev, int cmp_code, tree op0, tree op1, int bit_code), NULL) +DEFHOOK +(gen_memset_scratch_rtx, + "This hook should return an rtx for a scratch register in @var{mode} to\n\ +be used when expanding memset calls. The backend can use a hard scratch\n\ +register to avoid stack realignment when expanding memset. The default\n\ +is @code{gen_reg_rtx}.", + rtx, (machine_mode mode), + gen_reg_rtx) + /* Return a new value for loop unroll size. */ DEFHOOK (loop_unroll_adjust, @@ -2956,6 +2965,14 @@ The default value of this hook is based on target's libc.", bool, (void), default_has_ifunc_p) +/* True if it is OK to reference indirect function resolvers locally. */ +DEFHOOK +(ifunc_ref_local_ok, + "Return true if it is OK to reference indirect function resolvers\n\ +locally. 
The default is to return false.", + bool, (void), + hook_bool_void_false) + /* True if it is OK to do sibling call optimization for the specified call expression EXP. DECL will be the called function, or NULL if this is an indirect call. */ @@ -3642,6 +3659,15 @@ move would be greater than that of a library call.", enum by_pieces_operation op, bool speed_p), default_use_by_pieces_infrastructure_p) +DEFHOOK +(overlap_op_by_pieces_p, + "This target hook should return true if when the @code{by_pieces}\n\ +infrastructure is used, an offset adjusted unaligned memory operation\n\ +in the smallest integer mode for the last piece operation of a memory\n\ +region can be generated to avoid doing more than one smaller operations.", + bool, (void), + hook_bool_void_false) + DEFHOOK (compare_by_pieces_branch_ratio, "When expanding a block comparison in MODE, gcc can try to reduce the\n\ @@ -4708,6 +4734,20 @@ Most ports do not need to implement anything for this hook.", void, (void), hook_void_void) +DEFHOOK +(push_argument, + "This target hook returns @code{true} if push instructions will be\n\ +used to pass outgoing arguments. When the push instruction usage is\n\ +optional, @var{npush} is nonzero to indicate the number of bytes to\n\ +push. Otherwise, @var{npush} is zero. If the target machine does not\n\ +have a push instruction or push instruction should be avoided,\n\ +@code{false} should be returned. That directs GCC to use an alternate\n\ +strategy: to allocate the entire argument block and then store the\n\ +arguments into it. If this target hook may return @code{true},\n\ +@code{PUSH_ROUNDING} must be defined.", + bool, (unsigned int npush), + default_push_argument) + DEFHOOK (strict_argument_naming, "Define this hook to return @code{true} if the location where a function\n\ diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 952fad422eb..4aab37e6a85 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -767,6 +767,18 @@ hook_void_CUMULATIVE_ARGS_tree (cumulative_args_t ca ATTRIBUTE_UNUSED, { } +/* Default implementation of TARGET_PUSH_ARGUMENT. 
*/ + +bool +default_push_argument (unsigned int) +{ +#ifdef PUSH_ROUNDING + return !ACCUMULATE_OUTGOING_ARGS; +#else + return false; +#endif +} + void default_function_arg_advance (cumulative_args_t, const function_arg_info &) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 9928d064abd..458c3a6a4a9 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -149,6 +149,7 @@ extern const char *hook_invalid_arg_for_unprototyped_fn (const_tree, const_tree, const_tree); extern void default_function_arg_advance (cumulative_args_t, const function_arg_info &); +extern bool default_push_argument (unsigned int); extern HOST_WIDE_INT default_function_arg_offset (machine_mode, const_tree); extern pad_direction default_function_arg_padding (machine_mode, const_tree); extern rtx default_function_arg (cumulative_args_t, const function_arg_info &); diff --git a/gcc/testsuite/g++.dg/pr90773-1.h b/gcc/testsuite/g++.dg/pr90773-1.h new file mode 100644 index 00000000000..abdb78b078b --- /dev/null +++ b/gcc/testsuite/g++.dg/pr90773-1.h @@ -0,0 +1,14 @@ +class fixed_wide_int_storage { +public: + long val[10]; + int len; + fixed_wide_int_storage () + { + len = sizeof (val) / sizeof (val[0]); + for (int i = 0; i < len; i++) + val[i] = i; + } +}; + +extern void foo (fixed_wide_int_storage); +extern int record_increment(void); diff --git a/gcc/testsuite/g++.dg/pr90773-1a.C b/gcc/testsuite/g++.dg/pr90773-1a.C new file mode 100644 index 00000000000..3ab8d929f74 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr90773-1a.C @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-options "-O2" } +// { dg-additional-options "-mno-avx -msse2 -mtune=skylake" { target { i?86-*-* x86_64-*-* } } } + +#include "pr90773-1.h" + +int +record_increment(void) +{ + fixed_wide_int_storage x; + foo (x); + return 0; +} diff --git a/gcc/testsuite/g++.dg/pr90773-1b.C b/gcc/testsuite/g++.dg/pr90773-1b.C new file mode 100644 index 00000000000..9713b2dd612 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr90773-1b.C @@ -0,0 +1,5 @@ +// { dg-do compile } +// { dg-options "-O2" } +// { dg-additional-options "-mno-avx512f -march=skylake" { target { i?86-*-* x86_64-*-* } } } + +#include "pr90773-1a.C" diff --git a/gcc/testsuite/g++.dg/pr90773-1c.C b/gcc/testsuite/g++.dg/pr90773-1c.C new file mode 100644 index 00000000000..699357a88dc --- /dev/null +++ b/gcc/testsuite/g++.dg/pr90773-1c.C @@ -0,0 +1,5 @@ +// { dg-do compile } +// { dg-options "-O2" } +// { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* x86_64-*-* } } } + +#include "pr90773-1a.C" diff --git a/gcc/testsuite/g++.dg/pr90773-1d.C b/gcc/testsuite/g++.dg/pr90773-1d.C new file mode 100644 index 00000000000..bf9d8543c1b --- /dev/null +++ b/gcc/testsuite/g++.dg/pr90773-1d.C @@ -0,0 +1,19 @@ +// { dg-do run } +// { dg-options "-O2" } +// { dg-additional-options "-march=native" { target { i?86-*-* x86_64-*-* } } } +// { dg-additional-sources "pr90773-1a.C" } + +#include "pr90773-1.h" + +void +foo (fixed_wide_int_storage x) +{ + for (int i = 0; i < x.len; i++) + if (x.val[i] != i) + __builtin_abort (); +} + +int main () +{ + return record_increment (); +} diff --git a/gcc/testsuite/g++.target/i386/pr102566-1.C b/gcc/testsuite/g++.target/i386/pr102566-1.C new file mode 100644 index 00000000000..94a66d717cc --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-1.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1 << 0) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + 
+bool +tbit30 (std::atomic &i) +{ +#define BIT (1 << 30) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1 << 31) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-2.C b/gcc/testsuite/g++.target/i386/pr102566-2.C new file mode 100644 index 00000000000..4f2aea961c2 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-2.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1 << 0) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit30 (std::atomic &i) +{ +#define BIT (1 << 30) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1 << 31) + return i.fetch_or(BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-3.C b/gcc/testsuite/g++.target/i386/pr102566-3.C new file mode 100644 index 00000000000..e88921dd155 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-3.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1 << 0) + return !(i.fetch_or(BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit30 (std::atomic &i) +{ +#define BIT (1 << 30) + return !(i.fetch_or(BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1 << 31) + return !(i.fetch_or(BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-4.C b/gcc/testsuite/g++.target/i386/pr102566-4.C new file mode 100644 index 00000000000..44d1362ac2e --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-4.C @@ -0,0 +1,29 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +typedef int __attribute__ ((mode (__word__))) int_type; + +#define BIT (1 << 0) + +bool +tbit0 (std::atomic &i) +{ + return i.fetch_or(BIT, std::memory_order_relaxed) & ~1; +} + +bool +tbit30 (std::atomic &i) +{ + return i.fetch_or(BIT, std::memory_order_relaxed) & ~2; +} + +bool +tbit31 (std::atomic &i) +{ + return i.fetch_or(BIT, std::memory_order_relaxed) & ~4; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchg" 3 } } */ +/* { dg-final { scan-assembler-not "bts" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-5a.C b/gcc/testsuite/g++.target/i386/pr102566-5a.C new file mode 100644 index 00000000000..f9595bee2ab --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-5a.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1 << 0) + return i.fetch_and(~BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit30 (std::atomic &i) +{ +#define BIT (1 << 30) + return i.fetch_and(~BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1 << 31) + return i.fetch_and(~BIT, 
std::memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-5b.C b/gcc/testsuite/g++.target/i386/pr102566-5b.C new file mode 100644 index 00000000000..d917b27a918 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-5b.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target { c++11 && { ! ia32 } } } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1ll << 0) + return i.fetch_and(~BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit30 (std::atomic &i) +{ +#define BIT (1ll << 30) + return i.fetch_and(~BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1ll << 63) + return i.fetch_and(~BIT, std::memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrq" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-6a.C b/gcc/testsuite/g++.target/i386/pr102566-6a.C new file mode 100644 index 00000000000..01d495eda23 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-6a.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target c++11 } } */ +/* { dg-options "-O2" } */ + +#include + +bool +tbit0 (std::atomic &i) +{ +#define BIT (1 << 0) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit30 (std::atomic &i) +{ +#define BIT (1 << 30) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit31 (std::atomic &i) +{ +#define BIT (1 << 31) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr102566-6b.C b/gcc/testsuite/g++.target/i386/pr102566-6b.C new file mode 100644 index 00000000000..adc11fcbf2d --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr102566-6b.C @@ -0,0 +1,31 @@ +/* { dg-do compile { target { c++11 && { ! 
ia32 } } } } */ +/* { dg-options "-O2" } */ + +#include <atomic> + +bool +tbit0 (std::atomic<long long> &i) +{ +#define BIT (1ll << 0) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit30 (std::atomic<long long> &i) +{ +#define BIT (1ll << 30) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +bool +tbit31 (std::atomic<long long> &i) +{ +#define BIT (1ll << 63) + return !(i.fetch_and(~BIT, std::memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrq" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/g++.target/i386/pr35513-1.C b/gcc/testsuite/g++.target/i386/pr35513-1.C new file mode 100644 index 00000000000..daa615662c5 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr35513-1.C @@ -0,0 +1,25 @@ +// { dg-do run { target property_1_needed } } +// { dg-options "-O2 -mno-direct-extern-access" } + +#include <iostream> + +class Bug +{ +}; + +int throw_bug() +{ + throw Bug(); + + return 0; +} + +int main() +{ + try { + std::cout << throw_bug(); + } catch (Bug bug) { + }; + + return 0; +} diff --git a/gcc/testsuite/g++.target/i386/pr35513-2.C b/gcc/testsuite/g++.target/i386/pr35513-2.C new file mode 100644 index 00000000000..ecccdaeb666 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr35513-2.C @@ -0,0 +1,53 @@ +// { dg-do run { target property_1_needed } } +// { dg-options "-O2 -mno-direct-extern-access" } + +class Foo +{ +public: + Foo(int n) : n_(n) { } + int f() { return n_; } + + int badTest(); + int goodTest(); + +private: + + int n_; +}; + +int Foo::badTest() +{ + try { + throw int(99); + } + + catch (int &i) { + n_ = 16; + } + + return n_; +} + + +int Foo::goodTest() +{ + int n; + + try { + throw int(99); + } + + catch (int &i) { + n = 16; + } + + return n_; +} + +int main() +{ + Foo foo(5); + foo.goodTest(); + foo.badTest(); + return 0; +} diff --git a/gcc/testsuite/g++.target/i386/pr80566-1.C b/gcc/testsuite/g++.target/i386/pr80566-1.C new file mode 100644 index 00000000000..29da31d6bb6 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr80566-1.C @@ -0,0 +1,15 @@ +// { dg-do compile } +// { dg-options "-O2 -march=haswell -mtune-ctrl=avx256_store_by_pieces" } + +#include <string.h> + +int * +foo() +{ + int * p = new int[16]; + memset(p,0,16*sizeof(int)); + return p; +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ diff --git a/gcc/testsuite/g++.target/i386/pr80566-2.C b/gcc/testsuite/g++.target/i386/pr80566-2.C new file mode 100644 index 00000000000..9ffd2c8cadb --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr80566-2.C @@ -0,0 +1,14 @@ +// { dg-do compile } +// { dg-options "-O2 -march=haswell -mtune-ctrl=avx256_move_by_pieces" } + +#include <string.h> + +int * +foo(int * q) +{ + int * p = new int[16]; + memcpy(q,p,16*sizeof(int)); + return p; +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */ diff --git a/gcc/testsuite/gcc.dg/pr103184-1.c b/gcc/testsuite/gcc.dg/pr103184-1.c new file mode 100644 index 00000000000..e567f95f63f --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr103184-1.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +extern char foo; +extern unsigned char bar; + +int +foo1 (void) +{ + return __sync_fetch_and_and (&foo, ~1) & 1; +} + +int +foo2 (void) +{ + return __sync_fetch_and_or (&foo, 1) & 1; +} + +int +foo3 (void) +{ + return __sync_fetch_and_xor (&foo, 1) & 1; +} + +unsigned short +bar1 (void) +{ + return 
__sync_fetch_and_and (&bar, ~1) & 1; +} + +unsigned short +bar2 (void) +{ + return __sync_fetch_and_or (&bar, 1) & 1; +} + +unsigned short +bar3 (void) +{ + return __sync_fetch_and_xor (&bar, 1) & 1; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchgb" 6 { target { x86_64-*-* i?86-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/pr103184-2.c b/gcc/testsuite/gcc.dg/pr103184-2.c new file mode 100644 index 00000000000..499761fdbfd --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr103184-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <stdatomic.h> + +int +tbit0 (_Atomic int* a, int n) +{ +#define BIT (0x1 << n) + return atomic_fetch_or (a, BIT) & BIT; +#undef BIT +} diff --git a/gcc/testsuite/gcc.dg/pr103268-1.c b/gcc/testsuite/gcc.dg/pr103268-1.c new file mode 100644 index 00000000000..6d583d55d6d --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr103268-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +extern int si; +long +test_types (void) +{ + unsigned int u2 = __atomic_fetch_xor (&si, 0, 5); + return u2; +} diff --git a/gcc/testsuite/gcc.dg/pr103268-2.c b/gcc/testsuite/gcc.dg/pr103268-2.c new file mode 100644 index 00000000000..12283bb43d9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr103268-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +extern long pscc_a_2_3; +extern int pscc_a_1_4; + +void +pscc (void) +{ + pscc_a_1_4 = __sync_fetch_and_and (&pscc_a_2_3, 1); +} + diff --git a/gcc/testsuite/gcc.dg/pr89984.c b/gcc/testsuite/gcc.dg/pr89984.c new file mode 100644 index 00000000000..471fe92bc86 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr89984.c @@ -0,0 +1,20 @@ +/* PR target/89984 */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +__attribute__((noipa)) float +foo (float x, float y) +{ + return x * __builtin_copysignf (1.0f, y) + y; +} + +int +main () +{ + if (foo (1.25f, 7.25f) != 1.25f + 7.25f + || foo (1.75f, -3.25f) != -1.75f + -3.25f + || foo (-2.25f, 7.5f) != -2.25f + 7.5f + || foo (-3.0f, -4.0f) != 3.0f + -4.0f) + __builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98737-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98737-1.c new file mode 100644 index 00000000000..e313a7fa79d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98737-1.c @@ -0,0 +1,148 @@ +/* PR target/98737 */ +/* { dg-do compile { target i?86-*-* x86_64-*-* powerpc*-*-* aarch64*-*-* } } */ +/* { dg-options "-O2 -fdump-tree-optimized -fcompare-debug" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-tree-dump-not "__atomic_fetch_" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__sync_fetch_and_" "optimized" } } */ + +typedef signed char schar; +typedef unsigned long ulong; +typedef unsigned int uint; +typedef unsigned short ushort; +typedef unsigned char uchar; +long vlong; +int vint; +short vshort; +schar vschar; +ulong vulong; +uint vuint; +ushort vushort; +uchar vuchar; +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + ut z = f (&v##t, x, ##__VA_ARGS__); \ + t w = (t) z; \ + return w o x; \ +} +#define B(n, f, o, ...) 
\ + A(n##0, long, ulong, f, o, ##__VA_ARGS__) \ + A(n##1, int, uint, f, o, ##__VA_ARGS__) \ + A(n##2, short, ushort, f, o, ##__VA_ARGS__) \ + A(n##3, schar, uchar, f, o, ##__VA_ARGS__) \ + A(n##4, ulong, ulong, f, o, ##__VA_ARGS__) \ + A(n##5, uint, uint, f, o, ##__VA_ARGS__) \ + A(n##6, ushort, ushort, f, o, ##__VA_ARGS__) \ + A(n##7, uchar, uchar, f, o, ##__VA_ARGS__) + +B(00, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(01, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(02, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(03, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +B(04, __atomic_fetch_or, |, __ATOMIC_RELAXED) +B(05, __sync_fetch_and_add, +) +B(06, __sync_fetch_and_sub, -) +B(07, __sync_fetch_and_and, &) +B(08, __sync_fetch_and_xor, ^) +B(09, __sync_fetch_and_or, |) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (void) \ +{ \ + ut z = f (&v##t, 42, ##__VA_ARGS__); \ + t w = (t) z; \ + return w o 42; \ +} + +B(10, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(11, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(12, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(13, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +B(14, __atomic_fetch_or, |, __ATOMIC_RELAXED) +B(15, __sync_fetch_and_add, +) +B(16, __sync_fetch_and_sub, -) +B(17, __sync_fetch_and_and, &) +B(18, __sync_fetch_and_xor, ^) +B(19, __sync_fetch_and_or, |) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + ut z = f (&v##t, x, ##__VA_ARGS__); \ + t w = (t) z; \ + t v = w o x; \ + return v == 0; \ +} + +B(20, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(21, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(22, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(23, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +B(24, __atomic_fetch_or, |, __ATOMIC_RELAXED) +B(25, __sync_fetch_and_add, +) +B(26, __sync_fetch_and_sub, -) +B(27, __sync_fetch_and_and, &) +B(28, __sync_fetch_and_xor, ^) +B(29, __sync_fetch_and_or, |) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (void) \ +{ \ + ut z = f (&v##t, 42, ##__VA_ARGS__); \ + t w = (t) z; \ + t v = w o 42; \ + return v != 0; \ +} + +B(30, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(31, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(32, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(33, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +B(34, __atomic_fetch_or, |, __ATOMIC_RELAXED) +B(35, __sync_fetch_and_add, +) +B(36, __sync_fetch_and_sub, -) +B(37, __sync_fetch_and_and, &) +B(38, __sync_fetch_and_xor, ^) +B(39, __sync_fetch_and_or, |) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + return (t) (((t) f (&v##t, x, ##__VA_ARGS__)) \ + o x) != 0; \ +} + +B(40, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(41, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(42, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(43, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +B(44, __atomic_fetch_or, |, __ATOMIC_RELAXED) +B(45, __sync_fetch_and_add, +) +B(46, __sync_fetch_and_sub, -) +B(47, __sync_fetch_and_and, &) +B(48, __sync_fetch_and_xor, ^) +B(49, __sync_fetch_and_or, |) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (void) \ +{ \ + return (t) (((t) f (&v##t, 42, ##__VA_ARGS__))\ + o 42) == 0; \ +} + +B(50, __atomic_fetch_add, +, __ATOMIC_RELAXED) +B(51, __atomic_fetch_sub, -, __ATOMIC_RELAXED) +B(52, __atomic_fetch_and, &, __ATOMIC_RELAXED) +B(53, __atomic_fetch_xor, ^, __ATOMIC_RELAXED) +/* (whatever | 42) == 0 is 0, so we can't test this. 
*/ +/* B(54, __atomic_fetch_or, |, __ATOMIC_RELAXED) */ +B(55, __sync_fetch_and_add, +) +B(56, __sync_fetch_and_sub, -) +B(57, __sync_fetch_and_and, &) +B(58, __sync_fetch_and_xor, ^) +/* B(59, __sync_fetch_and_or, |) */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98737-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98737-2.c new file mode 100644 index 00000000000..09149bcd4fe --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98737-2.c @@ -0,0 +1,123 @@ +/* PR target/98737 */ +/* { dg-do compile { target i?86-*-* x86_64-*-* powerpc*-*-* aarch64*-*-* } } */ +/* { dg-options "-O2 -fdump-tree-optimized -fcompare-debug" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-tree-dump-not "__atomic_\[^f]" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__sync_\[^f]" "optimized" } } */ + +typedef signed char schar; +typedef unsigned long ulong; +typedef unsigned int uint; +typedef unsigned short ushort; +typedef unsigned char uchar; +long vlong; +int vint; +short vshort; +schar vschar; +ulong vulong; +uint vuint; +ushort vushort; +uchar vuchar; +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + ut z = f (&v##t, x, ##__VA_ARGS__); \ + t w = (t) z; \ + return w o x; \ +} +#define B(n, f, o, ...) \ + A(n##0, long, ulong, f, o, ##__VA_ARGS__) \ + A(n##1, int, uint, f, o, ##__VA_ARGS__) \ + A(n##2, short, ushort, f, o, ##__VA_ARGS__) \ + A(n##3, schar, uchar, f, o, ##__VA_ARGS__) \ + A(n##4, ulong, ulong, f, o, ##__VA_ARGS__) \ + A(n##5, uint, uint, f, o, ##__VA_ARGS__) \ + A(n##6, ushort, ushort, f, o, ##__VA_ARGS__) \ + A(n##7, uchar, uchar, f, o, ##__VA_ARGS__) + +B(00, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(01, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(03, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(05, __sync_add_and_fetch, -) +B(06, __sync_sub_and_fetch, +) +B(08, __sync_xor_and_fetch, ^) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (void) \ +{ \ + ut z = f (&v##t, 42, ##__VA_ARGS__); \ + t w = (t) z; \ + return w o 42; \ +} + +B(10, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(11, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(13, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(15, __sync_add_and_fetch, -) +B(16, __sync_sub_and_fetch, +) +B(18, __sync_xor_and_fetch, ^) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + ut z = f (&v##t, x, ##__VA_ARGS__); \ + t w = (t) z; \ + t v = w o x; \ + return v == 0; \ +} + +B(20, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(21, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(23, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(25, __sync_add_and_fetch, -) +B(26, __sync_sub_and_fetch, +) +B(28, __sync_xor_and_fetch, ^) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (void) \ +{ \ + ut z = f (&v##t, 42, ##__VA_ARGS__); \ + t w = (t) z; \ + t v = w o 42; \ + return v != 0; \ +} + +B(30, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(31, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(33, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(35, __sync_add_and_fetch, -) +B(36, __sync_sub_and_fetch, +) +B(38, __sync_xor_and_fetch, ^) + +#undef A +#define A(n, t, ut, f, o, ...) \ +t fn##n (t x) \ +{ \ + return (t) (((t) f (&v##t, x, ##__VA_ARGS__)) \ + o x) != 0; \ +} + +B(40, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(41, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(43, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(45, __sync_add_and_fetch, -) +B(46, __sync_sub_and_fetch, +) +B(48, __sync_xor_and_fetch, ^) + +#undef A +#define A(n, t, ut, f, o, ...) 
\ +t fn##n (void) \ +{ \ + return (t) (((t) f (&v##t, 42, ##__VA_ARGS__))\ + o 42) == 0; \ +} + +B(50, __atomic_add_fetch, -, __ATOMIC_RELAXED) +B(51, __atomic_sub_fetch, +, __ATOMIC_RELAXED) +B(53, __atomic_xor_fetch, ^, __ATOMIC_RELAXED) +B(55, __sync_add_and_fetch, -) +B(56, __sync_sub_and_fetch, +) +B(58, __sync_xor_and_fetch, ^) diff --git a/gcc/testsuite/gcc.target/i386/avx-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-covert-1.c new file mode 100644 index 00000000000..b6c794ecbb8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-covert-1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency,^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "vcvtss2sd" } } */ +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vcvtps2pd" } } */ +/* { dg-final { scan-assembler-not "vcvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c new file mode 100644 index 00000000000..c40c48b1b2d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-fp-covert-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency" } */ + +extern float f; +extern double d; + +void +foo (void) +{ + d = f; +} + +/* { dg-final { scan-assembler "vcvtss2sd" } } */ +/* { dg-final { scan-assembler-not "vcvtps2pd" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c b/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c new file mode 100644 index 00000000000..01bb64e66cc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-int-covert-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern int i; + +void +foo (void) +{ + f = i; +} + +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-pr102224.c b/gcc/testsuite/gcc.target/i386/avx-pr102224.c index be6b88c05db..7cb8b4cdecb 100644 --- a/gcc/testsuite/gcc.target/i386/avx-pr102224.c +++ b/gcc/testsuite/gcc.target/i386/avx-pr102224.c @@ -1,4 +1,4 @@ -/* PR tree-optimization/51581 */ +/* PR target/102224 */ /* { dg-do run } */ /* { dg-options "-O2 -mavx" } */ /* { dg-require-effective-target avx } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-pr89984.c b/gcc/testsuite/gcc.target/i386/avx-pr89984.c new file mode 100644 index 00000000000..3409adef5b6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-pr89984.c @@ -0,0 +1,23 @@ +/* PR target/89984 */ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx" } */ +/* { dg-require-effective-target avx } */ + +#ifndef CHECK_H +#define CHECK_H "avx-check.h" +#endif +#ifndef TEST +#define TEST avx_test +#endif + +#define main main1 +#include "../../gcc.dg/pr89984.c" +#undef main + +#include CHECK_H + +static void +TEST (void) +{ + main1 (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-pr94680.c b/gcc/testsuite/gcc.target/i386/avx-pr94680.c new file mode 100644 index 00000000000..cb5041b6af3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-pr94680.c @@ -0,0 +1,107 @@ +/* { dg-do compile } */ +/* { 
dg-options "-mavx -mno-avx512f -O2" } */ +/* { dg-final { scan-assembler-times {(?n)vmov[a-z0-9]*[ \t]*%xmm[0-9]} 12 } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ + +typedef float v8sf __attribute__((vector_size(32))); +typedef double v4df __attribute__ ((vector_size (32))); +typedef long long v4di __attribute__((vector_size(32))); +typedef int v8si __attribute__((vector_size(32))); +typedef short v16hi __attribute__ ((vector_size (32))); +typedef char v32qi __attribute__ ((vector_size (32))); + +v4df +foo_v4df (v4df x) +{ + return __builtin_shuffle (x, (v4df) { 0, 0, 0, 0 }, (v4di) { 0, 1, 4, 5 }); +} + +v4df +foo_v4df_l (v4df x) +{ + return __builtin_shuffle ((v4df) { 0, 0, 0, 0 }, x, (v4di) { 4, 5, 1, 2 }); +} + +v4di +foo_v4di (v4di x) +{ + return __builtin_shuffle (x, (v4di) { 0, 0, 0, 0 }, (v4di) { 0, 1, 4, 7 }); +} + +v4di +foo_v4di_l (v4di x) +{ + return __builtin_shuffle ((v4di) { 0, 0, 0, 0 }, x, (v4di) { 4, 5, 3, 1 }); +} + +v8sf +foo_v8sf (v8sf x) +{ + return __builtin_shuffle ((v8sf) { 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v8si) { 8, 9, 10, 11, 0, 1, 2, 3 }); +} + +v8sf +foo_v8sf_l (v8sf x) +{ + return __builtin_shuffle (x, (v8sf) { 0, 0, 0, 0, 0, 0, 0, 0 }, + (v8si) { 0, 1, 2, 3, 8, 9, 10, 11 }); +} + +v8si +foo_v8si (v8si x) +{ + return __builtin_shuffle (x, (v8si) { 0, 0, 0, 0, 0, 0, 0, 0 }, + (v8si) { 0, 1, 2, 3, 13, 12, 11, 15 }); +} + +v8si +foo_v8si_l (v8si x) +{ + return __builtin_shuffle ((v8si) { 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v8si) { 8, 9, 10, 11, 7, 6, 5, 4 }); +} + +v16hi +foo_v16hi (v16hi x) +{ + return __builtin_shuffle (x, (v16hi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v16hi) { 0, 1, 2, 3, 4, 5, 6, 7, + 24, 17, 26, 19, 28, 21, 30, 23 }); +} + +v16hi +foo_v16hi_l (v16hi x) +{ + return __builtin_shuffle ((v16hi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v16hi) { 16, 17, 18, 19, 20, 21, 22, 23, + 15, 0, 13, 2, 11, 4, 9, 6 }); +} + +v32qi +foo_v32qi (v32qi x) +{ + return __builtin_shuffle (x, (v32qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v32qi) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 32, 49, 34, 58, 36, 53, 38, 39, + 40, 60, 42, 43, 63, 45, 46, 47 }); +} + +v32qi +foo_v32qi_l (v32qi x) +{ + return __builtin_shuffle ((v32qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v32qi) { 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 31, 0, 29, 2, 27, 4, 25, 6, + 23, 8, 21, 10, 19, 12, 17, 14 }); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c index a31b4a2a63a..9590f25da22 100644 --- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mavx -mtune=generic -dp" } */ +/* { dg-options "-O2 -mavx -mno-avx512f -mtune=generic -dp" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c index 803936eef01..36dcf7367f1 100644 --- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-15.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mavx -mtune=generic -dp" } */ +/* { dg-options "-O2 -mavx -mno-avx512f -mtune=generic -dp" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c 
b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c new file mode 100644 index 00000000000..381ee9a7f96 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-28.c @@ -0,0 +1,17 @@ +/* PR target/101495 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2 -mtune=generic -dp" } */ + +#include + +extern __m256 x, y; +extern __m256 bar (void); + +__m256 +foo () +{ + x = y; + return bar (); +} + +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr101286.c b/gcc/testsuite/gcc.target/i386/avx2-pr101286.c new file mode 100644 index 00000000000..81917bfbc71 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-pr101286.c @@ -0,0 +1,11 @@ +/* PR target/101286 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-mavx2" } */ + +typedef __attribute__((__vector_size__ (2 * sizeof (__int128)))) __int128 V; + +V +foo (void) +{ + return (V){(__int128) 1 << 64 | 1, (__int128) 1 << 64 | 1}; +} diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-shiftqihi-constant-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-shiftqihi-constant-1.c index 78bf5d33689..fbc3de08119 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bw-shiftqihi-constant-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bw-shiftqihi-constant-1.c @@ -1,7 +1,8 @@ /* PR target/95524 */ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512bw" } */ -/* { dg-final { scan-assembler-times "vpand\[^\n\]*%zmm" 3 } } */ +/* { dg-final { scan-assembler-times "vpand\[^\n\]*%zmm" 2 } } */ +/* { dg-final { scan-assembler-times "vpternlogd\[^\n\]*%zmm" 1 } } */ typedef char v64qi __attribute__ ((vector_size (64))); typedef unsigned char v64uqi __attribute__ ((vector_size (64))); @@ -11,7 +12,6 @@ foo_ashiftrt_512 (v64qi a) return a >> 2; } /* { dg-final { scan-assembler-times "vpsraw\[^\n\]*%zmm" 1 } } */ -/* { dg-final { scan-assembler-times "vpxor\[^\n\]*%zmm" 1 } } */ /* { dg-final { scan-assembler-times "vpsubb\[^\n\]*%zmm" 1 } } */ __attribute__((noipa)) v64qi diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c index cb542d09058..0107df7741a 100644 --- a/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512dq-abs-copysign-1.c @@ -64,8 +64,8 @@ f6 (double x) } /* { dg-final { scan-assembler "vandps\[^\n\r\]*xmm16" } } */ -/* { dg-final { scan-assembler "vorps\[^\n\r\]*xmm16" } } */ +/* { dg-final { scan-assembler "vpternlogd\[^\n\r\]*xmm16" } } */ /* { dg-final { scan-assembler "vxorps\[^\n\r\]*xmm16" } } */ /* { dg-final { scan-assembler "vandpd\[^\n\r\]*xmm18" } } */ -/* { dg-final { scan-assembler "vorpd\[^\n\r\]*xmm18" } } */ +/* { dg-final { scan-assembler "vpternlogq\[^\n\r\]*xmm18" } } */ /* { dg-final { scan-assembler "vxorpd\[^\n\r\]*xmm18" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c index 0563e696316..a2664d87f29 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c @@ -2,8 +2,11 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512dq" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 5 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */ typedef int v16si __attribute__ ((vector_size (64))); typedef long long v8di __attribute__ ((vector_size (64))); diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c index ffbe95980ca..477f9ca1282 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c @@ -2,8 +2,9 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to8\\\}" 4 } } */ -/* { dg-final { scan-assembler-times "\[^n\n\]*\\\{1to16\\\}" 4 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */ typedef int v16si __attribute__ ((vector_size (64))); typedef long long v8di __attribute__ ((vector_size (64))); diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr94680.c b/gcc/testsuite/gcc.target/i386/avx512f-pr94680.c new file mode 100644 index 00000000000..c27431aae72 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr94680.c @@ -0,0 +1,144 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -mavx512vbmi -O2" } */ +/* { dg-final { scan-assembler-times {(?n)vmov[a-z0-9]*[ \t]*%ymm[0-9]} 12} } */ +/* { dg-final { scan-assembler-not "pxor" } } */ + + +typedef float v16sf __attribute__((vector_size(64))); +typedef double v8df __attribute__ ((vector_size (64))); +typedef long long v8di __attribute__((vector_size(64))); +typedef int v16si __attribute__((vector_size(64))); +typedef short v32hi __attribute__ ((vector_size (64))); +typedef char v64qi __attribute__ ((vector_size (64))); + +v8df +foo_v8df (v8df x) +{ + return __builtin_shuffle (x, (v8df) { 0, 0, 0, 0, 0, 0, 0, 0 }, + (v8di) { 0, 1, 2, 3, 15, 14, 10, 11 }); +} + +v8df +foo_v8df_l (v8df x) +{ + return __builtin_shuffle ((v8df) { 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v8di) { 8, 9, 10, 11, 0, 1, 2, 3 }); +} + +v8di +foo_v8di (v8di x) +{ + return __builtin_shuffle (x, (v8di) { 0, 0, 0, 0, 0, 0, 0, 0 }, + (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 }); +} + +v8di +foo_v8di_l (v8di x) +{ + return __builtin_shuffle ((v8di) { 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v8di) { 8, 9, 10, 11, 7, 6, 5, 4 }); +} + +v16sf +foo_v16sf (v16sf x) +{ + return __builtin_shuffle (x, (v16sf) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v16si) { 0, 1, 2, 3, 4, 5, 6, 7, + 16, 17, 18, 19, 20, 21, 22, 23 }); +} + +v16sf +foo_v16sf_l (v16sf x) +{ + return __builtin_shuffle ((v16sf) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v16si) { 16, 17, 18, 19, 20, 21, 22, 23, + 0, 15, 2, 13, 4, 11, 6, 9 }); +} + +v16si +foo_v16si (v16si x) +{ + return __builtin_shuffle (x, (v16si) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v16si) { 0, 1, 2, 3, 4, 5, 6, 7, + 31, 30, 29, 28, 20, 21, 22, 23 }); +} + +v16si +foo_v16si_l (v16si x) +{ + return 
__builtin_shuffle ((v16si) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v16si) { 16, 17, 18, 19, 20, 21, 22, 23, + 15, 0, 13, 2, 11, 4, 9, 6 }); +} + +v32hi +foo_v32hi (v32hi x) +{ + return __builtin_shuffle (x, (v32hi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v32hi) { 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 63, 33, 61, 35, 59, 37, 57, 39, + 55, 41, 53, 43, 51, 45, 49, 47 }); +} + +v32hi +foo_v32hi_l (v32hi x) +{ + return __builtin_shuffle ((v32hi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v32hi) { 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 31, 0, 29, 2, 27, 4, 25, 6, + 23, 8, 21, 10, 19, 12, 17, 14 }); +} + +v64qi +foo_v64qi (v64qi x) +{ + return __builtin_shuffle (x, (v64qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v64qi) {0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 64, 127, 66, 125, 68, 123, 70, 121, + 72, 119, 74, 117, 76, 115, 78, 113, + 80, 111, 82, 109, 84, 107, 86, 105, + 88, 103, 90, 101, 92, 99, 94, 97 }); +} + +v64qi +foo_v64qi_l (v64qi x) +{ + return __builtin_shuffle ((v64qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v64qi) { 64, 65, 66, 67, 68, 69, 70, 71, + 72, 73, 74, 75, 76, 77, 78, 79, + 80, 81, 82, 83, 84, 85, 86, 87, + 88, 89, 90, 91, 92, 93, 94, 95, + 0, 63, 2, 61, 4, 59, 6, 57, + 8, 55, 10, 53, 12, 51, 14, 49, + 16, 47, 18, 45, 20, 43, 22, 41, + 24, 39, 26, 37, 28, 35, 30, 33 }); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c index 99a89f51202..ca49a585232 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c +++ b/gcc/testsuite/gcc.target/i386/avx512f_cond_move.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -mavx512f" } */ -/* { dg-final { scan-assembler-times "(?:vpblendmd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */ +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512" } */ +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vmovdqa32)\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 8 } } */ unsigned int x[128]; int y[128]; diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c index b375c5fad80..b27335b9d99 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-abs-copysign-1.c @@ -64,8 +64,8 @@ f6 (double x) } /* { dg-final { scan-assembler "vpandd\[^\n\r\]*xmm16" } } */ -/* { dg-final { scan-assembler "vpord\[^\n\r\]*xmm16" } } */ +/* { dg-final { scan-assembler "vpternlogd\[^\n\r\]*xmm16" } } */ /* { dg-final { scan-assembler "vpxord\[^\n\r\]*xmm16" } } */ /* { dg-final { scan-assembler "vpandq\[^\n\r\]*xmm18" } } */ -/* { dg-final { scan-assembler "vporq\[^\n\r\]*xmm18" } } */ +/* { dg-final { scan-assembler "vpternlogq\[^\n\r\]*xmm18" } } */ /* { dg-final { scan-assembler "vpxorq\[^\n\r\]*xmm18" } } */ diff --git 
a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c index c06369d93fd..f8eb99f0b5f 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c @@ -2,9 +2,15 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 10 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 5 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 5 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 7 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 3 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 3 { target { ! ia32 } } } } */ typedef int v4si __attribute__ ((vector_size (16))); typedef int v8si __attribute__ ((vector_size (32))); diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c index 4998a9b8d51..32f6ac81841 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c @@ -2,9 +2,12 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mavx512f -mavx512vl" } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 8 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 4 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 4 { target { ! 
ia32 } } } } */ typedef int v4si __attribute__ ((vector_size (16))); typedef int v8si __attribute__ ((vector_size (32))); diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c index 57666ac60b6..658eb3e25bb 100644 --- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c +++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2" } */ +/* { dg-options "-O2 -mno-avx" } */ #include static inline __attribute__ ((cold)) void diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c new file mode 100644 index 00000000000..43f94f01a97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f -mtune-ctrl=avx256_move_by_pieces" } */ + +struct _Unwind_Context +{ + void *ra; + char array[48]; +}; + +extern long uw_install_context_1 (struct _Unwind_Context *); + +void +_Unwind_RaiseException (void) +{ + struct _Unwind_Context this_context, cur_context; + long offset = uw_install_context_1 (&this_context); + __builtin_memcpy (&this_context, &cur_context, + sizeof (struct _Unwind_Context)); + void *handler = __builtin_frob_return_addr ((&cur_context)->ra); + uw_install_context_1 (&cur_context); + __builtin_eh_return (offset, handler); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c b/gcc/testsuite/gcc.target/i386/eh_return-2.c new file mode 100644 index 00000000000..cb762f92cc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c @@ -0,0 +1,16 @@ +/* PR target/101772 */ +/* { dg-do compile } */ +/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 -mstackrealign" } */ + +struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context; + +void offset (int); + +struct _Unwind_Context { + void *reg[7]; +} _Unwind_Resume_or_Rethrow() { + struct _Unwind_Context cur_contextcur_context = + _Unwind_Resume_or_Rethrow_this_context; + offset(0); + __builtin_eh_return ((long) offset, 0); +} diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c index 4deff93c1e8..b0d3dc38a0c 100644 --- a/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c +++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c @@ -6,7 +6,7 @@ typedef double v2df __attribute__((vector_size (16))); static v2df __attribute__((noinline)) bar (v2df a) { - return a + (v2df){ 3.0, 3.0 }; + return a + (v2df){ 3.0, 4.0 }; } v2df __attribute__((noinline)) diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c index a830c96f7d1..a06fdee477d 100644 --- a/gcc/testsuite/gcc.target/i386/incoming-11.c +++ b/gcc/testsuite/gcc.target/i386/incoming-11.c @@ -15,4 +15,4 @@ void f() for (i = 0; i < 100; i++) q[i] = 1; } -/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */ +/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c new file mode 100644 index 00000000000..5faee21f9b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-10.c @@ -0,0 +1,16 @@ 
+/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c new file mode 100644 index 00000000000..b8917a7f917 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-11.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 64); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c new file mode 100644 index 00000000000..8a82baff5f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-12.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_move_by_pieces" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 64); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c new file mode 100644 index 00000000000..97e6067fec9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-13.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c new file mode 100644 index 00000000000..7addc4c0a28 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-14.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c new file mode 100644 index 00000000000..4fb94ce7bd5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-15.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_move_by_pieces" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c new file mode 100644 index 00000000000..728eba5ea3d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-16.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 34); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c new file mode 100644 index 00000000000..28ab7a6d41c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mprefer-vector-width=256 -mavx512f -mmove-max=512" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c new file mode 100644 index 00000000000..b15a0db9ff0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c new file mode 100644 index 00000000000..a5b5b617578 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids -mmove-max=128 -mstore-max=128" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 8 } } */ +/* No need to dynamically realign the stack here. 
*/ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c new file mode 100644 index 00000000000..1feff48c5b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids -mmove-max=256 -mstore-max=256" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu(?:64|)\[ \\t\]+\[^\n\]*%ymm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c new file mode 100644 index 00000000000..ef439f20f74 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=sapphirerapids -march=x86-64 -mavx2" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu(?:64|)\[ \\t\]+\[^\n\]*%ymm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c new file mode 100644 index 00000000000..3d248d447ea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +void +foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src) +{ + __builtin_memcpy (dst, src, 17); +} + +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c new file mode 100644 index 00000000000..c13a2beb2f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +void +foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src) +{ + __builtin_memcpy (dst, src, 18); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c new file mode 100644 index 00000000000..238f88b275e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f -mtune=generic" } */ + +void +foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src) +{ + __builtin_memcpy (dst, src, 19); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c new file mode 100644 index 00000000000..2b8032684b3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 64); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-10.c b/gcc/testsuite/gcc.target/i386/pieces-memset-10.c new file mode 100644 index 00000000000..a6390d1bd8f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-10.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 64); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-11.c b/gcc/testsuite/gcc.target/i386/pieces-memset-11.c new file mode 100644 index 00000000000..3802eb7c147 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-11.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 64); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-12.c b/gcc/testsuite/gcc.target/i386/pieces-memset-12.c new file mode 100644 index 00000000000..d9a10bc038e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-12.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* No need to dynamically realign the stack here. 
*/ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-13.c b/gcc/testsuite/gcc.target/i386/pieces-memset-13.c new file mode 100644 index 00000000000..7f2cd3f58ec --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-13.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 33); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-14.c b/gcc/testsuite/gcc.target/i386/pieces-memset-14.c new file mode 100644 index 00000000000..10bc085f83b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-14.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-15.c b/gcc/testsuite/gcc.target/i386/pieces-memset-15.c new file mode 100644 index 00000000000..2123958f836 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-15.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-16.c b/gcc/testsuite/gcc.target/i386/pieces-memset-16.c new file mode 100644 index 00000000000..1c5d124cecc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-16.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 17); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-17.c b/gcc/testsuite/gcc.target/i386/pieces-memset-17.c new file mode 100644 index 00000000000..6cdb33557c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-17.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 17); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. 
*/ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-18.c b/gcc/testsuite/gcc.target/i386/pieces-memset-18.c new file mode 100644 index 00000000000..02f889899d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-18.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 18); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-19.c b/gcc/testsuite/gcc.target/i386/pieces-memset-19.c new file mode 100644 index 00000000000..7e9cf2e26d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-19.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 64); +} + +/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-2.c b/gcc/testsuite/gcc.target/i386/pieces-memset-2.c new file mode 100644 index 00000000000..4ebfc4df090 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 64); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-20.c b/gcc/testsuite/gcc.target/i386/pieces-memset-20.c new file mode 100644 index 00000000000..1dc4db180d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-20.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 64); +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-21.c b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c new file mode 100644 index 00000000000..a04f7eb55c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-21.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 66); +} + +/* { dg-final { scan-assembler-times "vpxor(?:d|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu(?:64|8)\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* { dg-final { scan-assembler "vzeroupper" } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-22.c b/gcc/testsuite/gcc.target/i386/pieces-memset-22.c new file mode 100644 index 00000000000..5f3c454ef8f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-22.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 33); +} + +/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-23.c b/gcc/testsuite/gcc.target/i386/pieces-memset-23.c new file mode 100644 index 00000000000..9232864024e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-23.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 33); +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-24.c b/gcc/testsuite/gcc.target/i386/pieces-memset-24.c new file mode 100644 index 00000000000..5243f270f16 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-24.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 33); +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-25.c b/gcc/testsuite/gcc.target/i386/pieces-memset-25.c new file mode 100644 index 00000000000..195ddb635eb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-25.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 17); +} + +/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-26.c b/gcc/testsuite/gcc.target/i386/pieces-memset-26.c new file mode 100644 index 00000000000..13606b2da54 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-26.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 17); +} + +/* { dg-final { scan-assembler-times "pxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-27.c b/gcc/testsuite/gcc.target/i386/pieces-memset-27.c new file mode 100644 index 00000000000..c764f6ffbce --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-27.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 17); +} + +/* { dg-final { scan-assembler-times "vpxor(?:d|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu(?:64|8|)\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-28.c b/gcc/testsuite/gcc.target/i386/pieces-memset-28.c new file mode 100644 index 00000000000..83c2d3f0fde --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-28.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 64); +} + +/* { dg-final { scan-assembler-times "pcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-29.c b/gcc/testsuite/gcc.target/i386/pieces-memset-29.c new file mode 100644 index 00000000000..3b07a64e3f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-29.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 64); +} + +/* { dg-final { scan-assembler-not "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c new file mode 100644 index 00000000000..765441a7c4a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512bw -mno-avx512vl -mavx512f -mtune=intel" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 66); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-30.c b/gcc/testsuite/gcc.target/i386/pieces-memset-30.c new file mode 100644 index 00000000000..59595e6d3c4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-30.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 64); +} + +/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-31.c b/gcc/testsuite/gcc.target/i386/pieces-memset-31.c new file mode 100644 index 00000000000..f7b5d5bfe1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-31.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 66); +} + +/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-32.c b/gcc/testsuite/gcc.target/i386/pieces-memset-32.c new file mode 100644 index 00000000000..c5ca0bd17ba --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-32.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 33); +} + +/* { dg-final { scan-assembler-times "pcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-33.c b/gcc/testsuite/gcc.target/i386/pieces-memset-33.c new file mode 100644 index 00000000000..68646223b0e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-33.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 33); +} + +/* { dg-final { scan-assembler-not "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-34.c b/gcc/testsuite/gcc.target/i386/pieces-memset-34.c new file mode 100644 index 00000000000..52a16a0292d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-34.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 33); +} + +/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-35.c b/gcc/testsuite/gcc.target/i386/pieces-memset-35.c new file mode 100644 index 00000000000..2b9a4da8dac --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-35.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 34); +} + +/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-36.c b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c new file mode 100644 index 00000000000..d1f1263c7b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 17); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c new file mode 100644 index 00000000000..0c5056be54d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune-ctrl=avx256_store_by_pieces" } */ + +void +foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst) +{ + __builtin_memset (dst, x, 66); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-38.c b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c new file mode 100644 index 00000000000..ed4a24a54fd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-38.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 33); +} + +/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c new file mode 100644 index 00000000000..e33644c2f10 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512bw -mtune=generic" } */ + +void +foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst) +{ + __builtin_memset (dst, x, 66); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* { dg-final { scan-assembler-not "vinserti64x4" } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c new file mode 100644 index 00000000000..9256919bfdf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 33); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-40.c b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c new file mode 100644 index 00000000000..4eda73ead59 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-40.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=sandybridge" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 66); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-41.c b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c new file mode 100644 index 00000000000..f86b6986da9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-41.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-42.c b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c new file mode 100644 index 00000000000..df0c122aae7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-42.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 33); +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-43.c b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c new file mode 100644 index 00000000000..2f2179c2df9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-43.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=sandybridge" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 33); +} + +/* { dg-final { scan-assembler-times "vpcmpeqd\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-44.c b/gcc/testsuite/gcc.target/i386/pieces-memset-44.c new file mode 100644 index 00000000000..5986f8e8b23 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-44.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 48); +} + +/* { dg-final { scan-assembler-times "vpxor\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-45.c b/gcc/testsuite/gcc.target/i386/pieces-memset-45.c new file mode 100644 index 00000000000..70c80e5064b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-45.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mprefer-vector-width=256 -mavx512f -mtune-ctrl=avx512_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-46.c b/gcc/testsuite/gcc.target/i386/pieces-memset-46.c new file mode 100644 index 00000000000..be1b054eed2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-46.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */ +/* { dg-final { scan-assembler-times "vmovw\[ \\t\]+\[^\n\]*%xmm" 1 { xfail *-*-* } } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. 
*/ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-47.c b/gcc/testsuite/gcc.target/i386/pieces-memset-47.c new file mode 100644 index 00000000000..78d3290c74f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-47.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids -mstore-max=128" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu(?:8|)\[ \\t\]+\[^\n\]*%xmm" 4 } } */ +/* { dg-final { scan-assembler-times "vmovw\[ \\t\]+\[^\n\]*%xmm" 1 { xfail *-*-* } } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-48.c b/gcc/testsuite/gcc.target/i386/pieces-memset-48.c new file mode 100644 index 00000000000..6342dbb91b0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-48.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=sapphirerapids -mstore-max=256" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu(?:8|)\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* { dg-final { scan-assembler-times "vmovw\[ \\t\]+\[^\n\]*%xmm" 1 { xfail *-*-* } } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-49.c b/gcc/testsuite/gcc.target/i386/pieces-memset-49.c new file mode 100644 index 00000000000..ad43f89a9bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-49.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=sapphirerapids -march=x86-64 -mavx2" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 66); +} + +/* { dg-final { scan-assembler-times "vmovdqu(?:8|)\[ \\t\]+\[^\n\]*%ymm" 2 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-5.c b/gcc/testsuite/gcc.target/i386/pieces-memset-5.c new file mode 100644 index 00000000000..e2379df71aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-6.c b/gcc/testsuite/gcc.target/i386/pieces-memset-6.c new file mode 100644 index 00000000000..d795663e1e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-6.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=intel" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 1 } } */ +/* No need to dynamically realign the stack here. 
*/ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-7.c b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c new file mode 100644 index 00000000000..fd159869817 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-7.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 17); +} + +/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-8.c b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c new file mode 100644 index 00000000000..7df0019ef63 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-8.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 17); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-9.c b/gcc/testsuite/gcc.target/i386/pieces-memset-9.c new file mode 100644 index 00000000000..1ead154fe1e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-9.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f -mtune=generic" } */ + +extern char *dst; + +void +foo (int x) +{ + __builtin_memset (dst, x, 17); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* No need to dynamically realign the stack here. */ +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ +/* Nor use a frame pointer. */ +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100704-1.c b/gcc/testsuite/gcc.target/i386/pr100704-1.c new file mode 100644 index 00000000000..02461db9695 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100704-1.c @@ -0,0 +1,24 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=x86-64" } */ + +struct S +{ + long long s1 __attribute__ ((aligned (8))); + unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14; +}; + +extern struct S a[]; + +void bar (struct S); + +void +foo (void) +{ + bar (a[0]); +} + +/* { dg-final { scan-assembler-not "pushq" } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100704-2.c b/gcc/testsuite/gcc.target/i386/pr100704-2.c new file mode 100644 index 00000000000..07b9bd18c7a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100704-2.c @@ -0,0 +1,23 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -march=x86-64" } */ + +struct S +{ + char array[64]; +}; + +extern struct S a[]; + +void bar (struct S); + +void +foo (void) +{ + bar (a[0]); +} + +/* { dg-final { scan-assembler-not "pushq" } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100704-3.c b/gcc/testsuite/gcc.target/i386/pr100704-3.c new file mode 100644 index 00000000000..65f9745a197 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100704-3.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-sse" } */ + +struct S +{ + long long s1 __attribute__ ((aligned (8))); + unsigned s2, s3; +}; + +extern struct S foooo[]; + +void bar (int, int, int, int, int, int, struct S); + +void +foo (void) +{ + bar (1, 2, 3, 4, 5, 6, foooo[0]); +} + +/* { dg-final { scan-assembler "push\[lq\]\tfoooo\+" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-1.c b/gcc/testsuite/gcc.target/i386/pr100865-1.c new file mode 100644 index 00000000000..949dd5c337a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "movdqa\[ \\t\]+\[^\n\]*%xmm" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10a.c b/gcc/testsuite/gcc.target/i386/pr100865-10a.c new file mode 100644 index 00000000000..1d849a381c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-10a.c @@ -0,0 +1,33 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned char) A) << 120) \ + | (((unsigned __int128) (unsigned char) A) << 112) \ + | (((unsigned __int128) (unsigned char) A) << 104) \ + | (((unsigned __int128) (unsigned char) A) << 96) \ + | (((unsigned __int128) (unsigned char) A) << 88) \ + | (((unsigned __int128) (unsigned char) A) << 80) \ + | (((unsigned __int128) (unsigned char) A) << 72) \ + | (((unsigned __int128) (unsigned char) A) << 64) \ + | (((unsigned __int128) (unsigned char) A) << 56) \ + | (((unsigned __int128) (unsigned char) A) << 48) \ + | (((unsigned __int128) (unsigned char) A) << 40) \ + | (((unsigned __int128) (unsigned char) A) << 32) \ + | (((unsigned __int128) (unsigned char) A) << 24) \ + | (((unsigned __int128) (unsigned char) A) << 16) \ + | (((unsigned __int128) (unsigned char) A) << 8) \ + | ((unsigned __int128) (unsigned char) A) ) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST (0x1f); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-10b.c b/gcc/testsuite/gcc.target/i386/pr100865-10b.c new file mode 100644 index 00000000000..e5616d8d258 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/pr100865-10b.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-10a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 8 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-11a.c b/gcc/testsuite/gcc.target/i386/pr100865-11a.c new file mode 100644 index 00000000000..04ce1662f3c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-11a.c @@ -0,0 +1,23 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned long long) A) << 64) \ + | ((unsigned __int128) (unsigned long long) A) ) + +#define MK_CONST128_BROADCAST_SIGNED(A) \ + ((__int128) MK_CONST128_BROADCAST (A)) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST_SIGNED (-0x1ffffffffLL); +} + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "(?:vpbroadcastq|vpunpcklqdq)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-11b.c b/gcc/testsuite/gcc.target/i386/pr100865-11b.c new file mode 100644 index 00000000000..12d55b9a642 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-11b.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-11a.c" + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-11c.c b/gcc/testsuite/gcc.target/i386/pr100865-11c.c new file mode 100644 index 00000000000..de56c84b9ca --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-11c.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +#include "pr100865-11a.c" + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "vpunpcklqdq\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-12a.c b/gcc/testsuite/gcc.target/i386/pr100865-12a.c new file mode 100644 index 00000000000..d4833d44475 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-12a.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned long long) A) << 64) \ + | ((unsigned __int128) (unsigned long long) A) ) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST (0x1ffffffffLL); +} + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "(?:vpbroadcastq|vpunpcklqdq)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-12b.c b/gcc/testsuite/gcc.target/i386/pr100865-12b.c 
new file mode 100644 index 00000000000..63a5629b90c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-12b.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-12a.c" + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-12c.c b/gcc/testsuite/gcc.target/i386/pr100865-12c.c new file mode 100644 index 00000000000..77415f22c97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-12c.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +#include "pr100865-12a.c" + +/* { dg-final { scan-assembler-times "movabsq" 1 } } */ +/* { dg-final { scan-assembler-times "vpunpcklqdq\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-2.c b/gcc/testsuite/gcc.target/i386/pr100865-2.c new file mode 100644 index 00000000000..f3ea7753abe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-3.c b/gcc/testsuite/gcc.target/i386/pr100865-3.c new file mode 100644 index 00000000000..714c43e12c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-3.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 3, 16); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4a.c b/gcc/testsuite/gcc.target/i386/pr100865-4a.c new file mode 100644 index 00000000000..8609d1128b8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-4a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 2 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-4b.c b/gcc/testsuite/gcc.target/i386/pr100865-4b.c new file mode 100644 index 00000000000..6d9cb91b8e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-4b.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +#include "pr100865-4a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t 
\]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]%ymm\[0-9\]+, " 2 } } */ +/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastb\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5a.c b/gcc/testsuite/gcc.target/i386/pr100865-5a.c new file mode 100644 index 00000000000..4149797fe81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-5a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern short array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-5b.c b/gcc/testsuite/gcc.target/i386/pr100865-5b.c new file mode 100644 index 00000000000..ded41b680d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-5b.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-5a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu16\[\\t \]%ymm\[0-9\]+, " 4 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6a.c b/gcc/testsuite/gcc.target/i386/pr100865-6a.c new file mode 100644 index 00000000000..3fde549a10d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-6a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern int array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6b.c b/gcc/testsuite/gcc.target/i386/pr100865-6b.c new file mode 100644 index 00000000000..9588249cb02 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-6b.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-6a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */ +/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %ymm\[0-9\]+" } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-6c.c b/gcc/testsuite/gcc.target/i386/pr100865-6c.c new file mode 100644 index 00000000000..46d31030ce8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-6c.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +extern int array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { 
scan-assembler-times "vbroadcastss" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 8 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7a.c b/gcc/testsuite/gcc.target/i386/pr100865-7a.c new file mode 100644 index 00000000000..f6f2be91120 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-7a.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern long long int array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */ +/* { dg-final { scan-assembler-not "vpbroadcastq" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "vmovdqa" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7b.c b/gcc/testsuite/gcc.target/i386/pr100865-7b.c new file mode 100644 index 00000000000..3b20c680521 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-7b.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-7a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */ +/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-7c.c b/gcc/testsuite/gcc.target/i386/pr100865-7c.c new file mode 100644 index 00000000000..4d50bb7e2f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-7c.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +extern long long int array[64]; + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = -45; +} + +/* { dg-final { scan-assembler-times "vbroadcastsd" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, " 16 } } */ +/* { dg-final { scan-assembler-not "vbroadcastsd" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "vmovdqa" { target { ! 
ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c b/gcc/testsuite/gcc.target/i386/pr100865-8a.c new file mode 100644 index 00000000000..911b14d4a25 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c @@ -0,0 +1,24 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned int) A) << 96) \ + | (((unsigned __int128) (unsigned int) A) << 64) \ + | (((unsigned __int128) (unsigned int) A) << 32) \ + | ((unsigned __int128) (unsigned int) A) ) + +#define MK_CONST128_BROADCAST_SIGNED(A) \ + ((__int128) MK_CONST128_BROADCAST (A)) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST_SIGNED (-45); +} + +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8b.c b/gcc/testsuite/gcc.target/i386/pr100865-8b.c new file mode 100644 index 00000000000..99a10ad83bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-8b.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-8a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c b/gcc/testsuite/gcc.target/i386/pr100865-8c.c new file mode 100644 index 00000000000..00682edb8c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +#include "pr100865-8a.c" + +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c new file mode 100644 index 00000000000..45d0e0d0e2e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c @@ -0,0 +1,25 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake" } */ + +extern __int128 array[16]; + +#define MK_CONST128_BROADCAST(A) \ + ((((unsigned __int128) (unsigned short) A) << 112) \ + | (((unsigned __int128) (unsigned short) A) << 96) \ + | (((unsigned __int128) (unsigned short) A) << 80) \ + | (((unsigned __int128) (unsigned short) A) << 64) \ + | (((unsigned __int128) (unsigned short) A) << 48) \ + | (((unsigned __int128) (unsigned short) A) << 32) \ + | (((unsigned __int128) (unsigned short) A) << 16) \ + | ((unsigned __int128) (unsigned short) A) ) + +void +foo (void) +{ + int i; + for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) + array[i] = MK_CONST128_BROADCAST (0x1fff); +} + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9b.c b/gcc/testsuite/gcc.target/i386/pr100865-9b.c new file mode 100644 index 00000000000..14696248525 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-9b.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { 
dg-options "-O3 -march=skylake-avx512" } */ + +#include "pr100865-9a.c" + +/* { dg-final { scan-assembler-times "vpbroadcastw\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c b/gcc/testsuite/gcc.target/i386/pr100865-9c.c new file mode 100644 index 00000000000..8ffcdc1629d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100865-9c.c @@ -0,0 +1,7 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O3 -march=skylake -mno-avx2" } */ + +#include "pr100865-9a.c" + +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c new file mode 100644 index 00000000000..16d8bafa663 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100951.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -march=x86-64" } */ + +typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V; +V v, w; + +void +foo (void) +{ + w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5); +} + +/* { dg-final { scan-assembler-not "punpcklwd" } } */ +/* { dg-final { scan-assembler-not "pshufd" } } */ +/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c new file mode 100644 index 00000000000..7fb3a3f055c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake" } */ + +#include <immintrin.h> + +extern __m256 x1; +extern __m256d x2; +extern __m256i x3; + +extern void bar (void); + +void +foo1 (void) +{ + x1 = _mm256_setzero_ps (); + bar (); +} + +void +foo2 (void) +{ + x2 = _mm256_setzero_pd (); + bar (); +} + +void +foo3 (void) +{ + x3 = _mm256_setzero_si256 (); + bar (); +} + +/* See PR104581 for the XFAIL reason.
*/ +/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101456-2.c b/gcc/testsuite/gcc.target/i386/pr101456-2.c new file mode 100644 index 00000000000..554a0f1702c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101456-2.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake" } */ + +#include <immintrin.h> + +extern __m256 x1; +extern __m256d x2; +extern __m256i x3; + +extern __m256 bar (void); + +void +foo1 (void) +{ + bar (); + x1 = _mm256_setzero_ps (); +} + +void +foo2 (void) +{ + bar (); + x2 = _mm256_setzero_pd (); +} + +void +foo3 (void) +{ + bar (); + x3 = _mm256_setzero_si256 (); +} + +/* { dg-final { scan-assembler-times "vzeroupper" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101742a.c b/gcc/testsuite/gcc.target/i386/pr101742a.c new file mode 100644 index 00000000000..67ea40587dd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101742a.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -mtune=nano-x2" } */ + +int n2; + +__attribute__ ((simd)) char +w7 (void) +{ + short int xb = n2; + int qp; + + for (qp = 0; qp < 2; ++qp) + xb = xb < 1; + + return xb; +} diff --git a/gcc/testsuite/gcc.target/i386/pr101742b.c b/gcc/testsuite/gcc.target/i386/pr101742b.c new file mode 100644 index 00000000000..ba19064077b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101742b.c @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -mtune=nano-x2 -mtune-ctrl=sse_unaligned_store_optimal" } */ + +#include "pr101742a.c" diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c b/gcc/testsuite/gcc.target/i386/pr101900-1.c new file mode 100644 index 00000000000..0a45f8e340a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_fp_converts" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "vcvtps2pd" } } */ +/* { dg-final { scan-assembler "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vcvtss2sd" } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101900-2.c b/gcc/testsuite/gcc.target/i386/pr101900-2.c new file mode 100644 index 00000000000..c8b2d1da5ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101900-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_converts" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "vcvtss2sd" } } */ +/* { dg-final { scan-assembler "vcvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101900-3.c b/gcc/testsuite/gcc.target/i386/pr101900-3.c new file mode 100644 index 00000000000..6ee565b5bd4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101900-3.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mfpmath=sse -mtune-ctrl=use_vector_fp_converts,use_vector_converts" } */ + +extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "vcvtps2pd" } } */ +/* { dg-final { scan-assembler "vcvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "vcvtss2sd" } }
*/ +/* { dg-final { scan-assembler-not "vcvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "vxorps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr101930.c b/gcc/testsuite/gcc.target/i386/pr101930.c new file mode 100644 index 00000000000..7207dd18377 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101930.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mfpmath=sse -ffast-math" } */ +double a; +double +__attribute__((noipa)) +foo (int b) +{ + return __builtin_ldexp (a, b); +} diff --git a/gcc/testsuite/gcc.target/i386/pr101989-1.c b/gcc/testsuite/gcc.target/i386/pr101989-1.c new file mode 100644 index 00000000000..594093ecdde --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101989-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times "vpternlog" 6 } } */ +/* { dg-final { scan-assembler-not "vpxor" } } */ +/* { dg-final { scan-assembler-not "vpor" } } */ +/* { dg-final { scan-assembler-not "vpand" } } */ + +#include <immintrin.h> +__m256d +__attribute__((noipa, target("avx512vl"))) +copysign2_pd(__m256d from, __m256d to) { + __m256i a = _mm256_castpd_si256(from); + __m256d avx_signbit = _mm256_castsi256_pd(_mm256_slli_epi64(_mm256_cmpeq_epi64(a, a), 63)); + /* (avx_signbit & from) | (~avx_signbit & to) */ + return _mm256_or_pd(_mm256_and_pd(avx_signbit, from), _mm256_andnot_pd(avx_signbit, to)); +} + +__m256i +__attribute__((noipa, target("avx512vl"))) +foo (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & ~src1) | (src3 & src1); +} + +__m256i +__attribute__ ((noipa, target("avx512vl"))) +foo1 (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & src1) | (src3 & ~src1); +} + +__m256i +__attribute__ ((noipa, target("avx512vl"))) +foo2 (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & src1) | (~src3 & src1); +} + +__m256i +__attribute__ ((noipa, target("avx512vl"))) +foo3 (__m256i src1, __m256i src2, __m256i src3) +{ + return (~src2 & src1) | (src3 & src1); +} + +__m256i +__attribute__ ((noipa, target("avx512vl"))) +foo4 (__m256i src1, __m256i src2, __m256i src3) +{ + return src3 & src2 ^ src1; +} diff --git a/gcc/testsuite/gcc.target/i386/pr101989-2.c b/gcc/testsuite/gcc.target/i386/pr101989-2.c new file mode 100644 index 00000000000..9d9759a8e1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101989-2.c @@ -0,0 +1,102 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx2 -mno-avx512f" } */ +/* { dg-require-effective-target avx512vl } */ + +#define AVX512VL + +#include "avx512f-helper.h" + +#include "pr101989-1.c" +__m256d +avx2_copysign2_pd (__m256d from, __m256d to) { + __m256i a = _mm256_castpd_si256(from); + __m256d avx_signbit = _mm256_castsi256_pd(_mm256_slli_epi64(_mm256_cmpeq_epi64(a, a), 63)); + /* (avx_signbit & from) | (~avx_signbit & to) */ + return _mm256_or_pd(_mm256_and_pd(avx_signbit, from), _mm256_andnot_pd(avx_signbit, to)); +} + +__m256i +avx2_foo (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & ~src1) | (src3 & src1); +} + +__m256i +avx2_foo1 (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & src1) | (src3 & ~src1); +} + +__m256i +avx2_foo2 (__m256i src1, __m256i src2, __m256i src3) +{ + return (src2 & src1) | (~src3 & src1); +} + +__m256i +avx2_foo3 (__m256i src1, __m256i src2, __m256i src3) +{ + return (~src2 & src1) | (src3 & src1); +} + +__m256i +avx2_foo4 (__m256i src1, __m256i src2, __m256i src3) +{ + return src3 & src2 ^ src1; +} + + +void +test_256 (void) +{ + union256i_q q1, q2, q3, res2, exp2; + union256d d1, d2,
res1, exp1; + int i, sign = 1; + + for (i = 0; i < 4; i++) + { + d1.a[i] = 12.34 * (i + 2000) * sign; + d2.a[i] = 56.78 * (i - 30) * sign; + q1.a[i] = 12 * (i + 2000) * sign; + q2.a[i] = 56 * (i - 30) * sign; + q3.a[i] = 90 * (i + 40) * sign; + res1.a[i] = DEFAULT_VALUE; + exp1.a[i] = DEFAULT_VALUE; + res2.a[i] = exp2.a[i] = -1; + sign = -sign; + } + + exp1.x = avx2_copysign2_pd (d1.x, d2.x); + res1.x = copysign2_pd (d1.x, d2.x); + if (UNION_CHECK (256, d) (res1, exp1.a)) + abort (); + + exp2.x = avx2_foo1 (q1.x, q2.x, q3.x); + res2.x = foo1 (q1.x, q2.x, q3.x); + if (UNION_CHECK (256, i_q) (res2, exp2.a)) + abort (); + + exp2.x = avx2_foo2 (q1.x, q2.x, q3.x); + res2.x = foo2 (q1.x, q2.x, q3.x); + if (UNION_CHECK (256, i_q) (res2, exp2.a)) + abort (); + + exp2.x = avx2_foo3 (q1.x, q2.x, q3.x); + res2.x = foo3 (q1.x, q2.x, q3.x); + if (UNION_CHECK (256, i_q) (res2, exp2.a)) + abort (); + + exp2.x = avx2_foo4 (q1.x, q2.x, q3.x); + res2.x = foo4 (q1.x, q2.x, q3.x); + if (UNION_CHECK (256, i_q) (res2, exp2.a)) + abort (); + + exp2.x = avx2_foo (q1.x, q2.x, q3.x); + res2.x = foo (q1.x, q2.x, q3.x); + if (UNION_CHECK (256, i_q) (res2, exp2.a)) + abort (); +} + +static void +test_128 () +{} diff --git a/gcc/testsuite/gcc.target/i386/pr101989-broadcast-1.c b/gcc/testsuite/gcc.target/i386/pr101989-broadcast-1.c new file mode 100644 index 00000000000..d03d192915f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr101989-broadcast-1.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl" } */ +/* { dg-final { scan-assembler-times "vpternlog" 4 } } */ +/* { dg-final { scan-assembler-times "\\\{1to4\\\}" 4 } } */ +#include +extern long long C; +__m256d +copysign2_pd(__m256d from, __m256d to) { + __m256i a = _mm256_castpd_si256(from); + __m256d avx_signbit = _mm256_castsi256_pd(_mm256_slli_epi64(_mm256_cmpeq_epi64(a, a), 63)); + /* (avx_signbit & from) | (~avx_signbit & to) */ + return _mm256_or_pd(_mm256_and_pd(avx_signbit, from), _mm256_andnot_pd(avx_signbit, to)); +} + +__m256i +mask_pternlog (__m256i A, __m256i B, __mmask8 U) +{ + return _mm256_mask_ternarylogic_epi64 (A, U, B, _mm256_set1_epi64x (C) ,202); +} + +__m256i +maskz_pternlog (__m256i A, __m256i B, __mmask8 U) +{ + return _mm256_maskz_ternarylogic_epi64 (U, A, B, _mm256_set1_epi64x (C) ,202); +} + +__m256i +none_pternlog (__m256i A, __m256i B) +{ + return _mm256_ternarylogic_epi64 (A, B, _mm256_set1_epi64x (C) ,202); +} diff --git a/gcc/testsuite/gcc.target/i386/pr102021.c b/gcc/testsuite/gcc.target/i386/pr102021.c new file mode 100644 index 00000000000..6db3f57dc76 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102021.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=skylake-avx512" } */ + +#include + +__m256i +foo () +{ + return _mm256_set1_epi16 (12); +} + +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %ymm\[0-9\]+" 1 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+\[^\n\]*, %ymm\[0-9\]+" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-not "vmovdqa" } } */ +/* { dg-final { scan-assembler-not "vzeroupper" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-10a.c b/gcc/testsuite/gcc.target/i386/pr102566-10a.c new file mode 100644 index 00000000000..1c1f86a9659 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-10a.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic int *v, int bit) +{ + int mask = 1 << bit; + return atomic_fetch_and_explicit (v, ~mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-10b.c b/gcc/testsuite/gcc.target/i386/pr102566-10b.c new file mode 100644 index 00000000000..0bf39824ea6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-10b.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic long long int *v, int bit) +{ + long long int mask = 1ll << bit; + return atomic_fetch_and_explicit (v, ~mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrq" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-11.c b/gcc/testsuite/gcc.target/i386/pr102566-11.c new file mode 100644 index 00000000000..2c8f8c4e59a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-11.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +#define MASK 0x1234 + +bool +foo1 (_Atomic int *v) +{ + return atomic_fetch_or_explicit (v, MASK, memory_order_relaxed) & MASK; +} + +bool +foo2 (_Atomic unsigned int *v, int mask) +{ + return atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask; +} + +bool +foo3 (_Atomic unsigned int *v, int mask) +{ + return !(atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask); +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchg" 3 } } */ +/* { dg-final { scan-assembler-not "bts" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-12.c b/gcc/testsuite/gcc.target/i386/pr102566-12.c new file mode 100644 index 00000000000..4603a77612c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-12.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +#define MASK 0x1234 + +bool +foo1 (_Atomic long *v) +{ + return atomic_fetch_and_explicit (v, ~MASK, memory_order_relaxed) & MASK; +} + +bool +foo2 (_Atomic long *v, long mask) +{ + return atomic_fetch_and_explicit (v, ~mask, memory_order_relaxed) & mask; +} + +bool +foo3 (_Atomic long *v, long mask) +{ + return !(atomic_fetch_and_explicit (v, ~mask, memory_order_relaxed) & mask); +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchg" 3 } } */ +/* { dg-final { scan-assembler-not "btr" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-13.c b/gcc/testsuite/gcc.target/i386/pr102566-13.c new file mode 100644 index 00000000000..2657a2f62ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-13.c @@ -0,0 +1,66 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +#include +#include + +#define FOO(TYPE,MASK) \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << 
MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + +FOO(short, 0); +FOO(short, 7); +FOO(short, 15); +FOO(int, 0); +FOO(int, 15); +FOO(int, 31); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 12 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 24 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 12 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-14.c b/gcc/testsuite/gcc.target/i386/pr102566-14.c new file mode 100644 index 00000000000..24681c1da18 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-14.c @@ -0,0 +1,65 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2" } */ +#include +#include +typedef long long int64; + +#define FOO(TYPE,MASK) \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) TYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + + +FOO(int64, 0); +FOO(int64, 32); +FOO(int64, 63); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 6 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 12 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 6 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-1a.c b/gcc/testsuite/gcc.target/i386/pr102566-1a.c new file mode 100644 index 00000000000..a915de354e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-1a.c @@ -0,0 +1,188 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void bar (void); + +__attribute__((noinline, noclone)) int +f1 (int *a, int bit) +{ + int mask = 1 << bit; + return (__sync_fetch_and_or (a, mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f2 (int *a, int bit) +{ + int mask = 1 << bit; + int t1 = __atomic_fetch_or (a, mask, __ATOMIC_RELAXED); + int t2 = t1 & mask; + return t2 != 0; +} + +__attribute__((noinline, noclone)) long int +f3 (long int *a, int bit) +{ + long int mask = 1l << bit; + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) == 0; +} + +__attribute__((noinline, noclone)) int +f4 (int *a) +{ + int mask = 1 << 7; + return (__sync_fetch_and_or (a, mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f5 (int *a) +{ + int mask = 1 << 13; + return (__atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f6 (int *a) +{ + int mask = 1 << 0; + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) void +f7 (int *a, int bit) +{ + int mask = 1 << bit; + if ((__sync_fetch_and_xor (a, mask) & mask) != 0) + bar (); +} + +__attribute__((noinline, noclone)) void +f8 (int *a, int bit) +{ + int mask = 1 << bit; + if ((__atomic_fetch_xor (a, 
mask, __ATOMIC_RELAXED) & mask) == 0) + bar (); +} + +__attribute__((noinline, noclone)) int +f9 (int *a, int bit) +{ + int mask = 1 << bit; + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f10 (int *a) +{ + int mask = 1 << 7; + return (__sync_fetch_and_xor (a, mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f11 (int *a) +{ + int mask = 1 << 13; + return (__atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f12 (int *a) +{ + int mask = 1 << 0; + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f13 (int *a, int bit) +{ + int mask = 1 << bit; + return (__sync_fetch_and_and (a, ~mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f14 (int *a, int bit) +{ + int mask = 1 << bit; + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f15 (int *a, int bit) +{ + int mask = 1 << bit; + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f16 (int *a) +{ + int mask = 1 << 7; + return (__sync_fetch_and_and (a, ~mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f17 (int *a) +{ + int mask = 1 << 13; + return (__atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask) != 0; +} + +__attribute__((noinline, noclone)) int +f18 (int *a) +{ + int mask = 1 << 0; + return (__atomic_fetch_and (a, ~mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) long int +f19 (long int *a, int bit) +{ + long int mask = 1l << bit; + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +__attribute__((noinline, noclone)) long int +f20 (long int *a) +{ + long int mask = 1l << 7; + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask) == 0; +} + +__attribute__((noinline, noclone)) int +f21 (int *a, int bit) +{ + int mask = 1 << bit; + return (__sync_fetch_and_or (a, mask) & mask); +} + +__attribute__((noinline, noclone)) long int +f22 (long int *a) +{ + long int mask = 1l << 7; + return (__atomic_xor_fetch (a, mask, __ATOMIC_SEQ_CST) & mask); +} + +__attribute__((noinline, noclone)) long int +f23 (long int *a) +{ + long int mask = 1l << 7; + return (__atomic_fetch_xor (a, mask, __ATOMIC_SEQ_CST) & mask); +} + +__attribute__((noinline, noclone)) short int +f24 (short int *a) +{ + short int mask = 1 << 7; + return (__sync_fetch_and_or (a, mask) & mask) != 0; +} + +__attribute__((noinline, noclone)) short int +f25 (short int *a) +{ + short int mask = 1 << 7; + return (__atomic_fetch_or (a, mask, __ATOMIC_SEQ_CST) & mask) != 0; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 9 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 10 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 6 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-1b.c b/gcc/testsuite/gcc.target/i386/pr102566-1b.c new file mode 100644 index 00000000000..c4dab8135c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-1b.c @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -g" } */ + +int cnt; + +__attribute__((noinline, noclone)) void +bar (void) +{ + cnt++; +} + +#include "pr102566-1a.c" + +int a; +long int b; +unsigned long int c; +unsigned short int d; + +int +main () +{ + __atomic_store_n (&a, 15, __ATOMIC_RELAXED); + if (f1 (&a, 2) != 
1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 15 + || f1 (&a, 4) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 31) + __builtin_abort (); + if (f2 (&a, 1) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 31 + || f2 (&a, 5) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 63) + __builtin_abort (); + __atomic_store_n (&b, 24, __ATOMIC_RELAXED); + if (f3 (&b, 2) != 1 || __atomic_load_n (&b, __ATOMIC_RELAXED) != 28 + || f3 (&b, 3) != 0 || __atomic_load_n (&b, __ATOMIC_RELAXED) != 28) + __builtin_abort (); + __atomic_store_n (&a, 0, __ATOMIC_RELAXED); + if (f4 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 128 + || f4 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 128) + __builtin_abort (); + if (f5 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8320 + || f5 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8320) + __builtin_abort (); + if (f6 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321 + || f6 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (cnt != 0 + || (f7 (&a, 7), cnt) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193 + || (f7 (&a, 7), cnt) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if ((f8 (&a, 7), cnt) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193 + || (f8 (&a, 7), cnt) != 2 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (f9 (&a, 13) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 129 + || f9 (&a, 13) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (f10 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193 + || f10 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (f11 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 129 + || f11 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (f12 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8320 + || f12 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8321) + __builtin_abort (); + if (f13 (&a, 7) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193 + || f13 (&a, 7) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193) + __builtin_abort (); + if (f14 (&a, 13) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 1 + || f14 (&a, 13) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 1) + __builtin_abort (); + if (f15 (&a, 0) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 0 + || f15 (&a, 0) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 0) + __builtin_abort (); + __atomic_store_n (&a, 8321, __ATOMIC_RELAXED); + if (f16 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193 + || f16 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 8193) + __builtin_abort (); + if (f17 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 1 + || f17 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 1) + __builtin_abort (); + if (f18 (&a) != 1 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 0 + || f18 (&a) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 0) + __builtin_abort (); + if (f19 (&c, 7) != 1 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 128 + || f19 (&c, 7) != 0 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 0) + __builtin_abort (); + if (f20 (&c) != 0 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 128 + || f20 (&c) != 1 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 0) + __builtin_abort (); + __atomic_store_n (&a, 128, __ATOMIC_RELAXED); + if (f21 (&a, 4) != 0 || __atomic_load_n (&a, __ATOMIC_RELAXED) != 144 + || f21 (&a, 4) != 16 || 
__atomic_load_n (&a, __ATOMIC_RELAXED) != 144) + __builtin_abort (); + __atomic_store_n (&c, 1, __ATOMIC_RELAXED); + if (f22 (&c) != 128 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 129 + || f22 (&c) != 0 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 1) + __builtin_abort (); + if (f23 (&c) != 0 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 129 + || f23 (&c) != 128 || __atomic_load_n (&c, __ATOMIC_RELAXED) != 1) + __builtin_abort (); + if (f24 (&d) != 0 || __atomic_load_n (&d, __ATOMIC_RELAXED) != 128 + || f24 (&d) != 1 || __atomic_load_n (&d, __ATOMIC_RELAXED) != 128) + __builtin_abort (); + __atomic_store_n (&d, 1, __ATOMIC_RELAXED); + if (f25 (&d) != 0 || __atomic_load_n (&d, __ATOMIC_RELAXED) != 129 + || f25 (&d) != 1 || __atomic_load_n (&d, __ATOMIC_RELAXED) != 129 + || cnt != 2) + __builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr102566-2.c b/gcc/testsuite/gcc.target/i386/pr102566-2.c new file mode 100644 index 00000000000..00a7c349f2a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-2.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo0 (_Atomic int *v) +{ +#define BIT (1 << 0) + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo30 (_Atomic int *v) +{ +#define BIT (1 << 30) + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo31 (_Atomic int *v) +{ +#define BIT (1 << 31) + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-3a.c b/gcc/testsuite/gcc.target/i386/pr102566-3a.c new file mode 100644 index 00000000000..8bf1cd6e1bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-3a.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic int *v, int bit) +{ + int mask = 1 << bit; + return atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-3b.c b/gcc/testsuite/gcc.target/i386/pr102566-3b.c new file mode 100644 index 00000000000..d155ed367a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-3b.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic long long int *v, int bit) +{ + long long int mask = 1ll << bit; + return atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsq" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-4.c b/gcc/testsuite/gcc.target/i386/pr102566-4.c new file mode 100644 index 00000000000..2668ccf827c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-4.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic int *v, int bit) +{ + unsigned int mask = 1 << bit; + return atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-5.c b/gcc/testsuite/gcc.target/i386/pr102566-5.c new file mode 100644 index 00000000000..8bf1cd6e1bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-5.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo (_Atomic int *v, int bit) +{ + int mask = 1 << bit; + return atomic_fetch_or_explicit (v, mask, memory_order_relaxed) & mask; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 1 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-6.c b/gcc/testsuite/gcc.target/i386/pr102566-6.c new file mode 100644 index 00000000000..3dfe55ac683 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-6.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo0 (_Atomic int *v) +{ +#define BIT (1 << 0) + return !(atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo30 (_Atomic int *v) +{ +#define BIT (1 << 30) + return !(atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo31 (_Atomic int *v) +{ +#define BIT (1 << 31) + return !(atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-7.c b/gcc/testsuite/gcc.target/i386/pr102566-7.c new file mode 100644 index 00000000000..6bc0ae0f320 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-7.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +typedef int __attribute__ ((mode (__word__))) int_type; + +#define BIT (1 << 0) + +bool +foo0 (_Atomic int_type *v) +{ + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & ~1; +} + +bool +foo1 (_Atomic int_type *v) +{ + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & ~2; +} + +bool +foo2 (_Atomic int_type *v) +{ + return atomic_fetch_or_explicit (v, BIT, memory_order_relaxed) & ~3; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchg" 3 } } */ +/* { dg-final { scan-assembler-not "bts" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-8a.c b/gcc/testsuite/gcc.target/i386/pr102566-8a.c new file mode 100644 index 00000000000..168e3db78c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-8a.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool 
+foo0 (_Atomic int *v) +{ +#define BIT (1 << 0) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo30 (_Atomic int *v) +{ +#define BIT (1 << 30) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo31 (_Atomic int *v) +{ +#define BIT (1 << 31) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-8b.c b/gcc/testsuite/gcc.target/i386/pr102566-8b.c new file mode 100644 index 00000000000..392da3098e0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-8b.c @@ -0,0 +1,32 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo0 (_Atomic long long *v) +{ +#define BIT (1ll << 0) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo30 (_Atomic long long *v) +{ +#define BIT (1ll << 62) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +bool +foo31 (_Atomic long long *v) +{ +#define BIT (1ll << 63) + return atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT; +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrq" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-9a.c b/gcc/testsuite/gcc.target/i386/pr102566-9a.c new file mode 100644 index 00000000000..3fa2a3ef043 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-9a.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo0 (_Atomic int *v) +{ +#define BIT (1 << 0) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo30 (_Atomic int *v) +{ +#define BIT (1 << 30) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo31 (_Atomic int *v) +{ +#define BIT (1 << 31) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrl" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr102566-9b.c b/gcc/testsuite/gcc.target/i386/pr102566-9b.c new file mode 100644 index 00000000000..38ddbdc630f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr102566-9b.c @@ -0,0 +1,32 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2" } */ + +#include +#include + +bool +foo0 (_Atomic long long *v) +{ +#define BIT (1ll << 0) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo30 (_Atomic long long *v) +{ +#define BIT (1ll << 62) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +bool +foo31 (_Atomic long long *v) +{ +#define BIT (1ll << 63) + return !(atomic_fetch_and_explicit (v, ~BIT, memory_order_relaxed) & BIT); +#undef BIT +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrq" 3 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103069-1.c b/gcc/testsuite/gcc.target/i386/pr103069-1.c new file mode 100644 index 00000000000..f819af4409c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103069-1.c @@ -0,0 +1,35 @@ +/* PR target/103068 */ +/* { dg-do compile } */ +/* { dg-additional-options "-O2 -march=x86-64 -mtune=generic -mrelax-cmpxchg-loop" } */ +/* { dg-final { scan-assembler-times "rep;?\[ \\t\]+nop" 32 } } */ + +#include + +#define FUNC_ATOMIC(TYPE, OP) \ +__attribute__ ((noinline, noclone)) \ +TYPE f_##TYPE##_##OP##_fetch (TYPE *a, TYPE b) \ +{ \ + return __atomic_##OP##_fetch (a, b, __ATOMIC_RELAXED); \ +} \ +__attribute__ ((noinline, noclone)) \ +TYPE f_##TYPE##_fetch_##OP (TYPE *a, TYPE b) \ +{ \ + return __atomic_fetch_##OP (a, b, __ATOMIC_RELAXED); \ +} + +FUNC_ATOMIC (int64_t, and) +FUNC_ATOMIC (int64_t, nand) +FUNC_ATOMIC (int64_t, or) +FUNC_ATOMIC (int64_t, xor) +FUNC_ATOMIC (int, and) +FUNC_ATOMIC (int, nand) +FUNC_ATOMIC (int, or) +FUNC_ATOMIC (int, xor) +FUNC_ATOMIC (short, and) +FUNC_ATOMIC (short, nand) +FUNC_ATOMIC (short, or) +FUNC_ATOMIC (short, xor) +FUNC_ATOMIC (char, and) +FUNC_ATOMIC (char, nand) +FUNC_ATOMIC (char, or) +FUNC_ATOMIC (char, xor) diff --git a/gcc/testsuite/gcc.target/i386/pr103069-2.c b/gcc/testsuite/gcc.target/i386/pr103069-2.c new file mode 100644 index 00000000000..b3f2235fd55 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103069-2.c @@ -0,0 +1,71 @@ +/* PR target/103069 */ +/* { dg-do run } */ +/* { dg-additional-options "-O2 -march=x86-64 -mtune=generic" } */ + +#include +#include "pr103069-1.c" + +#define FUNC_ATOMIC_RELAX(TYPE, OP) \ +__attribute__ ((noinline, noclone, target ("relax-cmpxchg-loop"))) \ +TYPE relax_##TYPE##_##OP##_fetch (TYPE *a, TYPE b) \ +{ \ + return __atomic_##OP##_fetch (a, b, __ATOMIC_RELAXED); \ +} \ +__attribute__ ((noinline, noclone, target ("relax-cmpxchg-loop"))) \ +TYPE relax_##TYPE##_fetch_##OP (TYPE *a, TYPE b) \ +{ \ + return __atomic_fetch_##OP (a, b, __ATOMIC_RELAXED); \ +} + +FUNC_ATOMIC_RELAX (int64_t, and) +FUNC_ATOMIC_RELAX (int64_t, nand) +FUNC_ATOMIC_RELAX (int64_t, or) +FUNC_ATOMIC_RELAX (int64_t, xor) +FUNC_ATOMIC_RELAX (int, and) +FUNC_ATOMIC_RELAX (int, nand) +FUNC_ATOMIC_RELAX (int, or) +FUNC_ATOMIC_RELAX (int, xor) +FUNC_ATOMIC_RELAX (short, and) +FUNC_ATOMIC_RELAX (short, nand) +FUNC_ATOMIC_RELAX (short, or) +FUNC_ATOMIC_RELAX (short, xor) +FUNC_ATOMIC_RELAX (char, and) +FUNC_ATOMIC_RELAX (char, nand) +FUNC_ATOMIC_RELAX (char, or) +FUNC_ATOMIC_RELAX (char, xor) + +#define TEST_ATOMIC_FETCH_LOGIC(TYPE, OP) \ +{ \ + TYPE a = 11, b = 101, res, exp; \ + TYPE c = 11, d = 101; \ + res = relax_##TYPE##_##OP##_fetch (&a, b); \ + exp = f_##TYPE##_##OP##_fetch (&c, d); \ + if (res != exp) \ + abort (); \ + a = c = 21, b = d = 92; \ + res = relax_##TYPE##_fetch_##OP (&a, b); \ + exp = f_##TYPE##_fetch_##OP (&c, d); \ + if (res != exp) \ 
+ abort (); \ +} + +int main (void) +{ + TEST_ATOMIC_FETCH_LOGIC (int64_t, and) + TEST_ATOMIC_FETCH_LOGIC (int64_t, nand) + TEST_ATOMIC_FETCH_LOGIC (int64_t, or) + TEST_ATOMIC_FETCH_LOGIC (int64_t, xor) + TEST_ATOMIC_FETCH_LOGIC (int, and) + TEST_ATOMIC_FETCH_LOGIC (int, nand) + TEST_ATOMIC_FETCH_LOGIC (int, or) + TEST_ATOMIC_FETCH_LOGIC (int, xor) + TEST_ATOMIC_FETCH_LOGIC (short, and) + TEST_ATOMIC_FETCH_LOGIC (short, nand) + TEST_ATOMIC_FETCH_LOGIC (short, or) + TEST_ATOMIC_FETCH_LOGIC (short, xor) + TEST_ATOMIC_FETCH_LOGIC (char, and) + TEST_ATOMIC_FETCH_LOGIC (char, nand) + TEST_ATOMIC_FETCH_LOGIC (char, or) + TEST_ATOMIC_FETCH_LOGIC (char, xor) + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr103194-2.c b/gcc/testsuite/gcc.target/i386/pr103194-2.c new file mode 100644 index 00000000000..1a991fe0199 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103194-2.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +#include +#include + +#define FOO(RTYPE,TYPE,MASK) \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + +FOO(char, short, 0); +FOO(char, short, 7); +FOO(short, int, 0); +FOO(short, int, 15); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 8 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 16 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 8 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103194-3.c b/gcc/testsuite/gcc.target/i386/pr103194-3.c new file mode 100644 index 00000000000..4907598bbd1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103194-3.c @@ -0,0 +1,64 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2" } */ +#include +#include +typedef long long int64; + +#define FOO(RTYPE, TYPE,MASK) \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + + +FOO(int, int64, 1); +FOO(int, int64, 31); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 4 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 8 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 4 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103194-4.c b/gcc/testsuite/gcc.target/i386/pr103194-4.c new file mode 100644 index 00000000000..8573016c5d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103194-4.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +#include +#include + +#define FOO(RTYPE,TYPE) \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ 
+ return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1 << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + +FOO(short, int); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 2 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 4 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 2 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103194-5.c b/gcc/testsuite/gcc.target/i386/pr103194-5.c new file mode 100644 index 00000000000..2e335285c2f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103194-5.c @@ -0,0 +1,62 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2" } */ +#include +#include +#include + +#define FOO(RTYPE,TYPE) \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_or_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_or (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_xor_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_xor (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_xor_fetch_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_xor_fetch (a, mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + atomic_fetch_and_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __atomic_fetch_and (a, ~mask, __ATOMIC_RELAXED) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_or_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_or (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_xor_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_xor (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_xor_and_fetch_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_xor_and_fetch (a, mask) & mask; \ + } \ + __attribute__((noinline,noclone)) RTYPE \ + sync_fetch_and_and_##TYPE##_##MASK (_Atomic TYPE* a, TYPE MASK) \ + { \ + TYPE mask = 1ll << MASK; \ + return __sync_fetch_and_and (a, ~mask) & mask; \ + } \ + +FOO(int, int64_t); + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*bts" 2 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btc" 4 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btr" 2 } } */ +/* { dg-final { scan-assembler-not "cmpxchg" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr103194.c b/gcc/testsuite/gcc.target/i386/pr103194.c new file mode 100644 index 00000000000..a6d84332e4d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103194.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +long pscc_a_2_3; +int pscc_a_1_4; +void pscc() +{ + pscc_a_1_4 = __sync_fetch_and_and(&pscc_a_2_3, 1); +} + +static int si; +long +test_types (long n) +{ + unsigned int u2 = __atomic_fetch_xor (&si, 0, 5); + return 
u2; +} diff --git a/gcc/testsuite/gcc.target/i386/pr103205-2.c b/gcc/testsuite/gcc.target/i386/pr103205-2.c new file mode 100644 index 00000000000..705081e51d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103205-2.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */ + +extern short foo; +extern unsigned short bar; + +int +foo1 (void) +{ + return __sync_fetch_and_and (&foo, ~1) & 1; +} + +int +foo2 (void) +{ + return __sync_fetch_and_or (&foo, 1) & 1; +} + +int +foo3 (void) +{ + return __sync_fetch_and_xor (&foo, 1) & 1; +} + +unsigned short +bar1 (void) +{ + return __sync_fetch_and_and (&bar, ~1) & 1; +} + +unsigned short +bar2 (void) +{ + return __sync_fetch_and_or (&bar, 1) & 1; +} + +unsigned short +bar3 (void) +{ + return __sync_fetch_and_xor (&bar, 1) & 1; +} + +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 2 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 2 } } */ +/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 2 } } */ +/* { dg-final { scan-assembler-not "cmpxchgw" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr104441-1a.c b/gcc/testsuite/gcc.target/i386/pr104441-1a.c new file mode 100644 index 00000000000..83734f710bd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104441-1a.c @@ -0,0 +1,57 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=x86-64 -mtune=skylake -Wno-attributes" } */ + +#include +#include + +__attribute__((always_inline, target("avx2"))) +static __m256i +load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride) +{ + __m128i src01, src23; + src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride)); + src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1); + return _mm256_setr_m128i(src01, src23); +} + +__attribute__ ((noinline, noipa, target("avx2"))) +uint32_t +compute4x_m_sad_avx2_intrin(uint8_t *src, uint32_t src_stride, + uint8_t *ref, uint32_t ref_stride, + uint32_t height) +{ + __m128i xmm0; + __m256i ymm = _mm256_setzero_si256(); + uint32_t y; + + for (y = 0; y < height; y += 4) { + const __m256i src0123 = load8bit_4x4_avx2(src, src_stride); + const __m256i ref0123 = load8bit_4x4_avx2(ref, ref_stride); + ymm = _mm256_add_epi32(ymm, _mm256_sad_epu8(src0123, ref0123)); + src += src_stride << 2; + ref += ref_stride << 2; + } + + xmm0 = _mm_add_epi32(_mm256_castsi256_si128(ymm), + _mm256_extracti128_si256(ymm, 1)); + + return (uint32_t)_mm_cvtsi128_si32(xmm0); +} + +/* Expect assembly like: + + vextracti128 $0x1, %ymm3, %xmm3 + vpaddd %xmm3, %xmm0, %xmm0 + vmovd %xmm0, %eax + vzeroupper + +rather than: + + vzeroupper + vextracti128 $0x1, %ymm3, %xmm3 + vpaddd %xmm3, %xmm0, %xmm0 + vmovd %xmm0, %eax + + */ + +/* { dg-final { scan-assembler "\[ \t\]+vextracti128\[ \t\]+\[^\n\]+\n\[ \t\]+vpaddd\[ \t\]+\[^\n\]+\n\[ \t\]+vmovd\[ \t\]+\[^\n\]+\n\[ \t\]+vzeroupper" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr104441-1b.c b/gcc/testsuite/gcc.target/i386/pr104441-1b.c new file mode 100644 index 00000000000..325af044bb8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104441-1b.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -march=x86-64 -mvzeroupper -Wno-attributes" } */ + +#include "pr104441-1a.c" + +#define ARRAY_SIZE 255 + +__attribute__ ((noinline, noipa)) +static void +do_test (void) +{ + uint8_t src[ARRAY_SIZE]; + uint8_t ref[ARRAY_SIZE]; + uint32_t x; + uint32_t i; + for (i = 0; i < ARRAY_SIZE; i++) + { + src[i] = i; + ref[i] = i; + } + x = compute4x_m_sad_avx2_intrin(src, 64 >> 2, ref, 64, 4); + if (x != 0x240) + 
__builtin_abort (); +} + +int +main () +{ + if (__builtin_cpu_supports ("avx2")) + do_test (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-1.c b/gcc/testsuite/gcc.target/i386/pr104704-1.c new file mode 100644 index 00000000000..28c499f4c44 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-1.c @@ -0,0 +1,33 @@ +/* { dg-do run { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f" } */ + +#include + +__m512d y, z; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm31 __asm ("xmm31") = i; + asm volatile ("" : "+v" (xmm31)); + z = y; + register int xmm2 __asm ("xmm2") = xmm31; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=x86-64"))) +int +main (void) +{ + if (__builtin_cpu_supports ("avx512f")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-2.c b/gcc/testsuite/gcc.target/i386/pr104704-2.c new file mode 100644 index 00000000000..79b04b2543a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-2.c @@ -0,0 +1,33 @@ +/* { dg-do run { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=x86-64 -mavx2" } */ + +#include + +__m256d y, z; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm15 __asm ("xmm15") = i; + asm volatile ("" : "+v" (xmm15)); + z = y; + register int xmm2 __asm ("xmm2") = xmm15; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=x86-64"))) +int +main (void) +{ + if (__builtin_cpu_supports ("avx2")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-3.c b/gcc/testsuite/gcc.target/i386/pr104704-3.c new file mode 100644 index 00000000000..d0648d82fbd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-3.c @@ -0,0 +1,33 @@ +/* { dg-do run { target ia32 } } */ +/* { dg-options "-O2 -march=i686 -msse2" } */ + +#include + +__m128d y, z; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm7 __asm ("xmm7") = i; + asm volatile ("" : "+v" (xmm7)); + z = y; + register int xmm2 __asm ("xmm2") = xmm7; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=i486"))) +int +main (void) +{ + if (__builtin_cpu_supports ("sse2")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-4.c b/gcc/testsuite/gcc.target/i386/pr104704-4.c new file mode 100644 index 00000000000..c97666293b7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-4.c @@ -0,0 +1,33 @@ +/* { dg-do run { target { ! ia32 } } } */ +/* { dg-options "-O2 -march=x86-64 -mavx512f" } */ + +#include + +char z[128]; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm31 __asm ("xmm31") = i; + asm volatile ("" : "+v" (xmm31)); + __builtin_memset (&z, 0, sizeof (z)); + register int xmm2 __asm ("xmm2") = xmm31; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=x86-64"))) +int +main (void) +{ + if (__builtin_cpu_supports ("avx512f")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-5.c b/gcc/testsuite/gcc.target/i386/pr104704-5.c new file mode 100644 index 00000000000..de9466e697d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-5.c @@ -0,0 +1,33 @@ +/* { dg-do run { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -march=x86-64 -mtune=skylake -mavx2" } */ + +#include + +char z[64]; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm15 __asm ("xmm15") = i; + asm volatile ("" : "+v" (xmm15)); + __builtin_memset (&z, 0, sizeof (z)); + register int xmm2 __asm ("xmm2") = xmm15; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=x86-64"))) +int +main (void) +{ + if (__builtin_cpu_supports ("avx2")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr104704-6.c b/gcc/testsuite/gcc.target/i386/pr104704-6.c new file mode 100644 index 00000000000..e6a4cb840ee --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104704-6.c @@ -0,0 +1,33 @@ +/* { dg-do run { target ia32 } } */ +/* { dg-options "-O2 -march=i686 -mtune=skylake -msse2" } */ + +#include + +char z[16]; + +int i; + +__attribute__((noipa)) +int +do_test (void) +{ + register int xmm7 __asm ("xmm7") = i; + asm volatile ("" : "+v" (xmm7)); + __builtin_memset (&z, 0, sizeof (z)); + register int xmm2 __asm ("xmm2") = xmm7; + asm volatile ("" : "+v" (xmm2)); + return xmm2; +} + +__attribute__((target("arch=i486"))) +int +main (void) +{ + if (__builtin_cpu_supports ("sse2")) + { + i = 4; + if (do_test () != 4) + __builtin_abort (); + } + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr35513-10a.c b/gcc/testsuite/gcc.target/i386/pr35513-10a.c new file mode 100644 index 00000000000..d7b5c98fa8c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-10a.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mno-direct-extern-access" } */ + +/* Weak common symbol with -fpic. */ +__attribute__((weak, visibility("protected"))) +int xxx; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-10b.c b/gcc/testsuite/gcc.target/i386/pr35513-10b.c new file mode 100644 index 00000000000..a40692e6e3d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-10b.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mdirect-extern-access" } */ + +/* Weak common symbol with -fpic. */ +__attribute__((weak, visibility("protected"),nodirect_extern_access)) +int xxx; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-11a.c b/gcc/testsuite/gcc.target/i386/pr35513-11a.c new file mode 100644 index 00000000000..5489f1e5cee --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-11a.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mno-direct-extern-access" } */ + +/* Initialized symbol with -fpic. */ +__attribute__((visibility("protected"))) +int xxx = -1; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-11b.c b/gcc/testsuite/gcc.target/i386/pr35513-11b.c new file mode 100644 index 00000000000..2704900fed5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-11b.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mdirect-extern-access" } */ + +/* Initialized symbol with -fpic. */ +__attribute__((visibility("protected"), nodirect_extern_access)) +int xxx = -1; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-12a.c b/gcc/testsuite/gcc.target/i386/pr35513-12a.c new file mode 100644 index 00000000000..8b3123f9042 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-12a.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mno-direct-extern-access" } */ + +/* Weak initialized symbol with -fpic. */ +__attribute__((weak, visibility("protected"))) +int xxx = -1; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-12b.c b/gcc/testsuite/gcc.target/i386/pr35513-12b.c new file mode 100644 index 00000000000..a1b6b9e92df --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-12b.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mdirect-extern-access" } */ + +/* Weak initialized symbol with -fpic. 
*/ +__attribute__((weak, visibility("protected"), nodirect_extern_access)) +int xxx = -1; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-1a.c b/gcc/testsuite/gcc.target/i386/pr35513-1a.c new file mode 100644 index 00000000000..972542423cb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-1a.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -mno-direct-extern-access" } */ + +extern void bar (void); +extern void *p; + +void +foo (void) +{ + p = &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-1b.c b/gcc/testsuite/gcc.target/i386/pr35513-1b.c new file mode 100644 index 00000000000..54a579a9e37 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-1b.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -mdirect-extern-access" } */ + +extern void bar (void) __attribute__ ((nodirect_extern_access)); +extern void *p; + +void +foo (void) +{ + p = &bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-2a.c b/gcc/testsuite/gcc.target/i386/pr35513-2a.c new file mode 100644 index 00000000000..74fa8fc9d97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-2a.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -mno-direct-extern-access" } */ + +extern int bar; + +int +foo (void) +{ + return bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-2b.c b/gcc/testsuite/gcc.target/i386/pr35513-2b.c new file mode 100644 index 00000000000..ae2edff8d93 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-2b.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fno-pic -mdirect-extern-access" } */ + +extern int bar __attribute__ ((nodirect_extern_access)); + +int +foo (void) +{ + return bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-3a.c b/gcc/testsuite/gcc.target/i386/pr35513-3a.c new file mode 100644 index 00000000000..4ca4332c4ab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-3a.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpie -mno-direct-extern-access" } */ + +extern int bar; + +int +foo (void) +{ + return bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT" { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-3b.c b/gcc/testsuite/gcc.target/i386/pr35513-3b.c new file mode 100644 index 00000000000..c3888039834 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-3b.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpie -mdirect-extern-access" } */ + +extern int bar __attribute__ ((nodirect_extern_access)); + +int +foo (void) +{ + return bar; +} + +/* { dg-final { scan-assembler "mov\(l|q\)\[ \t\]*bar@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "movl\[ \t\]*bar@GOT" { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not "mov\(l|q\)\[ \t\]*\\\$bar," { target { ia32 && got32x_reloc } } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-4a.c b/gcc/testsuite/gcc.target/i386/pr35513-4a.c new file mode 100644 index 00000000000..9c3a199404c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-4a.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fno-pic -mno-direct-extern-access" } */ + +extern void foo (void); + +int +bar (void) +{ + foo (); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]*foo" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-4b.c b/gcc/testsuite/gcc.target/i386/pr35513-4b.c new file mode 100644 index 00000000000..e1a50784bf9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-4b.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fno-pic -mdirect-extern-access" } */ + +extern void foo (void) __attribute__ ((nodirect_extern_access)); + +int +bar (void) +{ + foo (); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]*foo" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-5a.c b/gcc/testsuite/gcc.target/i386/pr35513-5a.c new file mode 100644 index 00000000000..4d2e1732838 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-5a.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fpic -mno-direct-extern-access" } */ + +extern void foo (void); + +int +bar (void) +{ + foo (); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]*foo@PLT" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-5b.c b/gcc/testsuite/gcc.target/i386/pr35513-5b.c new file mode 100644 index 00000000000..81e98ed7836 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-5b.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fpic -mdirect-extern-access" } */ + +extern void foo (void) __attribute__ ((nodirect_extern_access)); + +int +bar (void) +{ + foo (); + return 0; +} + +/* { dg-final { scan-assembler "call\[ \t\]*foo@PLT" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-6a.c b/gcc/testsuite/gcc.target/i386/pr35513-6a.c new file mode 100644 index 00000000000..ece878e3c3a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-6a.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fno-pic -mno-direct-extern-access" } */ + +extern void foo (void); + +void +bar (void) +{ + foo (); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]*foo" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-6b.c b/gcc/testsuite/gcc.target/i386/pr35513-6b.c new file mode 100644 index 00000000000..3f679defdab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-6b.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fno-pic -mdirect-extern-access" } */ + +extern void foo (void) __attribute__ ((nodirect_extern_access)); + +void +bar (void) +{ + foo (); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]*foo" } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-7a.c b/gcc/testsuite/gcc.target/i386/pr35513-7a.c new file mode 100644 index 00000000000..1de014d39c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-7a.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fpic -mno-direct-extern-access" } */ + +extern void foo (void); + +void +bar (void) +{ + foo (); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]*foo@PLT" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]*foo@PLT" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-7b.c b/gcc/testsuite/gcc.target/i386/pr35513-7b.c new file mode 100644 index 00000000000..984e2dc2752 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-7b.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fplt -fpic -mdirect-extern-access" } */ + +extern void foo (void) __attribute__ ((nodirect_extern_access)); + +void +bar (void) +{ + foo (); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]*foo@PLT" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "call\[ \t\]*foo@PLT" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL" { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-8.c b/gcc/testsuite/gcc.target/i386/pr35513-8.c new file mode 100644 index 00000000000..d51f7efb353 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-8.c @@ -0,0 +1,44 @@ +/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */ +/* { dg-require-effective-target maybe_x32 } */ +/* { dg-options "-mx32 -O2 -fno-pic -fexceptions -fasynchronous-unwind-tables -mno-direct-extern-access" } */ + +extern int foo (int); +extern void exit (int __status) __attribute__ ((__nothrow__ )) __attribute__ ((__noreturn__)); +struct __pthread_cleanup_frame +{ + void (*__cancel_routine) (void *); + void *__cancel_arg; + int __do_it; + int __cancel_type; +}; +extern __inline void +__pthread_cleanup_routine (struct __pthread_cleanup_frame *__frame) +{ + if (__frame->__do_it) + __frame->__cancel_routine (__frame->__cancel_arg); +} +static int cl_called; + +static void +cl (void *arg) +{ + ++cl_called; +} + + +void * +tf_usleep (void *arg) +{ + + do { struct __pthread_cleanup_frame __clframe __attribute__ ((__cleanup__ (__pthread_cleanup_routine))) = { .__cancel_routine = (cl), .__cancel_arg = ( + ((void *)0)), .__do_it = 1 };; + + foo (arg == ((void *)0) ? (0x7fffffffL * 2UL + 1UL) : 0); + + __clframe.__do_it = (0); } while (0); + + exit (1); +} +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-9a.c b/gcc/testsuite/gcc.target/i386/pr35513-9a.c new file mode 100644 index 00000000000..533f1d2ddb4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-9a.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mno-direct-extern-access" } */ + +/* Common symbol with -fpic. */ +__attribute__((visibility("protected"))) +int xxx; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr35513-9b.c b/gcc/testsuite/gcc.target/i386/pr35513-9b.c new file mode 100644 index 00000000000..b6c66f43b40 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr35513-9b.c @@ -0,0 +1,20 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options "-O2 -fpic -mdirect-extern-access" } */ + +/* Common symbol with -fpic. */ +__attribute__((visibility("protected"), nodirect_extern_access)) +int xxx; + +int +foo () +{ + return xxx; +} + +/* { dg-final { scan-assembler "xxx\\(%rip\\)" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "xxx@GOTPCREL" { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler "xxx@GOTOFF" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xxx@GOT\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler "\.section\[ \t]+.note.gnu.property," } } */ +/* { dg-final { scan-assembler "\.long\[ \t]+0xb0008000" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c b/gcc/testsuite/gcc.target/i386/pr72839.c index ea724f70377..6888d9d0a55 100644 --- a/gcc/testsuite/gcc.target/i386/pr72839.c +++ b/gcc/testsuite/gcc.target/i386/pr72839.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target ia32 } */ -/* { dg-options "-O2 -mtune=lakemont" } */ +/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */ extern char *strcpy (char *, const char *); diff --git a/gcc/testsuite/gcc.target/i386/pr83782-1.c b/gcc/testsuite/gcc.target/i386/pr83782-1.c new file mode 100644 index 00000000000..ce97b12e65d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr83782-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O2 -fpic" } */ + +static void +my_foo (void) +{ +} + +static void (*resolve_foo (void)) (void) +{ + return my_foo; +} + +extern void foo (void) __attribute__((ifunc("resolve_foo"), visibility("hidden"))); + +void * +bar(void) +{ + return foo; +} + +/* { dg-final { scan-assembler {leal[ \t]foo@GOTOFF\(%[^,]*\),[ \t]%eax} { target ia32 } } } */ +/* { dg-final { scan-assembler {lea(?:l|q)[ \t]foo\(%rip\),[ \t]%(?:e|r)ax} { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT\\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL\\\(" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr83782-2.c b/gcc/testsuite/gcc.target/i386/pr83782-2.c new file mode 100644 index 00000000000..e25d258bbda --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr83782-2.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-ifunc "" } */ +/* { dg-options "-O2 -fpic" } */ + +static void +my_foo (void) +{ +} + +static void (*resolve_foo (void)) (void) +{ + return my_foo; +} + +static void foo (void) __attribute__((ifunc("resolve_foo"))); + +void * +bar(void) +{ + return foo; +} + +/* { dg-final { scan-assembler {leal[ \t]foo@GOTOFF\(%[^,]*\),[ \t]%eax} { target ia32 } } } */ +/* { dg-final { scan-assembler {lea(?:l|q)[ \t]foo\(%rip\),[ \t]%(?:e|r)ax} { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "foo@GOT\\\(" { target ia32 } } } */ +/* { dg-final { scan-assembler-not "foo@GOTPCREL\\\(" { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr89984-1.c b/gcc/testsuite/gcc.target/i386/pr89984-1.c new file mode 100644 index 00000000000..d77691c0da0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr89984-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2" } */ + +float +check_f_pos (float x, float y) +{ + return x * __builtin_copysignf (1.0f, y); +} diff --git a/gcc/testsuite/gcc.target/i386/pr89984-2.c b/gcc/testsuite/gcc.target/i386/pr89984-2.c new file mode 100644 index 00000000000..ff6a8e50573 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr89984-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -mavx" } */ + +float +check_f_pos (float x, float y) +{ + return x * __builtin_copysignf (1.0f, y); +} + +/* { dg-final { scan-assembler-not "vmovaps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-1.c b/gcc/testsuite/gcc.target/i386/pr90773-1.c new file mode 100644 index 00000000000..4fd5a40d99d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -mtune=generic" } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 15); +} + +/* { dg-final { scan-assembler-times "movq\[\\t \]+\\(%\[\^,\]+\\)," 1 } } */ +/* { dg-final { scan-assembler-times "movq\[\\t \]+7\\(%\[\^,\]+\\)," 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-10.c b/gcc/testsuite/gcc.target/i386/pr90773-10.c new file mode 100644 index 00000000000..9ad725e4880 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-10.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int c) +{ + __builtin_memset (dst, c, 5); +} + +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-11.c b/gcc/testsuite/gcc.target/i386/pr90773-11.c new file mode 100644 index 00000000000..1734c03a2eb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-11.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ + +extern char *dst; + +void +foo (int c) +{ + __builtin_memset (dst, c, 6); +} + +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-12.c b/gcc/testsuite/gcc.target/i386/pr90773-12.c new file mode 100644 index 00000000000..e45840a5b8d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-12.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */ + +void +foo (char *dst, char *src) +{ + __builtin_memcpy (dst, src, 255); +} + +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-13.c b/gcc/testsuite/gcc.target/i386/pr90773-13.c new file mode 100644 index 00000000000..4d5ae8d1086 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-13.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */ + +void +foo (char *dst) +{ + __builtin_memset (dst, 0, 255); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-14.c b/gcc/testsuite/gcc.target/i386/pr90773-14.c new file mode 100644 index 00000000000..60763bb7c37 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-14.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 1, 20); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movd\[\\t \]+%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-15.c b/gcc/testsuite/gcc.target/i386/pr90773-15.c new file mode 100644 index 00000000000..403cdb248a2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-15.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (int c) +{ + __builtin_memset (dst, c, 17); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb\[\\t \]+%.*, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+%.*, 16\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-16.c b/gcc/testsuite/gcc.target/i386/pr90773-16.c new file mode 100644 index 00000000000..bb0aadbc77e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, -1, 17); +} + +/* { dg-final { scan-assembler-times "(?:vpcmpeqd|vpternlogd)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$-1, 16\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-17.c b/gcc/testsuite/gcc.target/i386/pr90773-17.c new file mode 100644 index 00000000000..570748366f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-17.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 12, 19); +} + +/* { dg-final { scan-assembler-times "vpbroadcastb" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu8\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovd\[\\t \]+%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-18.c b/gcc/testsuite/gcc.target/i386/pr90773-18.c new file mode 100644 index 00000000000..b0687abbe01 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-18.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake-avx512" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 12, 9); +} + +/* { dg-final { scan-assembler-times "movabsq\[\\t \]+\\\$868082074056920076, %r" 1 { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, \\(%\[\^,\]+\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 4\\(%\[\^,\]+\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+\\\$12, 8\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-19.c b/gcc/testsuite/gcc.target/i386/pr90773-19.c new file mode 100644 index 00000000000..8aa5540bacc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-19.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 12, 9); +} + +/* { dg-final { scan-assembler-times "movabsq\[\\t \]+\\\$868082074056920076, %r" 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, \\(%\[\^,\]+\\)" 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\\$202116108, 4\\(%\[\^,\]+\\)" 1 { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-2.c b/gcc/testsuite/gcc.target/i386/pr90773-2.c new file mode 100644 index 00000000000..64495751b46 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ +/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */ +/* { dg-additional-options "-mno-sse" { target ia32 } } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 19); +} + +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c b/gcc/testsuite/gcc.target/i386/pr90773-20.c new file mode 100644 index 00000000000..884a5502b59 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (int c) +{ + __builtin_memset (dst, c, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c b/gcc/testsuite/gcc.target/i386/pr90773-21.c new file mode 100644 index 00000000000..5bbb387a3ea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (int c) +{ + __builtin_memset (dst, c, 34); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c b/gcc/testsuite/gcc.target/i386/pr90773-22.c new file mode 
100644 index 00000000000..245a436b7eb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 33); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c b/gcc/testsuite/gcc.target/i386/pr90773-23.c new file mode 100644 index 00000000000..ca4a86f30b8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_store_by_pieces" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 34); +} + +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c new file mode 100644 index 00000000000..71f1fd8c4df --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64" } */ + +struct S +{ + long long s1 __attribute__ ((aligned (8))); + unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14; +}; + +const struct S array[] = { + { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 } +}; + +void +foo (struct S *x) +{ + x[0] = array[0]; +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c new file mode 100644 index 00000000000..ad19a88c883 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64" } */ + +struct S +{ + long long s1 __attribute__ ((aligned (8))); + unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14; +}; + +const struct S array[] = { + { 0, } +}; + +void +foo (struct S *x) +{ + x[0] = array[0]; +} + +/* { dg-final { scan-assembler-not "movdqa" } } */ +/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-26.c b/gcc/testsuite/gcc.target/i386/pr90773-26.c new file mode 100644 index 00000000000..76fb79f2e20 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-26.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=skylake -mtune-ctrl=avx256_move_by_pieces" } */ + +struct S +{ + long long s1 __attribute__ ((aligned (8))); + unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, 
s14; +}; + +const struct S array[] = { + { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 } +}; + +void +foo (struct S *x) +{ + x[0] = array[0]; +} + +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-3.c b/gcc/testsuite/gcc.target/i386/pr90773-3.c new file mode 100644 index 00000000000..84747c94652 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-3.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ +/* { dg-additional-options "-mno-avx -msse2" { target { ! ia32 } } } */ +/* { dg-additional-options "-mno-sse" { target ia32 } } */ + +extern char *dst, *src; + +void +foo (void) +{ + __builtin_memcpy (dst, src, 31); +} + +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+15\\(%\[\^,\]+\\)," 1 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+4\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+8\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+12\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+16\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+20\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+24\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ +/* { dg-final { scan-assembler-times "movl\[\\t \]+27\\(%\[\^,\]+\\)," 1 { target ia32 } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-4.c b/gcc/testsuite/gcc.target/i386/pr90773-4.c new file mode 100644 index 00000000000..ee4c04678d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-4.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 31); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, 15\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-5.c b/gcc/testsuite/gcc.target/i386/pr90773-5.c new file mode 100644 index 00000000000..9ef96279960 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-5.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 21); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movq\[\\t \]+%xmm\[0-9\]+, 13\\(%\[\^,\]+\\)" 1 { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-6.c b/gcc/testsuite/gcc.target/i386/pr90773-6.c new file mode 100644 index 00000000000..46498f6f50c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-6.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */ + +void +foo (char *dst, char *src) +{ + __builtin_memcpy (dst, src, 255); +} + +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-7.c b/gcc/testsuite/gcc.target/i386/pr90773-7.c new file mode 100644 index 00000000000..4d5ae8d1086 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-7.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */ + +void +foo (char *dst) +{ + __builtin_memset (dst, 0, 255); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-8.c b/gcc/testsuite/gcc.target/i386/pr90773-8.c new file mode 100644 index 00000000000..0d47845d560 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-8.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 5); +} + +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-9.c b/gcc/testsuite/gcc.target/i386/pr90773-9.c new file mode 100644 index 00000000000..ab5ea451f30 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-9.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ + +extern char *dst; + +void +foo (void) +{ + __builtin_memset (dst, 0, 6); +} + +/* { dg-final { scan-assembler-times "movl\[\\t \]+.+, \\(%\[\^,\]+\\)" 1 } } */ +/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 4\\(%\[\^,\]+\\)" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr98309-1.c b/gcc/testsuite/gcc.target/i386/pr98309-1.c new file mode 100644 index 00000000000..3a7afb58971 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98309-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2 -mfpmath=sse -ffast-math" } */ +/* { dg-final { scan-assembler-times "vcvtsi2s\[sd\]" "2" } } */ +/* { dg-final { scan-assembler-times "vscalefs\[sd\]" "2" } } */ + +double +__attribute__((noipa)) +foo (double a, int b) +{ + return __builtin_ldexp (a, b); +} + +float +__attribute__((noipa)) +foo2 (float a, int b) +{ + return __builtin_ldexpf (a, b); +} diff --git a/gcc/testsuite/gcc.target/i386/pr98309-2.c b/gcc/testsuite/gcc.target/i386/pr98309-2.c new file mode 100644 index 00000000000..ecfb9168b7d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98309-2.c @@ -0,0 +1,39 @@ +/* { dg-do run } */ +/* { dg-options "-mavx512f -O2 -mfpmath=sse -ffast-math" } */ +/* { dg-require-effective-target avx512f } */ + +#define AVX512F +#ifndef CHECK +#define CHECK "avx512f-helper.h" +#endif + +#include CHECK + +#include "pr98309-1.c" + +double +__attribute__((noipa, target("fpmath=387"))) +foo_i387 (double a, int b) +{ + return __builtin_ldexp (a, b); +} + +float +__attribute__((noipa, target("fpmath=387"))) +foo2_i387 (float a, int b) +{ + return __builtin_ldexpf (a, b); +} + +static void +test_512 (void) +{ + float fa = 14.5; + double da = 44.5; + int fb = 12; + int db = 8; + if (foo_i387 (da, db) != foo (da, db)) + abort (); + if (foo2_i387 (fa, fb) != foo2 (fa, fb)) + abort (); +} diff --git 
a/gcc/testsuite/gcc.target/i386/pr98737-1.c b/gcc/testsuite/gcc.target/i386/pr98737-1.c new file mode 100644 index 00000000000..33c84da51bb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-1.c @@ -0,0 +1,207 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*subq\t" { target lp64 } } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*subl\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*subw\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*subb\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) == 0; +} + +int +f2 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) == 0; +} + +int +f3 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) == 0; +} + +int +f4 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) == 0; +} + +int +f5 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) != 0; +} + +int +f6 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) != 0; +} + +int +f7 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) != 0; +} + +int +f8 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) != 0; +} + +int +f9 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) < 0; +} + +int +f10 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) < 0; +} + +int +f11 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) < 0; +} + +int +f12 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) < 0; +} + +int +f13 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) >= 0; +} + +int +f14 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) >= 0; +} + +int +f15 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) >= 0; +} + +int +f16 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) >= 0; +} + +int +f17 (long x) +{ + return __sync_sub_and_fetch (&a, x) == 0; +} + +int +f18 (int x) +{ + return __sync_sub_and_fetch (&b, x) == 0; +} + +int +f19 (short x) +{ + return __sync_sub_and_fetch (&c, x) == 0; +} + +int +f20 (char x) +{ + return __sync_sub_and_fetch (&d, x) == 0; +} + +int +f21 (long x) +{ + return __sync_sub_and_fetch (&a, x) != 0; +} + +int +f22 (int x) +{ + return __sync_sub_and_fetch (&b, x) != 0; +} + +int +f23 (short x) +{ + return __sync_sub_and_fetch (&c, x) != 0; +} + +int +f24 (char x) +{ + return __sync_sub_and_fetch (&d, x) != 0; +} + +int +f25 (long x) +{ + return __sync_sub_and_fetch (&a, x) < 0; +} + +int +f26 (int x) +{ + return __sync_sub_and_fetch (&b, x) < 0; +} + +int +f27 (short x) +{ + return __sync_sub_and_fetch (&c, x) < 0; +} + +int +f28 (char x) +{ + return __sync_sub_and_fetch (&d, x) < 0; +} + +int +f29 (long x) +{ + return __sync_sub_and_fetch (&a, x) >= 0; +} + +int +f30 (int x) +{ + return __sync_sub_and_fetch (&b, x) >= 0; +} + +int +f31 (short x) +{ + return __sync_sub_and_fetch (&c, x) >= 0; +} + +int +f32 (char x) +{ + return __sync_sub_and_fetch (&d, x) >= 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-2.c b/gcc/testsuite/gcc.target/i386/pr98737-2.c new file mode 100644 index 00000000000..53b674e90f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-2.c @@ -0,0 +1,111 @@ +/* PR 
target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*subq\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*subl\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*subw\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*subb\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) <= 0; +} + +int +f2 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) <= 0; +} + +int +f3 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) <= 0; +} + +int +f4 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) <= 0; +} + +int +f5 (long x) +{ + return __atomic_sub_fetch (&a, x, __ATOMIC_RELEASE) > 0; +} + +int +f6 (int x) +{ + return __atomic_sub_fetch (&b, x, __ATOMIC_RELEASE) > 0; +} + +int +f7 (short x) +{ + return __atomic_sub_fetch (&c, x, __ATOMIC_RELEASE) > 0; +} + +int +f8 (char x) +{ + return __atomic_sub_fetch (&d, x, __ATOMIC_RELEASE) > 0; +} + +int +f9 (long x) +{ + return __sync_sub_and_fetch (&a, x) <= 0; +} + +int +f10 (int x) +{ + return __sync_sub_and_fetch (&b, x) <= 0; +} + +int +f11 (short x) +{ + return __sync_sub_and_fetch (&c, x) <= 0; +} + +int +f12 (char x) +{ + return __sync_sub_and_fetch (&d, x) <= 0; +} + +int +f13 (long x) +{ + return __sync_sub_and_fetch (&a, x) > 0; +} + +int +f14 (int x) +{ + return __sync_sub_and_fetch (&b, x) > 0; +} + +int +f15 (short x) +{ + return __sync_sub_and_fetch (&c, x) > 0; +} + +int +f16 (char x) +{ + return __sync_sub_and_fetch (&d, x) > 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-3.c b/gcc/testsuite/gcc.target/i386/pr98737-3.c new file mode 100644 index 00000000000..0e7108ac5fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-3.c @@ -0,0 +1,207 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*addq\t" { target lp64 } } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*addl\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*addw\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*addb\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) == 0; +} + +int +f2 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) == 0; +} + +int +f3 (short x) +{ + return __atomic_add_fetch (&c, x, __ATOMIC_RELEASE) == 0; +} + +int +f4 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) == 0; +} + +int +f5 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) != 0; +} + +int +f6 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) != 0; +} + +int +f7 (short x) +{ + return __atomic_add_fetch (&c, x, __ATOMIC_RELEASE) != 0; +} + +int +f8 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) != 0; +} + +int +f9 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) < 0; +} + +int +f10 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) < 0; +} + +int +f11 (short x) +{ + return __atomic_add_fetch (&c, x, 
__ATOMIC_RELEASE) < 0; +} + +int +f12 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) < 0; +} + +int +f13 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) >= 0; +} + +int +f14 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) >= 0; +} + +int +f15 (short x) +{ + return __atomic_add_fetch (&c, x, __ATOMIC_RELEASE) >= 0; +} + +int +f16 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) >= 0; +} + +int +f17 (long x) +{ + return __sync_add_and_fetch (&a, x) == 0; +} + +int +f18 (int x) +{ + return __sync_add_and_fetch (&b, x) == 0; +} + +int +f19 (short x) +{ + return __sync_add_and_fetch (&c, x) == 0; +} + +int +f20 (char x) +{ + return __sync_add_and_fetch (&d, x) == 0; +} + +int +f21 (long x) +{ + return __sync_add_and_fetch (&a, x) != 0; +} + +int +f22 (int x) +{ + return __sync_add_and_fetch (&b, x) != 0; +} + +int +f23 (short x) +{ + return __sync_add_and_fetch (&c, x) != 0; +} + +int +f24 (char x) +{ + return __sync_add_and_fetch (&d, x) != 0; +} + +int +f25 (long x) +{ + return __sync_add_and_fetch (&a, x) < 0; +} + +int +f26 (int x) +{ + return __sync_add_and_fetch (&b, x) < 0; +} + +int +f27 (short x) +{ + return __sync_add_and_fetch (&c, x) < 0; +} + +int +f28 (char x) +{ + return __sync_add_and_fetch (&d, x) < 0; +} + +int +f29 (long x) +{ + return __sync_add_and_fetch (&a, x) >= 0; +} + +int +f30 (int x) +{ + return __sync_add_and_fetch (&b, x) >= 0; +} + +int +f31 (short x) +{ + return __sync_add_and_fetch (&c, x) >= 0; +} + +int +f32 (char x) +{ + return __sync_add_and_fetch (&d, x) >= 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-4.c b/gcc/testsuite/gcc.target/i386/pr98737-4.c new file mode 100644 index 00000000000..8228d527d2b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-4.c @@ -0,0 +1,111 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\rx]\*addq\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\rx]\*addl\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\rx]\*addw\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\rx]\*addb\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) <= 0; +} + +int +f2 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) <= 0; +} + +int +f3 (short x) +{ + return __atomic_add_fetch (&c, x, __ATOMIC_RELEASE) <= 0; +} + +int +f4 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) <= 0; +} + +int +f5 (long x) +{ + return __atomic_add_fetch (&a, x, __ATOMIC_RELEASE) > 0; +} + +int +f6 (int x) +{ + return __atomic_add_fetch (&b, x, __ATOMIC_RELEASE) > 0; +} + +int +f7 (short x) +{ + return __atomic_add_fetch (&c, x, __ATOMIC_RELEASE) > 0; +} + +int +f8 (char x) +{ + return __atomic_add_fetch (&d, x, __ATOMIC_RELEASE) > 0; +} + +int +f9 (long x) +{ + return __sync_add_and_fetch (&a, x) <= 0; +} + +int +f10 (int x) +{ + return __sync_add_and_fetch (&b, x) <= 0; +} + +int +f11 (short x) +{ + return __sync_add_and_fetch (&c, x) <= 0; +} + +int +f12 (char x) +{ + return __sync_add_and_fetch (&d, x) <= 0; +} + +int +f13 (long x) +{ + return __sync_add_and_fetch (&a, x) > 0; +} + +int +f14 (int x) +{ + return __sync_add_and_fetch (&b, x) > 0; +} + +int +f15 (short x) +{ + return 
__sync_add_and_fetch (&c, x) > 0; +} + +int +f16 (char x) +{ + return __sync_add_and_fetch (&d, x) > 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-5.c b/gcc/testsuite/gcc.target/i386/pr98737-5.c new file mode 100644 index 00000000000..6d3e0638590 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-5.c @@ -0,0 +1,303 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*andq\t" { target lp64 } } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*andl\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*andw\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*andb\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) == 0; +} + +int +f2 (int x) +{ + return __atomic_and_fetch (&b, x, __ATOMIC_RELEASE) == 0; +} + +int +f3 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) == 0; +} + +int +f4 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) == 0; +} + +int +f5 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) != 0; +} + +int +f6 (int x) +{ + return __atomic_and_fetch (&b, x, __ATOMIC_RELEASE) != 0; +} + +int +f7 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) != 0; +} + +int +f8 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) != 0; +} + +int +f9 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) < 0; +} + +int +f10 (int x) +{ + return __atomic_and_fetch (&b, x, __ATOMIC_RELEASE) < 0; +} + +int +f11 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) < 0; +} + +int +f12 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) < 0; +} + +int +f13 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) >= 0; +} + +int +f14 (int x) +{ + return __atomic_and_fetch (&b, x, __ATOMIC_RELEASE) >= 0; +} + +int +f15 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) >= 0; +} + +int +f16 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) >= 0; +} + +int +f17 (long x) +{ + return __sync_and_and_fetch (&a, x) == 0; +} + +int +f18 (int x) +{ + return __sync_and_and_fetch (&b, x) == 0; +} + +int +f19 (short x) +{ + return __sync_and_and_fetch (&c, x) == 0; +} + +int +f20 (char x) +{ + return __sync_and_and_fetch (&d, x) == 0; +} + +int +f21 (long x) +{ + return __sync_and_and_fetch (&a, x) != 0; +} + +int +f22 (int x) +{ + return __sync_and_and_fetch (&b, x) != 0; +} + +int +f23 (short x) +{ + return __sync_and_and_fetch (&c, x) != 0; +} + +int +f24 (char x) +{ + return __sync_and_and_fetch (&d, x) != 0; +} + +int +f25 (long x) +{ + return __sync_and_and_fetch (&a, x) < 0; +} + +int +f26 (int x) +{ + return __sync_and_and_fetch (&b, x) < 0; +} + +int +f27 (short x) +{ + return __sync_and_and_fetch (&c, x) < 0; +} + +int +f28 (char x) +{ + return __sync_and_and_fetch (&d, x) < 0; +} + +int +f29 (long x) +{ + return __sync_and_and_fetch (&a, x) >= 0; +} + +int +f30 (int x) +{ + return __sync_and_and_fetch (&b, x) >= 0; +} + +int +f31 (short x) +{ + return __sync_and_and_fetch (&c, x) >= 0; +} + +int +f32 (char x) +{ + return __sync_and_and_fetch (&d, x) >= 0; +} + +int +f33 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) <= 0; +} + +int +f34 (int x) +{ + return 
__atomic_and_fetch (&b, x, __ATOMIC_RELEASE) <= 0; +} + +int +f35 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) <= 0; +} + +int +f36 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) <= 0; +} + +int +f37 (long x) +{ + return __atomic_and_fetch (&a, x, __ATOMIC_RELEASE) > 0; +} + +int +f38 (int x) +{ + return __atomic_and_fetch (&b, x, __ATOMIC_RELEASE) > 0; +} + +int +f39 (short x) +{ + return __atomic_and_fetch (&c, x, __ATOMIC_RELEASE) > 0; +} + +int +f40 (char x) +{ + return __atomic_and_fetch (&d, x, __ATOMIC_RELEASE) > 0; +} + +int +f41 (long x) +{ + return __sync_and_and_fetch (&a, x) <= 0; +} + +int +f42 (int x) +{ + return __sync_and_and_fetch (&b, x) <= 0; +} + +int +f43 (short x) +{ + return __sync_and_and_fetch (&c, x) <= 0; +} + +int +f44 (char x) +{ + return __sync_and_and_fetch (&d, x) <= 0; +} + +int +f45 (long x) +{ + return __sync_and_and_fetch (&a, x) > 0; +} + +int +f46 (int x) +{ + return __sync_and_and_fetch (&b, x) > 0; +} + +int +f47 (short x) +{ + return __sync_and_and_fetch (&c, x) > 0; +} + +int +f48 (char x) +{ + return __sync_and_and_fetch (&d, x) > 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-6.c b/gcc/testsuite/gcc.target/i386/pr98737-6.c new file mode 100644 index 00000000000..6cc1c3ae17f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-6.c @@ -0,0 +1,303 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*orq\t" { target lp64 } } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*orl\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*orw\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*orb\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) == 0; +} + +int +f2 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) == 0; +} + +int +f3 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) == 0; +} + +int +f4 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) == 0; +} + +int +f5 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) != 0; +} + +int +f6 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) != 0; +} + +int +f7 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) != 0; +} + +int +f8 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) != 0; +} + +int +f9 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) < 0; +} + +int +f10 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) < 0; +} + +int +f11 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) < 0; +} + +int +f12 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) < 0; +} + +int +f13 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) >= 0; +} + +int +f14 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) >= 0; +} + +int +f15 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) >= 0; +} + +int +f16 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) >= 0; +} + +int +f17 (long x) +{ + return __sync_or_and_fetch (&a, x) == 0; +} + +int +f18 (int x) +{ + return __sync_or_and_fetch (&b, x) == 0; +} + +int +f19 (short x) +{ + return __sync_or_and_fetch (&c, x) == 0; +} + +int +f20 (char x) +{ + return 
__sync_or_and_fetch (&d, x) == 0; +} + +int +f21 (long x) +{ + return __sync_or_and_fetch (&a, x) != 0; +} + +int +f22 (int x) +{ + return __sync_or_and_fetch (&b, x) != 0; +} + +int +f23 (short x) +{ + return __sync_or_and_fetch (&c, x) != 0; +} + +int +f24 (char x) +{ + return __sync_or_and_fetch (&d, x) != 0; +} + +int +f25 (long x) +{ + return __sync_or_and_fetch (&a, x) < 0; +} + +int +f26 (int x) +{ + return __sync_or_and_fetch (&b, x) < 0; +} + +int +f27 (short x) +{ + return __sync_or_and_fetch (&c, x) < 0; +} + +int +f28 (char x) +{ + return __sync_or_and_fetch (&d, x) < 0; +} + +int +f29 (long x) +{ + return __sync_or_and_fetch (&a, x) >= 0; +} + +int +f30 (int x) +{ + return __sync_or_and_fetch (&b, x) >= 0; +} + +int +f31 (short x) +{ + return __sync_or_and_fetch (&c, x) >= 0; +} + +int +f32 (char x) +{ + return __sync_or_and_fetch (&d, x) >= 0; +} + +int +f33 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) <= 0; +} + +int +f34 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) <= 0; +} + +int +f35 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) <= 0; +} + +int +f36 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) <= 0; +} + +int +f37 (long x) +{ + return __atomic_or_fetch (&a, x, __ATOMIC_RELEASE) > 0; +} + +int +f38 (int x) +{ + return __atomic_or_fetch (&b, x, __ATOMIC_RELEASE) > 0; +} + +int +f39 (short x) +{ + return __atomic_or_fetch (&c, x, __ATOMIC_RELEASE) > 0; +} + +int +f40 (char x) +{ + return __atomic_or_fetch (&d, x, __ATOMIC_RELEASE) > 0; +} + +int +f41 (long x) +{ + return __sync_or_and_fetch (&a, x) <= 0; +} + +int +f42 (int x) +{ + return __sync_or_and_fetch (&b, x) <= 0; +} + +int +f43 (short x) +{ + return __sync_or_and_fetch (&c, x) <= 0; +} + +int +f44 (char x) +{ + return __sync_or_and_fetch (&d, x) <= 0; +} + +int +f45 (long x) +{ + return __sync_or_and_fetch (&a, x) > 0; +} + +int +f46 (int x) +{ + return __sync_or_and_fetch (&b, x) > 0; +} + +int +f47 (short x) +{ + return __sync_or_and_fetch (&c, x) > 0; +} + +int +f48 (char x) +{ + return __sync_or_and_fetch (&d, x) > 0; +} diff --git a/gcc/testsuite/gcc.target/i386/pr98737-7.c b/gcc/testsuite/gcc.target/i386/pr98737-7.c new file mode 100644 index 00000000000..2da23c44650 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98737-7.c @@ -0,0 +1,303 @@ +/* PR target/98737 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-additional-options "-march=i686" { target ia32 } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xorq\t" { target lp64 } } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xorl\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xorw\t" } } */ +/* { dg-final { scan-assembler "lock\[^\n\r]\*xorb\t" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*xadd" } } */ +/* { dg-final { scan-assembler-not "lock\[^\n\r]\*cmpxchg" } } */ + +long a; +int b; +short c; +char d; + +int +f1 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) == 0; +} + +int +f2 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) == 0; +} + +int +f3 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) == 0; +} + +int +f4 (char x) +{ + return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) == 0; +} + +int +f5 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) != 0; +} + +int +f6 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) != 0; +} + +int +f7 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) != 0; +} + +int +f8 (char x) +{ + 
return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) != 0; +} + +int +f9 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) < 0; +} + +int +f10 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) < 0; +} + +int +f11 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) < 0; +} + +int +f12 (char x) +{ + return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) < 0; +} + +int +f13 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) >= 0; +} + +int +f14 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) >= 0; +} + +int +f15 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) >= 0; +} + +int +f16 (char x) +{ + return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) >= 0; +} + +int +f17 (long x) +{ + return __sync_xor_and_fetch (&a, x) == 0; +} + +int +f18 (int x) +{ + return __sync_xor_and_fetch (&b, x) == 0; +} + +int +f19 (short x) +{ + return __sync_xor_and_fetch (&c, x) == 0; +} + +int +f20 (char x) +{ + return __sync_xor_and_fetch (&d, x) == 0; +} + +int +f21 (long x) +{ + return __sync_xor_and_fetch (&a, x) != 0; +} + +int +f22 (int x) +{ + return __sync_xor_and_fetch (&b, x) != 0; +} + +int +f23 (short x) +{ + return __sync_xor_and_fetch (&c, x) != 0; +} + +int +f24 (char x) +{ + return __sync_xor_and_fetch (&d, x) != 0; +} + +int +f25 (long x) +{ + return __sync_xor_and_fetch (&a, x) < 0; +} + +int +f26 (int x) +{ + return __sync_xor_and_fetch (&b, x) < 0; +} + +int +f27 (short x) +{ + return __sync_xor_and_fetch (&c, x) < 0; +} + +int +f28 (char x) +{ + return __sync_xor_and_fetch (&d, x) < 0; +} + +int +f29 (long x) +{ + return __sync_xor_and_fetch (&a, x) >= 0; +} + +int +f30 (int x) +{ + return __sync_xor_and_fetch (&b, x) >= 0; +} + +int +f31 (short x) +{ + return __sync_xor_and_fetch (&c, x) >= 0; +} + +int +f32 (char x) +{ + return __sync_xor_and_fetch (&d, x) >= 0; +} + +int +f33 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) <= 0; +} + +int +f34 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) <= 0; +} + +int +f35 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) <= 0; +} + +int +f36 (char x) +{ + return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) <= 0; +} + +int +f37 (long x) +{ + return __atomic_xor_fetch (&a, x, __ATOMIC_RELEASE) > 0; +} + +int +f38 (int x) +{ + return __atomic_xor_fetch (&b, x, __ATOMIC_RELEASE) > 0; +} + +int +f39 (short x) +{ + return __atomic_xor_fetch (&c, x, __ATOMIC_RELEASE) > 0; +} + +int +f40 (char x) +{ + return __atomic_xor_fetch (&d, x, __ATOMIC_RELEASE) > 0; +} + +int +f41 (long x) +{ + return __sync_xor_and_fetch (&a, x) <= 0; +} + +int +f42 (int x) +{ + return __sync_xor_and_fetch (&b, x) <= 0; +} + +int +f43 (short x) +{ + return __sync_xor_and_fetch (&c, x) <= 0; +} + +int +f44 (char x) +{ + return __sync_xor_and_fetch (&d, x) <= 0; +} + +int +f45 (long x) +{ + return __sync_xor_and_fetch (&a, x) > 0; +} + +int +f46 (int x) +{ + return __sync_xor_and_fetch (&b, x) > 0; +} + +int +f47 (short x) +{ + return __sync_xor_and_fetch (&c, x) > 0; +} + +int +f48 (char x) +{ + return __sync_xor_and_fetch (&d, x) > 0; +} diff --git a/gcc/testsuite/gcc.target/i386/sse-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-covert-1.c new file mode 100644 index 00000000000..c30af694505 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-covert-1.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency,^sse_partial_reg_converts_dependency" } */ + 
+extern float f; +extern double d; +extern int i; + +void +foo (void) +{ + d = f; + f = i; +} + +/* { dg-final { scan-assembler "cvtss2sd" } } */ +/* { dg-final { scan-assembler "cvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "cvtps2pd" } } */ +/* { dg-final { scan-assembler-not "cvtdq2ps" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c new file mode 100644 index 00000000000..b6567e60e3e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-fp-covert-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_fp_converts_dependency" } */ + +extern float f; +extern double d; + +void +foo (void) +{ + d = f; +} + +/* { dg-final { scan-assembler "cvtss2sd" } } */ +/* { dg-final { scan-assembler-not "cvtps2pd" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c b/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c new file mode 100644 index 00000000000..107f7241def --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse-int-covert-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=x86-64 -mfpmath=sse -mtune-ctrl=^sse_partial_reg_converts_dependency" } */ + +extern float f; +extern int i; + +void +foo (void) +{ + f = i; +} + +/* { dg-final { scan-assembler "cvtsi2ssl" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr94680.c b/gcc/testsuite/gcc.target/i386/sse2-pr94680.c new file mode 100644 index 00000000000..7e0ff9f6bc7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-pr94680.c @@ -0,0 +1,91 @@ +/* { dg-do compile } */ +/* { dg-options "-msse2 -mno-sse4.1 -O2" } */ +/* { dg-final { scan-assembler-times {(?n)(?:mov|psrldq).*%xmm[0-9]} 12 } } */ +/* { dg-final { scan-assembler-not "pxor" } } */ + +typedef float v4sf __attribute__((vector_size(16))); +typedef double v2df __attribute__ ((vector_size (16))); +typedef long long v2di __attribute__((vector_size(16))); +typedef int v4si __attribute__((vector_size(16))); +typedef short v8hi __attribute__ ((vector_size (16))); +typedef char v16qi __attribute__ ((vector_size (16))); + +v2df +foo_v2df (v2df x) +{ + return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) {0, 2}); +} + +v2df +foo_v2df_l (v2df x) +{ + return __builtin_shuffle ((v2df) { 0, 0 }, x, (v2di) {3, 1}); +} + +v2di +foo_v2di (v2di x) +{ + return __builtin_shuffle (x, (v2di) { 0, 0 }, (v2di) {0, 3}); +} + +v2di +foo_v2di_l (v2di x) +{ + return __builtin_shuffle ((v2di) { 0, 0 }, x, (v2di) {3, 0}); +} + +v4sf +foo_v4sf (v4sf x) +{ + return __builtin_shuffle (x, (v4sf) { 0, 0, 0, 0 }, (v4si) {0, 1, 4, 5}); +} + +v4sf +foo_v4sf_l (v4sf x) +{ + return __builtin_shuffle ((v4sf) { 0, 0, 0, 0 }, x, (v4si) {4, 5, 3, 1}); +} + +v4si +foo_v4si (v4si x) +{ + return __builtin_shuffle (x, (v4si) { 0, 0, 0, 0 }, (v4si) {0, 1, 6, 7}); +} + +v4si +foo_v4si_l (v4si x) +{ + return __builtin_shuffle ((v4si) { 0, 0, 0, 0 }, x, (v4si) {4, 5, 1, 2}); +} + +v8hi +foo_v8hi (v8hi x) +{ + return __builtin_shuffle (x, (v8hi) { 0, 0, 0, 0, 0, 0, 0, 0 }, + (v8hi) { 0, 1, 2, 3, 8, 12, 10, 13 }); +} + +v8hi +foo_v8hi_l (v8hi x) +{ + return __builtin_shuffle ((v8hi) { 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v8hi) { 8, 9, 10, 11, 7, 6, 5, 4 }); +} + +v16qi +foo_v16qi (v16qi x) +{ + return __builtin_shuffle (x, (v16qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, + (v16qi) {0, 1, 2, 3, 
4, 5, 6, 7, + 16, 24, 18, 26, 20, 28, 22, 30 }); +} + +v16qi +foo_v16qi_l (v16qi x) +{ + return __builtin_shuffle ((v16qi) { 0, 0, 0, 0, 0, 0, 0, 0, + 0, 0, 0, 0, 0, 0, 0, 0 }, x, + (v16qi) { 16, 17, 18, 19, 20, 21, 22, 23, + 15, 0, 13, 2, 11, 4, 9, 6 }); +} diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c index aec095eda62..a9c89fca4ec 100644 --- a/gcc/testsuite/gcc.target/i386/sw-1.c +++ b/gcc/testsuite/gcc.target/i386/sw-1.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */ +/* { dg-additional-options "-mno-avx" { target ia32 } } */ /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */ #include diff --git a/gcc/testsuite/gcc.target/i386/vect8-ret.c b/gcc/testsuite/gcc.target/i386/vect8-ret.c index 2b2b81ecf7a..6ace07e6e0c 100644 --- a/gcc/testsuite/gcc.target/i386/vect8-ret.c +++ b/gcc/testsuite/gcc.target/i386/vect8-ret.c @@ -1,5 +1,5 @@ /* { dg-do compile { target { ia32 && { ! *-*-vxworks* } } } } */ -/* { dg-options "-mmmx -mvect8-ret-in-mem" } */ +/* { dg-options "-mmmx -mno-sse -mvect8-ret-in-mem" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/xorsign-avx.c b/gcc/testsuite/gcc.target/i386/xorsign-avx.c new file mode 100644 index 00000000000..f2e2054b6fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/xorsign-avx.c @@ -0,0 +1,4 @@ +/* { dg-do run { target avx_runtime } } */ +/* { dg-options "-O2 -mavx -mfpmath=sse -ftree-vectorize" } */ + +#include "xorsign.c" diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 857e57218c1..bca0c1035fe 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -30,6 +30,7 @@ # # Assume by default that CONTENTS is C code. # Otherwise, code should contain: +# "/* Assembly" for assembly code, # "// C++" for c++, # "// D" for D, # "! Fortran" for Fortran code, @@ -57,6 +58,7 @@ proc check_compile {basename type contents args} { set options "" } switch -glob -- $contents { + "*/* Assembly*" { set src ${basename}[pid].S } "*! Fortran*" { set src ${basename}[pid].f90 } "*// C++*" { set src ${basename}[pid].cc } "*// D*" { set src ${basename}[pid].d } @@ -11235,3 +11237,37 @@ proc check_effective_target_lra { } { return 1 } +proc check_effective_target_property_1_needed { } { + return [check_no_compiler_messages property_1_needed executable { +/* Assembly code */ +#ifdef __LP64__ +# define __PROPERTY_ALIGN 3 +#else +# define __PROPERTY_ALIGN 2 +#endif + + .section ".note.gnu.property", "a" + .p2align __PROPERTY_ALIGN + .long 1f - 0f /* name length. */ + .long 4f - 1f /* data length. */ + /* NT_GNU_PROPERTY_TYPE_0. */ + .long 5 /* note type. */ +0: + .asciz "GNU" /* vendor name. */ +1: + .p2align __PROPERTY_ALIGN + /* GNU_PROPERTY_1_NEEDED. */ + .long 0xb0008000 /* pr_type. */ + .long 3f - 2f /* pr_datasz. */ +2: + /* GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS. */ + .long 1 +3: + .p2align __PROPERTY_ALIGN +4: + .text + .globl main +main: + .byte 0 + } ""] +} diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c index aad8526eb21..93fbb34bb94 100644 --- a/gcc/tree-ssa-ccp.c +++ b/gcc/tree-ssa-ccp.c @@ -151,6 +151,7 @@ along with GCC; see the file COPYING3. If not see #include "symbol-summary.h" #include "ipa-utils.h" #include "ipa-prop.h" +#include "internal-fn.h" /* Possible lattice values. 
*/ typedef enum @@ -2862,12 +2863,97 @@ optimize_unreachable (gimple_stmt_iterator i) return ret; } +/* Convert + _1 = __atomic_fetch_or_* (ptr_6, 1, _3); + _7 = ~_1; + _5 = (_Bool) _7; + to + _1 = __atomic_fetch_or_* (ptr_6, 1, _3); + _8 = _1 & 1; + _5 = _8 == 0; + and convert + _1 = __atomic_fetch_and_* (ptr_6, ~1, _3); + _7 = ~_1; + _4 = (_Bool) _7; + to + _1 = __atomic_fetch_and_* (ptr_6, ~1, _3); + _8 = _1 & 1; + _4 = (_Bool) _8; + + USE_STMT is the gimplt statement which uses the return value of + __atomic_fetch_or_*. LHS is the return value of __atomic_fetch_or_*. + MASK is the mask passed to __atomic_fetch_or_*. + */ + +static gimple * +convert_atomic_bit_not (enum internal_fn fn, gimple *use_stmt, + tree lhs, tree mask) +{ + tree and_mask; + if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) + { + /* MASK must be ~1. */ + if (!operand_equal_p (build_int_cst (TREE_TYPE (lhs), + ~HOST_WIDE_INT_1), mask, 0)) + return nullptr; + and_mask = build_int_cst (TREE_TYPE (lhs), 1); + } + else + { + /* MASK must be 1. */ + if (!operand_equal_p (build_int_cst (TREE_TYPE (lhs), 1), mask, 0)) + return nullptr; + and_mask = mask; + } + + tree use_lhs = gimple_assign_lhs (use_stmt); + + use_operand_p use_p; + gimple *use_not_stmt; + + if (!single_imm_use (use_lhs, &use_p, &use_not_stmt) + || !is_gimple_assign (use_not_stmt)) + return nullptr; + + if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (use_not_stmt))) + return nullptr; + + tree use_not_lhs = gimple_assign_lhs (use_not_stmt); + if (TREE_CODE (TREE_TYPE (use_not_lhs)) != BOOLEAN_TYPE) + return nullptr; + + gimple_stmt_iterator gsi; + gsi = gsi_for_stmt (use_stmt); + gsi_remove (&gsi, true); + tree var = make_ssa_name (TREE_TYPE (lhs)); + use_stmt = gimple_build_assign (var, BIT_AND_EXPR, lhs, and_mask); + gsi = gsi_for_stmt (use_not_stmt); + gsi_insert_before (&gsi, use_stmt, GSI_NEW_STMT); + lhs = gimple_assign_lhs (use_not_stmt); + gimple *g = gimple_build_assign (lhs, EQ_EXPR, var, + build_zero_cst (TREE_TYPE (mask))); + gsi_insert_after (&gsi, g, GSI_NEW_STMT); + gsi = gsi_for_stmt (use_not_stmt); + gsi_remove (&gsi, true); + return use_stmt; +} + +/* match.pd function to match atomic_bit_test_and pattern which + has nop_convert: + _1 = __atomic_fetch_or_4 (&v, 1, 0); + _2 = (int) _1; + _5 = _2 & 1; + */ +extern bool gimple_nop_atomic_bit_test_and_p (tree, tree *, + tree (*) (tree)); +extern bool gimple_nop_convert (tree, tree*, tree (*) (tree)); + /* Optimize mask_2 = 1 << cnt_1; _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3); _5 = _4 & mask_2; to - _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3); + _4 = .ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3); _5 = _4; If _5 is only used in _5 != 0 or _5 == 0 comparisons, 1 is passed instead of 0, and the builtin just returns a zero @@ -2879,7 +2965,7 @@ optimize_unreachable (gimple_stmt_iterator i) the second argument to the builtin needs to be one's complement of the mask instead of mask. 
*/ -static void +static bool optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, enum internal_fn fn, bool has_model_arg, bool after) @@ -2888,7 +2974,7 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, tree lhs = gimple_call_lhs (call); use_operand_p use_p; gimple *use_stmt; - tree mask, bit; + tree mask; optab optab; if (!flag_inline_atomics @@ -2898,9 +2984,8 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs) || !single_imm_use (lhs, &use_p, &use_stmt) || !is_gimple_assign (use_stmt) - || gimple_assign_rhs_code (use_stmt) != BIT_AND_EXPR || !gimple_vdef (call)) - return; + return false; switch (fn) { @@ -2914,57 +2999,352 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, optab = atomic_bit_test_and_reset_optab; break; default: - return; + return false; } - if (optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs))) == CODE_FOR_nothing) - return; + tree bit = nullptr; mask = gimple_call_arg (call, 1); + tree_code rhs_code = gimple_assign_rhs_code (use_stmt); + if (rhs_code != BIT_AND_EXPR) + { + if (rhs_code != NOP_EXPR && rhs_code != BIT_NOT_EXPR) + return false; + + tree use_lhs = gimple_assign_lhs (use_stmt); + if (TREE_CODE (use_lhs) == SSA_NAME + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use_lhs)) + return false; + + tree use_rhs = gimple_assign_rhs1 (use_stmt); + if (lhs != use_rhs) + return false; + + if (optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs))) + == CODE_FOR_nothing) + return false; + + gimple *g; + gimple_stmt_iterator gsi; + tree var; + int ibit = -1; + + if (rhs_code == BIT_NOT_EXPR) + { + g = convert_atomic_bit_not (fn, use_stmt, lhs, mask); + if (!g) + return false; + use_stmt = g; + ibit = 0; + } + else if (TREE_CODE (TREE_TYPE (use_lhs)) == BOOLEAN_TYPE) + { + tree and_mask; + if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) + { + /* MASK must be ~1. 
*/ + if (!operand_equal_p (build_int_cst (TREE_TYPE (lhs), + ~HOST_WIDE_INT_1), + mask, 0)) + return false; + + /* Convert + _1 = __atomic_fetch_and_* (ptr_6, ~1, _3); + _4 = (_Bool) _1; + to + _1 = __atomic_fetch_and_* (ptr_6, ~1, _3); + _5 = _1 & 1; + _4 = (_Bool) _5; + */ + and_mask = build_int_cst (TREE_TYPE (lhs), 1); + } + else + { + and_mask = build_int_cst (TREE_TYPE (lhs), 1); + if (!operand_equal_p (and_mask, mask, 0)) + return false; + + /* Convert + _1 = __atomic_fetch_or_* (ptr_6, 1, _3); + _4 = (_Bool) _1; + to + _1 = __atomic_fetch_or_* (ptr_6, 1, _3); + _5 = _1 & 1; + _4 = (_Bool) _5; + */ + } + var = make_ssa_name (TREE_TYPE (use_rhs)); + replace_uses_by (use_rhs, var); + g = gimple_build_assign (var, BIT_AND_EXPR, use_rhs, + and_mask); + gsi = gsi_for_stmt (use_stmt); + gsi_insert_before (&gsi, g, GSI_NEW_STMT); + use_stmt = g; + ibit = 0; + } + else if (TYPE_PRECISION (TREE_TYPE (use_lhs)) + <= TYPE_PRECISION (TREE_TYPE (use_rhs))) + { + gimple *use_nop_stmt; + if (!single_imm_use (use_lhs, &use_p, &use_nop_stmt) + || !is_gimple_assign (use_nop_stmt)) + return false; + tree use_nop_lhs = gimple_assign_lhs (use_nop_stmt); + rhs_code = gimple_assign_rhs_code (use_nop_stmt); + if (rhs_code != BIT_AND_EXPR) + { + if (TREE_CODE (use_nop_lhs) == SSA_NAME + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use_nop_lhs)) + return false; + if (rhs_code == BIT_NOT_EXPR) + { + g = convert_atomic_bit_not (fn, use_nop_stmt, lhs, + mask); + if (!g) + return false; + /* Convert + _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); + _2 = (int) _1; + _7 = ~_2; + _5 = (_Bool) _7; + to + _1 = __atomic_fetch_or_4 (ptr_6, ~1, _3); + _8 = _1 & 1; + _5 = _8 == 0; + and convert + _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); + _2 = (int) _1; + _7 = ~_2; + _5 = (_Bool) _7; + to + _1 = __atomic_fetch_and_4 (ptr_6, 1, _3); + _8 = _1 & 1; + _5 = _8 == 0; + */ + gsi = gsi_for_stmt (use_stmt); + gsi_remove (&gsi, true); + use_stmt = g; + ibit = 0; + } + else + { + if (TREE_CODE (TREE_TYPE (use_nop_lhs)) != BOOLEAN_TYPE) + return false; + if (rhs_code != GE_EXPR && rhs_code != LT_EXPR) + return false; + tree cmp_rhs1 = gimple_assign_rhs1 (use_nop_stmt); + if (use_lhs != cmp_rhs1) + return false; + tree cmp_rhs2 = gimple_assign_rhs2 (use_nop_stmt); + if (!integer_zerop (cmp_rhs2)) + return false; + + tree and_mask; + + unsigned HOST_WIDE_INT bytes + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (use_rhs))); + ibit = bytes * BITS_PER_UNIT - 1; + unsigned HOST_WIDE_INT highest + = HOST_WIDE_INT_1U << ibit; + + if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) + { + /* Get the signed maximum of the USE_RHS type. */ + and_mask = build_int_cst (TREE_TYPE (use_rhs), + highest - 1); + if (!operand_equal_p (and_mask, mask, 0)) + return false; + + /* Convert + _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3); + _5 = (signed int) _1; + _4 = _5 < 0 or _5 >= 0; + to + _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3); + _6 = _1 & 0x80000000; + _4 = _6 != 0 or _6 == 0; + */ + and_mask = build_int_cst (TREE_TYPE (use_rhs), + highest); + } + else + { + /* Get the signed minimum of the USE_RHS type. 
*/ + and_mask = build_int_cst (TREE_TYPE (use_rhs), + highest); + if (!operand_equal_p (and_mask, mask, 0)) + return false; + + /* Convert + _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3); + _5 = (signed int) _1; + _4 = _5 < 0 or _5 >= 0; + to + _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3); + _6 = _1 & 0x80000000; + _4 = _6 != 0 or _6 == 0; + */ + } + var = make_ssa_name (TREE_TYPE (use_rhs)); + gsi = gsi_for_stmt (use_stmt); + gsi_remove (&gsi, true); + g = gimple_build_assign (var, BIT_AND_EXPR, use_rhs, + and_mask); + gsi = gsi_for_stmt (use_nop_stmt); + gsi_insert_before (&gsi, g, GSI_NEW_STMT); + use_stmt = g; + g = gimple_build_assign (use_nop_lhs, + (rhs_code == GE_EXPR + ? EQ_EXPR : NE_EXPR), + var, + build_zero_cst (TREE_TYPE (use_rhs))); + gsi_insert_after (&gsi, g, GSI_NEW_STMT); + gsi = gsi_for_stmt (use_nop_stmt); + gsi_remove (&gsi, true); + } + } + else + { + tree match_op[3]; + gimple *g; + if (!gimple_nop_atomic_bit_test_and_p (use_nop_lhs, + &match_op[0], NULL) + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (match_op[2]) + || !single_imm_use (match_op[2], &use_p, &g) + || !is_gimple_assign (g)) + return false; + mask = match_op[0]; + if (TREE_CODE (match_op[1]) == INTEGER_CST) + { + ibit = tree_log2 (match_op[1]); + gcc_assert (ibit >= 0); + } + else + { + g = SSA_NAME_DEF_STMT (match_op[1]); + gcc_assert (is_gimple_assign (g)); + bit = gimple_assign_rhs2 (g); + } + /* Convert + _1 = __atomic_fetch_or_4 (ptr_6, mask, _3); + _2 = (int) _1; + _5 = _2 & mask; + to + _1 = __atomic_fetch_or_4 (ptr_6, mask, _3); + _6 = _1 & mask; + _5 = (int) _6; + and convert + _1 = ~mask_7; + _2 = (unsigned int) _1; + _3 = __atomic_fetch_and_4 (ptr_6, _2, 0); + _4 = (int) _3; + _5 = _4 & mask_7; + to + _1 = __atomic_fetch_and_* (ptr_6, ~mask_7, _3); + _12 = _3 & mask_7; + _5 = (int) _12; + + and Convert + _1 = __atomic_fetch_and_4 (ptr_6, ~mask, _3); + _2 = (short int) _1; + _5 = _2 & mask; + to + _1 = __atomic_fetch_and_4 (ptr_6, ~mask, _3); + _8 = _1 & mask; + _5 = (short int) _8; + */ + gimple_seq stmts = NULL; + match_op[1] = gimple_convert (&stmts, + TREE_TYPE (use_rhs), + match_op[1]); + var = gimple_build (&stmts, BIT_AND_EXPR, + TREE_TYPE (use_rhs), use_rhs, match_op[1]); + gsi = gsi_for_stmt (use_stmt); + gsi_remove (&gsi, true); + release_defs (use_stmt); + use_stmt = gimple_seq_last_stmt (stmts); + gsi = gsi_for_stmt (use_nop_stmt); + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); + gimple_assign_set_rhs_with_ops (&gsi, CONVERT_EXPR, var); + update_stmt (use_nop_stmt); + } + } + else + return false; + + if (!bit) + { + if (ibit < 0) + gcc_unreachable (); + bit = build_int_cst (TREE_TYPE (lhs), ibit); + } + } + else if (optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs))) + == CODE_FOR_nothing) + return false; + tree use_lhs = gimple_assign_lhs (use_stmt); if (!use_lhs) - return; + return false; - if (TREE_CODE (mask) == INTEGER_CST) + if (!bit) { - if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) - mask = const_unop (BIT_NOT_EXPR, TREE_TYPE (mask), mask); - mask = fold_convert (TREE_TYPE (lhs), mask); - int ibit = tree_log2 (mask); - if (ibit < 0) - return; - bit = build_int_cst (TREE_TYPE (lhs), ibit); - } - else if (TREE_CODE (mask) == SSA_NAME) - { - gimple *g = SSA_NAME_DEF_STMT (mask); - if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) + if (TREE_CODE (mask) == INTEGER_CST) { - if (!is_gimple_assign (g) - || gimple_assign_rhs_code (g) != BIT_NOT_EXPR) - return; - mask = gimple_assign_rhs1 (g); - if (TREE_CODE (mask) != SSA_NAME) - return; - g = SSA_NAME_DEF_STMT (mask); + if (fn == 
IFN_ATOMIC_BIT_TEST_AND_RESET) + mask = const_unop (BIT_NOT_EXPR, TREE_TYPE (mask), mask); + mask = fold_convert (TREE_TYPE (lhs), mask); + int ibit = tree_log2 (mask); + if (ibit < 0) + return false; + bit = build_int_cst (TREE_TYPE (lhs), ibit); } - if (!is_gimple_assign (g) - || gimple_assign_rhs_code (g) != LSHIFT_EXPR - || !integer_onep (gimple_assign_rhs1 (g))) - return; - bit = gimple_assign_rhs2 (g); - } - else - return; + else if (TREE_CODE (mask) == SSA_NAME) + { + gimple *g = SSA_NAME_DEF_STMT (mask); + tree match_op; + if (gimple_nop_convert (mask, &match_op, NULL)) + { + mask = match_op; + if (TREE_CODE (mask) != SSA_NAME) + return false; + g = SSA_NAME_DEF_STMT (mask); + } + if (!is_gimple_assign (g)) + return false; - if (gimple_assign_rhs1 (use_stmt) == lhs) - { - if (!operand_equal_p (gimple_assign_rhs2 (use_stmt), mask, 0)) - return; + if (fn == IFN_ATOMIC_BIT_TEST_AND_RESET) + { + if (gimple_assign_rhs_code (g) != BIT_NOT_EXPR) + return false; + mask = gimple_assign_rhs1 (g); + if (TREE_CODE (mask) != SSA_NAME) + return false; + g = SSA_NAME_DEF_STMT (mask); + } + + rhs_code = gimple_assign_rhs_code (g); + if (rhs_code != LSHIFT_EXPR + || !integer_onep (gimple_assign_rhs1 (g))) + return false; + bit = gimple_assign_rhs2 (g); + } + else + return false; + + tree cmp_mask; + if (gimple_assign_rhs1 (use_stmt) == lhs) + cmp_mask = gimple_assign_rhs2 (use_stmt); + else + cmp_mask = gimple_assign_rhs1 (use_stmt); + + tree match_op; + if (gimple_nop_convert (cmp_mask, &match_op, NULL)) + cmp_mask = match_op; + + if (!operand_equal_p (cmp_mask, mask, 0)) + return false; } - else if (gimple_assign_rhs2 (use_stmt) != lhs - || !operand_equal_p (gimple_assign_rhs1 (use_stmt), mask, 0)) - return; bool use_bool = true; bool has_debug_uses = false; @@ -2988,6 +3368,8 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, case COND_EXPR: op1 = gimple_assign_rhs1 (g); code = TREE_CODE (op1); + if (TREE_CODE_CLASS (code) != tcc_comparison) + break; op0 = TREE_OPERAND (op1, 0); op1 = TREE_OPERAND (op1, 1); break; @@ -3053,18 +3435,20 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, of the specified bit after the atomic operation (makes only sense for xor, otherwise the bit content is compile time known), we need to invert the bit. */ - g = gimple_build_assign (make_ssa_name (TREE_TYPE (lhs)), - BIT_XOR_EXPR, new_lhs, - use_bool ? build_int_cst (TREE_TYPE (lhs), 1) - : mask); - new_lhs = gimple_assign_lhs (g); + tree mask_convert = mask; + gimple_seq stmts = NULL; + if (!use_bool) + mask_convert = gimple_convert (&stmts, TREE_TYPE (lhs), mask); + new_lhs = gimple_build (&stmts, BIT_XOR_EXPR, TREE_TYPE (lhs), new_lhs, + use_bool ? build_int_cst (TREE_TYPE (lhs), 1) + : mask_convert); if (throws) { - gsi_insert_on_edge_immediate (e, g); - gsi = gsi_for_stmt (g); + gsi_insert_seq_on_edge_immediate (e, stmts); + gsi = gsi_for_stmt (gimple_seq_last (stmts)); } else - gsi_insert_after (&gsi, g, GSI_NEW_STMT); + gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT); } if (use_bool && has_debug_uses) { @@ -3105,6 +3489,196 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip, release_defs (use_stmt); gsi_remove (gsip, true); release_ssa_name (lhs); + return true; +} + +/* Optimize + _4 = __atomic_add_fetch_* (ptr_6, arg_2, _3); + _5 = _4 == 0; + to + _4 = .ATOMIC_ADD_FETCH_CMP_0 (EQ_EXPR, ptr_6, arg_2, _3); + _5 = _4; + Similarly for __sync_add_and_fetch_* (without the ", _3" part + in there). 
*/ + +static bool +optimize_atomic_op_fetch_cmp_0 (gimple_stmt_iterator *gsip, + enum internal_fn fn, bool has_model_arg) +{ + gimple *call = gsi_stmt (*gsip); + tree lhs = gimple_call_lhs (call); + use_operand_p use_p; + gimple *use_stmt; + + if (!flag_inline_atomics + || optimize_debug + || !gimple_call_builtin_p (call, BUILT_IN_NORMAL) + || !lhs + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs) + || !single_imm_use (lhs, &use_p, &use_stmt) + || !gimple_vdef (call)) + return false; + + optab optab; + switch (fn) + { + case IFN_ATOMIC_ADD_FETCH_CMP_0: + optab = atomic_add_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_SUB_FETCH_CMP_0: + optab = atomic_sub_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_AND_FETCH_CMP_0: + optab = atomic_and_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_OR_FETCH_CMP_0: + optab = atomic_or_fetch_cmp_0_optab; + break; + case IFN_ATOMIC_XOR_FETCH_CMP_0: + optab = atomic_xor_fetch_cmp_0_optab; + break; + default: + return false; + } + + if (optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs))) + == CODE_FOR_nothing) + return false; + + tree use_lhs = lhs; + if (gimple_assign_cast_p (use_stmt)) + { + use_lhs = gimple_assign_lhs (use_stmt); + if (!tree_nop_conversion_p (TREE_TYPE (use_lhs), TREE_TYPE (lhs)) + || (!INTEGRAL_TYPE_P (TREE_TYPE (use_lhs)) + && !POINTER_TYPE_P (TREE_TYPE (use_lhs))) + || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (use_lhs) + || !single_imm_use (use_lhs, &use_p, &use_stmt)) + return false; + } + enum tree_code code = ERROR_MARK; + tree op0 = NULL_TREE, op1 = NULL_TREE; + if (is_gimple_assign (use_stmt)) + switch (gimple_assign_rhs_code (use_stmt)) + { + case COND_EXPR: + op1 = gimple_assign_rhs1 (use_stmt); + code = TREE_CODE (op1); + if (TREE_CODE_CLASS (code) == tcc_comparison) + { + op0 = TREE_OPERAND (op1, 0); + op1 = TREE_OPERAND (op1, 1); + } + break; + default: + code = gimple_assign_rhs_code (use_stmt); + if (TREE_CODE_CLASS (code) == tcc_comparison) + { + op0 = gimple_assign_rhs1 (use_stmt); + op1 = gimple_assign_rhs2 (use_stmt); + } + break; + } + else if (gimple_code (use_stmt) == GIMPLE_COND) + { + code = gimple_cond_code (use_stmt); + op0 = gimple_cond_lhs (use_stmt); + op1 = gimple_cond_rhs (use_stmt); + } + + switch (code) + { + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + if (!INTEGRAL_TYPE_P (TREE_TYPE (use_lhs)) + || TREE_CODE (TREE_TYPE (use_lhs)) == BOOLEAN_TYPE + || TYPE_UNSIGNED (TREE_TYPE (use_lhs))) + return false; + /* FALLTHRU */ + case EQ_EXPR: + case NE_EXPR: + if (op0 == use_lhs && integer_zerop (op1)) + break; + return false; + default: + return false; + } + + int encoded; + switch (code) + { + /* Use special encoding of the operation. We want to also + encode the mode in the first argument and for neither EQ_EXPR + etc. nor EQ etc. we can rely it will fit into QImode. 
*/ + case EQ_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_EQ; break; + case NE_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_NE; break; + case LT_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_LT; break; + case LE_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_LE; break; + case GT_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_GT; break; + case GE_EXPR: encoded = ATOMIC_OP_FETCH_CMP_0_GE; break; + default: gcc_unreachable (); + } + + tree new_lhs = make_ssa_name (boolean_type_node); + gimple *g; + tree flag = build_int_cst (TREE_TYPE (lhs), encoded); + if (has_model_arg) + g = gimple_build_call_internal (fn, 4, flag, + gimple_call_arg (call, 0), + gimple_call_arg (call, 1), + gimple_call_arg (call, 2)); + else + g = gimple_build_call_internal (fn, 3, flag, + gimple_call_arg (call, 0), + gimple_call_arg (call, 1)); + gimple_call_set_lhs (g, new_lhs); + gimple_set_location (g, gimple_location (call)); + gimple_move_vops (g, call); + bool throws = stmt_can_throw_internal (cfun, call); + gimple_call_set_nothrow (as_a (g), + gimple_call_nothrow_p (as_a (call))); + gimple_stmt_iterator gsi = *gsip; + gsi_insert_after (&gsi, g, GSI_SAME_STMT); + if (throws) + maybe_clean_or_replace_eh_stmt (call, g); + if (is_gimple_assign (use_stmt)) + switch (gimple_assign_rhs_code (use_stmt)) + { + case COND_EXPR: + gimple_assign_set_rhs1 (use_stmt, new_lhs); + break; + default: + gsi = gsi_for_stmt (use_stmt); + if (tree ulhs = gimple_assign_lhs (use_stmt)) + if (useless_type_conversion_p (TREE_TYPE (ulhs), + boolean_type_node)) + { + gimple_assign_set_rhs_with_ops (&gsi, SSA_NAME, new_lhs); + break; + } + gimple_assign_set_rhs_with_ops (&gsi, NOP_EXPR, new_lhs); + break; + } + else if (gimple_code (use_stmt) == GIMPLE_COND) + { + gcond *use_cond = as_a (use_stmt); + gimple_cond_set_code (use_cond, NE_EXPR); + gimple_cond_set_lhs (use_cond, new_lhs); + gimple_cond_set_rhs (use_cond, boolean_false_node); + } + + update_stmt (use_stmt); + if (use_lhs != lhs) + { + gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (use_lhs)); + gsi_remove (&gsi, true); + release_ssa_name (use_lhs); + } + gsi_remove (gsip, true); + release_ssa_name (lhs); + return true; } /* Optimize @@ -3333,6 +3907,44 @@ pass_fold_builtins::execute (function *fun) cfg_changed = true; break; + case BUILT_IN_ATOMIC_ADD_FETCH_1: + case BUILT_IN_ATOMIC_ADD_FETCH_2: + case BUILT_IN_ATOMIC_ADD_FETCH_4: + case BUILT_IN_ATOMIC_ADD_FETCH_8: + case BUILT_IN_ATOMIC_ADD_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_ADD_FETCH_CMP_0, + true); + break; + case BUILT_IN_SYNC_ADD_AND_FETCH_1: + case BUILT_IN_SYNC_ADD_AND_FETCH_2: + case BUILT_IN_SYNC_ADD_AND_FETCH_4: + case BUILT_IN_SYNC_ADD_AND_FETCH_8: + case BUILT_IN_SYNC_ADD_AND_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_ADD_FETCH_CMP_0, + false); + break; + + case BUILT_IN_ATOMIC_SUB_FETCH_1: + case BUILT_IN_ATOMIC_SUB_FETCH_2: + case BUILT_IN_ATOMIC_SUB_FETCH_4: + case BUILT_IN_ATOMIC_SUB_FETCH_8: + case BUILT_IN_ATOMIC_SUB_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_SUB_FETCH_CMP_0, + true); + break; + case BUILT_IN_SYNC_SUB_AND_FETCH_1: + case BUILT_IN_SYNC_SUB_AND_FETCH_2: + case BUILT_IN_SYNC_SUB_AND_FETCH_4: + case BUILT_IN_SYNC_SUB_AND_FETCH_8: + case BUILT_IN_SYNC_SUB_AND_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_SUB_FETCH_CMP_0, + false); + break; + case BUILT_IN_ATOMIC_FETCH_OR_1: case BUILT_IN_ATOMIC_FETCH_OR_2: case BUILT_IN_ATOMIC_FETCH_OR_4: @@ -3374,16 +3986,24 @@ pass_fold_builtins::execute (function *fun) case BUILT_IN_ATOMIC_XOR_FETCH_4: case BUILT_IN_ATOMIC_XOR_FETCH_8: 
case BUILT_IN_ATOMIC_XOR_FETCH_16: - optimize_atomic_bit_test_and - (&i, IFN_ATOMIC_BIT_TEST_AND_COMPLEMENT, true, true); + if (optimize_atomic_bit_test_and + (&i, IFN_ATOMIC_BIT_TEST_AND_COMPLEMENT, true, true)) + break; + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_XOR_FETCH_CMP_0, + true); break; case BUILT_IN_SYNC_XOR_AND_FETCH_1: case BUILT_IN_SYNC_XOR_AND_FETCH_2: case BUILT_IN_SYNC_XOR_AND_FETCH_4: case BUILT_IN_SYNC_XOR_AND_FETCH_8: case BUILT_IN_SYNC_XOR_AND_FETCH_16: - optimize_atomic_bit_test_and - (&i, IFN_ATOMIC_BIT_TEST_AND_COMPLEMENT, false, true); + if (optimize_atomic_bit_test_and + (&i, IFN_ATOMIC_BIT_TEST_AND_COMPLEMENT, false, true)) + break; + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_XOR_FETCH_CMP_0, + false); break; case BUILT_IN_ATOMIC_FETCH_AND_1: @@ -3405,6 +4025,44 @@ pass_fold_builtins::execute (function *fun) false, false); break; + case BUILT_IN_ATOMIC_AND_FETCH_1: + case BUILT_IN_ATOMIC_AND_FETCH_2: + case BUILT_IN_ATOMIC_AND_FETCH_4: + case BUILT_IN_ATOMIC_AND_FETCH_8: + case BUILT_IN_ATOMIC_AND_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_AND_FETCH_CMP_0, + true); + break; + case BUILT_IN_SYNC_AND_AND_FETCH_1: + case BUILT_IN_SYNC_AND_AND_FETCH_2: + case BUILT_IN_SYNC_AND_AND_FETCH_4: + case BUILT_IN_SYNC_AND_AND_FETCH_8: + case BUILT_IN_SYNC_AND_AND_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_AND_FETCH_CMP_0, + false); + break; + + case BUILT_IN_ATOMIC_OR_FETCH_1: + case BUILT_IN_ATOMIC_OR_FETCH_2: + case BUILT_IN_ATOMIC_OR_FETCH_4: + case BUILT_IN_ATOMIC_OR_FETCH_8: + case BUILT_IN_ATOMIC_OR_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_OR_FETCH_CMP_0, + true); + break; + case BUILT_IN_SYNC_OR_AND_FETCH_1: + case BUILT_IN_SYNC_OR_AND_FETCH_2: + case BUILT_IN_SYNC_OR_AND_FETCH_4: + case BUILT_IN_SYNC_OR_AND_FETCH_8: + case BUILT_IN_SYNC_OR_AND_FETCH_16: + optimize_atomic_op_fetch_cmp_0 (&i, + IFN_ATOMIC_OR_FETCH_CMP_0, + false); + break; + case BUILT_IN_MEMCPY: if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL) && TREE_CODE (gimple_call_arg (stmt, 0)) == ADDR_EXPR diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c index 3bb4a07d58c..5d98ac381e4 100644 --- a/gcc/tree-ssa-forwprop.c +++ b/gcc/tree-ssa-forwprop.c @@ -1240,12 +1240,19 @@ constant_pointer_difference (tree p1, tree p2) memset (p + 4, ' ', 3); into memcpy (p, "abcd ", 7); - call if the latter can be stored by pieces during expansion. */ + call if the latter can be stored by pieces during expansion. + + Also canonicalize __atomic_fetch_op (p, x, y) op x + to __atomic_op_fetch (p, x, y) or + __atomic_op_fetch (p, x, y) iop x + to __atomic_fetch_op (p, x, y) when possible (also __sync). 
*/ static bool simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2) { gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p); + enum built_in_function other_atomic = END_BUILTINS; + enum tree_code atomic_op = ERROR_MARK; tree vuse = gimple_vuse (stmt2); if (vuse == NULL) return false; @@ -1447,6 +1454,310 @@ simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2) } } break; + + #define CASE_ATOMIC(NAME, OTHER, OP) \ + case BUILT_IN_##NAME##_1: \ + case BUILT_IN_##NAME##_2: \ + case BUILT_IN_##NAME##_4: \ + case BUILT_IN_##NAME##_8: \ + case BUILT_IN_##NAME##_16: \ + atomic_op = OP; \ + other_atomic \ + = (enum built_in_function) (BUILT_IN_##OTHER##_1 \ + + (DECL_FUNCTION_CODE (callee2) \ + - BUILT_IN_##NAME##_1)); \ + goto handle_atomic_fetch_op; + + CASE_ATOMIC (ATOMIC_FETCH_ADD, ATOMIC_ADD_FETCH, PLUS_EXPR) + CASE_ATOMIC (ATOMIC_FETCH_SUB, ATOMIC_SUB_FETCH, MINUS_EXPR) + CASE_ATOMIC (ATOMIC_FETCH_AND, ATOMIC_AND_FETCH, BIT_AND_EXPR) + CASE_ATOMIC (ATOMIC_FETCH_XOR, ATOMIC_XOR_FETCH, BIT_XOR_EXPR) + CASE_ATOMIC (ATOMIC_FETCH_OR, ATOMIC_OR_FETCH, BIT_IOR_EXPR) + + CASE_ATOMIC (SYNC_FETCH_AND_ADD, SYNC_ADD_AND_FETCH, PLUS_EXPR) + CASE_ATOMIC (SYNC_FETCH_AND_SUB, SYNC_SUB_AND_FETCH, MINUS_EXPR) + CASE_ATOMIC (SYNC_FETCH_AND_AND, SYNC_AND_AND_FETCH, BIT_AND_EXPR) + CASE_ATOMIC (SYNC_FETCH_AND_XOR, SYNC_XOR_AND_FETCH, BIT_XOR_EXPR) + CASE_ATOMIC (SYNC_FETCH_AND_OR, SYNC_OR_AND_FETCH, BIT_IOR_EXPR) + + CASE_ATOMIC (ATOMIC_ADD_FETCH, ATOMIC_FETCH_ADD, MINUS_EXPR) + CASE_ATOMIC (ATOMIC_SUB_FETCH, ATOMIC_FETCH_SUB, PLUS_EXPR) + CASE_ATOMIC (ATOMIC_XOR_FETCH, ATOMIC_FETCH_XOR, BIT_XOR_EXPR) + + CASE_ATOMIC (SYNC_ADD_AND_FETCH, SYNC_FETCH_AND_ADD, MINUS_EXPR) + CASE_ATOMIC (SYNC_SUB_AND_FETCH, SYNC_FETCH_AND_SUB, PLUS_EXPR) + CASE_ATOMIC (SYNC_XOR_AND_FETCH, SYNC_FETCH_AND_XOR, BIT_XOR_EXPR) + +#undef CASE_ATOMIC + + handle_atomic_fetch_op: + if (gimple_call_num_args (stmt2) >= 2 && gimple_call_lhs (stmt2)) + { + tree lhs2 = gimple_call_lhs (stmt2), lhsc = lhs2; + tree arg = gimple_call_arg (stmt2, 1); + gimple *use_stmt, *cast_stmt = NULL; + use_operand_p use_p; + tree ndecl = builtin_decl_explicit (other_atomic); + + if (ndecl == NULL_TREE || !single_imm_use (lhs2, &use_p, &use_stmt)) + break; + + if (gimple_assign_cast_p (use_stmt)) + { + cast_stmt = use_stmt; + lhsc = gimple_assign_lhs (cast_stmt); + if (lhsc == NULL_TREE + || !INTEGRAL_TYPE_P (TREE_TYPE (lhsc)) + || (TYPE_PRECISION (TREE_TYPE (lhsc)) + != TYPE_PRECISION (TREE_TYPE (lhs2))) + || !single_imm_use (lhsc, &use_p, &use_stmt)) + { + use_stmt = cast_stmt; + cast_stmt = NULL; + lhsc = lhs2; + } + } + + bool ok = false; + tree oarg = NULL_TREE; + enum tree_code ccode = ERROR_MARK; + tree crhs1 = NULL_TREE, crhs2 = NULL_TREE; + if (is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == atomic_op) + { + if (gimple_assign_rhs1 (use_stmt) == lhsc) + oarg = gimple_assign_rhs2 (use_stmt); + else if (atomic_op != MINUS_EXPR) + oarg = gimple_assign_rhs1 (use_stmt); + } + else if (atomic_op == MINUS_EXPR + && is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR + && TREE_CODE (arg) == INTEGER_CST + && (TREE_CODE (gimple_assign_rhs2 (use_stmt)) + == INTEGER_CST)) + { + tree a = fold_convert (TREE_TYPE (lhs2), arg); + tree o = fold_convert (TREE_TYPE (lhs2), + gimple_assign_rhs2 (use_stmt)); + if (wi::to_wide (a) == wi::neg (wi::to_wide (o))) + ok = true; + } + else if (atomic_op == BIT_AND_EXPR || atomic_op == BIT_IOR_EXPR) + ; + else if (gimple_code (use_stmt) == GIMPLE_COND) + { + ccode = 
gimple_cond_code (use_stmt); + crhs1 = gimple_cond_lhs (use_stmt); + crhs2 = gimple_cond_rhs (use_stmt); + } + else if (is_gimple_assign (use_stmt)) + { + if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) + { + ccode = gimple_assign_rhs_code (use_stmt); + crhs1 = gimple_assign_rhs1 (use_stmt); + crhs2 = gimple_assign_rhs2 (use_stmt); + } + else if (gimple_assign_rhs_code (use_stmt) == COND_EXPR) + { + tree cond = gimple_assign_rhs1 (use_stmt); + if (COMPARISON_CLASS_P (cond)) + { + ccode = TREE_CODE (cond); + crhs1 = TREE_OPERAND (cond, 0); + crhs2 = TREE_OPERAND (cond, 1); + } + } + } + if (ccode == EQ_EXPR || ccode == NE_EXPR) + { + /* Deal with x - y == 0 or x ^ y == 0 + being optimized into x == y and x + cst == 0 + into x == -cst. */ + tree o = NULL_TREE; + if (crhs1 == lhsc) + o = crhs2; + else if (crhs2 == lhsc) + o = crhs1; + if (o && atomic_op != PLUS_EXPR) + oarg = o; + else if (o + && TREE_CODE (o) == INTEGER_CST + && TREE_CODE (arg) == INTEGER_CST) + { + tree a = fold_convert (TREE_TYPE (lhs2), arg); + o = fold_convert (TREE_TYPE (lhs2), o); + if (wi::to_wide (a) == wi::neg (wi::to_wide (o))) + ok = true; + } + } + if (oarg && !ok) + { + if (operand_equal_p (arg, oarg, 0)) + ok = true; + else if (TREE_CODE (arg) == SSA_NAME + && TREE_CODE (oarg) == SSA_NAME) + { + tree oarg2 = oarg; + if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (oarg))) + { + gimple *g = SSA_NAME_DEF_STMT (oarg); + oarg2 = gimple_assign_rhs1 (g); + if (TREE_CODE (oarg2) != SSA_NAME + || !INTEGRAL_TYPE_P (TREE_TYPE (oarg2)) + || (TYPE_PRECISION (TREE_TYPE (oarg2)) + != TYPE_PRECISION (TREE_TYPE (oarg)))) + oarg2 = oarg; + } + if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (arg))) + { + gimple *g = SSA_NAME_DEF_STMT (arg); + tree rhs1 = gimple_assign_rhs1 (g); + /* Handle e.g. + x.0_1 = (long unsigned int) x_4(D); + _2 = __atomic_fetch_add_8 (&vlong, x.0_1, 0); + _3 = (long int) _2; + _7 = x_4(D) + _3; */ + if (rhs1 == oarg || rhs1 == oarg2) + ok = true; + /* Handle e.g. + x.18_1 = (short unsigned int) x_5(D); + _2 = (int) x.18_1; + _3 = __atomic_fetch_xor_2 (&vshort, _2, 0); + _4 = (short int) _3; + _8 = x_5(D) ^ _4; + This happens only for char/short. */ + else if (TREE_CODE (rhs1) == SSA_NAME + && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)) + && (TYPE_PRECISION (TREE_TYPE (rhs1)) + == TYPE_PRECISION (TREE_TYPE (lhs2)))) + { + g = SSA_NAME_DEF_STMT (rhs1); + if (gimple_assign_cast_p (g) + && (gimple_assign_rhs1 (g) == oarg + || gimple_assign_rhs1 (g) == oarg2)) + ok = true; + } + } + if (!ok && arg == oarg2) + /* Handle e.g. + _1 = __sync_fetch_and_add_4 (&v, x_5(D)); + _2 = (int) _1; + x.0_3 = (int) x_5(D); + _7 = _2 + x.0_3; */ + ok = true; + } + } + + if (ok) + { + tree new_lhs = make_ssa_name (TREE_TYPE (lhs2)); + gimple_call_set_lhs (stmt2, new_lhs); + gimple_call_set_fndecl (stmt2, ndecl); + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); + if (ccode == ERROR_MARK) + gimple_assign_set_rhs_with_ops (&gsi, cast_stmt + ? 
NOP_EXPR : SSA_NAME, + new_lhs); + else + { + crhs1 = new_lhs; + crhs2 = build_zero_cst (TREE_TYPE (lhs2)); + if (gimple_code (use_stmt) == GIMPLE_COND) + { + gcond *cond_stmt = as_a (use_stmt); + gimple_cond_set_lhs (cond_stmt, crhs1); + gimple_cond_set_rhs (cond_stmt, crhs2); + } + else if (gimple_assign_rhs_class (use_stmt) + == GIMPLE_BINARY_RHS) + { + gimple_assign_set_rhs1 (use_stmt, crhs1); + gimple_assign_set_rhs2 (use_stmt, crhs2); + } + else + { + gcc_checking_assert (gimple_assign_rhs_code (use_stmt) + == COND_EXPR); + tree cond = build2 (ccode, boolean_type_node, + crhs1, crhs2); + gimple_assign_set_rhs1 (use_stmt, cond); + } + } + update_stmt (use_stmt); + if (atomic_op != BIT_AND_EXPR + && atomic_op != BIT_IOR_EXPR + && !stmt_ends_bb_p (stmt2)) + { + /* For the benefit of debug stmts, emit stmt(s) to set + lhs2 to the value it had from the new builtin. + E.g. if it was previously: + lhs2 = __atomic_fetch_add_8 (ptr, arg, 0); + emit: + new_lhs = __atomic_add_fetch_8 (ptr, arg, 0); + lhs2 = new_lhs - arg; + We also keep cast_stmt if any in the IL for + the same reasons. + These stmts will be DCEd later and proper debug info + will be emitted. + This is only possible for reversible operations + (+/-/^) and without -fnon-call-exceptions. */ + gsi = gsi_for_stmt (stmt2); + tree type = TREE_TYPE (lhs2); + if (TREE_CODE (arg) == INTEGER_CST) + arg = fold_convert (type, arg); + else if (!useless_type_conversion_p (type, TREE_TYPE (arg))) + { + tree narg = make_ssa_name (type); + gimple *g = gimple_build_assign (narg, NOP_EXPR, arg); + gsi_insert_after (&gsi, g, GSI_NEW_STMT); + arg = narg; + } + enum tree_code rcode; + switch (atomic_op) + { + case PLUS_EXPR: rcode = MINUS_EXPR; break; + case MINUS_EXPR: rcode = PLUS_EXPR; break; + case BIT_XOR_EXPR: rcode = atomic_op; break; + default: gcc_unreachable (); + } + gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg); + gsi_insert_after (&gsi, g, GSI_NEW_STMT); + update_stmt (stmt2); + } + else + { + /* For e.g. + lhs2 = __atomic_fetch_or_8 (ptr, arg, 0); + after we change it to + new_lhs = __atomic_or_fetch_8 (ptr, arg, 0); + there is no way to find out the lhs2 value (i.e. + what the atomic memory contained before the operation), + values of some bits are lost. We have checked earlier + that we don't have any non-debug users except for what + we are already changing, so we need to reset the + debug stmts and remove the cast_stmt if any. 
*/ + imm_use_iterator iter; + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2) + if (use_stmt != cast_stmt) + { + gcc_assert (is_gimple_debug (use_stmt)); + gimple_debug_bind_reset_value (use_stmt); + update_stmt (use_stmt); + } + if (cast_stmt) + { + gsi = gsi_for_stmt (cast_stmt); + gsi_remove (&gsi, true); + } + update_stmt (stmt2); + release_ssa_name (lhs2); + } + } + } + break; + default: break; } diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c index 0828ff97569..46dacbea78a 100644 --- a/gcc/tree-vect-generic.c +++ b/gcc/tree-vect-generic.c @@ -281,16 +281,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f, if (!ret_type) ret_type = type; vec_alloc (v, (nunits + delta - 1) / delta); + bool constant_p = true; for (i = 0; i < nunits; i += delta, index = int_const_binop (PLUS_EXPR, index, part_width)) { tree result = f (gsi, inner_type, a, b, index, part_width, code, ret_type); + if (!CONSTANT_CLASS_P (result)) + constant_p = false; constructor_elt ce = {NULL_TREE, result}; v->quick_push (ce); } - return build_constructor (ret_type, v); + if (constant_p) + return build_vector_from_ctor (ret_type, v); + else + return build_constructor (ret_type, v); } /* Expand a vector operation to scalars with the freedom to use @@ -1059,6 +1065,7 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names) int nunits = nunits_for_known_piecewise_op (type); vec_alloc (v, nunits); + bool constant_p = true; for (int i = 0; i < nunits; i++) { tree aa, result; @@ -1083,6 +1090,8 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names) else aa = tree_vec_extract (gsi, cond_type, a, width, index); result = gimplify_build3 (gsi, COND_EXPR, inner_type, aa, bb, cc); + if (!CONSTANT_CLASS_P (result)) + constant_p = false; constructor_elt ce = {NULL_TREE, result}; v->quick_push (ce); index = int_const_binop (PLUS_EXPR, index, width); @@ -1092,7 +1101,10 @@ expand_vector_condition (gimple_stmt_iterator *gsi, bitmap dce_ssa_names) comp_index = int_const_binop (PLUS_EXPR, comp_index, comp_width); } - constr = build_constructor (type, v); + if (constant_p) + constr = build_vector_from_ctor (type, v); + else + constr = build_constructor (type, v); gimple_assign_set_rhs_from_tree (gsi, constr); update_stmt (gsi_stmt (*gsi)); @@ -1532,6 +1544,7 @@ lower_vec_perm (gimple_stmt_iterator *gsi) "vector shuffling operation will be expanded piecewise"); vec_alloc (v, elements); + bool constant_p = true; for (i = 0; i < elements; i++) { si = size_int (i); @@ -1593,10 +1606,15 @@ lower_vec_perm (gimple_stmt_iterator *gsi) t = v0_val; } + if (!CONSTANT_CLASS_P (t)) + constant_p = false; CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, t); } - constr = build_constructor (vect_type, v); + if (constant_p) + constr = build_vector_from_ctor (vect_type, v); + else + constr = build_constructor (vect_type, v); gimple_assign_set_rhs_from_tree (gsi, constr); update_stmt (gsi_stmt (*gsi)); } @@ -1968,6 +1986,7 @@ expand_vector_conversion (gimple_stmt_iterator *gsi) } vec_alloc (v, (nunits + delta - 1) / delta * 2); + bool constant_p = true; for (i = 0; i < nunits; i += delta, index = int_const_binop (PLUS_EXPR, index, part_width)) @@ -1978,12 +1997,19 @@ expand_vector_conversion (gimple_stmt_iterator *gsi) index); tree result = gimplify_build1 (gsi, code1, cretd_type, a); constructor_elt ce = { NULL_TREE, result }; + if (!CONSTANT_CLASS_P (ce.value)) + constant_p = false; v->quick_push (ce); ce.value = gimplify_build1 (gsi, code2, cretd_type, a); + if (!CONSTANT_CLASS_P (ce.value)) + 
constant_p = false; v->quick_push (ce); } - new_rhs = build_constructor (ret_type, v); + if (constant_p) + new_rhs = build_vector_from_ctor (ret_type, v); + else + new_rhs = build_constructor (ret_type, v); g = gimple_build_assign (lhs, new_rhs); gsi_replace (gsi, g, false); return; diff --git a/gcc/varasm.c b/gcc/varasm.c index a7ef9b8d9fe..153e6c68772 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -7451,7 +7451,8 @@ default_binds_local_p_3 (const_tree exp, bool shlib, bool weak_dominate, FIXME: We can resolve the weakref case more curefuly by looking at the weakref alias. */ if (lookup_attribute ("weakref", DECL_ATTRIBUTES (exp)) - || (TREE_CODE (exp) == FUNCTION_DECL + || (!targetm.ifunc_ref_local_ok () + && TREE_CODE (exp) == FUNCTION_DECL && cgraph_node::get (exp) && cgraph_node::get (exp)->ifunc_resolver)) return false; diff --git a/libffi/configure.host b/libffi/configure.host index 786b32c5bb0..7248acb7458 100644 --- a/libffi/configure.host +++ b/libffi/configure.host @@ -95,20 +95,13 @@ case "${host}" in i?86-*-* | x86_64-*-* | amd64-*) TARGETDIR=x86 if test $ac_cv_sizeof_size_t = 4; then - case "$host" in - *-gnux32) - TARGET=X86_64 - ;; - *) - echo 'int foo (void) { return __x86_64__; }' > conftest.c - if $CC $CFLAGS -Werror -S conftest.c -o conftest.s > /dev/null 2>&1; then - TARGET=X86_64; - else - TARGET=X86; - fi - rm -f conftest.* - ;; - esac + echo 'int foo (void) { return __x86_64__; }' > conftest.c + if $CC $CFLAGS -Werror -S conftest.c -o conftest.s > /dev/null 2>&1; then + TARGET=X86_64; + else + TARGET=X86; + fi + rm -f conftest.* else TARGET=X86_64; fi
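
Note on the pr98737-6.c / pr98737-7.c tests above: every function has the same shape, the result of an and/or/xor "*-fetch" builtin is compared against zero, and the scan-assembler directives require a plain lock-prefixed or/xor instruction rather than a cmpxchg or xadd loop. A minimal standalone sketch of that source pattern (not part of the patch; the variable and function names here are illustrative only):

long v;

int
or_fetch_is_zero (long x)
{
  /* With the new optimize_atomic_op_fetch_cmp_0 transformation, the
     == 0 test can be taken from the flags set by the lock-prefixed
     instruction itself.  */
  return __atomic_or_fetch (&v, x, __ATOMIC_RELEASE) == 0;
}

int
and_fetch_is_negative (long x)
{
  /* Signed <, <=, >, >= comparisons against zero are handled too,
     provided the operand type is a signed integer and not _Bool.  */
  return __atomic_and_fetch (&v, x, __ATOMIC_RELEASE) < 0;
}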
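
Similarly, the simplify_builtin_call change in tree-ssa-forwprop.c canonicalizes a fetch-op whose only use recombines the result with the same operand into the corresponding op-fetch builtin, and the reverse for the invertible operations. A hedged, self-contained example of the source patterns it matches (names are made up for illustration):

long counter;

long
fetch_then_add (long x)
{
  /* __atomic_fetch_add (p, x, m) + x is rewritten to
     __atomic_add_fetch (p, x, m).  */
  return __atomic_fetch_add (&counter, x, __ATOMIC_SEQ_CST) + x;
}

long
add_then_sub (long x)
{
  /* The reverse direction: __atomic_add_fetch (p, x, m) - x becomes
     __atomic_fetch_add (p, x, m).  */
  return __atomic_add_fetch (&counter, x, __ATOMIC_SEQ_CST) - x;
}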