Regular Streaming SIMD Extensions intrinsics work on four 32-bit single-precision values. On Itanium(TM)-based systems, basic operations like add or compare require two SIMD instructions. Both can be executed in the same cycle, so the throughput is one basic Streaming SIMD Extensions operation per cycle, or four 32-bit single-precision operations per cycle.
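
As a minimal sketch of what "four 32-bit single-precision values" means in practice, the following C function adds four pairs of floats with one packed intrinsic. The function name `add4` is hypothetical and chosen for illustration; unaligned loads and stores are used so that no 16-byte alignment of the arrays is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: adds four pairs of floats with one packed
   SSE add. a, b, and out must each point to at least 4 floats. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);             /* load a[0..3] */
    __m128 vb = _mm_loadu_ps(b);             /* load b[0..3] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* out[i] = a[i] + b[i] */
}
```

Calling `add4` on `{1, 2, 3, 4}` and `{10, 20, 30, 40}` fills `out` with the four element-wise sums.
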
A = Expected to give a significant performance gain over equivalent non-intrinsic-based code.
B = Non-intrinsic-based source code would be better; the intrinsic's implementation may map directly to native instructions, but it offers no significant performance gain.
C = Requires a contorted implementation for a particular microarchitecture. Will result in very poor performance if used.
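
To make the B rating concrete, here is a hedged sketch comparing a scalar `_ss` intrinsic with the plain C expression it corresponds to. Both function names are hypothetical. The sketch only shows that the two forms compute the same value; it makes no performance measurement of its own.

```c
#include <xmmintrin.h>

/* Plain C: the form the B rating suggests for scalar operations. */
float add_scalar_plain(float x, float y)
{
    return x + y;
}

/* The same sum via the scalar intrinsic _mm_add_ss, which adds only
   the lowest element of its two operands. */
float add_scalar_intrinsic(float x, float y)
{
    float out;
    __m128 r = _mm_add_ss(_mm_set_ss(x), _mm_set_ss(y));
    _mm_store_ss(&out, r);  /* write the lowest element back to memory */
    return out;
}
```

Packed `_ps` intrinsics, which the table mostly rates A, process four values per operation; the scalar form above handles one.
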
Intrinsic Name | Alternate Name | Across All IA | MMX(TM) Technology | Streaming SIMD Extensions | Streaming SIMD Extensions 2 | Itanium(TM) Architecture
---|---|---|---|---|---|---
_mm_add_ss | | N/A | N/A | B | B | B
_mm_add_ps | | N/A | N/A | A | A | A
_mm_sub_ss | | N/A | N/A | B | B | B
_mm_sub_ps | | N/A | N/A | A | A | A
_mm_mul_ss | | N/A | N/A | B | B | B
_mm_mul_ps | | N/A | N/A | A | A | A
_mm_div_ss | | N/A | N/A | B | B | B
_mm_div_ps | | N/A | N/A | A | A | A
_mm_sqrt_ss | | N/A | N/A | B | B | B
_mm_sqrt_ps | | N/A | N/A | A | A | A
_mm_rcp_ss | | N/A | N/A | B | B | B
_mm_rcp_ps | | N/A | N/A | A | A | A
_mm_rsqrt_ss | | N/A | N/A | B | B | B
_mm_rsqrt_ps | | N/A | N/A | A | A | A
_mm_min_ss | | N/A | N/A | B | B | B
_mm_min_ps | | N/A | N/A | A | A | A
_mm_max_ss | | N/A | N/A | B | B | B
_mm_max_ps | | N/A | N/A | A | A | A
_mm_and_ps | | N/A | N/A | A | A | A
_mm_andnot_ps | | N/A | N/A | A | A | A
_mm_or_ps | | N/A | N/A | A | A | A
_mm_xor_ps | | N/A | N/A | A | A | A
_mm_cmpeq_ss | | N/A | N/A | B | B | B
_mm_cmpeq_ps | | N/A | N/A | A | A | A
_mm_cmplt_ss | | N/A | N/A | B | B | B
_mm_cmplt_ps | | N/A | N/A | A | A | A
_mm_cmple_ss | | N/A | N/A | B | B | B
_mm_cmple_ps | | N/A | N/A | A | A | A
_mm_cmpgt_ss | | N/A | N/A | B | B | B
_mm_cmpgt_ps | | N/A | N/A | A | A | A
_mm_cmpge_ss | | N/A | N/A | B | B | B
_mm_cmpge_ps | | N/A | N/A | A | A | A
_mm_cmpneq_ss | | N/A | N/A | B | B | B
_mm_cmpneq_ps | | N/A | N/A | A | A | A
_mm_cmpnlt_ss | | N/A | N/A | B | B | B
_mm_cmpnlt_ps | | N/A | N/A | A | A | A
_mm_cmpnle_ss | | N/A | N/A | B | B | B
_mm_cmpnle_ps | | N/A | N/A | A | A | A
_mm_cmpngt_ss | | N/A | N/A | B | B | B
_mm_cmpngt_ps | | N/A | N/A | A | A | A
_mm_cmpnge_ss | | N/A | N/A | B | B | B
_mm_cmpnge_ps | | N/A | N/A | A | A | A
_mm_cmpord_ss | | N/A | N/A | B | B | B
_mm_cmpord_ps | | N/A | N/A | A | A | A
_mm_cmpunord_ss | | N/A | N/A | B | B | B
_mm_cmpunord_ps | | N/A | N/A | A | A | A
_mm_comieq_ss | | N/A | N/A | B | B | B
_mm_comilt_ss | | N/A | N/A | B | B | B
_mm_comile_ss | | N/A | N/A | B | B | B
_mm_comigt_ss | | N/A | N/A | B | B | B
_mm_comige_ss | | N/A | N/A | B | B | B
_mm_comineq_ss | | N/A | N/A | B | B | B
_mm_ucomieq_ss | | N/A | N/A | B | B | B
_mm_ucomilt_ss | | N/A | N/A | B | B | B
_mm_ucomile_ss | | N/A | N/A | B | B | B
_mm_ucomigt_ss | | N/A | N/A | B | B | B
_mm_ucomige_ss | | N/A | N/A | B | B | B
_mm_ucomineq_ss | | N/A | N/A | B | B | B
_mm_cvt_ss2si | _mm_cvtss_si32 | N/A | N/A | A | A | B |
_mm_cvt_ps2pi | _mm_cvtps_pi32 | N/A | N/A | A | A | A |
_mm_cvtt_ss2si | _mm_cvttss_si32 | N/A | N/A | A | A | B |
_mm_cvtt_ps2pi | _mm_cvttps_pi32 | N/A | N/A | A | A | A |
_mm_cvt_si2ss | _mm_cvtsi32_ss | N/A | N/A | A | A | B |
_mm_cvt_pi2ps | _mm_cvtpi32_ps | N/A | N/A | A | A | C |
_mm_cvtpi16_ps | | N/A | N/A | A | A | C
_mm_cvtpu16_ps | | N/A | N/A | A | A | C
_mm_cvtpi8_ps | | N/A | N/A | A | A | C
_mm_cvtpu8_ps | | N/A | N/A | A | A | C
_mm_cvtpi32x2_ps | | N/A | N/A | A | A | C
_mm_cvtps_pi16 | | N/A | N/A | A | A | C
_mm_cvtps_pi8 | | N/A | N/A | A | A | C
_mm_move_ss | | N/A | N/A | A | A | A
_mm_shuffle_ps | | N/A | N/A | A | A | A
_mm_unpackhi_ps | | N/A | N/A | A | A | A
_mm_unpacklo_ps | | N/A | N/A | A | A | A
_mm_movehl_ps | | N/A | N/A | A | A | A
_mm_movelh_ps | | N/A | N/A | A | A | A
_mm_movemask_ps | | N/A | N/A | A | A | C
_mm_getcsr | | N/A | N/A | A | A | A
_mm_setcsr | | N/A | N/A | A | A | A
_mm_loadh_pi | | N/A | N/A | A | A | A
_mm_loadl_pi | | N/A | N/A | A | A | A
_mm_load_ss | | N/A | N/A | A | A | B
_mm_load_ps1 | _mm_load1_ps | N/A | N/A | A | A | A |
_mm_load_ps | | N/A | N/A | A | A | A
_mm_loadu_ps | | N/A | N/A | A | A | A
_mm_loadr_ps | | N/A | N/A | A | A | A
_mm_storeh_pi | | N/A | N/A | A | A | A
_mm_storel_pi | | N/A | N/A | A | A | A
_mm_store_ss | | N/A | N/A | A | A | A
_mm_store_ps | | N/A | N/A | A | A | A
_mm_store_ps1 | _mm_store1_ps | N/A | N/A | A | A | A |
_mm_storeu_ps | | N/A | N/A | A | A | A
_mm_storer_ps | | N/A | N/A | A | A | A
_mm_set_ss | | N/A | N/A | A | A | A
_mm_set_ps1 | _mm_set1_ps | N/A | N/A | A | A | A |
_mm_set_ps | | N/A | N/A | A | A | A
_mm_setr_ps | | N/A | N/A | A | A | A
_mm_setzero_ps | | N/A | N/A | A | A | A
_mm_prefetch | | N/A | N/A | A | A | A
_mm_stream_pi | | N/A | N/A | A | A | A
_mm_stream_ps | | N/A | N/A | A | A | A
_mm_sfence | | N/A | N/A | A | A | A
_m_pextrw | _mm_extract_pi16 | N/A | N/A | A | A | A |
_m_pinsrw | _mm_insert_pi16 | N/A | N/A | A | A | A |
_m_pmaxsw | _mm_max_pi16 | N/A | N/A | A | A | A |
_m_pmaxub | _mm_max_pu8 | N/A | N/A | A | A | A |
_m_pminsw | _mm_min_pi16 | N/A | N/A | A | A | A |
_m_pminub | _mm_min_pu8 | N/A | N/A | A | A | A |
_m_pmovmskb | _mm_movemask_pi8 | N/A | N/A | A | A | C |
_m_pmulhuw | _mm_mulhi_pu16 | N/A | N/A | A | A | A |
_m_pshufw | _mm_shuffle_pi16 | N/A | N/A | A | A | A |
_m_maskmovq | _mm_maskmove_si64 | N/A | N/A | A | A | C |
_m_pavgb | _mm_avg_pu8 | N/A | N/A | A | A | A |
_m_pavgw | _mm_avg_pu16 | N/A | N/A | A | A | A |
_m_psadbw | _mm_sad_pu8 | N/A | N/A | A | A | A |
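
The Alternate Name column lists a second spelling for the same intrinsic. As a small sketch, assuming a compiler whose `xmmintrin.h` provides both spellings (GCC, Clang, and the Intel compiler do), the pairs below should return identical results. The wrapper function names are hypothetical.

```c
#include <xmmintrin.h>

/* Float-to-int conversion using the current MXCSR rounding mode,
   under both names listed in the table. */
int convert_first_name(float x)  { return _mm_cvt_ss2si(_mm_set_ss(x)); }
int convert_alt_name(float x)    { return _mm_cvtss_si32(_mm_set_ss(x)); }

/* Truncating (round toward zero) conversion, under both names. */
int truncate_first_name(float x) { return _mm_cvtt_ss2si(_mm_set_ss(x)); }
int truncate_alt_name(float x)   { return _mm_cvttss_si32(_mm_set_ss(x)); }
```

For an input of `2.7f`, both truncating variants return 2. The non-truncating variants agree with each other, but their result depends on the rounding mode currently set in the control register (see `_mm_getcsr` and `_mm_setcsr` in the table).
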