Intel® OPA Performance

Determining Performance

In high performance computing (HPC), message passing interface (MPI) benchmarks are used to demonstrate the performance capability of the cluster network. While application performance is the most important result, benchmarking generally starts with standard micro-benchmarks that determine best-case MPI latency, bandwidth, and message rate. An HPC cluster is only as fast as its individual servers can communicate with each other. Intel® Omni-Path Architecture (Intel® OPA) has been designed to meet the requirements of clusters from small to large scale. This includes ground-breaking quality of service (QoS) features meant to keep latency low, bandwidth high, and message rate high, even at scale.¹

Latency

This figure compares 8-byte message latency for Intel® OPA relative to Enhanced Data Rate* (EDR) InfiniBand* (IB), as measured with the osu_latency benchmark from the Ohio State University (OSU) Micro-Benchmarks (OMB) for both Open MPI and Intel® MPI. Intel® OPA latency has been measured up to 11% lower than EDR IB*. The latency includes one switch hop for both Intel® OPA and EDR IB*.
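As a rough illustration of how a ping-pong style latency figure and an "X% lower" comparison are typically derived, here is a minimal sketch. It assumes the usual convention that each timed iteration is a full round trip, so one-way latency is half the average round-trip time. The function names and the timings are hypothetical and are not measurements from this page.

```python
def one_way_latency_us(elapsed_s: float, iterations: int) -> float:
    """One-way latency in microseconds from a timed ping-pong loop.

    Each iteration is assumed to be a full round trip (a send plus the
    matching reply), so the elapsed time is divided by 2 * iterations.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return elapsed_s * 1e6 / (2.0 * iterations)


def percent_lower(candidate: float, baseline: float) -> float:
    """How much lower `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical timings for two fabrics, for illustration only.
    fabric_a = one_way_latency_us(elapsed_s=0.0180, iterations=10_000)
    fabric_b = one_way_latency_us(elapsed_s=0.0200, iterations=10_000)
    print(f"A: {fabric_a:.2f} us, B: {fabric_b:.2f} us, "
          f"A is {percent_lower(fabric_a, fabric_b):.1f}% lower than B")
```

Note that "percent lower" is computed against the baseline fabric, so the same absolute gap gives a different percentage if the roles are swapped.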

Bandwidth

This figure compares 1 MB message bandwidth for Intel® OPA relative to EDR, as measured with the OSU OMB osu_bw benchmark for both Open MPI and Intel® MPI. Both Intel® OPA and EDR are capable of delivering nearly the full wire rate of 100 Gbps.
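To put the 100 Gbps wire rate in the units a bandwidth benchmark reports, the arithmetic can be sketched as follows. The sketch assumes "1 MB" means 2**20 bytes and ignores protocol overhead. The measured value in the example is hypothetical, not a result from this page.

```python
WIRE_RATE_GBPS = 100.0        # link rate quoted in the text
MESSAGE_BYTES = 1024 * 1024   # assumption: "1 MB" means 2**20 bytes


def wire_rate_bytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to bytes per second."""
    return gbps * 1e9 / 8.0


def ideal_transfer_time_us(nbytes: int, gbps: float) -> float:
    """Time to move `nbytes` at full wire rate, in microseconds."""
    return nbytes / wire_rate_bytes_per_s(gbps) * 1e6


def utilization(measured_bytes_per_s: float, gbps: float) -> float:
    """Fraction of the wire rate achieved by a measured bandwidth."""
    return measured_bytes_per_s / wire_rate_bytes_per_s(gbps)


if __name__ == "__main__":
    peak = wire_rate_bytes_per_s(WIRE_RATE_GBPS)
    print(f"peak: {peak / 1e9:.1f} GB/s")
    print(f"ideal 1 MB transfer: "
          f"{ideal_transfer_time_us(MESSAGE_BYTES, WIRE_RATE_GBPS):.2f} us")
    # Hypothetical measured bandwidth of 12.3 GB/s, for illustration only.
    print(f"utilization: {utilization(12.3e9, WIRE_RATE_GBPS):.1%}")
```

At 100 Gbps the ceiling is 12.5 GB/s, so "nearly full wire rate" means a reported bandwidth close to that figure.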

Message Rate

This figure compares 8-byte message rate for Intel® OPA relative to EDR, as measured with the OSU OMB osu_mbw_mr benchmark for both Open MPI and Intel® MPI. The measurement uses 32 MPI rank pairs. Intel® OPA message rate has been measured up to 64% higher than EDR. Message coalescing at the MPI level is disabled, so this is a true hardware message rate test with no coalescing in software.
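The configuration notes on this page state that the maximum rank pair communication time is used instead of the average. A minimal sketch of why that choice matters is shown below. The function name, pair timings, and message counts are hypothetical, and the formula is a simplified model of a multi-pair message rate test rather than the benchmark's actual source code.

```python
def aggregate_message_rate(pair_times_s, messages_per_pair, use_max=True):
    """Aggregate messages per second across all rank pairs.

    With use_max=True the slowest pair sets the elapsed time, which is the
    more conservative choice; with use_max=False the average time is used.
    """
    if not pair_times_s:
        raise ValueError("need at least one pair timing")
    if use_max:
        elapsed = max(pair_times_s)
    else:
        elapsed = sum(pair_times_s) / len(pair_times_s)
    return len(pair_times_s) * messages_per_pair / elapsed


if __name__ == "__main__":
    # Hypothetical: 32 pairs, 31 finish in 10 ms and one straggler in 12.5 ms.
    times = [0.010] * 31 + [0.0125]
    msgs = 64 * 1000  # hypothetical window size times iterations per pair
    by_max = aggregate_message_rate(times, msgs, use_max=True)
    by_avg = aggregate_message_rate(times, msgs, use_max=False)
    print(f"max-time rate: {by_max / 1e6:.1f} M msg/s")
    print(f"avg-time rate: {by_avg / 1e6:.1f} M msg/s")
```

When one pair lags, the average-time figure overstates what the fabric sustained across all pairs, while the max-time figure reflects the time until every pair has finished.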

Natural Order Ring (NOR) Latency

Natural order ring (NOR) latency is measured with the b_eff benchmark in the HPC Challenge* benchmark suite. These measurements demonstrate the ability of the fabric to sustain low latency as an HPC application scales across the cluster. Intel® OPA has lower latency than EDR IB* at 16 fully subscribed nodes using 32 MPI ranks per node.

Random Order Ring (ROR) Latency

Random order ring (ROR) latency is measured with the b_eff benchmark in the HPC Challenge* benchmark suite. These measurements demonstrate the ability of the fabric to sustain low latency as an HPC application scales across the cluster. Intel® OPA has lower latency than EDR IB* at 16 fully subscribed nodes using 32 MPI ranks per node.
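The difference between the natural and random order ring patterns can be sketched in a few lines. This is an illustration of the two orderings only, not the b_eff implementation. It assumes block placement of ranks on nodes (rank // ranks_per_node gives the node), and the helper names and seed are hypothetical.

```python
import random

NODES = 16
RANKS_PER_NODE = 32
TOTAL_RANKS = NODES * RANKS_PER_NODE  # 512 ranks in the quoted setup


def natural_ring(n):
    """Ranks in their natural order: 0, 1, ..., n-1."""
    return list(range(n))


def random_ring(n, seed):
    """A random permutation of the ranks, reproducible via `seed`."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def ring_neighbors(order):
    """Map each rank to its (left, right) neighbours in the ring."""
    n = len(order)
    return {order[i]: (order[i - 1], order[(i + 1) % n]) for i in range(n)}


def off_node_links(order, ranks_per_node):
    """Count ring links whose two endpoints sit on different nodes.

    Assumes block placement: node index = rank // ranks_per_node.
    """
    n = len(order)
    return sum(
        1
        for i in range(n)
        if order[i] // ranks_per_node != order[(i + 1) % n] // ranks_per_node
    )


if __name__ == "__main__":
    nat = natural_ring(TOTAL_RANKS)
    rnd = random_ring(TOTAL_RANKS, seed=1)
    print("natural ring off-node links:", off_node_links(nat, RANKS_PER_NODE))
    print("random ring off-node links: ", off_node_links(rnd, RANKS_PER_NODE))
```

Under the block placement assumption, a natural ring keeps most neighbour pairs on the same node, while a random ring sends most links across the fabric, which is why the random order pattern stresses the network more heavily.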

Configuration Details (Internal Testing – Single Switch)

   
Test platform: Intel® Xeon® Processor E5-2697A v4 dual-socket servers (16 cores, 40 MB cache, 2.6 GHz, 9.6 GT/s Intel QuickPath Interconnect, 145 W TDP) with 64 GB DDR4 memory @ 2133 MHz. Intel® Turbo Boost Technology and Intel® Hyper-Threading Technology enabled.
Benchmarks: Ohio State Micro Benchmarks* v. 5.0; HPCC* 1.4.3
Operating system: Red Hat Enterprise Linux* 7.2
Intel® MPI: Intel® MPI 5.1.3
Open MPI*: Open MPI 1.10.0
Intel® OPA hardware and settings: shm:tmi fabrics, I_MPI_TMI_DRECV=1, Intel Corporation Device 24f0 – Series 100 HFI ASIC (B0 silicon). OPA Switch: Series 100 Edge Switch – 48 port (B0 silicon). IOU Non-posted Prefetch disabled in BIOS. Snoop hold-off timer = 9
EDR hardware and settings: shm:dapl fabric. -genv I_MPI_DAPL_EAGER_MESSAGE_AGGREGATION off (Intel® MPI only). Mellanox* EDR ConnectX-4 Single Port Rev 3 MCX455A HCA. Mellanox SB7700 - 36 Port EDR InfiniBand switch. MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0). Best of default, MXM_TLS=self,rc, and -mca pml yalla tunings
Test detail:
a. osu_latency, 8 B message
b. osu_bw, 1 MB message
c. osu_mbw_mr, 8 B message (uni-directional), 32 MPI rank pairs. Maximum rank pair communication time used instead of average time, introduced into Ohio State Micro Benchmarks as of v3.9 (2/28/13). EDR using shm:ofa fabric, since this returned better message rates than shm:dapl fabric with I_MPI_DAPL_EAGER_MESSAGE_COALESCING disabled
d. HPCC 1.4.3

Application Performance

   
Application Performance - Intel® MPI

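Product and Performance Information

1

Benchmark results were obtained before the application of recent software patches and firmware updates intended to address the exploits known as "Spectre" and "Meltdown". Once these updates are applied, these results may no longer apply to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. To fully evaluate a product you are considering purchasing, you should consult other information and performance tests, including the performance of that product when combined with other products. For more information, see http://www.intel.co.kr/benchmarks.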