Artificial Intelligence with 2nd Gen Intel® Xeon® Scalable Processor

The 2nd Gen Intel® Xeon® Scalable processor provides scalable performance for the widest variety of datacenter workloads – including deep learning. The new 2nd Gen Intel® Xeon® Scalable processor platform offers built-in Return on Investment (ROI), potent performance and production-ready support for AI deployments.

In our smart and connected world, machines are increasingly learning to sense, reason, act, and adapt in the real world. Artificial Intelligence (AI) is the next big wave of computing, and Intel uniquely has the experience to fuel the AI computing era. AI will let us accelerate solutions to large-scale problems that would otherwise take months, years, or decades to resolve.

AI is expected to unleash new scientific discoveries, automate monotonous tasks, and extend our human senses and capabilities. Today, machine learning (ML) and deep learning (DL) are two of the approaches underlying AI, alongside reasoning-based systems.

Deep learning is the most rapidly emerging branch of machine learning and is in many cases supplanting classic ML. It relies on massive labeled data sets to iteratively “train” many-layered neural networks inspired by the human brain. Trained neural networks are then used to “infer” the meaning of new data, with increased speed and accuracy for tasks such as image search, speech recognition, and natural language processing.

The 2nd Generation Intel® Xeon® Scalable processors take AI performance to the next level with Intel® Deep Learning Boost (Intel® DL Boost), a new set of embedded processor technologies designed to accelerate deep learning use cases such as image recognition, object detection, speech recognition, and language translation. Intel® DL Boost extends Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with a new Vector Neural Network Instruction (VNNI) that significantly increases deep learning inference performance over previous generations. With 2nd Gen Intel® Xeon® Platinum 8280 processors and Intel® DL Boost, we project that image recognition with Intel optimized Caffe ResNet-50 can perform up to 14x faster [1] than on prior generation Intel® Xeon® Scalable processors (at launch, July 2017).
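To make the VNNI idea concrete, here is a minimal scalar sketch in Python of what one 32-bit lane of an 8-bit multiply-accumulate instruction computes. It is modeled on the AVX-512 VNNI instruction commonly known as VPDPBUSD, a mnemonic this article does not name. Four unsigned 8-bit values are multiplied by four signed 8-bit values and the products are added to a 32-bit accumulator. The function names `vpdpbusd_lane` and `int8_dot` are hypothetical and exist only for illustration. Real hardware processes 16 such lanes per 512-bit register in one instruction.

```python
def vpdpbusd_lane(acc, a_bytes, b_bytes):
    """Emulate one 32-bit lane of an 8-bit VNNI multiply-accumulate.

    acc     -- signed 32-bit accumulator
    a_bytes -- four unsigned 8-bit values (0..255), e.g. activations
    b_bytes -- four signed 8-bit values (-128..127), e.g. weights
    """
    if len(a_bytes) != 4 or len(b_bytes) != 4:
        raise ValueError("each lane holds exactly four byte pairs")
    if not all(0 <= a <= 255 for a in a_bytes):
        raise ValueError("a_bytes must be unsigned 8-bit values")
    if not all(-128 <= b <= 127 for b in b_bytes):
        raise ValueError("b_bytes must be signed 8-bit values")

    # Multiply the four byte pairs and add them to the accumulator.
    total = acc + sum(a * b for a, b in zip(a_bytes, b_bytes))

    # Wrap to signed 32-bit, as the non-saturating form of the instruction does.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def int8_dot(activations, weights):
    """Dot product of a u8 vector and an s8 vector, four elements per step."""
    if len(activations) != len(weights) or len(activations) % 4 != 0:
        raise ValueError("vectors must have equal length, a multiple of four")
    acc = 0
    for i in range(0, len(activations), 4):
        acc = vpdpbusd_lane(acc, activations[i:i + 4], weights[i:i + 4])
    return acc


if __name__ == "__main__":
    print(int8_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, -1, 1, -1, 2, 2, 2, 2]))
```

Convolution and fully connected layers reduce to many such dot products, which is why fusing the multiply and accumulate steps into a single instruction helps INT8 inference.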

Performance data as of April 2, 2019. [1] [2] [3] [4]

ResNet-50 Performance with Intel® Optimization for Caffe*

Designed for high performance computing, advanced artificial intelligence and analytics, and high density infrastructures, Intel® Xeon® Platinum 9200 processors deliver breakthrough levels of performance, particularly when Intel® Deep Learning Boost (Intel® DL Boost) is combined with Intel optimized Caffe. Here we show the throughput on an image classification topology, ResNet-50, on 2nd Generation Intel® Xeon® Scalable processors.

Configuration Details

Max Inference throughput at <7ms

Intel® Xeon® Platinum 8280 processor: Tested by Intel as of 3/04/2019. 2S Intel® Xeon® Platinum 8280(28 cores per socket) processor, HT ON, turbo ON, Total Memory 384 GB (12 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0348.011820191451, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=10, synthetic Data:3x224x224, 2 instance/2 socket, Datatype: INT8; latency: 6.16 ms.

Intel® Xeon® Platinum 9242 processor: Tested by Intel as of 3/04/2019 2S Intel® Xeon® Platinum 9242(48 cores per socket) processor, HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0403.022020190327, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS= 8, synthetic Data:3x224x224, 4 instance/2 socket, Datatype: INT8; latency: 6.90 ms.

Intel® Xeon® Platinum 9282 processor: Tested by Intel as of 3/04/2019. DL Inference: Platform: Dragon rock 2S Intel® Xeon® Platinum 9282(56 cores per socket) processor, HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=12, synthetic Data:3x224x224, 4 instance/2 socket, Datatype: INT8; latency: 6.91 ms.
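The three latency-bounded configurations above list a batch size, an instance count, and a measured latency. As a rough illustration of how those three numbers relate to throughput, the sketch below assumes that all instances run concurrently and that each completes one batch per latency period. That is a simplifying assumption, not the article's stated measurement method, and the resulting figures are back-of-the-envelope estimates rather than results reported by Intel. The function name `estimated_throughput` is hypothetical.

```python
def estimated_throughput(batch_size, instances, latency_ms):
    """Estimate images/second, assuming each instance finishes one batch
    every `latency_ms` milliseconds and all instances run in parallel."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return instances * batch_size / (latency_ms / 1000.0)


# (batch size, instances, latency in ms) copied from the configurations above.
CONFIGS = {
    "Platinum 8280": (10, 2, 6.16),
    "Platinum 9242": (8, 4, 6.90),
    "Platinum 9282": (12, 4, 6.91),
}

if __name__ == "__main__":
    for name, (bs, inst, lat) in CONFIGS.items():
        rate = estimated_throughput(bs, inst, lat)
        print(f"{name}: roughly {rate:.0f} images/s (estimate only)")
```

The point of the exercise is the trade-off: a larger batch or more instances raises throughput only as long as the latency stays under the 7 ms bound.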

Max Inference throughput

Intel® Xeon® Platinum 8280 processor: Tested by Intel as of 3/04/2019. 2S Intel® Xeon® Platinum 8280(28 cores per socket) processor, HT ON, turbo ON, Total Memory 384 GB (12 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0348.011820191451, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=10, syntheticData:3x224x224, 14 instance/2 socket, Datatype: INT8.

Intel® Xeon® Platinum 9242 processor: Tested by Intel as of 3/04/2019 2S Intel® Xeon® Platinum 9242(48 cores per socket) processor, HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0403.022020190327, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=8, synthetic Data:3x224x224, 24 instance/2 socket, Datatype: INT8.

Intel® Xeon® Platinum 9282 processor: Tested by Intel as of 3/04/2019. DL Inference: Platform: Dragon rock 2S Intel® Xeon® Platinum 9282(56 cores per socket) processor, HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe Commit id: 362a3b3, ICC 2019.2.187 for build, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=8, synthetic Data:3x224x224, 28 instance/2 socket, Datatype: INT8.

Best known methods (BKMs) for running multi-stream configurations on Intel® Xeon® processors: https://www.intel.ai/wp-content/uploads/sites/69/TensorFlow_Best_Practices_Intel_Xeon_AI-HPC_v1.1_Q119.pdf
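A multi-stream configuration such as "14 instance/2 socket" runs several independent inference processes, each pinned to its own group of cores. The sketch below shows one way such a layout might be planned. It assumes cores are numbered contiguously per socket and that each socket is a single NUMA node whose number matches the socket index. Both assumptions are system-dependent and should be checked, for example with `numactl --hardware`. The helper names and the `./run_inference.sh` workload are hypothetical placeholders. The sketch only builds command strings and does not launch anything.

```python
def plan_instances(sockets, cores_per_socket, instances):
    """Split physical cores evenly across instances without crossing sockets."""
    if instances % sockets != 0:
        raise ValueError("instances must divide evenly across sockets")
    per_socket = instances // sockets
    if cores_per_socket % per_socket != 0:
        raise ValueError("cores per socket must divide evenly across instances")
    cores_each = cores_per_socket // per_socket

    plan = []
    for socket in range(sockets):
        for slot in range(per_socket):
            # Assumption: socket 0 owns cores 0..N-1, socket 1 owns N..2N-1, etc.
            first = socket * cores_per_socket + slot * cores_each
            plan.append({
                "socket": socket,
                "first_core": first,
                "last_core": first + cores_each - 1,
                "threads": cores_each,
            })
    return plan


def launch_command(entry, workload):
    """Build a numactl command that pins one instance to its cores and memory."""
    cores = f"{entry['first_core']}-{entry['last_core']}"
    return (
        f"OMP_NUM_THREADS={entry['threads']} "
        f"numactl --physcpubind={cores} --membind={entry['socket']} {workload}"
    )


if __name__ == "__main__":
    # 2 sockets x 28 cores with 14 instances, as in the Platinum 8280 entry.
    for entry in plan_instances(sockets=2, cores_per_socket=28, instances=14):
        print(launch_command(entry, "./run_inference.sh"))
```

With the instance counts listed above (14, 24, and 28 instances on 28-, 48-, and 56-core sockets), this even split works out to four cores per instance in every case.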

ResNet-50 Inference Throughput Performance

Inference generally happens instantaneously, at the edge or in the data center, such as when a new photo is uploaded for inspection. Inference output can feed a number of different usages, including a dashboard for visualization or a decision tree for automatic decision making. Here we show inference throughput on an image database using multiple popular deep learning frameworks (Caffe, TensorFlow, PyTorch and MXNet) with the ResNet-50 topology.

Configuration Details

3.0x performance boost with MxNet on ResNet-50: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet https://github.com/apache/incubator-mxnet/ -b master da5242b732de39ad47d8ecee582f261ba5935fa9, Compiler: gcc 4.8.5,MKL DNN version: v0.17, ResNet50: https://github.com/apache/incubator-MXNet/blob/master/python/MXNet/gluon/model_zoo/vision/resnet.py, BS=64, synthetic data, 2 instance/2 socket, 0.12% accuracy loss ,Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet https://github.com/apache/incubator-mxnet/ -b master da5242b732de39ad47d8ecee582f261ba5935fa9, Compiler: gcc 4.8.5,MKL DNN version: v0.17, ResNet50: https://github.com/apache/incubator-MXNet/blob/master/python/MXNet/gluon/model_zoo/vision/resnet.py, BS=64, synthetic data, 2 instance/2 socket, Datatype: FP32

3.7x performance boost with Pytorch ResNet-50: Tested by Intel as of 2/25/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x4000013), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, 3X INTEL SSDPE2KX040T7 SSD 3.7TB , Intel® Deep Learning Framework: Pytorch with ONNX/Caffe2 backend: https://github.com/pytorch/pytorch.git (commit: 4ac91b2d64eeea5ca21083831db5950dc08441d6)and Pull Request link: https://github.com/pytorch/pytorch/pull/17464 (submitted for upstreaming), gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, MKL DNN version: v0.17.3 (commit hash: 0c3cb94999919d33e4875177fdef662bd9413dd4), ResNet-50: https://github.com/intel/optimized-models/tree/master/pytorch, BS=512, synthetic data, 2 instance/2 socket, 0.6% accuracy loss; Datatype: INT8 vs Tested by Intel as of 2/25/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2666 MHz), BIOS: SE5C620.86B.00.01.0015.110720180833 (ucode: 0x200004d), CentOS 7.5, 3.10.0-693.el7.x86_64, Intel® SSD DC S4500 SERIES SSDSC2KB480G7 2.5’’ 6Gb/s SATA SSD 480G, Intel® Deep Learning Framework: https://github.com/pytorch/pytorch.git (commit:4ac91b2d64eeea5ca21083831db5950dc08441d6)and Pull Request link: https://github.com/pytorch/pytorch/pull/17464 (submitted for upstreaming), gcc (Red Hat 5.3.1-6) 5.3.1 20160406, MKL DNN version: v0.17.3 (commit hash: 0c3cb94999919d33e4875177fdef662bd9413dd4), ResNet-50: https://github.com/intel/optimized-models/tree/master/pytorch, BS=512, synthetic data, 2 instance/2 socket, Datatype: FP32

3.9x performance boost with TensorFlow ResNet-50:Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl (https://github.com/tensorflow/tensorflow.git commit: 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + Pull Request PR 25765, PR submitted for upstreaming) Compiler: gcc 6.3.0,MKL DNN version: v0.17, ResNet50: https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/resnet50, (commit: 87261e70a902513f934413f009364c4f2eed6642) BS=128, synthetic data, 2 instance/2 socket, 0.45% accuracy loss Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + PR25765, PR submitted for upstreaming) Compiler: gcc 6.3.0,MKL DNN version: v0.17, ResNet50: https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/resnet50 , (commit: 87261e70a902513f934413f009364c4f2eed6642) BS=128, synthetic data, 2 instance/2 socket, Datatype: FP32

4.0x performance boost with Intel® Optimization for Caffe* ResNet-50: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x4000013), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, 3X INTEL SSDPE2KX040T7 SSD 3.7TB , Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, syntheticData, 2 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 2/21/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2666 MHz), BIOS: SE5C620.86B.00.01.0015.110720180833 (ucode: 0x200004d), CentOS 7.5, 3.10.0-693.el7.x86_64, Intel® SSD DC S4500 SERIES SSDSC2KB480G7 2.5’’ 6Gb/s SATA SSD 480G, , Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/benchmark/resnet_50/deploy.prototxt, BS=64, synthetic Data, 2 instance/2 socket, Datatype: FP32
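Each comparison above pits an INT8 run against an FP32 baseline and, in most cases, reports a small accuracy loss. That loss comes from representing floating-point values with 8-bit integers. The sketch below illustrates the idea with a simple symmetric, per-tensor quantization scheme. This is a common textbook approach chosen here for illustration. The article does not describe the calibration method the frameworks actually used, and all function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


def max_abs_error(values):
    """Largest round-trip error introduced by quantizing `values`."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return max((abs(a - b) for a, b in zip(values, restored)), default=0.0)


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    print("quantized:", q)
    print("scale:", scale)
    print("max round-trip error:", max_abs_error(weights))
```

Under this scheme the rounding error per value is bounded by half the scale factor, so tensors with a narrow value range quantize more precisely than tensors with large outliers.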

Inference Throughput Performance

The 2nd Gen Intel® Xeon® Scalable processors are built specifically to run high-performance AI and IoT workloads on the same hardware as other existing workloads. Intel® Deep Learning Boost (Intel® DL Boost) can benefit many inference applications, ranging from recommendation systems to object detection and image recognition and classification. Here we show inference throughput for image classification, object detection and a recommendation system. Multiple frameworks are used, including TensorFlow*, Caffe2 and MXNet*, along with multiple topologies such as ResNet-101, Inception v3, RetinaNet*, SSD-VGG16 and Wide & Deep.

Configuration Details

4.0x performance boost with TensorFlow ResNet-101: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl (https://github.com/tensorflow/tensorflow.git commit: 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + Pull Request PR 25765, PR submitted for upstreaming), Compiler: gcc 6.3.0,MKL DNN version: v0.17, ResNet 101 : https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/resnet101 commit: 87261e70a902513f934413f009364c4f2eed6642 , BS=128, synthetic data, 2 instance/2 socket, 0.58% accuracy loss Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl (https://github.com/tensorflow/tensorflow.git commit: 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + Pull Request PR 25765, PR submitted for upstreaming) Compiler: gcc 6.3.0,MKL DNN version: v0.17, ResNet 101 : https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/resnet101 commit: 87261e70a902513f934413f009364c4f2eed6642 , BS=128, synthetic data, 2 instance/2 socket, Datatype: FP32

3.8x performance boost with MxNet ResNet 101: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet https://github.com/apache/incubator-mxnet.git commit: da5242b732de39ad47d8ecee582f261ba5935fa9 , Compiler: gcc 4.8.5,MKL DNN version: v0.17, ResNet 101: https://github.com/apache/incubator-MXNet/blob/master/python/MXNet/gluon/model_zoo/vision/resnet.py ,BS= 64, Synthetic data, 2 instance/2 socket, 0.56% accuracy loss, Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet: https://github.com/apache/incubator-mxnet.git commit: da5242b732de39ad47d8ecee582f261ba5935fa9, Compiler: gcc 4.8.5,MKL DNN version: v0.17, ResNet 101: https://github.com/apache/incubator-MXNet/blob/master/python/MXNet/gluon/model_zoo/vision/resnet.py ,BS= 64, synthetic Data, 2 instance/2 socket, Datatype:FP32

3.1x performance boost with TensorFlow Inception v3: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl (https://github.com/tensorflow/tensorflow.git commit: 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + Pull Request PR 25765, PR submitted for upstreaming), Compiler: gcc 6.3.0,MKL DNN version: v0.17, Inception v3 : https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/inceptionv3 commit: 87261e70a902513f934413f009364c4f2eed6642 , BS=128, synthetic data, 2 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://hub.docker.com/r/intelaipg/intel-optimized-tensorflow:PR25765-devel-mkl (https://github.com/tensorflow/tensorflow.git commit: 6f2eaa3b99c241a9c09c345e1029513bc4cd470a + Pull Request PR 25765, PR submitted for upstreaming) Compiler: gcc 6.3.0,MKL DNN version: v0.17, Inception v3 : https://github.com/IntelAI/models/tree/master/models/image_recognition/tensorflow/inceptionv3 commit: 87261e70a902513f934413f009364c4f2eed6642 , BS=128, synthetic data, 2 instance/2 socket, Datatype: FP32

2.6x performance boost with PyTorch RetinaNet: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x4000013), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, 3X INTEL SSDPE2KX040T7 SSD 3.7TB , Intel® Deep Learning Framework: Pytorch with ONNX/Caffe2 backend: https://github.com/pytorch/pytorch.git (commit: 4ac91b2d64eeea5ca21083831db5950dc08441d6)and Pull Request link: https://github.com/pytorch/pytorch/pull/17464 (submitted for upstreaming), gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, MKL DNN version: v0.17.3 (commit hash: 0c3cb94999919d33e4875177fdef662bd9413dd4), RetinaNet: https://github.com/intel/Detectron/blob/master/configs/12_2017_baselines/retinanet_R-101-FPN_1x.yaml BS=1, synthetic data, 2 instance/2 socket, 0.003mAP accuracy loss, Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2666 MHz), BIOS: SE5C620.86B.00.01.0015.110720180833 (ucode: 0x200004d), CentOS 7.5, 3.10.0-693.el7.x86_64, Intel® SSD DC S4500 SERIES SSDSC2KB480G7 2.5’’ 6Gb/s SATA SSD 480G, Intel® Deep Learning Framework: Pytorch with ONNX/Caffe2 backend: https://github.com/pytorch/pytorch.git (commit: 4ac91b2d64eeea5ca21083831db5950dc08441d6)and Pull Request link: https://github.com/pytorch/pytorch/pull/17464 (submitted for upstreaming), gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, MKL DNN version: v0.17.3 (commit hash: 0c3cb94999919d33e4875177fdef662bd9413dd4), RetinaNet: https://github.com/intel/Detectron/blob/master/configs/12_2017_baselines/retinanet_R-101-FPN_1x.yaml, BS=1, synthetic data, 2 instance/2 socket, Datatype: FP32

2.5x performance boost with MxNet SSD-VGG16: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet https://github.com/apache/incubator-mxnet/ -b master da5242b732de39ad47d8ecee582f261ba5935fa9, Compiler: gcc 4.8.5,MKL DNN version: v0.17, SSD-VGG16: https://github.com/apache/incubator-MXNet/blob/master/example/ssd/symbol/vgg16_reduced.py ,BS= 224, Synthetic data, 2 instance/2 socket, 0.0001 mAP accuracy loss , Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: MxNet https://github.com/apache/incubator-mxnet/ -b master da5242b732de39ad47d8ecee582f261ba5935fa9, Compiler: gcc 4.8.5,MKL DNN version: v0.17, SSD-VGG16: https://github.com/apache/incubator-MXNet/blob/master/example/ssd/symbol/vgg16_reduced.py ,BS= 224, synthetic Data, 2 instance/2 socket, Datatype:FP32

2.2x performance boost with Intel® Optimized Caffe on SSD-Mobilenet v1: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x4000013), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda Intel® SSDSC2BA80 SSD 745.2GB, 3X INTEL SSDPE2KX040T7 SSD 3.7TB , Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/ssd_mobilenet_int8.prototxt, BS=64, synthetic Data, 2 instance/2 socket, 0.0096 mAP accuracy loss, Datatype: INT8 vs Tested by Intel as of 2/21/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2666 MHz), BIOS: SE5C620.86B.00.01.0015.110720180833 (ucode: 0x200004d), CentOS 7.5, 3.10.0-693.el7.x86_64, Intel® SSD DC S4500 SERIES SSDSC2KB480G7 2.5’’ 6Gb/s SATA SSD 480G, , Intel® Deep Learning Framework: Intel® Optimization for Caffe* version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/ssd_mobilenet_int8.prototxt, BS=64, synthetic Data, 2 instance/2 socket, Datatype: FP32

2.1x performance boost with TensorFlow on Wide & Deep: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013),CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://github.com/tensorflow/tensorflow.git 3262818d9d8f9f630f04df23033032d39a7a413 + Pull Request PR26169 + Pull Request PR26261 + Pull Request PR26271 , PR submitted for upstreaming, Compiler:gcc 6.3.1,MKL DNN version: v0.17, Wide & Deep: https://github.com/IntelAI/models/tree/master/benchmarks/recommendation/tensorflow/wide_deep_large_ds commit: a044cb3e7d2b082aebae2edbe6435e57a2cc1f8f ,BS=512, Criteo Display Advertisement Challenge, 2 instance/2 socket, 0.007% accuracy loss, Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757, CentOS 7.6, 4.19.5-1.el7.elrepo.x86_64, Intel® Deep Learning Framework: https://github.com/tensorflow/tensorflow.git 3262818d9d8f9f630f04df23033032d39a7a413 + Pull Request PR26169 + Pull Request PR26261 + Pull Request PR26271 , PR submitted for upstreaming, Compiler:gcc 6.3.1,MKL DNN version: v0.17, Wide & Deep:https://github.com/IntelAI/models/tree/master/benchmarks/recommendation/tensorflow/wide_deep_large_ds a044cb3e7d2b082aebae2edbe6435e57a2cc1f8f, BS= 512, Criteo Display Advertisement Challenge, 2 instance/2 socket,Datatype:FP32

2.1x performance boost with MXNet Wide & Deep: Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8280L processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0348.011820191451 (ucode:0x5000017), CentOS 7.6, Kernel 4.19.5-1.el7.elrepo.x86_64, SSD 1x INTEL SSDSC2KG96 960GB, Intel® Deep Learning Framework: MXNet https://github.com/apache/incubator-mxnet.git commit f1de8e51999ce3acaa95538d21a91fe43a0286ec applying https://github.com/intel/optimized-models/blob/v1.0.2/mxnet/wide_deep_criteo/patch.diff, Compiler: gcc 6.3.1, MKL DNN version: commit: 08bd90cca77683dd5d1c98068cea8b92ed05784, Wide & Deep: https://github.com/intel/optimized-models/tree/v1.0.2/mxnet/wide_deep_criteo commit: c3e7cbde4209c3657ecb6c9a142f71c3672654a5, Dataset: Criteo Display Advertisement Challenge, Batch Size=1024, 2 instance/2 socket, 0.23% accuracy loss, Datatype: INT8 vs Tested by Intel as of 3/1/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2666 MHz), BIOS: SE5C620.86B.0D.01.0286.121520181757 (ucode:0x2000057), CentOS 7.6, Kernel 4.19.5-1.el7.elrepo.x86_64, SSD 1x INTEL SSDSC2KG96 960GB, Intel® Deep Learning Framework: MXNet https://github.com/apache/incubator-mxnet.git commit f1de8e51999ce3acaa95538d21a91fe43a0286ec applying https://github.com/intel/optimized-models/blob/v1.0.2/mxnet/wide_deep_criteo/patch.diff, Compiler: gcc 6.3.1, MKL DNN version: commit: 08bd90cca77683dd5d1c98068cea8b92ed05784, Wide & Deep: https://github.com/intel/optimized-models/tree/v1.0.2/mxnet/wide_deep_criteo commit: c3e7cbde4209c3657ecb6c9a142f71c3672654a5, Dataset: Criteo Display Advertisement Challenge, Batch Size=1024, 2 instance/2 socket, Datatype:FP32

OpenVINO™ Toolkit [5] Inference Throughput Performance

AI at the edge is opening up new possibilities in every industry, from predicting machine failures to personalizing retail. With the OpenVINO™ toolkit, businesses can take advantage of near real-time insights to help make better decisions, faster. The OpenVINO™ toolkit allows your business to implement computer vision and deep learning solutions quickly and effectively across multiple applications.

Configuration Details

2.4x performance boost with OpenVINO™ toolkit on SqueezeNet v1.1: Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013), Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5.01, SqueezeNet v1.1: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml, BS=64, Imagenet, 1 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605, Linux-4.15.0-29-generic-x86_64-with-Ubuntu-18.04-bionic, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5 (DLDTK Version:1.0.19154), SqueezeNet v1.1: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml ,Imagenet images , 1 instance/2 socket, Datatype: FP32 (BS=16)

3.1x performance boost with OpenVINO™ toolkit on MobileNet v1: Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013), Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5.01, MobileNet v1: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml, BS=64, Imagenet, 1 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605, Linux-4.15.0-29-generic-x86_64-with-Ubuntu-18.04-bionic, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5 (DLDTK Version:1.0.19154), MobileNet v1: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml, Imagenet, 1 instance/2 socket, Datatype: FP32 (BS=16)

3.2x performance boost with OpenVINO™ toolkit on Inception v4 :Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013), Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5.01, Inception v4: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml, BS=128, Imagenet, 1 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605, Linux-4.15.0-29-generic-x86_64-with-Ubuntu-18.04-bionic, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5 (DLDTK Version:1.0.19154), Inception v4: https://github.com/opencv/open_model_zoo/blob/master/model_downloader/list_topologies.yml Imagenet, 1 instance/2 socket, Datatype: FP32 (BS=16)

3.9x performance boost with OpenVINO™ toolkit ResNet-50: Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8280 processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode:0x4000013), Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5 (DLDTK Version:1.0.19154 , AIXPRT CP (Community Preview) benchmark (https://www.principledtechnologies.com/benchmarkxprt/aixprt/) BS=64, Imagenet images, 1 instance/2 socket, Datatype: INT8 vs Tested by Intel as of 1/30/2019. 2 socket Intel® Xeon® Platinum 8180 processor, 28 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2633 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605, Linux-4.15.0-29-generic-x86_64-with-Ubuntu-18.04-bionic, Compiler: gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, Deep Learning Deployment Toolkit (DLDT): OpenVINO™ toolkit R5 (DLDTK Version:1.0.19154), AIXPRT CP (Community Preview) benchmark (https://www.principledtechnologies.com/benchmarkxprt/aixprt/) BS=64, Imagenet images, 1 instance/2 socket, Datatype: FP32

Product and Performance Information

1

Up to 30x inference throughput improvement on the Intel® Xeon® Platinum 9282 processor with Intel® Deep Learning Boost (Intel® DL Boost): Tested by Intel as of 2/26/2019. Platform: Dragon Rock 2 socket Intel® Xeon® Platinum 9282 processor (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, no data layer, syntheticData: 3x224x224, 56 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to "performance" via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux* release 7.3.1611 (Core), Linux* kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD Data Center S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU frequency set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference was measured with the "caffe time --forward_only" command, and training with the "caffe time" command. For "ConvNet" topologies, a synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs: https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ compiler version 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe was run with "numactl -l".

2

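Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure.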

3

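Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.co.kr/benchmarks.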

4

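Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel® microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® SSE3, and Supplemental Streaming SIMD Extensions 3 (SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel® microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804.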

5

OpenVINO and the OpenVINO logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.