DeepSparse v1.0.0

@jeanniefinks released this 01 Jul 16:34

New Features:

  • Support added for running multiple models with the same engine when using the Elastic Scheduler.
  • When using the Elastic Scheduler, the caller can now use the num_streams argument to tune the number of requests that are processed in parallel.
  • Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
  • Documentation added for transformers, yolov5, torchvision, and serving, focused on deploying models with each integration.
  • AWS SageMaker example created.
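With the Elastic Scheduler, num_streams tunes how many requests are processed in parallel, and several models can share one engine. As a rough mental model only, the pure-Python sketch below uses a thread pool whose size stands in for num_streams and lets requests for two different models share it. The names `run_model`, `serve`, and `ConcurrencyTracker` are hypothetical, and this is not DeepSparse's actual scheduler implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many requests are in flight at once (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def exit(self):
        with self._lock:
            self.active -= 1


def run_model(model_name, request, tracker):
    """Hypothetical stand-in for one inference call on a shared engine."""
    tracker.enter()
    try:
        time.sleep(0.01)  # pretend to run inference
        return f"{model_name}:{request}"
    finally:
        tracker.exit()


def serve(requests, num_streams):
    """Process (model_name, request) pairs with at most num_streams in parallel."""
    tracker = ConcurrencyTracker()
    # The pool size plays the role of num_streams: it caps concurrent requests.
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        futures = [pool.submit(run_model, m, r, tracker) for m, r in requests]
        results = [f.result() for f in futures]  # results keep submission order
    return results, tracker.peak


if __name__ == "__main__":
    # Requests alternate between two hypothetical models sharing the pool.
    reqs = [("bert", i) if i % 2 == 0 else ("resnet50", i) for i in range(8)]
    results, peak = serve(reqs, num_streams=2)
    print(results)
    print("peak parallel requests:", peak)
```

Raising num_streams in this model lets more requests overlap; whether that helps in practice depends on core count and model size, so it is worth measuring on the target system.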

Changes:

  • Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.

Performance:

  • Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, with gains of up to 20% on BERT and 45% on ResNet-50.
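To check a gain like this on your own hardware, compare throughput measured before and after upgrading. The helpers below are a minimal sketch with hypothetical names. They assume the quoted percentages are throughput gains (the notes do not say), and that at a fixed batch size with a single stream, per-item latency is the inverse of throughput, so a G% gain divides latency by 1 + G/100.

```python
def percent_improvement(baseline_throughput, new_throughput):
    """Relative throughput gain, in percent, of new over baseline."""
    if baseline_throughput <= 0:
        raise ValueError("baseline_throughput must be positive")
    return (new_throughput / baseline_throughput - 1.0) * 100.0


def latency_after_gain(baseline_latency_ms, gain_percent):
    """Per-item latency implied by a throughput gain (fixed batch, one stream)."""
    return baseline_latency_ms / (1.0 + gain_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical measurements: 100 items/s before, 145 items/s after.
    print(round(percent_improvement(100.0, 145.0), 1))  # 45.0
    # A 10 ms per-item latency under a 45% throughput gain.
    print(round(latency_after_gain(10.0, 45.0), 1))  # 6.9
```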

Resolved Issues:

  • Potential crash addressed when a layer operates on a dataset larger than 2GB.
  • Assertion error addressed for Reduce operations where the reduction axis is of length 1.
  • Rare assertion failure related to Tensor Columns addressed.
  • When running the DeepSparse Engine on a system with a non-uniform system topology, model compilation now properly terminates.

Known Issues:

  • In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel and 2x2 strides; hotfix forthcoming.
  • The engine will crash with an assertion failure when setting the num_streams parameter to fewer than the number of NUMA nodes; hotfix forthcoming.
  • In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.
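Until the num_streams hotfix lands, one possible guard is to keep num_streams at or above the number of NUMA nodes before passing it to the engine. This is a sketch inferred from the known-issue wording, not an official recommendation. The helper names `numa_node_count` and `safe_num_streams` are hypothetical, and the sysfs path applies to Linux only; elsewhere the count falls back to 1.

```python
import glob
import os
import re


def numa_node_count(sysfs_root="/sys/devices/system/node"):
    """Count NUMA nodes via Linux sysfs; fall back to 1 if unavailable."""
    nodes = [
        p
        for p in glob.glob(os.path.join(sysfs_root, "node*"))
        # Keep only entries named node0, node1, ... (skip other sysfs files).
        if re.fullmatch(r"node\d+", os.path.basename(p))
    ]
    return max(len(nodes), 1)


def safe_num_streams(requested, numa_nodes):
    """Raise num_streams to at least the NUMA node count (workaround sketch)."""
    if requested < 1:
        raise ValueError("num_streams must be at least 1")
    return max(requested, numa_nodes)


if __name__ == "__main__":
    nodes = numa_node_count()
    print("NUMA nodes:", nodes)
    print("num_streams to use:", safe_num_streams(1, nodes))
```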