* Implement multi-stage Dockerfile
- Reduces image size from 2.3 GB to 1.46 GB.
- Uses the Ubuntu-based OpenVINO image as the base image, which
requires fewer instructions
- Does not include unnecessary build-time components in the deploy image
* Remove wget after usage
* Uninstall wget in the same RUN statement
Avoids re-distributing wget package in any of the layers
* Update License header according to Intel guidelines
Updated the license header according to Intel corporate guidelines.
* Use Ubuntu18's default Python3
Don't install Miniconda and use the default Python3 provided by
the base Ubuntu 18 OS.
* OpenVINO EP with CentOS7
Dockerfile to build ONNX RT with OpenVINO EP on a CentOS 7 base.
* Dockerfile documentation changes
Updated documentation to show the latest docker image location and
usage details.
* updated ov-ep doc link
* Temporarily disabling VAD-M due to regression
* fix for vad-m daemon config setting
* Revert "Temporarily disabling VAD-M due to regression"
This reverts commit c503bea38397f332b220321823e0ca1c55f4aab3.
The VAD-M issue is fixed, so this is no longer needed.
* Revert "Revert "Temporarily disabling VAD-M due to regression""
This reverts commit 7ca53feb2ba585c050be81770698f9abae8dbe28.
* Revert "fix for vad-m daemon config setting"
This reverts commit 9964f8452194655c0b988bd8472da45996deca38.
* Ubuntu Dockerfile update w.r.t 2021.4
This Dockerfile uses the OpenVINO 2021.4 runtime
base image from OpenVINO.
It uses the onnxruntime 1.8 release branch to generate the
image.
Added a fix for VADM HDDL.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Added new dependency in deploy stage
Added the sources for all dependency packages
of the GPL-licensed unattended-upgrades package
to the deploy stage.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated CentOS Dockerfile to the latest 2021.4
-Dockerfile updated
-VADM Fix added
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated c# openvino dockerfile w.r.t 2021.4
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated the Ubuntu Dockerfile branch and repo
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated Dockerfile Documentation w.r.t 2021.4
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated GCC version to 10 for centos dockerfile
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
* update onnx-tensorrt parser to master
* disable unsupported tests
* add cuda sm 75 for T4
* update tensorrt pipeline
* update trt pipelines
* update trt pipelines
* Update linux-gpu-tensorrt-ci-pipeline.yml
* update trt cid pipeline
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update Tensorrt Windows build pool and TensorRT/CUDA/CuDNN version
* update to cuda11.4 in trt ci pipeline
* update base image to cuda11.4
* update packaging pipeline to cuda11.4
* clean up
* remove cuda11.1 and cuda11.3 docker file
* disable unsupported tensorrt tests at runtime
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
don't transpose B if A is a 1D array
Don't transpose and pre-pack B if A is a 1D array, because
we only handle the non-transposed case when we compute MatMul's
shape in codegen/mti/math/matmul_ops.cc
Co-authored-by: Yang Chen <yanchen@microsoft.com>
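To make the shape rule behind this change concrete, here is a minimal sketch of MatMul output-shape inference for a 1-D or 2-D A and a 2-D, non-transposed B. The helper name `matmul_output_shape` is hypothetical and this is not the code in codegen/mti/math/matmul_ops.cc; it only illustrates why a transposed B would break a shape computation that assumes B is laid out as (K, N).

```python
import numpy as np


def matmul_output_shape(a_shape, b_shape):
    """Output shape of MatMul for a 1-D or 2-D A and a 2-D, non-transposed B.

    Hypothetical helper for illustration only. B is assumed to be (K, N).
    """
    k, n = b_shape
    if len(a_shape) == 1:
        # A 1-D A of shape (K,) is treated as (1, K); the prepended
        # dimension is dropped again from the result, leaving (N,).
        if a_shape[0] != k:
            raise ValueError("inner dimensions do not match")
        return (n,)
    m, ka = a_shape
    if ka != k:
        raise ValueError("inner dimensions do not match")
    return (m, n)


# If B had been transposed to (N, K) beforehand, the check above would
# compare K against N and either fail or report the wrong output shape.
shape_1d = matmul_output_shape((3,), (3, 4))
shape_2d = matmul_output_shape((2, 3), (3, 4))
```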
1. Update SDLNativeRules from v2 to v3. The new version allows us to set excluded paths.
2. Update TSAUpload from v1 to v2. And add a config file ".gdn/.gdntsa" for it.
3. Fix some parentheses warnings
4. Update cmake to the latest.
5. Remove the "--x86" build option from the pipeline yaml files. The CPU architecture is now auto-detected from Python, so the user no longer needs to specify it.
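As an illustration of item 5, the following is a rough sketch of how a build script could detect the target architecture from Python. The function names `classify_arch` and `detect_arch` and the returned labels are assumptions made for this example; the actual build script may use different logic.

```python
import platform
import struct


def classify_arch(machine, pointer_bits):
    """Map a platform.machine() string and pointer width to a short label.

    Hypothetical helper; the labels are chosen for this example only.
    """
    m = machine.lower()
    if m in ("amd64", "x86_64"):
        # A 32-bit interpreter on a 64-bit OS still reports AMD64/x86_64,
        # so the pointer width decides between x64 and x86.
        return "x64" if pointer_bits == 64 else "x86"
    if m in ("i386", "i686", "x86"):
        return "x86"
    if m in ("arm64", "aarch64"):
        return "arm64"
    if m.startswith("arm"):
        return "arm"
    return m or "unknown"


def detect_arch():
    """Detect the architecture of the running Python interpreter."""
    return classify_arch(platform.machine(), struct.calcsize("P") * 8)
```

Splitting the classification from the detection keeps the mapping testable on any machine.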
* enable shared lib test on linux
* fix build break
* add onnx dependency
* add rpath
* skip the test for linux training
* set ONNX_ML definition
* install training python dependency
* update
* fix format; add eigen include folder
* fix format
* skip amd build
* enable shared provider on training
* fix comments in pr
Co-authored-by: Ubuntu <chenta@chenta-orttraining-cpu.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
Co-authored-by: Changming Sun <chasun@microsoft.com>
* correct batchnorm replacement output order;
remove bn replacement in grad graph builder
* update op defs and kernel class
* implement batch norm internal and grad.
* change saved_var into saved_inv_std
* cuda test case: bn internal
* remove redundant include
* fix comment; add support and UT for 1d input.
* exclude batch_norm_internal in amd_hipify
* run BNInternal UT for CUDA only
* fix CI error
* fix comment errors
* fix error
* add comment for inconsistency with cudnnBN doc
* additional comments for cudnnBN inconsistency
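For readers unfamiliar with the saved_inv_std change above, here is a minimal NumPy sketch of a training-mode batch norm forward pass that returns the inverse standard deviation instead of the variance. The function name, the argument order, and the returned tuple are assumptions for illustration and do not reflect the actual op definition or its output order.

```python
import numpy as np


def batch_norm_train_forward(x, scale, bias, epsilon=1e-5):
    """Training-mode batch norm over every axis except the channel axis (1).

    Returns (y, saved_mean, saved_inv_std). Hypothetical sketch only.
    """
    axes = tuple(i for i in range(x.ndim) if i != 1)
    mean = x.mean(axis=axes)
    var = x.var(axis=axes)  # biased variance, as used for normalization
    # Saving 1 / sqrt(var + eps) lets the gradient kernel reuse it
    # directly instead of recomputing it from the variance.
    inv_std = 1.0 / np.sqrt(var + epsilon)
    bshape = [1, -1] + [1] * (x.ndim - 2)
    y = (x - mean.reshape(bshape)) * inv_std.reshape(bshape)
    y = y * scale.reshape(bshape) + bias.reshape(bshape)
    return y, mean, inv_std


x = np.array([[1.0, 2.0], [3.0, 6.0]])  # shape (N=2, C=2)
y, saved_mean, saved_inv_std = batch_norm_train_forward(
    x, scale=np.ones(2), bias=np.zeros(2)
)
```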
QGemm takes in quantized A, B, C, and the quantization parameters of output Y, of which C and the quantization parameters of Y are optional. Its output can be quantized or full precision, depending on whether the quantization parameters of Y exist. If the quantization parameters of Y are provided, the output is requantized; otherwise it is full precision.
Compared with QLinearMatMul and MatMulInteger, QGemm supports the transpose, alpha and beta attributes.
The formula for quantized GEMM is:
Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which,
C_int32 is quantized with formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)
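The formula can be checked with a small NumPy reference. This is a sketch under stated assumptions, not the operator's kernel: the function names are hypothetical, transposes are omitted, per-tensor scales and zero points are assumed, and the requantized output is assumed to be uint8.

```python
import numpy as np


def quantize_bias(c, alpha, beta, scale_a, scale_b):
    """C_int32 = (beta * C) / (alpha * scale_a * scale_b), rounded to int32."""
    return np.rint(beta * c / (alpha * scale_a * scale_b)).astype(np.int32)


def qgemm_reference(a_q, zp_a, scale_a, b_q, zp_b, scale_b,
                    c_int32=None, alpha=1.0, y_scale=None, y_zp=None):
    """Reference for Y = alpha*scale_a*scale_b*((A - zp_a) @ (B - zp_b) + C_int32)."""
    # Widen to int32 before subtracting zero points to avoid overflow.
    acc = (a_q.astype(np.int32) - zp_a) @ (b_q.astype(np.int32) - zp_b)
    if c_int32 is not None:
        acc = acc + c_int32
    y = (alpha * scale_a * scale_b) * acc.astype(np.float32)
    if y_scale is None:
        return y  # no output quantization parameters: full precision
    # Output quantization parameters given: requantize (uint8 assumed).
    q = np.rint(y / y_scale) + y_zp
    return np.clip(q, 0, 255).astype(np.uint8)


a_q = np.array([[130, 126]], dtype=np.uint8)  # real values [1.0, -1.0]
b_q = np.array([[2], [4]], dtype=np.int8)     # real values [0.5, 1.0]
c_q = quantize_bias(np.array([1.0]), alpha=1.0, beta=1.0,
                    scale_a=0.5, scale_b=0.25)
y_fp = qgemm_reference(a_q, 128, 0.5, b_q, 0, 0.25, c_int32=c_q)
y_q = qgemm_reference(a_q, 128, 0.5, b_q, 0, 0.25, c_int32=c_q,
                      y_scale=0.25, y_zp=10)
```

In this example the real-valued result is 1.0 * 0.5 + (-1.0) * 1.0 + 1.0 = 0.5, which the full-precision path reproduces and the requantized path encodes as 0.5 / 0.25 + 10 = 12.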
*) use context buffer allocator, remove init cost of vector
*) using lookup table to dequantize large input
*) fall back to global average pool if it is
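The lookup-table idea can be sketched as follows, assuming uint8 input with a per-tensor scale and zero point. The function name is hypothetical and the actual kernel may differ; the point is that an 8-bit input has only 256 possible values, so each one is dequantized once and large inputs are then decoded by indexing.

```python
import numpy as np


def dequantize_with_lut(x_q, scale, zero_point):
    """Dequantize uint8 data through a 256-entry lookup table.

    Hypothetical sketch; assumes per-tensor scale and zero point.
    """
    # Build the table once: entry v holds (v - zero_point) * scale.
    table = (np.arange(256, dtype=np.float32) - np.float32(zero_point)) \
        * np.float32(scale)
    # One table lookup per element replaces a subtract and a multiply.
    return table[x_q]


x_q = np.array([0, 128, 255], dtype=np.uint8)
x_fp = dequantize_with_lut(x_q, scale=0.5, zero_point=128)
```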
Adds a StridedCopy function that implements a copy from one strided tensor to another.
This parallelizes the Concat operator, and can also be used in the future to parallelize many other data movement operators (e.g. Transpose, Split, etc.).
This operation is also required for the proposed data layout extensions to ORT.
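A minimal sketch of the idea, assuming flat buffers and element (not byte) strides. The Python names `strided_copy` and `concat_axis1` are hypothetical and do not mirror the actual function signature; the sketch only shows how Concat reduces to one strided copy per input, each of which writes a disjoint region of the output and could therefore run in parallel.

```python
import numpy as np
from itertools import product


def strided_copy(dst, dst_strides, src, src_strides, shape,
                 dst_offset=0, src_offset=0):
    """Copy a `shape`-sized block between flat buffers using element strides."""
    for idx in product(*(range(n) for n in shape)):
        d = dst_offset + sum(i * s for i, s in zip(idx, dst_strides))
        s = src_offset + sum(i * t for i, t in zip(idx, src_strides))
        dst[d] = src[s]


def concat_axis1(a, b):
    """Concatenate two contiguous 2-D arrays along axis 1 via strided copies."""
    rows, ca, cb = a.shape[0], a.shape[1], b.shape[1]
    out = np.empty(rows * (ca + cb), dtype=a.dtype)
    out_strides = (ca + cb, 1)
    # Each input lands in its own column range of the output rows.
    strided_copy(out, out_strides, a.ravel(), (ca, 1), a.shape)
    strided_copy(out, out_strides, b.ravel(), (cb, 1), b.shape, dst_offset=ca)
    return out.reshape(rows, ca + cb)


a = np.arange(6).reshape(2, 3)
b = np.arange(4).reshape(2, 2) + 10
result = concat_axis1(a, b)
```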
* Do not copy the model_data when session is started by CreateSessionFromArray
* Add config option for disabling copy model bytes
* Add one additional test
* Address CR comments
* attention fusion kernel refactored
* consider the case of none in add_qk
* variable added to check for pre-pack weights
* added a comment to PrePack()
* Optimized prepack and try to free the weights
* making comment sound better
* fixing a bug with optimizer.py
* commented out changes to be done
* removed comments
* make the private fn() private
* fix build
* making clean up fn static
* backed out optimizer tool change, needs more looking into
* freeze/fastpath support
* more comments on _fast_path
* per comments
* minor fix
* IntFlag improve
* address comments
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* atenop for inference
* assert if dtype mismatch
* atenop config in frontend
* fix orttrainer test
* gradient def not only for ATenOp
* bugfix
* fix gradient input shape and type issue
* fix after merge master