onnxruntime/orttraining/tools/scripts
Ashrit Shetty 4b5b5f7101
Update win-ort-main to tip main 250123 (#23473)
### Description
This PR is to update the win-ort-main branch to the tip main branch as
of 2025-01-23.

### PR List
ddf0d377a7 [QNN EP] Add LoggingManager::HasDefaultLogger() to provider
bridge API (#23467)
05fbbdf91f [QNN EP] Make QNN EP a shared library (#23120)
1336566d7f Add custom vcpkg ports (#23456)
2e1173c411 Update the compile flags for vcpkg packages (#23455)
1f628a9858 [Mobile] Add BrowserStack Android MAUI Test (#23383)
009cae0ec8 [js/webgpu] Optimize ConvTranspose (Continue) (#23429)
04a4a694cb Use onnx_protobuf.h to suppress some GCC warnings (#23453)
2e3b62b4b0 Suppress some strict-aliasing related warnings in WebGPU EP
(#23454)
b708f9b1dc Bump ruff from 0.9.1 to 0.9.2 (#23427)
c0afc66b2a [WebNN] Remove workarounds for TFLite backend (#23406)
8a821ff7f9 Bump vite from 6.0.7 to 6.0.11 in
/js/web/test/e2e/exports/testcases/vite-default (#23446)
220c1a203e Make ORT and Dawn use the same protobuf/abseil source code
(#23447)
b7b5792147 Change MacOS-13 to ubuntu on for
android-java-api-aar-test.yml. (#23444)
19d0d2a30f WIP: Dp4MatMulNBits accuracy level 4 matmul for WebGPU EP
(#23365)
95b8effbc4 [QNN EP]: Clean up QNN logging resources if an error occurs
during initialization (#23435)
626134c5b5 Bump clang-format from 19.1.6 to 19.1.7 (#23428)
0cf975301f Fix eigen external deps (#23439)
f9440aedce Moving RN_CI Android Testing to Linux (#23422)
1aa5902ff4 [QNN EP] workaround for QNN validation bug for Tanh with
uint16 quantized output (#23432)
7f5582a0e2 Seperate RN andriod and IOS into 2 separated Stages. (#23400)
73deac2e7f Implement some missing element wise Add/Sub/Mul/Div/Neg
operations for CPU and CUDA EPs (#23090)
949fe42af4 Upgrade Java version from react-native/android to Java 17
(#23066)
0892c23463 Update Qnn SDK default version to 2.30 (#23411)
94c099bcec Fix type cast build error (#23423)
d633e571d1 [WebNN EP] Fix AddInitializersToSkip issues (#23354)
e988ef00e2 [QNN EP] Fix regression for MatMul with two quantized/dynamic
uint16 inputs (#23419)
7538795f6b Update onnxruntime binary size checks ci pipeline's docker
image (#23405)
6c5ea41cad Revert "[QNN EP] Clean up correctly from a partial setup
(#23320)" (#23420)
e866804bbe Enable comprehension simplification in ruff rules (#23414)
0a5f1f392c bugfix: string_view of invalid memory (#23417)
4cc38e0277 fix crash when first input of BatchNormalization is 1-D
(#23387)
033441487f Target py310 and modernize codebase with ruff (#23401)
87341ac010 [QNN EP] Fix segfault when unregistering HTP shared memory
handles (#23402)

### Motivation and Context
This update includes the change to make QNN-EP a shared library.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Peishen Yan <peishen.yan@intel.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Jian Chen <cjian@microsoft.com>
Co-authored-by: Alexis Tsogias <1114095+Zyrin@users.noreply.github.com>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sushraja-msft <44513542+sushraja-msft@users.noreply.github.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Caroline Zhu <wolfivyaura@gmail.com>
2025-01-23 09:12:03 -08:00
..
experiment.py
gpt2_model_transform.py Update win-ort-main to tip main 250116 (#23398) 2025-01-16 15:20:25 -08:00
layer_norm_transform.py
model_transform.py Update win-ort-main to tip main 250116 (#23398) 2025-01-16 15:20:25 -08:00
nv_run_pretraining.py Update win-ort-main to tip main 250123 (#23473) 2025-01-23 09:12:03 -08:00
opset12_model_transform.py Update win-ort-main to tip main 250116 (#23398) 2025-01-16 15:20:25 -08:00
performance_investigation.py
pipeline_model_split.py
README.txt
single_node_perf.sh
sqldb_to_tensors.py
train.py
watch_experiment.py Update win-ort-main to tip main 250123 (#23473) 2025-01-23 09:12:03 -08:00

Procedure to export NV's pytorch model to ONNX.

1. cd into BERT in DeepLearningExamples and launch the docker.
2. run nv_run_pretraining.py using the same parameter you run run_pretraining.py. It will produce a model in your checkpoint directory.
3. Assume that the exported model's name is 'bert.onnx'. Run model_transform.py bert.onnx
4. Then run layer_norm_transform.py bert_optimized.onnx. The final model name would be bert_optimized_layer_norm.onnx
5. Now, you can run training with the newly created model.

Note that if you want to change model's configuration, you can edit bert_config.json in the BERT directory.

Example commands:
Step 2 (inside docker):
python3 /workspace/bert/nv_run_pretraining.py --input_dir=data/bookcorpus/hdf5_shards/ --output_dir=/results/checkpoints1 --config_file=bert_config.json --bert_model=bert-large-uncased --warmup_proportion=0 --num_steps_per_checkpoint=2000 --learning_rate=0.875e-4 --seed=42 --do_train --phase2 --max_seq_length=512 --max_predictions_per_seq=80 --max_steps=200 --train_batch_size=2 

Step 3 (inside onnxruntime/build/Linux/RelWithDeb):
sudo /data/anaconda/envs/py35/bin/python /bert_ort/wechi/DeepLearningExamples/PyTorch/LanguageModeling/BERT/model_transform.py /bert_ort/wechi/DeepLearningExamples/PyTorch/LanguageModeling/BERT/results/checkpoints1/bert_for_pretraining_without_loss_vocab_30528_hidden_1024_maxpos_512.onnx 

Step 4 (inside onnxruntime/build/Linux/RelWithDeb):
sudo /data/anaconda/envs/py35/bin/python /bert_ort/wechi/DeepLearningExamples/PyTorch/LanguageModeling/BERT/layer_norm_transform.py /bert_ort/wechi/DeepLearningExamples/PyTorch/LanguageModeling/BERT/results/checkpoints1/bert_for_pretraining_without_loss_vocab_30528_hidden_1024_maxpos_512_optimized.onnx

Step 5 (inside onnxruntime/build/Linux/RelWithDeb):
./onnxruntime_training_bert --num_of_perf_samples=100 --train_batch_size=1 --mode=perf --model_name /bert_ort/wechi/DeepLearningExamples/PyTorch/LanguageModeling/BERT/results/checkpoints1/bert_for_pretraining_without_loss_vocab_30528_hidden_1024_maxpos_512_optimized_layer_norm