onnxruntime/docs
kunal-vaishnavi 50bda44a70
Fix equation in MatMulNBits op spec (#22253)
### Description
This PR fixes the equation for the `zero_points` size in the MatMulNBits op
spec. The old formula is stated as

```
[CeilDiv((N * n_blocks_per_col + 1) * bits, 8)]
```

but it should be stated as

```
[N * CeilDiv(n_blocks_per_col * bits, 8)]
```

or as

```
[N * FloorDiv((n_blocks_per_col + 1) * bits, 8)]
```
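
As a rough check of the two corrected forms, the sketch below writes them as Python functions. The helper names (`ceil_div`, `zp_size_ceil`, `zp_size_floor`) are illustrative and do not come from the op spec or the ONNX Runtime code base. Under these assumptions, the two forms agree for `bits = 4`, but they can differ for other bit widths, so the `FloorDiv` form is best read as a 4-bit shorthand.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def zp_size_ceil(n: int, n_blocks_per_col: int, bits: int) -> int:
    """[N * CeilDiv(n_blocks_per_col * bits, 8)]"""
    return n * ceil_div(n_blocks_per_col * bits, 8)


def zp_size_floor(n: int, n_blocks_per_col: int, bits: int) -> int:
    """[N * FloorDiv((n_blocks_per_col + 1) * bits, 8)]"""
    return n * (((n_blocks_per_col + 1) * bits) // 8)


# For 4-bit packing, both forms give the same size for every block count tried.
agree_4bit = all(
    zp_size_ceil(1, k, 4) == zp_size_floor(1, k, 4) for k in range(1, 1025)
)
print(agree_4bit)  # True

# For 2-bit packing they differ, e.g. with a single block per column.
print(zp_size_ceil(1, 1, 2), zp_size_floor(1, 1, 2))  # 1 0
```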

### Motivation and Context
For models such as ChatGLM where the number of blocks per column
(`n_blocks_per_col`) can be odd, the division math can be off. For example:


![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0)

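With the old equation, applied per column as `N * CeilDiv((n_blocks_per_col + 1) * bits, 8)`, the `zero_points` sizes are calculated as follows. The up projection result (232,832) does not match the actual `zero_points` size (219,136).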

```
# Down projection
B = 4,096 x 107 x 64
zero_points = 221,184
N = 4,096
n_blocks_per_col = 107
 
4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184

# Up projection
B = 13,696 x 32 x 64
zero_points = 219,136
N = 13,696
n_blocks_per_col = 32
 
13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832
```

With the new equation, the `zero_points` sizes are calculated as follows. Both results match the actual `zero_points` sizes.

```
# Down projection
B = 4,096 x 107 x 64
zero_points = 221,184
N = 4,096
n_blocks_per_col = 107
 
4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184

# Up projection
B = 13,696 x 32 x 64
zero_points = 219,136
N = 13,696
n_blocks_per_col = 32
 
13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136
```
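
The arithmetic above can be reproduced with a short script. This is a minimal sketch: the function names and the `cases` table are illustrative, `old_size_as_worked` follows the per-column form used in the worked example rather than any ONNX Runtime code, and the shapes are the ones quoted in this description.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def old_size_as_worked(n: int, n_blocks_per_col: int, bits: int) -> int:
    """Old equation as applied in the worked example above."""
    return n * ceil_div((n_blocks_per_col + 1) * bits, 8)


def new_size(n: int, n_blocks_per_col: int, bits: int) -> int:
    """New equation: [N * CeilDiv(n_blocks_per_col * bits, 8)]."""
    return n * ceil_div(n_blocks_per_col * bits, 8)


# name -> (N, n_blocks_per_col, actual zero_points size), taken from the text
cases = {
    "down": (4096, 107, 221184),
    "up": (13696, 32, 219136),
}

for name, (n, blocks, actual) in cases.items():
    old = old_size_as_worked(n, blocks, 4)
    new = new_size(n, blocks, 4)
    print(f"{name}: old={old} new={new} actual={actual} new_matches={new == actual}")
```

Only the up projection, with an even block count, exposes the mismatch in the old form; the new form matches both cases.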
2024-10-01 09:31:56 -07:00
c_cxx Remove extraneous javascript includes (#17558) 2023-09-14 20:43:24 -07:00
execution_providers/images
images
python [Fix] Make python API doc generation in Microsoft-hosted Agent (#21766) 2024-08-20 23:32:38 +08:00
ABI_Dev_Notes.md Fix a typo in ABI_Dev_Notes.md (#17832) 2023-10-09 07:51:34 -07:00
Android_testing.md
C_API_Guidelines.md
cmake_guideline.md
Coding_Conventions_and_Standards.md Update lintrunner requirements (#22185) 2024-09-23 18:27:16 -07:00
ContribOperators.md Fix equation in MatMulNBits op spec (#22253) 2024-10-01 09:31:56 -07:00
FAQ.md [Technical docs] Fixed a couple of old links in FAQ.md (#17415) 2023-09-26 13:38:24 -07:00
How_To_Update_ONNX_Dev_Notes.md Update Dockerfile.cuda (#21042) 2024-06-13 23:50:03 -07:00
Memory_Optimizer.md Flash attention recompute (#20603) 2024-05-21 13:38:19 +08:00
Model_Test.md Update docs/Model_Test.md (#11466) 2024-05-15 11:33:11 -07:00
NotesOnThreading.md
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md Remove the extensions submodule (#17097) 2023-08-14 10:16:33 -07:00
OperatorKernels.md Support if node with sequence outputs (#22234) 2024-09-27 12:40:01 -07:00
ORT_Format_Update_in_1.13.md
ORT_Use_Triton_Kernel.md Rename a mispelled filename in the documentation (#21066) 2024-06-17 18:18:41 +02:00
ORTModule_Convergence_Notes.md Fix and enable few ORTModule Unit Tests (#19847) 2024-03-12 10:49:19 +08:00
ORTModule_ModuleWithLoss_Wrapper.md
ORTModule_PythonOp_Notes.md Add document for PythonOp (#17888) 2023-10-12 08:36:22 +08:00
ORTModule_Training_Guidelines.md Adds ATen fallback for scaled_dot_product_attention (#21107) 2024-07-22 16:37:04 -07:00
PR_Guidelines.md
Privacy.md
Reduced_Operator_Kernel_build.md
ReleaseManagement.md
Roadmap.md
Server.md
TVM_EP.md Fix: update hyperlinks to the Jupyter notebooks (#16145) 2023-08-21 09:53:05 -07:00
Versioning.md
WinML_principles.md