pytorch/.github/pytorch-probot.yml
Huy Do 24a223c49d Run inductor micro benchmark on x86 metal runner (#135042)
This enables inductor micro benchmark on CPU (x86):

* Running on AWS metal runner for more accurate benchmark
* I add a new `arch` column, which will be either x86_64 or arm64 for CPU or GPU name for GPU.  We can use this later to differentiate between different setup, i.e. cuda (a100) vs cuda (a10g) or cpu (x86_64) vs cpu (arm64)

The next step would be to run this one cpu arm64, and cuda (a10g).

### Testing
Here is the CSV results from my test run https://github.com/pytorch/pytorch/actions/runs/10709344180

```
name,metric,target,actual,dtype,device,arch,is_model
mlp_layer_norm_gelu,flops_utilization,0.8,17.36,bfloat16,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),990,170.80,int8,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),1060,204.78,bfloat16,cpu,x86_64,False
Mixtral-8x7B-v0.1,token_per_sec,175,26.68,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,171.91,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,compilation_time(s),162,47.36,int8,cpu,x86_64,True
gemv,memory_bandwidth(GB/s),870,236.36,int8,cpu,x86_64,False
gemv,memory_bandwidth(GB/s),990,305.71,bfloat16,cpu,x86_64,False
Llama-2-7b-chat-hf,token_per_sec,94,14.01,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),1253,185.18,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),162,74.99,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,token_per_sec,144,25.09,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,165.83,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),172,70.69,int8,cpu,x86_64,True
layer_norm,memory_bandwidth(GB/s),950,172.03,bfloat16,cpu,x86_64,False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135042
Approved by: https://github.com/yanboliang
2024-09-05 21:31:36 +00:00

31 lines
652 B
YAML

tracking_issue: 24422
ciflow_tracking_issue: 64124
ciflow_push_tags:
- ciflow/binaries
- ciflow/binaries_conda
- ciflow/binaries_libtorch
- ciflow/binaries_wheel
- ciflow/inductor
- ciflow/inductor-rocm
- ciflow/inductor-perf-compare
- ciflow/inductor-micro-benchmark
- ciflow/inductor-micro-benchmark-cpu-x86
- ciflow/inductor-cu124
- ciflow/linux-aarch64
- ciflow/mps
- ciflow/nightly
- ciflow/periodic
- ciflow/rocm
- ciflow/slow
- ciflow/trunk
- ciflow/unstable
- ciflow/xpu
- ciflow/torchbench
retryable_workflows:
- pull
- trunk
- linux-binary
- windows-binary
labeler_config: labeler.yml
label_to_label_config: label_to_label.yml
mergebot: True