Deform conv2d mps support #9026

goldfishsound · 2025-04-20T13:55:45Z

DeformConv2d MPS Support

This PR features a full MPS implementation of the deform_conv2d operator.

General notes
For consistency and maintainability, this implementation closely follows the CPU and CUDA implementations, apart from the differences imposed by their respective frameworks. The Metal part likewise follows the existing “ROI”-related kernel implementations.
Tests
The implementation passes all tests in test_ops.py:TestDeformConv except for two of the tests generated by optests.generate_opcheck_tests:
"test_autograd_registration"
and
"test_aot_dispatch_dynamic"

"test_autograd_registration" fails because the MPS dispatch key is missing in torch/testing/_internal/optests/autograd_registration.py.
I have addressed this issue in a separate PR:
torch.testing._internal.optests - MPS Support #151758

"test_aot_dispatch_dynamic" fails for reasons that are not yet clear to me.
Issues - To be fixed
The CPU implementation, deform_conv2d_kernel.cpp, uses the in-place torch operator .addmm_.
However, for reasons unknown to me, using .addmm_ in the MPS implementation, deform_conv2d_kernel.mm, returns zero-valued tensors after the first iteration of the convolution loop.
As a temporary solution, I have chosen to use the out-of-place version, .addmm, instead. This is not ideal and should be fixed.
MSL implementation - mps_kernels.h and mps_helpers.h
The bilinear_interpolate function used by the “ROI”-related kernels differs from the one used by the CPU and CUDA implementations of the deform_conv2d operator.
For now, I have chosen to keep both implementations in mps_kernels.h and have named the new function bilinear_interpolate_deform_conv2d.
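
To make the difference concrete, here is a minimal Python sketch of the zero-padding style of bilinear interpolation that, as far as I can tell, the CPU and CUDA deform_conv2d kernels use: a sample position outside the image returns 0, and any of the four neighbours that falls outside the image contributes 0. The function name and the list-of-rows image layout are illustrative assumptions, not the actual kernel code. The “ROI”-style helper, by contrast, appears to clamp coordinates to the image border, so the two can disagree near the edges.

```python
import math


def bilinear_interpolate_zero_pad(img, h, w):
    """Sample img (a list of rows) at fractional position (h, w).

    Illustrative sketch only: neighbours outside the image count as 0.
    """
    height = len(img)
    width = len(img[0]) if height else 0

    # Positions a full pixel or more outside the image contribute nothing.
    if h <= -1 or h >= height or w <= -1 or w >= width:
        return 0.0

    h_low, w_low = math.floor(h), math.floor(w)
    h_high, w_high = h_low + 1, w_low + 1

    # Interpolation weights along each axis.
    lh, lw = h - h_low, w - w_low
    hh, hw = 1.0 - lh, 1.0 - lw

    def at(r, c):
        # Zero padding: out-of-range neighbours read as 0.
        if 0 <= r < height and 0 <= c < width:
            return img[r][c]
        return 0.0

    return (
        hh * hw * at(h_low, w_low)
        + hh * lw * at(h_low, w_high)
        + lh * hw * at(h_high, w_low)
        + lh * lw * at(h_high, w_high)
    )
```

For a 2x2 image [[1, 2], [3, 4]], sampling at (0.5, 0.5) averages all four pixels, while sampling at (-0.5, 0) only sees the pixel at (0, 0), weighted by 0.5. A border-clamping variant would instead return the full value of that pixel.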
Suggestions
However, I suggest that mps_kernels.h be split into separate kernel files, one for each “ROI”-related operator and one for the deform_conv2d operator. Future ops should likewise get their own kernel files.
This would not only be in keeping with the implementation design in PyTorch but also lead to safer and more maintainable code in the long run.
I also suggest that any common utility functions and constants found in mps_kernels.h be moved into mps_helpers.h in the future, and that mps_helpers.h perhaps be given a more generic name.

…sed for cpp and cuda implementation of deform_conv2d, and the implementation used in the optest. kernel deformable_im2col: Using threadgroups_per_grid as the n_tgs in the MPS_1D_KERNEL_LOOP to prevent multiple index values generated by the macro, when threadgroups_per_grid is larger than 1.
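
The index duplication mentioned in this commit note can be illustrated with a small simulation. The sketch assumes that MPS_1D_KERNEL_LOOP is a grid-stride loop in which each thread starts at its global thread index and advances by threads-per-threadgroup times n_tgs; that form, and all names below, are assumptions made for illustration rather than a copy of the macro.

```python
from collections import Counter


def visited_indices(n, threads_per_threadgroup, threadgroups_per_grid, n_tgs):
    """Count how often each index in [0, n) is visited by a grid-stride loop.

    Simulates every thread of every threadgroup sequentially.
    """
    counts = Counter()
    for tgid in range(threadgroups_per_grid):
        for tid in range(threads_per_threadgroup):
            # Each thread starts at its global index ...
            i = tgid * threads_per_threadgroup + tid
            while i < n:
                counts[i] += 1
                # ... and strides by the assumed total number of threads.
                i += threads_per_threadgroup * n_tgs
    return counts


# Stride matches the launched grid: every index is visited exactly once.
ok = visited_indices(10, 2, 2, n_tgs=2)

# Stride assumes a single threadgroup while two were launched:
# some indices are visited twice.
dup = visited_indices(10, 2, 2, n_tgs=1)
```

Under these assumptions, passing the actual threadgroups_per_grid as n_tgs keeps the stride equal to the total number of launched threads, which is what prevents the same index from being processed by more than one thread.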

pytorch-bot · 2025-04-20T13:55:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9026

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

[Infra] Jobs got intermittently cancelled/fail midway checkout

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot · 2025-04-20T13:57:53Z

Didn't find following labels among repository labels: topic: not user facing

goldfishsound · 2025-04-20T14:03:10Z

@pytorchbot label "enhancement"
@pytorchbot label "module: ops"
@pytorchbot label "module: vision"

goldfishsound · 2025-04-21T08:22:28Z

@pytorchbot label "topic: not user facing"

pytorch-bot · 2025-04-21T08:22:32Z

Didn't find following labels among repository labels: topic: not user facing

chenjingcheng · 2025-04-25T07:19:33Z

I tested it, and it is very slow:
Running on cpu
Time: 0.005s
Output: torch.Size([1, 8, 128, 128])

Running on mps
Time: 0.292s
Output: torch.Size([1, 8, 128, 128])
I built it from source code.
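
The comment does not say how these timings were taken. For anyone reproducing the comparison, a small timing helper along the following lines may be useful; it is a sketch with hypothetical names, using warm-up runs and an optional synchronization callback so that asynchronous device work is included in the measurement.

```python
import time


def time_call(fn, sync=None, warmup=3, iters=10):
    """Return the average wall-clock seconds per call of fn.

    sync, if given, is called before starting and after stopping the clock
    so that queued device work is finished when the time is read.
    """
    # Warm-up runs absorb one-time costs such as kernel compilation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters
```

With PyTorch, fn could be a lambda that calls torchvision.ops.deform_conv2d(x, offset, weight) on tensors already placed on the target device, and sync could be torch.mps.synchronize when running on mps (no sync is needed on cpu).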

goldfishsound added 30 commits October 7, 2024 16:21

Start of branch

93e044b

Setting up for development

d838bf7

Initial commit for deform_conv2d for MPS

95eb1cd

New mps kernel for deform_conv2d and updated shader functions in kernel.h

c53e1bd

Renaming source file.

1153b84

Changed part of the file name from _kernal to _kernel

1c87a26

Remove files in product dir

8a984de

Removing framework dir and included files.

970183d

Removing build_xcode dir and included files.

2895f4f

Changed location references to pytorch

2f06f7f

Clean up git - Removing .DS_Store

66d76d3

Altering the kernel deformable_im2col to mimic the cpp kernel implementation.

c8eb2ea

Re-ordering include sequence

c92eaa4

Including mps in TestDeformConv::test_is_leaf_node

b445aed

Updates gitignore

951880c

Merge branch 'main' into deform_conv2d_mps_support

1aa7c0b

Merge branch 'pytorch:main' into deform_conv2d_mps_support

83080da

Update .gitignore

9f68fd4

Removed the product which is not part of the repo.

CleanUp

dc305ae

Cleaned up - removed added exclusions.

e25e620

Updated

e4fb8c5

Removed CMakePresets.json

e39867f

Updated to exclude CMakePresets.json

3e2bc0e

Reorganized the numbering of argumnet indexes in img2col

350454f

Added threadgroups_per_grid to deformable_col2im and deformable_col2im_coord kernels

b31a28c

Modified the MPS_1D_KERNEL_LOOP to use the new threadgroups_per_grid

Added printTensor utility function - only temporarily

358dacc

Substitutes addmm_ with addmm in the forward pass because addmm_ failes for weight groups > 1 (see comment in the code)

Modifying TestDeformConv to include mps tests.

7da876a

test_forward is passing test_backward is failing

Merge branch 'pytorch:main' into deform_conv2d_mps_support

da6134d

House Cleaning:

25a2944

Getting rid of redundant contiguous conversions of tensors.

goldfishsound added 4 commits April 20, 2025 14:45

Renaming of bilinear_interpolate2 to bilinear_interpolate_deform_conv2d

bf7784d

Added constant mps_backward_eps for eps in backward test.

9d3105f

Skipping backward test for batch_sz == 0

Removed unused includes

501d617

Removed temporary debug functions Removed redundant comments Temporary substitution of .addmm op with addop. Clean-up of debug std::cout statements

Delete

a294c2e

facebook-github-bot added the cla signed label Apr 20, 2025

pytorch-bot bot added the enhancement label Apr 20, 2025

goldfishsound mentioned this pull request Apr 20, 2025

deform_conv2d for mps #7490

Open
