`TestAffineQuantizedTensorParallel` fails on H100 for the bfloat16, float16, and float32 dtypes. We need to find the cause and fix it.
Error:
```
ERROR: test_tp_float32 (torchao.testing.utils.TorchAOTensorParallelTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/appy/.conda/envs/dev_ao/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 541, in wrapper
    self._join_processes(fn)
  File "/home/appy/.conda/envs/dev_ao/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 767, in _join_processes
    self._check_return_codes(elapsed_time)
  File "/home/appy/.conda/envs/dev_ao/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 821, in _check_return_codes
    raise RuntimeError(
RuntimeError: Process 0 terminated or timed out after 300.0712020397186 seconds

----------------------------------------------------------------------
Ran 9 tests in 2701.009s

FAILED (errors=9)
/home/appy/.conda/envs/dev_ao/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 36 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d
```
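
The `RuntimeError` above reports that process 0 terminated or timed out after roughly 300 seconds, which may point to a hang rather than a crash. As a possible first debugging step, here is a minimal sketch that uses Python's standard `faulthandler` module to dump every thread's stack if a process is still running after a deadline. The helper names (`arm_hang_watchdog`, `disarm_hang_watchdog`, `demo_capture`) are hypothetical and not part of torchao or PyTorch, and where to call them inside the test is an assumption.

```python
import faulthandler
import tempfile
import time


def arm_hang_watchdog(timeout_s, log_file):
    """Dump all thread stacks to log_file if still running after timeout_s.

    Hypothetical helper: log_file must be a real file object with a fileno().
    exit=False lets the process keep running after the dump.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)


def disarm_hang_watchdog():
    """Cancel the pending dump once the guarded code has finished."""
    faulthandler.cancel_dump_traceback_later()


def demo_capture(timeout_s=0.2, hang_s=1.0):
    """Simulate a short hang and return whatever the watchdog wrote."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        arm_hang_watchdog(timeout_s, log_file)
        try:
            time.sleep(hang_s)  # stand-in for the blocked test body
        finally:
            disarm_hang_watchdog()
        log_file.seek(0)
        return log_file.read()


if __name__ == "__main__":
    print(demo_capture())
```

In the failing test, one could arm the watchdog at the start of each rank's test body with a deadline below the harness timeout (for example 240 seconds, an assumed value) and write to a per-rank log file, so the dump shows where each rank is blocked before the harness kills it.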