1Cademy - Example of Asynchronous Benchmarking

To demonstrate the effects of asynchronous execution, consider a warm-up toy problem: generate a random $1000 \times 1000$ matrix and multiply it by itself. When this matrix multiplication is benchmarked in a deep learning framework such as PyTorch or MXNet against NumPy, the framework appears to be orders of magnitude faster. GPU execution does provide a significant speedup, but most of the measured difference arises because the framework's operations are asynchronous: the frontend returns control to Python immediately while the backend is still executing the computation. An accurate benchmark must force the framework to finish all backend computation before the timer is stopped, which reveals the true execution time.
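The timing pitfall can be sketched without any deep learning framework. The following is a minimal simulation, not the framework's actual mechanism: a single-worker thread pool stands in for the backend, `time.sleep` stands in for the matrix multiplication, and the names `backend_matmul`, `benchmark`, and `BACKEND_SECONDS` are hypothetical. Calling `future.result()` plays the role that an explicit synchronization call (for example `torch.cuda.synchronize()` in PyTorch) plays in a real benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cost of the "matrix multiplication" on the backend.
BACKEND_SECONDS = 0.2


def backend_matmul():
    # Stand-in for the real computation; it only sleeps.
    time.sleep(BACKEND_SECONDS)
    return "result"


def benchmark(synchronize):
    """Time one asynchronous call, optionally waiting for it to finish."""
    with ThreadPoolExecutor(max_workers=1) as backend:
        start = time.perf_counter()
        # The "frontend" enqueues the work and returns immediately.
        future = backend.submit(backend_matmul)
        if synchronize:
            # Block until the backend has actually finished.
            future.result()
        elapsed = time.perf_counter() - start
    return elapsed


naive = benchmark(synchronize=False)
accurate = benchmark(synchronize=True)
print(f"without synchronization: {naive:.4f} s")
print(f"with synchronization:    {accurate:.4f} s")
```

Without synchronization, the timer measures only the cost of enqueuing the work, so the reported time is far shorter than the work itself. With synchronization, the timer covers the full duration of the backend computation.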

Updated 2026-05-18

References

Dive into Deep Learning

Learn After

GPU Warm-up for Accurate Benchmarking
