Example

Example of Asynchronous Benchmarking

To demonstrate the effects of asynchronous execution, consider a warmup toy problem that generates a random 1000imes10001000 imes 1000 matrix and multiplies it by itself. When benchmarking this matrix multiplication in a deep learning framework like PyTorch or MXNet against NumPy, the framework's output appears to be orders of magnitude faster. While GPU execution provides significant speedup, the massive time difference primarily occurs because the framework's operations are asynchronous: the backend executes the computation while the frontend immediately returns control to Python. Accurate benchmarking requires forcing the framework to finish all backend computations prior to returning the measured time, revealing the true execution duration.

0

1

Updated 2026-05-18

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L