Example of Asynchronous Benchmarking
To demonstrate the effects of asynchronous execution, consider a toy warm-up problem: generate a random matrix and multiply it by itself. If this matrix multiplication is benchmarked in a deep learning framework such as PyTorch or MXNet and compared against NumPy, the framework appears to be orders of magnitude faster. GPU execution does provide a significant speedup, but most of this gap arises because the framework's operations are asynchronous: the frontend hands the computation to the backend and immediately returns control to Python, so the timer stops before the work is done. An accurate benchmark must force the framework to finish all pending backend computations before the timer is stopped, which reveals the true execution time.
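The timing pitfall can be sketched without a GPU. The snippet below is a minimal, hypothetical simulation rather than how PyTorch or MXNet work internally: a single-worker `ThreadPoolExecutor` stands in for the framework's backend, and a naive pure-Python `matmul` (a helper introduced here for illustration) stands in for the real kernel. Submitting the work returns a `Future` immediately, so timing without waiting measures only the enqueue step, while calling `future.result()` before stopping the clock plays the role of a synchronization barrier.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Naive pure-Python matrix product; a hypothetical stand-in for a backend kernel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def benchmark(n=100, synchronize=True):
    """Time one n x n matrix product submitted to a simulated asynchronous backend."""
    rng = random.Random(0)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    # One worker thread plays the backend; the calling code plays the Python frontend.
    with ThreadPoolExecutor(max_workers=1) as backend:
        start = time.perf_counter()
        future = backend.submit(matmul, a, a)  # enqueue and return immediately
        if synchronize:
            future.result()  # barrier: block until the backend has finished
        elapsed = time.perf_counter() - start
    # Leaving the with-block waits for outstanding work, so nothing is left running.
    return elapsed


if __name__ == "__main__":
    print(f"without synchronization: {benchmark(synchronize=False):.4f} s")
    print(f"with synchronization:    {benchmark(synchronize=True):.4f} s")
```

The unsynchronized number reflects only how long it takes to hand off the work, not how long the work takes. In the real frameworks the barrier is a framework call instead of `Future.result()`, for example `torch.cuda.synchronize()` in PyTorch or `npx.waitall()` in MXNet.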
Tags: D2L, Dive into Deep Learning @ D2L