Learn Before
Asynchronous Execution in Deep Learning Frameworks
Many deep learning frameworks, including MXNet and PyTorch (for GPU operations), execute operations asynchronously in the backend by default. When a user issues a command from a frontend language such as Python, the task is placed into a backend queue and control returns to the frontend immediately, without waiting for the computation to finish. The frontend thread can therefore keep issuing subsequent statements while earlier ones are still running, so the relatively slow frontend language does not become a bottleneck for the heavy computations running concurrently on hardware accelerators such as GPUs.
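The queue-based handoff described above can be illustrated with a toy simulation in plain Python. This is only a sketch of the idea, not how any framework implements its backend: `ToyBackend`, `submit`, `synchronize`, and `slow_square` are hypothetical names invented for this example, and a worker thread with `time.sleep` stands in for accelerator work.

```python
import queue
import threading
import time


class ToyBackend:
    """Hypothetical stand-in for a framework backend (not a real framework API)."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.results = []
        # A single worker thread plays the role of the backend.
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Backend: pull tasks off the queue and execute them in order.
        while True:
            func, args = self._tasks.get()
            self.results.append(func(*args))
            self._tasks.task_done()

    def submit(self, func, *args):
        # Frontend call: enqueue the task and return immediately.
        self._tasks.put((func, args))

    def synchronize(self):
        # Block the frontend until every queued task has finished.
        self._tasks.join()


def slow_square(x):
    time.sleep(0.05)  # pretend this is heavy accelerator work
    return x * x


backend = ToyBackend()

start = time.perf_counter()
for i in range(4):
    backend.submit(slow_square, i)  # returns without waiting for the result
enqueue_time = time.perf_counter() - start

backend.synchronize()  # wait for the backend to drain its queue
total_time = time.perf_counter() - start

print(f"time to enqueue: {enqueue_time:.4f}s")
print(f"time after synchronize: {total_time:.4f}s")
```

In this sketch the enqueue loop finishes almost at once, while the elapsed time measured after `synchronize()` includes the actual work. The same gap explains why timing asynchronous framework code without an explicit synchronization point tends to measure only the cost of issuing the operations.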
Tags
D2L
Dive into Deep Learning @ D2L
Learn After
Global Synchronization in MXNet
Variable-Specific Synchronization in MXNet
Implicit Blockers in Deep Learning Frameworks
Global Synchronization in PyTorch
Example of Asynchronous Benchmarking
Scheduling Overhead in Multithreaded Deep Learning Systems
Example of Synchronous vs. Asynchronous Increment Benchmark
Minibatch Synchronization to Prevent Task Queue Overflow
Chip Vendor Performance Analysis Tools for Deep Learning
Automatic Multi-GPU Parallelism via Asynchronous Execution