Minibatch Synchronization to Prevent Task Queue Overflow
Asynchronous execution keeps the Python frontend responsive by letting it enqueue operations without waiting for their results. That responsiveness carries a risk: if the frontend submits work faster than the backend can process it, the task queue can grow without bound and consume excessive memory. To prevent such overflow, insert a synchronization barrier after each minibatch during training. The barrier makes the frontend pause briefly while the backend catches up, which keeps the two approximately in step and bounds the queue's memory footprint while retaining the major throughput advantages of asynchronous execution.
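The effect of the barrier can be illustrated with a toy simulation in plain Python. This is only a sketch and not how any framework's engine is implemented: the backend is a single worker thread, each operation is assumed to take about 1 ms, and the function name run_training and its parameters are hypothetical. In a real training loop the barrier would be the framework's own synchronization call, for example npx.waitall() in MXNet or torch.cuda.synchronize() in PyTorch.

```python
import queue
import threading
import time


def run_training(num_batches, ops_per_batch, sync_every_batch):
    """Toy model of an asynchronous engine (hypothetical helper).

    Returns the largest number of pending tasks the frontend observed.
    """
    tasks = queue.Queue()  # unbounded, like an unchecked task queue
    peak = 0

    def backend():
        # The backend drains the queue one operation at a time.
        while True:
            task = tasks.get()
            if task is None:  # sentinel: no more work
                tasks.task_done()
                return
            time.sleep(0.001)  # assumption: each operation takes about 1 ms
            tasks.task_done()

    worker = threading.Thread(target=backend, daemon=True)
    worker.start()

    for _ in range(num_batches):
        for _ in range(ops_per_batch):
            tasks.put("op")  # the frontend enqueues without waiting
            peak = max(peak, tasks.qsize())
        if sync_every_batch:
            # Barrier: block until the backend has finished every queued task.
            tasks.join()

    tasks.put(None)  # tell the backend to shut down
    tasks.join()
    worker.join()
    return peak


if __name__ == "__main__":
    print("peak queue depth with per-minibatch sync:",
          run_training(20, 5, sync_every_batch=True))
    print("peak queue depth without sync:",
          run_training(20, 5, sync_every_batch=False))
```

With the barrier in place, the queue is empty at the start of every minibatch, so the observed depth cannot exceed ops_per_batch. Without it, the frontend runs ahead and the depth grows with the total amount of work submitted.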
Tags
D2L
Dive into Deep Learning @ D2L
Related
Global Synchronization in MXNet
Variable-Specific Synchronization in MXNet
Implicit Blockers in Deep Learning Frameworks
Global Synchronization in PyTorch
Example of Asynchronous Benchmarking
Scheduling Overhead in Multithreaded Deep Learning Systems
Example of Synchronous vs. Asynchronous Increment Benchmark
Chip Vendor Performance Analysis Tools for Deep Learning
Automatic Multi-GPU Parallelism via Asynchronous Execution