Learn Before
Example of Benchmarking Explicit Synchronization in MXNet
Benchmarking different explicit synchronization methods in MXNet reveals that isolated operations can take approximately the same time to complete, despite having vastly different impacts on the overall computational graph. For instance, timing a simple matrix dot product operation b = np.dot(a, a) and then invoking global synchronization via npx.waitall() might take roughly 0.0180 seconds. Alternatively, performing the exact same calculation but using variable-specific synchronization via b.wait_to_read() takes a comparable 0.0189 seconds. While the individual execution times are nearly identical, wait_to_read() is generally preferred in practice because it only halts the specific branch of computation rather than stalling the entire global backend queue.
0
1
Tags
D2L
Dive into Deep Learning @ D2L