Learn Before
Example of Benchmarking Explicit Synchronization in MXNet
Benchmarking the explicit synchronization methods in MXNet shows that, for an isolated operation, they take approximately the same time to complete, even though they affect the rest of the program very differently. For instance, timing a matrix product b = np.dot(a, a) followed by global synchronization via npx.waitall() gives one measurement. Performing the exact same calculation but synchronizing on the result alone via b.wait_to_read() takes a comparable amount of time. While the individual execution times are nearly identical, wait_to_read() is generally preferred in practice because it blocks the frontend only until that one variable has been computed, whereas waitall() blocks until every computation queued on the backend has finished.
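The comparison described above could be reproduced with a sketch along the following lines. The helper names timed and benchmark_mxnet, and the 1000 x 1000 matrix size, are assumptions made for illustration and are not part of MXNet; only np.dot, npx.set_np, npx.waitall, and wait_to_read are MXNet calls. Measured times depend on hardware, so no expected values are shown.

```python
import time


def timed(compute, sync):
    """Run compute(), then sync(result); return (result, elapsed seconds).

    Hypothetical helper: with an asynchronous backend, compute() only
    enqueues work, so the clock is stopped after sync() has blocked.
    """
    start = time.perf_counter()
    result = compute()
    sync(result)
    return result, time.perf_counter() - start


def benchmark_mxnet(n=1000):
    """Time one matrix product under each synchronization method.

    The matrix size n is an assumption chosen for illustration.
    """
    # Imported lazily so that timed() can be used without MXNet installed.
    from mxnet import np, npx
    npx.set_np()

    a = np.random.normal(size=(n, n))
    a.wait_to_read()  # make sure the input is ready before timing starts

    # Global synchronization: block until all queued computation is done.
    _, t_all = timed(lambda: np.dot(a, a), lambda b: npx.waitall())
    # Variable-specific synchronization: block only until b is available.
    _, t_one = timed(lambda: np.dot(a, a), lambda b: b.wait_to_read())
    return {"waitall": t_all, "wait_to_read": t_one}
```

Calling benchmark_mxnet() returns the two elapsed times in seconds. With nothing else queued on the backend, the two values should be close, which is why a single isolated measurement does not reveal the difference between the methods.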
Tags
D2L
Dive into Deep Learning @ D2L