Enhanced MTPP Slicing of Topological Order
The Max-Throughput Partitioning Problem (MTPP) is fundamental to our goal of optimizing the partitioning of computation graphs for distributed deep neural network inference. Acknowledging MTPP's NP-hardness which initially is established through a reduction from the minimum makespan scheduling problem highlights the impracticality of seeking a fully polynomial time approximation scheme. This emphasizes the need for a heuristic approach to achieve high throughput pipelined inference. We integrate a heuristic algorithm that leverages the strengths of Kahn's algorithm for topological sorting which is enhanced by dynamic programming and segment trees. This approach aims to balance the computational load across the nodes and minimize the communication overhead, which is critical for maintaining high throughput in distributed systems. Kahn's algorithm is chosen for its effectiveness in ensuring a valid topological order, which is essential for acyclic partitioning. Moreover, dynamic programming offers an optimal partition strategy across these ordered segments, and segment trees are utilized for their efficiency in querying segment costs which enables rapid adjustments to partitioning as we explore the solution space for an optimal configuration.
In our approach, the introduction of the segment cost data structure is crucial, as it enables the efficient computation of costs for any segment within the topological order. Initialized with a computation graph GG and a topological order ππ, it calculates costs that include both computational workload and communication overhead. The SliceGraph algorithm plays a central role in our approach, leveraging the segment cost data structure to partition the computation graph into segments that optimize throughput while minimizing latency. It dynamically partitions the graph into a specified number of blocks, ensuring equitable workload distribution and minimizing inter-node communication.
The initialization time complexity of the segment cost data structure is O(n2)O(n2) for a graph with nn nodes, enabling the pre-computation of costs associated with all possible segments. This setup is crucial for the efficient execution of the SliceGraph algorithm. The overall time complexity of the SliceGraph algorithm, before applying the convex hull trick, is O(n2k+mlog2(n))O(n2k+mlog2(n)) where kk is the number of blocks into which the graph is partitioned, and mm is the number of edges in the graph. This complexity takes into account the dynamic programming process and the cost queries facilitated by the segment cost data structure. By applying the convex hull trick, we optimize the dynamic programming aspect of the SliceGraph algorithm, reducing its time complexity to O(nklogn)O(nklogn). This optimization significantly enhances the efficiency of partitioning, allowing for quicker determination of the optimal graph partitioning strategy.
Biased Random-Key Genetic Algorithm
The integration of the Biased Random-Key Genetic Algorithm (BRKGA) into our approach addresses the inherent limitations of the above heuristic-based methods by introducing a stochastic and evolutionary mechanism. This allows for an expansive exploration of partition configurations, crucial for solving the NP-hard Max-Throughput Partitioning Problem (MTPP) efficiently. BRKGA's capability to evolve partitioning strategies over generations enables the discovery of high-quality solutions that balance the computational load and minimize communication overhead, which deterministic methods alone may not identify, ensuring optimal throughput in distributed deep neural network inference.
Last updated