From Two Dimensions to One

Original DP: $dp[i][k]$ = best cost to partition positions $1$ to $i$ into exactly $k$ segments. This is $O(n \cdot K)$ states. Penalized DP: $dp_{\lambda}[i]$ = best (cost + $\lambda \times$ count) to partition positions $1$ to $i$ into any number of segments.

This is $O(n)$ states. The penalized DP is much faster. You compute it for different $\lambda$ values until you find the one that produces exactly $K$ segments.