The greedy insight: at every step, connect the two shortest sticks currently available, including sticks produced by earlier merges.
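When you merge two sticks, the resulting stick is used in future merges, so its length is paid again each time it takes part in another merge. A stick merged early therefore contributes its length to more of the total cost than one merged late. Merging the smallest sticks first ensures that the lengths counted repeatedly are the small ones, while the largest sticks are added only in the last few merges.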
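A minimal sketch of the greedy rule using a min-heap, assuming the usual cost model where joining sticks of lengths `x` and `y` costs `x + y` (the function name `connect_sticks_cost` is hypothetical). The heap always yields the two shortest sticks, and the merged stick is pushed back so it competes in later merges.

```python
import heapq


def connect_sticks_cost(sticks: list[int]) -> int:
    """Minimum total cost to join all sticks into one.

    Assumes joining lengths x and y costs x + y. Greedy: always join
    the two shortest sticks currently available.
    """
    heap = list(sticks)
    heapq.heapify(heap)  # O(n) min-heap construction
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)  # shortest
        b = heapq.heappop(heap)  # second shortest
        merged = a + b
        total += merged  # pay for this merge
        heapq.heappush(heap, merged)  # merged stick re-enters the pool
    return total


print(connect_sticks_cost([2, 4, 3]))     # 14
print(connect_sticks_cost([1, 8, 3, 5]))  # 30
```

For `[1, 8, 3, 5]` the merges are 1+3=4, then 4+5=9, then 8+9=17, for a total of 4+9+17=30. Each heap operation is O(log n), so the whole loop runs in O(n log n).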
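This is the same logic as Huffman coding, which merges the two least frequent (smallest-weight) nodes first to minimize the weighted path length. The correspondence is exact: if you draw the merges as a binary tree with the original sticks as leaves, each stick's length is counted once for every merge on its path to the root, so the total cost equals the sum of each stick's length multiplied by its depth. Merging smallest first keeps the large sticks near the root, where they are counted the fewest times.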
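A small empirical check, not a proof, of the Huffman correspondence: the sketch below runs the smallest-first merge while tracking each original stick's depth in the merge tree, returns the weighted path length (sum of length times depth), and compares it against an exhaustive search over every merge order on tiny random inputs. Both function names are hypothetical, and the same assumed cost model (joining `x` and `y` costs `x + y`) applies.

```python
import heapq
import itertools
import random
from functools import lru_cache


def greedy_weighted_path_length(sticks: list[int]) -> int:
    """Smallest-first merge; return sum(length * depth) over original sticks."""
    counter = itertools.count()  # tie-breaker so heapq never compares the lists
    heap = [(s, next(counter), [i]) for i, s in enumerate(sticks)]
    heapq.heapify(heap)
    depth = [0] * len(sticks)
    while len(heap) > 1:
        a, _, leaves_a = heapq.heappop(heap)
        b, _, leaves_b = heapq.heappop(heap)
        for leaf in leaves_a + leaves_b:
            depth[leaf] += 1  # every stick inside this merge is paid for again
        heapq.heappush(heap, (a + b, next(counter), leaves_a + leaves_b))
    return sum(s * d for s, d in zip(sticks, depth))


def brute_force_cost(sticks: list[int]) -> int:
    """Try every merge order (exponential time; tiny inputs only)."""
    @lru_cache(maxsize=None)
    def best(state: tuple[int, ...]) -> int:
        if len(state) <= 1:
            return 0
        options = []
        for i, j in itertools.combinations(range(len(state)), 2):
            merged = state[i] + state[j]
            rest = [x for k, x in enumerate(state) if k not in (i, j)]
            options.append(merged + best(tuple(sorted(rest + [merged]))))
        return min(options)

    return best(tuple(sorted(sticks)))


random.seed(0)
for _ in range(200):
    sticks = [random.randint(1, 20) for _ in range(random.randint(1, 6))]
    assert greedy_weighted_path_length(sticks) == brute_force_cost(sticks)
print("greedy matches brute force on 200 random inputs")
```

For `[1, 8, 3, 5]` the depths come out as 3, 1, 3, 2, giving 1·3 + 8·1 + 3·3 + 5·2 = 30, the same total as summing the cost of each merge.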