Floyd-Warshall uses a array dist[i][j] of size . For , this is million entries (- MB for integers). This fits in cache on modern machines. Initialize the array carefully: set diagonal to , edges to weights, rest to infinity (use a large constant like ).
Proper initialization is critical. If you mess up the initial state, the algorithm computes garbage. Memory layout affects performance on large inputs.