The WINDOW_PATTERN string controls which layers use sliding window versus full context attention.
S layers use a half-context sliding window. If MAX_SEQ_LEN is , each S layer attends to tokens. L layers use full-context attention across all tokens. The default pattern is "SSSL": three short layers, then one long layer, repeating.
Agents have found that adding more S layers helps. "SSSSL" gave a improvement. Shorter windows also worked: quarter-context () and eighth-context (, or tokens). Going too extreme fails. Sixteenth-context and all-short patterns both regressed.