So still in regime of representing actions with discrete tokens
Trains 5x faster than diffusion or flow matching
You take a normalized action chunk, perform a discrete cosine transform to get the frequency components, then quantize, then convert to a sparse frequency matrix, then flatten by sequencing the lowest frequencies first, then train a BPE tokenizer
combining the DCT with BPE yields sequences 10x shorter than otherwise