"Defeating Nondeterminism in LLM Inference" ( https://thinkingmachines.ai/blog/defeating-nondeterminism-in...) has a repo: https://github.com/thinking-machines-lab/batch_invariant_ops
which seems to have eventually been merged into vllm: https://docs.vllm.ai/en/latest/features/batch_invariance/
So you can get determinism locally. On a cursory search I wasn't able to find any LLM provider advertising determinism; if you need it for research you might have to rent a dedicated GPU pod and run vLLM there with the appropriate settings.
thanks
you mean... an algorithm? what is a "deterministic LLM"?
See the temperature parameter? With current tech, even if you set it to zero, providing the same input multiple times will rarely give you the exact same output.
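One commonly cited reason for this is that floating-point addition is not associative, so the order in which a kernel reduces values can change the result in the last bits. A toy sketch in plain Python (not actual inference code; `sum_in_order` is just an illustrative helper):

```python
def sum_in_order(values):
    """Accumulate floats left to right, as a simple sequential reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

# Same numbers, different order, different rounding.
print(forward == backward)
```

If two logits are nearly tied, a difference that small can be enough to flip which token wins, and everything generated afterwards diverges.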
The temperature parameter cannot be set to 0 via the GUI, only to a small value close to 0, since 0 would lead to a division by zero in the code.
In a proper implementation, 0 means greedy decoding.
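Roughly what that looks like, as a minimal sketch and not any particular library's sampler (`sample_token` and its signature are made up for illustration): temperature 0 is special-cased to an argmax, so the division never happens.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token given raw logits."""
    if temperature == 0:
        # Greedy decoding: take the highest logit, no division involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise scale the logits by 1/temperature and sample from the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.0, 2.0]
greedy_choice = sample_token(logits, 0)
sampled_choice = sample_token(logits, 0.8, random.Random(0))
```

Note that greedy decoding only removes the sampling randomness. If the logits themselves differ slightly between runs, the argmax can still change.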
yeah. because it's an LLM.
again, you are looking for an algorithm.
Besides caching I mean