Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding

(arxiv.org)

4 points | by chrsw 5 hours ago

No comments yet.