Steering interpretable language models with concept algebra

(guidelabs.ai)

77 points | by luulinh90s 5 days ago

8 comments