Careful:
- "entropy" is information; "information", therefore, already is surprise; thus it's dangerous to re-define "surprise" as -log P(x), which is already part of the definition of surprise, as that leads to ambiguity and circularity;
- KL divergence is relative entropy (the added surprise from a second distribution, given a first, so _relative_ surprise);
- I would caution about terms like "expected surprise" for the same reason as I object to "dry water"...
OP is correct; surprisal is outcome-dependent and entropy is distribution-dependent
- entropy is E_p[informativeness of measuring outcome x]
- take n outcomes; a distribution over them lives on the simplex Δ^(n-1). You can lift this to R^n via the log odds map p_k -> x_k = log p_k -- now x ∈ R^n can describe a histogram with n-1 degrees of freedom
- in log odds space, measurement is literally a linear functional from the vector space of log probabilities onto the index of the outcome k.
- imo surprisal of some p(x) is best understood as "the length of a pointer", entropy "the rarity-weighted average length of a pointer", and collision entropy "how specific you would have to be to describe witnessing a specific outcome"
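The three quantities in that last bullet are easy to compute side by side; a minimal sketch (the distribution is just an illustrative example):

```python
import math

# Hypothetical distribution over four outcomes (illustration only).
p = [0.5, 0.25, 0.125, 0.125]

# Surprisal of each outcome: -log2 p(x), in bits ("the length of a pointer").
surprisal = [-math.log2(pk) for pk in p]  # [1.0, 2.0, 3.0, 3.0]

# Shannon entropy: the rarity-weighted (i.e. p-weighted) average pointer length.
H = sum(pk * s for pk, s in zip(p, surprisal))  # 1.75 bits

# Collision entropy (Renyi order 2): -log2 of the collision probability,
# i.e. how specific you must be to pin down witnessing one outcome twice.
H2 = -math.log2(sum(pk * pk for pk in p))  # ~1.54 bits
```

Note that H2 <= H always holds, with equality only for the uniform distribution.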
and in the same way, you might get by calling a single molecule of water dry
Hi, author here! Thanks for the feedback; as I mentioned, this is also to clarify things for myself, so this helps a lot.
Regarding your points:
- I'm not sure I get your meaning here. My understanding is that for a random variable X, the surprise is defined at the outcome level, I(x) = -log p(x), while the entropy is essentially just its average value, H(X) = -sum_x p(x) log p(x). So to me it does look like entropy is expected surprise, no? I do agree, though, that by being _expected_ surprise, entropy is itself a measure of surprise.
- I very much agree with that, which is why I used _excess_ surprise (maybe _relative_ is a better choice, but the intent is the same).
- That one I'm also confused about. It gets back to my first point: to me, surprise (or information) is always defined at the outcome level first, so taking a moment is not tautological; it's meaningful, no?
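To make the "entropy is expected surprise" and "KL is excess surprise" readings concrete, here is a small numerical sketch (the two distributions are hypothetical, chosen only to illustrate):

```python
import math

p = [0.5, 0.25, 0.25]        # "true" distribution (illustrative)
q = [1/3, 1/3, 1/3]          # model distribution (illustrative)

# Entropy as expected surprise: H(p) = E_p[-log2 p(x)]
H = sum(pk * -math.log2(pk) for pk in p)  # 1.5 bits

# Cross-entropy: expected surprise when outcomes follow p
# but you measure surprise against q.
H_pq = sum(pk * -math.log2(qk) for pk, qk in zip(p, q))

# KL(p || q): the *excess* (relative) surprise incurred by using q,
# i.e. cross-entropy minus entropy. Always >= 0.
kl = H_pq - H
```

This is exactly the "excess surprise" framing: KL(p||q) = E_p[(-log q(x)) - (-log p(x))], the average extra surprise per outcome from believing q when p is the case.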