Andrew Rothman
2 min read · Sep 27, 2021


Hello Konstantin,

Thank you for the feedback. Regarding the section on conjugate priors, I’m sorry if you found the Binomial and Poisson proofs confusing. However, I would push back on your comment that these proofs are somehow “improper” or incomplete; that statement is simply incorrect. Both proofs are complete and fully specified.

Concentrating first on the Binomial proof:

- the term B(α, β) is standard mathematical notation for the Beta function (not to be confused with the Beta distribution), which I would expect the audience for this piece to be familiar with. Resources for the Beta function and Beta distribution can be found here: https://en.wikipedia.org/wiki/Beta_function

https://en.wikipedia.org/wiki/Beta_distribution

- neither B(α, β) nor the binomial coefficient nCx is a function of the parameter “p”, so both can be multiplicatively absorbed into the normalizing constant NC. This leads to defining (1/NC_*) = (1/NC) · nCx · (1/B(α, β))

- both α_* and β_* are defined as functions of x, n, α, and β; specifically, α_* = α + x and β_* = β + n − x

- the last step of the proof leverages the fact that the posterior is a valid PDF, and therefore, when integrated over the support of “p”, must by definition equal 1. At this point in the proof, the functional form of the posterior looks nearly identical to that of a Beta-distributed random variable with parameters α_* and β_*, except that the multiplicative term (1/B(α_*, β_*)) is missing; in its place we have the multiplicative normalizing constant (1/NC_*). But given the constraint that the posterior, integrated over its support, must equal 1, by deduction it mathematically must be the case that NC_* = B(α_*, β_*). This leaves us with a valid posterior PDF that is Beta distributed (see the sketch after this list).
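For concreteness, here is a minimal sketch of the chain of equalities those bullets describe, with NC, NC_*, α_*, and β_* used as defined above. The layout of the algebra is mine rather than a verbatim copy of the article’s proof:

```latex
\begin{align*}
% Prior: p \sim \mathrm{Beta}(\alpha,\beta); likelihood: X \mid p \sim \mathrm{Binomial}(n,p)
\pi(p \mid x)
  &= \frac{1}{\mathrm{NC}} \binom{n}{x} p^{x}(1-p)^{n-x}
     \cdot \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} \\
% Neither nCx nor 1/B(\alpha,\beta) depends on p; absorb both into 1/NC_*
  &= \frac{1}{\mathrm{NC}_{*}}\, p^{\alpha_{*}-1}(1-p)^{\beta_{*}-1},
     \qquad \alpha_{*} = \alpha + x, \quad \beta_{*} = \beta + n - x.
\end{align*}
% Validity of the PDF: \int_0^1 \pi(p \mid x)\,dp = 1, and since
% \int_0^1 p^{\alpha_*-1}(1-p)^{\beta_*-1}\,dp = B(\alpha_*,\beta_*),
% it must be that NC_* = B(\alpha_*,\beta_*), i.e. p \mid x \sim \mathrm{Beta}(\alpha_*,\beta_*).
```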

The Poisson proof has a very similar set of justifications.
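For completeness, a parallel sketch for the Poisson case. Note I’m assuming an iid sample x_1, …, x_n and a Gamma(α, β) prior under the rate parameterization; if the article’s proof uses the scale parameterization, the update for β_* changes accordingly:

```latex
\begin{align*}
% Assumed setup: x_1,\dots,x_n \mid \lambda \ \mathrm{iid} \ \mathrm{Poisson}(\lambda);
% prior: \lambda \sim \mathrm{Gamma}(\alpha,\beta), rate parameterization
\pi(\lambda \mid x_1,\dots,x_n)
  &= \frac{1}{\mathrm{NC}} \prod_{i=1}^{n}\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
     \cdot \frac{\beta^{\alpha}\lambda^{\alpha-1}e^{-\beta\lambda}}{\Gamma(\alpha)} \\
% Every term free of \lambda is absorbed into 1/NC_*
  &= \frac{1}{\mathrm{NC}_{*}}\, \lambda^{\alpha_{*}-1} e^{-\beta_{*}\lambda},
     \qquad \alpha_{*} = \alpha + \textstyle\sum_{i} x_i, \quad \beta_{*} = \beta + n.
\end{align*}
% Integrating to 1 over \lambda \in (0,\infty) forces
% NC_* = \Gamma(\alpha_*)/\beta_*^{\alpha_*}, i.e. the posterior is \mathrm{Gamma}(\alpha_*,\beta_*).
```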

Given the proofs are fully specified, I’m going to opt to leave them as is for now.

To wrap up, I was concentrating in this piece more on numerical methods than on analytical ones. Regardless, I do think learning parametric Bayesian inference is quite challenging without a very strong foundation in probability theory and comfort with widely leveraged parametric distributions (Normal, Binomial, Poisson, Negative Binomial, Gamma, Beta, the Exponential Dispersion Family, etc.). I’m therefore expecting the audience for this piece to already have a certain amount of background knowledge.

Thank you again for the feedback.

- Andrew
