class: center, middle, inverse, title-slide # Introduction to Computational Bayesian Methods for Actuaries ### Michael Jones ### 10 October 2018 --- class: center, inverse, middle .emph_light[Actuaries]<br> .emph_dark[should be]<br> .emph_light[scientific]<br> --- class: center, inverse, middle .emph_light[Actuaries]<br> .emph_dark[should be]<br> .emph_light[Bayesian]<br> --- class: center, middle .emph_blue[What?] --- class: center, middle .emph_blue[Why?] --- class: center, middle .emph_blue[How?] --- class: middle, inverse .emph_dark[Bayesian]<br> .emph_dark[Recap] --- class: middle, center <span style="font-size:600%"> `$$P(\theta | D) = \frac{P(D|\theta)P(\theta)}{\int d\theta' P(D|\theta')P(\theta')}$$` </span> --- class: middle, center <span style="font-size:600%"> `$$\color{lightgrey}{P(\theta | D) = \frac{P(D|\theta)\color{black}{P(\theta)}}{\int d\theta' P(D|\theta')P(\theta')}}$$` </span> --- class: middle, center <span style="font-size:600%"> `$$\color{lightgrey}{P(\theta | D) = \frac{\color{black}{P(D|\theta)}{P(\theta)}}{\int d\theta' P(D|\theta')P(\theta')}}$$` </span> --- class: middle, center <span style="font-size:600%"> `$$\color{lightgrey}{\color{black}{P(\theta | D)} = \frac{P(D|\theta){P(\theta)}}{\int d\theta' P(D|\theta')P(\theta')}}$$` </span> --- class: middle, center <span style="font-size:600%"> `$$\color{lightgrey}{P(\theta | D) = \frac{P(D|\theta){P(\theta)}}{\color{black}{\int d\theta' P(D|\theta')P(\theta')}}}$$` </span> --- class: middle, inverse .emph_dark[Why be]<br> .emph_dark[Bayesian] --- class: middle .emph_blue[It works] ??? It produces results that are testable and given the right models generally are as good (sometimes better) than frequentist analysis. --- class: middle .emph_blue[It's coherent] ??? Most Bayesian analysis stems from a few foundational ideas compared to frequentist statistics which has a much more varied set of fundamentals. Posterior information is rich and useful --- class: middle .emph_blue[It's natural] ??? Humans naturally think in Bayesian terms which are then 'unlearned' when studying frequentist analysis. --- class: middle .emph_blue[It's self]<br> .emph_blue[documenting] ??? Through the priors, assumptions are formally absorbed into the analysis and cannot be hidden as in classical statistics. --- class: middle .emph_blue[You get the]<br> .emph_blue[full Posterior] --- class: middle .emph_blue[It's modular] ??? Multilevel models, choice of priors, e.g. simple to make robust to outliers --- class: middle, inverse .emph_dark[Why ] .emph_light[not ] .emph_dark[be]<br> .emph_dark[Bayesian] --- class: middle .emph_blue[What]<br> .emph_blue[prior?] --- class: middle .emph_blue[It's]<br> .emph_blue[awkward] ??? A whole posterior distribution can be difficult to communicate/process --- class: middle .emph_blue[It's]<br> .emph_blue[hard] ??? ok if conjugate pairs, but otherwise, it's difficult to do without MCMC or other computationally intensive programs. --- class: middle .emph_blue[It's]<br> .emph_blue[weird] ??? It's not what people are used to, and the themes are unfamiliar, though it's gaining traction in social sciences and pharmacology (and election prediction) --- class: middle, inverse .emph_light[Frequentism]<br> .emph_dark[vs]<br> .emph_light[Bayesianism] --- class: middle .emph_blue[Probability] --- class: middle .emph_blue[Data] --- class: middle .emph_blue[Parameters] --- class: middle .emph_blue[Ranges] ---
*(Figure: Frequentist Confidence Intervals for 100 draws from a Binomial with mean 15.)*
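
As a rough sketch of how intervals like these can be produced in R, assuming "a Binomial with mean 15" means something like Binomial(100, 0.15); this is not the original plotting code:

```r
# A hedged sketch, not the original plotting code: repeated Binomial(100, 0.15)
# samples (mean 15), each with a 95% confidence interval from binom.test().
set.seed(1)
n <- 100; p <- 0.15

draws <- rbinom(50, size = n, prob = p)        # 50 repeated experiments
cis <- t(sapply(draws, function(x) binom.test(x, n)$conf.int))
colnames(cis) <- c("lower", "upper")

# the frequentist claim: in the long run about 95% of such intervals cover p
mean(cis[, "lower"] <= p & p <= cis[, "upper"])
```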
---
*(Figure: Bayesian Highest Density Intervals: a posterior histogram (count against y) with a 95% HDI from 0.05 to 0.57.)*
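
A highest density interval is the narrowest interval holding the stated share of the posterior. A minimal R sketch of computing one from posterior draws; the Beta(3, 4) posterior below is an arbitrary stand-in, not the one plotted:

```r
# A minimal sketch of a highest density interval from posterior draws.
hdi <- function(samples, credible_mass = 0.95) {
  sorted <- sort(samples)
  n <- length(sorted)
  n_kept <- ceiling(credible_mass * n)          # draws inside the interval
  starts <- seq_len(n - n_kept + 1)
  widths <- sorted[starts + n_kept - 1] - sorted[starts]
  best <- which.min(widths)                     # narrowest such interval
  c(lower = sorted[best], upper = sorted[best + n_kept - 1])
}

set.seed(1)
hdi(rbeta(10000, 3, 4))                         # hypothetical posterior draws
```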
--- class: middle .emph_blue[p-values &]<br> .emph_blue[significance]<br> .emph_blue[testing] --- class: middle, inverse .emph_dark[How to be]<br> .emph_dark[Bayesian] --- class: middle .emph_blue[The Bayesian]<br> .emph_blue[Workflow] --- class: middle - Identify your data - Define a descriptive model - Specify a prior - Compute the Posterior - Interpret the Posterior - Check the model is reasonable --- class: middle .emph_dark[Analytically:]<br> .emph_blue[Estimating the]<br> .emph_blue[Bias of a coin] --- ##Parameter <span style="font-size:400%"> `$$\theta = P(y_i = \text{heads})$$` </span> --- ## Likelihood ### Bernoulli: <span style="font-size:400%"> $$ p(y|\theta) = \theta^y(1 - \theta)^{(1-y)}$$ </span> --- ## Prior ### Beta <span style="font-size:400%"> `$$\begin{align} P(\theta|a,b) &= \text{beta}(a,b)\\\\ &=\frac{\theta^{(a-1)}(1-\theta)^{(b-1)}}{B(a,b)} \end{align}$$` </span> ---
*(Figure: a grid of Beta(a, b) densities over θ for a, b in {0.1, 1, 2, 3, 4, 20}.)*
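
A minimal R sketch of drawing such a panel of Beta densities (assuming ggplot2; not necessarily the code used for the slide):

```r
# A minimal sketch of a panel of Beta(a, b) densities.
library(ggplot2)

params <- expand.grid(a = c(0.1, 1, 2, 3, 4, 20),
                      b = c(0.1, 1, 2, 3, 4, 20))
theta  <- seq(0.001, 0.999, length.out = 200)

grid <- do.call(rbind, lapply(seq_len(nrow(params)), function(i) {
  data.frame(a = params$a[i], b = params$b[i], theta = theta,
             density = dbeta(theta, params$a[i], params$b[i]))
}))

ggplot(grid, aes(theta, density)) +
  geom_line() +
  facet_grid(b ~ a, labeller = label_both)
```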
--- ## Posterior ### Another Beta <span style="font-size:300%"> `$$\begin{align} P(\theta|n_{heads},n_{tails},a,b) &= \text{beta}(a+n_{heads},b+n_{tails})\\\\ &=\frac{\theta^{(a+n_{heads}-1)}(1-\theta)^{(b+n_{tails}-1)}}{B(a+n_{heads},b+n_{tails})} \end{align}$$` </span> --- # Live demo time: https://mjones.shinyapps.io/coin/ .pull-left[ ## If it works click [here](#47) ] .pull-right[ ## If it doesn't work click [here](#43) ] --- # No idea about the coin Flat `\(\text{beta}(1,1)\)` distribution:
*(Figure: the flat Beta(1, 1) prior, constant over θ from 0 to 1.)*
--- # Collect some data 7 Heads, 5 tails:
*(Figure: the likelihood of 7 heads and 5 tails as a function of theta, peaking near 3e-04.)*
--- # Posterior Another Beta: `\(Beta(8,6)\)`
*(Figure: the resulting posterior density over theta.)*
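
Putting the last few slides together, a minimal R sketch of the conjugate update (flat Beta(1, 1) prior, 7 heads and 5 tails):

```r
# A minimal sketch of the conjugate update: flat Beta(1, 1) prior,
# 7 heads and 5 tails, so the posterior is Beta(1 + 7, 1 + 5) = Beta(8, 6).
theta <- seq(0, 1, length.out = 500)

prior      <- dbeta(theta, 1, 1)              # flat prior
likelihood <- theta^7 * (1 - theta)^5         # Bernoulli likelihood of the data
posterior  <- dbeta(theta, 1 + 7, 1 + 5)      # conjugate result

# prior * likelihood is proportional to the posterior; plot the exact form
plot(theta, posterior, type = "l", ylab = "density")
```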
--- # Strong Prior
*(Figure: prior, likelihood and posterior panels for the strong-prior case, plotted against theta.)*
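
The exact strong prior is not reproduced here, so as a hedged sketch the code below assumes a Beta(20, 20) prior with the same 7 heads and 5 tails:

```r
# A hedged sketch of the strong-prior case: the Beta(20, 20) prior is an
# assumed example, not necessarily the prior used for the slide.
theta <- seq(0, 1, length.out = 500)

prior     <- dbeta(theta, 20, 20)             # strongly concentrated on 0.5
posterior <- dbeta(theta, 20 + 7, 20 + 5)     # same 7 heads, 5 tails

# with a strong prior the same data shifts the posterior only slightly
plot(theta, posterior, type = "l", ylab = "density")
lines(theta, prior, lty = 2)
```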
--- class: middle .emph_blue[Problems] ??? - Conjugate priors - calculating the normalisation integral --- class: middle .emph_blue[What prior]<br> .emph_blue[to use?] --- class: middle .emph_blue[Normalising] --- class: middle .emph_dark[Computationally:]<br> .emph_blue[First Steps] ---
*(Figure: a density curve, y against x on [0, 1].)*
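
A minimal sketch of the idea behind these first computational steps, assuming they illustrate grid evaluation: compute prior times likelihood at a set of points and normalise by the sum.

```r
# A minimal sketch, assuming grid evaluation of the posterior:
# compute prior * likelihood on a grid of points and normalise by the sum.
theta <- seq(0, 1, length.out = 21)           # a coarse grid over the parameter

prior      <- dbeta(theta, 1, 1)
likelihood <- theta^7 * (1 - theta)^5         # same coin data as before
unnorm     <- prior * likelihood

posterior <- unnorm / sum(unnorm)             # probabilities on the grid points
plot(theta, posterior, type = "h")
```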
---
*(Figure: y against x on [0, 1].)*
---
*(Figure: y against x on [0, 1].)*
---
*(Figure: y against x on [0, 1].)*
---
*(Figure: y against x on [0, 1], with y reaching roughly 20.)*
??? - If the posterior is narrower, you waste a lot of time/computing power calculating the posterior in places which do not matter - Not only that, but as your dimensions increase (i.e. number of parameters), the number of points increases exponentially. --- class: middle .emph_dark[Computationally:]<br> .emph_blue[MCMC] --- class:middle .emph_blue[Markov Chain]<br> .emph_blue[Monte Carlo] --- class: middle .emph_dark[Markov Chain]<br> .emph_blue[Monte Carlo] --- class: middle .emph_blue[Markov Chain]<br> .emph_dark[Monte Carlo] --- class: middle .emph_blue[Metropolis] --- # Metropolis Algorithm ![](metropolis_pics/1_current_place.svg)<!-- --> --- # Metropolis Algorithm ![](metropolis_pics/2_choices.svg)<!-- --> --- # Metropolis Algorithm ![](metropolis_pics/3_chosen.svg)<!-- --> --- # Metropolis Algorithm ![](metropolis_pics/4_definitely_move.svg)<!-- --> --- # Metropolis Algorithm ![](metropolis_pics/5_maybe_move.svg)<!-- --> --- # Metropolis Algorithm Example -- - 50 Islands -- - Populations in ratio 1:2:3:...:50 -- - Want to visit each in accordance with its population ---
*(Figure: island number (0 to 50) against step number for the Metropolis chain, showing the first 500 steps ("beginning") and steps 999,500 to 1,000,000 ("end").)*
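
A minimal R sketch of the island-hopping Metropolis sampler described above (50 islands, populations in ratio 1:2:...:50); illustrative code, not the exact simulation behind these plots:

```r
# A minimal sketch of the island-hopping Metropolis sampler: 50 islands with
# populations proportional to 1:50.
set.seed(42)
n_islands <- 50
n_steps   <- 1e5
position  <- integer(n_steps)
current   <- 10                               # arbitrary starting island

for (step in seq_len(n_steps)) {
  proposal <- current + sample(c(-1, 1), 1)   # propose a neighbouring island
  if (proposal < 1 || proposal > n_islands) proposal <- current
  # accept with probability min(1, population(proposal) / population(current))
  if (runif(1) < proposal / current) current <- proposal
  position[step] <- current
}

table(position) / n_steps                     # roughly proportional to 1:50
```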
---
*(Figure: visiting proportion by island number after 1, 10, 20, 100, 200, 300, 1,000, 10,000 and 1,000,000 steps of the chain.)*
--- class: middle .emph_blue[Don't make]<br> .emph_blue[your own] --- class: middle .emph_dark[Computationally:]<br> .emph_blue[In practice] --- # Two Coins ![](two_coins.svg)<!-- --> --- class: center, middle ``` ## Markov Chain Monte Carlo (MCMC) output: ## Start = 501 ## End = 531 ## Thinning interval = 1 ## theta[1] theta[2] ## [1,] 0.5331603 0.7960773 ## [2,] 0.7224562 0.4499415 ## [3,] 0.7102999 0.2461300 ## [4,] 0.5370162 0.5314258 ## [5,] 0.5014244 0.2403774 ## [6,] 0.8557981 0.5679354 ## [7,] 0.4511761 0.5866845 ## [8,] 0.5826508 0.4369579 ## [9,] 0.6994008 0.3722830 ## [10,] 0.3815623 0.3713527 ## [11,] 0.8098616 0.3163093 ## [12,] 0.5554198 0.4975122 ## [13,] 0.4028894 0.3134066 ## [14,] 0.3785868 0.3637059 ## [15,] 0.4323854 0.5251539 ## [16,] 0.7882131 0.4448705 ## [17,] 0.6833106 0.4087616 ## [18,] 0.6819669 0.4665725 ## [19,] 0.7579478 0.3968828 ## [20,] 0.5021557 0.4240982 ## [21,] 0.7010325 0.4487093 ## [22,] 0.5308746 0.4451829 ## [23,] 0.6973850 0.4182190 ## [24,] 0.5973255 0.5136492 ## [25,] 0.8815459 0.3668754 ## [26,] 0.7933463 0.4132620 ## [27,] 0.6295374 0.1937704 ## [28,] 0.7943369 0.5278090 ## [29,] 0.5191301 0.3335732 ## [30,] 0.5250645 0.4350766 ## [31,] 0.4926411 0.3410629 ``` ---
*(Figure: posterior summaries for the two-coin model. theta[1]: mode 0.657, 95% HDI 0.401 to 0.872. theta[2]: mode 0.424, 95% HDI 0.19 to 0.682. theta[1] - theta[2]: mode 0.239, 95% HDI -0.134 to 0.551, with 11.8% of the posterior below 0 and 88.2% above. A scatter plot shows theta[1] against theta[2].)*
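
When conjugate priors apply, a comparison of this kind can also be sketched directly in R. The flip counts below are hypothetical stand-ins, not the data behind the output above:

```r
# A hedged sketch of comparing two coins with Beta(1, 1) priors.
set.seed(1)
heads1 <- 6; flips1 <- 9      # assumed data, coin 1
heads2 <- 4; flips2 <- 10     # assumed data, coin 2

theta1 <- rbeta(10000, 1 + heads1, 1 + flips1 - heads1)
theta2 <- rbeta(10000, 1 + heads2, 1 + flips2 - heads2)

delta <- theta1 - theta2
mean(delta > 0)                       # P(coin 1 more biased towards heads)
quantile(delta, c(0.025, 0.975))      # a central 95% interval for the difference
```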
--- class: middle, inverse .emph_dark[Bayesianism]<br> .emph_dark[applied to]<br> .emph_dark[Insurance] --- class: middle, center <table> <thead> <tr> <th style="text-align:left;"> AY </th> <th style="text-align:right;"> premium </th> <th style="text-align:right;"> 6 </th> <th style="text-align:right;"> 18 </th> <th style="text-align:right;"> 30 </th> <th style="text-align:right;"> 42 </th> <th style="text-align:right;"> 54 </th> <th style="text-align:right;"> 66 </th> <th style="text-align:right;"> 78 </th> <th style="text-align:right;"> 90 </th> <th style="text-align:right;"> 102 </th> <th style="text-align:right;"> 114 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1991 </td> <td style="text-align:right;"> 10,000 </td> <td style="text-align:right;"> 358 </td> <td style="text-align:right;"> 1,125 </td> <td style="text-align:right;"> 1,735 </td> <td style="text-align:right;"> 2,183 </td> <td style="text-align:right;"> 2,746 </td> <td style="text-align:right;"> 3,320 </td> <td style="text-align:right;"> 3,466 </td> <td style="text-align:right;"> 3,606 </td> <td style="text-align:right;"> 3,834 </td> <td style="text-align:right;"> 3,901 </td> </tr> <tr> <td style="text-align:left;"> 1992 </td> <td style="text-align:right;"> 10,400 </td> <td style="text-align:right;"> 352 </td> <td style="text-align:right;"> 1,236 </td> <td style="text-align:right;"> 2,170 </td> <td style="text-align:right;"> 3,353 </td> <td style="text-align:right;"> 3,799 </td> <td style="text-align:right;"> 4,120 </td> <td style="text-align:right;"> 4,648 </td> <td style="text-align:right;"> 4,914 </td> <td style="text-align:right;"> 5,339 </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1993 </td> <td style="text-align:right;"> 10,800 </td> <td style="text-align:right;"> 291 </td> <td style="text-align:right;"> 1,292 </td> <td style="text-align:right;"> 2,219 </td> <td style="text-align:right;"> 3,235 </td> <td style="text-align:right;"> 3,986 </td> <td style="text-align:right;"> 4,133 </td> <td style="text-align:right;"> 4,629 </td> <td style="text-align:right;"> 4,909 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1994 </td> <td style="text-align:right;"> 11,200 </td> <td style="text-align:right;"> 311 </td> <td style="text-align:right;"> 1,419 </td> <td style="text-align:right;"> 2,195 </td> <td style="text-align:right;"> 3,757 </td> <td style="text-align:right;"> 4,030 </td> <td style="text-align:right;"> 4,382 </td> <td style="text-align:right;"> 4,588 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1995 </td> <td style="text-align:right;"> 11,600 </td> <td style="text-align:right;"> 443 </td> <td style="text-align:right;"> 1,136 </td> <td style="text-align:right;"> 2,128 </td> <td style="text-align:right;"> 2,898 </td> <td style="text-align:right;"> 3,403 </td> <td style="text-align:right;"> 3,873 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1996 </td> <td style="text-align:right;"> 12,000 </td> <td style="text-align:right;"> 396 </td> <td style="text-align:right;"> 1,333 </td> <td style="text-align:right;"> 2,181 </td> <td style="text-align:right;"> 2,986 </td> <td style="text-align:right;"> 3,692 </td> <td style="text-align:right;"> 
</td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1997 </td> <td style="text-align:right;"> 12,400 </td> <td style="text-align:right;"> 441 </td> <td style="text-align:right;"> 1,288 </td> <td style="text-align:right;"> 2,420 </td> <td style="text-align:right;"> 3,483 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1998 </td> <td style="text-align:right;"> 12,800 </td> <td style="text-align:right;"> 359 </td> <td style="text-align:right;"> 1,421 </td> <td style="text-align:right;"> 2,864 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 1999 </td> <td style="text-align:right;"> 13,200 </td> <td style="text-align:right;"> 377 </td> <td style="text-align:right;"> 1,363 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> 2000 </td> <td style="text-align:right;"> 13,600 </td> <td style="text-align:right;"> 344 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> ---
*(Figure: cumulative claim amounts against development period, one line per accident year 1991 to 2000.)*
--- # The model See [here](https://magesblog.com/post/2015-11-10-hierarchical-loss-reserving-with-stan/) `$$\begin{align} CL_{AY,dev} &\sim \mathcal{N}(\mu_{AY, dev}, \sigma^2_{dev})\\ \mu_{AY,dev} &= ULT_{AY} \cdot G(dev|\omega, \theta)\\ \sigma_{dev}&=\sigma \sqrt{\mu_{dev}}\\ ULT_{AY} &\sim \mathcal{N}(\mu_{ult}, \sigma^2_{ult})\\ G(dev|\omega,\theta) &=1-\exp\left(-\left(\frac{dev}{\theta}\right)^{\omega}\right) \end{align}$$` --- # Stan model output ``` ## Inference for Stan model: MultiLevelGrowthCurve. ## 4 chains, each with iter=7000; warmup=2000; thin=2; ## post-warmup draws per chain=2500, total post-warmup draws=10000. ## ## mean se_mean sd 50% 75% 97.5% n_eff Rhat ## mu_ult 5355.32 5.74 275.18 5344.66 5532.70 5914.99 2300 1 ## omega 1.30 0.00 0.03 1.30 1.32 1.36 2933 1 ## theta 47.50 0.06 2.51 47.32 49.02 52.91 1712 1 ## sigma_ult 595.88 2.44 170.08 567.25 682.87 1014.74 4868 1 ## sigma 3.08 0.01 0.33 3.05 3.28 3.82 4228 1 ## ## Samples were drawn using NUTS(diag_e) at Tue Oct 9 18:44:13 2018. ## For each parameter, n_eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor on split chains (at ## convergence, Rhat=1). ``` ---
*(Figure: posterior densities (value against density) of the ultimate loss parameters ult.1 to ult.10, one panel per accident year.)*
---
*(Figure: cumulative claims (cum) against development period (dev), faceted by accident year 1991 to 2000.)*
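
A minimal R sketch of the growth curve from the model above, evaluated at the posterior means reported by Stan (mu_ult about 5355, omega about 1.30, theta about 47.5); it reproduces the general shape of the fitted curves, not the full posterior:

```r
# A minimal sketch of the growth curve evaluated at the Stan posterior means.
# This shows the general shape of the fit, not the posterior uncertainty.
growth <- function(dev, omega, theta) 1 - exp(-(dev / theta)^omega)

dev <- seq(0, 120, by = 6)
mu  <- 5355 * growth(dev, omega = 1.30, theta = 47.5)  # expected cumulative claims

plot(dev, mu, type = "l",
     xlab = "Development period", ylab = "Expected cumulative amount")
```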
--- # Shinystan Available [here](https://mjones.shinyapps.io/staninsurance/) --- class: middle inverse .emph_dark[Further]<br> .emph_dark[Topics] --- class: middle .emph_blue[Multilevel]<br> .emph_blue[Models] --- class: middle .emph_blue[Bayesian]<br> .emph_blue[Model]<br> .emph_blue[Averaging] --- class: middle .emph_blue[Causal]<br> .emph_blue[Networks] --- # References and Further Reading - *Doing Bayesian Data Analysis* by John K. Kruschke - *Computational Actuarial Science* edited by Arthur Charpentier - *Bayesian Data Analysis (3rd edition)* by Andrew Gelman *et al* - Markus Gesmann's Blog (https://magesblog.com/) - Arthur Charpentier's blog (https://freakonometrics.hypotheses.org/) # Interactive - http://mjones.shinyapps.io/coin/ - http://mjones.shinyapps.io/staninsurance/