class: center, middle, inverse, title-slide # The first step is admitting you have a Problem ## Kicking the Spreadsheet Habit ### Michael Jones ### 6 May 2021 --- class: blackout, center, middle <!-- TODO come back and get that gif or picture --> <img src="data:image/png;base64,#pictures/opinion.gif" width="1000" height="470" /> --- class: center, middle # Times are changing ??? Clients and stakeholders' needs are changing. Questions need to be answered quicker and more accurately for cheaper so we have to be more responsive Many may already be using data warehouses behind the scenes, and so we need to be comfortable with the language and technology in order to meet them where they are, or perhaps even help them get to where they want to be. You may get a huge number of files to process and you won't be able to do it no matter how many actuarial students you throw at it. --- class: center, middle # Are we becoming<br> **replaceable**? ??? The genie is out of the bottle. Aside from reserved roles, there's nothing technical that we do that couldn't be done by a similarly skilled data scientist: - cheaper (since they aren't operating in a potentially inflated salary environment driven by false scarcity?) - better? since they haven't got a huge mountain of legacy thought --- class: center, middle, inverse # **Part 1<br><br>Spreadsheets:<br>A biased history** --- class: center, middle ![](data:image/png;base64,#pictures/Visicalc.png) <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[Public domain image (GNU General Public Licence)]] </div> ??? Visicalc 1979 on the Apple II --- class: center, middle ![](data:image/png;base64,#pictures/lotus123.png)<!-- --> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[Copyright image but fair use]] </div> ??? Lotus 1-2-3 in 1983 on MS-DOS --- class: center, middle ![](data:image/png;base64,#pictures/excel_95.png)<!-- --> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[Image copyright Microsoft, used with permission of Microsoft, or at least that's What Wikipedia told me to say]] </div> ??? Microsoft Excel 1995 --- class: center, middle <img src="data:image/png;base64,#pictures/libreoffice.png" width="550" height="462" /> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[Screenshot under Mozillla Public Licence]] </div> ??? LibreOffice --- class: center, middle # People seem to like<br>spreadsheets --- # There are some **good** things - Rough work - Quick view of your data - Toy models - Data interchange Also I've seen people make art in spreadsheets and perform standup comedy about spreadsheets. --- class: center, middle # But... --- class: center, middle ![](data:image/png;base64,#pictures/genes.png)<!-- --> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates]] </div> ??? News report from The Verge We have the most successful, most popular spreadsheet program breaking peoples' data. --- class: center, middle # The tools you use dicate<br>what you **can** do<br><br>but they also dictate<br> what you **think is possible** ??? Spreadsheets make you squash problems into a format that can be solved by a spreadsheet or make you think your problem isn't solvable because it can't be done in a spreadsheet easily. --- class: center, middle, inverse # **Part 2:<br><br>Reaching their limits** --- class: center, middle ![](data:image/png;base64,#pictures/covid.png)<!-- --> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[https://www.bbc.co.uk/news/technology-54423988]] </div> --- class: center, middle # Excel is not a database --- - *2003* TransAlta lost **$24 million** due to Excel copy-and-paste error - *2005* Kodak suffered an **$11 million** severance error due to Excel typo --- - *2010* MI5 **bugged the wrong phones** due to a spreadsheet formatting error - *2011* AstraZeneca accidentally **released confidential information** due to an Excel templating error <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[https://www.teampay.co/insights/biggest-excel-mistakes-of-all-time/]] </div> --- class: center, middle # When you don't control<br>your primary data system,<br>you are at risk --- # Present and future challenges - Life risk modeling - Solvency II - IFRS 17 - Complex Models --- class: center, middle # Spreadsheets have <br>**inherent problems**<br>that limit their<br>ongoing usefulness --- # Problems - The cell as the fundamental unit - No data types & lack of abstraction - Slowness - Conflation of model, implementation and instance --- # Problems .small[(continuted)] - VBA - Fragmentation of model instances - Wider Office ecosystem - It crashes .small[.gray[a lot]] --- # What would a solution look like? .pull-left[ - quicker results - less risk - more value add work ] .pull-right[ - more fulfilling work - more efficient - more reactive ] --- class: inverse, middle, center # **Part 3:<br><br>Another way** --- # SQL - A relational model of data - A centralised source - Data constraints - Interoperability --- # Programming languages - R - Python - Julia ### Separate the model, the implementation and the use --- class: center, middle # Code<br>reuse --- class: center, middle # Abstractions and<br>Data Structures --- class: center, middle # Speed - Aviva, Julia and Solvency II - Me, a complex client and an adventu**R**e --- class: center, middle # New or better<br>methods --- class: center, middle # Version control --- class: center, middle # 3<sup>rd</sup> party<br>libraries --- class: center, middle # Automated<br>testing --- class: center, middle # Computable<br>Documents --- class: center, middle # Better<br>Plotting --- count: false .panel1-flipgraph-auto[ ```r *diamonds ``` ] .panel2-flipgraph-auto[ ``` ## # A tibble: 53,940 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 ## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 ## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 ## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63 ## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 ## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 ## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 ## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 ## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 ## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 ## # … with 53,930 more rows ``` ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% * ggplot(aes(x = carat, * y = price, * colour = clarity)) ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% ggplot(aes(x = carat, y = price, colour = clarity)) + * geom_point() ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_03_output-1.png)<!-- --> ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% ggplot(aes(x = carat, y = price, colour = clarity)) + geom_point() + * facet_wrap(~cut) ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_04_output-1.png)<!-- --> ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% ggplot(aes(x = carat, y = price, colour = clarity)) + geom_point() + facet_wrap(~cut) + * theme_bw() ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_05_output-1.png)<!-- --> ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% ggplot(aes(x = carat, y = price, colour = clarity)) + geom_point() + facet_wrap(~cut) + theme_bw() + * theme(legend.position = "bottom") ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_06_output-1.png)<!-- --> ] --- count: false .panel1-flipgraph-auto[ ```r diamonds %>% ggplot(aes(x = carat, y = price, colour = clarity)) + geom_point() + facet_wrap(~cut) + theme_bw() + theme(legend.position = "bottom") + * labs(x = "Carat", * y = "Price (USD)", * title = "Some facts about Diamonds") ``` ] .panel2-flipgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/flipgraph_auto_07_output-1.png)<!-- --> ] <style> .panel1-flipgraph-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-flipgraph-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-flipgraph-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false .panel1-histogramgraph-auto[ ```r *ggplot(midwest, aes(x = poptotal)) ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_01_output-1.png)<!-- --> ] --- count: false .panel1-histogramgraph-auto[ ```r ggplot(midwest, aes(x = poptotal)) + * geom_histogram(bins = 30) ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-histogramgraph-auto[ ```r ggplot(midwest, aes(x = poptotal)) + geom_histogram(bins = 30) + * facet_wrap(~state) ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_03_output-1.png)<!-- --> ] --- count: false .panel1-histogramgraph-auto[ ```r ggplot(midwest, aes(x = poptotal)) + geom_histogram(bins = 30) + facet_wrap(~state) + * scale_x_log10(labels = scales::comma) ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_04_output-1.png)<!-- --> ] --- count: false .panel1-histogramgraph-auto[ ```r ggplot(midwest, aes(x = poptotal)) + geom_histogram(bins = 30) + facet_wrap(~state) + scale_x_log10(labels = scales::comma) + * theme_bw() ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_05_output-1.png)<!-- --> ] --- count: false .panel1-histogramgraph-auto[ ```r ggplot(midwest, aes(x = poptotal)) + geom_histogram(bins = 30) + facet_wrap(~state) + scale_x_log10(labels = scales::comma) + theme_bw() + * labs(x = "Total Population", * y = "Count", * title = "Distribution of Total county population in Midwestern States") ``` ] .panel2-histogramgraph-auto[ ![](data:image/png;base64,#spreadsheets_files/figure-html/histogramgraph_auto_06_output-1.png)<!-- --> ] <style> .panel1-histogramgraph-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-histogramgraph-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-histogramgraph-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: center, middle # Control & reproducibility --- # TAS 200 (Insurance): > ## Implementations and realisations of models shall be reproducible. ## But what does that **mean**? --- class: center, middle # All Analysis is a **DAG** ![](data:image/png;base64,#pictures/dag_diagram.svg)<!-- --> --- class: center, middle, inverse # Interlude:<br>A live demo --- class: center, middle, inverse # **Part 4:<br><br>How to learn** --- class: center, middle # Pick a Language --- class: center, middle # Probably don't bother<br>with the IFoA Certificate<br>in Data Science --- class: center, middle # Start<br>Small --- ![](data:image/png;base64,#pictures/is_it_worth_the_time.png)<!-- --> <div style="position:absolute; bottom:0; height:40px;"> .gray[.small[www.xkcd.com/1205]] </div> --- # The best things in life are free - [R for Data Science](https://r4ds.had.co.nz/), Hadley Wickham - Rstudio tutorials - Package documentation - Conference recordings --- # The best things in life are free - People at work - Online communities, e.g. Twitter - Data Science screencasts - Other books and blogs --- # If someone else is paying - Paid training - Dead-tree books - Physical conferences --- class: center, middle # Come to terms<br>with the cost --- class: center, inverse, middle # **Epilogue** --- class: center, middle # If you must use spreadsheets,<br>use them well - <span style="font-style: italic;">Data Organization in Spreadsheets</span>, Broman & Woo, .small[The American Statistician] - <span style="font-style: italic;">Tidy Data</span>, Hadley Wickham, .small[The Journal of Statistical Software] --- # Final Thoughts - Actuaries could probably be replaced - Code-based work flows offer massive advantages of speed, accuracy and quality - It's never too late to start learning