The emerging age of data science has brought along a plethora of new tools for processing large amounts of data on complex distributed systems. However, for the statistician interested in principled methods of statistical inference (generalization, interpretability, and causality), much of the recent technology seems to fall short of what is needed to conduct such data analysis in a scientific and reproducible manner. At the same time, companies eager for data science/AI "insight" expect newly minted statisticians and data scientists to rapidly develop, deploy, and monitor statistical models in production in complex domains. This talk will present a few tools designed to assist the statistician in training, deploying, validating, and monitoring statistical machine learning models in practice. While some new tools will be presented, the focus is not on the intricacies of the technologies but on best practices for using these tools to increase your productivity as a data analyst and statistician.
Date: 10/23/2018
Additional Readings
There are no additional readings for this class. Slides and case studies with code repositories will be made available for users to try on their own, with free trial subscriptions on Azure.
Ali Zaidi is a data scientist in Microsoft's AI and Research Group, where he works on the language modeling team and develops tools to make distributed computing and machine learning in the cloud easier, more efficient, and more enjoyable for data scientists and developers alike. Before that, Ali was a research associate at National Economic Research Associates (NERA), providing statistical expertise on financial risk, securities valuation, and asset pricing. He studied Statistics at the University of Toronto and Computer Science at Stanford University.