stats285.github.io

Data Engineering and Data Science with Databricks and Apache Spark

Apache Spark is a fast and general engine for large-scale data processing. Created in 2009 at UC Berkeley, Spark has since become the most popular open-source data analytics framework: over 1,000 developers from more than 200 companies have contributed to it, and its committers represent 19 organizations. In 2013 the creators of Apache Spark spun out of UC Berkeley to found Databricks, a company that provides a cloud-hosted unified analytics platform for data engineers, data scientists, and business specialists such as analysts, managers, and executives. Databricks also fosters the Apache Spark community, donating millions of dollars each year through open-source code contributions and community stewardship efforts such as meetup and conference planning. Andy Konwinski is a cofounder of Databricks and a member of the team that created Spark at UC Berkeley. He received his PhD in computer systems from Berkeley in 2012. In this talk, Andy will introduce Spark and Databricks and demonstrate their use for analyzing data.

Andy Konwinski

Andy Konwinski is a cofounder of Databricks. Before that, he was a PhD student and then a postdoc in computer science in the AMPLab at the University of California, Berkeley. His research focuses on large-scale distributed computing systems such as Hadoop, Spark, and Mesos. He is one of the creators of Apache Mesos, a cluster scheduling system that Twitter, Apple, and other companies adopted as their private cloud platform.