PyData Eindhoven 2022

Greg Michaelson

Greg Michaelson is Cofounder and Chief Product Officer at Zerve, a young, stealthy startup that’s rethinking the data science development experience. Previously, Greg was an early joiner at DataRobot where he played many roles, including Chief Customer Officer. Prior to that, he worked as a data scientist in the financial sector after earning a PhD in Applied Statistics from the University of Alabama. In his spare time, Greg manufactures a line of flavored breakfast cereal toppings called Cerup. He lives in Spring Creek, Nevada with his wife, four children, and two Clumber Spaniels.


Sessions

12-02
14:05
30min
Significant Roadblocks to Usefulness for Jupyter Notebooks and a Recipe to Fix them
Greg Michaelson

The most popular data science development tools have largely been developed by academics as scratch pads for interactive data exploration. Jupyter notebooks, for instance, were developed 20 years ago at Berkeley (they were called IPython notebooks at the time). Because of their flexibility and interactivity, these tools have become widespread among data scientists who code. More recently, GUI-based tools have started to gain popularity. They reduce the technical load on the user, but typically lack much-needed flexibility and interoperability.

Both avenues of innovation are wildly inadequate for modern data science development. GUI-based tools are typically too expensive, too restrictive, and too closed. The development of automated machine learning tools only made this problem worse, with dozens of software startups urging business analysts to start building machine learning solutions, often with questionable results and even more questionable customer retention metrics. Notebook-based solutions, on the other hand, are typically too error-prone, too loose, and too isolated to be sufficient. The result is a set of intractable challenges around collaboration, communication, and deployment. The most recent entrants into the notebook space have only marginally improved the experience without fixing the underlying flaws.

This talk discusses the fundamental flaws in the way these tools have been developed and in how they currently function. Advancement in this space will require reworking the architecture and functionality of these tools at some of the most basic levels. The fixes include multiprocessing capabilities; real-time collaboration tools; safe, consistent code execution; easy API deployment; and portable communication tools. Future innovation in the data science development experience will have to tackle these problems and more in order to be successful.

Planck