by Abhishek Gupta
Ever gotten excited about a new piece of machine learning research you saw on arXiv or your favorite research lab's blog, hoping it would finally solve that last bit of optimization you need in your own work and make you the ML superstar of your team? But after spending days trying to get the same results, you end up failing, despite having tried everything in the paper, combing through the GitHub repo, contacting the authors, and more. If this sounds familiar, you're not alone! Every day, researchers and practitioners alike spend countless hours trying to replicate results from new ML research, only to lose precious time and compute resources without achieving the reported results. We're facing a massive reproducibility crisis in the field of machine learning.

Tools for building machine learning (ML) solutions have become much easier to use; AutoML and Keras are just two of many. At the same time, many more public datasets are available, including ones aimed at socially oriented research. With more people entering the field from diverse backgrounds, not everyone adheres to rigorous standards of scientific research, as evidenced by recent calls for reproducibility from the technical research community at conferences like NeurIPS. A lack of reproducibility in ML research will be a key hindrance to the meaningful use of R&D resources, and there is currently no comprehensive framework for doing reproducible machine learning. We, as Pythonistas, can do something to help!

Drawing on my own work in this domain and the work of the intern cohort on the Reproducibility in Machine Learning project this summer at the Montreal AI Ethics Institute, we'll talk through some of the social and technical aspects of this problem, and how you can take the principles from this talk and become the superhero of your ML team, elevating the quality of the work coming from your team and helping others build on top of it. We'll walk through these principles and apply them to a case study to see how this simple yet effective mechanism can address many of the issues we face in the field. Our framework combines existing tooling with policy applied to solution design, data collection, model development, data and model legacy, and deployment performance tracking.
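To make this concrete, here is a minimal sketch (my own illustration, not part of the talk's framework) of the kind of reproducibility scaffolding such a combination of tooling and policy might standardize for the model development stage: fixing random seeds and recording the environment alongside each run. The names set_seed, snapshot_environment, and run_metadata.json are assumptions made for this example.

# Minimal, illustrative reproducibility scaffolding for an ML experiment.
import json
import platform
import random
import sys

import numpy as np

def set_seed(seed: int = 42) -> None:
    # Fix the sources of randomness commonly used in ML experiments.
    random.seed(seed)
    np.random.seed(seed)

def snapshot_environment(path: str = "run_metadata.json") -> None:
    # Record interpreter, platform, and key package versions so a run
    # can be traced back to the environment that produced it.
    metadata = {
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
    }
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)

set_seed(42)
snapshot_environment()

In practice, a framework like the one discussed in the talk would pair small habits like these with policies covering the full lifecycle, from data collection through deployment performance tracking.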
About the Author
Talk Details
Date: Sunday Nov. 17
Location: Round Room (PyData Track)
Begin time: 16:05
Duration: 25 minutes