Why that great machine learning research can’t be reproduced and how to fix it

by Abhishek Gupta

Machine Learning & Data Science

Have you ever gotten excited about a new piece of machine learning research on arXiv or your favorite research lab's blog, hoping it would finally solve that last optimization problem in your own work and make you the ML superstar of your team? Then, after days of trying to get the same results, you fail despite having tried everything in the paper, combing through the GitHub repository, and even contacting the authors. If this sounds familiar, you're not alone! Every day, researchers and practitioners alike spend countless hours trying to replicate results from new ML research, only to lose precious time and compute resources without achieving the published results.

We are facing a massive reproducibility crisis in machine learning. Tools for building ML solutions, such as AutoML and Keras, have become much easier to use, and far more public datasets are available, including many aimed at socially oriented research. With more people entering the field from diverse backgrounds, not everyone adheres to rigorous standards of scientific research, as evidenced by recent calls for reproducibility from the technical research community at conferences like NeurIPS. A lack of reproducibility in ML research is a key hindrance to the meaningful use of R&D resources, and there is currently no comprehensive framework for doing reproducible machine learning. We, as Pythonistas, can do something to help!

Drawing on my own work in this domain and on the work of the intern cohort that worked on the Reproducibility in Machine Learning project this summer at the Montreal AI Ethics Institute, we'll talk through the social and technical aspects of this problem, and how you can take the principles from this talk to become the superhero of your ML team, elevating the quality of your team's work and helping others build on top of it. We'll walk through these principles and apply them to a case study to see how this simple yet effective mechanism can address many of the issues we face in the field. Our framework combines existing tooling with policy, applied to solution design, data collection, model development, data and model lineage, and deployment performance tracking.
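To give a concrete flavor of the tooling side, here is a minimal sketch of the kind of practices such a framework might combine: pinning random seeds, fingerprinting the dataset, and recording an experiment manifest. The function names, the recorded fields, and the data path are illustrative assumptions for this post, not the framework presented in the talk.

```python
import hashlib
import json
import platform
import random
import sys

import numpy as np


def set_global_seed(seed: int) -> None:
    """Pin the common sources of randomness so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    # If you use a deep learning framework, seed it too, e.g.:
    # torch.manual_seed(seed)      # PyTorch
    # tf.random.set_seed(seed)     # TensorFlow


def dataset_fingerprint(path: str) -> str:
    """Hash the raw data file so the exact dataset version is on record."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def experiment_manifest(seed: int, data_path: str, params: dict) -> dict:
    """Bundle everything a reader needs to rerun this experiment."""
    return {
        "seed": seed,
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
        "data_sha256": dataset_fingerprint(data_path),
        "hyperparameters": params,
    }


if __name__ == "__main__":
    SEED = 42
    set_global_seed(SEED)
    # "data/train.csv" and the hyperparameters are placeholders.
    manifest = experiment_manifest(SEED, "data/train.csv", {"lr": 0.01, "epochs": 10})
    with open("manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
```

Committing a manifest like this alongside the code gives anyone replicating the work the seed, the environment, and the exact data version to start from.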


About the Author

Abhishek Gupta is the founder of the Montreal AI Ethics Institute (https://montrealethics.ai) and a Machine Learning Engineer at Microsoft, where he serves on the CSE AI Ethics Review Board. His research focuses on applied technical and policy methods to address ethical, safety, and inclusivity concerns in using AI across different domains. He has built the largest community-driven, public consultation group on AI ethics in the world, which has made significant contributions to the Montreal Declaration for Responsible AI, the G7 AI Summit, the AHRC and WEF Responsible Innovation framework, and the European Commission Trustworthy AI Guidelines. His work on public competence building in AI ethics has been recognized by governments from North America, Europe, Asia, and Oceania.


Talk Details

Date: Sunday Nov. 17

Location: Round Room (PyData Track)

Begin time: 16:05

Duration: 25 minutes