Foundations of ML Project
2 December 2017
What if I told you that there is an app on the market that can tell the difference between a Vadapav and a Not Vadapav?
While binary classification of images using CNNs is not a particularly challening task in the present decade, the subject of our exploration was the effect of negative labelled points in the dataset. To put it simply, a burger and a vadapav look similar. We seek to answer questions like; what percentage of my dataset should be burgers such that my network learns the differentiation? What ensures that it would learn the correct features and be able to mark the difference even without training it specifically against burgers? This is important towards building AI that can identify objects unambiguously. The adjacent matrix presents a brief idea of our findings. The report is linked below and you can find the complete code here
I want to take a moment to highlight that we collected the data on our own and manually filtered it. I have a firm belief that data collection and tagging, its preprocessing included is an underrated challenge in data science. In my observations, sound data, specially tagged and processed for the task at hand is as crucial to the success of the project as is the algorithm or model that you use.
Our complete presentation can be found here.
Thanks to Anuj Shetty, Tejas Srinivasan and Alok Bishoyi for working with me on this project, and needless to say, this was inspired by SV. Cheerio Jian Yang!