Gaurav Arora
ML Engineer - 2 @ Haptik, ex - Goldman Sachs

CV

I am a Machine Learning Engineer at Haptik working on fundamental Conversational-AI problems using Deep Learning. I built the Intent Detection System for Haptik’s NLU Engine, owning it from research to production; it is 25% more accurate than the previous system.

I have authored research papers accepted at top-tier venues such as the EMNLP NLP-OSS workshop, the EMNLP Insights workshop, and FIRE.

I am also the creator of the open-source iNLTK library, which provides out-of-the-box support for various NLP tasks in 13 low-resource Indic languages. The library has 40,000+ downloads, 600+ stars and 100+ forks on GitHub.

Previously, I worked at Goldman Sachs with the User Experience and Productivity team on Analytics for Desktop Assistant, a productivity tool used firm-wide.

I hold an Advanced Certification in AI and ML from the International Institute of Information Technology (IIIT-Hyderabad) and a Bachelor’s in Computer Science from PEC University of Technology.

I am interested in applying Machine Learning to problems that impact millions of people, and I keep making small open-source contributions toward that goal.

Selected Publications

Accepted at EMNLP-2020’s NLP-OSS workshop
iNLTK: Natural Language Toolkit for Indic Languages
Gaurav Arora
[Arxiv] [GitHub]
Accepted at EMNLP-2020’s Insights workshop
HINT3: Raising the bar for Intent Detection in the Wild
Gaurav Arora, Chirag Jain, Manas Chaturvedi, Krupal Modi
[Arxiv] [GitHub]
Accepted at Dravidian Codemix HASOC @ FIRE-2020
Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
[Arxiv] [GitHub]

Education

2018 - 2019 Advanced Certification in Artificial Intelligence and Machine Learning
International Institute of Information Technology (IIIT-Hyderabad)
2014 - 2018 B.Tech in Computer Science
PEC University of Technology
2012 - 2014 GMSSS-16, Chandigarh

Industry Experience

July 2019 - Present Haptik, Machine Learning Engineer
June 2018 - July 2019 Goldman Sachs, Technology Analyst
May 2017 - Oct 2017 Goldman Sachs, Technology Analyst Intern
Nov 2016 - Mar 2018 Researchshala, Co-Founder and CTO

Open Source Contributions

Natural Language Toolkit for Indic Languages (iNLTK)

• iNLTK aims to provide out-of-the-box support for the various NLP tasks that an application developer might need for Indic languages
• iNLTK provides Data Augmentation, Sentence Similarity, Sentence Encoding, Word Embedding, Tokenization and Text Generation utilities for 13 low-resource Indic languages
• The library is backed by ULMFiT language models that I trained using the fastai and PyTorch libraries, producing SOTA LM perplexity and classification accuracy in 13 Indic languages

Appreciation for iNLTK
• By Jeremy Howard and Sebastian Ruder on Twitter
• Shared widely by the community on LinkedIn
• iNLTK has 23,000+ downloads on PyPI
• A Data Augmentation post about iNLTK was trending on LinkedIn
• iNLTK was trending on GitHub in May 2019
• Shared on Reddit, Facebook, Quora, etc. by the community

Code with AI

• A tool that predicts which techniques to use to solve a competitive programming problem correctly
• Demo video on YouTube

Appreciation for Code with AI
• By Jeremy Howard on Twitter
• By community on Codeforces
• The tool has been used by 3000+ users

NLP for Hindi

• Contains SOTA language models and a classifier for Hindi
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Sanskrit

• Contains SOTA language models and a classifier for Sanskrit
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Nepali

• Contains SOTA language models and a classifier for Nepali
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Tamil

• Contains SOTA language models and a classifier for Tamil
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Bengali

• Contains SOTA language models and a classifier for Bengali
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Punjabi

• Contains SOTA language models and a classifier for Punjabi
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Malayalam

• Contains SOTA language models and a classifier for Malayalam
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Odia

• Contains SOTA language models and a classifier for Odia
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

NLP for Gujarati

• Contains SOTA language models and a classifier for Gujarati
• Pretrained models available for download: TransformerXL, ULMFiT

[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ]

Honors & Awards

Mar 2019 Fast.ai International Fellow for contributions to Fast.ai forums
Dec 2018 Top 17% in the Human Protein Atlas Image Classification competition on Kaggle, for developing a Deep Learning model that classifies mixed patterns of proteins in microscope images. The competition had 2172 teams; I participated individually, so the 366th-place solution was entirely my own work
Oct 2017 1st Prize in IEEE-Hackathon for developing a chatbot to help people with emotional decisions in life
Feb 2016 Top 100 among 500,000 students in IT-Olympiad 2016
Oct 2016 2nd Prize in IEEE-Hackathon for developing an augmented reality application to help teachers
Mar 2016 All India Rank 6 in IEEE Programming League, among over 1200 undergraduate students
Mar 2016 2nd Rank in CodeWars, a competitive programming event hosted by IEEE, PEC on CodeChef
Nov 2016 - Mar 2018 Research Scholarship of 10k per month for Personal Emotional Doctor - Bot
May 2014 All India Rank-885 in JEE-Mains, among 1.4 million candidates
Aug 2014 1st Rank-Opener, PEC for best JEE-Mains rank among 600 students of the session 2014-2018
Dec 2014 1 Lakh Scholarship from CBSE for 96.4% marks in 12th Boards and 10 CGPA in 10th
Dec 2014 Letter of Appreciation from HRD Ministry, Govt. of India for 96.4% in CBSE 12th exams
June 2011 Catch Them Young - among the top 40 students selected from the Tricity by Infosys for a 2-week Programming Basics training on their campus

Skills and Courses

Mathematics

Discrete Structures for Computer Science, Vector Calculus, Fourier Series and Laplace Transform, Operations Research, Mathematics for Machine Learning: Linear Algebra and Multivariate Calculus (Coursera)

Computer Science

Data Structures and Algorithms, Computer Architecture and Organization, OOP, Microprocessor, DBMS, Operating Systems, Computer Networks, Theory of Computation, Artificial Intelligence, Computer Graphics, Mobile Computing, Fastai: Part 1 and Part 2, DeepLearning.ai by Andrew Ng, Deep Learning by Prof. Mitesh Khapra, IIT Madras

Programming & Web

C, C++, Python, JavaScript, TypeScript, ECMAScript 6, AngularJS, ReactJS, Angular 4, Webpack, Django with Python

Frameworks

PyTorch, Pandas, NumPy, scikit-learn, SciPy, Fastai, Transformers library


Last updated on 2020-10-01