Mon-S&T-2-4 Rapid Enhancement of NLP systems by Acquisition of Data in Correlated Domains

Tejas Udayakumar (Samsung Research and Development Institute), Kinnera Saranu (Samsung Research and Development Institute), Mayuresh Sanjay Oak (Samsung Research and Development Institute), Ajit Ashok Saunshikar (Samsung Research and Development Institute), Sandip Shriram Bapat (Samsung Research and Development Institute)
Abstract: In a generation where industries are going through a paradigm shift because of the rampant growth of deep learning, structured data plays a crucial role in automating various tasks. Structured textual data is one such kind, used extensively in systems such as chatbots and automatic speech recognition. Unfortunately, most of the textual data available is unstructured, in the form of user reviews and feedback, social media posts, etc. Automating the task of categorizing or clustering this data into meaningful domains will reduce the time and effort needed to build sophisticated human-interactive systems. In this paper, we present a web tool that builds a domain-specific dataset based on a search phrase from a database of highly unstructured user utterances. We also show the usage of an Elasticsearch database with custom indexes for full correlated text search. The tool uses the open-source GloVe model combined with cosine similarity and performs a graph-based search to provide semantically and syntactically meaningful corpora. Finally, we discuss its applications in natural language processing.
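As an illustration only, the combination of word embeddings, cosine similarity, and graph-based search mentioned in the abstract might be sketched as follows. This is not the authors' implementation: the three-dimensional vectors are hypothetical stand-ins for pretrained GloVe embeddings, and the names `TOY_VECTORS`, `cosine_similarity`, and `expand_terms`, as well as the threshold and depth values, are assumptions made for this example.

```python
import math
from collections import deque

# Hypothetical toy vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "music": [0.9, 0.1, 0.0],
    "song": [0.8, 0.2, 0.1],
    "album": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.9, 0.2],
    "rain": [0.1, 0.8, 0.3],
    "alarm": [0.1, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_terms(seed, vectors, threshold=0.9, max_depth=2):
    """Breadth-first expansion from a seed word.

    Words are nodes; an edge joins two words whose cosine similarity
    is at least `threshold` (an assumed cut-off). Returns the words
    reached within `max_depth` hops, in visiting order.
    """
    if seed not in vectors:
        return []
    visited = {seed}
    queue = deque([(seed, 0)])
    reached = []
    while queue:
        word, depth = queue.popleft()
        reached.append(word)
        if depth == max_depth:
            continue
        for other, vec in vectors.items():
            if other in visited:
                continue
            if cosine_similarity(vectors[word], vec) >= threshold:
                visited.add(other)
                queue.append((other, depth + 1))
    return reached


if __name__ == "__main__":
    print(expand_terms("music", TOY_VECTORS))
```

In a sketch like this, the expanded term set could then be used as query terms against the utterance database to collect a domain-specific corpus; how the actual tool connects the two steps is not specified in the abstract.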