Presently, XLScout hosts data on more than 130 million patents and 200+ million research publications, occupying approximately 8 TB of storage. Searching such a massive database with basic, mostly keyword-based search is cumbersome and time-consuming, and it requires domain-specific knowledge about patents. With this project, XLScout aims to ease searching this system by employing machine learning (ML) and natural language processing (NLP) techniques.

Text autocompletion and recommendation will give end-users a better and smarter way to search this massive database. The autocompletion will be based on the corpus of patent documents and research publications so that it suggests relevant content. The ML model will be trained on that corpus, taking language semantics into consideration. Techniques such as BERT and GPT-2/3 will be considered, together with various pre-processing techniques. The size of the document database and the diversity of the documents, together with the subjectivity of the desired results, will make such a system challenging to evaluate. We will employ both human-centric and automated evaluation approaches.

Document categorization will also be based on semantics and will group documents into labeled categories. Unsupervised techniques will be examined for their ability to perform this categorization. However, different companies (XLScout clients) have different preferences with respect to this categorization. Therefore, an approach will be developed that lets end-users express their preferences by providing sample categories. The model will then learn from those preferences and carry out the categorization. The challenge is to enable unsupervised categorization while supporting semi-supervision and customization. The categorization will be specific to each client company, and the model will have to learn from a limited number of example classes identified by the end-user.
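To make the idea of categorization from end-user sample categories concrete, the following is a minimal sketch and not the project's method. It assumes plain bag-of-words counts as a stand-in for the semantic representations (such as BERT embeddings) that the project intends to explore, and it assigns each document to the most similar sample category by cosine similarity. All function names, the example categories, and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Lower-cased bag-of-words counts (assumed stand-in for a semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_centroids(samples):
    """Sum the term counts of each category's example texts.

    samples: {category label: [example texts supplied by the end-user]}
    """
    centroids = {}
    for label, texts in samples.items():
        total = Counter()
        for text in texts:
            total.update(vectorize(text))
        centroids[label] = total
    return centroids


def categorize(doc, centroids, threshold=0.1):
    """Return the most similar category, or "uncategorized" below the threshold."""
    vec = vectorize(doc)
    best_label, best_score = max(
        ((label, cosine(vec, centroid)) for label, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_score >= threshold else "uncategorized"


# Hypothetical sample categories a client might provide.
samples = {
    "batteries": [
        "lithium ion battery cell electrode",
        "battery anode cathode electrolyte",
    ],
    "imaging": [
        "image sensor pixel array",
        "camera lens image processing",
    ],
}
centroids = build_centroids(samples)

print(categorize("solid state battery electrode coating", centroids))
print(categorize("wireless antenna design", centroids))
```

In this sketch, documents that match none of the sample categories fall into an "uncategorized" pool, which is where an unsupervised technique such as clustering could take over. That split is one possible way to combine customization with unsupervised grouping, offered here only as an assumption.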

Industry Partner(s): XLScout Ltd.

Academic Institution: Western University

Academic Researcher: Katarina Grolinger

Focus Areas: AI, Business Analytics

Platforms: Cloud, GPU