Learning Parse Structure of Paragraphs and its Applications in Search
April 28, 2014 @ 6:30 pm
Boris Galitsky, Elastica
6:30 Food & Networking
We propose combining parse forests and discourse structures into a unified representation of a paragraph of text. The purpose of this representation is to answer complex, paragraph-sized questions in a number of product- and service-related domains. A candidate set of answers, obtained by keyword search, is re-ranked by matching the sequence of parse trees of each answer against that of the question. To do this, we have developed a graph representation and a learning technique for the parse structures of paragraphs. We introduce the Parse Thicket (PT): a set of syntactic parse trees augmented with arcs for inter-sentence word-word relations such as co-reference and taxonomic relations. These arcs are also derived from other sources, including Speech Act Theory and Rhetorical Structure Theory. The operation of generalization, originally defined for logical formulas, is extended to parse trees and then to parse thickets in order to compute similarity between texts.
We provide a detailed illustration of how PTs are built from parse trees and then generalized. The approach is evaluated in the product search and recommendation domain of eBay.com, where user queries include product names, desired features, and expressions of user needs spanning multiple sentences. We demonstrate that PT generalization improves search relevance, using the Bing search engine API as a baseline. We also compare the contributions of various sources of discourse information to relevance. An open source plugin for SOLR has been developed so that the proposed technology can be easily integrated with industrial search engines.
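As a rough illustration of the generalization idea in the abstract, the sketch below computes a most specific common pattern for two phrases and uses its score to re-rank candidate answers. It is a simplified stand-in, not the speaker's implementation: it works on flat, hand-tagged phrases rather than full parse thickets (no tree structure, co-reference or rhetorical arcs), and the names `generalize_phrases` and `rerank`, the weights, and the sample data are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (lemma, part-of-speech tag), assumed pre-tagged by hand

# Hypothetical weights: an exact lemma match counts more than a POS-only match.
LEMMA_MATCH = 1.0
POS_MATCH = 0.2


def generalize_tokens(a: Token, b: Token) -> Optional[Token]:
    """Generalize two tokens: keep the lemma if equal, else a '*' wildcard.
    Tokens with different POS tags have no common generalization."""
    if a[1] != b[1]:
        return None
    return (a[0] if a[0] == b[0] else "*", a[1])


def token_score(g: Optional[Token]) -> float:
    if g is None:
        return 0.0
    return POS_MATCH if g[0] == "*" else LEMMA_MATCH


def generalize_phrases(p: List[Token], q: List[Token]) -> Tuple[List[Token], float]:
    """Return an order-preserving common pattern of p and q with maximal score,
    found by an LCS-style dynamic program."""
    n, m = len(p), len(q)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            g = generalize_tokens(p[i], q[j])
            best[i][j] = max(best[i + 1][j], best[i][j + 1],
                             best[i + 1][j + 1] + token_score(g))
    # Trace back through the table to recover the generalized pattern.
    pattern: List[Token] = []
    i = j = 0
    while i < n and j < m:
        g = generalize_tokens(p[i], q[j])
        if g is not None and best[i][j] == best[i + 1][j + 1] + token_score(g):
            pattern.append(g)
            i, j = i + 1, j + 1
        elif best[i][j] == best[i + 1][j]:
            i += 1
        else:
            j += 1
    return pattern, best[0][0]


def rerank(question: List[Token], candidates: List[List[Token]]) -> List[List[Token]]:
    """Order candidates by generalization score against the question, best first."""
    return sorted(candidates,
                  key=lambda c: generalize_phrases(question, c)[1],
                  reverse=True)


# Made-up sample data for illustration only.
question = [("camera", "NN"), ("with", "IN"), ("long", "JJ"),
            ("battery", "NN"), ("life", "NN")]
answer_a = [("phone", "NN"), ("with", "IN"), ("big", "JJ"), ("screen", "NN")]
answer_b = [("camera", "NN"), ("with", "IN"), ("long", "JJ"), ("zoom", "NN")]

ranked = rerank(question, [answer_a, answer_b])
pattern_b, score_b = generalize_phrases(question, answer_b)
```

Here `answer_b` shares three lemmas with the question, so it outranks `answer_a`, which matches the question mostly at the part-of-speech level. A thicket-level version would apply the same idea to paths that may cross sentence boundaries along the inter-sentence arcs.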
PhD IITP, Russia
Dr. Galitsky has led the development of semantic analysis and machine learning technologies at eBay, iAskWeb, LogLogic (acquired by Tibco), Xoopit (acquired by Yahoo), UpTake (acquired by Groupon) and Zvents (acquired by eBay). An ANECA Associate Professor of CS, he has authored numerous papers on semantic analysis.