19 October 2015
- Author: Ms. Patricia Jiménez Aguirre.
- Title: “Enterprise Information Integration – New Approaches to Web Information Extraction”.
- Supervisor: Dr. Rafael Corchuelo Gil (Universidad de Sevilla)
- Synopsis: Information has changed the lives of most people forever thanks to the advent of the Web, which has drawn people to the Net at an increasing pace. Thus, the Web has become the universally accessible distribution channel for data. However, data by itself is not powerful; the power lies in inferring knowledge from it, which is called Business Intelligence. To do that, we need web information extractors, which are tools intended to extract data from the Web and endow it with structure and semantics, so that the information they produce can be consumed by people or can feed automated business processes that exploit it intelligently. In this dissertation, we focus on developing web information extractors that learn rules to extract information from semi-structured web documents, and on how to evaluate different information extraction proposals so as to rank them automatically. We developed two proposals for web information extraction, called TANGO and ROLLER; both are based on an open catalogue of features, which makes it easier to evolve them as the Web evolves. We have also devised VENICE, an automated, open, agnostic, and non-ad-hoc method to rank information extraction proposals homogeneously, fairly, and stringently.
Our results show that we have advanced the state of the art in web information extraction, which may help researchers and practitioners extract information from web pages effectively and efficiently. We have also advanced the state of the art in evaluating and comparing information extraction proposals, so that researchers and practitioners can make informed decisions about which proposal is most suitable for a particular problem.
- Teseo: https://www.educacion.gob.es/teseo/mostrarRef.do?ref=1182735
- USE repository: http://www.doctorado.us.es/tesis-doctoral/repositorio-tesis/tesis-2015/details/2/4906
- LinkedIn: http://www.tdg-seville.info/PatriciaJimenez/Home