[DL] Three Ph.D positions available at the DKM group in FBK Trento

Luciano Serafini serafini at fbk.eu
Wed Jan 25 17:39:40 CET 2012


 				[[[ Apologies for multiple copies of this message ]]]

					     Three PhD Positions Available

					   Data and Knowledge Management Unit
						    http://dkm.fbk.eu
					    Fondazione Bruno Kessler - IRST
						     Trento, Italy


				   
The Data and Knowledge Management (DKM) research unit (http://dkm.fbk.eu) of the Bruno Kessler Foundation (FBK), Trento,
Italy, is seeking candidates for three Ph.D. positions (see also http://dkm.fbk.eu/index.php/PhD_thesis).

The Ph.D. studies will be held at the International Doctorate School in Information and Communication Technologies
(http://www.ict.unitn.it/) of the University of Trento, Italy.

Interested candidates should request further information and/or apply by sending an email to serafini<at>fbk.eu or ghidini<at>fbk.eu.

Details of the positions follow:

P1: INFORMATION EXTRACTION FOR ONTOLOGY ENGINEERING
For more details and statements of interest contact Chiara Ghidini <ghidini at fbk.eu>

Despite the growing maturity of ontology engineering tools, knowledge acquisition for ontologies remains a highly manual, time-consuming, and complex task that can easily hinder the ontology building process. Automatic ontology learning is a well-established research field whose goal is to support the semi-automatic construction of ontologies starting from available digital resources (e.g., a corpus, web pages, dictionaries, semi-structured and structured sources), in order to reduce the time and effort of the ontology development process.
In spite of the effort and progress made in ontology learning, and of the ambitious research agenda of the field, which aims to extract increasingly complex information ranging from terms to relations, hierarchies, and finally axioms, state-of-the-art methods and tools still mainly focus on the extraction of terms, with few exceptions addressing more complex tasks such as the extraction of (possibly hierarchical) relations and axioms. Moreover, the performance of current algorithms appears more suited to supporting the construction of lightweight, medium-quality ontologies than good-quality conceptualizations of a domain built according to good practices in ontology modeling. To give a simple example, current algorithms for term extraction achieve reasonable precision and recall, but lack the quality needed to draw a precise, shared, and well-founded distinction between classifying a term as an individual or as a concept. Similarly, most algorithms for relation extraction can identify relations at the instance level, but cannot abstract them to the concept level or identify further characteristics of these relations (e.g., their cardinality, functionality, symmetry, and so on).
The aim of this thesis is to investigate how to combine work in automatic ontology learning, which is mainly based on natural language processing, information extraction, statistics, and machine learning techniques, with work on methodologies and tools for manual knowledge engineering, in order to produce (semi-)automatic ontology learning services that better support the construction of rich, good-quality ontologies. The work will start from an investigation of the information extraction techniques currently available in natural language processing and a comparison with the requirements coming from design methodologies in ontology engineering; it will then research how to tailor those techniques to fulfill these requirements and to produce tools (or services) able not only to extract individuals, concepts, relations, hierarchies, and axioms, but also to ground them in good ontology practices.
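To make the kind of extraction discussed above concrete, here is a minimal, purely illustrative sketch (in Python) of Hearst-style pattern matching, one of the classic starting points for extracting candidate is-a relations in ontology learning. The patterns, sample sentence, and names are invented for illustration; a real pipeline would rely on proper linguistic processing rather than regular expressions.

# Purely illustrative sketch: Hearst-style lexical patterns for extracting
# candidate (hyponym, hypernym) pairs from raw text, one of the classic
# building blocks of term/relation extraction in ontology learning.
# Patterns and the sample sentence are simplified toy examples.
import re

HEARST_PATTERNS = [
    # "<hypernym> such as <hyponym1>, <hyponym2>, ... and/or <hyponymN>"
    ("such_as", re.compile(r"(\w+(?: \w+)?) such as (\w+(?:, \w+)* (?:and|or) \w+)")),
    # "<hyponym1>, <hyponym2>, ... and other <hypernym>"
    ("and_other", re.compile(r"(\w+(?:, \w+)+) and other (\w+(?: \w+)?)")),
]

def extract_isa_candidates(text):
    """Return a list of (hyponym, hypernym) candidate pairs found in `text`."""
    pairs = []
    for name, pattern in HEARST_PATTERNS:
        for match in pattern.finditer(text):
            if name == "such_as":
                hypernym, hyponyms = match.group(1), match.group(2)
            else:
                hyponyms, hypernym = match.group(1), match.group(2)
            # Naive noun-phrase splitting; a real system would use an NLP pipeline.
            for hyponym in re.split(r", | and | or ", hyponyms):
                pairs.append((hyponym.strip(), hypernym.strip()))
    return pairs

if __name__ == "__main__":
    sample = "Upper ontologies such as SUMO and DOLCE define very abstract concepts."
    print(extract_isa_candidates(sample))
    # -> [('SUMO', 'Upper ontologies'), ('DOLCE', 'Upper ontologies')]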

The work will address key research challenges in both natural language processing and ontology engineering. It will have strong algorithmic and methodological aspects, together with implementation-oriented tasks.

==============================================================================

P2: INTEGRATING LOGICAL AND STATISTICAL REASONING
For more details and statements of interest contact Luciano Serafini <serafini at fbk.eu>

In the last decade, automated reasoning techniques have reached a high level of maturity and are able to support reasoning on large knowledge repositories expressed in different logical languages. Examples are SAT-based reasoners for propositional logic, SMT (satisfiability modulo theories) solvers, reasoners for Description Logics and other Semantic Web languages, and resolution-based theorem provers. In the meantime, sophisticated statistical methods such as support vector machines, kernel methods, and graphical models have been studied and developed. These systems are capable of learning regularities in large data sets and of synthesizing the result in a model that supports stochastic inference. The two methodologies have reached such a level of maturity that one can envisage profitably combining them in a single, uniform system that allows learning and reasoning at the same time.

During the last three years the FBK joint research project Copilosk has investigated the advantages of combining these two methods for solving problems in natural language processing, with extremely interesting and encouraging results, which show that the use of background knowledge (available in the Semantic Web) in combination with machine learning methods improves performance on many important NLP tasks [1,2,3]. Continuing in this direction, we would like to design a general methodology and a formal reference model. In the literature there have already been some attempts in this direction, such as Markov Logic Networks [4], fuzzy logics, and work that bridges logic with kernel machines [5]. These approaches, however, are extensions of machine learning techniques designed to include some logical knowledge, and they present limits in how logical reasoning can be exploited in combination with learning.

With this thesis we would like to define a formal framework that integrates reasoning and learning in a uniform model. In this new framework it should be possible to define the following two general tasks:

	• Learning from data in the presence of background knowledge. This task is important because it implements what can be seen as incremental learning, where learning is performed in successive steps and, at each step, the system can reuse the knowledge acquired in the previous steps.
	• Logical reasoning in the presence of real observed data. In this task logical reasoning is performed by also taking into account the statistical regularities observable in the data. This makes it possible to implement "plausible reasoning", i.e., inferences that are not fully correct from a logical point of view but are in practice acceptable, because some extreme cases never happen (according to the data) and are therefore statistically irrelevant.
This new framework should combine standard statistical models, such as graphical models or regularization methods, with automated reasoning techniques such as SAT-based, tableaux-based, or resolution-based reasoning.
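As a purely illustrative toy, not the framework to be developed, the following Python sketch scores possible worlds in the spirit of Markov Logic Networks [4] by brute-force enumeration: hard clauses act as logical background knowledge, while the (invented) weights on soft clauses stand in for regularities learned from data. Real systems would use weighted SAT solving or sampling instead of enumeration.

# Purely illustrative toy, in the spirit of Markov Logic Networks [4]:
# find the most plausible world by brute-force enumeration. Hard clauses
# encode logical background knowledge (worlds violating them are discarded);
# soft clauses carry weights that, in a real system, would be learned from
# data. Predicates and weights below are invented for illustration only.
from itertools import product

ATOMS = ["Smokes(a)", "Smokes(b)", "Friends(a,b)", "Cancer(a)"]

def hard_ok(w):
    # Background knowledge (hard): Smokes(a) -> Cancer(a)
    return (not w["Smokes(a)"]) or w["Cancer(a)"]

SOFT = [
    # (weight, clause): the weight is added when the clause holds in a world.
    (2.0, lambda w: w["Friends(a,b)"]),                  # evidence: a and b are friends
    (1.5, lambda w: w["Smokes(a)"]),                     # evidence: a probably smokes
    (1.1, lambda w: (not w["Friends(a,b)"])
                    or (w["Smokes(a)"] == w["Smokes(b)"])),  # friends tend to smoke alike
    (0.3, lambda w: not w["Cancer(a)"]),                 # cancer is a priori unlikely
]

def most_plausible_world():
    """Enumerate all truth assignments and keep the best admissible one."""
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if not hard_ok(world):        # logical part: prune impossible worlds
            continue
        score = sum(wgt for wgt, clause in SOFT if clause(world))  # statistical part
        if score > best_score:
            best, best_score = world, score
    return best, best_score

if __name__ == "__main__":
    world, score = most_plausible_world()
    print(score, world)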

[1]	Volha Bryl, Claudio Giuliano, Luciano Serafini, Kateryna Tymoshenko. Using Background Knowledge to Support Coreference Resolution. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI 2010), Lisbon, Portugal, August 16-20, 2010, pp. 759-764.

[2]	Volha Bryl, Claudio Giuliano, Luciano Serafini, Kateryna Tymoshenko. Supporting natural language processing with background knowledge: coreference resolution case. In Proceedings of the 9th International Semantic Web Conference (ISWC 2010), Shanghai, China, November 7-11, 2010 (Springer), pp. 80-95.

[3]	Volha Bryl, Sara Tonelli, Claudio Giuliano, Luciano Serafini. A Novel FrameNet-based Resource for the Semantic Web. To appear in the proceedings of the ACM Symposium on Applied Computing (SAC) 2012, Technical Track on The Semantic Web and Applications (SWA), Riva del Garda (Trento), Italy, March 25-29, 2012.

[4]	Matthew Richardson and Pedro Domingos, Markov Logic Networks. Machine Learning, 62 (2006), pp 107-136.

[5]	Michelangelo Diligenti, Marco Gori, Marco Maggini, Leonardo Rigutini: Bridging logic and kernel machines. Machine Learning 86(1): 57-88 (2012)

==============================================================================

P3: BEHAVIOR RECOGNITION AND INDUCTION VIA SEMANTIC REASONING OVER HUMAN ACTIVITY PROCESSES
Thesis in collaboration with the Skil lab of Telecom Italia in Trento
For more details and statements of interest contact Luciano Serafini <serafini at fbk.eu>

Modern smart mobile devices allow for a very wide variety of actions (communication, browsing, application execution) and, in addition to the standard data related to phone calls, include many different sources of information coming from sensors (e.g., GPS position, accelerometer data, etc.). This scenario has led to the birth of novel research areas such as context awareness, situation detection, activity recognition, behavior understanding and many others, which aim at exploiting all this information in order to support the user in multiple daily tasks.

In parallel, but on a completely different stage, the Semantic Web and Linked Open Data have made available a huge quantity of semantic data and knowledge: semantic tagging of geographical data (e.g., OpenStreetMap), general knowledge about persons, locations, organizations and events (e.g., available in DBpedia, Freebase, etc.), and general terminological and ontological knowledge (e.g., schema.org, the SUMO and DOLCE upper-level ontologies, YAGO2, WordNet and FrameNet).

This scenario opens up new research challenges in combining raw sensor data with semantic information and ontological knowledge for the analysis of human behavior. The implementation of this vision requires an effective and deep integration of techniques from different disciplines of computer science, such as data mining, machine learning, the Semantic Web, and knowledge representation and reasoning. The aim of this PhD proposal is to address key research challenges in these fields and, in particular, to investigate the benefits of applying semantic-based technologies to the modeling of, and reasoning over, human activity processes.
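As a purely illustrative example of the kind of raw-data/semantics combination envisaged here, the sketch below annotates a single GPS fix with the category of the nearest semantically tagged place and a naive activity label. All place names, coordinates, categories, and rules are invented; a real system would query resources such as OpenStreetMap or DBpedia and would replace the hand-written rules with learned models.

# Purely illustrative sketch: annotate a raw GPS fix with a semantic place
# category and a naive activity guess. All places, categories, and rules
# below are invented; a real system would query OpenStreetMap/DBpedia and
# use learned models instead of hand-written rules.
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Place:
    name: str
    lat: float
    lon: float
    category: str  # e.g., an ontology concept such as "Restaurant"

# Hypothetical semantically tagged places
KNOWN_PLACES = [
    Place("Caffe Duomo", 46.0679, 11.1211, "Restaurant"),
    Place("FBK Povo", 46.0667, 11.1500, "ResearchInstitute"),
]

# Hypothetical mapping from place concepts to coarse activity labels
ACTIVITY_RULES = {
    "Restaurant": "Eating",
    "ResearchInstitute": "Working",
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def annotate_fix(lat, lon, max_km=0.2):
    """Return (place, activity) for the nearest known place within max_km."""
    place = min(KNOWN_PLACES, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))
    if haversine_km(lat, lon, place.lat, place.lon) > max_km:
        return None, "Unknown"
    return place, ACTIVITY_RULES.get(place.category, "Unknown")

if __name__ == "__main__":
    # A fix close to the (hypothetical) "FBK Povo" entry -> ("FBK Povo", "Working")
    print(annotate_fix(46.0668, 11.1498))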

The student will develop a research plan which will cover the following three important and complementary aspects:

(i) investigation of models for combining/modifying/extending the standard techniques of data and knowledge processing, in order to provide a framework that supports reasoning and learning with raw data, information, and knowledge;

(ii) definition of reasoning services on top of the applied techniques/formalisms;

(iii) modeling, development and experimentation on practical real-world problems in different fields (e-health, smart cities, ...).

Academic advisor: Prof. Luciano Serafini
Industrial advisor: Michele Vescovi (Michele.vescovi at guest.telecomitalia.it)

===================================================================================================

Candidate Profile
=================

The ideal candidate should have an MSc or equivalent degree in computer science, mathematics, electronic engineering, physics, or philosophy, and should combine a solid theoretical background with software development skills.

The candidate should be able to work in a collaborative environment, with a strong commitment to achieving research excellence and the assigned objectives.



