The project

The Sucinto project investigates generic and topic-focused multi-document summarization strategies to provide more feasible and intelligent access to online information from news agencies. This goal revives old, well-known scientific challenges from the first summarization studies in the 1950s and introduces several new ones: dealing with redundant, complementary, and contradictory information; normalizing different writing styles and referring expression choices; balancing different perspectives and sides of the same events and facts; properly handling evolving events and their narration at different moments; and arranging pieces of information from different texts into coherent and cohesive summaries, among several others. An ultimate goal of the project is to bring the developed tools together as online applications for end users.

This project considers not only classical approaches to single- and multi-document summarization but also new ones that follow different paradigms and use knowledge of varied nature, ranging from empirical and statistical data to semantic and discourse models. Research interests include (i) modeling the summarization process (content selection, planning, aggregation, generalization, substitution, information ordering, etc.) by means of Cross-document Structure Theory (CST), Rhetorical Structure Theory (RST), ontologies, and statistical language and summarization models; (ii) investigating related tasks such as discourse parsing, topic detection, temporal annotation and resolution, coreference resolution, text-summary alignment, and multilingual processing; and (iii) the linguistic characterization of multi-document summaries and their manual production.

The project has been developed at NILC (Interinstitutional Center for Computational Linguistics), one of the largest research groups on Natural Language Processing and Computational Linguistics in Brazil. It started in 2007 as a natural follow-up to previous projects on single-document summarization carried out at NILC (FAPESP #2006/02887-9; see also related projects). It has been supported by the research agencies FAPESP, CNPq, and CAPES, which have granted scholarships for undergraduate and graduate students as well as regular financial support for the project (FAPESP #2015/17841-3, FAPESP #2012/03071-3, FAPESP #2009/05603-0).

 

NILC - Interinstitutional Center for Computational Linguistics