sucinto

The project

The sucinto project investigated generic and topic-focused multi-document summarization strategies, with the aim of providing more practical and intelligent access to online information published by news agencies. This work revisited old and well-known scientific challenges from the first summarization studies of the 1950s and introduced several new ones: dealing with redundant, complementary, and contradictory information; normalizing different writing styles and referring expression choices; balancing different perspectives on the same events and facts; handling evolving events and their narration at different moments; and arranging pieces of information from different texts into coherent and cohesive summaries, among several others. An ultimate goal of the project was to bring the developed tools together as online applications for end users.

The project considered not only classical approaches to single- and multi-document summarization but also new ones, following different paradigms and using knowledge of varied nature, ranging from empirical and statistical data to semantic and discourse models. Research interests included (i) modeling the summarization process (content selection, planning, aggregation, generalization, substitution, information ordering, etc.) by means of Cross-document Structure Theory (CST), Rhetorical Structure Theory (RST), ontologies, and statistical language and summarization models; (ii) investigating related tasks such as discourse parsing, topic detection, temporal annotation and resolution, coreference resolution, text-summary alignment, and multilingual processing; and (iii) the linguistic characterization of multi-document summaries and their manual production.

The project was developed at NILC (Interinstitutional Center for Computational Linguistics), one of the largest research groups on Natural Language Processing and Computational Linguistics in Brazil. It started in 2007 as a natural follow-up to previous projects on single-document summarization carried out at NILC (FAPESP #2006/02887-9; see also related projects). It was supported by the research agencies FAPESP, CNPq, and CAPES, which granted scholarships to undergraduate and graduate students as well as regular financial support for the project (FAPESP #2015/17841-3, FAPESP #2012/03071-3, FAPESP #2009/05603-0). The project officially ended at the end of 2017.

NILC - Interinstitutional Center for Computational Linguistics