sucinto

Tools

NILC-WISE - Web Interface for Summary Evaluation - an easy-to-use online interface for running ROUGE (Lin, 2004) to evaluate summaries
Summarization extension for Google Chrome - an extension for on-line news summarization, based on the RSumm system
OpCluster-PT - a computational method based on semantic relations and linguistic rules for automatically detecting fine-grained opinions in User-Generated Content (UGC), as described in the MSc dissertation of Vargas (2017)
Models for summary coherence evaluation - a set of implemented models for summary coherence evaluation, following several approaches, from traditional entity grids to discourse grids. See the PhD thesis of Marcio de Souza Dias for more information.
RC-4 multi-document summarizer - based on the best RST & CST-based summarization strategy proposed by Cardoso (2014)
RCT-4 multi-document summarizer - based on the best RST & CST & subtopics-based summarization strategy proposed by Cardoso (2014). This method differs from the one above by including subtopic segmentation and treatment.
Text-summary alignment - tool that includes a set of methods for aligning texts and their multi-document summaries, as developed by Agostini et al. (2014)
TextTiling for Portuguese - topical segmentation tool adapted to news texts in Brazilian Portuguese, based on the work of Hearst (1997)
ViSum - a visualization system for multi-document summarization (described by Lima, 2013)
Lemmatizer for Portuguese - based on the MXPOST part-of-speech tagger and UNITEX dictionaries for Portuguese, this tool produces the lemmas of the words of a text stored in a plain text file. The source code is also provided. For more details, see the readme.pdf file or contact Erick G. Maziero (the developer of the system).
NCLEANER trained model for Portuguese - a trained model to be used with NCleaner (Evert, 2008) for cleaning web pages in Portuguese. The model was trained with 184 texts from several online sources, such as Terra, UOL, BBC, Exame, Estadão, IG, R7, Zero Hora, G1, JB Online, and O Globo, among others.
CSTTool - a semi-automatic editing tool for annotating texts according to the Cross-document Structure Theory (see Aleixo and Pardo, 2008)
Newshead - an on-line tool for searching and clustering related news
RSTeval - a tool for discourse parsing evaluation, following the evaluation method of Marcu (2000) - the tool compares RST trees (automatically or manually produced) and reports precision and recall figures (see Maziero and Pardo, 2009)
Syntax-based text segmentation tool - a tool for detecting elementary discourse units in texts - it uses the PALAVRAS parser (Bick, 2000) to analyze the input text and then applies syntactic segmentation rules
RST Toolkit - utility programs for processing RST files, offering several facilities for both computational and linguistic purposes
Sentence ordering program - program for ordering sentences in a multi-document summary (given the source-texts) (see Lima and Pardo, 2012)
CSTSumm - a multi-document summarizer based on CST information (see README.txt in the rar file) (see Castro Jorge, 2010)
RSumm - a multi-document summarizer based on the relationship maps proposed by Salton et al. (1997) (see Ribaldo et al., 2012 and Ribaldo, 2013)
DiZer 2.0 - an on-line RST discourse parser, which is easily adaptable and portable to different text types/genres and languages (see Maziero et al., 2011)
CSTParser - a state-of-the-art CST discourse parser for Portuguese, using both symbolic and machine learning techniques (see Maziero, 2012)
--> Its stand-alone (offline) version (with some adaptations in relation to the online version) is also freely available for use
NASP (see NASP++ below) - a tool for aiding in word sense annotation of nouns in Portuguese, using Princeton Wordnet as sense repository
NASP++ - an improved version of NASP (see above), with more facilities (e.g., the underlying generation of ontologies for the annotated words) and adapted to other part-of-speech tags
MulSEN - a multilingual version of NASP (see above)
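
As a rough illustration of the kind of measure that an interface such as NILC-WISE (listed above) reports through ROUGE, the sketch below computes a simplified ROUGE-N recall between one candidate summary and one reference summary. It is not the code behind NILC-WISE or the official ROUGE package: the function names (`ngrams`, `rouge_n_recall`), the whitespace tokenization and the lowercasing are assumptions made for this example, and it leaves out stemming, stopword handling and multiple references.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) found in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap divided by the
    number of n-grams in the reference.

    Assumption: plain whitespace tokenization and lowercasing only.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipped counts).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat is on the mat"
    candidate = "the cat sat on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

For real evaluations, the full ROUGE package (or the NILC-WISE interface) should be preferred, since scores from this simplified version will generally differ from the official ones.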

Corpora and related resources

CSTNews-Update - a new arrangement of CSTNews texts for training and testing update summarization methods for Portuguese
Corpora for sentence compression - two corpora composed of long (original) sentences and their compressed versions for Portuguese
Corpus of automatic multi-document summaries with linguistic errors - a corpus of automatic multi-document summaries (for the texts of the CSTNews corpus) produced by 4 different summarizers with varied performances, manually annotated with linguistic errors
OpiSums-PT - a corpus of (extractive and abstractive) opinion summaries (170, in total) for reviews of books (13 reviews) and electronic products (4 reviews), written in Brazilian Portuguese
Aspect ontologies - groups of (hierarchically organized) opinion aspects for supporting opinion mining tasks, including the domains of smartphones, digital cameras and books, in OWL format
CSTNews interface - on-line browsing interface to the CSTNews corpus
CSTNews - a corpus with 50 clusters of news texts - in Portuguese - along with their multi-document summaries, as well as several discourse and semantic annotations (see Aleixo and Pardo, 2008; Cardoso et al., 2011)

More resources

NILC - Interinstitutional Center for Computational Linguistics