Research Projects

I have participated in several research projects funded by FAPESP, CNPq and the Bill and Melinda Gates Foundation. I thank FAPESP for funding most of my projects and also for funding several of my graduate students.

Projects as Principal Investigator - PI

An Intelligent Trap and Mobile Application to Motivate Local Mosquito Control Activities (2016-2018)
Recent years have seen a tremendous increase in mosquito-borne diseases (MBD): an explosion of dengue fever cases around the world, soon joined by the expanding distribution of chikungunya and then Zika. All these diseases share Ae. aegypti as their main vector. Ae. aegypti is highly adapted to urban conditions, and its resilience to insecticides has made unilateral governmental mosquito control activities largely ineffective. Controlling mosquito populations is only possible with the joint effort of organizations and governments and the active participation of the affected community. In this project, we propose an innovative approach to community engagement and vector control. Our idea is to produce an inexpensive intelligent trap that will empower the population with knowledge of Ae. aegypti densities. The trap will use mobile devices to educate the population about proper mosquito control activities and to evaluate the effectiveness of these activities based on the number of captured mosquitoes.
Funding agency: USAID.

Controlling Dengue Fever Mosquitoes using Intelligent Sensors and Traps (2015-2016)
In recent decades we have witnessed a tremendous increase in dengue fever cases. Four decades ago only 9 countries had reported severe dengue epidemics. Currently, dengue is endemic in more than 100 countries. In this project we discuss why we are losing the war against dengue and propose a completely different approach to vector control. We propose to extend our recent research on intelligent sensors to field conditions. Our idea is to develop an inexpensive device that will empower the population with knowledge of Aedes aegypti densities. This will motivate local mosquito control activities and put the population, government and aid organizations far ahead of disease outbreaks.
Funding agency: Google.

Intelligent Traps to Control Insect Pests in Agriculture and the Environment (2015-2017)
Insects are undoubtedly important to agriculture and the environment. Although pest insects attract the most attention, many insects are beneficial to the environment and humans. For example, insects are responsible for pollinating at least two-thirds of all food consumed in the world. Because of this importance to humans, the recent decline in populations of pollinator insects, especially bees, is considered a serious environmental problem and is frequently associated with pesticide exposure. We believe it is possible to reduce the use of pesticides through technology. We propose a low-cost intelligent trap that selectively captures harmful insect species and spares all other species. Such a trap will have a minimal negative impact on the environment. At the heart of the intelligent trap is a novel sensor that we are developing. This sensor uses laser light to capture insect data from a distance and machine learning techniques to identify the insect species. In this project we propose the development of such an intelligent trap applied to the Asian citrus psyllid. This insect pest affects orange plantations and is present in Brazil and the United States. We describe the scientific and technological challenges of developing such an intelligent trap. We discuss our plan to develop the trap within a four-year time frame from the current development state to field trials.
Funding agency: CNPq.

Research on Geo-spatial Marine Biology Data Mining Using Time Series, Text Mining and Visualization (2013-2015)
We propose a focused, interdisciplinary research project on data mining and data visualization with a specific focus on marine data. Such data are particularly challenging for data mining because they provide only a very sparse set of data points relative to the volume of the marine space being modelled and investigated. They also pose challenges for the visualization of data and of modelling results, as the data are inherently three-dimensional and come from an unfamiliar context compared with data collected on land. We will work in an interdisciplinary team with researchers in data mining, data visualization, and marine biology to develop visualization methods appropriate for marine biology applications of data mining. The data may be derived from multiple disparate sources, including fisheries or scientific surveys, autonomous sensors, satellite data or field studies. For model outputs, we will particularly work on the visualization of results from a new generation of ecosystem model, analogous to the general circulation models used to predict global climate. This model includes all organism types on both land and sea. We face the challenge that it can produce gigabytes to terabytes of outputs, including tracking all organism interactions, individual states, and the spatial distribution of individuals. Thus we need to summarize, extract, and visualize outputs at multiple scales, including those of individuals, ecological communities, and the globe. These data need to be visualized in a manner that is useful and interpretable for the international policy community.
Funding agency: FAPESP. FAPESP-CALDO call with Stan Matwin (Dalhousie University, Canada).

Intelligent Sensor for controlling agricultural pests and disease-vector insects (2013-2015)
Applications such as intelligent sensors should be able to collect information about the environment and make decisions based on input data. An example is a low-cost sensor, currently under development, that detects insects and classifies them by species using laser light and machine learning techniques. This sensor is an important step towards the development of intelligent traps able to attract and selectively capture insect species of interest, such as disease vectors or agricultural pests, without affecting beneficial species. The data gathered by the sensor constitute a data stream with non-stationary characteristics, since the insects' metabolism is influenced by environmental conditions such as temperature, humidity and atmospheric pressure. This research grant proposal has two main objectives: the first is to develop new algorithms that classify, in real time, the signals in the sensor's data stream; the second is to advance the sensor technology so that the developed machine learning techniques can be embedded in the sensor.
Funding agency: FAPESP.

Complexity-invariance for Classification, Clustering and Motif Discovery in Time Series (2012-2014)
There has been increasing interest in time series processing due to the large number of application domains that generate such data. This interest can be measured by the large number of methods recently proposed in the literature for tasks such as classification, clustering, summarization, anomaly detection and motif discovery. Recent studies have shown for several problems that similarity-based methods achieve an efficacy that is hard to surpass, even when compared to more sophisticated methods. This is mainly because the community has studied and proposed several invariances for time series distance measures. The invariances make the distance measures ignore certain undesired data properties. The best-known example is the invariance to local differences in time scale, obtained with the warping technique. Other invariances include invariance to differences in amplitude and offset, phase and occlusion. Recently, we demonstrated to the scientific community that similarity-based time series classification methods can benefit greatly from a new invariance: complexity invariance. The main objective of this research project is to investigate new complexity-invariant distance measures and assess how such measures can improve the efficacy of clustering and motif discovery algorithms in particular.
Funding agency: FAPESP.
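
The idea of complexity invariance described above can be illustrated with a short sketch. It assumes one published formulation, in which the Euclidean distance between two equal-length series is multiplied by the ratio of their complexity estimates, and the complexity estimate is the length of the line obtained by "stretching out" the series. The function names are hypothetical, and the measures investigated in the project may differ from this one.

```python
import math


def complexity_estimate(series):
    """Root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def euclidean_distance(q, c):
    """Plain Euclidean distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))


def complexity_invariant_distance(q, c):
    """Euclidean distance scaled by a complexity correction factor (>= 1)."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0:
        correction = 1.0       # both series are constant: no correction
    elif low == 0:
        correction = math.inf  # assumption: constant vs. non-constant is maximally distant
    else:
        correction = high / low
    return euclidean_distance(q, c) * correction


smooth = [0, 1, 2, 3]
jagged = [0, 3, 0, 3]
# The jagged series is three times as "complex" as the smooth one,
# so the Euclidean distance between them is inflated by a factor of three.
print(euclidean_distance(smooth, jagged))
print(complexity_invariant_distance(smooth, jagged))
```

Because the correction factor is never below one, pairs of series with similar complexity keep their Euclidean distance, while pairs with very different complexity are pushed apart.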

Time Series Classification Algorithms Applied to Embedded Systems (2010-2011)
Integrating sequential and temporal data into the Data Mining process is one of the most important challenges in Machine Learning. In this project, we are mostly interested in developing time series classification algorithms. The k-nearest neighbor algorithm is a common approach to time series classification. This algorithm is known to perform well, especially when combined with distance measures that can deal with time lags, such as Dynamic Time Warping. However, the classical k-nearest neighbor algorithm is computationally intensive. One may address this problem by using indexes to increase the efficiency of similarity queries. This project proposes to investigate indexing algorithms that have the properties of anyspace algorithms. Anyspace algorithms can work with different amounts of memory, in such a way that the algorithm's performance depends directly on the amount of available memory. Such algorithms allow the amount of memory to be chosen according to the performance required by an embedded application. This project also deals with classification methods based on the induction of classification rules. One approach to inducing rules from time series data is the identification of motifs. Motifs are frequently occurring subsequences that usually represent a phenomenon of interest. A convenient aspect of rules is how easily one can write a procedural program that implements the rule's logic with little memory and few processing resources. The algorithms developed during this postdoctoral period will be applied to insect control and monitoring using devices developed by ISCA Technologies.
Funding agency: FAPESP.
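
The combination of k-nearest neighbor classification and Dynamic Time Warping mentioned above can be sketched as follows. This is a minimal, unoptimized illustration rather than the project's implementation: it uses the textbook dynamic-programming recurrence with squared point-wise costs and no warping window, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic Time Warping distance via the classic O(len(a) * len(b)) recurrence."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible warping-path steps
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def knn_classify(query, training_set, k=1):
    """Label a query by majority vote among its k DTW-nearest training series."""
    neighbors = sorted(training_set, key=lambda item: dtw_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set of (series, label) pairs.
training_set = [
    ([0, 1, 2, 1, 0], "peak"),
    ([2, 1, 0, 1, 2], "valley"),
]
# The query is a peak with a time lag at the start; DTW absorbs the lag.
print(knn_classify([0, 0, 1, 2, 1, 0], training_set))
```

Every query compares against every training series, and each comparison fills a table whose size grows with the product of the two series lengths. That cost is what makes the plain approach expensive on embedded hardware and motivates the indexing work described above.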

Machine Learning with Imbalanced Data Sets (2007-2009)
Funding agency: FAPESP.

Projects as co-Principal Investigator - co-PI

MAP: Aprendizado de Máquina: Uma Abordagem Baseada em Múltiplas Estratégias (Machine Learning: An Approach Based on Multiple Strategies)
Funding agency: CAPES.
PI: André Ponce de Leon Ferreira de Carvalho.

Research Support Center for Machine Learning in Data Analysis - NAP-AMDA (2012-2016)
The growing value of data produced by different knowledge areas and the complexity of the problems to be treated computationally point to the need for new computational tools able to support data analysis. Many of the current computational tools that allow automatic and efficient data analysis are based on concepts from Artificial Intelligence, particularly Machine Learning (ML). Besides Artificial Intelligence, ML is associated with other areas, such as statistics, probability, cognition, computing theory, neuroscience and information theory, to name a few. There are several well-established research centers for data analysis using ML techniques in universities and companies abroad. In Brazil, despite productive, high-quality ML research, there is no equivalent research center. Thus, this project proposes the creation of the Research Support Center for Machine Learning in Data Analysis, NAP-AMDA (from the Portuguese Núcleo de Apoio à Pesquisa de Aprendizado de Máquina em Análise de Dados). The main goal of the NAP-AMDA is to establish an internationally recognized interdisciplinary and multidisciplinary center of excellence in the use of ML techniques for data analysis in São Paulo, Brazil. The center will include researchers working on ML for data analysis and researchers from knowledge areas that demand data analysis. The center will also stimulate collaborations with companies and government institutions whose data can be analyzed by ML techniques. The use of these techniques by Brazilian companies can lead to better products and services, increasing their competitiveness. Their use by the Brazilian government may improve the quality of public services. The center will promote and organize meetings and workshops with members from the participating institutions to discuss data analysis problems to be solved.
The NAP-AMDA comprises faculty members, researchers and students from the Universidade de São Paulo and from other universities and research centers in Brazil and abroad. The NAP-AMDA will be based at the Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo.
Funding agency: Universidade de São Paulo.
PI: André Ponce de Leon Ferreira de Carvalho.

Counting and classifying insects with ultra-cheap sensors (2010-2012)
We propose to build ultra-cheap (less than $5) sensors that can count and distinguish between various kinds of insects (including malaria vectors) from a large distance. Our work has the potential to revolutionize epidemiological modeling by providing accurate real-time counts of vectors down to the species/sex level, thus allowing for more effective vector control.
Funding agency: Bill and Melinda Gates Foundation.
PI: Eamonn Keogh.