Affective Network: Giving emotional awareness to Twitter users.
Emotional contagion in online social networks has been of great interest in recent years. Previous studies have mainly focused on finding evidence of emotional contagion in homophilic environments. However, these studies have overlooked users' awareness of the sentiments they share and consume online. In this work, I present an experiment with Twitter users that aims to help them better understand which emotions they experience on this social network. I introduce Affective Network (Aff-Net) (available at https://AffectiveNetwork.media.mit.edu), a Google Chrome extension that enables Twitter users to filter the emotional content in their News Feed and make it explicit through colored visual marks.
Find more at the Affective Network page.
When humans count the number of objects in a scene, we may not remember the color of specific objects, because we tend to focus our attention on the task we aim to solve. However, we usually have a panoramic view that allows us to understand the context of the scene. Similarly, Visual Question Answering (VQA) machines can be trained to focus on different general and specific image descriptors to become more efficient or accurate. Previous works have mainly focused on feature extraction and on combining different attention types, but have paid little attention to evaluating which type of attention features, bottom-up or top-down, responds better to which question types. In this work, I present a model that concatenates bottom-up and top-down features for VQA. Applying a relevance analysis to abstract and real-world images shows that bottom-up features strongly influence responses to general and ambiguous questions, while top-down attention focuses mainly on object-detection task classes.
Find all the code and results in this repo.
2016 - 2017. I earned my master's degree at the Pontifical Catholic University of Chile (PUC), working under the supervision of Karim Pichara (PUC) and Pavlos Protopapas (Harvard University). They both strongly supported me throughout my work. During my master's, I spent four months working at Harvard University. This allowed me to learn from others and focus my research so that I could conclude it successfully.
Best Computer Science Thesis Award
2017 - Pontifical Catholic University of Chile
Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of query. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach using Monte Carlo sampling and Black Box Variational Inference, and we provide the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full-query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.
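To illustrate how a single "yes"/"no" answer can carry information about the true class, here is a minimal sketch of one Bayes update. It is a simplification, not the thesis model: it assumes the labeler's response probabilities are already known, whereas the full model infers them jointly with the class posteriors. The names `update_class_posterior` and `yes_prob` are hypothetical.

```python
def update_class_posterior(prior, yes_prob, asked_class, answer):
    """Bayes update of P(true class) after one yes/no answer.

    prior:       list of class probabilities for one object.
    yes_prob:    assumed known matrix, yes_prob[z][k] = P(labeler says "yes"
                 | true class is z, question was "is it class k?").
    asked_class: index k of the class named in the question.
    answer:      True for "yes", False for "no".
    """
    likelihoods = [
        yes_prob[z][asked_class] if answer else 1.0 - yes_prob[z][asked_class]
        for z in range(len(prior))
    ]
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("answer has zero probability under the assumed model")
    return [u / total for u in unnormalized]


# Hypothetical labeler: says "yes" 90% of the time when the asked class is
# the true one, and 10% of the time otherwise.
yes_prob = [[0.9 if z == k else 0.1 for k in range(3)] for z in range(3)]
posterior = update_class_posterior([1 / 3, 1 / 3, 1 / 3], yes_prob, 0, True)
```

Repeating the update over several answers, possibly from different labelers, sharpens the posterior; a "no" answer shifts mass away from the asked class instead.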
Before my thesis defense, I submitted a paper titled "A full probabilistic model for yes/no type crowdsourcing in multi-class classification" to the journal Data Mining and Knowledge Discovery. I am still waiting for the first review.
We crowdsourced labels for an astronomical dataset. This contest allows the proposed model to classify non-uniformly sampled time series. Several astronomers and engineers participated in this crowdsourcing task.
We tested the model in two different scenarios. We will release the full databases once the paper gets accepted. The animal contest is also available.
I am a fan of Kaggle competitions. I have tried a few solutions, shared my ideas, and learned from others.
My first submission was in 2015, for San Francisco Crime Classification, when I was just starting out in machine learning. I finished in the top 5% of the leaderboard. I compared a soft-margin SVM against one-vs-all Logistic Regression; the latter achieved a lower log-loss.
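For reference, the multi-class log-loss used to rank such submissions can be sketched as follows. This is a generic implementation of the metric, not the competition's scoring code; the function name and the clipping constant `eps` are my own choices.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels:     list of integer class indices.
    predicted_probs: list of per-class probability lists, one per sample.
    eps:             clipping value that avoids log(0).
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# A confident correct prediction scores lower (better) than a hesitant one.
confident = multiclass_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
hesitant = multiclass_log_loss([0, 1], [[0.6, 0.4], [0.5, 0.5]])
```

Because the metric rewards well-calibrated probabilities, a model that outputs probabilities directly, such as logistic regression, is a natural fit for it.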
Analysis of 2500 votes in the Chilean Senate chamber. I analyzed who influenced whom through national agreements. I applied data mining techniques to uncover dependent behavior among the senators, and visualization techniques to present the resulting insights.
This project may be solved using correlation techniques, probabilistic models, association rules or decision trees.
With a group of five classmates, I evaluated a project to extend a public drugstore service in Peñalolén (a commune of Santiago, Chile).
The main challenge was to forecast the expected demand for the following 20 years. We built a database with information on almost every inhabitant of this commune. This database contained patient and illness information, competitors' prices, and the economic distribution of the population, among other data. Finally, we recommended the optimal location to launch a new drugstore and maximize the commune's well-being.
Finding patterns in a database of complaints about a Chilean company. All employees were asked to respond to a survey, and I was in charge of reporting the results to the manager.
My conclusions and analysis helped the human resources manager to understand his workers better. Results showed that employees increased loyalty and creative thinking once the company gave them more space to show their skills.
The main model used here was LDA, combined with preprocessing techniques such as tokenization, stop-word removal, and stemming.
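A minimal sketch of such a preprocessing pipeline is shown below. It is illustrative only: the stop-word list is a tiny English sample, and `crude_stem` is a naive suffix stripper standing in for a real stemmer, so neither reflects the exact tools used in the project. The resulting token lists are the kind of input a topic model such as LDA consumes.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip one common suffix; a naive stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("The managers ignored complaints")
```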
Developed a smart tool to extract the trending words of a Twitter user. Each color represents the average sentiment of the sentences containing the word. In the left image, we can see how some words change their size according to their usage trend.
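The per-word coloring can be sketched as an average over sentence-level sentiment scores. This is a hedged illustration: it assumes each sentence already comes with a numeric score (for example in [-1, 1]) from some sentiment model, and the function name is hypothetical.

```python
from collections import defaultdict


def average_sentiment_per_word(scored_sentences):
    """Map each word to the mean score of the sentences that contain it.

    scored_sentences: iterable of (sentence, score) pairs, where the score
    is assumed to come from an external sentiment model.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence, score in scored_sentences:
        # Use a set so a word repeated in one sentence is counted once.
        for word in set(sentence.lower().split()):
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


word_sentiment = average_sentiment_per_word(
    [("great coffee", 1.0), ("bad coffee", -1.0)]
)
```

A word that appears in both positive and negative sentences, like "coffee" here, ends up with a neutral average.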
Analysis of search engine queries. Using TF-IDF trained on Spanish Wikipedia, we could understand how a user query is composed. This visualization uses t-SNE to interpret the data. Knowing the customer's behavior allows us to optimize product indexing.
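As a rough sketch of how TF-IDF highlights the informative terms of a query, the snippet below weights each query term by its rarity in a reference corpus. The three-document corpus is a toy stand-in for Spanish Wikipedia, the smoothed IDF formula is one common variant chosen for illustration, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(query_tokens, corpus):
    """Weight each query term by term frequency times smoothed IDF.

    query_tokens: list of tokens in the user query.
    corpus:       list of tokenized documents used to estimate rarity.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document
    weights = {}
    for term, tf in Counter(query_tokens).items():
        # Smoothed IDF: unseen terms get the largest weight instead of failing.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        weights[term] = tf * idf
    return weights


toy_corpus = [["zapatos", "rojos"], ["zapatos", "negros"], ["camisa", "roja"]]
weights = tfidf_weights(["zapatos", "rojos"], toy_corpus)
```

Here "rojos" receives a higher weight than "zapatos" because it appears in fewer documents, which is the signal that helps separate generic product words from distinguishing ones.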