To empirically address these research questions, I have been implementing Odessa: Odessa, a DEcentralized Social Systems App!
Dissertation committee: Deb Roy, Jonathan Zittrain, Rosalind Picard
General exams committee: Deb Roy, Sasha Rush, Veronica Barassi
Sharing personal narratives is a fundamental aspect of human social behavior as it helps share our life experiences. A substantial effort has been made towards developing storytelling machines or inferring characters' features. However, we don't usually find models that compare narratives. This task is remarkably challenging for machines since they, as sometimes we do, lack an understanding of what similarity means. We first introduce a corpus of real-world spoken personal narratives comprising 10,296 narrative clauses from 594 video transcripts. Second, we ask non-narrative experts to annotate those clauses under Labov's sociolinguistic model of personal narratives (i.e., action, orientation, and evaluation clause types) and train a classifier that reaches 84.7% F-score for the highest-agreed clauses. Finally, we match stories and explore whether people implicitly rely on Labov's framework to compare narratives. We show that actions followed by the narrator's evaluation of these are the aspects non-experts consider the most.
@Belen Saldias and Deb Roy
Intelligence Never Seeks Perfection, Instead Requires Effort
INSPIRE is a program in partnership with a local middle school that seeks to expose students to stories of triumph and hope that they find personally meaningful. A key part of the program is a mobile application that delivers short video stories of professionals talking about their life journeys. Stories are sourced from our content partner Roadtrip Nation, and delivered through a virtual coach ("Jo Jo"). Jo Jo personalizes video recommendations for students and prompts them to reflect after they watch videos. The program also includes in-person community building and engagement initiatives developed in collaboration with the school to supplement and extend what the mobile application can offer on its own.
@Nabeel Gillani, Belen Saldias, Sneha Makini, Maggie Hughes, Deb Roy.
paper, poster, news, @medialab
Affective Network: Giving emotional awareness to Twitter users.
Emotional contagion in online social networks has been of great interest over the past years. Previous studies have mainly focused on finding evidence of emotional contagion in homophilic settings. However, these studies have overlooked users' awareness of the sentiments they share and consume online. In this work, I present an experiment with Twitter users that aims to help them better understand which emotions they experience on this social network. I introduce Affective Network (Aff-Net), available at https://AffectiveNetwork.media.mit.edu, a Google Chrome extension that enables Twitter users to filter and make explicit (through colored visual marks) the emotional content in their News Feed.
@Belen Saldias and Roz Picard
Language generation focuses on producing fluent sentences, commonly by learning to predict the next word in a sequence. Conditional generation allows a model to constrain this generation on additional attributes such as style or another source language. However, most conditional models are still trained by minimizing cross-entropy loss, without explicitly building the conditions into the loss. In this work, we develop a REINFORCE-like algorithm to penalize failures to match the desired constraints and to help reduce the mismatch between the loss and evaluation metrics. We implement and train a baseline model according to the current state-of-the-art for comparison. Our models greatly improve the accuracy with which generation satisfies the desired constraints, with no decrease in performance in terms of fluency (perplexity).
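As a rough illustration of the idea, the sketch below combines a cross-entropy term with a REINFORCE-style (score-function) term for a single sampled sequence. It is not the model described above: the function names, the binary reward, the baseline, and the weighting are all assumptions made for this example.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of log-probabilities the model assigned to each token of a sequence."""
    return sum(math.log(p) for p in token_probs)

def reinforce_surrogate(sample_token_probs, reward, baseline):
    """Score-function surrogate for one sampled sequence.

    Minimizing this value raises the probability of samples whose reward
    is above the baseline and lowers it for samples below the baseline.
    """
    return -(reward - baseline) * sequence_log_prob(sample_token_probs)

def combined_loss(reference_token_probs, sample_token_probs, reward,
                  baseline=0.5, weight=1.0):
    """Cross-entropy on the reference tokens plus a weighted constraint term.

    reward is assumed to be 1.0 when the sampled sequence satisfies the
    desired constraint and 0.0 otherwise (a hypothetical reward design).
    """
    cross_entropy = -sequence_log_prob(reference_token_probs) / len(reference_token_probs)
    return cross_entropy + weight * reinforce_surrogate(sample_token_probs, reward, baseline)
```

In a real training loop the probabilities would come from a differentiable model, so the gradient of the surrogate term would flow back into the model parameters; here plain floats stand in for them.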
When humans count the number of objects in a scene, we may not remember the color of specific objects, because we tend to focus our attention on the task we aim to solve. However, we usually have a panoramic view that allows us to understand the context of the scene. Similarly, Visual Question Answering (VQA) machines can be trained to focus on different general and specific image descriptors to become more efficient or accurate. In this work, I present a model that concatenates bottom-up and top-down attention features for VQA. Applying a relevance analysis to abstract and real-world images shows that bottom-up features strongly influence responses to general and ambiguous questions, while top-down attention focuses mainly on object-detection tasks.
@Belen Saldias
In recent years, there has been unprecedented growth in the content that is shared and presented on social media platforms. Along with this growth, however, there is increasing concern over the lack of control social media users have over the content that invisible algorithms show them. Gobo aims to help users control what’s hidden from their feeds, add perspectives from outside their network to help them break filter bubbles, and explore why they see certain content on their feed. Through an iterative design process, we've built and deployed Gobo in the wild and conducted a pilot study in the form of a survey to understand how users respond to the shift of control from invisible algorithms to themselves.
@Rahul Bhargava, Anna Chung, Neil S Gaikwad, Alexis Hope, Dennis Jen, Jasmin Rubinovitz, Belen Saldias, Ethan Zuckerman
I am a fan of Kaggle competitions. I have tried a few solutions, shared my ideas, and learned from others.
My first submission was in 2015, for San Francisco Crime Classification, when I was just starting out in machine learning. I finished in the top 5% of the leaderboard. I compared a soft-margin SVM with one-vs-all Logistic Regression; the latter achieved the lower log-loss.
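For reference, multi-class log-loss (the metric mentioned above) can be computed as in the minimal sketch below. The function name and the clipping constant are choices made for this example, not part of the original submission.

```python
import math

def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    true_labels: list of integer class indices.
    predicted_probs: list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a model is confidently wrong.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)
```

Lower values are better: a model that puts more probability on the correct class for each sample gets a smaller loss.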
Analysis of 2500 votes of the Chilean Senate chamber. I analyzed who influenced whom through national agreements. I applied data mining techniques to uncover dependent behavior among the senators, and visualization techniques to represent the insights found.
This kind of problem can be approached using correlation techniques, probabilistic models, association rules, or decision trees.
With a group of 5 classmates, I evaluated a project to extend a public drugstore service in Peñalolén (a commune of Santiago, Chile).
The main challenge was to forecast the expected demand for the following 20 years. We built a database with information on almost every inhabitant of this commune. This database contained patient and illness information, competitors' prices, and the economic distribution of the population, among other data. Finally, we recommended the optimal location to launch a new drugstore and maximize the commune's well-being.
Finding patterns in a database of complaints about a Chilean company. All employees were asked to respond to a survey, and I was in charge of reporting the results to the manager.
My conclusions and analysis helped the human resources manager to understand his workers better. Results showed that employees showed more loyalty and creative thinking once the company gave them more space to show their skills.
The main model used here was LDA, together with preprocessing techniques such as tokenization, stop word removal, and stemming.
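The preprocessing steps named above can be sketched as follows. This is only an illustration in English: the stop word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer, not the one used in the project. The resulting token lists are the kind of input a topic model such as LDA would take.

```python
import re

# Tiny illustrative stop word list (an assumption for this example).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were",
              "to", "of", "and", "in", "about", "my"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```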
Developed a smart tool to obtain the trending words of a Twitter user. Each color represents the average sentiment of the sentences containing the word. In the left image, we can see how some words change size according to how their usage trends.
Analysis of search engine queries. Using tf-idf trained on Spanish Wikipedia, we could understand how a user query is composed. This visualization uses t-SNE to interpret the data. Knowing the customer's behavior allows us to optimize the indexing of products.
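To make the tf-idf step concrete, here is a minimal sketch that fits inverse document frequencies on a reference corpus and then weights the terms of a query. The smoothed idf formula and the function names are assumptions for this example; the t-SNE projection used for the visualization is a separate step and is not shown.

```python
import math
from collections import Counter

def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term.

    documents: list of token lists (the reference corpus).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Weight each known query term by term frequency times idf."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: (count / total) * idf[term]
            for term, count in counts.items() if term in idf}
```

Terms that are rare in the reference corpus get a higher weight than common ones, which is what lets the rarer, more informative words of a query stand out.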
2016 - 2017. I got my master's degree at the Pontifical Catholic University of Chile (PUC). I worked under the supervision of Karim Pichara (PUC) and Pavlos Protopapas (Harvard University). They both strongly supported me throughout my work. During my master's, I spent four months working at Harvard University. This allowed me to learn from others and focus my research so that I could conclude it successfully.
Best Computer Science Thesis Award
2017 - Pontifical Catholic University of Chile
Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of queries. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach, using Monte Carlo Sampling and Black Box Variational Inference, which provides the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time-series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.
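The core intuition behind yes/no queries can be illustrated with a much simpler setting than the full model: if each labeler's probability of answering "yes" were already known, the posterior over an object's class would follow from Bayes' rule. The sketch below assumes exactly that. The `yes_prob` table is a hypothetical, fixed input here, whereas the work described above infers the labelers' confusions jointly with the classes.

```python
def posterior_over_classes(prior, yes_prob, answers):
    """Posterior P(true class | yes/no answers) for a single object.

    prior: list with P(true class = c) for each class c.
    yes_prob[c][k]: assumed probability that a labeler answers "yes" when
        the true class is c and the question is "is this class k?".
    answers: list of (asked_class, said_yes) pairs.
    """
    unnormalized = []
    for true_class, p in enumerate(prior):
        likelihood = 1.0
        for asked_class, said_yes in answers:
            p_yes = yes_prob[true_class][asked_class]
            likelihood *= p_yes if said_yes else (1.0 - p_yes)
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Each additional answer, including a "no", sharpens the posterior, which is why short queries can still recover the class even though a single response carries less information than a full question.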
Before my thesis defense, I submitted a paper titled "A full probabilistic model for yes/no type crowdsourcing in multi-class classification" to the journal Data Mining and Knowledge Discovery. I am still waiting for the first review.
This picture is from my defense day. From left to right: Karim Pichara, Pavlos Protopapas, Belén Saldías, Denis Parra, and Alejandro Jara.
We tested the model in two different scenarios. We will release the full databases once the paper gets accepted. The animal contest is also available.
We crowdsourced labels for an astronomical dataset. This contest allows the proposed model to classify non-uniformly sampled time series. Several astronomers and engineers participated in this crowdsourcing task.
Copyright © 2024 Belén Carolina Saldías Fuentes - All rights reserved