Ph.D. Teaching Assistant at EECS Fall 2019
▪ Serving as TA for undergraduate (introductory, 6.036) and graduate (applied, 6.862) machine learning courses.
▪ Guiding grad students in their projects and developing material for new problem sets related to deep learning. Working under Professors Leslie Kaelbling, Tamara Broderick, Duane Boning, Patrick Jaillet, and Jacob Andreas, among others.
Adjunct Professor - Core course for students of the Master in Information Technologies and Data Management. This course gives students a deep understanding of data mining and data warehousing principles. It starts by presenting strategies for building warehouse systems; the lessons then go through applications that let non-experts visualize and query data. The primary objective is to use these techniques to drive management decisions and marketing plans, mainly in customer relationship management (CRM).
Adjunct Professor - 2nd- and 6th-year university students. In this course, we study (and mainly implement) advanced topics in computer programming such as object-oriented design, data structures, functional programming, threading, simulation, metaprogramming, input/output, unit testing, and graphical interfaces. The book "Advanced Computer Programming in Python" summarizes all the contents of this course.
Adjunct Professor - 4th- and 6th-year university students. Analysis and implementation of basic techniques and algorithms in data mining. We study data warehouses, ETL, data preprocessing, data visualization, association rules, linear regression, classification algorithms (logistic regression, decision trees, random forests, KNN), and clustering methods (k-means, hierarchical clustering, and Gaussian mixtures with EM).
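Of the clustering methods listed, k-means is the shortest to sketch. Below is a plain-Python version of Lloyd's algorithm; the function names, the random initialization from data points, and the rule that an empty cluster keeps its previous center are illustrative assumptions (k-means++ initialization is a common alternative in practice).

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    # Initialize the centers with k data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                new_centers.append(
                    tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
                )
            else:
                new_centers.append(centers[j])
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

On the points (0, 0), (0, 1), (10, 10), (10, 11) with k = 2, this converges to the centers (0, 0.5) and (10, 10.5), in some order.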
2016 - 2017. I earned my master's degree at the Pontifical Catholic University of Chile (PUC) under the supervision of Karim Pichara (PUC) and Pavlos Protopapas (Harvard University). Both strongly supported me throughout my work. During my master's, I spent four months working at Harvard University. This allowed me to learn from others and to focus my research so that I could complete it successfully.
Best Computer Science Thesis Award
2017 - Pontifical Catholic University of Chile
Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can answer full questions. In classification contexts, a full question requires a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios: labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of query that only requires a "yes" or "no" response. Our model estimates a joint posterior distribution over matrices describing the labelers' confusions, together with the posterior probability of each object's class. We developed an approximate inference approach using Monte Carlo sampling and Black Box Variational Inference, and we provide the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time series; the second relies on image classification of animals. We achieved results comparable to those of full-query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available.
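To make the idea of yes/no queries concrete, here is a toy Bayes update in Python. It is not the thesis model: it assumes a single labeler whose "yes" probabilities are already known (the hypothetical `yes_prob` matrix) and answers that are conditionally independent given the true class, whereas the model described above estimates the confusion matrices jointly with the class posteriors through approximate inference.

```python
def class_posterior(prior, yes_prob, answers):
    """Posterior over an object's true class after a sequence of yes/no answers.

    prior:    list of K prior class probabilities.
    yes_prob: hypothetical K x K matrix; yes_prob[t][q] is the probability
              that the labeler answers "yes" to "Is it class q?" when the
              true class is t (an assumed, already-known confusion model).
    answers:  list of (queried_class, said_yes) pairs, assumed conditionally
              independent given the true class.
    """
    post = list(prior)
    for q, said_yes in answers:
        for t in range(len(post)):
            p_yes = yes_prob[t][q]
            # Multiply by the likelihood of the observed answer under class t.
            post[t] *= p_yes if said_yes else 1.0 - p_yes
    # Normalize so the posterior sums to one.
    total = sum(post)
    return [p / total for p in post]
```

With a uniform prior over three classes, "yes" probabilities of 0.9 on the diagonal and 0.2 elsewhere, a single "yes" to "Is it class 0?" raises the posterior of class 0 to 0.9 / 1.3, about 0.69.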
Before my thesis defense, I submitted a paper titled "A full probabilistic model for yes/no type crowdsourcing in multi-class classification" to the journal Data Mining and Knowledge Discovery. I am still waiting for the first review.
This picture was taken on my defense day. From left to right: Karim Pichara, Pavlos Protopapas, Belén Saldías, Denis Parra, and Alejandro Jara.
We tested the model in two different scenarios. We will release the full databases once the paper is accepted. The animal contest is also available.
We crowdsourced labels for an astronomical dataset. This contest allows the proposed model to classify non-uniformly sampled time series. Several astronomers and engineers participated in this crowdsourcing task.
It was a flipped classroom environment. I was in charge of creating evaluations and weekly practical activities for more than a hundred students.
In addition, I oversaw a 25-member team. Together, we gave feedback on and graded students' assessments.
Finally, I supported the course's professors in developing and publishing the book "Advanced Computer Programming in Python", which includes the developed material.
I also served as TA for some introductory courses related to computer programming.
2017 - TA
We studied machine learning from a probabilistic perspective.
Bayesian learning: the beta-binomial and the Dirichlet-multinomial models.
Monte Carlo inference: I also learned and developed customized implementations of rejection sampling and importance sampling.
MCMC inference: we solved problems by implementing Gibbs sampling, Metropolis-Hastings, and annealing methods.
Variational inference: KL divergence, the mean field method, and Variational Bayes EM.
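The topics above are standard, so the following Python sketch is a generic illustration rather than the course's actual material: a conjugate beta-binomial and Dirichlet-multinomial update, rejection sampling from a Beta(2, 2) target with a uniform proposal, importance sampling of E[X²] under N(0, 1) with an N(0, 2²) proposal, random-walk Metropolis-Hastings, and the discrete KL divergence. All function names, target densities, and tuning values (the envelope constant 1.5, the proposal scales, the burn-in length) are assumptions chosen for illustration.

```python
import math
import random


# Beta-binomial: a Beta(a, b) prior on a success probability, updated with
# k successes in n Bernoulli trials, gives a Beta(a + k, b + n - k) posterior.
def beta_binomial_update(a, b, k, n):
    return a + k, b + n - k


# Dirichlet-multinomial: add the observed category counts to the concentrations.
def dirichlet_multinomial_update(alpha, counts):
    return [a + c for a, c in zip(alpha, counts)]


# Rejection sampling from the Beta(2, 2) density f(x) = 6x(1 - x) on [0, 1],
# using a Uniform(0, 1) proposal and the envelope constant M = 1.5 = max f.
def rejection_sample_beta22(n, rng):
    samples = []
    while len(samples) < n:
        x = rng.random()
        if rng.random() * 1.5 <= 6 * x * (1 - x):
            samples.append(x)
    return samples


def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Importance sampling: estimate E_p[h(X)] for p = N(0, 1) using draws from
# the wider proposal q = N(0, 2^2), weighting each draw by p(x) / q(x).
def importance_estimate(h, n, rng):
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)
        total += h(x) * normal_pdf(x) / normal_pdf(x, sigma=2.0)
    return total / n


# Random-walk Metropolis-Hastings targeting an unnormalized log density.
# The Gaussian proposal is symmetric, so the acceptance ratio is p(x') / p(x).
def metropolis_hastings(log_target, n, step, rng, x0=0.0, burn_in=1000):
    x, chain = x0, []
    for i in range(n + burn_in):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        if i >= burn_in:  # discard the burn-in draws
            chain.append(x)
    return chain


# KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0.
def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, with a uniform Beta(1, 1) prior and 7 successes in 10 trials, `beta_binomial_update(1, 1, 7, 10)` returns `(8, 4)`, a posterior with mean 8/12.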
2015 - 2016 - TA
This course teaches the fundamental data structures and their main algorithms. We analyzed their time and memory complexity. We also studied the main techniques for solving discrete optimization problems.
We implemented and evaluated linked lists, queues, heaps, hash tables, trees (binary, red-black, B-trees), graphs (BFS, DFS), dictionaries, priority queues, disjoint sets, greedy algorithms (Dijkstra, minimum spanning trees), divide and conquer, dynamic programming, and sorting algorithms.
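Dijkstra's algorithm combines two items from this list, a greedy strategy and a priority queue. Below is a minimal Python sketch using the standard-library `heapq` module; the adjacency-dict graph representation and the function name are assumptions for illustration, and edge weights must be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph: dict mapping each node to a list of (neighbor, weight) pairs,
    with non-negative weights. Nodes without an entry have no out-edges.
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            # Greedy relaxation: keep the shorter of the old and new paths.
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

On the graph `{"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)]}`, `dijkstra(graph, "a")` returns `{"a": 0, "b": 1, "c": 3, "d": 4}`. Skipping stale heap entries, instead of decreasing keys in place, keeps the code simple at the cost of a slightly larger heap.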
2017 - TA
This course teaches widely used tools and applications for automatic analysis and data mining processes. Starting with strategies for building warehouse systems, the lessons go through applications that let non-experts visualize and query data.
This course gave me the big picture of every piece of software and every model I developed during my time as a data scientist. Now I know how they fit together.
2015 - 2016 - Chief TA
The course introduces the basics of stochastic modeling. It presents basic techniques and concepts behind the most widely used analytical models in operations research for representing probabilistic systems.
We studied Poisson processes, discrete-time Markov chains, and continuous-time Markov chains. We also learned process simulation.
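A minimal Python sketch of two of these ideas: simulating a homogeneous Poisson process from exponential inter-arrival times, and computing the stationary distribution of a discrete-time Markov chain by power iteration. The function names and the example transition matrix are hypothetical, and power iteration assumes an irreducible, aperiodic chain.

```python
import random


def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon]:
    inter-arrival times are i.i.d. Exponential(rate)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration pi <- pi P for a row-stochastic matrix P (list of rows).

    Assumes the chain is irreducible and aperiodic, so the iteration
    converges to the unique stationary distribution."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

For P = [[0.9, 0.1], [0.5, 0.5]], the balance equation 0.1·π₀ = 0.5·π₁ gives π = (5/6, 1/6), which `stationary_distribution` approaches.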
2016 - Chief TA
4th- and 6th-year university students. Analysis and implementation of basic techniques and algorithms in data mining. We study data warehouses, ETL, data preprocessing, data visualization, association rules, linear regression, classification algorithms (logistic regression, decision trees, random forests, KNN), and clustering methods (k-means, hierarchical clustering, and Gaussian mixtures with EM).
2018 Data Mining course: 100% satisfaction and 100% recommendation rates, as rated by students. Engineering School of the Pontifical Catholic University of Chile.
2017 Awarded by the Engineering School of the Pontifical Catholic University of Chile.
2016-2017 The Engineering School of the Pontifical Catholic University of Chile awarded me a scholarship to pursue my entire master's research.
2015 Acknowledgment for Great Quality of Support to Teaching (Teacher Assistant), 2015. Courses: Advanced Programming and Stochastic Models. Pontifical Catholic University of Chile.
2012 Honor enrollment - Pontifical Catholic University of Chile.
2012 National Math top score - University Admission 2012. Ministry of Education of Chile
Copyright © 2022 Belén Carolina Saldías Fuentes - All rights reserved