Automatically Inferring User Goals with Visualization Systems

Faculty: Alvitta Ottley

Humans are entering a new era in which data increasingly surround us. This data saturation promises new opportunities for increased awareness, more informed decision-making, and enhanced quality of life. Visualization has emerged as a solution to help people explore, reason, and make judgments with data, but there are many open questions related to designing visualization systems to support non-experts with everyday tasks. The goal of this project is to explore how we can leverage interaction data to automatically learn about users’ goals and adapt to suit them. We will apply machine learning techniques to make predictions and explore methods for supporting data exploration.

Skills Required: Proficiency with web programming and JavaScript; prior experience with D3 is a plus but not required. Familiarity with machine learning and statistics would be beneficial.

Machine Learning for Image Restoration

Faculty: Ulugbek Kamilov

In image restoration, the goal is to build algorithms that remove undesired artifacts, such as camera blur or sensor noise, from images. REU students will work on advanced algorithms for image restoration that are based on large-scale optimization and machine learning. We have developed a family of such techniques that use learned information, such as natural image features, to generate clean images from corrupted ones. REU students will have an opportunity to contribute to this exciting research area and learn cutting-edge imaging algorithms.

Skills Required: Familiarity with image processing and machine learning. Proficiency with MATLAB or Python.

Large-Scale Optimization for Machine Learning

Faculty: Ulugbek Kamilov

Optimization algorithms play an essential role in modern machine learning (ML). The choice of optimization algorithm determines whether sufficiently good performance can be obtained in hours or in days. Increasingly, optimization is becoming large-scale because modern ML models have millions of parameters and are trained over extremely large datasets. REU students will work on the development of novel optimization algorithms for large-scale ML. The Computational Imaging Group (CIG) at WashU has recently developed several new algorithms, and the students will contribute to this exciting area by extending our current results. Several applications will be considered, including efficient training of deep neural nets.

Skills Required: Proficiency with Python or MATLAB. Familiarity with machine learning. Mathematical maturity to understand optimization algorithms and their analysis.

Computer Vision Methods for Depth and Motion Estimation

Faculty: Ayan Chakrabarti

Students will help develop new algorithms for estimating depth, shape, motion, and other physical characteristics of objects from still images and videos. These methods will target applications in robotics and self-driving vehicles, virtual and augmented reality, and image manipulation for graphical design. Over the course of the project, we will study the kinds of optical and geometric cues that relate image intensities to physical object properties, and we will see how we can train deep convolutional neural networks to predict these properties from images and videos.

Skills Required: Mathematical maturity and experience with programming. Most programming will be in Python/TensorFlow, so any prior experience with either is useful but not required. Prior knowledge or coursework in machine learning, computer vision, and probability and statistics will also be useful.

Design and Implementation of Visualization Tools for Home Automation Systems

Faculty: Alvitta Ottley and William Yeoh

Through the proliferation of smart devices (e.g., interconnected programmable thermostats, lights, and washers) in our homes, home automation is becoming inevitable. Home automation is the automated control of the home’s devices with the objective of improved comfort, improved energy efficiency, and reduced operational costs. In this project, we will develop novel visualization tools and interfaces for home automation systems to display proposed schedules of the smart devices as well as enable users to modify the schedules as necessary.

Skills Required: Proficiency with web programming and JavaScript; prior experience with D3 is a plus but not required.

Polarization on Social Networks

Faculty: Yevgeniy Vorobeychik

REU participants will develop a model of belief diffusion over social networks where beliefs are represented as vectors in real space (corresponding, for example, to opinions on policy). They will then consider a model of diffusion with skeptical agents, where influence among social network neighbors is weighted as a function of belief proximity. In other words, agents trust their neighbors more if they are closer to them in their views. The students will then study how such models lead to opinion polarization in a population. Upon analysis of polarization, the students will investigate the ability of a malicious external party to induce greater opinion fragmentation and polarization by selecting a small subset of issues to emphasize. Finally, the students will study this model in the context of voting preferences constructed over the issue space, and the impact that polarization (including maliciously induced polarization) can have on election outcomes.
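The proximity-weighted influence described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the project's actual model: the exponential trust function, the `sharpness` parameter, and the helper names `diffusion_step` and `run_diffusion` are all hypothetical choices made for illustration.

```python
import math

def diffusion_step(beliefs, neighbors, sharpness=4.0):
    """One synchronous update: each agent moves to a trust-weighted
    average of its own belief and its neighbors' beliefs.

    beliefs   -- list of belief vectors (lists of floats), one per agent
    neighbors -- adjacency list; neighbors[i] holds the indices of i's neighbors
    sharpness -- hypothetical parameter; larger values make agents more skeptical
    """
    updated = []
    for i, own in enumerate(beliefs):
        total_weight = 1.0          # assumption: an agent fully trusts itself
        blended = list(own)
        for j in neighbors[i]:
            dist_sq = sum((a - b) ** 2 for a, b in zip(own, beliefs[j]))
            weight = math.exp(-sharpness * dist_sq)  # closer views earn more trust
            total_weight += weight
            blended = [acc + weight * b for acc, b in zip(blended, beliefs[j])]
        updated.append([v / total_weight for v in blended])
    return updated

def run_diffusion(beliefs, neighbors, steps=50, sharpness=4.0):
    """Apply diffusion_step repeatedly and return the final beliefs."""
    for _ in range(steps):
        beliefs = diffusion_step(beliefs, neighbors, sharpness)
    return beliefs

# Four agents on a path graph, each holding opinions on two issues.
beliefs = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
neighbors = [[1], [0, 2], [1, 3], [2]]
final = run_diffusion(beliefs, neighbors)
```

With these toy values, the two like-minded pairs remain far apart after 50 steps because the cross-group trust weight is tiny. Setting `sharpness=0.0` removes the skepticism, so every neighbor is trusted equally and the agents drift toward a common opinion.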

Skills Required: Python programming, linear algebra

Automatic and Robust MRI Segmentation of Human Placenta

Faculty: Miaomiao Zhang

The goal of human placenta segmentation in in-utero magnetic resonance images (MRI) is to delineate pathological regions and screen for increased risk of pregnancy complications. Various segmentation techniques that depend heavily on users’ initial guidance have been developed in the literature. However, their practical use is limited by the substantial amount of time-consuming manual work they require, as well as by the high level of image noise, which dramatically degrades segmentation quality. To address these issues, we are interested in developing a fully automatic and robust segmentation method that improves on the state of the art.

Skills Required: Background in image processing and proficiency in C/C++ or Python.

Privacy Implications of Voice-Based Home Automation Systems

Faculty: William Yeoh, Ning Zhang, Silvia Xuan Zhang

Use of voice-based home automation systems (e.g., Amazon Echo, Google Home, Apple HomePod) is growing rapidly in our homes. Despite the convenience these systems afford, they raise a number of potential security and privacy concerns. For example, the privacy of users may be compromised when a sensitive conversation is transmitted to the cloud for voice command identification. Furthermore, it is also possible to extract private context information from network traffic patterns. In this project, REU participants will evaluate potential losses in privacy and security through the use of such systems using techniques from machine learning and natural language processing.

Skills Required: Computer Networks, Machine Learning & AI

Exploiting Centrality for Fun and Profit – a Vulnerability Research Perspective

Faculty: Silvia Xuan Zhang, Ning Zhang, William Yeoh

Distributed consensus is a cornerstone of many recent innovations in decentralized applications, including Bitcoin and Ethereum. While the distributed protocols have a sound theoretical foundation, the implementation and deployment of such technology has led to many centralized aspects at multiple layers of abstraction, from developer to development environment, from network topology to mining pool. In this project, REU participants will analyze these emerging decentralized platforms from a new angle, studying the vulnerabilities and impacts of various forms of centralization along different dimensions.

Skills Required: Knowledge of software security; proficiency in computer programming

Policies for Automated Scientific Discovery

Faculty: Roman Garnett

We will consider an application of active machine learning for automating scientific discovery, using drug discovery as a model problem. REU participants will design intelligent policies for actively querying a large, real-world database of compounds to quickly detect potential drugs. The database contains (short) lists of compounds binding to 120 biological targets of relevance to humans and a background set of 1 million inactive compounds. We will consider “active search”: sequentially testing these compounds to find as many positive examples as possible. There are several outstanding questions for students to pursue. (1) Can we derive useful policies for situations involving dynamic stopping, for example when a target on the number of valuable items to find is given? (2) Can we derive useful policies for cost-sensitive settings, where each experiment may have a different cost? In addition to drug discovery, we will also explore other applications, including an application from materials discovery.
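To make the sequential-testing loop concrete, here is a minimal sketch of a greedy baseline: at each step, test the untested item that currently looks most likely to be active. It is an illustrative assumption, not the policy studied in the project. The kernel-smoothed estimator, the one-dimensional features, and the names `estimate_active_probability` and `greedy_active_search` are hypothetical.

```python
import math

def estimate_active_probability(x, observed, prior=0.1, bandwidth=1.0):
    """Kernel-smoothed estimate that an item with feature x is active.

    observed is a list of (feature, label) pairs from earlier tests;
    the prior acts as a single pseudo-observation (hypothetical model).
    """
    num, den = prior, 1.0
    for feature, label in observed:
        weight = math.exp(-((x - feature) ** 2) / (2.0 * bandwidth ** 2))
        num += weight * label
        den += weight
    return num / den

def greedy_active_search(features, oracle, budget):
    """Greedy one-step policy: always test the most promising untested item.

    oracle(i) runs the (simulated) experiment and returns 1 if item i is active.
    Returns the list of queried indices and the list of actives found.
    """
    observed = []
    remaining = list(range(len(features)))
    queried, found = [], []
    for _ in range(min(budget, len(features))):
        best = max(
            remaining,
            key=lambda i: estimate_active_probability(features[i], observed),
        )
        label = oracle(best)
        observed.append((features[best], label))
        remaining.remove(best)
        queried.append(best)
        if label:
            found.append(best)
    return queried, found

# Toy data: ten items on a line, with a small cluster of actives.
features = [float(i) for i in range(10)]
actives = {4, 5, 6}
queried, found = greedy_active_search(features, lambda i: int(i in actives), budget=5)
```

A greedy policy like this only exploits the current estimate. The questions listed above (dynamic stopping, per-experiment costs) concern policies that go beyond this baseline.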

Skills Required: Familiarity with probability 

Interface Design for Eliciting Data from Humans

Faculty: Chien-Ju Ho

There is an increasing amount of human-generated data available on the internet – including datasets labeled using crowdsourcing, reviews for products and restaurants, user search histories, and beyond. This enormous amount of data has created an unprecedented opportunity to solve various computational problems. However, despite the recent impressive progress on learning from human-generated data, researchers often do not have control over the information structure of the data they are learning from. This REU project aims to explore the interface design problem in eliciting data from humans to solve machine learning problems. The focus will be on the perspective of information exchange in the interface design: what information is presented to users and what information is elicited from users. The goal is to explore how we can design optimal data-elicitation interfaces which maximize the performance (e.g., prediction accuracy) of machine learning algorithms while taking into account users’ incentive constraints and behavioral biases.

Skills Required: Mathematical maturity (familiarity with game theory and/or machine learning is a plus); proficiency with C/C++ or Python

Computing on Massive Data Streams using Diverse Parallel Architectures

Faculty: Roger Chamberlain, Jeremy Buhler, Ron Cytron, Angelina Lee

In a world of endless data, efficient computation demands that we harness multiple processing resources in parallel. To crunch through massive data sets efficiently, we can harness and even combine different kinds of parallel hardware: multicore processors, graphics engines, and even highly specialized processors implemented on reconfigurable logic (FPGAs). But how should application programmers write code to exploit these complex, heterogeneous architectures? How much architectural detail should the hardware expose, and how much should be hidden behind software abstractions? What shape should these abstractions take to best help the programmer, and what is their cost to application performance?

In this project, we treat massive data sets as sequential streams of inputs. Applications are designed as pipelines, trees, or graphs of computations that pass streams of data between them. We’ll investigate how to implement and tune high-impact scientific and engineering computations in the streaming paradigm, as well as how this paradigm enables us to expose and exploit these applications’ inherent parallelism. Application domains of particular interest include data cleaning and parsing, astrophysics, computational finance, and bioinformatics.
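The pipeline structure described above can be sketched with Python generators, where each stage consumes one stream and produces another. This is a single-threaded illustration of the dataflow shape only, under assumed stage names (`parse`, `clean`, `running_totals`) and an assumed `key,value` record format; it does not represent the project's framework or any parallel execution.

```python
def parse(lines):
    """Stage 1: split raw 'key,value' lines and drop malformed ones."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) == 2:
            yield fields[0], fields[1]

def clean(records):
    """Stage 2: convert values to floats, skipping records that fail to parse."""
    for key, raw in records:
        try:
            yield key, float(raw)
        except ValueError:
            continue

def running_totals(records):
    """Stage 3: emit the running total for each key as records arrive."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0.0) + value
        yield key, totals[key]

# Compose the stages; items flow through one at a time, so the full
# input never needs to be held in memory by any stage.
raw_stream = ["a,1.5", "b,2", "bad line", "a,oops", "a,0.5"]
results = list(running_totals(clean(parse(raw_stream))))
# results == [("a", 1.5), ("b", 2.0), ("a", 2.0)]
```

Because each stage only depends on its input stream, stages like these are natural units to place on separate processors, which is the kind of parallelism the streaming paradigm is meant to expose.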

Skills Required: C++ and facility with basic algorithms and data structures such as sorting, hashing, and graph traversals. Familiarity with a scripting language such as Python or Perl is a plus. Familiarity with OpenCL or CUDA is a plus.

Benchmarking and Performance Evaluation of Interactive Parallel Applications

Faculty: Angelina Lee

Cilk is a C/C++-based multithreaded language that provides a high-level language abstraction for parallel execution. When writing a parallel program in Cilk, the programmer expresses the logical parallelism of the computation, and an underlying runtime scheduler schedules the computation in a way that respects the logical parallelism specified by the programmer while taking full advantage of the processors available at runtime. Cilk was originally designed to handle throughput-oriented computations, where the most important performance criterion is to minimize the overall execution time of the computation. We are currently developing interactive Cilk, a parallel platform designed to handle modern desktop applications and web services that are interactive. For such applications, subcomponents of the program may be latency-sensitive, meaning that the computation time of a subcomponent affects the experience of the user. We are looking for application developers to build benchmark applications and conduct performance evaluation of such interactive parallel applications running on Cilk.

Skills Required: Familiarity with C/C++; experience with multithreaded programming is a plus but not required.

Uncovering the “Hidden Half” of Plants

Faculty: Tao Ju

Roots, the “hidden half” of a plant, play many important roles including physical support of the plant, uptake of water and nutrients, and stabilization of the soil. Their functions, as well as their amazingly complex structures, have intrigued biologists for centuries. With advanced imaging techniques like CT and MRI, biologists are finally able to “see” these underground forms in 3D. However, computational methods are needed to extract relevant information from the images, such as identifying root branches, measuring their length and shape, understanding their organization and architecture, and analyzing them over time.

As part of an NSF-funded collaboration between three institutions, the REU student will join an interdisciplinary team of computer scientists, mathematicians, and biologists to build automated algorithms and interactive graphical tools for image-based analysis of plant roots. The central theme of the algorithms will be using geometric skeletons, a popular shape descriptor in computer graphics and vision, to model plant roots and to infer biological information.

Skills Required: Experience with C++ and Python, familiarity with OpenGL, good foundation in algorithms and data structures (particularly those related to graphs).

Exploring the Darkweb

Faculty: Roch Guerin

The Darkweb is a section of the web that is not accessible to standard Internet users (and search engines) and instead requires the use of special software such as the Tor browser. Sites accessible only in the Darkweb have special names, characterized by the .onion domain suffix, that are automatically generated from a public key when the corresponding onion service is configured. Because of its opacity and lack of accessibility by search engines and regular web crawlers, the structure of the Darkweb is poorly understood.

This project builds on prior work that developed a web crawler for the Darkweb and leveraged it to explore the Darkweb's structure (connectivity) and evolution. The project will focus on exploring temporal patterns in the accessibility of .onion sites and in particular their reliance on using different .onion names over time. The investigation aims to identify whether different types of Darkweb sites exhibit different temporal behaviors, and explore connections between those behaviors and listings of .onion sites in the regular web.

Skills Required: Strong programming skills in Python and JavaScript. Knowledge of network programming and the use of SOCKS5 proxies. Experience with web crawlers and Tor desired.

Allocating Scarce Social Resources Based on Predictions of Outcomes

Faculty: Sanmay Das

Demand for resources that are collectively controlled or regulated by society, like social services or organs for transplantation, typically far outstrips supply. How should these scarce resources be allocated in terms of both efficiency and equity? In order to answer this question we must define objectives, predict outcomes, and optimize allocations, while carefully considering agent preferences and incentives. We collaborate actively with experts in homelessness services and transplant medicine to examine these questions in the context of actual applications where predictions and counterfactual policies can be evaluated on real-world datasets.

Participants will work on simulating the effects of different interventions (e.g., different types of services and shelters for homeless households, different organ matching protocols for transplantation) by building and validating realistic counterfactual models on datasets we have been collecting or have access to through collaborations with the Brown School and the Medical School at WashU. They will also analyze the effects of different social policies on the entire population to answer questions of efficiency, and on different subpopulations to answer questions of equity, fairness, and justice.

Text Classification for the Social Sciences

Faculty: Sanmay Das

We will develop methods that use modern machine learning approaches to measure quantities of interest from large volumes of text data, for example, risks of psychiatric problems from social media posts or levels of partisanship or local vs. national focus from different types of political texts.

Skills Required: Proficiency in Python (or another modern programming language like C/C++ or Java). Familiarity with database management and basic machine learning is a plus.

Developing Fair Bayesian Machine Learning Algorithms

Faculty: Roman Garnett

Machine learning algorithms are increasingly being used to make highly sensitive decisions (e.g., credit scoring, hiring employees). In such settings, algorithms are often trained on data biased by the prejudice of previous decisions. Thus, it is important to develop techniques that ensure the algorithms being used make fair predictions to avoid further discrimination against protected groups. This REU project will focus on how Gaussian processes (GPs), a commonly used probabilistic machine learning method, can be adapted to make fair predictions in a variety of settings. There are many competing definitions of fairness in machine learning. We will explore which ones can be achieved with Gaussian processes, as well as different ways of efficiently achieving approximate fairness.

Skills Required: Familiarity with probability theory and machine learning