Below is a list of the projects for 2018. Keep an eye on this page for the 2019 projects, which will be posted soon!

Data-centric Approaches to Modeling Individual Behavior in Large-scale Online Social Systems

Faculty: Sanmay Das

REU students will work on different aspects of a project that attempts to unify micro-modeling of agent behavior based on data and large-scale modeling of social systems in which these agents interact. We would like to develop algorithms that use prior information and other side information (for example, mined from the natural language aspects of what the user writes on a website) to build much richer models of opinion that can be applied to web-scale data such as that on Wikipedia, Reddit, or Yelp. Building such models will allow us to study the dynamics of opinion formation and change.

Skills Required: Mathematical maturity for the theoretical project (familiarity with game theory and/or machine learning is a plus); proficiency with Java, C/C++, or Python for the simulation project.

Accelerating Scientific Computations on GPUs with MERCATOR

Faculty: Jeremy Buhler and Roger Chamberlain

In this project, students will implement algorithms from machine learning and scientific computing on NVIDIA GPUs using MERCATOR, a novel framework being developed by our group to help build complex GPU apps efficiently. Potential target applications include DNA short-read mapping, belief propagation for machine learning, processing data from X-ray telescopes, and N-body simulation. We’ll focus on methods that are both algorithmically non-trivial and highly parallelizable on a GPU.

Skills Required: C++ and facility with basic algorithms and data structures (sorting, hashing, graph traversal, possibly dynamic programming). Prior experience programming with CUDA is a strong plus but is not required. Familiarity with a scripting language such as Python or Perl is a plus. Prior application-specific background is not required.

Automatically Inferring User Goals with Visualization Systems

Faculty: Alvitta Ottley

Humans are entering a new era in which data increasingly surround us. This data saturation promises new opportunities for increased awareness, more informed decision-making, and enhanced quality of life. Visualization has emerged as a solution to help people explore, reason, and make judgments with data, but there are many open questions related to designing visualization systems to support non-experts with everyday tasks. The goal of this project is to explore how we can leverage interaction data to automatically learn about users’ goals and adapt to suit them. We will apply machine learning techniques to make predictions and explore methods for supporting data exploration.

Skills Required: Proficiency with web programming and JavaScript; prior experience with D3 is a plus but not required. Familiarity with machine learning and statistics would be beneficial.

Machine Learning for Image Restoration

Faculty: Ulugbek Kamilov

In image restoration, the goal is to build algorithms that remove undesired artifacts, such as camera blur or sensor noise, from images. REU students will work on advanced algorithms for image restoration that are based on large-scale optimization and machine learning. We have developed a family of such techniques that use learned information, such as natural image features, to generate clean images from corrupted ones. REU students will have an opportunity to contribute to this exciting research area and learn cutting-edge imaging algorithms.

Skills Required: Familiarity with image processing and machine learning. Proficiency with MATLAB or Python.

Computational Imaging for Super-resolution Microscopy

Faculty: Ulugbek Kamilov

In 1873, the microscopist Ernst Abbe stipulated a physical limit for the resolution of traditional optical microscopy: it could never become better than 0.2 micrometres. However, recent progress in our computational abilities is making it possible to circumvent this limitation by designing powerful imaging algorithms. REU students will work on the development of new algorithms for computational microscopy that are based on our recent work on super-resolution. The students will have an opportunity to contribute to this exciting research area and learn cutting-edge algorithms.
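As a rough check on the 0.2 micrometre figure, Abbe's limit is commonly written as d = wavelength / (2 * NA), where NA is the numerical aperture of the objective. The sketch below simply evaluates that formula. The function name and the example values (green light at 550 nm, an oil-immersion objective with NA 1.4) are illustrative assumptions and do not come from the project description.

```python
def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Return the Abbe lateral resolution limit d = wavelength / (2 * NA) in nanometres."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


if __name__ == "__main__":
    # Assumed example values: 550 nm light, NA = 1.4 objective.
    limit = abbe_limit_nm(550.0, 1.4)
    print(f"Abbe limit: {limit:.0f} nm")
```

With these assumed values the limit comes out just under 200 nm, which is about 0.2 micrometres.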

Skills Required: Familiarity with image processing and machine learning. Proficiency with MATLAB or Python.

Computer Vision Methods for Depth and Motion Estimation

Faculty: Ayan Chakrabarti

Students will help with developing new algorithms for estimating depth, shape, motion, and other physical characteristics of objects, from still images and videos. These methods will target applications in robotics and self-driving vehicles, virtual and augmented reality, and image manipulation for graphical design. Over the course of the project, we will study the kind of optical and geometric cues that relate image intensities to physical object properties, and we will see how we can train deep convolutional neural networks to then predict these properties from images and videos.

Skills Required: Mathematical maturity and experience with programming. Most programming will be in Python/TensorFlow, so prior experience with either is useful but not required. Prior knowledge or coursework in machine learning, computer vision, and probability and statistics will also be useful.

Design and Implementation of Visualization Tools for Home Automation Systems

Faculty: Alvitta Ottley and William Yeoh

Through the proliferation of smart devices (e.g., interconnected programmable thermostats, lights, and washers) in our homes, home automation is becoming inevitable. Home automation is the automated control of the home’s devices with the objective of improved comfort, improved energy efficiency, and reduced operational costs. In this project, we will develop novel visualization tools and interfaces for home automation systems to display proposed schedules of the smart devices as well as enable users to modify the schedules as necessary.

Skills Required: Proficiency with web programming and JavaScript; prior experience with D3 is a plus but not required.

Big Data Analysis for Active Scientific Discovery

Faculty: Roman Garnett

REU participants will design intelligent policies for actively querying a large, real-world database of compounds to quickly detect potential drugs. The database contains 120 different biological targets of relevance to humans and a background set of 1 million putative inactive compounds gathered from the ZINC database. Along with these, a baseline implementation of a state-of-the-art virtual screening system will be made available for comparison. There are several outstanding questions for students to pursue. (1) Previous work has used simple k-NN models for predicting binding activity. Can we effectively use Gaussian process (GP) models in this context? This will require the development of new methods, including (approximate) pruning of the search tree. (2) Previous methods compute a policy for labeling points that does a constant-depth lookahead into the search tree. Can we develop methods that adaptively explore deeper in the search tree to make better decisions? In addition to drug discovery, we will also explore other applications, including an application from materials discovery.
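To make the idea of a labeling policy concrete, here is a minimal sketch of the simplest case: a greedy, depth-1 policy that repeatedly queries the unlabeled point a k-NN model considers most likely to be active. It is an illustration under stated assumptions, not the project's baseline system. The function names, the prior-smoothing scheme, and the `oracle` callback (standing in for a lab assay or database lookup) are all hypothetical.

```python
import math


def knn_active_probability(i, points, labels, k=3, prior=0.1, prior_weight=1.0):
    """Estimate P(point i is active) from the labeled points among its k nearest neighbors.

    Assumed smoothing: a pseudo-observation worth `prior_weight` votes at rate `prior`.
    """
    by_distance = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    neighbors = [j for _, j in by_distance[:k]]
    observed = [labels[j] for j in neighbors if j in labels]
    return (prior_weight * prior + sum(observed)) / (prior_weight + len(observed))


def greedy_active_search(points, oracle, budget, k=3):
    """Query `budget` points, each time picking the most probable active (depth-1 lookahead)."""
    labels = {}  # index -> observed label (1 = active, 0 = inactive)
    for _ in range(budget):
        candidates = [i for i in range(len(points)) if i not in labels]
        if not candidates:
            break
        best = max(candidates, key=lambda i: knn_active_probability(i, points, labels, k))
        labels[best] = oracle(best)
    return labels


if __name__ == "__main__":
    # Toy 1-D data: three actives clustered near 0, three inactives near 5.
    points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    truth = [1, 1, 1, 0, 0, 0]
    found = greedy_active_search(points, lambda i: truth[i], budget=3, k=2)
    print(sum(found.values()), "actives found in", len(found), "queries")
```

A deeper lookahead would score each candidate by the expected number of actives found over the next several queries rather than only the next one, which is where the search tree mentioned above comes in.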

Skills Required: Familiarity with MATLAB and machine learning; mathematical maturity.

Detecting Opportunities to Teach Problem Solving in Code Puzzles

Faculty: Caitlin Kelleher

Looking Glass is a 3D programming environment designed for kids, with an online community. With Looking Glass, kids can program their own 3D animated stories, remix other programs, and then share their creations with the community. Over the past couple of years, we’ve been exploring code puzzles as a way to help users learn new skills, first focusing on the design of puzzles and the interface support, and then on putting together personalized pathways of puzzles based on an individual’s history. In doing that, we’ve identified some behavior patterns that suggest a need for problem-solving skills and metacognition. In this project, we’re interested in using log data that we’ve collected from past learning pathways studies to develop new methods for detecting when students need help with problem solving.

Skills Required: Working knowledge of Java is required. Prior experience with statistics, data analysis, user-centered design and machine learning will be beneficial.

Intelligently Segmenting the Long Tail

Faculty: Brendan Juba

Students will develop an application of new algorithms that identify subpopulations for which regression provides low-error predictions. They will investigate the quality of the models produced and the overall proportion of the population covered by the discovered segments on a real-world domain, as compared to standard clustering techniques. For example, we might consider the domain of personalized medicine: for a complex, heterogeneous disease like cancer, we might seek to use patient records to pick out subpopulations for which we can effectively model the risk factors or progression of the disease. Along the way, participants will learn to use standard data science tools and will gain experience in handling real datasets. Students will also be encouraged to experiment with variants of the proposed algorithms to try to improve the quality of the models and/or the coverage of the population achieved.

Skills Required: Comfort with statistics; proficiency with Python or MATLAB (Java or C/C++ also OK).

Tool Support for Parallel Programming

Faculty: I-Ting Angelina Lee

Cilk is a C/C++-based multithreaded language that provides a high-level language abstraction for parallel execution. When writing a parallel program in Cilk, the programmer expresses the logical parallelism of the computation, and an underlying runtime scheduler schedules computation in a way that respects the logical parallelism specified by the programmer while taking full advantage of the processors available at runtime.

We are currently developing a set of dynamic analysis tools for debugging Cilk programs as well as a framework for supporting these tools efficiently. A dynamic analysis tool works by gathering detailed information about the computation as the program executes. Interesting algorithmic and data structure questions arise in such a setting, since we want to minimize both the space it takes to store the logged information and the time it takes to access the information, and there are opportunities for optimizations due to how a Cilk computation operates. Students will work with the PI to design and implement different data structures and perform experimental studies to validate the designs.

Skills Required: Familiarity with C/C++ is required; experience working with a modest-sized code base and/or with parallel programming (in any language or on any platform) is a plus but not required.

Executing Big Data Applications on Heterogeneous Architectures

Faculty: Roger Chamberlain and Ron Cytron

Students will implement a set of big data applications in the Auto-Pipe and ScalaPipe development environments, assessing (via measurement and modeling) the performance of these applications on a variety of heterogeneous computer architectures. The two development environments support streaming data computation on traditional multicores, graphics engines, and reconfigurable logic. Targeted applications include astrophysics [Tyson 2008], computational biology [Jacob 2008], and computational finance, each of which can be characterized by large data streams that must be considered in their entirety to answer the scientific question(s) of interest.

Skills Required: C++ and facility with basic algorithms and data structures (sorting, hashing, graph traversal, possibly dynamic programming). Familiarity with a scripting language such as Python or Perl is a plus. Prior biology background is not required.

Adaptive Parallel Real-Time Computing

Faculty: Chris Gill, Kunal Agrawal, and Chenyang Lu

Adaptive scheduling techniques such as the elastic task model have not yet been adapted to parallel real-time tasks or to concurrency platforms that support them. To support a new generation of adaptive parallel real-time systems, we are working to generalize and expand support for task elasticity in parallel real-time scheduling techniques, and to develop platform support for those techniques atop modern multi-core hardware so that the rates at which parallel real-time tasks are released, and the numbers of cores on which they run, can be adapted dynamically at run-time.
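For readers unfamiliar with elastic scheduling, the sketch below shows the core idea in its classic setting of sequential tasks on a single processor, not the parallel extension this project targets. Each task's utilization is treated like a spring that is compressed in proportion to its elasticity until the total fits a utilization bound, and tasks that reach their maximum period are pinned there. The class and function names are hypothetical, and the code assumes every task has a strictly positive elasticity.

```python
from dataclasses import dataclass


@dataclass
class ElasticTask:
    wcet: float        # worst-case execution time
    min_period: float  # nominal (preferred) period
    max_period: float  # longest acceptable period
    elasticity: float  # larger values absorb more of the compression


def compress_periods(tasks, utilization_bound=1.0):
    """Return one period per task so that total utilization fits the bound."""
    if any(t.elasticity <= 0 for t in tasks):
        raise ValueError("this sketch assumes strictly positive elasticities")
    u_max = [t.wcet / t.min_period for t in tasks]
    u_min = [t.wcet / t.max_period for t in tasks]
    if sum(u_min) > utilization_bound:
        raise ValueError("infeasible even with every task at its maximum period")
    if sum(u_max) <= utilization_bound:
        return [t.min_period for t in tasks]  # no compression needed

    util = list(u_max)
    fixed = set()  # tasks pinned at their minimum utilization
    while True:
        free = [i for i in range(len(tasks)) if i not in fixed]
        excess = (sum(u_max[i] for i in free)
                  + sum(u_min[i] for i in fixed)
                  - utilization_bound)
        e_sum = sum(tasks[i].elasticity for i in free)
        newly_fixed = False
        for i in free:
            # Share the excess among free tasks in proportion to elasticity.
            util[i] = u_max[i] - excess * tasks[i].elasticity / e_sum
            if util[i] < u_min[i]:
                util[i] = u_min[i]
                fixed.add(i)
                newly_fixed = True
        if not newly_fixed:
            return [t.wcet / u for t, u in zip(tasks, util)]


if __name__ == "__main__":
    tasks = [ElasticTask(2.0, 4.0, 8.0, 1.0), ElasticTask(3.0, 5.0, 10.0, 1.0)]
    print(compress_periods(tasks, utilization_bound=1.0))
```

In the parallel setting described above, the number of cores assigned to each task becomes an additional quantity to adapt, which this single-processor sketch does not model.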

To validate and leverage these new capabilities, we will apply the techniques and platforms developed in this work to real-time hybrid simulation (RTHS) experiments that are relevant to structural and earthquake engineering. Dynamic reallocation of resources allows for larger structures to be simulated under more realistic (intense and dynamic) workloads. We will also explore how these new capabilities may improve mixed-criticality systems in which some tasks are innately more important than others, including for system recovery and for new system models in which the criticality of each task may change depending on the current mode of the system. We will realize and validate these capabilities within APaRTEC, a novel adaptive concurrency platform framework we are developing, in which a wide variety of parallel real-time applications can adapt their computational and/or temporal resolution using a variety of parallel scheduling techniques.

Skills Required: We welcome highly motivated students with an interest in multi-threaded and parallel programming. Experience with C and/or C++, and with operating systems and/or multi-threaded programming, is helpful.

Interested students should contact Prof. Chris Gill for more information.

Mesoscale Power Orchestration

Faculty: Xuan ‘Silvia’ Zhang and Chris Gill

What do the Samsung Galaxy Note 7, the Tesla Model S, and the International Space Station have in common? They are all complex systems that have experienced disastrous battery-related accidents due to mismanaged power/energy delivery. In our NSF-funded research project, we are working on solving the crucial problem of how to orchestrate power and energy distribution in a modular and intelligent manner, so that systems big or small will no longer be plagued by currently unreliable power management solutions.

We believe the new solution involves a layered approach that requires synergistic coordination between the malleable power electronic hardware, the smart and safe control scheme, and the software-assisted orchestration framework that oversees the real-time interactions between the physical domain and the computational domain. Ultimately, we envision that such an intelligent and modular approach to power and energy distribution will benefit emerging artificial intelligence applications such as unmanned autonomous drones and self-driving cars, where safe and efficient management of power/energy matters a great deal.

To learn more about our research vision, please check out the recent WashU SEAS news article describing our project. Potential activities include, but are not limited to:

– implement and characterize the power consumption of a camera-guided robot performing simultaneous localization and mapping (SLAM) and other AI algorithms
– use an FPGA to control power switches, implement a buck or boost converter using discrete elements, and predict power consumption
– model the power behavior of a mobile robot with a solar cell and recharging stations
– reinforcement co-learning of robotic and power controllers
– design and implementation of systems software for coordinating power behavior

Skills Required: We welcome highly motivated students who are interested in this research and have experience relevant to the topics above to join our labs and gain firsthand experience with this new technology.

Please contact Prof. Xuan ‘Silvia’ Zhang or Prof. Chris Gill for more information.

Machine Learning with Strategic/Biased Data

Faculty: Chien-Ju Ho

There is an increasing amount of human-generated data available on the internet, including online reviews, user search histories, and human-labeled datasets. This enormous amount of data has created an unprecedented opportunity for machine learning research. On the other hand, human-generated data also creates unique challenges: humans might be strategic or biased in generating data. In this project, students will explore various aspects of human biases in data generation and develop algorithms that can detect and/or counter the biases in machine learning.

Skills Required: Mathematical maturity for the theoretical project (familiarity with game theory and/or machine learning is a plus); proficiency with Java, C/C++, or Python for the simulation project.