*** I will not be taking any students for projects in Fl22 or Sp23. ***

I am happy to advise master’s and independent study projects in CSE. Here is a list of topics I am interested in:

    • lab and project development: “Creating labs and assignments for a new course on large-scale data analysis and machine learning using pySpark”
    • autograder project: “Developing Gradescope autograder capabilities for cse416a”
    • educational research: “Sentiment Analysis for CS Education”
      • Techniques
        • text mining
        • sentiment classification
        • opinion mining
        • exploratory data analysis/statistics/visualization
      • Analyzing Student Data
    • graph-based ML: applications of graph-based machine learning using graph kernels or graph convolutional neural networks
      • text classification based on graph-of-word representations
      • social network analysis
      • classification/regression tasks on biological networks
      • images and 3d objects represented as graphs

Selected Previous and Current Projects

Visualizing Written Student Feedback and Emotions with Sasha Chackalackal and Nash Solon

Our goal for this project was to create an interactive visualization of the sentiment of each student’s homework reflections. Specifically, we wanted to see the distribution of the students’ sentiments, both as predicted by VADER (Valence Aware Dictionary and sEntiment Reasoner) and as self-reported by the students, alongside the original text.

Predicting Student Emotions from Written Feedback using Crowd Sourcing and Machine Learning with Robert Kasumba

With increasing enrollment in popular computer science courses, there is a need to bridge the similarly increasing feedback gap between individual students and course instructors. One way to address this challenge is for instructors to collect feedback from students in the form of textual reviews or unit-of-study reflections. However, manually reading these reviews is time-consuming, and self-reported Likert scale responses are noisy. Rule-based approaches to sentiment analysis such as VADER (Valence Aware Dictionary and sEntiment Reasoner) have been used to capture the sentiments conveyed in textual feedback; however, they fail to capture contextual differences, as many words carry different sentiments in different contexts. In this work, I investigated supervised machine learning approaches and compared their performance with the lexicon-based approach VADER in predicting the sentiment of student feedback collected in large computer science classes. I found that machine learning models trained solely on student self-reported sentiment ratings were only comparable to VADER, with a balanced accuracy of 73.8% versus 73%. However, a hybrid approach using the VADER score as a feature and training on the student self-ratings performed better than VADER alone. Using better-quality labels collected through a crowdsourcing experiment led to the best machine learning model performance.
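The comparison above is reported in terms of balanced accuracy. As a minimal sketch of that metric (the mean of per-class recall, which does not reward a model for simply predicting the majority class), the following uses only the Python standard library. The function name and the toy labels are hypothetical and are not taken from the project's data or code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes in y_true."""
    correct = defaultdict(int)  # correctly predicted items per true class
    total = defaultdict(int)    # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not real student data).
y_true = ["pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain:.3f}")
```

In this toy case the recall is 3/4 for the positive class and 1/2 for the negative class, so the balanced accuracy is lower than the plain accuracy, which is dominated by the larger class.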

Capturing Student Feedback and Emotions in Large Computing Courses: A Sentiment Analysis Approach (SIGCSE TS 2021 paper, presentation) with Robin Linzmayer

When comparing the emotions students had when working on the assignments with the grades they achieved, we observe that a larger fraction of female students have significantly higher grades and a larger fraction of male students have significantly higher emotion scores, as shown in the figure below (middle and right). This means that more female students have high assignment grades but lower emotion scores, as shown in the left panel.

Text Classification with Graph Convolutional Neural Network by Walter Wang

  • This project aims to perform traditional text classification with a neural network approach, where each word and each document is embedded as a node in a graph that is fed into a graph convolutional neural network for classification.
  • Reference: https://arxiv.org/abs/1809.05679
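The graph construction described above can be sketched as follows. This is a simplified illustration, not the project's implementation: the referenced paper weights document-word edges by TF-IDF and word-word edges by pointwise mutual information, whereas this sketch assumes raw term frequency and sliding-window co-occurrence counts. The function name, node naming scheme, and example documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Build a graph whose nodes are documents and words.

    Returns (nodes, edges), where edges maps a node pair to a weight.
    Simplifying assumptions: doc-word weight = raw term frequency,
    word-word weight = number of sliding windows containing both words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    nodes = [f"doc:{i}" for i in range(len(docs))]
    nodes += [f"word:{word}" for word in vocab]

    edges = Counter()
    for i, tokens in enumerate(tokenized):
        # Connect each document node to the words it contains.
        for word, freq in Counter(tokens).items():
            edges[(f"doc:{i}", f"word:{word}")] = freq
        # Connect words that co-occur within a sliding window.
        for start in range(max(1, len(tokens) - window + 1)):
            span = sorted(set(tokens[start:start + window]))
            for a, b in combinations(span, 2):
                edges[(f"word:{a}", f"word:{b}")] += 1
    return nodes, dict(edges)


# Hypothetical toy corpus.
docs = ["graphs help text classification", "text classification with graphs"]
nodes, edges = build_text_graph(docs)
print(len(nodes), "nodes,", len(edges), "edges")
```

A graph convolutional network would then operate on the adjacency matrix built from these weighted edges, with the document nodes carrying the class labels.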

Mat2Py software project

We ported the cse517a machine learning implementation projects and autograder from MATLAB to Python.

Sentiment Analysis on Homework Reviews by Zac Christensen

  • Some assignments (e.g. hw2 vs. hw4) are perceived better or worse than others by students.
  • Sentiment/emotions do not (or only weakly) correlate with assignment grades.

Study of PATCH-SAN: Learning CNNs from Graph Inputs by Yufei Zhou (code on GitHub)

Boulder Finder: Discovering Boulder Areas from LiDAR Topography Data – A Detection-Classification-Clustering Approach by Eliot Padzensky 

This project was done as an independent study with the aim of using Geographic Information Systems (GIS), big data analysis, computer vision, and machine learning to search large amounts of LiDAR topographic data for clusters of shapes resembling boulders. Ultimately, we were able to identify previously unknown locations suitable for establishing new rock climbing destinations.

The algorithm was run on data for a county in Illinois containing a popular established bouldering area. This area was successfully identified by the algorithm. Moreover, a nearby region was also marked as containing a large number of boulders. Manual inspection of maps as well as an in-person visit confirmed that this was indeed a previously undiscovered bouldering area. This is incredibly encouraging and demonstrates the potential of this project.

Application of graphs to textual document categorization by Yu Sun (code on GitHub)

Graph Convolutional Neural Networks by Muhan Zhang (code on GitHub, AAAI paper)

Developing and Teaching a Course on Topological Methods for Data Analysis and Machine Learning by Brad Flynn (contact us if you are interested in the course materials)

The goal of the project was to design and teach a course that provides computer science students with a light understanding of topology and its applications for data analysis and machine learning.

Lung Cancer Detection using Convolutional Neural Networks by Jingyu Xin (code on GitHub)

Exploring Deep Learning for Image Generation by Zimu Wang (code and description on GitHub)