List of Publications:
Find all my publications on Google Scholar, DBLP, or ResearchGate.
Model AI Assignments
Introduction to Python for Data Science
Marion Neumann, Jonathan Chen in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2019. The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-19). Model AI Assignments Track
We provide an interactive guided lab to introduce Python for data science (DS). The lab consists of two Jupyter notebooks introducing the basics of Python and the DS workflow using the Iris dataset. In the first, we interactively introduce expressions, variables, strings, printing, lists, dictionaries, control flow, and functions to students who are already familiar with a programming language from an introductory CS course. The second lab aims to motivate students to acquire skills such as using statistics to model and analyze data and knowing how to design and use algorithms to store, process, and visualize data, while not forgetting the importance of domain expertise. We begin by establishing the example problem to be studied based on the Iris dataset. The next step is to acquire and process the data, where students practice how to load data and how to process strings into numeric arrays. Then, we explain different plotting methods such as box plots, histograms, and scatter plots for data exploration. Finally, we split the data into training and test sets, build a model, use it for predictions, and evaluate the results. The main learning objectives are to get to know and practice Python in the context of data science and machine learning.
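The workflow steps described above (process strings into numbers, split, build a model, predict, evaluate) can be sketched in plain Python. This is a minimal illustration, not the notebooks' actual code: the rows are a small sample in the format of the Iris CSV, the interleaved split is a simplification, and the nearest-centroid classifier is a stand-in chosen only because it needs no libraries.

```python
# A few rows in the format of the Iris CSV (illustrative sample, not the full dataset).
RAW = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica"""


def parse(raw):
    """Turn CSV text into a list of numeric feature rows and a list of labels."""
    X, y = [], []
    for line in raw.strip().splitlines():
        *features, label = line.split(",")
        X.append([float(v) for v in features])
        y.append(label)
    return X, y


def fit_centroids(X, y):
    """Compute the per-class mean of each feature (the 'model')."""
    grouped = {}
    for row, label in zip(X, y):
        grouped.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


X, y = parse(RAW)

# Simplified deterministic split: every fourth row is held out for testing.
test_idx = [i for i in range(len(X)) if i % 4 == 3]
train_idx = [i for i in range(len(X)) if i % 4 != 3]

centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
predictions = [predict(centroids, X[i]) for i in test_idx]
accuracy = sum(p == y[i] for p, i in zip(predictions, test_idx)) / len(test_idx)
print(predictions, accuracy)
```

In a real lab setting one would typically load the full dataset and use a random, shuffled split; the fixed split here only keeps the sketch reproducible.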
Introducing the Data Science Workflow Using Sentiment Analysis
Marion Neumann, Zac Christensen in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2019. The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-19). Model AI Assignments Track
We provide an interactive guided lab with a follow-up homework assignment to introduce the basic data science workflow by exploring sentiment analysis. The lab focuses on introducing the machinery using a given dataset of movie reviews, and the assignment highlights data acquisition and exploration. After introducing sentiment analysis, we explain a simple rule-based approach to predict the sentiment of textual reviews using three handcrafted examples. This introduction shows how to pre-process text data and how to use lists of positive and negative expressions to compute a sentiment score. Students then implement the approach to predict the sentiment of movie reviews and evaluate the results. The lab concludes with a discussion of the limitations of the rule-based approach and a quick introduction to sentiment classification via machine learning. The homework assignment revisits the process of building and analyzing a sentiment predictor, with the focus on students collecting and preprocessing their own dataset scraped from Twitter using an API. The main learning objective of this activity is getting to know the inference problem and walking through the entire data science workflow to tackle it. This module requires only minimal programming background and is an ideal precursor to introducing machine learning.
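A rule-based sentiment score of the kind described above can be sketched as follows. The word lists, the three example reviews, and all function names are hypothetical and chosen for illustration; they are not taken from the lab materials.

```python
import re

# Hypothetical word lists; the lab's actual lists are not reproduced here.
POSITIVE = {"good", "great", "love", "excellent", "fun"}
NEGATIVE = {"bad", "boring", "hate", "terrible", "dull"}


def preprocess(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    tokens = preprocess(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def predict(text):
    """Map the score to a sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Three made-up reviews with their intended sentiment.
reviews = [
    ("A great cast and an excellent script. I love it!", "positive"),
    ("Boring plot, terrible pacing.", "negative"),
    ("Not good at all.", "negative"),
]

predictions = [predict(text) for text, _ in reviews]
accuracy = sum(p == label for p, (_, label) in zip(predictions, reviews)) / len(reviews)
print(predictions, accuracy)
```

The third review is misclassified because the word-counting rule ignores the negation "not", which is one example of the limitations that motivate moving to a learned classifier.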
Exploring Unfairness and Bias in Data (A newer version of this is available – please contact me. I am happy to share!)
Jonathan Chen, Tom Larsen and Marion Neumann in Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2020. The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-20). Model AI Assignments Track
It is natural to assume that a model built from “real-world” data will inherently represent the world at large. However, this is not the case, as seen in recent instances of models behaving in unexpectedly biased ways. We believe that one long-term solution to this problem is to build a curriculum that inspires students to actively think about their data and the potential for it to be biased. Here, we present a Jupyter Notebook-based assignment exploring how bias can be introduced into a model, using an example of gender bias in credit history used to predict creditworthiness. We use an unbalanced data set to demonstrate how model evaluation methods like classification accuracy can be misleading even with standard procedures like dataset splitting and cross-validation, allowing bias to remain undetected. Then we consider whether pre- and post-processing strategies such as ignoring gender altogether will help improve the fairness of our predictions. We conclude by prompting students to discuss whether these methods can mitigate unfairness in AI and to contribute their ideas about how to tackle this problem.
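The point that overall accuracy can hide bias on an unbalanced data set can be illustrated with a small sketch. All records below are made up for illustration (they are not the assignment's credit data), and the group names "A" and "B" are placeholders for a majority and a minority group.

```python
# Each record is (group, true_label, predicted_label); all values are invented.
records = (
    [("A", 1, 1)] * 60 + [("A", 0, 0)] * 16 + [("A", 0, 1)] * 4    # 80 records, 76 correct
    + [("B", 1, 1)] * 6 + [("B", 1, 0)] * 10 + [("B", 0, 0)] * 4   # 20 records, 10 correct
)


def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    return sum(true == pred for _, true, pred in rows) / len(rows)


overall = accuracy(records)
by_group = {
    group: accuracy([r for r in records if r[0] == group])
    for group in ("A", "B")
}

print(f"overall accuracy: {overall:.2f}")
for group, acc in by_group.items():
    print(f"group {group} accuracy: {acc:.2f}")
```

Because group A dominates the data, the aggregate figure looks acceptable while predictions for group B are no better than a coin flip; reporting metrics per group is one way to surface this.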
Book Chapter
Chapter 14: Cell Phone Image-Based Plant Disease Classification
In Computer Vision and Pattern Recognition in Environmental Informatics (published by IGI Global, Release Date: September, 2015. Copyright © 2016.)
Marion Neumann, University of Bonn, Germany
Lisa Hallau, University of Bonn, Germany
Benjamin Klatt, Central Institute for Decision Support Systems in Crop Protection, Germany
Kristian Kersting, TU Dortmund University, Germany
Christian Bauckhage, Fraunhofer IAIS, Germany
Abstract
Modern communication and sensor technology coupled with powerful pattern recognition algorithms for information extraction and classification allow the development and use of integrated systems to tackle environmental problems. This integration is particularly promising for applications in crop farming, where such systems can help to control growth and improve yields while harmful environmental impacts are minimized. Thus, the vision of sustainable agriculture for anybody, anytime, and anywhere in the world can be brought within reach. This chapter reviews and presents approaches to plant disease classification based on cell phone images, a novel way to supply farmers with personalized information and processing recommendations in real time. Several statistical image features and a novel scheme of measuring local textures of leaf spots are introduced. The classification of disease symptoms caused by various fungi or bacteria is evaluated for two important agricultural crop varieties, wheat and sugar beet.
JMLR (Machine Learning Open Source Software)
pyGPs – A Python Library for Gaussian Process Regression and Classification
Marion Neumann, CSE, Washington University, St. Louis, MO 63130, United States
Shan Huang, Fraunhofer IAIS, 53757 Sankt Augustin, Germany
Daniel E. Marthaler, Sproutling, San Francisco, CA 94111, United States
Kristian Kersting, CS, TU Dortmund University, 44221 Dortmund, Germany
Abstract
We introduce pyGPs, an object-oriented implementation of Gaussian processes (GPs) for machine learning. The library provides a wide range of functionalities ranging from simple GP specification via mean and covariance and GP inference to more complex implementations of hyperparameter optimization, sparse approximations, and graph-based learning. Using Python, we focus on usability for both “users” and “researchers”. Our main goal is to offer a user-friendly and flexible implementation of GPs for machine learning.
Keywords: Gaussian Processes, Python, Regression and Classification
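To make "GP specification via mean and covariance and GP inference" concrete, here is a minimal plain-Python sketch of standard GP regression with a zero mean and a squared-exponential covariance. It deliberately does not use the pyGPs API; the function names, the fixed hyperparameters, and the toy data are assumptions for illustration, and no hyperparameter optimization is performed.

```python
import math


def rbf(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * (x1 - x2) ** 2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_star, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    K = [
        [rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [rbf(a, x_star) for a in x_train]
    alpha = solve(K, y_train)   # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


# Toy data: three noise-free observations of sin(x).
x_train = [-1.0, 0.0, 1.0]
y_train = [math.sin(x) for x in x_train]

mean_near, var_near = gp_predict(x_train, y_train, 1.0)   # at a training input
mean_far, var_far = gp_predict(x_train, y_train, 10.0)    # far from the data
print(mean_near, var_near, mean_far, var_far)
```

At a training input the posterior mean reproduces the observation with near-zero variance, while far from the data the prediction reverts to the prior mean of zero with the prior variance.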