**This is an inactive course webpage**

Lectures: TUE/THU 10-11:20am in Seigle 103

Instructor: Marion Neumann
Office: Jolley Hall Room 222
Contact: Please use Piazza!
Office Hours: TUE 11:30am-12:30pm or individual appointment (request via email – allow for 1-2 days to reply and schedule)
Please, avoid random drop ins outside my office hours.

Assistant to the Instructor: Raj (office: Jolley 217B)

TAs: Josh (Feiyang), Sam C, Sam G,

TA/AI Office Hours:MON 3-5pm (Josh, Sam G) in Crow 204 WED 5:30-7:30 (Sam C, Sam G) in Eads 216 FRI 2-4 (Raj) in Lopata 202

We will use the representative power of graphs to model networks of social, technological, or biological interactions. Network analysis provides many computational, algorithmic, and modeling challenges. We begin by studying graph theory, allowing us to quantify the structure and interactions of social and other networks. We will then explore how to practically analyze network data and how to reason about it through mathematical models of network structure and evolution. Another main objective will be to investigate algorithms that extract basic properties of networks in order to find communities and infer node properties. Finally, we will study a range of applications including robustness and fragility of networks such as the internet, spreading processes used to study epidemiology or viral marketing, and the ranking of webpages based on the structure of the webgraph.

This course combines concepts from computer science and applied mathematics (matrix algebra and optimization) to study networked systems using data mining.

Prerequisites: CSE 240 (discrete maths/proofs), CSE 247, ESE 326 (prob/stats), MATH 309 (matrix algebra), and programming experience (note: you will need to write programs to parse data and analyze networks using Python)

Syllabus

Lectures
Lectures will be held every Tuesday and Thursday 10-11:20am in Cupples I / 115.

Course Content
Find a list of topics to be covered on the course Roadmap!

Homework Assignments 
There will be homework assignments consisting of written and coding problems:

  • should be worked on in groups of up to (not more than) 2 students
  • submit via Gradescope
  • ca. 5-6 assignments (each weighted equally)
  • contribute 50% towards your total course performance

Homeworks will be assigned concurrently to the lecture sessions covering the respective materials. Due dates will be indicated on the course webpage under homework assignments. It is every student’s responsibility to meet the submission requirements and deadlines. We cannot accept late submissions and submissions that do not follow the submission instructions for no reason (see also Late Policy below).

Each homework assignment will be graded and the total grade achieved for all homework assignments (no drops, no make-ups) will contribute 50% towards your total course performance.

Regrade Requests
Any regrade requests and claims of missing scores will have to made within one week of the grade announcement. We will not take any regrade requests after this period for no reason. Grade announcements and grading comments will be provided via Gradescope. All grades will be maintained on Blackboard. Regrade submissions should be exclusively done via Gradescope.

In-class Exam
There will be one written in-class exam contributing 25% towards your total course performance. Date:

  • exam: THU 14 Nov 2019 10-11:20am in-class
  • no final exam

Final Project 
There will be a final project assigned in the 2nd half of the course. Due date:

  • FRI 13 Dec 2019 6pm (no extension)

The final project contributes 25% towards your total course performance. It will be graded on a 0-100% score scale and grades will be assigned based on both, its implementation component and its conceptual component (a report motivating, describing, and analyzing the project and a discussion of its experimental results and impact).

Non-curricular Activities
We can not offer accommodation for examinations and given deadlines for non-curricular activities outside your Wash U commitments. This includes job interviews or flying home early. I understand that you may decide to miss a scheduled exam date for these reasons, but you will need to weigh the consequences when making such a decision.

Grading Summary
50%  homework assignments
25%  exam
25%  final project

Final course grades will be assigned using the following straight scale:

Letter GradeCutoff Percentage
A93%
A-90%
B+87%
B83%
B-80%
C+77%
C73%
C-70%
D+67%
D63%
D-60%
F< 60%

Late Policy
Your homework assignments must be turned in on time. There are absolutely no makeup assignments for any reason. You get an automatic 3 day extension on every homework.
WARNING: there is absolutely NO extension to this extension for NO reason!

Collaboration Policy
You are encouraged to discuss the course material with other students. Discussing the material, and the general form of solutions to the labs is a key part of the class. Since, for many of the assignments, there is no single “right” answer, talking to other students and to the TAs is a good thing. However, everything that you turn in should be your own work, unless we tell you otherwise. If you talk about assignments with another student, then you need to explicitly tell us on the hand-in. You are not allowed to copy answers or parts of answers from anyone else, or from material you find on the Internet. This will be considered as willful cheating, and will be dealt with according to the official collaboration policy:

Academic Integrity
Unless explicitly instructed otherwise, everything that you turn in for this course must be your own work. If you willfully misrepresent someone else’s work as your own, you are guilty of cheating. Cheating, in any form, will not be tolerated in this class.

Checkout these questions and answers in the CSE FAQ.

There is zero tolerance of Academic Dishonesty. I will be actively searching for academic dishonesty on all homework submissions and the exam. If you are guilty of cheating on any assignment or exam, you will receive an F in the course and be referred to the School of Engineering Discipline Committee. In severe cases, this can lead to expulsion from the University, as well as possible deportation for international students. If you copy from anyone in the class both parties will be penalized, regardless of which direction the information flowed. This is your only warning.

Please refer to the University Undergraduate Academic Integrity Policy, for more information. If you suspect that you may be entering an ambiguous situation, it is your responsibility to clarify it before the professor or TAs detect it. If in doubt, please ask us.

Providing/Posting Solutions
Providing your course work (written or code) in any form to others is a violation of the academic integrity policy. If you provide your solutions to someone else in the course or post them publicly onlineyou are guilty of violating our academic integrity policy. Such a case will be treated the same way as described above and prosecution will also take place after finishing the course or even graduating form Wash U.

Mental Health
Mental Health Services professional staff members work with students to resolve personal and interpersonal difficulties, many of which can affect the academic experience. These include conflicts with or worry about friends or family, concerns about eating or drinking patterns, and feelings of anxiety and depression. See: http://shs.wustl.edu/MentalHealth

If you have any problems with the workload of this class, please come and talk to me. The earlier we talk the better.

Accommodations based upon sexual assault
The University is committed to offering reasonable academic accommodations to students who are victims of sexual assault. Students are eligible for accommodation regardless of whether they seek criminal or disciplinary action. Depending on the specific nature of the allegation, such measures may include but are not limited to: implementation of a no-contact order, course/classroom assignment changes, and other academic support services and accommodations. If you need to request such accommodations, please direct your request to Kim Webb (kim_webb@wustl.edu), Director of the Relationship and Sexual Violence Prevention Center. Ms. Webb is a confidential resource; however, requests for accommodations will be shared with the appropriate University administration and faculty. The University will maintain as confidential any accommodations or protective measures provided to an individual student so long as it does not impair the ability to provide such measures.

If a student comes to me to discuss or disclose an instance of sexual assault, sex discrimination, sexual harassment, dating violence, domestic violence or stalking, or if I otherwise observe or become aware of such an allegation, I will keep the information as private as I can, but as a faculty member of Washington University, I am required to immediately report it to my Department Chair or Dean or directly to Ms. Jessica Kennedy, the Universitys Title IX Coordinator. If you would like to speak with the Title IX Coordinator directly, Ms. Kennedy can be reached at (314) 935-3118, jwkennedy@wustl.edu, or by visiting her office in the Womens Building. Additionally, you can report incidents or complaints to Tamara King, Associate Dean for Students and Director of Student Conduct, or by contacting WUPD at (314) 935-5555 or your local law enforcement agency.

You can also speak confidentially and learn more about available resources at the Relationship and Sexual Violence Prevention Center by calling (314) 935-8761 or visiting the 4th floor of Seigle Hall.

Bias Reporting 
The University has a process through which students, faculty, staff and community members who have experienced or witnessed incidents of bias, prejudice or discrimination against a student can report their experiences to the Universitys Bias Report and Support System (BRSS) team. See: http://brss.wustl.edu

Roadmap

Topics and order are subject to changes as the semester evolves.

I   Networks & Graph Theory

  • complex systems and networks
  • graph representation, notation and definitions
  • degree distribution, paths, distance distribution, diameter, connectedness
  • application: social network analysis
    • triadic closure, clustering coefficient
    • strong and weak ties
    • homophily and social-affiliation networks
  • node centrality and importance
  • network robustness and resilience

II   Network Models

  • small-world phenomenon
  • random graph model (Erdös-Rényi graphs)
  • Watts-Strogatz model
  • rich-get-richer phenomenon
  • fitness model, Barabási-Albért graphs, and scale-free networks

III  Analyzing Networks: Graph Mining

  • application: community detection
    • betweenness
    • Girvan-Newman algorithm
    • graph clustering, graph partitioning
  • triangles, cliques, and paths
  • application: structure of the web & link analysis
    • web graph
    • hubs and authorities (HITs)
    • PageRank

IV  Modeling Dynamics: Information Cascades, Link Prediction, Epidemics

We can choose some topics to study in more detail (time permitting).

  • information cascades, diffusion models: how does information spread?
  • link prediction, supervised random walks: how do friendship suggestions work?
  • epidemic models: how do diseases spread?

V  (Machine) Learning with Graphs: Node and Graph Classification

We can choose some topics to study in more detail (time permitting).

  • node similarity and node classification
  • graph similarity and graph classification
  • graph and node embeddings
Course calendar and reading

(*)indicates more advanced reading for the interested student

_______

Topic

Materials

27 Aug Course Overview, Syllabus
29 Aug Introduction

  • complex systems
  • networks
  • network analysis

Motivation: Seven Bridges

Part I: Networked Data & Graphs

slides

3 Sept
  1. Definitions and Notation
  2. Types of Graphs
5 Sept
  1. Data Structures for Graphs
  2. Paths and Connectivity
  • [NCM] Chapter 2.2
  • [CNA] Ch11 – Connected Components
10 Sept

12 Sept

  1. Distances
    • Dijkstra (single source SP)
    • Floyd-Warshall (all pairs SP)
  2. Case Study: Small World
    • Global Social Network
    • Six-Degrees of Separation
    • CNA Measures
    • Social Interpretations
    • Beyond SNs
17 Sept
  1. Node Importance
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
  2. Case-Study: Robustness of the Internet
19 Sept
  1. Network Structure (Microscopic)
    • Clustering Coefficient
    • Triadic Closure
    • Strength of Weak Ties
    • Neighborhood Overlap
24 Sept
  1. Network Structure and Context (Macroscopic)
    • Assortativity
    • Surrounding Context & Homophily
  • [CNA] Ch8 – Measuring Networks
    • Estimate Network Uniformity Through Assortativity
  • Lecture notes by Aaron Clauset
  • [NCM] Ch4-4.2

Part II: Network Models

slides [Raj], slides [MN]

1 Oct

 

  1. Random Graph Model
    • Definition
    • Degree Distribution
    • Clustering Coefficient
    • Avg. Path Length
    • Giant Component
    • Evolution & Phase Transition
    • Discussion
  • [SMM] Ch4.2
  • [CNA] 
    • Ch6 – Appreciate Synthetic Networks
    • Ch7 – Generate Synthetic Networks
  • [demo] Complex Networks
3 Oct
  1. Small World Model
    • Definition
    • Avg. Path Length
    • Clustering Coefficient
    • Discussion
8 Oct
  1. Scale-Free Networks
    • Power-Law Distribution
    • Detection
    • Exponent Estimation
10 Oct
  1. Preferential Attachment
    • Model
    • Rich-get-richer
    • Power-law Degree Distribution
    • Clustering coefficient
    • Avg. path length
15 Oct  FALL BREAK – no class

Part III: Graph Mining

slides

17 Oct
  1. Motifs & Graphlets
  2. Communities
  3. Betweenness-based Clustering
    • Girvan-Newman Algorithm
    • Modularity
22 Oct

24 Oct

 

  1.  Modularity Maximization
    • Mathematical Derivation
    • Algorithm
29 Oct
  1.  Spectral Clustering
    • Laplacian Matrix
    • Algorithm
31 Oct
  1. Overlapping Communities
    • Clique Percolation Method
    • Finding Cliques
5 Nov
  1. Node Similarity
    • Structural & Regular Equivalence
    • (Markov) Random Walks
    • SimRank
    • Application: Link Prediction
7 Nov
  1. Node Classification
    • Problem Introduction
    • Label Propagation
12 Nov Exam Review
  • summary slides for all slides linked above
14 Nov In-Class EXAM

Part VI: (More) Applications

slides

19 Nov

21 Nov

  1. Spreading Processes
  2. Disease Spread and Epidemics
  3. Cascading Behavior
  4. Graph Classification
26 Nov Networks in AI

  • Natural Language Processing
  • Social and Commonsense Reasoning
  • Adversarial Learning / Security
28 Nov  THANKSGIVING – no class

Semester Summary

slides

3 Dec FINAL PROJECT presentations
  • project presentations schedule and slide submission instructions: here
5 Dec FINAL PROJECT presentrations
  • project presentations schedule and slide submission instructions: here
13 Dec FINAL PROJECT 

  • report due at 8pm
  • no extension
Homework assignments
SUBMISSION INSTRUCTIONS

(violations will result in a penalty on the hw grade)

  1. PDF submission for written homework
    • Find a tutorial on submitting a PDF to Gradescope HERE or watch this video.
    • Match Pages: In Gradescope every page needs to be matched to the problems it contains.
    • -10% penal on the assignment score, if the pages are not (or incorrectly) matched.
  2. Code submission
    • submit the Python notebook (.ipynb file) via file upload
    • do not create a zip file
    • do not add or delete cells in the notebook
    • add you name to the filename replacing YOUR_NAME
      • e.g.hw0_MarionNeumann.ipynb
      • or hw0_MarionNeumann_AnnaLee.ipynb for teams
  3. Gradescope Group Submission (for both written and code submissions)
    • In Gradescope both group members need to be added to the submission.
    • -10% for all team members if your group’s submission is not a group submission listing both team members
    • Find a tutorial on how to add a group member to your submission in the second half of this video.

Assignments and Deadlines

  • 08/29 hw0
    • due: 09/05 at 11:59pm
    • Helpful resources
    • Add your name (FirstLast) to your submission’s file name.
      •  e.g.hw0_MarionNeumann.ipynb
      • or hw0_MarionNeumann_AnnaLee.ipynb for teams
    • submit python notebook via Gradescope
  • 09/10 hw1
    • due: TUE 09/24 at 11:59pm
    • submit written answers via Gradescope (hw1)
    • submit python notebook via Gradescope (hw1_code)
      • Add your name (FirstLast) to your submission’s file name.
        •  e.g.hw1_p3_MarionNeumann.ipynb
        • or hw1_p3_MarionNeumann_AnnaLee.ipynb for teams
  • 09/24 hw2
    • due: 10/08 at 11:59pm
    • submit written answers via Gradescope (hw2)
    • submit python notebook via Gradescope (hw2_code)
  • 10/08 hw3
    • due: 10/22 at 11:59pm
    • submit written answers via Gradescope (hw3)
    • submit python notebook via Gradescope (hw3_code)
  • 10/22 hw4
    • due: 11/05 at 11:59pm
    • submit written answers via Gradescope (hw4)
      • will be graded by Nov 11th (before the exam)
    • submit python notebook via Gradescope (hw4_code)
    • extension: submit written answers via Gradescope (hw4_extended)
      • not graded before the exam
      • not graded if we received a submission for hw4
  • 11/05 hw5
    • due: 11/19 at 11:59pm
    • submit written answers via Gradescope (hw5)
    • submit python notebook via Gradescope (hw5_code)
Resources and HowTos

Books

Prereqs Refresher

Datasets and Code

Ideas for Final Project

***FEATURED***
  • Fairness: Homophily and the Glass Ceiling Effect in Social Networks: paper
  • Crazy idea: Build, Run, and Organize your own Social Network: HowTo
  • Motif Algorithms: Building blocks of biological networks: a review on major network motif discovery algorithms: paper
  • CNA in Biology: Evolution of resilience in protein interactomes across the tree of life: paper
  • CNA in Biology, Node Classification: Graphlet Kernels for Prediction of Functional Residues in Protein Structures: main paperpotentially useful paper
  • Graph-based Machine Learning, Graph Classification: Efficient graphlet kernels for large graph comparison: paper
  • Graphs in NLPWordNet
  • Graphs for Knowledge RepresentationConceptNet
OTHERS
  • node2vec: Scalable Feature Learning for Networks: paper
  • PageRank: Datasets and Code Collection
  • Directed Networks/PageRank/HITs Algorithm etc. [CNA] Ch V (+ online resources)
  • Deep Graph: paper and toolbox
  • Preferential Attachment Model with Triads: paper
  • Empirical Comparison of Distributed Graph Storage Patterns: paper
  • Empirical Comparison of Algorithms for Network Community Detection: paper
  • Finding All Maximal Cliques in Very Large Social Networks: paper
  • GraphX: paper1paper2documentation
  • Other Graph Libraries: iGraph, graph-tool, NetworKit: cf. [CNA] Ch2 and Appendix for a start
  • Networks based on Co-Occurrences: [CNA] Part III (+Case Study)
  • Similarity-based Networks: [CNA] Part IV
  • Bi-partite Networks: [CNA] Part IV (+ Case Study)
  • Community Detection via k-means clustering on graph spectrum: implementation paper
  • Link Prediction: SFI lecture notespaper, this is an exciting application with many resources online!
  • Node Classification – especially interesting if you are familiar with machine learning/kernel methods
  • Graph Classification – especially interesting if you are familiar with machine learning/kernel methods: slides
  • Applications of CNA in Biology, Physics, Business, Medicine

Let me know if you have ideas on interesting studies/topics to add!

Python

We will use Python and NumpyScipyMatplotlib, and NetworkX. All those packages are included in the Anaconda package. Follow these instructions to get everything installed.

VERSIONS

It’s recommended to go with the newest versions included in Anaconda. If you have an up and running Python installation (and are capable to manage dependencies yourself), feel free to use any of the following Python versions: 3.5 or higher and the respective compatible versions for the packages listed above. Note that the [CNA] book uses Python 3.xNetworkX 1.11, Matplotlib 1.5.1, Numpy 1.11.3, and Scipy 0.18.1.

GRAPH LIBRARIES
  • NetworkX: all purpose graph library implemented for and in Python
  • SNAP.py: good for more complex algorithms and large networks (written in C++)
  • Gephi: good for network visualizations and basic measurements
JUPYTER NOTEBOOKS 

Jupyter notebooks (included in the Anaconda package) will be useful to explore the [DSCN] code and also for developing your homework solutions. HERE is some more information on how to get started with Jupyter.

PYTHON TUTORIALS AND RESOURCES

Gradescope

We will use Gradescope for written homework submissions and all homework grading. Find a tutorial on submitting a PDF to Gradescope HERE. You will be automatically added to Gradescope via Canvas.

Git

If you are not familiar with git, take some time and learn about it. Using git as a collaboration tool (for your team work on the assignments and project) rather than just a way to submit your solution is highly beneficial!!! Learn git while playing a game!

GIT HELP VIDEOS FROM CSE131

Please, ignore all the cse131 specific parts.

Using git: loading your repository, making changes, commit/pushing (start watching from minute 1:35)

How to get unstuck if you can’t commit/push:

Please ask any questions related to the course materials and homework problems on Piazza. Other students might have the same questions or are able to provide a quick answer.
Any public postings of (partial or full) solutions to homework problems (written or in form of source or pseudo code) will result in a grade of zero for that particular problem for ALL students in the course.