CSE 416a Resources

Books

Prereqs Refresher

Datasets and Code

  • Data Science & Complex Networks [DSCN] code on GitHub

Ideas for Final Project

***FEATURED***
  • Fairness: Homophily and the Glass Ceiling Effect in Social Networks: paper
  • Crazy idea: Build, Run, and Organize your own Social Network: HowTo
  • Motif Algorithms: Building blocks of biological networks: a review on major network motif discovery algorithms: paper
  • CNA in Biology: Evolution of resilience in protein interactomes across the tree of life: paper
  • CNA in Biology, Node Classification: Graphlet Kernels for Prediction of Functional Residues in Protein Structures: main paper, potentially useful paper
  • Graph-based Machine Learning, Graph Classification: Efficient graphlet kernels for large graph comparison: paper
  • Graphs in NLP: WordNet
  • Graphs for Knowledge Representation: ConceptNet
Others
  • node2vec: Scalable Feature Learning for Networks: paper
  • PageRank: Datasets and Code Collection
  • Directed Networks/PageRank/HITS Algorithm etc.: [CNA] Ch V (+ online resources)
  • Deep Graph: paper and toolbox
  • Preferential Attachment Model with Triads: paper
  • Empirical Comparison of Distributed Graph Storage Patterns: paper
  • Empirical Comparison of Algorithms for Network Community Detection: paper
  • Finding All Maximal Cliques in Very Large Social Networks: paper
  • GraphX: paper1, paper2, documentation
  • Other Graph Libraries: iGraph, graph-tool, NetworKit: cf. [CNA] Ch2 and Appendix for a start
  • Networks based on Co-Occurrences: [CNA] Part III (+Case Study)
  • Similarity-based Networks: [CNA] Part IV
  • Bipartite Networks: [CNA] Part IV (+ Case Study)
  • Community Detection via k-means clustering on graph spectrum: implementation paper
  • Link Prediction: SFI lecture notes, paper (an exciting application with many resources online!)
  • Node Classification – especially interesting if you are familiar with machine learning/kernel methods
  • Graph Classification – especially interesting if you are familiar with machine learning/kernel methods: slides
  • Applications of CNA in Biology, Physics, Business, Medicine
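
As a starting point for the link prediction idea listed above, here is a minimal sketch of one simple baseline: scoring each unlinked node pair by the Jaccard similarity of the two neighbor sets. It is not taken from the linked lecture notes or paper. The function name `jaccard_scores` and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard_scores(adjacency):
    """Score every non-adjacent node pair by the Jaccard similarity
    of the two nodes' neighbor sets (higher = more likely future link)."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the pair is already linked, nothing to predict
        common = adjacency[u] & adjacency[v]
        union = adjacency[u] | adjacency[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores


# Hypothetical toy graph: an undirected network stored as neighbor sets.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

scores = jaccard_scores(toy_graph)
for pair, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(pair, round(score, 3))
```

A project would typically hide some known edges, rank the candidate pairs, and check how many hidden edges appear near the top of the ranking.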

Let me know if you have ideas on interesting studies/topics to add!

Python

We will use Python with NumPy, SciPy, Matplotlib, and NetworkX. All of these packages are included in the Anaconda distribution. Follow these instructions to get everything installed.

Versions

It’s recommended to go with the newest versions included in Anaconda. If you already have a working Python installation (and are able to manage dependencies yourself), feel free to use Python 3.5 or higher with compatible versions of the packages listed above. Note that the [CNA] book uses Python 3.x, NetworkX 1.11, Matplotlib 1.5.1, NumPy 1.11.3, and SciPy 0.18.1.
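
To see which versions you actually have installed, a small check along the following lines may help. This is only a sketch: the helper name `installed_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute, which the four packages above do in practice.

```python
import importlib


def installed_versions(package_names):
    """Return {package name: version string}, with None for packages
    that cannot be imported in the current environment."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
        else:
            # Fall back to "unknown" if the module has no __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    course_packages = ["numpy", "scipy", "matplotlib", "networkx"]
    for name, version in installed_versions(course_packages).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

If a package is reported as not installed, installing or updating Anaconda should provide it.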

Graph Libraries
  • NetworkX: all-purpose graph library implemented for and in Python
  • SNAP.py: Python interface to SNAP (whose core is written in C++); good for more complex algorithms and large networks
  • Gephi: good for network visualizations and basic measurements
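
To get a first feel for NetworkX, here is a minimal sketch that builds a small undirected graph and computes a few basic measurements. The toy edge list is made up for illustration. Wrapping `G.degree()` in `dict(...)` is intended to behave the same in NetworkX 1.11 (used by the [CNA] book) and in newer releases.

```python
import networkx as nx

# Hypothetical toy network: a triangle (a, b, c) with one extra node d.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# Basic size measurements.
print("nodes:", G.number_of_nodes())
print("edges:", G.number_of_edges())

# Degree of every node as a plain dictionary.
degrees = dict(G.degree())
print("degrees:", degrees)

# Shortest path between two nodes (unweighted, so fewest hops).
path = nx.shortest_path(G, "a", "d")
print("shortest path a -> d:", path)

# Fraction of possible edges that are present.
print("density:", nx.density(G))
```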
Jupyter notebooks 

Jupyter notebooks (included in the Anaconda distribution) will be useful for exploring the [DSCN] code and for developing your homework solutions. HERE is some more information on how to get started with Jupyter.

Python tutorials and Resources

Gradescope

We will use Gradescope for written homework submissions and all homework grading. Find a tutorial on submitting a PDF to Gradescope HERE. You will be automatically added to Gradescope via Canvas.

Git

If you are not familiar with git, take some time to learn it. Using git as a collaboration tool (for your teamwork on the assignments and project), rather than just as a way to submit your solution, is highly beneficial! Learn git while playing a game!

Git Help Videos from CSE131

Please ignore all the CSE131-specific parts.

Using git: loading your repository, making changes, committing/pushing (start watching from minute 1:35)

How to get unstuck if you can’t commit/push: