Books
- Course Book
- [CNA] Complex Network Analysis in Python: Recognize → Construct → Visualize → Analyze → Interpret by Dmitry Zinoviev
- source code and materials
- electronic copy through the Wash U library for viewing online
- Online Books
- [NCM] “Networks, Crowds, and Markets: Reasoning about a Highly Connected World” book and class taught at Cornell by David Easley and Jon Kleinberg
- [MMDS] “Mining of Massive Data Sets” book and “Analysis of Networks” class taught at Stanford by Jure Leskovec
- [SMM] Social Media Mining: An Introduction by Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. Cambridge University Press, 2014.
- Additional Books (not required for course)
- Network Science by Albert-László Barabási. Cambridge University Press, 2017.
- Networks: An Introduction by Mark Newman (this is a very intense book most likely beyond what we need for this course)
- “Data Science & Complex Networks” by Guido Caldarelli and Alessandro Chessa (pretty basic but nice use cases)
Prereqs Refresher
- Probability, Linear Algebra and Proof Techniques: Review from Stanford Analysis of Networks class
Datasets and Code
- Network Data compiled by Mark Newman
- Stanford Large Network Dataset Collection compiled by Jure Leskovec and Andrej Krevl
- Koblenz Network Collection
- Stanford CS224W Interesting Datasets
- Open Graph Benchmark at KDD Cup 2021
- TUDatasets graph database: http://graphlearning.io
- Data Science & Complex Networks [DSCN] code on GitHub
Ideas for Final Project
***FEATURED***
- Fairness: Homophily and the Glass Ceiling Effect in Social Networks: paper
- Crazy idea: Build, Run, and Organize your own Social Network: HowTo
- Motif Algorithms: Building blocks of biological networks: a review on major network motif discovery algorithms: paper
- CNA in Biology: Evolution of resilience in protein interactomes across the tree of life: paper
- CNA in Biology, Node Classification: Graphlet Kernels for Prediction of Functional Residues in Protein Structures: main paper, potentially useful paper
- Graph-based Machine Learning, Graph Classification: Efficient graphlet kernels for large graph comparison: paper
- Graphs in NLP: WordNet
- Graphs for Knowledge Representation: ConceptNet
Others
- node2vec: Scalable Feature Learning for Networks: paper
- PageRank: Datasets and Code Collection
- Directed Networks/PageRank/HITS Algorithm etc. [CNA] Ch V (+ online resources)
- Deep Graph: paper and toolbox
- Preferential Attachment Model with Triads: paper
- Empirical Comparison of Distributed Graph Storage Patterns: paper
- Empirical Comparison of Algorithms for Network Community Detection: paper
- Finding All Maximal Cliques in Very Large Social Networks: paper
- GraphX: paper1, paper2, documentation
- Other Graph Libraries: iGraph, graph-tool, NetworKit: cf. [CNA] Ch2 and Appendix for a start
- Networks based on Co-Occurrences: [CNA] Part III (+Case Study)
- Similarity-based Networks: [CNA] Part IV
- Bipartite Networks: [CNA] Part IV (+ Case Study)
- Community Detection via k-means clustering on graph spectrum: implementation paper
- Link Prediction: SFI lecture notes, paper. This is an exciting application with many resources online!
- Node Classification – especially interesting if you are familiar with machine learning/kernel methods
- Graph Classification – especially interesting if you are familiar with machine learning/kernel methods: slides
- Applications of CNA in Biology, Physics, Business, Medicine
- …
Let me know if you have ideas on interesting studies/topics to add!
Python
We will use Python with NumPy, SciPy, Matplotlib, and NetworkX. All of these packages are included in the Anaconda package. Follow these instructions to get everything installed.
Versions
It’s recommended to go with the newest versions included in Anaconda. If you already have a working Python installation (and are able to manage dependencies yourself), feel free to use Python 3.5 or higher with compatible versions of the packages listed above. Note that the [CNA] book uses Python 3.x, NetworkX 1.11, Matplotlib 1.5.1, NumPy 1.11.3, and SciPy 0.18.1.
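To compare your environment with the versions above, a minimal sketch like the following may help. It assumes each package exposes a `__version__` attribute; the helper name `installed_versions` is made up for illustration.

```python
import importlib
import sys


def installed_versions(packages=("numpy", "scipy", "matplotlib", "networkx")):
    """Return a dict mapping each package name to its version string.

    Packages that cannot be imported are reported as "not installed".
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Assumption: the package exposes __version__; fall back if it does not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions().items():
        print(name, version)
```

If a package shows up as "not installed", installing it through Anaconda is probably the easiest fix.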
Graph Libraries
- NetworkX: all-purpose graph library implemented for and in Python
- SNAP.py: good for more complex algorithms and large networks (written in C++)
- Gephi: good for network visualizations and basic measurements
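As a first taste of NetworkX, here is a minimal sketch that builds a tiny graph and computes a few basic measurements. The edge list is toy data invented for illustration. The code targets recent NetworkX releases, but wrapping `G.degree()` in `dict(...)` should also work with the 1.11 version used in [CNA].

```python
import networkx as nx

# Build a small undirected graph from a made-up edge list.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

# Basic measurements.
degrees = dict(G.degree())             # node -> number of neighbors
density = nx.density(G)                # fraction of possible edges that exist
path = nx.shortest_path(G, "a", "d")   # list of nodes on a shortest path

print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("degrees:", degrees)
print("density:", density)
print("shortest path a -> d:", path)
```

For large networks or heavier algorithms, the same analysis may be more practical in SNAP.py, as noted in the list above.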
Jupyter notebooks
Jupyter notebooks (included in the Anaconda package) will be useful to explore the [DSCN] code and also for developing your homework solutions. HERE is some more information on how to get started with Jupyter.
Python tutorials and Resources
- Learn Python course on Codecademy
- Intro to Python for Data Science from DataCamp
- The official Python tutorial is quite comprehensive. There is also a useful glossary.
- The Wash U library has electronic copies of these useful O’Reilly books available for viewing online:
- Learning Python – Find it here.
- Python Data Science Handbook – Find it here.
Gradescope
We will use Gradescope for written homework submissions and all homework grading. Find a tutorial on submitting a PDF to Gradescope HERE. You will be automatically added to Gradescope via Canvas.
Git
If you are not familiar with git, take some time to learn about it. Using git as a collaboration tool (for your teamwork on the assignments and project) rather than just a way to submit your solution is highly beneficial! Learn git while playing a game!
Git Help Videos from CSE131
Please ignore all the CSE131-specific parts.
Using git: loading your repository, making changes, commit/pushing (start watching from minute 1:35)
How to get unstuck if you can’t commit/push:
- First try to pull and then try the commit/push again
- Drastic steps to get yourself unstuck
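The "pull first, then push again" advice can be tried out safely on throwaway repositories. The sketch below drives git from Python and is not taken from the videos: it simulates two collaborators (the names `alice` and `bob`, the file names, and the helper functions are all made up) sharing one local "remote". It assumes `git` is installed and on your PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd`; return the result without raising on failure."""
    # A throwaway identity so commits work even on an unconfigured machine.
    identity = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]
    return subprocess.run(["git"] + identity + list(args), cwd=str(cwd),
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)


def commit_file(repo, filename, message):
    """Create `filename` in `repo`, stage it, and commit it."""
    (repo / filename).write_text(message + "\n")
    git("add", filename, cwd=repo)
    git("commit", "-m", message, cwd=repo)


def demo():
    """Return the exit codes of (rejected push, pull, retried push)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        git("init", "--bare", "remote.git", cwd=tmp)
        git("clone", "remote.git", "alice", cwd=tmp)
        alice, bob = tmp / "alice", tmp / "bob"

        commit_file(alice, "readme.txt", "first commit")
        git("push", "origin", "HEAD", cwd=alice)
        git("clone", "remote.git", "bob", cwd=tmp)

        # Alice pushes new work while Bob commits locally.
        commit_file(alice, "alice.txt", "work by alice")
        git("push", "origin", "HEAD", cwd=alice)
        commit_file(bob, "bob.txt", "work by bob")

        rejected = git("push", "origin", "HEAD", cwd=bob)          # remote is ahead
        pulled = git("pull", "--no-rebase", "--no-edit", cwd=bob)  # merge remote work
        retried = git("push", "origin", "HEAD", cwd=bob)           # push again
        return rejected.returncode, pulled.returncode, retried.returncode


if __name__ == "__main__":
    print(demo())
```

With a default git configuration, the first push should fail with a nonzero exit code because the remote has commits Bob lacks, and the push after the pull should go through. In your own repository, the equivalent is simply running `git pull` and then repeating your commit/push.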