Neo4j Live: Construct the Matrix interaction network based on the movie script

10 Feb, 2022



In this stream, we show how to combine web scraping, OCR, and NLP techniques to construct the Matrix interaction network:
- Scraping the Matrix fandom page with Selenium
- Reading the Matrix movie script PDF with PyTesseract
- Extracting the characters in each scene with spaCy's rule-based matcher
- Constructing and analyzing the character co-occurrence network in Neo4j
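
As a rough illustration of the last step, here is a minimal Python sketch of how a co-occurrence network could be derived once the characters in each scene are known. It is not the code from the stream or the notebook: the sample scene lists, the `cooccurrence_edges` helper, and the `Character` label and `INTERACTS` relationship type in the Cypher string are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(scenes):
    """Count how often each pair of characters appears in the same scene."""
    counts = Counter()
    for characters in scenes:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(characters)), 2):
            counts[pair] += 1
    return [
        {"source": a, "target": b, "weight": weight}
        for (a, b), weight in sorted(counts.items())
    ]


# Hypothetical per-scene character lists (made up for illustration,
# not extracted from the actual script).
scenes = [
    ["Neo", "Trinity"],
    ["Neo", "Morpheus", "Trinity"],
    ["Morpheus", "Neo"],
    ["Agent Smith"],  # a single character produces no pair
]

edges = cooccurrence_edges(scenes)

# Assumed Cypher statement for loading the edges into Neo4j; the label,
# relationship type, and property names are illustrative choices.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (a:Character {name: row.source})
MERGE (b:Character {name: row.target})
MERGE (a)-[r:INTERACTS]-(b)
SET r.weight = row.weight
"""

for edge in edges:
    print(edge)
```

The edge weight here is simply the number of scenes two characters share. The `edges` list could then be passed as the `$rows` parameter when running a query like `IMPORT_QUERY` through a Neo4j driver session.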

Blog: https://towardsdatascience.com/construct-the-matrix-interaction-network-based-on-the-movie-script-738b4fa9b46d
Neo4j Sandbox: https://dev.neo4j.com/try
Colab Notebook: https://github.com/tomasonjo/blogs/blob/master/matrix/MatrixNLP.ipynb
Matrix Characters: https://matrix.fandom.com/wiki/Category:Characters_in_The_Matrix

Follow Tomaz: https://twitter.com/tb_tomaz
Graph Algorithms for Data Science: https://www.manning.com/books/graph-algorithms-for-data-science - use code au35bra for a 35% discount
