Graphs are a fundamental part of computer science, and they are only getting larger. Analyzing graphs using machine learning is powerful but requires translating them into numerical values. That’s where graph algorithms come in.
In this week’s five-minute interview (conducted at GraphConnect 2018 in NYC), we spoke with Dr. Steven Skiena, author of The Algorithm Design Manual and The Data Science Design Manual, about his work on DeepWalk, slated to be incorporated into the Neo4j graph algorithm library.
What is the primary focus of your research?
Dr. Steven Skiena: My research these days is focusing a lot on graph embeddings. Graph embeddings are multidimensional representations of graphs where vertices are represented as points in space.
Ideally you would like to take a graph and reduce it to features that you can build in a machine learning model. Machine learning models typically work very well with numerical values. Things like linear regression, support vector machines (SVMs) and neural networks work very well with data when it is in a numerical representation and our goal is to take a graph and represent it in let’s say 100-dimensional space so that these 100 dimensions represent features.
Graph embeddings are not so much about visualizing graphs; you’re not going to look at a picture in 100 dimensions and make sense of it. But on the other hand, a machine learning model can very easily make sense of 100 features, and by reducing graphs to these features, you can easily build interesting and powerful models.
You can easily answer questions about when are two things very similar to each other, testing whether two points in a high dimensional space are similar, or finally identify what the most similar things are.
These are perfectly reasonable things to do, once you have a representation. Our graph embedding algorithm DeepWalk is a way of getting these kinds of embeddings.
Can you go into more detail about the DeepWalk algorithm?
Skiena: A lot of DeepWalk’s power comes from the idea of word embeddings. Many people have heard of Word2Vec. It is a way of taking English text in an unsupervised way and reducing it to features that represent what words mean. You would like to find a way to organize words so that words that have similar roles in language, like green, yellow, and red, are close to each other in space. This is very powerful for building language models.
Word2vec takes this idea of building representations of essentially words by their corresponding role (something like, that they fit in sentences). By an analogy, for DeepWalk, we think about a graph as being a vocabulary of vertices, and we think of sentences as being walks on a graph.
A walk from one word to another word to another word: that describes a sentence. And a walk from one node to another node to another node describes a random walk, which we can now treat as a sentence for the purpose of building embeddings.
Word2vec takes sequences of symbols and from them in an unsupervised way learns what the symbols really mean. We use the underlying technology of Word2vec to do a similar thing for graphs. This has been proven to be powerful in a lot of applications and apparently is on its way into the graph algorithms library at Neo4j.
What is the most interesting thing you’ve come across in graph algorithm research?
Skiena: Graph algorithms are a very interesting area of algorithm design. Graph algorithms are very intricate, but using them does not really require knowledge of their intricacy.
A lot of the power in graph algorithms comes from basically knowing how to model what you’re doing. You can build surprisingly powerful things out of graphs without knowing that much about the underlying algorithmics and how they work.
The fact that graphs are ubiquitous in social networks and networks everywhere tells you that they do very powerful things.
What excites you about the future of graph technology?
Skiena: Graphs are a fundamental part of computer science. They will always be a fundamental part of computer science, but obviously, graphs have been getting larger. The amount of graph data and the amount of graph representations have been growing tremendously.
Obviously, we know about social networks, and we know that graphs seemingly are everywhere. And it’s clear that this trend is presumably only increasing.
We originally built DeepWalk to handle a few million nodes. That sounded very exciting, except graphs keep getting bigger and bigger. Our more recent research has focused on hierarchical graph embedding, something we call HARP, that will enable you to do embeddings for bigger graphs using distributed algorithms.
Anything else you’d like to say?
Skiena: It’s very interesting for an academic, especially one who has spent his life teaching graph algorithms, to see so many people interested in graphs. I’ve been surprised by the number of people and the amount of activity that goes on in this space.
Again graphs have become very powerful, they’re fundamental, and they’re only becoming more and more important.
Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at firstname.lastname@example.org
Download Graph Algorithms in Neo4j, and discover how to tap into the power of graphs for powerful insights into connected data.
Read the White Paper