Connected Feature Extraction
Important - page not maintained
This page is no longer being maintained and its content may be out of date. For the latest guidance, please visit the Neo4j Graph Data Science Library.
In this guide, we will learn about concepts related to connected feature extraction, a technique used to improve the performance of machine learning models.
At a high level, machine learning algorithms take input data and create some sort of output. The input data comprises features, where a feature could be an attribute or property.
For example, if we were building a model to predict house prices, the features could include the number of rooms or the size of the garden in square meters.
The process of generating and selecting appropriate features is called feature engineering.
It transforms raw data into features that better represent the underlying problem to the predictive model, which improves the model's accuracy on unseen data.
Feature extraction is one of the subproblems of feature engineering. It describes how we change the shape or format of our raw data so that it can be used in a machine learning pipeline.
It can be thought of as a dimensionality reduction process, where raw variables are reduced to more manageable groups (features), while still accurately describing the original data set.
These features are often derived from data stored in tables in a relational database.
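As a minimal sketch of feature extraction from tabular data, the snippet below turns raw house records into fixed-length numeric feature vectors. The records, the field names (`rooms`, `garden_m2`, `price`), and the choice to treat a missing garden size as zero are all assumptions made for illustration; they do not come from any real dataset.

```python
# Hypothetical raw records, e.g. rows fetched from a relational table.
raw_houses = [
    {"id": 1, "rooms": "3", "garden_m2": "120.5", "price": 250000},
    {"id": 2, "rooms": "5", "garden_m2": None, "price": 410000},
]


def extract_features(row):
    """Turn one raw row into a fixed-length numeric feature vector."""
    rooms = int(row["rooms"])
    # Treat a missing garden size as 0.0 (an assumption made for this sketch).
    garden = float(row["garden_m2"]) if row["garden_m2"] is not None else 0.0
    return [rooms, garden]


X = [extract_features(r) for r in raw_houses]  # model inputs
y = [r["price"] for r in raw_houses]           # prediction target
```

Each row of `X` has the same shape regardless of how the raw record was stored, which is what most machine learning libraries expect as input.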
Connected feature extraction is the process of changing the shape or format of graph data so that it is usable in a machine learning pipeline.
Connected features can be generated in two main ways:
- Running local graph queries
These work best when we know what we’re trying to find. For example, counting the number of fraudulent users up to 3 degrees away from a person in a fraud graph.
- Running global graph algorithms
These are used when we know the general structure that we want, but not the exact pattern. For example, computing the PageRank score of all users as an indicator of potentially fraudulent behavior.
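The two approaches above can be sketched in plain Python on a small in-memory graph. Everything here is illustrative: the graph, the user names, and the set of known fraudsters are made up, the local query follows outgoing relationships only, and the PageRank function is a simple power iteration rather than the implementation used by any particular graph library.

```python
from collections import deque

# Hypothetical directed fraud graph as adjacency lists, plus known fraudsters.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": ["frank"],
    "frank": [],
}
fraudulent = {"dave", "frank"}


def fraud_within(graph, start, max_hops=3):
    """Local query: count fraudulent users within max_hops of start (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    count = 0
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                count += nxt in fraudulent
                queue.append((nxt, dist + 1))
    return count


def pagerank(graph, damping=0.85, iterations=50):
    """Global algorithm: PageRank by power iteration over the whole graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in graph}
        for v, targets in graph.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


scores = pagerank(edges)
# One feature row per user: [fraudsters within 3 hops, PageRank score].
features = {user: [fraud_within(edges, user), scores[user]] for user in edges}
```

The resulting `features` rows can be joined with ordinary tabular features before training a model. On a real dataset, the local count would typically be a graph query and the PageRank score would come from a graph algorithms library.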
In a talk from the ML4ALL 2019 conference, Amy Hodler explained how connected features can be used to improve machine learning predictions.
You can also read more articles on this topic in Amy’s AI & Graph Technology blog series.
For a practical example of how connected features can be used to train a machine learning model, see the Link Prediction with scikit-learn developer guide.
Connected features are used in many industries and have been particularly helpful for investigating financial crimes like fraud and money laundering. In these scenarios, criminals often try to hide activities through multiple layers of obfuscation and network relationships.
Traditional feature extraction methods may be unable to detect such behavior, and this is where graph-extracted features work well.
Amy Hodler’s Artificial Intelligence & Graph Technology: Enhancing AI with Context & Connections white paper goes into the uses for connected features in more detail.