Collaborative Filtering: Creating the Best Teams Ever
Maurits van der Goes
Graduate Intern
7 min read
[Video: Maurits van der Goes' presentation on how to use collaborative filtering to develop effective graph algorithms]
Why Do We Need Virtual Teams?
What is Part-Up, and why do we need a recommendation system? Consider office spaces from about 100 years ago, which consisted of islands of people at individual desks with little communication. This no longer works today, for both economic and personal reasons. We are moving away from hierarchical company structures towards flat structures, we are specializing in one area, and we are cooperating with other organizations. Workers don’t want to have to show up at nine o’clock sharp, and may not even want to come into the office at all, preferring to work from home or while on holiday. Working this way allows companies to keep up with intense global competition by increasing speed and flexibility.
To operate within these new economic and organizational models, you need a new platform, and that’s where Part-Up comes in. Imagine that you don’t have a fixed function or department, and you can do whatever you want from wherever you want. Part-Up provides the marketplace to find and form any team you need. We founded the company one-and-a-half years ago, and we launched the platform in August.
There are currently a number of public teams that anyone can join: you start by supporting the team, then become a contributor, and then become a partner. You can compare this to GitHub, which has a similar model. And while anyone can start with public teams, you can also create a private team for your own organization for a small fee. Once you’re in a team, you can see the other activities and colleagues associated with it. There is no team leader; everyone is equal and you cooperate and organize yourselves.
But this is where a challenge arises. If there are over 400 available public teams, which one should you pick? Netflix did an analysis that showed that if you have a large number of movie options, people will only check a total of 10 to 20 movie titles, and only three in detail.
And if the person doesn’t find a movie within that selection that they want to watch, they’ll drop out and never come back. So we don’t want our users to have to search through hundreds of teams; we want to serve people one team that perfectly matches their interests, ambitions and moods. This is why we started developing a recommendation engine.
How to Develop a Graph Recommendation Engine
System Architecture
Below is our current system architecture:
Our website runs on Meteor, which works with MongoDB. We chose to use a hybrid database structure, so MongoDB has all the content for the website but no recommendations. As we all know, Neo4j does a great job walking query paths, which it does much faster than MongoDB. For this reason, recommendations are calculated in Neo4j and stored in MongoDB. When I request a recommendation via the API, the API uses JCypher to query Neo4j and retrieve an ID, which is then filled in with the corresponding information from MongoDB.
We create the recommendations with GraphAware, which has a really nice recommendation framework that allows you to specify and customize graph algorithms without having to build an entire framework. The GraphAware team has been really supportive, and I really like their product. The last piece of our system architecture is the Java Importer, which we used to get all of our old data into Neo4j.
Data Model
Below is the logical data model that we use in Neo4j:
I dropped all of the properties for the example, but we have a user who can hold strengths, just like a profile. A user is active in a team and part of a network, all of which are located in a city that is located within a country.
This is what my network looks like in Neo4j:
I’m quite active in a lot of teams (pink nodes), which are connected to networks (yellow nodes). We already know that it’s easy to walk these paths in Neo4j to get recommendations for new teams.
Below is the setup we are using in GraphAware. I’ve added a number of modules:
Next we check with the blacklist to see if there are any recommendations for teams that you are already on. We post-process these, meaning we tweak the results a bit within the post processor. Below is the Cypher code for the blacklist:
MATCH (u:User)-[r:ACTIVE_IN]->(t:Team) WHERE id(u)={id} AND r.role>1.0 RETURN t as blacklist
And here are some of the filters that I added:
MATCH (u:User),(t:Team) WHERE id(u)={id} AND t.privacy_type=2 AND NOT (u)-[:ACTIVE_IN {role:1.0}]->(t) RETURN t as blacklist
MATCH (u:User),(n:Network)<-[:PART_OF]-(t:Team) WHERE id(u)={id} AND n.privacy_type=3 AND NOT (u)-[:MEMBER_OF]->(n) RETURN t as blacklist
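To make the flow above more concrete, here is a minimal Python sketch of the same idea: walk from a user to co-members and their teams, apply blacklist rules that mirror the first two Cypher queries, and fill in the resulting IDs from a document store. Everything in it is an assumption for illustration: the dictionaries stand in for Neo4j and MongoDB, the sample users and teams are made up, and the scoring rule (counting co-members per team) is a simple collaborative filtering choice, not necessarily what the GraphAware setup computes.

```python
from collections import Counter

# Hypothetical stand-in for the graph: (user, team) -> role on ACTIVE_IN.
# Role and privacy_type values only mirror the Cypher filters above.
ACTIVE_IN = {
    ("maurits", "t1"): 2.0,
    ("maurits", "t2"): 1.0,
    ("anna", "t1"): 2.0,
    ("anna", "t3"): 2.0,
    ("anna", "t4"): 2.0,
    ("bob", "t2"): 2.0,
    ("bob", "t3"): 2.0,
}
TEAM_PRIVACY = {"t1": 1, "t2": 1, "t3": 1, "t4": 2}

# Hypothetical stand-in for the website content held in the document store.
TEAM_DOCS = {
    "t1": {"name": "Team One"},
    "t2": {"name": "Team Two"},
    "t3": {"name": "Team Three"},
    "t4": {"name": "Team Four"},
}


def teams_of(user):
    """All teams the user has an ACTIVE_IN relationship with."""
    return {t for (u, t) in ACTIVE_IN if u == user}


def blacklist(user):
    """Teams that must not be recommended to this user."""
    # Mirrors: ACTIVE_IN with r.role > 1.0
    high_role = {t for (u, t), r in ACTIVE_IN.items() if u == user and r > 1.0}
    # Mirrors: privacy_type = 2 AND NOT ACTIVE_IN {role: 1.0}
    role_one = {t for (u, t), r in ACTIVE_IN.items() if u == user and r == 1.0}
    restricted = {t for t, p in TEAM_PRIVACY.items() if p == 2 and t not in role_one}
    return high_role | restricted


def recommend(user, limit=3):
    """Score teams by how many co-members are active in them, then filter."""
    mine = teams_of(user)
    co_members = {u for (u, t) in ACTIVE_IN if t in mine and u != user}
    scores = Counter(t for member in co_members for t in teams_of(member))
    blocked = blacklist(user)
    ranked = sorted(
        ((t, s) for t, s in scores.items() if t not in blocked),
        key=lambda pair: (-pair[1], pair[0]),  # highest score first, then by ID
    )
    # The graph side only yields IDs; the content comes from the document store.
    return [{"id": t, "score": s, **TEAM_DOCS[t]} for t, s in ranked[:limit]]


if __name__ == "__main__":
    for team in recommend("maurits"):
        print(team)
```

In this toy data the blacklist removes t1 (role above 1.0) and t4 (privacy_type 2 without a role 1.0 relationship), so only t3 and t2 remain as candidates. The network filter from the third query would follow the same pattern with a MEMBER_OF lookup.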
Overcoming Filtering Challenges to Develop the Best Algorithms
Then we go into designing our algorithms. Below are the key filtering challenges we have to be prepared to overcome:
- Data sparsity: Because we don’t know a lot about our users, we don’t know which teams to recommend. This is especially true with a “cold start,” which is why dating sites, for example, ask a series of questions when you create a profile.
- Grey sheep: You can have users that are unlike any of your other users, which makes providing recommendations a challenge.
- Scalability: This is important not only for the infrastructure, but also for the algorithms.
- Shilling attacks: You don’t want the false activity of other users to affect your algorithm. Consider the cases on Amazon in which people falsified ratings to make their product look more attractive to others.
- Synonymy: A computer does not automatically understand that different terms, such as “coding” and “programming,” mean the same thing.
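Two of the challenges above, synonymy and the cold start, can be illustrated with a small Python sketch. It compares users by the overlap of their strengths using Jaccard similarity, which is a measure chosen here purely for illustration and not something the talk specifies. The synonym table, the user names and the strengths are all hypothetical.

```python
# Hypothetical synonym table; a real system would need a curated or learned mapping.
SYNONYMS = {"coding": "programming", "software development": "programming"}


def canonical(strengths):
    """Lower-case each strength and map known synonyms to one shared term."""
    cleaned = (s.strip().lower() for s in strengths)
    return {SYNONYMS.get(s, s) for s in cleaned}


def jaccard(a, b):
    """Size of the overlap divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


alice = ["Coding", "design"]
bob = ["programming", "Design"]
newcomer = []  # cold start: nothing is known about this user yet

# Without normalization the two profiles share no exact string.
raw_similarity = jaccard(set(alice), set(bob))
# After mapping synonyms to one term the profiles match.
normalized_similarity = jaccard(canonical(alice), canonical(bob))
# A user without any data overlaps with nobody, whatever the measure.
cold_start_similarity = jaccard(canonical(newcomer), canonical(bob))

if __name__ == "__main__":
    print(raw_similarity, normalized_similarity, cold_start_similarity)
```

The empty profile shows why a cold start is hard: with no strengths or team history there is nothing to compare, which is why some sites ask questions at sign-up.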