By Aileen Agricola | January 22, 2013
Xebia uses Neo4j to analyze the international trade of fish and demonstrates how the graph reveals important connections of trading routes.As international trade is influenced by various factors such as local supply and demand, tariffs, or currency exchange rates, it might be interesting to see which trading routes between countries are preferred over any alternatives. For example, zooming in onto some of the bigger players, we’d like to know what the preferred trading routes are between the neighboring countries UK, Iceland and Norway. In general, it would be nice if we could discover such routes automatically for the global trade network at once. How can this be accomplished? The simple solution: filtering on volume As a first experiment, we could try to apply a filter that drops all connections for which the trading volume does not belong to the top 1 %. This leads to the following result (after re-applying layout for the graph): We can learn from this that there’s only a small group of countries between which most fish trade takes place, in absolute terms. However, we cannot answer the question stated before, because by applying the volume based filter, we already lost any connection between Iceland and our “core countries” (and more than 80% of the other countries that are not displayed in above picture). Continuing the experiment, we can lower the filter to let the connections with the top 5 percent trade volumes pass, which leads to the following result: As can be seen, lowering the volume threshold only increased the complexity of the network between the large exporters/importers of fish, and hardly provided new insights into preferred trade routes with countries in the periphery of the network. From this we can conclude that simple filtering on trade volume has two major problems:
- Based on the chosen threshold it either completely discards most of the network, or adds enough complexity to the connected part to make it hard to find the preferred routes,
- The choice of a particular threshold requires domain knowledge in the best case and is fully arbitrary in the worst case.