GraphGists

Most people look for the best possible flight option when traveling from one place to another. However, the "best possible flight option" can be defined by different criteria, such as ticket price, number of stops, travel time, and operating airline, each of which depends on several factors of its own. Although this problem may seem very similar to finding the shortest path in a graph, there is one distinct constraint: for a trip with a stop in between, we can select only those connecting flights whose departure time falls after the previous flight’s arrival time plus a minimum time buffer. While many websites offer such a service, I modeled this scenario as a GraphGist.

What distinguishes my proposal from many other available programs is that, in addition to travelers, the model addresses other kinds of users, such as the organizing systems of airports, airlines, and other agencies. They can use it to better understand statistics on different airlines, the popularity of destinations, and the demographics of their users over any defined period of time. Organizations can then use these data to propose better and more efficient plans.

The Bureau of Transportation Statistics (www.transtats.bts.gov) releases an openly available flight data set every month. To keep the sample small and the queries fast, this Gist uses the data for 11/30/2015. The following variables from this data set are used:

  • Airport abbreviation

  • Flight date

  • Flight origin

  • Flight duration

  • Flight distance

  • Flight airline ID

While these variables are informative in many ways, they lack one very important piece: the cost and final price for the customer. For each flight, this model assumes three possible ticket classes: economy, business, and first class. For each ticket, a random number is generated, multiplied by 0.75, 1, or 1.2 respectively depending on the ticket class, and then multiplied by the flight duration (total travel time in minutes).
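The pricing rule above can be sketched in Python as follows. This is only an illustration, not the code used to build the Gist: the function name `ticket_price`, the `rng` parameter, and the choice of a uniform random number in [0, 1) are assumptions, since the text does not say how the random number is drawn.

```python
import random

# Multipliers per ticket class, as stated in the text.
CLASS_MULTIPLIER = {"economy": 0.75, "business": 1.0, "first": 1.2}

def ticket_price(duration_minutes, ticket_class, rng=random):
    """Synthetic price: random factor * class multiplier * flight duration (minutes)."""
    # Assumption: uniform draw in [0, 1); the source does not state the range.
    base = rng.random()
    return base * CLASS_MULTIPLIER[ticket_class] * duration_minutes

# Example: one synthetic ticket per class for a 120-minute flight.
rng = random.Random(42)  # seeded only to make the example repeatable
for ticket_class in CLASS_MULTIPLIER:
    print(ticket_class, round(ticket_price(120, ticket_class, rng), 2))
```

Because each ticket gets its own random draw, a business ticket is not guaranteed to cost more than an economy ticket on the same flight; sharing one draw per flight would be a possible variation.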

Here is some information about the data set:

  • Number of airports: 71

  • Number of flights: 2809

  • Number of airlines: 2

Queries

This simple but well-thought-out model can answer several questions, some of which are covered here:

Find prices of business class tickets for direct flights departing from a given origin [in this case study, Seattle airport (SEA)]

MATCH (f:Flight)-[:ORIGIN]->(a:Airport { name:'SEA' })
WITH f
MATCH (t:Ticket { class:'business' })-[:ASSIGN]-(f)-[:DESTINATION]->(d:Airport)
RETURN d.name AS Destination, t.price AS Price, f.airline AS Airline

Find flights with one stop from a given origin [in this case study, Seattle airport (SEA)] to a given destination [in this case study, San Francisco airport (SFO)]

MATCH (a:Airport { name:'SEA' })<-[:ORIGIN]-(f1:Flight)-[:DESTINATION]->(a2:Airport)<-[:ORIGIN]-(f2:Flight)-[:DESTINATION]->(a3:Airport { name:'SFO' })
WHERE f1.date < f2.date
RETURN f1.date, f1.airline, a2.name, f2.date, f2.airline, a3.name
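The query above compares only the `date` property of the two flights. The connection rule from the introduction (the second flight must depart after the first one arrives, plus a minimum buffer) can be sketched outside the database in Python. Everything in this sketch is hypothetical: the `Flight` record with separate `departure` and `arrival` fields, the function `one_stop_itineraries`, the 45-minute default buffer, and the sample flights are illustrations, not part of the Gist's data model.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

class Flight(NamedTuple):
    # Hypothetical record; the Gist's Flight nodes may store times differently.
    origin: str
    destination: str
    departure: datetime
    arrival: datetime

def one_stop_itineraries(flights, origin, destination,
                         min_buffer=timedelta(minutes=45)):
    """Return (first_leg, second_leg) pairs with a feasible connection."""
    result = []
    for f1 in flights:
        # The first leg must leave the origin and not go straight to the destination.
        if f1.origin != origin or f1.destination == destination:
            continue
        for f2 in flights:
            # The second leg must continue from the stop and depart late enough.
            if (f2.origin == f1.destination
                    and f2.destination == destination
                    and f2.departure >= f1.arrival + min_buffer):
                result.append((f1, f2))
    return result

day = datetime(2015, 11, 30)
flights = [
    Flight("SEA", "PDX", day.replace(hour=8), day.replace(hour=9)),
    # Departs 15 minutes after the first leg lands: rejected by the buffer.
    Flight("PDX", "SFO", day.replace(hour=9, minute=15), day.replace(hour=11)),
    Flight("PDX", "SFO", day.replace(hour=10), day.replace(hour=12)),
]
for f1, f2 in one_stop_itineraries(flights, "SEA", "SFO"):
    print(f1.destination, f2.departure.time())
```

With the sample data, only the 10:00 departure from PDX is accepted, because the 09:15 departure leaves less than 45 minutes after the first leg lands.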

Find how many flights originate from each airport, per airline

MATCH (a)<-[:ORIGIN]-(f:Flight)
WITH a.name AS Airport, f.airline AS Airline, count(f) AS total
ORDER BY total, Airport, Airline DESC
RETURN Airport, Airline, total

Find the average ticket price for each destination airport and airline

MATCH (a)<-[:DESTINATION]-(f:Flight)-[:ASSIGN]-(t:Ticket)
WITH a.name AS Airport, f.airline AS Airline, avg(t.price) AS Average
ORDER BY Airport, Average DESC
RETURN Airport, Airline, Average