Can We Outsmart Ryanair?
This post was conceived and written as a therapy to treat an excruciating outburst of anger I had when I paid 200£ for a one-way ticket. “Fool me once, shame on you; fool me twice, shame on me”.
The data scientist way.
Like many European travelers, I am very grateful to Ryanair. Not only does it offer ridiculously low fares, it also forces all other airlines to keep their fares low if they want to stay in business. In addition, because Ryanair serves smaller airports, it brings me closer to my final destination. For less.
But I have a confession (I hope Michael O'Leary never reads it): I'm probably one of the worst kinds of customer Ryanair can have, from a profit standpoint.
I regularly fly in and out of London on a point-to-point, two-hour route, and almost always pay between 10 and 40 £/€ each way. At those prices Ryanair loses money on me.
Except when Ryanair sets outrageous fares in the 150-300 £/€ range during the summer, Christmas and Easter holidays, and on other occasions!
Make sense of Ryanair ticket pricing
After paying 200£ for a one-way ticket that usually costs me 20£, I started wondering whether this was unavoidable or whether I had been ripped off: is it really inconceivable to try to reverse engineer the Ryanair pricing algorithm(s)? I'm a data scientist, after all!
Well… it is not that easy.
Main factors affecting airline prices
Roughly speaking, a few antagonistic effects determine flight seat fares:
- Capacity Effect
Seats are limited. As seats are booked, scarcity increases, which should drive prices up over time.
- Temporal Effect
Flight seats are a perishable product: an unsold seat yields zero revenue once the plane has departed. Holding seats back to sell them at a higher price later becomes riskier as the departure date approaches.
- Customer behavior
Customers' willingness to pay (WTP) depends on time and on the type of customer.
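To see how the first two effects can pull in opposite directions, here is a deliberately simplified toy model. It is purely an illustrative assumption, not Ryanair's algorithm: `toy_fare` is a hypothetical function, its parameters (`k_cap`, `k_time`, the 180-day `horizon`) are made up, and customer behavior is ignored. The fare grows with the share of seats already sold (capacity effect) and is discounted when many seats are still empty close to departure (temporal effect).

```python
def toy_fare(base, load_factor, days_left, horizon=180, k_cap=3.0, k_time=0.5):
    """Hypothetical toy fare model (illustration only, not Ryanair's algorithm).

    base        -- reference fare in GBP (made-up value)
    load_factor -- fraction of seats already sold, in [0, 1]
    days_left   -- days before departure, in [1, horizon]
    """
    # Capacity effect: scarcity pushes the fare up as the plane fills up.
    capacity_mult = 1.0 + k_cap * load_factor ** 2
    # Temporal effect: an empty seat is worth nothing after departure, so the
    # pressure to discount grows as departure approaches with seats unsold.
    urgency = 1.0 - days_left / horizon  # ~0 six months out, ~1 the day before
    unsold = 1.0 - load_factor
    temporal_mult = 1.0 - k_time * urgency * unsold  # stays >= 0.5 with these defaults
    return round(base * capacity_mult * temporal_mult, 2)


# Same week before departure: a nearly full plane vs. a half-empty one
print(toy_fare(30, load_factor=0.95, days_left=7), toy_fare(30, load_factor=0.5, days_left=7))
# Same (low) occupancy: six months out vs. one week out
print(toy_fare(30, load_factor=0.3, days_left=150), toy_fare(30, load_factor=0.3, days_left=7))
```

Even this two-knob model produces fares that rise or fall with occupancy, a variable that a customer (and the data collected here) cannot observe.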
Revenue Management and Dynamic Pricing
How do airlines blend all these effects into their pricing strategy?
They usually resort to some form of revenue management (RM), i.e. predicting customer behavior to optimize price and availability so as to maximize revenue.
They adapt to changing market conditions with dynamic pricing and quantity-based RM. Because the problem is so complex, dynamic pricing does not simply mean finding the right moment to raise seat fares, nor offering the lowest price to early buyers. We will see later that inter-temporal price discrimination does not go in one direction only!
One trick that low-cost airlines are now well known to adopt is to arrange seats into groups (buckets) of variable size, each with a different price tag. Seats are then moved between buckets according to market changes, which makes it very difficult to make sense of any single fare.
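A minimal sketch of how bucket-based pricing could work, under two simplifying assumptions: the advertised fare is the price of the cheapest bucket that still has seats, and the airline can move unsold seats between buckets at any time. The `FareBuckets` class, the bucket prices and the sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FareBuckets:
    """Hypothetical seat buckets: price tag (GBP) -> number of seats left."""
    seats: dict = field(default_factory=dict)

    def current_fare(self):
        # The displayed fare is the cheapest bucket with seats still available.
        open_prices = [p for p, n in self.seats.items() if n > 0]
        return min(open_prices) if open_prices else None

    def book(self):
        # A booking consumes one seat from the cheapest open bucket.
        fare = self.current_fare()
        if fare is None:
            raise RuntimeError("flight sold out")
        self.seats[fare] -= 1
        return fare

    def move(self, from_price, to_price, n):
        # Revenue-management action: move up to n unsold seats between buckets.
        n = min(n, self.seats.get(from_price, 0))
        self.seats[from_price] = self.seats.get(from_price, 0) - n
        self.seats[to_price] = self.seats.get(to_price, 0) + n
        return n


flight = FareBuckets({19.99: 2, 49.99: 3, 149.99: 5})
flight.book()
flight.book()                   # the two cheapest seats go first
print(flight.current_fare())    # 49.99
flight.move(149.99, 24.99, 2)   # "release" two seats at a low fare
print(flight.current_fare())    # 24.99
```

The second `print` shows why a single fare is hard to interpret: the price falls without any change in demand, simply because seats were moved into a cheaper bucket.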
Information Asymmetry and Customer Strategy
Unfortunately, not only are the RM practices and the actual dynamic pricing implementation unknown to consumers, there is also an information asymmetry: essential information is available to (and used by) airlines but unavailable, or very hard to find, for consumers (for example the flight occupancy at booking time, or the route demand/popularity level).
So do airlines always win?
There is one fact airline algorithms are built upon: customers are myopic.
As soon as the price is below what they are willing to spend, they buy. Customers don't have the time, information, willingness or knowledge to behave strategically (as a competitor would, for example) in trying to get the best possible deal.
Therefore airline algorithms are not built to play a strategic game with customers: they don't need to, and it would probably be counterproductive, as the players they sit at the table with (consumers) don't play the strategic game.
I will focus on one particular route and look for patterns that can yield useful insights for more informed, strategic behavior as a loyal Ryanair customer.
Data
The first step is obviously to collect the best possible data for the problem at hand. My focus is on a relatively simple case:
- Just one point-to-point route (STN-AOI)
- A route covered only by Ryanair: no complications due to fare adjustment/alignment with competitors
- A leisure route: no business vs. leisure traveler dynamic pricing complications
- At most one daily flight (typically four flights a week): no complications from fares depending on the time of day
Note
I will use the terms “seat fare”, “flight fare” or just “fare” interchangeably. By that I mean just the base fare at booking time. The base fare is the same for all seats, as there is just one class on Ryanair flights (extra charges apply if a particular seat is reserved, but they come on top of the base fare).
Data Collection
The idea is to collect the price history of each flight. For convenience, I limited fare collection to at most 6 months in advance.
My simple data collection strategy is:
Find an API to fetch Ryanair fares with Python
Run it daily through an Azure function
Store the results in Azure blobs after some simple feature engineering.
Basically it is just a Python script run daily as a cron job. It could even run locally, but I found it more convenient to run it in the cloud. In a separate post I detail how this was done.
Dataset Schema
The resulting collected dataset looks like this:
|   | day_dep | price | currency | airport_start | airport_end | day_id | days_left |
|---|---|---|---|---|---|---|---|
| 0 | 2021-10-05 | 8.00 | GBP | STN | AOI | 2021-10-04 | 1 |
| 1 | 2021-10-05 | 8.00 | GBP | STN | AOI | 2021-10-03 | 2 |
| 2 | 2021-10-07 | 7.99 | GBP | STN | AOI | 2021-10-04 | 3 |
| 3 | 2021-10-07 | 7.99 | GBP | STN | AOI | 2021-10-03 | 4 |
| 0 | 2021-10-08 | 18.99 | GBP | STN | AOI | 2021-10-07 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 178 | 2023-06-29 | 53.74 | GBP | STN | AOI | 2023-01-01 | 179 |
| 179 | 2023-06-29 | 53.74 | GBP | STN | AOI | 2022-12-31 | 180 |
| 178 | 2023-06-30 | 78.99 | GBP | STN | AOI | 2023-01-02 | 179 |
| 179 | 2023-06-30 | 78.99 | GBP | STN | AOI | 2023-01-01 | 180 |
| 179 | 2023-07-01 | 87.80 | GBP | STN | AOI | 2023-01-02 | 180 |
53534 rows × 7 columns
Each flight is identified by day_dep, the departure date (remember: at most one flight a day), and the route is airport_start → airport_end.
Flight fares are collected daily, so there are multiple entries per flight, each giving the price for a given booking date (day_id).
For convenience I added days_left, the number of days in advance of departure at booking time. Roughly, it ranges from 180 days down to 1 (the day before departure, if seats are still available).
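days_left is the kind of simple feature engineering applied before storing each snapshot. A minimal pandas sketch, assuming day_dep and day_id arrive as date strings; `add_days_left` is a hypothetical helper name, and the toy rows reuse values from the table above:

```python
import pandas as pd

# A few rows in the same layout as the collected dataset
raw = pd.DataFrame({
    "day_dep": ["2021-10-05", "2021-10-05", "2021-10-07"],
    "price": [8.00, 8.00, 7.99],
    "currency": "GBP",
    "airport_start": "STN",
    "airport_end": "AOI",
    "day_id": ["2021-10-04", "2021-10-03", "2021-10-04"],
})


def add_days_left(df):
    """Add days_left = departure date - booking (collection) date, in days."""
    df = df.copy()
    df["day_dep"] = pd.to_datetime(df["day_dep"])
    df["day_id"] = pd.to_datetime(df["day_id"])
    df["days_left"] = (df["day_dep"] - df["day_id"]).dt.days
    return df


print(add_days_left(raw)["days_left"].tolist())  # [1, 2, 3]
```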
Fetch other important information
Airlines have (and use) much more information at booking time to set the current seat price. Probably the most important is the number of booked seats (occupancy), along with historical route popularity trends and other revenue-related information (e.g. bulk seat blocking/release).
I haven't investigated whether any of this extra information can be fetched through an API, but it would definitely help explain some of the patterns we are going to see.
Flight Fare Summaries Creation
Besides the fare trend of each flight, it is useful to extract a few summary metrics per flight:
- Initial Price
Basically the flight fare 6 months in advance (averaged over the first 7 days the ticket is on sale).
- MinAvg Price
Minimum price within the 6-month window, computed on a rolling average over N days (N=3) to smooth out possible outliers.
- Days Left at MinAvg
Days in advance of departure at which the flight could be booked at the MinAvg price. In case of ties, the day closest to departure is chosen.
- Max Price
Maximum (raw, not averaged) price in the 6-month window.
- Last Price
Last available price, typically the day before departure.
The reduced summary dataset (one entry per flight) looks like this:
## Helper functions
import pandas as pd

def PrepareMinAvg(df, N=3):
    '''
    Calculate the min ticket price for each departure date.
    The price is averaged over N days (centered rolling average) to smooth outliers.
    Also add the number of days in advance the min price was offered.
    '''
    ## sort and reset the index (the raw data has duplicate index labels)
    df = df.sort_values(by=['day_dep', 'days_left']).reset_index(drop=True)
    ## Rolling average of the price within each departure date
    df['avg'] = df.groupby('day_dep')['price'].transform(
        lambda s: s.rolling(N, min_periods=1, center=True).mean())
    ## Min of the averaged price for every departure date
    df['avg_min'] = df.groupby('day_dep')['avg'].transform('min')
    ## Keep only the rows at the min price
    dft = df.loc[df['avg'] == df['avg_min'], ['day_dep', 'avg_min', 'days_left']]
    ## If multiple days at min price, keep the closest to departure:
    ## rows are sorted by ascending days_left, so that is the first one
    df1 = dft.drop_duplicates(subset='day_dep', keep='first')
    df1 = df1.rename(columns={'days_left': 'days_left_atmin'}).reset_index(drop=True)
    return df1
def PrepareInitPrice(df, N=7):
    '''
    Calculate the initial price: average over the first N days the ticket was on sale
    '''
    ## sort dataframe (ascending days_left: the first days on sale are the last rows)
    df = df.sort_values(by=['day_dep', 'days_left'])
    ## Departure date groups:
    dfg = df.groupby('day_dep')
    col = 'price'
    df1 = dfg[col].agg(lambda x: x.iloc[-N:].mean()).reset_index().rename(columns={col: 'price_initial'})
    return df1

def PrepareMaxPrice(df):
    '''
    Calculate just the max price for the ticket
    '''
    ## sort dataframe
    df = df.sort_values(by=['day_dep', 'days_left'])
    ## Departure date groups:
    dfg = df.groupby('day_dep')
    col = 'price'
    df1 = dfg[col].max().reset_index().rename(columns={col: 'price_max'})
    return df1

def PrepareLastPrice(df):
    '''
    Calculate just the price at the closest day to departure (typically the day before)
    '''
    ## sort dataframe (ascending days_left: the first row is the closest to departure)
    df = df.sort_values(by=['day_dep', 'days_left'])
    ## Departure date groups:
    dfg = df.groupby('day_dep')
    col = 'price'
    df1 = dfg[col].first().reset_index().rename(columns={col: 'price_last'})
    return df1

def MakeReducedDataset(df):
    '''
    Prepare a reduced dataset with summary information on ticket price history for each departure date
    '''
    ## sort dataframe
    df = df.sort_values(by=['day_dep', 'days_left'])
    ## Create list of df
    df_ll = [
        PrepareMinAvg(df),
        PrepareInitPrice(df),
        PrepareMaxPrice(df),
        PrepareLastPrice(df),
    ]
    ## merge them all
    df1 = df_ll.pop(0)
    for i in df_ll:
        df1 = df1.merge(i, on='day_dep', how='outer')
    ## sanity check: every departure date has all values (no NaN)
    for cc in df1.columns:
        assert df1[cc].isna().sum() == 0
    return df1
## Make the reduced datasets
## df_2aoi / df_2stn: raw fare histories of flights to AOI / to STN, loaded earlier in the notebook
df = df_2aoi
## drop repeated (departure date, days_left) entries
df = df[~df[['day_dep', 'days_left']].duplicated()]
dfr_2aoi = MakeReducedDataset(df)
df = df_2stn
df = df[~df[['day_dep', 'days_left']].duplicated()]
dfr_2stn = MakeReducedDataset(df)
dfr_2aoi
|   | day_dep | avg_min | days_left_atmin | price_initial | price_max | price_last |
|---|---|---|---|---|---|---|
| 0 | 2021-10-05 | 8.00 | 2 | 8.000000 | 8.00 | 8.00 |
| 1 | 2021-10-07 | 7.99 | 4 | 7.990000 | 7.99 | 7.99 |
| 2 | 2021-10-08 | 7.99 | 5 | 11.656667 | 18.99 | 18.99 |
| 3 | 2021-10-09 | 7.99 | 6 | 10.265000 | 17.09 | 17.09 |
| 4 | 2021-10-10 | 7.99 | 7 | 11.630000 | 17.09 | 17.09 |
| ... | ... | ... | ... | ... | ... | ... |
| 533 | 2023-06-27 | 57.99 | 180 | 69.910000 | 88.59 | 88.59 |
| 534 | 2023-06-28 | 49.99 | 180 | 49.990000 | 49.99 | 49.99 |
| 535 | 2023-06-29 | 53.74 | 180 | 53.740000 | 53.74 | 53.74 |
| 536 | 2023-06-30 | 78.99 | 180 | 78.990000 | 78.99 | 78.99 |
| 537 | 2023-07-01 | 87.80 | 180 | 87.800000 | 87.80 | 87.80 |
538 rows × 6 columns
Properties of the fare data
Now that we have collected, prepared and cleaned the datasets, we can look for patterns and insights:
What is the average price on this route? How much does it vary between seasons/months?
How much can prices fluctuate, and for how long (and when) can they be considered “convenient”?
Is there a pattern in flight prices over time?
When is the best time to buy a ticket? Is it always convenient to buy early?
Is there a monotonic (if not strictly) increasing trend in ticket prices, reflecting flights slowly but inexorably filling up?
And, as always when (good) data are examined in detail, let's be prepared for interesting and unexpected findings!
Note
This post was not only written with Jupyter Lab: it is a Jupyter Notebook!
By clicking the Colab badge at the top of the page you can run this page interactively, modify the examples below, look at different dates, plot different variables, etc., and make your own exploration! Or even use my own plots as…
All plots below are interactive!
They are made with plotly, so you can zoom, select x or y ranges, toggle lines and histograms, hover over points and lines, and much more, all directly on this page!
First, let's look at the fare trend over time. I cannot show all flights at once (too many!), so I restricted the plot to flights departing in the week before Christmas 2022 (high season): Dec. 17, 18, 19, 20, 22 and 24.
import plotly.express as px
import plotly.io as pio

def MakeHistoryTrendByDate(df, day_s_0="2022-12-17", day_s_1="", title_s="Flight Price Trend"):
    '''
    Make historical trend of price for tickets with departure dates in the [day_s_0, day_s_1] range
    '''
    ## If end date not provided, use the initial date
    if day_s_1 == "":
        day_s_1 = day_s_0
    ## Filter departure dates within the range
    ff1 = df.day_dep <= day_s_1
    ff0 = df.day_dep >= day_s_0
    ff = ff0 & ff1
    cols = ['day_id', 'price', 'days_left', 'day_dep']
    df1 = df.loc[ff, cols].sort_values(by='day_id')
    ## String version of the departure date, used as line label
    df1['DepDate'] = df1.day_dep.astype(str)
    ## Make plot: one line per departure date, booking date on the x axis
    fig = px.line(df1, x="day_id", y='price', color='DepDate', title=title_s)
    ## Time-window buttons (range slider hidden)
    fig.update_xaxes(
        title='Booking Date',
        rangeslider_visible=False,
        rangeselector=dict(
            buttons=list([
                dict(count=15, label="15d", step="day", stepmode="backward"),
                dict(count=1, label="1m", step="month", stepmode="backward"),
                dict(count=3, label="3m", step="month", stepmode="backward"),
                dict(step="all")
            ])
        )
    )
    return fig

## Renderer for the static page ('jupyterlab' or 'plotly_mimetype' when running in a notebook)
pio.renderers.default = 'sphinx_gallery'
px.defaults.width = 700
px.defaults.height = 500
fig = MakeHistoryTrendByDate(df_2aoi, day_s_0='2022-12-17', day_s_1='2022-12-24')
fig.show()
Tip
You can show a single flight's fare trend by double-clicking the corresponding date in the legend.
The first impression is that there is no clear pattern in any of these fare trends. There seems to be a general slow increase (as expected), but it is far from monotonic, with big drops at various stages:
The initial prices are very high compared to the average for this route, and fairly stable up to mid-September, i.e. after the summer season.
Zooming in on the last month before departure we see real chaos, with big jumps for all these flights. All of them went well below 50£ about a week before departure, only to rise again a few days later.
The flights closest to Christmas (22/12 and 24/12) went quite high, then dropped significantly in November.
Well, good luck making sense of all that! (We really miss information on seat availability at booking time.)
But next year I will certainly wait until the second week of December to buy my tickets for the Christmas holidays.
Let's look at the fare distributions. For this I'll use the summary (reduced) dataset created above.
import pandas as pd

df = dfr_2aoi
present = '2023-01-01'
## select only flights with at least 6 months of booking history and not in the future
ff1 = df.day_dep > '2022-06-01'
ff2 = df.day_dep < present
ff = ff1 & ff2
df = df[ff]
## The four summary metrics to compare
x1 = df.avg_min
x2 = df.price_max
x3 = df.price_last
x4 = df.price_initial
data_ll = [x1, x2, x3, x4]
label_ll = ["Average Min", "Max", "Last", "Initial"]

def PrepareDataForHisto(data_ll, label_ll):
    '''
    Stack several series into one long dataframe with a 'data' column
    and a categorical 'label' column (e.g. for a histogram colored by label)
    '''
    df_ll = []
    for dd, ll in zip(data_ll, label_ll):
        dfi = pd.DataFrame(dd)
        dfi.columns = ['data']
        dfi['label'] = ll
        df_ll.append(dfi)
    df_plot = pd.concat(df_ll, axis=0)
    return df_plot

df_plot2 = PrepareDataForHisto(data_ll, label_ll)
The bulk of the fares is below 50£: almost all flights drop that low for at least some time.
It also seems that it is never convenient to buy a ticket too far in advance, as initial prices are higher on average.
One of the main questions is finding the best time window to buy a ticket:
Unfortunately the evidence is inconclusive: there is no particular peak anywhere. Interestingly, though, many of the cheapest fares can be found even within 40 days of departure.
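One way to put numbers on the “best time window” is to bin days_left_atmin from the reduced dataset and count how often the minimum falls in each bin. `min_fare_timing` is a hypothetical helper, the bin edges are an arbitrary choice, and the toy rows are invented; on the real data you would pass dfr_2aoi, filtered as in the histogram cell above.

```python
import pandas as pd


def min_fare_timing(dfr, bins=(0, 7, 14, 40, 90, 180)):
    """Share of flights whose smoothed minimum fare occurred in each
    days-before-departure window (uses the reduced dataset's days_left_atmin)."""
    # pd.cut builds right-closed windows: (0, 7], (7, 14], (14, 40], ...
    windows = pd.cut(dfr["days_left_atmin"], bins=list(bins))
    return windows.value_counts(normalize=True).sort_index()


# Toy reduced dataset (made-up values, same column name as dfr_2aoi)
toy = pd.DataFrame({"days_left_atmin": [2, 5, 10, 30, 35, 120]})
print(min_fare_timing(toy))
print("share within 40 days:", (toy["days_left_atmin"] <= 40).mean())
```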
The overall distribution above includes both low- and high-season fares. It may be instructive to further segment the dataset by month and check how average trends differ in the summer or Christmas holidays compared with quieter periods like October or November.
This time, though, we also correlate with the minimum and maximum prices:
Tip
Monthly data can be shown individually, in pairs, or as any subset by clicking on the month names in the legend at the top of the plot. Hovering the mouse over individual circles shows the details of that flight's fares.
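Segmenting by month only needs a departure-month column, built with `day_dep.dt.month_name()` as in the (commented-out) line of the data-preparation cell. A minimal sketch on the reduced dataset; `monthly_summary`, the choice of the median as aggregate, and the toy values are assumptions for illustration:

```python
import pandas as pd


def monthly_summary(dfr):
    """Median summary metrics per departure month, most expensive months first."""
    df = dfr.copy()
    df["month_dep"] = pd.to_datetime(df["day_dep"]).dt.month_name()
    return (df.groupby("month_dep")[["avg_min", "price_max", "days_left_atmin"]]
              .median()
              .sort_values("price_max", ascending=False))


# Toy reduced dataset (made-up values in the same columns as dfr_2aoi)
toy = pd.DataFrame({
    "day_dep": ["2022-07-01", "2022-07-02", "2022-10-01", "2022-10-02"],
    "avg_min": [29.99, 34.99, 9.99, 12.99],
    "price_max": [189.99, 210.00, 39.99, 45.00],
    "days_left_atmin": [92, 88, 2, 3],
})
print(monthly_summary(toy))
```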
Some observations:
It is now clearer where the “cluster” around 60-100 days in the overall distribution comes from: mainly July flights. In this month max fares are quite high (big disks in the plot), and for most flights the cheapest fare occurs around 90 days in advance (i.e. buying the ticket in April).
For August (another high-season, high-fare period) it seems that buckets of low fares were released at different times.
The October data are interesting: for most flights, the cheapest fare occurred in the very last days before departure.
Finally, December (high season) shows, quite unexpectedly, cheaper prices in a narrow window 9-11 days before departure, with big savings too (tickets with a max fare of 200-250£ sold for less than 30£)!
Finally, let's check how min and max fares compare:
This plot illustrates the business model of low-cost airlines: the selling price is mainly based on (a forecast of) how much customers are willing to spend at a given time, not on the actual cost/value of the product.
This is not so shocking, of course, but, as we've seen above, it does not follow the supply/demand dynamic of a limited resource (a flight has a finite number of seats), and it has little to do with low fares at the beginning (to fill the airplane) and high fares at the end (when few seats are left).
On the bottom-right side of the plot we can see the most extreme case: for a flight in June, the same seat, on the same flight, with the same service, cost 12.99£ at one point and 336.59£ at another.
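A quick way to rank flights by this gap is the ratio price_max / avg_min in the reduced dataset. For the extreme case above, 336.59 / 12.99 ≈ 25.9: the same seat sold for roughly 26 times its lowest fare. Since avg_min is a rolling average, on real data the ratio can only understate the raw max/min spread. `fare_spread` is a hypothetical helper, and the toy rows use placeholder labels instead of dates:

```python
import pandas as pd


def fare_spread(dfr):
    """Ratio between max fare and smoothed min fare for each flight, largest first."""
    df = dfr[["day_dep", "avg_min", "price_max"]].copy()
    df["max_over_min"] = df["price_max"] / df["avg_min"]
    return df.sort_values("max_over_min", ascending=False)


# Toy reduced dataset: flight_1 reuses the two extreme fares quoted above,
# flight_2 is made up; day_dep holds placeholder labels, not real dates
toy = pd.DataFrame({
    "day_dep": ["flight_1", "flight_2"],
    "avg_min": [12.99, 20.0],
    "price_max": [336.59, 40.0],
})
print(fare_spread(toy))
```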
Fare Bird's Eye View
As in astronomy, where not only zooming into the deep sky but also looking from afar gives clues about the structure of the universe, a bird's-eye view of the data can sometimes reveal patterns that would otherwise be missed:
Wow, this is quite interesting!!
The two diagonal boundaries come from two constraints: only flights in the future (not in the past) can be booked, and data collection checks fares at most 6 months in advance.
There are also a couple of periods with missing data (big rectangular holes).
The booking dates stop at the day I retrieved the data from my cloud storage to prepare this plot. Fares of flights after the TODAY line are not complete, as they will keep evolving until departure.
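The bird's-eye view is essentially a booking-date × departure-date matrix of fares. A sketch of one possible way to build it with `DataFrame.pivot_table` and draw it with plotly's `px.imshow`; the helper names are hypothetical. Empty cells reproduce the structures described above: nothing on or after departure, nothing beyond the collection horizon, and holes where snapshots are missing.

```python
import pandas as pd


def fare_grid(df):
    """Booking date (rows) x departure date (columns) matrix of fares.

    Cells stay NaN where no fare exists: on/after departure, beyond the
    ~180-day collection horizon, or during collection gaps.
    """
    return df.pivot_table(index="day_id", columns="day_dep", values="price", aggfunc="mean")


def plot_fare_grid(grid):
    # Optional plotting step; plotly is imported here so fare_grid works without it
    import plotly.express as px
    return px.imshow(grid, origin="lower", aspect="auto",
                     labels=dict(x="Departure date", y="Booking date", color="Fare (GBP)"))


# Toy snapshots (made-up values): the (2022-01-02, 2022-01-02) cell stays empty
toy = pd.DataFrame({
    "day_dep": ["2022-01-02", "2022-01-03", "2022-01-03"],
    "day_id": ["2022-01-01", "2022-01-01", "2022-01-02"],
    "price": [8.0, 10.0, 12.0],
})
print(fare_grid(toy))
```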
There are a few interesting structures worth discussing (the plot above is interactive, but I made a few zoomed screenshots for clarity):
Vertical Holes (Comb effect): The (many) vertical holes we see, mainly during 2022 up to the end of summer, are due to Ryanair flight rescheduling. Basically, flights were scheduled (and bookable), and then Ryanair decided to remove them. It is a rescheduling of operations rather than a cancellation: it is not a last-minute thing, and many flights are removed at the same time (corresponding to a horizontal breakdown). For example, many flights scheduled in Feb. 2022, as well as some in April, were removed abruptly on Jan 12.
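Such removals can also be spotted without the plot: a flight that stops appearing in the daily snapshots well before departure, while collection continues, was probably taken off sale. A minimal sketch of that heuristic; `find_removed_flights` and its 7-day threshold are assumptions, and flights that sell out or fall into a collection gap would also be flagged:

```python
import pandas as pd


def find_removed_flights(df, min_days_left=7):
    """Hypothetical heuristic: flag flights that vanished from the daily
    snapshots at least `min_days_left` days before departure, while later
    snapshots were still being collected."""
    df = df.copy()
    df["day_dep"] = pd.to_datetime(df["day_dep"])
    df["day_id"] = pd.to_datetime(df["day_id"])
    # Last booking date on which each flight was still on sale
    last = df.groupby("day_dep")["day_id"].max().rename("last_seen").reset_index()
    last["days_left_last"] = (last["day_dep"] - last["last_seen"]).dt.days
    latest_snapshot = df["day_id"].max()
    removed = last[(last["days_left_last"] >= min_days_left)
                   & (last["last_seen"] < latest_snapshot)]
    # Many removals sharing the same last_seen date suggest a bulk rescheduling
    per_day = removed.groupby("last_seen").size()
    return removed, per_day


# Toy snapshots (made-up): two February flights disappear after Jan 11,
# the Jan 12 flight simply departs, the Feb 22 flight stays on sale
rows = [(dep, d) for d in ["2022-01-10", "2022-01-11"]
        for dep in ["2022-01-12", "2022-02-20", "2022-02-21"]]
rows += [("2022-02-22", d) for d in ["2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13"]]
toy = pd.DataFrame(rows, columns=["day_dep", "day_id"])
removed, per_day = find_removed_flights(toy)
print(per_day)
```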
Summer 2022: Summer 2022 was particularly troublesome in the UK for all airlines, as airports suffered staff shortages and many other issues. A big rescheduling took place on July 1st, with many summer flights removed. Besides that, the dynamic pricing strategy of releasing seats (or lowering prices) on particular days of the year is very noticeable (horizontal changes of color). On April 13, for example, the fares for July flights dropped dramatically.
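The horizontal color changes are many flights changing fare on the same booking day. A sketch that measures this, assuming a “drop” means a fall of more than 20% since the flight's previous snapshot (an arbitrary threshold); `synchronized_drops` is a hypothetical helper and the toy prices are invented:

```python
import pandas as pd


def synchronized_drops(df, drop=0.2):
    """For each booking date (day_id), share of flights on sale whose fare fell
    by more than `drop` (relative) since that flight's previous snapshot."""
    df = df.sort_values(["day_dep", "day_id"]).copy()
    prev = df.groupby("day_dep")["price"].shift(1)
    # First snapshot of each flight: prev is NaN, so the comparison is False
    df["dropped"] = df["price"] < (1 - drop) * prev
    return df.groupby("day_id")["dropped"].mean()


# Toy snapshots (made-up prices): both July flights drop on the same day
toy = pd.DataFrame({
    "day_dep": ["2022-07-10", "2022-07-10", "2022-07-11", "2022-07-11"],
    "day_id": ["2022-04-12", "2022-04-13", "2022-04-12", "2022-04-13"],
    "price": [100.0, 40.0, 90.0, 30.0],
})
print(synchronized_drops(toy))
```

On the real data, booking days where this share jumps far above its usual level would be candidates for block releases like the April 13 drop.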
December 2022: We looked at this before: horizontal breaks with sudden price drops are very noticeable (e.g. in early Nov. for flights around Christmas). Then things went quite crazy in the first week of December. The flight on Dec 11th seems to be an outlier; I'm not sure why its fare was so high compared to adjacent days.
Outlook for 2023: prediction time!! There is a big outlier on Mar 9th; I'm sure the price is going to drop. Apr 1 - Apr 8 is Easter time, when fares are usually higher; comparing with Easter 2022, I expect a drop (block release) on March 1st. Prices are quite high for the end of May. There is not much data yet, so it might be an algorithmic fluke: if fares are based on the previous year's demand, in 2022 there was the Platinum Jubilee weekend at that time, and the algorithm may be set to expect the same “activities” this year. Ryanair may face a hell of a disappointment!
Summary
Ryanair's price-setting behavior is hard to reverse engineer without internal knowledge of its revenue management strategy and other flight-specific marketing information.
However, it shouldn't be assumed that Ryanair employs super-optimized dynamic pricing algorithms. Other considerations may be at play, related to the overall profit of the company rather than to maximizing profit on each individual flight. Evidence for this is the sudden and very similar price changes observed for many flights at the same time; maximizing the profit of individual flights would instead imply independent, tailored actions for each of them.
While models of airline pricing strategy are rightly studied, an empirical analysis of historical trends, as done in this post, can provide useful, practical insights for guessing the best time to get a good deal on a given route.