
Guest post by Cesar Moltedo, Data Engineer and Daniel Pizarro, CTO, Airnguru

In this article, we will discuss Airnguru’s product offering, how we leveraged Spot Instances to reduce costs, some of the challenges we faced using this technology, and how we solved them.

Our Services

The Airnguru Suite is a SaaS for airlines. We provide airlines with new-generation pricing technology and pricing intelligence solutions, and we assist them in their internal pricing processes. We aim to become the world-leading pricing solutions provider for airlines. Since its foundation in 2015, Airnguru has been testing and intensively using AWS technology and big data techniques to tackle the highly complex pricing field for the airline industry. Airnguru is also leveraging all this experience through its professional services track, helping companies optimize their cloud journey. In this article, we’ll discuss how we reduced our costs while processing CPU-intensive batch jobs for our reporting module.

Our Challenge

Most network airlines (also known as “legacy carriers”) publish in a public repository (ATPCO) the rules to compute every price for every combination of airports, for every booking class, for every departure date and every return date within a year. Combined, all these rules generate a universe of about 20 trillion potential prices. Airlines may update these rules every hour. It’s like a worldwide stock-exchange of airline tickets, with trillions of prices that update each hour.

Of course, not every airline is interested in every airfare in the world; however, we must be able to compute a detailed report of all airfares for large subsets of the world, sometimes each hour. Some customers are interested in a detailed snapshot. Others are interested in quick-change detection in wide markets. There is an increasing demand for data products. Specifically, a very popular feature of our suite is that it can create a full snapshot of every fare in the world in a few minutes. The secret behind this feature: Amazon EC2 Spot Instances, an EC2 purchasing option that provides discounts of up to 90% compared with On-Demand prices by using spare EC2 capacity within an AWS Region. By using Spot Instances, we launch hundreds (sometimes thousands) of machines, each taking a small part of the problem.

The ability to spin up cheap computing power in a few seconds is incredibly important, and it’s been with us since EC2’s early days. Spot Fleets are a great tool to get a cluster of the cheapest machines around. However, there are some often-overlooked ways to get even cheaper machines.

Two ways to get the most bang for your buck

It turns out that for our favorite Spot Instance type, the On-Demand price in us-east-1 is the same as in us-east-2. However, the Spot prices differ considerably. At the time of writing this article, just by running Spot loads in us-east-2, we achieved savings of 40% in EC2 compared to Spot prices in us-east-1.

Bar graph of EC2 savings with multi-region

Running distributed jobs is all about partitioning large jobs into smaller jobs. When you partition a large job, you also have the opportunity to classify the resulting jobs. By classifying jobs at partition time, we assign all jobs of a particular size to a specific instance type, so jobs of different sizes run on different instance types. The trick is to always stay in the same instance family. Within one family, we get almost identical machines in terms of devices and drivers, so all the initialization scripts will succeed.
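To illustrate classification at partition time, here is a minimal sketch. The size thresholds, the `estimated_gb` field, and the specific c5d sizes are hypothetical, chosen only for illustration; the one property it tries to show is that every job class maps to an instance type in the same family.

```python
from collections import defaultdict

# Hypothetical mapping: upper bound on estimated job size (GB) -> instance type.
# Every type belongs to the same family (c5d), so one AMI and one set of
# initialization scripts can serve all of them.
SIZE_CLASSES = [
    (4, "c5d.xlarge"),
    (16, "c5d.2xlarge"),
    (64, "c5d.4xlarge"),
]
LARGEST = "c5d.9xlarge"  # fallback for anything above the last bound


def instance_type_for(estimated_gb):
    """Return the smallest instance type whose bound covers the job."""
    for upper_bound_gb, instance_type in SIZE_CLASSES:
        if estimated_gb <= upper_bound_gb:
            return instance_type
    return LARGEST


def classify(jobs):
    """Group job ids by their assigned instance type."""
    groups = defaultdict(list)
    for job in jobs:
        groups[instance_type_for(job["estimated_gb"])].append(job["id"])
    return dict(groups)


jobs = [
    {"id": "a", "estimated_gb": 2},
    {"id": "b", "estimated_gb": 40},
    {"id": "c", "estimated_gb": 3},
    {"id": "d", "estimated_gb": 500},
]
print(classify(jobs))
```

Each group can then be launched as its own batch of Spot Instances of the corresponding type.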

Both of these strategies were implemented in a library we named “Clark,” a distributed work scheduler. Clark selects the most suitable instance type for each class of job using domain-specific knowledge. Then, for that instance type, it chooses the cheapest region in which to run the instance.

Finally, an important consideration

There’s always a chance that inter-region transfer costs could spike. However, for our case, the increase in the inter-region transfer costs was much less than the savings for using a different region.
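A quick way to check this trade-off before moving a workload is to subtract the extra transfer cost from the compute savings. The function below is a rough sketch of that arithmetic. Every number in the example is a made-up placeholder (the remote price is simply set 40% below the home price), not one of our actual figures or a current AWS rate.

```python
def net_saving(instance_hours, home_price, remote_price,
               transfer_gb, transfer_price_per_gb):
    """Saving from running in the remote region, net of extra data transfer.

    Prices are in USD per instance-hour and USD per GB transferred.
    A negative result means the cheaper region does not pay off.
    """
    compute_saving = instance_hours * (home_price - remote_price)
    transfer_cost = transfer_gb * transfer_price_per_gb
    return compute_saving - transfer_cost


# Placeholder inputs, for illustration only.
saving = net_saving(
    instance_hours=1000,
    home_price=0.30,
    remote_price=0.18,
    transfer_gb=500,
    transfer_price_per_gb=0.02,
)
print(f"Net saving: ${saving:.2f}")
```

With these placeholder inputs, the compute saving is 120 USD and the transfer cost is 10 USD, so the move pays off. A job that shipped far more data per instance-hour could flip the sign.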

Bar graph of the data transfer cost

How to accomplish this (Python)

We need to define which regions we will use (us-east-1, us-east-2, ca-central-1, etc.), and for each region, we will need an EC2 client. In our example code, we will use only one region.

import boto3

ec2_client = boto3.client('ec2', region_name='us-east-2')

With the EC2 client, we get the zones in the region.

available_zones = list()
zones = ec2_client.describe_availability_zones()
for zone in zones['AvailabilityZones']:
    available_zones.append(zone['ZoneName'])

In each zone, we get the latest Spot price for a specific instance type.

for available_zone in available_zones:
    # Ask for only the most recent price record in this zone
    price = ec2_client.describe_spot_price_history(
        InstanceTypes=['c5d.4xlarge'],
        MaxResults=1,
        ProductDescriptions=['Linux/UNIX'],
        AvailabilityZone=available_zone)
    # Spot prices are returned as strings
    price_float = float(price['SpotPriceHistory'][0]['SpotPrice'])

We select the zone with the minimum price and launch an instance there. If we repeat the previous step for other regions, we get the cheapest zone across all of them. That is the zone where we should run our EC2 Spot Instance.
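The selection step can be sketched as follows. This is one possible way to write it, not necessarily how our library does it. The helper names `latest_spot_prices` and `cheapest_zone` and the `clients_by_region` dictionary are hypothetical; the EC2 calls are the same ones used above. Clients are passed in as arguments so the logic can be exercised without AWS credentials.

```python
def latest_spot_prices(ec2_client, instance_type):
    """Return {zone_name: latest Spot price} for one region's EC2 client."""
    prices = {}
    zones = ec2_client.describe_availability_zones()["AvailabilityZones"]
    for zone in zones:
        history = ec2_client.describe_spot_price_history(
            InstanceTypes=[instance_type],
            MaxResults=1,
            ProductDescriptions=["Linux/UNIX"],
            AvailabilityZone=zone["ZoneName"],
        )["SpotPriceHistory"]
        if history:  # skip zones with no price record for this type
            prices[zone["ZoneName"]] = float(history[0]["SpotPrice"])
    return prices


def cheapest_zone(clients_by_region, instance_type):
    """Return (region, zone, price) with the lowest latest Spot price."""
    candidates = [
        (price, region, zone)
        for region, client in clients_by_region.items()
        for zone, price in latest_spot_prices(client, instance_type).items()
    ]
    if not candidates:
        raise ValueError(f"no Spot price found for {instance_type}")
    price, region, zone = min(candidates)
    return region, zone, price


# Example usage with boto3 (requires AWS credentials):
#   import boto3
#   regions = ["us-east-1", "us-east-2", "ca-central-1"]
#   clients = {r: boto3.client("ec2", region_name=r) for r in regions}
#   region, zone, price = cheapest_zone(clients, "c5d.4xlarge")
```

Note that the instance must then be launched with the client for the selected region, since an Availability Zone name is only valid within its own region.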

We set the maximum Spot price to the price obtained in the previous step, increased by a small factor (10% in this example). Then, to run our new Spot Instance, we have:

ec2_client.run_instances(
    BlockDeviceMappings=[],
    ImageId='last-ami',  # placeholder: use your own AMI ID
    InstanceType='c5d.4xlarge',
    MaxCount=1,
    MinCount=1,
    Placement={'AvailabilityZone': available_zone},
    UserData='your user data',
    IamInstanceProfile={'Name': 'your iam instance profile'},
    InstanceInitiatedShutdownBehavior='terminate',
    InstanceMarketOptions={
        'MarketType': 'spot',
        'SpotOptions': {
            # Bid slightly above the latest observed Spot price
            'MaxPrice': str(price_float * 1.1),
            'SpotInstanceType': 'one-time',
            'InstanceInterruptionBehavior': 'terminate'
        }
    }
)

Conclusions and future development

This method provides savings in the price of EC2 instances by using alternate regions. However, these savings come at the cost of inter-region data transfer, additional latency, and complexity. For our use case, which involves batch processes, the benefits clearly exceeded the costs, and the proof of concept we created quickly became a production deployment.

It is important to note that not every batch job will show this behavior: with this method, compute gets cheaper at the cost of more expensive communication between servers. To address this, we will soon add cache layers that reduce the inter-region communication cost.

In this post, we have presented a method to use EC2 Spot in a non-traditional way to get lower prices. There are costs involved, but in our case they are much lower than the benefits because of specific characteristics of our jobs. At Airnguru, this idea sat for a long time before we decided to give it a try. Surprisingly, the implementation of this feature was much easier than we expected, and the increase in architectural complexity was not substantial. The success of this feature (once just an experiment) motivates us to be bolder about trying new ideas and running new experiments, and we hope it also motivates you to try your own.