You can now use Apache Spark 2.4.2, Apache Flink 1.8.0, Presto 0.219, Hue 4.4.0, JupyterHub 0.9.6, Apache Livy 0.6.0, and Apache MXNet 1.4.0 on Amazon EMR release 5.24.0.
This release also includes three new performance optimizations that you can enable to improve Spark performance by up to 13X: dynamic partition pruning, flattening scalar subqueries, and DISTINCT before INTERSECT.
- Dynamic partition pruning lets the Spark engine infer at runtime which partitions a query actually needs. This saves time and compute resources because Spark reads less data from storage and processes fewer records.
- Flattening scalar subqueries helps when multiple conditions must be applied to rows from the same table. It prevents the table from being read separately for each condition, which reduces redundant data reads and improves performance.
- DISTINCT before INTERSECT eliminates duplicate values in each input collection prior to computing the intersection, improving performance by reducing the amount of data shuffled between hosts.
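The ideas behind the three optimizations can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: the tables are tiny in-memory lists, and every name (`SALES`, `CALENDAR`, `join_with_pruning`, and so on) is hypothetical. The functions only count rows read, table scans, and rows shuffled, to show where each rewrite saves work.

```python
# Toy, single-process illustrations of the ideas behind the three
# optimizations. NOT Spark's implementation; all names are hypothetical.

# 1. Dynamic partition pruning
# Hypothetical fact table, partitioned by date: partition key -> rows.
SALES = {
    "2019-06-01": [("2019-06-01", "a", 10), ("2019-06-01", "b", 5)],
    "2019-06-02": [("2019-06-02", "a", 7)],
    "2019-06-03": [("2019-06-03", "c", 3)],
}
# Hypothetical dimension table of (date, is_holiday); the query keeps holidays.
CALENDAR = [("2019-06-01", False), ("2019-06-02", True), ("2019-06-03", False)]


def rows_read_without_pruning(fact):
    # Without runtime pruning, every partition of the fact table is scanned.
    return sum(len(rows) for rows in fact.values())


def join_with_pruning(fact, dim):
    """Return (fact rows scanned, number of fact rows read)."""
    # At runtime, collect the join keys that survive the dimension filter...
    keys = {day for day, is_holiday in dim if is_holiday}
    # ...then scan only the fact partitions whose key is in that set.
    scanned = [row for key in sorted(keys & set(fact)) for row in fact[key]]
    return scanned, len(scanned)


# 2. Flattening scalar subqueries
# Roughly: SELECT (SELECT sum(x) FROM t WHERE cat = 'a'),
#                 (SELECT max(x) FROM t WHERE cat = 'b')
def scalar_subqueries_separate(table):
    """Return ((sum_a, max_b), number of table scans)."""
    sum_a = sum(x for cat, x in table if cat == "a")  # scan 1
    max_b = max(x for cat, x in table if cat == "b")  # scan 2
    return (sum_a, max_b), 2


def scalar_subqueries_flattened(table):
    """Same result, with both conditions evaluated in a single scan."""
    sum_a, max_b = 0, None
    for cat, x in table:  # the only scan
        if cat == "a":
            sum_a += x
        if cat == "b":
            max_b = x if max_b is None else max(max_b, x)
    return (sum_a, max_b), 1


# 3. DISTINCT before INTERSECT
def intersect_raw_inputs(left, right):
    """Return (intersection, rows shuffled) when raw inputs are shuffled."""
    return set(left) & set(right), len(left) + len(right)


def dedupe_then_intersect(left, right):
    """Deduplicate each side locally first, so fewer rows are shuffled."""
    left_distinct, right_distinct = set(left), set(right)
    return left_distinct & right_distinct, len(left_distinct) + len(right_distinct)
```

With this sample data, pruning reads 1 of the 4 fact rows, the flattened query scans the table once instead of twice, and deduplicating `[1, 1, 1, 2, 3]` and `[2, 2, 3, 3, 4]` first shuffles 6 rows instead of 10 while still returning `{2, 3}`.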
These optimizations are not enabled by default; you turn each one on by setting a Spark property. See the EMR 5.24.0 release notes to learn more about these features.
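One way to set the properties cluster-wide is an EMR configuration object with the `spark-defaults` classification. The sketch below builds one in Python. The three property names are an assumption based on our reading of the EMR 5.24.0 Release Guide, so please verify them against the release notes before use; the helper name `spark_defaults_configuration` is hypothetical.

```python
import json

# ASSUMPTION: property names as we understand them from the EMR 5.24.0
# Release Guide; verify against the release notes before relying on them.
OPTIMIZATION_PROPERTIES = (
    "spark.sql.dynamicPartitionPruning.enabled",
    "spark.sql.optimizer.flattenScalarSubqueriesWithAggregates.enabled",
    "spark.sql.optimizer.distinctBeforeIntersect.enabled",
)


def spark_defaults_configuration(properties=OPTIMIZATION_PROPERTIES):
    """Build an EMR configurations list that sets each property to "true"."""
    return [
        {
            "Classification": "spark-defaults",
            # EMR configuration property values are strings.
            "Properties": {name: "true" for name in properties},
        }
    ]


if __name__ == "__main__":
    # Save the output as a JSON file to pass to
    # `aws emr create-cluster --configurations file://<your-file>.json`.
    print(json.dumps(spark_defaults_configuration(), indent=2))
```

To enable a property for a single job instead of the whole cluster, the same name can be passed to `spark-submit` as `--conf <property>=true`.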
Additionally, when you use AWS CloudFormation templates to launch clusters with EMR instance fleets, you can now specify multiple subnets in different Availability Zones within a VPC. This feature is available on EMR release 4.8.0 and later, except 5.0.x.
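A minimal sketch of such a template, built as a Python dictionary and serialized to JSON, is shown below. It assumes the `AWS::EMR::Cluster` resource's `Ec2SubnetIds` list under `Instances`, which applies to instance-fleet clusters; as we understand it, EMR evaluates the listed subnets and launches the cluster in one of them. The subnet IDs, cluster name, instance types, and helper names are hypothetical placeholders.

```python
import json


def _fleet(name, on_demand_capacity, instance_types):
    # One instance fleet: a target On-Demand capacity plus candidate types.
    return {
        "Name": name,
        "TargetOnDemandCapacity": on_demand_capacity,
        "InstanceTypeConfigs": [{"InstanceType": t} for t in instance_types],
    }


def emr_fleet_template(subnet_ids, release_label="emr-5.24.0"):
    """Return a minimal CloudFormation template (as a dict) for an
    instance-fleet cluster that may launch in any of `subnet_ids`."""
    if not subnet_ids:
        raise ValueError("provide at least one subnet ID")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EmrCluster": {
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "multi-subnet-fleet-cluster",  # hypothetical
                    "ReleaseLabel": release_label,
                    "Applications": [{"Name": "Spark"}],
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        # Candidate subnets, e.g. one per Availability Zone.
                        "Ec2SubnetIds": list(subnet_ids),
                        "MasterInstanceFleet": _fleet("master", 1, ["m5.xlarge"]),
                        "CoreInstanceFleet": _fleet("core", 2, ["m5.xlarge", "m4.xlarge"]),
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder subnet IDs in two different Availability Zones.
    print(json.dumps(emr_fleet_template(["subnet-aaaa1111", "subnet-bbbb2222"]), indent=2))
```

Listing several subnets gives EMR more than one Availability Zone to choose from when it provisions the fleets.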
Amazon EMR release 5.24.0 is now available in all supported regions for Amazon EMR.
You can create an Amazon EMR cluster with release 5.24.0 by choosing the release label “emr-5.24.0” from the AWS Management Console, AWS CLI, or SDK. When you launch your EMR cluster, select Spark, Flink, Presto, Hue, JupyterHub, Livy, and MXNet to install those applications. Please visit the Amazon EMR documentation for more information about EMR release 5.24.0, Spark 2.4.2, Flink 1.8.0, Presto 0.219, Hue 4.4.0, JupyterHub 0.9.6, Livy 0.6.0, and MXNet 1.4.0.
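From the SDK, the release label and applications are fields of the cluster-creation request. The sketch below only builds the keyword arguments for the AWS SDK for Python's `run_job_flow` call without contacting AWS. The cluster name, instance types, and instance count are hypothetical placeholders, and the default role names assume you have created EMR's default roles.

```python
# Applications named in this announcement, as EMR application names.
APPS = ("Spark", "Flink", "Presto", "Hue", "JupyterHub", "Livy", "MXNet")


def run_job_flow_request(name, apps=APPS):
    """Build keyword arguments for an EMR run_job_flow call (no AWS calls)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.24.0",
        "Applications": [{"Name": app} for app in apps],
        "Instances": {
            # Placeholder sizing: one master and two core nodes.
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes EMR's default roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With AWS credentials configured, the request could then be submitted as `boto3.client("emr").run_job_flow(**run_job_flow_request("my-cluster"))`.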
You can stay up to date on EMR releases by subscribing to the feed for EMR release notes. Use the icon at the top of the EMR Release Guide to link the feed URL directly to your favorite feed reader.
from Recent Announcements https://aws.amazon.com/about-aws/whats-new/2019/06/announcing-emr-release-5240-with-performance-improvements-in-spark-new-versions-of-flink-presto-Hue-and-cloudformation-support-for-launching-clusters-in-multiple-subnets-through-emr-instance-fleets/