High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.
Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.
With this book, you’ll explore:
- How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
- The choice between data joins in Core Spark and Spark SQL
- Techniques for getting the most out of standard RDD transformations
- How to work around performance issues in Spark’s key/value pair paradigm
- Writing high-performance Spark code without Scala or the JVM
- How to test for functionality and performance when applying suggested improvements
- Using Spark MLlib and Spark ML machine learning libraries
- Spark’s Streaming components and external community packages
- ISBN-10: 1491943203
- ISBN-13: 978-1491943205
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: July 11, 2017
- Language: English
- Dimensions: 7 x 0.75 x 9.25 inches
- Print length: 358 pages
There is a newer edition of this item ($52.58), which will be released on July 7, 2026.
Customers who viewed this item also viewed
- Spark: The Definitive Guide: Big Data Processing Made Simple (Paperback)
- Learning Spark: Lightning-Fast Data Analytics (Paperback)
- High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark (Paperback; to be released July 7, 2026)
- Learning Spark: Lightning-Fast Big Data Analysis (Paperback)
- Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems (Paperback)
- Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark (Paperback)
Customers also bought or read
- Spark: The Definitive Guide: Big Data Processing Made Simple (Paperback, $48.00)
- Fundamentals of Data Engineering: Plan and Build Robust Data Systems (Paperback, $43.99)
- Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance, and Scalability on the Data Lake (Paperback, $44.94)
- Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale (Paperback, $43.99)
- Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems (Paperback, $35.46)
- Spark in Action, Second Edition: Covers Apache Spark 3 with Examples in Java, Python, and Scala (Paperback, $34.96)
- Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh (Paperback, $50.99)
- The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (Paperback, $41.71)
- Database Internals: A Deep Dive into How Distributed Data Systems Work (Paperback, $36.33; #1 Best Seller in Desktop Database Books)
- Delta Lake: The Definitive Guide: Modern Data Lakehouse Architectures with Data Lakes (Paperback, $49.00)
- Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark (Paperback, $45.70)
- Data Pipelines with Apache Airflow, Second Edition: Orchestration for data and AI (Paperback, $56.99)
- Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming (Paperback, $49.12)
- Data Pipelines Pocket Reference: Moving and Processing Data for Analytics (Paperback, $16.93)
- Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications (Paperback, $42.49)
- Data Quality Fundamentals: A Practitioner's Guide to Building Trustworthy Data Pipelines (Paperback, $39.63)
- Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake (Paperback, $35.99)
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing (Paperback, $48.59)
- Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies, 7) (Paperback, $39.88)
- Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow (Paperback, $44.99)
- AWS for Solutions Architects: Design and scale secure AWS architectures with GenAI strategies and real-world patterns (Paperback, $45.99)
- Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale (Paperback, $17.59)
- Kubernetes: Up and Running: Dive into the Future of Infrastructure (Paperback, $43.99)
- Ace the Data Science Interview: 201 Real Interview Questions Asked By FAANG, Tech Startups, & Wall Street (Paperback, $42.89)
- Practical Lakehouse Architecture: Designing and Implementing Modern Data Platforms at Scale (Paperback, $45.39)
- Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python (Paperback, $45.25)
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
| | Learning Spark | Advanced Analytics with Spark, 2nd Ed. | Learning Spark Streaming | High Performance Spark |
|---|---|---|---|---|
| Customer Reviews | 4.7 out of 5 stars (334 ratings) | 3.8 out of 5 stars (51 ratings) | 4.4 out of 5 stars (33 ratings) | 4.1 out of 5 stars (93 ratings) |
| Price | $43.99 | $37.95 | $49.12 | $17.93 |
| Subtitle | Lightning-Fast Data Analytics | Patterns for Learning from Data at Scale | Best Practices for Scaling and Optimizing Apache Spark | Best Practices for Scaling and Optimizing Apache Spark |
Editorial Reviews
About the Author
Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real-world data processing challenges. She has experience working as an analyst both in industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut.
Product details
- ASIN : 1491943203
- Item Weight : 2.31 pounds
- Best Sellers Rank: #710,303 in Books
- #219 in Business Intelligence Tools
- #334 in Data Mining (Books)
- #519 in Data Processing
About the author

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and Kubeflow for ML. She is a committer and PMC member on Apache Spark and an ASF member. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys dancing, scooters, and playing with fire.
Customer reviews
Top reviews from the United States
- 5 out of 5 stars
A book focused on how to write high-performance Spark code
Reviewed in the United States on September 7, 2017
This is not a beginner's guide, so you need some working knowledge of Scala and Spark beforehand. <Learning Spark> and <Spark in Action> will lay a good foundation for this book. The target reader is the Spark programmer: all the content focuses on how to write high-performance Spark code, especially how to use the Spark Core and Spark SQL APIs. There is nothing about how to administer or configure a Spark cluster; for that, one can try <Expert Hadoop Administration>. That said, this book does a great job of explaining the nuances of writing Spark code. Highly recommended.
4 people found this helpful
- 5 out of 5 stars
You'll love this book
Reviewed in the United States on December 3, 2017
If you're a real hacker, you'll love this book. If you're not, and you would like to be, you'll find it frustrating, but if you stick with it, you will grow as a professional. If you're not, and you know you never will be, I suggest you start working on a nursing, phlebotomy, or massage therapy certification before people in the first two groups figure out how to automate your job.
2 people found this helpful
- 4 out of 5 stars
You must know the basics of Spark before this book will be helpful to you
Reviewed in the United States on August 4, 2017
Format: Paperback | Vine Customer Review of Free Product
Apache Spark moves along the continuum of parallel processing. “Apache Spark is a high-performance, general-purpose distributed computing system that has become the most active Apache open source project with more than 1,000 active contributors”. The authors go on to state, “Spark enables us to process large quantities of data, beyond what can fit on a single machine, with a high-level, relatively easy-to-use API”.
Most people in (and out of) IT will never have any contact with Spark. I need to know about it only because my job involves having at least a superficial knowledge of every significant aspect of IT.
This book presumes you are already conversant with Apache Spark and need no education or hand-holding in that regard.
Rather, this book’s goal is to help the reader make their Spark queries “faster, able to handle larger data sizes, and use fewer resources”. Being able to at least read Scala is highly recommended.
The entire book is loaded with detailed examples. For a casual reader like me, lacking a Spark environment to play in, there is an empty feeling: you can read the examples and study them, but not run them.
I have read dozens of programming cookbooks during my career, and this one feels right, but without being able to run the examples, that’s just an assumption. It does, however, make me wish I had some huge datasets to work on. Maybe I can get a job with the NSA? I bet there are a lot of Spark experts there.
Jerry
One person found this helpful
- 3 out of 5 stars
Spark Fine Tuning
Reviewed in the United States on October 13, 2017
Format: Paperback | Vine Customer Review of Free Product
This book is filled with useful optimization tidbits for those already familiar with using Spark. It’s not a fun read, and I couldn’t imagine where to begin to find the resources to practice the content.
2 people found this helpful
- 5 out of 5 stars
Like new!
Reviewed in the United States on January 20, 2024
Book came in faster than expected and in perfect condition!
- 5 out of 5 stars
Good to explain in one language
Reviewed in the United States on November 17, 2017
Much improved compared with the 1st edition, with more elaboration on joining datasets. Good to explain in one language.
- 5 out of 5 stars
Very readable, practical text
Reviewed in the United States on August 13, 2017
This book clarifies lots of my questions about Spark. I especially appreciate the walkthrough of joins.
One person found this helpful
- 4 out of 5 stars
Detailed book with an eye on performance and a lot of code examples
Reviewed in the United States on July 28, 2017
Format: Paperback | Vine Customer Review of Free Product
Overall, I thought this was a very good book. It strikes a good balance between detailed, in-depth instruction and being a guidebook rather than an instruction manual. It's the usual high quality that I've come to expect from O'Reilly, and I feel much more confident about my understanding of Spark, both as a user and of its inner workings.
Much of the book is written with a focus on performance. There's some discussion of statistical concepts, but the book is clearly aimed at helping the reader use Spark in a resource-efficient manner (which makes a lot of sense, given that Spark comes into play when you're tackling large data sets).
Virtually all of the code examples are written in Scala. When I began reading, my Scala abilities were fairly limited, but the authors do a good job of parsing and commenting on the code such that I now feel much stronger in Scala, as well. They do have a chapter that discusses using Python and Java (including JVM), but most of the book is presented through Scala.
My one complaint about this book is that it's a bit heavy on the code. It's possible that it's necessary, but I ended up skimming most of the coding examples, and it made for some tedious reading at times. Then again, there were several examples that I scrutinized closely, and having thorough examples did help me learn quite a bit of Scala.
3 people found this helpful
Top reviews from other countries
- Duraga singh thakur, 5 out of 5 stars
Good book
Reviewed in India on January 3, 2024
Good book
- GS, 5 out of 5 stars
Simplicity and beautiful diagrams and presentation
Reviewed in the United Kingdom on April 24, 2018
Clarity, simplicity, and beautiful diagrams and presentation. You cannot get better than this.
- Babacar, 4 out of 5 stars
The book is well written, but covers too small a part of streaming
Reviewed in France on February 14, 2019
The book is well written, but covers too small a part of streaming. I'm looking for a good way to optimize Spark Streaming jobs. I have searched the web but couldn't find anything really useful.
- Pentergark, 5 out of 5 stars
Good book to read and to keep as a reference
Reviewed in Spain on December 21, 2017
The book is just as expected. Lots of examples, very well commented, and the points are clearly explained. Perfect for newcomers to the Spark world or for people who have some doubts.
- Amazon Customer, 5 out of 5 stars
Five Stars
Reviewed in the United Kingdom on August 15, 2017
Best in-depth Spark reference book