Buy New
-64% $17.93
FREE delivery Wednesday, June 17 on orders shipped by Amazon over $35
Ships from: Amazon
Sold by: Epic Book Outlet
List Price: $49.99
Buy Used
$9.99
Access codes and supplements are not guaranteed with used items.
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

4.1 out of 5 stars (93)


Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.

With this book, you’ll explore:

  • How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
  • The choice between data joins in Core Spark and Spark SQL
  • Techniques for getting the most out of standard RDD transformations
  • How to work around performance issues in Spark’s key/value pair paradigm
  • Writing high-performance Spark code without Scala or the JVM
  • How to test for functionality and performance when applying suggested improvements
  • Using Spark MLlib and Spark ML machine learning libraries
  • Spark’s Streaming components and external community packages

There is a newer edition of this item:

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
$52.58
This title will be released on July 7, 2026.




From the Publisher

  • Learning Spark: Lightning-Fast Data Analytics. 4.7 out of 5 stars (334). Price: $43.99
  • Advanced Analytics with Spark, 2nd Ed.: Patterns for Learning from Data at Scale. 3.8 out of 5 stars (51). Price: $37.95
  • Learning Spark Streaming: Best Practices for Scaling and Optimizing Apache Spark. 4.4 out of 5 stars (33). Price: $49.12
  • High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark. 4.1 out of 5 stars (93). Price: $17.93

Editorial Reviews

About the Author

Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Apache Spark and holds office hours at coffee shops at home and abroad. She is a Spark committer with frequent contributions, specializing in PySpark and machine learning. Prior to IBM, she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.

Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real-world data processing challenges. She has experience working as an analyst in both industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut.

Product details

  • ASIN: 1491943203
  • Publisher: O'Reilly Media
  • Publication date: July 11, 2017
  • Edition: 1st
  • Language: English
  • Print length: 358 pages
  • ISBN-10: 1491943203
  • ISBN-13: 978-1491943205
  • Item Weight: 2.31 pounds
  • Dimensions: 7 x 0.75 x 9.25 inches
  • Best Sellers Rank: #710,303 in Books
  • Customer Reviews: 4.1 out of 5 stars (93)

About the author

Holden Karau

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and Kubeflow for ML. She is a committer and PMC member on Apache Spark and an ASF member. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys dancing, scooters, and playing with fire.


Customer reviews

4.1 out of 5 stars
93 global ratings

Top reviews from the United States

  • 5 out of 5 stars
    a book focused on how to write high performance spark code
    Reviewed in the United States on September 7, 2017

    This is not a beginner's guide, so you need some working knowledge of Scala and Spark beforehand. <Learning Spark> and <Spark in Action> will lay a good foundation for this book. The target reader is the Spark programmer; all the content focuses on how to write high-performance Spark code, especially how to use the Spark Core and Spark SQL APIs. There is nothing about how to administer or configure a Spark cluster. For that area, one can try <Expert Hadoop Administration>. That said, this book has done a great job of explaining the nuances of writing Spark code. Highly recommended.

    4 people found this helpful
  • 5 out of 5 stars
    you'll love this book
    Reviewed in the United States on December 3, 2017

    If you're a real hacker, you'll love this book. If you're not, and you would like to be, you'll find it frustrating, but if you stick with it, you will grow as a professional. If you're not, and you know you never will be, I suggest you start working on a nursing, phlebotomy, or massage therapy certification before people in the first two groups figure out how to automate your job.

    2 people found this helpful
  • 4 out of 5 stars
    You must know the basics of Spark before this book will be helpful to you
    Reviewed in the United States on August 4, 2017
    Format: Paperback
    Vine Customer Review of Free Product

    Apache Spark moves along the continuum of parallel processing. “Apache Spark is a high-performance, general-purpose distributed computing system that has become the most active Apache open source project with more than 1,000 active contributors.” The authors go on to state, “Spark enables us to process large quantities of data, beyond what can fit on a single machine, with a high-level, relatively easy-to-use API.”

    Most people in (and out of) IT will never have any contact with Spark. I need to know about it only because my job involves having at least a superficial knowledge of every significant aspect of IT.

    This book presumes you are already conversant with Apache Spark and need no education or hand-holding in that regard.

    Rather this book’s goal is to help the reader make their Spark queries “faster, able to handle larger data sizes, and use fewer resources”. Being able to at least read Scala is highly recommended.

    The entire book is loaded with detailed examples. For the casual reader, such as myself, lacking a Spark environment to play in, there is an empty feeling – you can read the examples, study them, but not run them.

    Having read literally dozens of programming cookbooks during the course of my career, this one feels right, but without being able to run the examples, that’s just an assumption. It does, however, make me wish I had some huge datasets to work on. Maybe I can get a job with the NSA? I bet there are a lot of Spark experts there.

    Jerry

    One person found this helpful
  • 3 out of 5 stars
    Spark Fine Tuning
    Reviewed in the United States on October 13, 2017
    Format: Paperback
    Vine Customer Review of Free Product

    This book is filled with useful optimization tidbits for those already familiar with using Spark. It’s not a fun read, and I couldn’t imagine where to begin to find the resources to practice the content.

    2 people found this helpful
  • 5 out of 5 stars
    Like new!
    Reviewed in the United States on January 20, 2024

    Book came in faster than expected and in perfect condition!

  • 5 out of 5 stars
    good to explain in one language
    Reviewed in the United States on November 17, 2017

    Much improved compared with the 1st edition, with more elaboration on joining datasets. Good to explain in one language.

  • 5 out of 5 stars
    Very readable, practical text
    Reviewed in the United States on August 13, 2017

    This book clarifies lots of my questions on Spark. I especially appreciate the walk through joins.

    One person found this helpful
  • 4 out of 5 stars
    Detailed book with an eye on performance and a lot of code examples
    Reviewed in the United States on July 28, 2017
    Format: Paperback
    Vine Customer Review of Free Product

    Overall, I thought this was a very good book. It strikes a good balance between detailed instruction and depth and being a guidebook, not an instruction manual. It's the usual high quality that I've come to expect from O'Reilly, and I feel much more confident about my understanding of Spark, both as a user and of the inner workings.

    Much of the book is written with a focus on performance. There's some discussion of statistical concepts, but the book is clearly aimed at helping the reader use Spark in a resource-efficient manner (which makes a lot of sense, given that Spark comes into play when you're tackling large data sets).

    Virtually all of the code examples are written in Scala. When I began reading, my Scala abilities were fairly limited, but the authors do a good job of parsing and commenting on the code such that I now feel much stronger in Scala, as well. They do have a chapter that discusses using Python and Java (including JVM), but most of the book is presented through Scala.

    My one complaint about this book is that it's a bit heavy on the code. It's possible that it's necessary, but I ended up skimming most of the coding examples, and it made for some tedious reading at times. Then again, there were several examples that I scrutinized closely, and having thorough examples did help me learn quite a bit of Scala.

    3 people found this helpful

Top reviews from other countries

  • 5 out of 5 stars
    Good book
    Reviewed in India on January 3, 2024
  • 5 out of 5 stars
    simplicity and beautiful diagrams and presentation
    Reviewed in the United Kingdom on April 24, 2018

    Clarity, simplicity and beautiful diagrams and presentation. You cannot get better than this.

  • 4 out of 5 stars
    The book is well written, but covers too small a part of streaming
    Reviewed in France on February 14, 2019

    The book is well written, but covers too small a part of streaming. I'm looking for a good way to optimize Spark Streaming jobs. I have searched the web but couldn't find anything really useful.

  • 5 out of 5 stars
    A good book to read and keep as a reference
    Reviewed in Spain on December 21, 2017

    The book is just as expected. Lots of examples, very well commented, with the points clearly explained. Perfect for those getting started in the Spark world or for people who have some lingering doubts.

  • 5 out of 5 stars
    Five Stars
    Reviewed in the United Kingdom on August 15, 2017

    Best spark in depth reference book
