Amazon.com: High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark: 9781491943205: Karau, Holden, Warren, Rachel: Books

This item: High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

$17.90

Get it as soon as Wednesday, Jun 17

Only 1 left in stock - order soon.

Sold by ANGORA and ships from Amazon Fulfillment.

+

Spark: The Definitive Guide: Big Data Processing Made Simple

$48.00

Get it as soon as Wednesday, Jun 17

Only 1 left in stock - order soon.

Sold by StarPlatinium and ships from Amazon Fulfillment.

+

Learning Spark: Lightning-Fast Data Analytics

$43.99

Get it as soon as Wednesday, Jun 17

In Stock

Ships from and sold by Amazon.com.

Spark: The Definitive Guide: Big Data Processing Made Simple
Bill Chambers

Paperback
$48.00
Get it as soon as Wednesday, Jun 17
FREE Shipping by Amazon
Only 1 left in stock - order soon.
Learning Spark: Lightning-Fast Data Analytics
Jules S. Damji

Paperback
$43.99
Get it as soon as Wednesday, Jun 17
FREE Shipping by Amazon
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Holden Karau
Paperback
$52.58
Get it Jul 14 - 17
$3.99 shipping
This title will be released on July 7, 2026.
Learning Spark: Lightning-Fast Big Data Analysis
Holden Karau

Paperback
$11.12
Get it as soon as Wednesday, Jun 17
FREE Shipping on orders over $35 shipped by Amazon
Only 1 left in stock - order soon.
Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems
Bartosz Konieczny

Paperback
$35.46
Get it as soon as Wednesday, Jun 17
FREE Shipping by Amazon
Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark
Mahmoud Parsian

Paperback
$45.70
Get it as soon as Thursday, Jun 18
FREE Shipping by Amazon
Only 3 left in stock (more on the way).

Customers also bought or read

Spark: The Definitive Guide: Big Data Processing Made Simple
456
Paperback
$48.00
FREE delivery Wed, Jun 17
Learning Spark: Lightning-Fast Data Analytics
334
Paperback
$43.99
FREE delivery Wed, Jun 17
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
878
Paperback
$43.99
FREE delivery Wed, Jun 17
Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance, and Scalability on the Data Lake
28
Paperback
$44.94
FREE delivery Wed, Jun 17
Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
286
Paperback
$43.99
FREE delivery Wed, Jun 17
Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems
17
Paperback
$35.46
FREE delivery Wed, Jun 17
Doing Data Science: Straight Talk from the Frontline
232
Paperback
$38.26
FREE delivery Wed, Jun 17
Learning Spark: Lightning-Fast Big Data Analysis
345
Paperback
$11.12
Delivery Wed, Jun 17
Spark in Action, Second Edition: Covers Apache Spark 3 with Examples in Java, Python, and Scala
44
Paperback
$34.96
$3.99 delivery Mon, Jun 22
Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh
149
Paperback
$50.99
FREE delivery Wed, Jun 17
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
770
Paperback
$41.71
FREE delivery Wed, Jun 17
Database Internals: A Deep Dive into How Distributed Data Systems Work
551
Paperback
$36.33
FREE delivery Wed, Jun 17
Delta Lake: The Definitive Guide: Modern Data Lakehouse Architectures with Data Lakes
11
Paperback
$49.00
FREE delivery Wed, Jun 17
Data Algorithms with Spark: Recipes and Design Patterns for Scaling Up using PySpark
19
Paperback
$45.70
FREE delivery Thu, Jun 18
Data Pipelines with Apache Airflow, Second Edition: Orchestration for data and AI
4
Paperback
$56.99
FREE delivery Wed, Jun 17
Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming
33
Paperback
$49.12
FREE delivery Wed, Jun 17
Data Pipelines Pocket Reference: Moving and Processing Data for Analytics
435
Paperback
$16.93
Delivery Wed, Jun 17
Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications
84
Paperback
$42.49
FREE delivery Wed, Jun 17
Data Quality Fundamentals: A Practitioner's Guide to Building Trustworthy Data Pipelines
45
Paperback
$39.63
FREE delivery Wed, Jun 17
Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake
29
Paperback
$35.99
FREE delivery Wed, Jun 17
Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
132
Paperback
$48.59
FREE delivery Wed, Jun 17
Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies, 7)
9
Paperback
$39.88
$3.99 delivery Jun 22 - 24
Apache Airflow Best Practices: A practical guide to orchestrating data workflow with Apache Airflow
17
Paperback
$44.99
FREE delivery Wed, Jun 17
AWS for Solutions Architects: Design and scale secure AWS architectures with GenAI strategies and real-world patterns
40
Paperback
$45.99
FREE delivery Wed, Jun 17
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale
295
Paperback
$17.59
Delivery Thu, Jun 18
Kubernetes: Up and Running: Dive into the Future of Infrastructure
118
Paperback
$43.99
FREE delivery Wed, Jun 17
Ace the Data Science Interview: 201 Real Interview Questions Asked By FAANG, Tech Startups, & Wall Street
1,263
Paperback
$42.89
FREE delivery Jun 28 - Jul 3
Practical Lakehouse Architecture: Designing and Implementing Modern Data Platforms at Scale
8
Paperback
$45.39
FREE delivery Wed, Jun 17
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
981
Paperback
$45.25
FREE delivery Wed, Jun 17

From the brand

Your partner in learning

Sharing the knowledge of experts

O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.

Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.

From the Publisher

	Learning Spark	Advanced Analytics with Spark 2nd Ed.	Learning Spark Streaming	High Performance Spark

Customer Reviews	334	51	33	93
Price	$43.99	$37.95	$49.12	$17.93
Subtitle	Lightning-Fast Data Analytics	Patterns for Learning from Data at Scale	Best Practices for Scaling and Optimizing Apache Spark	Best Practices for Scaling and Optimizing Apache Spark

Editorial Reviews

About the Author

Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Apache Spark and holds office hours at coffee shops at home and abroad. She is a Spark committer with frequent contributions, specializing in PySpark and Machine Learning. Prior to IBM, she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software, she enjoys playing with fire, welding, scooters, poutine, and dancing.

Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real world data processing challenges. She has experience working as an analyst both in industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut.

Product details

ASIN: 1491943203
Publisher: O'Reilly Media
Publication date: July 11, 2017
Edition: 1st
Language: English
Print length: 358 pages
ISBN-10: 1491943203
ISBN-13: 978-1491943205
Item Weight: 2.31 pounds
Dimensions: 7 x 0.75 x 9.25 inches
Best Sellers Rank: #710,303 in Books
- #219 in Business Intelligence Tools
- #334 in Data Mining (Books)
- #519 in Data Processing
Customer Reviews: 93 ratings

About the author

Holden Karau

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and Kubeflow for ML. She is a committer and PMC member on Apache Spark and an ASF member. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys dancing, scooters, and playing with fire.

Customer reviews

93 global ratings

Top reviews from the United States

Zhao Jijiang
a book focused on how to write high performance spark code
Reviewed in the United States on September 7, 2017
Format: Paperback
Verified Purchase
This is not a beginner's guide, so you need some working knowledge of Scala and Spark beforehand. <Learning Spark> and <Spark in Action> will lay a good foundation for this book. The target reader is the Spark programmer; all the content focuses on how to write high-performance Spark code, especially how to use the Spark core and Spark SQL APIs. There is nothing about how to administer or configure a Spark cluster. For that area, one can try <Expert Hadoop Administration>. That said, this book has done a great job of explaining the nuances of writing Spark code. Highly recommended.
4 people found this helpful
Brianne T. Murphy
you'll love this book
Reviewed in the United States on December 3, 2017
Verified Purchase
If you're a real hacker, you'll love this book. If you're not, and you would like to be, you'll find it frustrating, but if you stick with it, you will grow as a professional. If you're not, and you know you never will be, I suggest you start working on a nursing, phlebotomy, or massage therapy certification before people in the first two groups figure out how to automate your job.
2 people found this helpful
Jerry Saperstein
You must know the basics of Spark before this book will be helpful to you
Reviewed in the United States on August 4, 2017
Format: Paperback
Vine Customer Review of Free Product
Apache Spark moves along the continuum of parallel processing. “Apache Spark is a high-performance, general-purpose distributed computing system that has become the most active Apache open source project with more than 1,000 active contributors.” The authors go on to state, “Spark enables us to process large quantities of data, beyond what can fit on a single machine, with a high-level, relatively easy-to-use API.”
Most people in (and out of) IT will never have any contact with Spark. I need to know about it only because my job involves having at least a superficial knowledge of every significant aspect of IT.
This book presumes you are already conversant with Apache Spark and need no education or hand-holding in that regard.
Rather, this book’s goal is to help the reader make their Spark queries “faster, able to handle larger data sizes, and use fewer resources”. Being able to at least read Scala is highly recommended.
The entire book is loaded with detailed examples. For the casual reader, such as myself, lacking a Spark environment to play in, there is an empty feeling – you can read the examples, study them, but not run them.
Having read literally dozens or more programming cookbooks during the course of my career, this one feels right, but without being able to run the examples, that’s just an assumption. It does, however, make me wish I had some huge datasets to work on. Maybe I can get a job with the NSA? I bet there are a lot of Spark experts there.
Jerry
One person found this helpful
J. Sullivan
Spark Fine Tuning
Reviewed in the United States on October 13, 2017
Format: Paperback
Vine Customer Review of Free Product
This book is filled with useful optimization tidbits for those already familiar with using Spark. It’s not a fun read, and I couldn’t imagine where to begin to find the resources to practice the content.
2 people found this helpful
Fatima Daher
Like new!
Reviewed in the United States on January 20, 2024
Format: Paperback
Verified Purchase
Book came in faster than expected and in perfect condition!
User File
good to explain in one language
Reviewed in the United States on November 17, 2017
Format: Paperback
Verified Purchase
Much improved compared with the 1st edition, with more elaboration on joining datasets. Good to explain in one language.
Anya
Very readable, practical text
Reviewed in the United States on August 13, 2017
Format: Paperback
Verified Purchase
This book clarifies lots of my questions on Spark. I especially appreciate the walk through joins.
One person found this helpful
Larson
Detailed book with an eye on performance and a lot of code examples
Reviewed in the United States on July 28, 2017
Format: Paperback
Vine Customer Review of Free Product
Overall, I thought this was a very good book. It strikes a good balance between detailed instruction and depth and being a guidebook, not an instruction manual. It's the usual high quality that I've come to expect from O'Reilly, and I feel much more confident about my understanding of Spark, both as a user and of the inner workings.
Much of the book is written with a focus on performance. There's some discussion of statistical concepts, but the book is clearly aimed at helping the reader use Spark in a resource-efficient manner (which makes a lot of sense, given that Spark comes into play when you're tackling large data sets).
Virtually all of the code examples are written in Scala. When I began reading, my Scala abilities were fairly limited, but the authors do a good job of parsing and commenting on the code such that I now feel much stronger in Scala, as well. They do have a chapter that discusses using Python and Java (including JVM), but most of the book is presented through Scala.
My one complaint about this book is that it's a bit heavy on the code. It's possible that it's necessary, but I ended up skimming most of the coding examples, and it made for some tedious reading at times. Then again, there were several examples that I scrutinized closely, and having thorough examples did help me learn quite a bit of Scala.
3 people found this helpful

Top reviews from other countries

Duraga singh thakur
Good book
Reviewed in India on January 3, 2024
Verified Purchase
Good book
GS
simplicity and beautiful diagrams and presentation
Reviewed in the United Kingdom on April 24, 2018
Format: Paperback
Verified Purchase
Clarity, simplicity and beautiful diagrams and presentation. You cannot get better than this.
Babacar
The book is well written, but covers too small a part of streaming
Reviewed in France on February 14, 2019
Verified Purchase
The book is well written, but covers too small a part of streaming. I'm looking for a good way to optimize Spark Streaming jobs. I have searched around the web but couldn't find anything really useful.
Pentergark
A good book to read and keep as a reference
Reviewed in Spain on December 21, 2017
Format: Paperback
Verified Purchase
The book is just as expected. Lots of examples, very well commented, with the points clearly explained. Perfect for those getting started in the Spark world or for people who have certain questions.
Amazon Customer
Five Stars
Reviewed in the United Kingdom on August 15, 2017
Verified Purchase
Best in-depth Spark reference book

There is a newer edition of this item:
