Getting Started
Installation
To get started with Scalding, first clone the Scalding repository on GitHub:
git clone https://github.com/twitter/scalding.git
Next, build the code using sbt (a standard Scala build tool). Make sure you have Scala installed (download here; see scalaVersion in project/Build.scala for the correct version to download), then run the following commands:
./sbt update
./sbt test # runs the tests; they take a long time and are run again if you do './sbt assembly' below
./sbt assembly # creates a fat jar with all dependencies, which is useful when using the scald.rb script
Now you're good to go!
Using Scalding with other versions of Scala
Scalding works with Scala 2.10 and 2.11, and 2.11 is recommended. Building against a different version requires changing a few configuration files. In project/Build.scala, ensure that the proper scalaVersion value is set. You'll also need to set the matching version of specs in the same config. Change the following line so that it matches your Scala version:
libraryDependencies += "org.scala-tools.testing" % "specs_2.10" % "1.6.9" % "test"
You can find the published versions here.
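As a rough sketch of the change described above, the fragment below shows how the two settings relate. It is written as sbt settings rather than copied from Scalding's actual project/Build.scala, and the scalaVersion value "2.10.4" is an assumption chosen only for illustration; check the published versions before relying on any particular combination.

```scala
// Illustrative sbt settings fragment, not the real contents of project/Build.scala.

// Assumed value for illustration; use the Scala version you actually build with.
scalaVersion := "2.10.4"

// Explicit suffix, as in the line shown above: the "_2.10" part must be
// edited by hand whenever scalaVersion changes to another binary version.
libraryDependencies += "org.scala-tools.testing" % "specs_2.10" % "1.6.9" % "test"

// Alternative using sbt's %% operator, which appends the Scala binary
// version suffix (here "_2.10") automatically. Use one form or the other,
// not both:
// libraryDependencies += "org.scala-tools.testing" %% "specs" % "1.6.9" % "test"
```

With the %% form, only scalaVersion needs to change when switching Scala versions, provided the library is actually published for that version.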
IDE Support
Scala's IDE support is generally not as strong as Java's, but there are several options that some people prefer. Both Eclipse and IntelliJ have plugins that support Scala syntax. To generate a project file for Scalding in Eclipse, refer to this project, and for IntelliJ files, this (note that with the latter, the 1.1 snapshot is recommended).
Reading material
For a quick introduction to Scalding, design patterns, TDD, and connecting with external systems, refer to the book Programming MapReduce with Scalding. You can find the code examples presented in the book here.