Reference
Last updated
The Apache Spark website is the best reference for getting started with programming, deploying, and running Spark applications.
Programming Guides:
Quick Start: a quick introduction to the Spark API; start here!
RDD Programming Guide: overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
MLlib: applying machine learning algorithms
GraphX: processing graphs
PySpark: processing data with Spark in Python
API Docs:
Operations Guide:
Configuration: customize Spark via its configuration system
Monitoring: track the behavior of your applications
Tuning Guide: best practices to optimize performance and memory use
Job Scheduling: scheduling resources across and within Spark applications