Reference

The Apache Spark website is the best reference for getting started with programming, deploying, and running Spark applications.

Programming Guides:

Quick Start: a quick introduction to the Spark API; start here!
RDD Programming Guide: overview of Spark basics, including RDDs (the core but older API), accumulators, and broadcast variables
Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
MLlib: applying machine learning algorithms
GraphX: processing graphs
PySpark: processing data with Spark in Python

API Docs:

Operations Guide:
