Reference
The Apache Spark website is the best reference for getting started with programming, deploying, and running Spark applications:
https://spark.apache.org/docs/latest/index.html
Programming Guides:
Quick Start: a quick introduction to the Spark API; start here!
RDD Programming Guide: overview of Spark basics: RDDs (the core but older API), accumulators, and broadcast variables
Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
MLlib: applying machine learning algorithms
GraphX: processing graphs
PySpark: processing data with Spark in Python
API Docs:
Operations Guide:
Configuration: customize Spark via its configuration system
Monitoring: track the behavior of your applications
Tuning Guide: best practices to optimize performance and memory use
Job Scheduling: scheduling resources across and within Spark applications