PySpark Setup

PySpark is included with Spark. The installation described here is intended for local usage, or for use as a client that connects to an existing cluster, rather than for setting up a cluster itself. Regular users should use the Invariant JupyterLab build.

Install PySpark from PyPI using pip in the newly created environment. This installs PySpark into that virtual environment.

pip install pyspark

Alternatively, install PySpark from Conda:

conda install pyspark
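
After either installation route, it can help to confirm that the package is visible from the active environment. The following is a minimal sketch rather than part of the official setup; the helper name `pyspark_installed` is hypothetical. It only checks that the package can be imported, and it does not start Spark or check for a Java runtime.

```python
import importlib.util


def pyspark_installed() -> bool:
    """Return True if the pyspark package can be found in this environment.

    Hypothetical helper for illustration; it does not start a Spark session.
    """
    return importlib.util.find_spec("pyspark") is not None


if pyspark_installed():
    # Importing the package alone does not launch a JVM.
    import pyspark

    print("PySpark version:", pyspark.__version__)
else:
    print("PySpark is not installed in this environment")
```

If the second message is printed, the most likely cause is that the install command was run in a different environment from the one currently active.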

For more details about the API, refer to the Apache Spark website.

