Installing Apache Spark in Alibaba Cloud ECS


Apache Spark is lightning fast Cluster computing technology, which consists of a unified analytics engine for large-scale data processing, particularly used for big data and machine learning applications. It can be used to process streaming queries and batch processing.

Figure1:Apache Spark Components

Features of Apache Spark are

  • It runs the workload faster with good performance for batch and stream processing.

  • Using spark we can write application quickly using scala, JAVA, Python, R and SQL languages.

  • It consists of libraries/components such as SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming.

  • Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud.

Steps to Install Apache Spark in Alibaba Cloud ECS

Step-1:Prerequisites: Install the update using the following command

sudo apt update

sudo apt -y upgrade

Step-2: Install the latest version of JAVA package by using the following command

sudo apt install default-jdk

Step-3: Install Scala Language feature by using the below command

sudo apt-get install scala

Step-4: Checking whether Scala is installed properly by using the command and run a simple program in scala.

root@ECS-Review:/opt#Scala

The output displayed will be

Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.7).

Type in expressions for evaluation. Or try :help.

scala> println("Hello World")

Hello World

Step5: Install Apache Spark by using the below command

root@ECS-Review:/opt#wget https://downloads.apache.org/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz

root@ECS-Review:/opt#tar -xzvf spark-3.0.0-bin-hadoop2.7.tgz

Step-6: update ./bashrc file by using the below code

root@ECS-Review:/opt/spark-3.0.0-bin-hadoop2.7/bin# vi ~/.bashrc

Step-7: Set the SPARK_HOME path by adding the below line in ./.bashrc file

export SPARK_HOME=/opt/spark-3.0.0-bin-hadoop2.7

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

step-8:Run the below command to update .bashrc file

root@ECS-Review:/opt/spark-3.0.0-bin-hadoop2.7/bin# source ~/.bashrc

Step-10:Finally run Apache Spark by using the below command and the Spark command prompt will be displayed as shown in figure below

root@ECS-Review:/opt/spark-3.0.0-bin-hadoop2.7/bin# start-master.sh