Deploying Zeppelin as an Interactive Development Platform for Spark

The main reason for setting up Zeppelin was to use SparkR. The process is documented below for reference.

Installation

Install the build environment

yum install git
yum install java-1.8.0-openjdk-devel
yum install nodejs npm

Install Maven

Ensure Node is installed by running node --version, and that Maven is version 3.1.x or higher with mvn -version. Configure Maven to use more memory than usual: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m" (a quick verification follows the install commands below).

wget http://www.eu.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
sudo tar -zxf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.9/bin/mvn /usr/local/bin/mvn
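
A quick way to verify the toolchain before building (these are just the checks mentioned above, rolled into one block):

node --version
mvn -version
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"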

Download the Zeppelin source code

git clone https://github.com/apache/zeppelin.git
cd zeppelin
git checkout branch-0.6  # switch to the latest release branch
git pull                 # make sure the code is up to date

Alternatively, download the official source package directly:

wget http://apache.fayea.com/zeppelin/zeppelin-0.6.1/zeppelin-0.6.1.tgz

Build

The meaning of each build option is documented in the official Build guide.

First, confirm the Hadoop and Spark version numbers:

hadoop version
spark-shell --version

If the system git is too old, the WANDisco repository can be used:

yum install http://opensource.wandisco.com/centos/6/git/x86_64/wandisco-git-release-6-1.noarch.rpm

Start the build:

mvn clean package -DskipTests -Pspark-1.6  -Dspark.version=1.6.0 -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.8.0 -Pscala-2.10  -Pr  -Pvendor-repo -Pbuild-distr -Pyarn -Ppyspark -Psparkr
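
With -Pbuild-distr the distributable archive should, for 0.6.x, end up under zeppelin-distribution/target/ (an assumption worth checking before deploying):

ls zeppelin-distribution/target/zeppelin-*.tar.gz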

Maven command to build Zeppelin for YARN (all Spark queries are then tracked in the YARN history):
mvn clean package -Pspark-1.6 -Pr -Ppyspark -Dhadoop.version=2.6.0-cdh5.8.3 -Phadoop-2.6 -Pyarn -DskipTests

Deploy Zeppelin

Extract Zeppelin into the target directory:

tar zxf zeppelin-0.6.2-SNAPSHOT.tar.gz -C /opt/
mv /opt/zeppelin-0.6.2-SNAPSHOT /opt/zeppelin

Configure Zeppelin

mkdir /etc/zeppelin
mv /opt/zeppelin/conf /etc/zeppelin/conf
cd /opt/zeppelin
ln -s /etc/zeppelin/conf conf
cd /etc/zeppelin/conf
cp zeppelin-env.sh{.template,}
cp zeppelin-site.xml{.template,}

Edit the zeppelin-env.sh file:

export ZEPPELIN_JAVA_OPTS="-Dmaster=yarn-client -Dspark.yarn.jar=/opt/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.6.2-SNAPSHOT.jar"
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
if [ -n "$HADOOP_HOME" ]; then
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

export ZEPPELIN_LOG_DIR=/var/log/zeppelin
export ZEPPELIN_PID_DIR=/var/run/zeppelin
export ZEPPELIN_WAR_TEMPDIR=/var/tmp/zeppelin

Create the corresponding directories:

mkdir /var/log/zeppelin
mkdir /var/run/zeppelin
mkdir /var/tmp/zeppelin

Create a dedicated user for Zeppelin and set the ownership of the related paths:

useradd zeppelin

chown -R zeppelin:zeppelin /opt/zeppelin/notebook
chown zeppelin:zeppelin /etc/zeppelin/conf/interpreter.json
chown -R zeppelin:zeppelin /var/log/zeppelin
chown -R zeppelin:zeppelin /var/run/zeppelin
chown -R zeppelin:zeppelin /var/tmp/zeppelin

Create an HDFS directory for the user:

su hdfs
hadoop fs -mkdir /user/zeppelin
hadoop fs -chmod 777 /user/zeppelin

Start Zeppelin:

bin/zeppelin-daemon.sh start
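
To confirm the daemon actually came up, the same script can be asked for its status (zeppelin-daemon.sh supports the usual start/stop/restart/status actions):

bin/zeppelin-daemon.sh status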

Zeppelin is served on port 8080 by default; this can be changed in conf/zeppelin-env.sh or conf/zeppelin-site.xml.
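
For example, to move the UI to port 8888 (an illustrative value), set ZEPPELIN_PORT in zeppelin-env.sh (the equivalent zeppelin-site.xml property is zeppelin.server.port):

export ZEPPELIN_PORT=8888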

Set up the Spark Interpreter

Edit the zeppelin-env.sh file:

export JAVA_HOME=/usr/java/jdk1.8.0_60
export MASTER=yarn-client
export SPARK_HOME=/var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6
export HADOOP_CONF_DIR=/etc/hadoop/conf

Default Spark parameters are read from SPARK_HOME/conf/spark-defaults.conf.
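
A minimal spark-defaults.conf sketch for a YARN client setup (the values are illustrative, not required settings):

spark.master               yarn-client
spark.executor.instances   4
spark.executor.memory      2g
spark.yarn.queue           default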

Add external dependencies:

export SPARK_SUBMIT_OPTIONS="--jars /usr/install/libs/mysql-connector-java-5.1.34.jar,/usr/install/spark/lib/influxdbSink-byHost.jar"

Troubleshooting

1.com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope) at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]

This is most likely a Jackson version conflict; deleting the following files from Zeppelin fixes it:

rm zeppelin-server/target/lib/jackson-*
rm zeppelin-zengine/target/lib/jackson-*

Related links: Could not find creator property with name ... with embedded spark binaries; com.fasterxml.jackson.databind.JsonMappingException

If you are running the binary package, delete the bundled Jackson jars and replace them with the ones shipped with Spark:

rm lib/jackson-core-2.5.3.jar
rm lib/jackson-annotations-2.5.0.jar
rm lib/jackson-databind-2.5.3.jar
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-core-2.6.5.jar ./lib/
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-annotations-2.6.5.jar ./lib/
cp /var/lib/hadoop-hdfs/spark-2.0.2-bin-hadoop2.6/jars/jackson-databind-2.6.5.jar ./lib/

2.ERROR: lazy loading failed for package ‘stringr’

stringr depends on stringi; install stringi with the following command:

> install.packages("stringi", dep=TRUE)

If the installer complains about a leftover lock, remove 00LOCK-stringi:

rm -rf /usr/lib64/R/library/00LOCK-stringi
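
Once stringi is installed, retrying stringr should succeed; from the shell this could look like (the repos value is just one CRAN mirror):

R -e 'install.packages("stringr", dependencies=TRUE, repos="https://cloud.r-project.org")'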

3./bin/sh: libpng-config: command not found

yum install libpng-devel

4.rjcommon.h:11:21: error: jpeglib.h: No such file or directory

yum install libjpeg-turbo-devel

5.ERROR: configuration failed for package ‘XML’

yum install libxml2-devel

6.ERROR: configuration failed for package ‘rgl’

yum install mesa-libGL mesa-libGL-devel mesa-libGLU mesa-libGLU-devel

