RHive Installation and Deployment

RHive is an extension package for distributed computing with R. Once installed, it can connect to Hive so that you can run HQL (Hive SQL) queries from R, and it also allows R objects and functions to be used inside Hive.

Building RHive

Download the source code

mkdir RHive_source
yum install git
cd RHive_source
git clone git://github.com/nexr/RHive.git
# if the clone succeeds, a directory named "RHive" is created automatically
cd RHive

Build the jar with ant

Note: the build will fail if the following environment variables are not set.
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
yum install ant
ant build

A successful build looks like this:

Buildfile: build.xml

compile:
    [javac] Compiling 21 source files to /root/RHive_source/RHive/build/classes
    [javac] This version of java does not support the classic compiler; upgrading to modern
    ....
    [javac] 14 warnings
    [unjar] Expanding: /root/RHive_source/RHive/RHive/inst/javasrc/lib/JRI.jar into /root/RHive_source/RHive/build/classes
    [unjar] Expanding: /root/RHive_source/RHive/RHive/inst/javasrc/lib/REngine.jar into /root/RHive_source/RHive/build/classes
    [unjar] Expanding: /root/RHive_source/RHive/RHive/inst/javasrc/lib/RserveEngine.jar into /root/RHive_source/RHive/build/classes

jar:
    [jar] Building jar: /root/RHive_source/RHive/rhive_udf.jar

cran:
    [copy] Copying 1 file to /root/RHive_source/RHive/RHive/inst/java
    [copy] Copying 29 files to /root/RHive_source/RHive/build/CRAN/rhive/inst
    [copy] Copying 12 files to /root/RHive_source/RHive/build/CRAN/rhive/man
    [copy] Copying 11 files to /root/RHive_source/RHive/build/CRAN/rhive/R
    [copy] Copying 1 file to /root/RHive_source/RHive/build/CRAN/rhive
    [copy] Copying 1 file to /root/RHive_source/RHive/build/CRAN/rhive
[delete] Deleting: /root/RHive_source/RHive/rhive_udf.jar

build:

BUILD SUCCESSFUL

Build RHive into an R package

# pwd
/root/RHive_package/RHive
# ls -l
total 76
# R CMD build ./RHive

If the build succeeds, the output looks like this:

* checking for file ‘./RHive/DESCRIPTION’ ... OK
* preparing ‘RHive’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* looking to see if a ‘data/datalist’ file should be added
* building ‘RHive_2.0-0.10.tar.gz’

RHive_2.0-0.10.tar.gz is now generated in the current directory.

Install the RHive package

If no errors are reported, the installation succeeded. Note that the 'rJava' and 'Rserve' packages must already be installed.
R CMD INSTALL ./RHive_2.0-0.10.tar.gz
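
If 'rJava' or 'Rserve' is not installed yet, both can be installed from CRAN first; a minimal sketch, assuming the node has internet access:
# run inside an R session
install.packages(c("rJava", "Rserve"))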

Install and run the Rserve service on all compute nodes

Install Rserve

To use RHive, Rserve must be installed on every worker node.

yum -y install R
yum -y install R-devel

R CMD INSTALL ./Rserve_1.7-3.tar.gz

Run Rserve

The Rserve daemon needs to be started on all nodes. Edit /etc/Rserv.conf (create it if it does not exist), add the line 'remote enable', then save and exit.
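
A minimal sketch of what /etc/Rserv.conf looks like after this step; only the 'remote enable' line is required by this guide, any other directives are optional:
# /etc/Rserv.conf
remote enable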

For other configuration options, see http://www.rforge.net/Rserve/doc.html

Start the Rserve service

R CMD Rserve
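
Since the daemon has to run on every worker node, one convenient way is to start it over ssh; a sketch, assuming passwordless ssh and hypothetical hostnames node01 to node03:
# start Rserve on each worker node (hostnames are placeholders)
for h in node01 node02 node03; do
    ssh "$h" "R CMD Rserve"
done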

Check whether the port is listening

# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      1864/zabbix_agentd
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      1666/nrpe
tcp        0      0 0.0.0.0:17190               0.0.0.0:*                   LISTEN      1335/rpc.statd
tcp        0      0 0.0.0.0:6311                0.0.0.0:*                   LISTEN      36096/Rserve
tcp        0      0 0.0.0.0:25000               0.0.0.0:*                   LISTEN      44037/impalad
tcp        0      0 192.168.3.162:9000          0.0.0.0:*                   LISTEN      50593/python2.6

You can also test with telnet. Rserve listens on port 6311 by default; do not change it unless you have a specific need.

# telnet 192.168.3.164 6311
Trying 192.168.3.164...
Connected to 192.168.3.164.
Escape character is '^]'.
Rsrv0103QAP1

Running RHive

Configure the environment

Running RHive requires two environment variables, which can be added to /etc/profile:

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
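
After editing /etc/profile, reload it so the variables take effect in the current shell:
source /etc/profile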

RHive usage example

Put rhive_udf.jar into HDFS:
hadoop fs -put /rhive/lib/2.0-0.10/rhive_udf.jar /rhive/lib/2.0-0.10/
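
If the target directory does not exist in HDFS yet, you may need to create it first (an extra step not shown in the original command):
hadoop fs -mkdir -p /rhive/lib/2.0-0.10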

Open R and run library(RHive) to load the package:

> library(RHive)
Loading required package: rJava
Loading required package: Rserve

Check the environment variables:

> rhive.env()
    hadoop home: /opt/cloudera/parcels/CDH/lib/hadoop
    hadoop conf: /opt/cloudera/parcels/CDH/lib/hadoop/conf
    hive home: /opt/cloudera/parcels/CDH/lib/hive

Connecting with RHive

Before performing any Hive operation you must connect to the Hive server; if the connection fails, RHive cannot be used.

library(RHive)
rhive.connect(host="xxx.xxx.xxx",defaultFS="hdfs://xxx.xxx.xxx:8020/")
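
When you are finished, the connection can be closed; a brief sketch:
# close the RHive connection when done
rhive.close()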

If you use Sentry, the following error appears:
Error: org.apache.hive.service.cli.HiveSQLException: Insufficient privileges to execute ADD
To work around it, edit the rhive.R file and comment out the hiveClient$addJar call:
vim /root/RHive_source/RHive/RHive/R/rhive.R
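
Since this change is made in the package source, you presumably need to rebuild and reinstall RHive for it to take effect; a sketch reusing the build and install steps from above:
cd /root/RHive_source/RHive
R CMD build ./RHive
R CMD INSTALL ./RHive_2.0-0.10.tar.gz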

HDFS test

> rhive.hdfs.ls("/")
permission owner      group length      modify-time    file
1  rwxr-xr-x  hdfs supergroup      0 2016-03-03 13:29 /DimEmp
2  rwx------ hbase      hbase      0 2016-10-09 14:38  /hbase
3  rwxr-xr-x  hdfs supergroup      0 2016-03-24 09:59    /opt
4  rwxr-xr-x  hdfs supergroup      0 2016-10-09 14:41  /rhive
5  rwxr-xr-x  hdfs supergroup      0 2016-06-05 15:05 /system
6  rwxrwxrwt  hdfs supergroup      0 2016-08-31 14:26    /tmp
7  rwxr-xr-x  hdfs supergroup      0 2016-08-31 18:29   /user

Hive test

> rhive.list.tables()
    tab_name
1      dimemp
2   sample_07
3   sample_08
4 student_ext
5        test
6 ubt_es_test
> rhive.query('select * from dimemp limit 1000')
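
rhive.query() returns the result set to R, so you can assign it and work with it as an ordinary data frame; a brief sketch:
# store the query result and inspect it in R
res <- rhive.query('select * from dimemp limit 1000')
head(res)   # first rows
nrow(res)   # number of rows returned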

