使用Hive加载HDFS数据并分析图书评分情况 创建Hive表 1. 下载并安装集群全量客户端,例如在主Master节点上安装,客户端安装目录为“/opt/client”,相关操作可参考用户操作指南安装客户端。 也可直接使用Master节点中自带的集群客户端,安装目录为“/opt/Bigdata/client”。 2. 为主Master节点绑定一个弹性IP并在安全组中放通22端口,然后使用root用户登录主Master节点,进入客户端所在目录并加载变量。 cd /opt/client source bigdataenv 3. 执行 beeline n 'hdfs' 命令进入Hive Beeline命令行界面。 执行以下命令创建一个与原始数据字段匹配的Hive表: create table bookscore (userid int,bookid int,score int,remarks string) row format delimited fields terminated by ','stored as textfile; 4. 查看表是否创建成功: show tables; + tabname + bookscore + 将原始数据导入Hive并进行分析 1. 继续在Hive Beeline命令行中执行以下命令,将已导入HDFS的原始数据导入Hive表中。 load data inpath '/tmp/test/bookscore.txt' into table bookscore; 2. 数据导入完成后,执行如下命令,查看Hive表内容。 select from bookscore; ++++ bookscore.userid bookscore.bookid bookscore.score bookscore.remarks ++++ 202001 242 3 Good! 202002 302 3 Test. 202003 377 1 Bad! 220204 51 2 Bad! 202005 346 1 aaa 202006 474 4 None 202007 265 2 Bad! 202008 465 5 Good! 202009 451 3 Bad! 202010 86 3 Bad! 202011 257 2 Bad! 202012 465 4 Good! 202013 465 4 Good! 202014 465 4 Good! 202015 302 5 Good! 202016 302 3 Good! ... 执行以下命令统计表行数: select count() from bookscore; + c0 + 32 + 3. 执行以下命令,等待MapReduce任务完成后,筛选原始数据中累计评分最高的图书top3。 select bookid,sum(score) as summarize from bookscore group by bookid order by summarize desc limit 3; 例如最终显示内容如下: ... INFO : 20211014 19:53:42,427 Stage2 map 0%, reduce 0% INFO : 20211014 19:53:49,572 Stage2 map 100%, reduce 0%, Cumulative CPU 2.15 sec INFO : 20211014 19:53:56,713 Stage2 map 100%, reduce 100%, Cumulative CPU 4.19 sec INFO : MapReduce Total cumulative CPU time: 4 seconds 190 msec INFO : Ended Job job16341972076820025 INFO : MapReduce Jobs Launched: INFO : StageStage1: Map: 1 Reduce: 1 Cumulative CPU: 4.24 sec HDFS Read: 7872 HDFS Write: 322 SUCCESS INFO : StageStage2: Map: 1 Reduce: 1 Cumulative CPU: 4.19 sec HDFS Read: 5965 HDFS Write: 143 SUCCESS INFO : Total MapReduce CPU Time Spent: 8 seconds 430 msec INFO : Completed executing command(queryIdomm20211014195310cf6696335b584bd5983773286ea83409); Time taken: 47.388 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager ++ bookid summarize ++ 465 170 302 110 474 88 ++ 3 rows selected (47.469 seconds) 以上内容表示,ID为465、302、474的3本书籍,为累计评分最高的Top3图书。