Big Data Cluster Setup

I. Component Versions

| Name      | Version |
| --------- | ------- |
| CentOS    | 7.9     |
| Hadoop    | 2.7.7   |
| Spark     | 2.1.1   |
| Flink     | 1.10.2  |
| Flume     | 1.7.0   |
| Hive      | 2.3.4   |
| ZooKeeper | 3.4.10  |
| Kafka     | 2.0.0   |
| Sqoop     | 1.4.7   |
II. JDK

1. Extract

```shell
[root@master software]# tar -zxvf jdk-8u161-linux-x64.tar.gz -C /opt/module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv jdk1.8.0_161/ jdk
```
2. Configure environment variables in /etc/profile

```shell
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Distribute to the other two nodes

```shell
[root@master module]# scp -r jdk/ slave1:/opt/module/
[root@master module]# scp -r jdk/ slave2:/opt/module/
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
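Every component below repeats this same distribute-then-source dance. Assuming this guide's node names (slave1, slave2) and the /opt/module layout, a small helper can generalize the pattern; this is only a sketch, and it defaults to a dry run that prints the commands instead of executing them:

```shell
#!/bin/bash
# Distribute a directory under /opt/module plus /etc/profile to the
# worker nodes, then re-source the profile on each node.
# DRY_RUN=1 (the default) only prints the commands it would run.
NODES=(slave1 slave2)
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"          # dry run: show the command
  else
    "$@"               # real run: execute it
  fi
}

distribute() {
  local dir=$1
  for node in "${NODES[@]}"; do
    run scp -r "/opt/module/$dir" "$node:/opt/module/"
    run scp /etc/profile "$node:/etc"
    # sourcing must happen inside the remote shell, hence ssh
    run ssh "$node" "source /etc/profile"
  done
}

distribute jdk
```

Running it with `DRY_RUN=0` would perform the actual copies (password-less SSH between the nodes is assumed throughout this guide).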
III. Hadoop Fully Distributed Deployment

1. Extract

```shell
[root@master software]# tar -zxvf hadoop-2.7.7.tar.gz -C /opt/module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv hadoop-2.7.7/ hadoop
```
2. Configure environment variables in /etc/profile

```shell
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

```shell
[root@master module]# cd hadoop/etc/hadoop
[root@master hadoop]# ls
# We need to edit the following files:
# hadoop-env.sh  slaves  hdfs-site.xml  core-site.xml  mapred-site.xml  yarn-site.xml
```
hadoop-env.sh

Set JAVA_HOME to your own JDK install path:

```shell
# The java implementation to use.
export JAVA_HOME=/opt/module/jdk
```
slaves

List the nodes that will run a DataNode, one hostname per line (the slaves file takes bare hostnames only):

```
master
slave1
slave2
```
hdfs-site.xml

```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
</property>
```
core-site.xml

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop/tmp</value>
</property>
```
mapred-site.xml only ships as a template, so copy it first:

```shell
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
```

mapred-site.xml

```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
yarn-site.xml

```xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
```
4. Distribute to the other two nodes

```shell
[root@master module]# scp -r hadoop/ slave1:/opt/module/
[root@master module]# scp -r hadoop/ slave2:/opt/module/
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
5. Start the Hadoop cluster

Format the NameNode:

```shell
[root@master module]# hdfs namenode -format
```

Seeing "successfully formatted" in the output means the format succeeded.

Start HDFS and YARN:

```shell
[root@master module]# start-dfs.sh
[root@master module]# start-yarn.sh
```

Run jps on each of the three nodes to check the processes:

```shell
[root@master hadoop]# jps
5618 NameNode
5715 DataNode
6389 Jps
6221 NodeManager
5998 ResourceManager

[root@slave1 hadoop]# jps
2727 Jps
2457 DataNode
2540 SecondaryNameNode
2623 NodeManager

[root@slave2 hadoop]# jps
2241 DataNode
2449 Jps
2345 NodeManager
```
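Eyeballing three jps listings gets error-prone once Kafka, Spark, and Flink daemons pile on. A small sketch of a checker, assuming the daemon layout of this guide (NameNode/ResourceManager on master, SecondaryNameNode on slave1):

```shell
#!/bin/bash
# Compare a node's jps output against the daemons we expect on it.
# Prints MISSING: <name> for each absent daemon; returns non-zero if any.
check_daemons() {
  local jps_output=$1; shift
  local missing=0
  for daemon in "$@"; do
    # -w: match whole words, so "NameNode" does not match "SecondaryNameNode"
    if ! grep -qw "$daemon" <<<"$jps_output"; then
      echo "MISSING: $daemon"
      missing=1
    fi
  done
  return $missing
}

# Example: validate a captured master listing
master_jps="5618 NameNode
5715 DataNode
6221 NodeManager
5998 ResourceManager"
check_daemons "$master_jps" NameNode DataNode NodeManager ResourceManager \
  && echo "master OK"
```

On a live cluster, `jps_output` could come from `ssh "$node" jps` in a loop over the three hosts.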
IV. Hive

1. Extract

```shell
[root@master software]# tar -zxvf apache-hive-2.3.4-bin.tar.gz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv apache-hive-2.3.4-bin/ hive
```
2. Configure environment variables in /etc/profile

```shell
# HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

```shell
[root@master hive]# cd conf/
# Create a new hive-site.xml
[root@master conf]# vi hive-site.xml
```
hive-site.xml

Add the following. Note that the driver property is javax.jdo.option.ConnectionDriverName, and the `&` in the JDBC URL must be escaped as `&amp;` inside the XML value:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
```
```shell
[root@master conf]# cp hive-env.sh.template hive-env.sh
```

hive-env.sh

Set the following:

```shell
HADOOP_HOME=/opt/module/hadoop
export HIVE_CONF_DIR=/opt/module/hive/conf
```
hive-log4j2.properties

```shell
[root@master conf]# cp hive-log4j2.properties.template hive-log4j2.properties
```

```properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
property.hive.log.dir = /opt/module/hive/logs
property.hive.log.file = hive.log
property.hive.perflogger.log.level = INFO
```
4. Copy the MySQL driver into hive/lib

```shell
[root@master software]# mv mysql-connector-java-5.1.46-bin.jar /opt/module/hive/lib/
```
5. Initialize the metastore database

```shell
[root@master hive]# schematool -dbType mysql -initSchema
```
6. Start Hive

```shell
[root@master hive]# hive
# If this runs, the installation works
hive (default)> show databases;
OK
database_name
default
Time taken: 3.57 seconds, Fetched: 1 row(s)
hive (default)>
```
V. ZooKeeper

1. Extract

```shell
[root@master software]# tar -zxvf zookeeper-3.4.10.tar.gz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv zookeeper-3.4.10/ zookeeper
```
2. Configure environment variables in /etc/profile

```shell
# ZK_HOME
export ZK_HOME=/opt/module/zookeeper
export PATH=$PATH:$ZK_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

```shell
[root@master conf]# cp zoo_sample.cfg zoo.cfg
```

zoo.cfg

```properties
dataDir=/opt/module/zookeeper/zkData
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```

```shell
# Create zkData
[root@master zookeeper]# mkdir zkData
# Write this node's id into myid
[root@master zkData]# echo 1 >> myid
```
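Each server's myid must agree with its `server.N` line in zoo.cfg, and getting one of the three out of sync is the classic mistake. A sketch that derives each myid from a single node-to-id mapping so the two cannot drift; here it writes into local per-node directories purely for illustration (on the real cluster the echo would run over ssh on each host):

```shell
#!/bin/bash
# Node-to-id mapping: must mirror the server.N lines in zoo.cfg.
declare -A ZK_IDS=([master]=1 [slave1]=2 [slave2]=3)

# Create a zkData/myid for every node under a base directory.
write_myid() {
  local base=$1
  for node in "${!ZK_IDS[@]}"; do
    mkdir -p "$base/$node/zkData"
    echo "${ZK_IDS[$node]}" > "$base/$node/zkData/myid"
  done
}

write_myid /tmp/zk-demo
cat /tmp/zk-demo/slave1/zkData/myid   # -> 2
```

With ssh in place of the local mkdir/echo, the same loop provisions all three nodes in one shot.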
4. Distribute to the other two nodes

```shell
[root@master module]# scp -r zookeeper/ slave1:/opt/module/
[root@master module]# scp -r zookeeper/ slave2:/opt/module/
# Write the matching id into each node's myid file:
# slave1 gets 2, slave2 gets 3
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
5. Start ZooKeeper

```shell
[root@master module]# zkServer.sh start
[root@slave1 module]# zkServer.sh start
[root@slave2 module]# zkServer.sh start

# Check the status on each node
[root@master module]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@slave1 module]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@slave2 module]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Mode: follower
```
VI. Kafka

1. Extract

```shell
[root@master software]# tar -zxvf kafka_2.11-2.0.0.tgz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv kafka_2.11-2.0.0/ kafka
```
2. Configure environment variables in /etc/profile

```shell
# KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configure server.properties

```properties
delete.topic.enable=true
# broker.id must be unique across the cluster
broker.id=0
zookeeper.connect=master:2181,slave1:2181,slave2:2181
log.dirs=/opt/module/kafka/logs
```
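After kafka/ is copied to the other nodes in the next step, each copy's server.properties needs its own broker.id (0/1/2 in this guide). A sketch of patching that one line in place with sed, demonstrated on a scratch file rather than the real config:

```shell
#!/bin/bash
# Rewrite the broker.id line of a server.properties file in place.
set_broker_id() {
  local file=$1 id=$2
  sed -i "s/^broker\.id=.*/broker.id=$id/" "$file"
}

# Demo on a scratch copy (the real file lives in kafka/config/)
demo=/tmp/server.properties.demo
printf 'broker.id=0\nzookeeper.connect=master:2181\n' > "$demo"
set_broker_id "$demo" 1
grep '^broker.id=' "$demo"   # -> broker.id=1
```

On the cluster this could run as `ssh slave1 "sed -i ... /opt/module/kafka/config/server.properties"` with id 1, and likewise for slave2 with id 2.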
4. Distribute to the other two nodes

```shell
[root@master module]# scp -r kafka/ slave1:/opt/module/
[root@master module]# scp -r kafka/ slave2:/opt/module/
# Change the broker.id value in each node's server.properties
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
5. Start Kafka

```shell
# Start ZooKeeper before starting Kafka
# Start Kafka on each of the three nodes
[root@master kafka]# kafka-server-start.sh config/server.properties
[root@slave1 kafka]# kafka-server-start.sh config/server.properties
[root@slave2 kafka]# kafka-server-start.sh config/server.properties

# Try creating a topic
[root@master ~]# kafka-topics.sh --create --zookeeper master:2181,slave1:2181,slave2:2181 --replication-factor 3 --partitions 1 --topic xiaojia
Created topic "xiaojia".
```
VII. Flume

1. Extract

```shell
[root@master software]# tar -zxvf apache-flume-1.7.0-bin.tar.gz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv apache-flume-1.7.0-bin/ flume
```
2. Configure environment variables in /etc/profile

```shell
# FLUME_HOME
export FLUME_HOME=/opt/module/flume
export PATH=$PATH:$FLUME_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configure flume-env.sh

```shell
export JAVA_HOME=/opt/module/jdk
```
4. Try writing a Flume agent config

```shell
[root@master conf]# vi netcat_logger.conf
```

```properties
a1.sources=r1
a1.channels=c1
a1.sinks=k1

a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=44444

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
```
```shell
[root@master flume]# flume-ng agent -n a1 -f conf/netcat_logger.conf -c conf/ -Dflume.root.logger=INFO,console

# In another terminal, connect to port 44444
[root@master ~]# telnet localhost 44444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
haha
OK

2021-10-22 18:03:05,450 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
2021-10-22 18:03:22,300 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 61 68 61 0D  haha. }
```
VIII. Sqoop

1. Extract

```shell
[root@master software]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop
```
2. Configure environment variables in /etc/profile

```shell
# SQOOP_HOME
export SQOOP_HOME=/opt/module/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

```shell
[root@master conf]# cp sqoop-env-template.sh sqoop-env.sh
```

Copy the MySQL driver into sqoop/lib:

```shell
[root@master module]# cp hive/lib/mysql-connector-java-5.1.46-bin.jar sqoop/lib/
```

sqoop-env.sh

```shell
export HADOOP_COMMON_HOME=/opt/module/hadoop
export HADOOP_MAPRED_HOME=/opt/module/hadoop
export HIVE_HOME=/opt/module/hive
export ZOOCFGDIR=/opt/module/zookeeper
```
4. Test it

```shell
[root@master module]# sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root --password 123456
```
IX. Spark

1. Extract

```shell
[root@master software]# tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C ../module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv spark-2.1.1-bin-hadoop2.7/ spark
```
2. Configure environment variables in /etc/profile

```shell
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

```shell
[root@master conf]# cp spark-env.sh.template spark-env.sh
[root@master conf]# cp slaves.template slaves
```

slaves — list the Worker nodes (the jps output in step 5 shows Workers on slave1 and slave2):

```
slave1
slave2
```

spark-env.sh

```shell
# Add the following
export HADOOP_CONF_DIR=/opt/module/hadoop/etc/hadoop
export HADOOP_HOME=/opt/module/hadoop
export JAVA_HOME=/opt/module/jdk
export SPARK_MASTER_HOST=master
```
4. Distribute to the other two nodes

```shell
[root@master module]# scp -r spark/ slave1:/opt/module/
[root@master module]# scp -r spark/ slave2:/opt/module/
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
5. Start Spark

```shell
[root@master module]# start-master.sh
[root@master module]# start-slaves.sh
[root@master module]# jps
5618 NameNode
5715 DataNode
8598 Master
8695 Jps
7356 QuorumPeerMain
6221 NodeManager
7437 Kafka
5998 ResourceManager

[root@slave1 kafka]# jps
3488 Worker
3539 Jps
2904 QuorumPeerMain
2457 DataNode
2540 SecondaryNameNode
2623 NodeManager

[root@slave2 kafka]# jps
2241 DataNode
3239 Worker
2345 NodeManager
3290 Jps
2685 QuorumPeerMain
```
X. Flink

1. Extract

```shell
[root@master software]# tar -zxvf flink-1.10.2-bin-scala_2.11.tgz -C /opt/module/
# Rename the directory
[root@master software]# cd ../module/
[root@master module]# mv flink-1.10.2/ flink
```
2. Configure environment variables in /etc/profile

```shell
# FLINK_HOME
export FLINK_HOME=/opt/module/flink
export PATH=$PATH:$FLINK_HOME/bin
```

Activate the environment variables:

```shell
[root@master module]# source /etc/profile
```
3. Configuration files

Flink ships its configuration directly in conf/ — there are no templates to copy and no flink-env.sh; the settings below go in flink-conf.yaml.

conf/flink-conf.yaml:

```yaml
jobmanager.rpc.address: master
parallelism.default: 4
```

conf/slaves — list the TaskManager nodes (the jps output in step 5 shows TaskManagerRunner on slave1 and slave2):

```
slave1
slave2
```
4. Distribute to the other two nodes

```shell
[root@master module]# scp -r flink/ slave1:/opt/module/
[root@master module]# scp -r flink/ slave2:/opt/module/
# Distribute /etc/profile to the other two nodes as well
[root@master module]# scp /etc/profile slave1:/etc
[root@master module]# scp /etc/profile slave2:/etc
# Don't forget to source it on each node afterwards
[root@slave1 module]# source /etc/profile
[root@slave2 module]# source /etc/profile
```
5. Start Flink

```shell
[root@master flink]# start-cluster.sh
[root@master module]# jps
5618 NameNode
5715 DataNode
9908 StandaloneSessionClusterEntrypoint
8598 Master
9992 Jps
7356 QuorumPeerMain
6221 NodeManager
7437 Kafka
5998 ResourceManager

[root@slave1 kafka]# jps
3488 Worker
3922 TaskManagerRunner
2904 QuorumPeerMain
2457 DataNode
3993 Jps
2540 SecondaryNameNode
2623 NodeManager

[root@slave2 kafka]# jps
2241 DataNode
3766 Jps
3239 Worker
3703 TaskManagerRunner
2345 NodeManager
2685 QuorumPeerMain
```