
In production, Hadoop is deployed as a fully distributed cluster; for development, a pseudo-distributed deployment is enough. The difference between the two:
Pseudo-distributed: all of the processes Hadoop needs run on a single machine.
Distributed: Hadoop is installed on multiple machines, and each machine starts the Hadoop processes assigned to it by the cluster plan, coordinating with the others.
The setup steps are as follows.
Hadoop Installation
Download the release package. The latest version at the time of writing is 3.1.3; download it from: https://hadoop.apache.org/releases.html
Extract the downloaded archive into the ~/App/hadoop directory.
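A minimal sketch of the download and extraction steps (the mirror URL and target paths are assumptions; adjust them to your version and layout):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
tar -zxf hadoop-3.1.3.tar.gz -C ~/App/
mv ~/App/hadoop-3.1.3 ~/App/hadoop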
Configure SSH Login
Enable passwordless SSH login
ssh-keygen -t rsa -P ""
ssh-copy-id localhost
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.
Log in via SSH
ssh localhost
[hadoop@hadoop000 ~]$ ssh localhost
Last login: Sun Jan 12 20:31:27 2020 from localhost
Configure System Environment Variables
Open the ~/.bashrc file and add:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64/jre
export HADOOP_HOME=/home/hadoop/App/hadoop
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
After saving, run the following command in the terminal to apply the changes:
source ~/.bashrc
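To verify that the variables took effect, print the Hadoop version (the exact output depends on your installation):
hadoop version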
Adjust Configuration Files
All of the following files are located under $HADOOP_HOME/etc/hadoop.
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop000:8020</value>
<description>Hostname and port of the NameNode</description>
</property>
</configuration>
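The hostname hadoop000 must resolve to this machine. Judging by the IP that appears in the format log below (192.168.74.131), the /etc/hosts entry would look like this (use your machine's actual IP):
192.168.74.131 hadoop000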
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/tmp/dfs/name</value>
<description>Directory where the NameNode stores HDFS metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/tmp/dfs/data</value>
<description>Directory where the DataNode stores HDFS data blocks</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas kept for each HDFS block</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>Disable permission checks so that operations by other users are not rejected for lack of permission</description>
</property>
</configuration>
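Note: in current Hadoop releases the permission switch is named dfs.permissions.enabled; the old dfs.permissions key shown above still works through Hadoop's deprecated-property mapping.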
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
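Depending on your environment, the start scripts may not pick up JAVA_HOME from ~/.bashrc; a common fix is to set it explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh (reusing the same path as in ~/.bashrc above):
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64/jre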
Format HDFS
hdfs namenode -format
The log ends with:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop000/192.168.74.131
************************************************************/
Start HDFS
Option 1: start all HDFS processes at once
$HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [hadoop000]
Starting datanodes
Starting secondary namenodes [hadoop000]
Stop all processes
$HADOOP_HOME/sbin/stop-dfs.sh
Stopping namenodes on [hadoop000]
Stopping datanodes
Stopping secondary namenodes [hadoop000]
Option 2: start each process individually
hdfs --daemon start namenode
hdfs --daemon start datanode
hdfs --daemon start secondarynamenode
Stop each process individually
hdfs --daemon stop namenode
hdfs --daemon stop datanode
hdfs --daemon stop secondarynamenode
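After starting, the jps command (shipped with the JDK) should list the NameNode, DataNode, and SecondaryNameNode daemons, each prefixed with its PID:
jps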
Test HDFS
Create an HDFS directory
hadoop fs -mkdir /helloworld
Check that the directory was created
hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2020-01-13 21:12 /helloworld
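You can also browse HDFS through the NameNode web UI, which in Hadoop 3.x listens on port 9870 by default: http://hadoop000:9870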
Start YARN
Option 1: start all processes at once
$HADOOP_HOME/sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Stop all processes at once
$HADOOP_HOME/sbin/stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
Option 2: start each process individually
Start/stop the ResourceManager
yarn --daemon start resourcemanager
yarn --daemon stop resourcemanager
Start/stop the NodeManager
yarn --daemon start nodemanager
yarn --daemon stop nodemanager
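As with HDFS, jps should now also list the ResourceManager and NodeManager daemons, and the ResourceManager web UI is available on its default port: http://hadoop000:8088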
Run the wordcount Test Case
The Hadoop distribution ships with a wordcount example program, located at: $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-{version}.jar.
Prepare the test file hello.txt:
hello world hello
hello welcome world
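One way to create the file (a minimal sketch; any editor works just as well):
echo "hello world hello" > hello.txt
echo "hello welcome world" >> hello.txt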
Upload hello.txt to HDFS
hadoop fs -put hello.txt /
2020-01-13 21:33:35,101 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
List the files:
hadoop fs -ls /
Found 2 items
-rw-r--r-- 1 hadoop supergroup 38 2020-01-13 21:33 /hello.txt
drwxr-xr-x - hadoop supergroup 0 2020-01-13 21:12 /helloworld
Submit the MapReduce job to YARN
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /hello.txt /wc_out/
This prints a long stream of logs...
After the job succeeds, check the wordcount results:
hadoop fs -text /wc_out/part*
2020-01-13 22:08:49,980 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hello 3
welcome 1
world 2
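Note that the job fails if the output directory already exists; to rerun it, delete the directory first:
hadoop fs -rm -r /wc_out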
Postscript
1. Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Solution:
Run the following command in the terminal:
hadoop classpath
It prints the Hadoop classpath (output omitted; the exact value depends on your installation).

Add the printed value to yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>output of the hadoop classpath command</value>
</property>
</configuration>
Restart YARN and the error is resolved.
2. Batch start/stop of all processes
start-all.sh and stop-all.sh start and stop the namenode/datanode/secondarynamenode/resourcemanager/nodemanager processes together.
Batch start
$HADOOP_HOME/sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoop000]
Starting datanodes
Starting secondary namenodes [hadoop000]
Starting resourcemanager
Starting nodemanagers
Batch stop
$HADOOP_HOME/sbin/stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [hadoop000]
Stopping datanodes
Stopping secondary namenodes [hadoop000]
Stopping nodemanagers
Stopping resourcemanager