Without accumulating small steps, one cannot travel a thousand li; without accumulating small streams, there can be no rivers and seas.

Dean's blog


Installing a Hadoop Pseudo-Distributed Environment

In production, Hadoop is deployed as a fully distributed cluster; for a development environment, a pseudo-distributed deployment is enough. The difference between the two:

Pseudo-distributed: all of the processes Hadoop needs run and work together on a single machine.

Distributed: Hadoop is deployed across multiple machines; each machine starts the processes assigned to it by the cluster plan, and the processes coordinate with each other across the cluster.

The setup steps are as follows.

Installing Hadoop

Download the installation package. The latest release at the time of writing is 3.1.3; see https://hadoop.apache.org/releases.html

Extract the downloaded archive into the ~/App/hadoop directory.

Configuring SSH Login

Enable passwordless SSH login

ssh-keygen -t rsa -P ""

ssh-copy-id localhost
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.

Log in over SSH

ssh localhost
Last login: Sun Jan 12 20:31:27 2020 from localhost

Configuring System Environment Variables

Open the ~/.bashrc file and append:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64/jre
export HADOOP_HOME=/home/hadoop/App/hadoop
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH

After saving, run the following in a terminal to apply the changes:

source ~/.bashrc
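The PATH line prepends the current directory plus Hadoop's bin/ and sbin/ directories, so the Hadoop commands resolve before anything else. A self-contained sketch of how that line expands, using the example paths from above:

```shell
# Reproduce the PATH assembly from ~/.bashrc (example paths from this post).
export HADOOP_HOME=/home/hadoop/App/hadoop
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# The first three entries are now the current dir and the two Hadoop dirs:
echo "$PATH" | tr ':' '\n' | head -3
# .
# /home/hadoop/App/hadoop/bin
# /home/hadoop/App/hadoop/sbin
```

After sourcing the file, commands such as `hadoop`, `hdfs`, and `start-dfs.sh` can be run without typing their full paths.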

Adjusting the Configuration Files

The following files live under $HADOOP_HOME/etc/hadoop.

core-site.xml

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://hadoop000:8020</value>
                <description>NameNode host name and port</description>
	</property>
</configuration>

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/tmp/dfs/name</value>
                <description>Directory where the HDFS NameNode stores its metadata</description>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/tmp/dfs/data</value>
                <description>Directory where the HDFS DataNode stores its data blocks</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
                <description>Number of HDFS block replicas</description>
	</property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
                <description>Disable HDFS permission checks so other users' operations don't fail with permission errors</description>
	</property>
</configuration>

mapred-site.xml

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>

</configuration>

Formatting the HDFS Filesystem

hdfs namenode -format

The output ends with:

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop000/192.168.74.131
************************************************************/

Starting HDFS

Option 1: start all processes at once

$HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [hadoop000]
Starting datanodes
Starting secondary namenodes [hadoop000]

Stop all processes

$HADOOP_HOME/sbin/stop-dfs.sh
Stopping namenodes on [hadoop000]
Stopping datanodes
Stopping secondary namenodes [hadoop000]

Option 2: start each process individually

hdfs --daemon start namenode
hdfs --daemon start datanode
hdfs --daemon start secondarynamenode

Stop each process individually

hdfs --daemon stop namenode
hdfs --daemon stop datanode
hdfs --daemon stop secondarynamenode

Testing HDFS

Create an HDFS directory

hadoop fs -mkdir /helloworld

Check that the directory was created

hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2020-01-13 21:12 /helloworld

Starting YARN

Option 1: start all processes at once

$HADOOP_HOME/sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Stop all processes at once

$HADOOP_HOME/sbin/stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager

Option 2: start each process individually

Start/stop the ResourceManager

yarn --daemon start resourcemanager
yarn --daemon stop resourcemanager

Start/stop the NodeManager

yarn --daemon start nodemanager
yarn --daemon stop nodemanager

Running the wordcount example

The Hadoop distribution ships with a wordcount example application, located at $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-{version}.jar.

Prepare a test file hello.txt:

hello world hello
hello welcome world
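If you prefer to create the file from the terminal, a heredoc writes those two lines into hello.txt in one step:

```shell
# Write the two test lines into hello.txt in the current directory.
cat > hello.txt <<'EOF'
hello world hello
hello welcome world
EOF

# Confirm the file has two lines:
wc -l < hello.txt
```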

Upload hello.txt to HDFS

hadoop fs -put hello.txt /
2020-01-13 21:33:35,101 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

List the files:

hadoop fs -ls /
Found 2 items
-rw-r--r-- 1 hadoop supergroup 38 2020-01-13 21:33 /hello.txt
drwxr-xr-x - hadoop supergroup 0 2020-01-13 21:12 /helloworld

Submit the MapReduce job to YARN

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /hello.txt /wc_out/
This prints a long stream of log output...

After the job succeeds, view the wordcount results

hadoop fs -text /wc_out/part*
2020-01-13 22:08:49,980 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hello 3
welcome 1
world 2
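As a sanity check, the same counts can be reproduced locally with a plain shell pipeline (no Hadoop involved), using the hello.txt contents from above:

```shell
# Split on whitespace, sort, and count each word's occurrences,
# mirroring what the wordcount job computes.
printf 'hello world hello\nhello welcome world\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}'
# hello 3
# welcome 1
# world 2
```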

 

Postscript

1. Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Fix:

Run this command in a terminal:

hadoop classpath

It prints the classpath entries.

Add the output to yarn-site.xml:

<configuration>

<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.application.classpath</name>
		<value>(paste the output of hadoop classpath here)</value>
	</property>
</configuration>

Restart YARN and the error goes away.

2. Starting/stopping all processes in one go

start-all.sh and stop-all.sh start/stop the namenode, datanode, secondarynamenode, resourcemanager, and nodemanager processes together.

Batch start

$HADOOP_HOME/sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoop000]
Starting datanodes
Starting secondary namenodes [hadoop000]
Starting resourcemanager
Starting nodemanagers

Batch stop

$HADOOP_HOME/sbin/stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [hadoop000]
Stopping datanodes
Stopping secondary namenodes [hadoop000]
Stopping nodemanagers
Stopping resourcemanager