Hadoop Installation and Configuration

Installing the JDK

1. Create a working directory

mkdir /usr/cx

2. Unpack the installer

tar -zxvf <path-to-JDK-tarball> -C /usr/cx
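
For example, assuming the JDK 8u161 tarball had been downloaded to /opt/soft (both the path and the version here are hypothetical; substitute your own), the command would be:

tar -zxvf /opt/soft/jdk-8u161-linux-x64.tar.gz -C /usr/cx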

3. Configure environment variables

vi ~/.bashrc

The file opens with the following contents; append the new lines where indicated:

# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
# (append the following lines at the end of the file)
export JAVA_HOME=/usr/cx/<jdk-directory-name>
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar

4. Reload the environment variables

source ~/.bashrc

5. Verify the JDK configuration

java -version
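
If everything is configured correctly, the command reports the installed version. With the hypothetical JDK 8u161 from above, the output would look roughly like this:

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)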

Hostname Configuration

1. Edit the hostname

vi /etc/sysconfig/network

The file looks like this:

NETWORKING=yes
HOSTNAME=CentOS6.5     ---- change this to localhost (note: use localhost for a local single-node setup; otherwise use the machine's actual hostname)

After saving the change, reboot the machine:

reboot
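
Optionally, if you prefer not to reboot right away, the hostname can also be changed for the running session with the hostname command (the file edit above still governs future boots):

hostname localhost    # takes effect immediately, for the current session only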

2. Configure the IP-to-hostname mapping file

vi /etc/hosts

The file looks like this:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4   (note: append to this line)

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Append localhost to the end of the first line ---- if this is not the local machine, append its actual hostname instead.

3. Verify the hostname-to-IP mapping

ping localhost -c 4

SELinux Security Configuration

1. Disable SELinux

Open the SELinux configuration file in vi:

vi /etc/selinux/config

The file looks like this:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=permissive   ---- change this value to disabled   (note: this line must be changed)
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

2. Apply the change to the running system

The SELINUX=disabled setting in the file only takes effect after a reboot; in the meantime, setenforce 0 switches SELinux to permissive mode for the current session:

setenforce 0
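
You can confirm the current mode with getenforce; it should now report Permissive (it will report Disabled only after a reboot with SELINUX=disabled in the config file):

getenforce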

Configuring Passwordless SSH Login

1. Generate a key pair

Run the following command to generate a key pair for this machine:

ssh-keygen -t dsa

Press Enter at each prompt to accept the defaults; the key files are written to ~/.ssh/ (since this walkthrough logs in as root, ~/.ssh/ is /root/.ssh/). On newer systems where DSA keys are disabled, generate an RSA key with ssh-keygen -t rsa instead.

List the files in ~/.ssh; you should see the private key id_dsa and the public key id_dsa.pub:

ls ~/.ssh

2. Distribute the public key

Append the contents of this node's public key file id_dsa.pub to the end of the ~/.ssh/authorized_keys file on any node; this node can then log in to that node without a password. (Since this is a single-node deployment, simply append it to the current node's own ~/.ssh/authorized_keys file.)

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
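
If ssh still asks for a password after this, the usual cause is that sshd refuses key files with overly permissive modes; tightening them is a safe precaution:

chmod 700 ~/.ssh                    # the .ssh directory must not be group- or world-writable
chmod 600 ~/.ssh/authorized_keys    # authorized_keys should be readable by its owner only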

3. Verify passwordless login

ssh localhost   ---- use localhost for a local setup; otherwise use the target host's address

The first login asks whether to continue connecting; type yes.

Once connected, exit the session:

exit

Installing Hadoop

1. Unpack the installer

tar -zxvf <path-to-Hadoop-tarball> -C /usr/cx

2. Configure the Hadoop environment variables

vi ~/.bashrc

The file now looks like this; append the new lines where indicated:

# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
export JAVA_HOME=/usr/cx/<jdk-directory-name>
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar

# (append the following lines here)

export HADOOP_HOME=/usr/cx/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

After saving and exiting, reload the environment variables:

source ~/.bashrc

Verify that the Hadoop environment variables are set correctly:

hadoop
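
When the variables are set correctly, running hadoop with no arguments prints a usage summary (rather than a "command not found" error) that begins roughly like this:

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]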

3. Edit the Hadoop configuration files

Open hadoop-env.sh in vi:

vi /usr/cx/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

The file looks like this:

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}   ---- change to: export JAVA_HOME=/usr/cx/<jdk-directory-name>   (note: this line must be changed to bind Hadoop to an explicit Java runtime)

# The jsvc implementation to use. Jsvc is required to run secure datanodes
# that bind to privileged ports to provide authentication of data transfer
# protocol.  Jsvc is not required if SASL is configured for authentication of
# data transfer protocol using non-privileged ports.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

Open core-site.xml in vi:

vi /usr/cx/hadoop-2.7.1/etc/hadoop/core-site.xml

The file contents are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- (note: the following properties need to be added here) -->
<!-- Default HDFS filesystem URI -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- I/O buffer size; io.file.buffer.size defaults to 4 KB -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/tmp</value>
</property>
<!-- Allow the hduser user to submit jobs on behalf of users from any host -->
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<!-- Allow the hduser user to submit jobs on behalf of users from any group -->
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
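
As a quick sanity check that the file is well-formed and being picked up, you can query one of the configured keys with the hdfs getconf tool:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://localhost:9000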

Exit vi, then open yarn-site.xml:

vi /usr/cx/hadoop-2.7.1/etc/hadoop/yarn-site.xml

The file contents are as follows:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- (note: the following properties need to be added here) -->
<!-- Site specific YARN configuration properties -->
<!-- Auxiliary service run by each NodeManager; must be mapreduce_shuffle for MapReduce jobs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Address clients use to reach the ResourceManager -->
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- Address ApplicationMasters use to reach the ResourceManager scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- Address NodeManagers use to reach the ResourceManager -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- Address administrators use to reach the ResourceManager -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- ResourceManager web UI address, for monitoring resource scheduling -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>localhost:8088</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
</configuration>

Copy mapred-site.xml.template to mapred-site.xml:

cp /usr/cx/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template /usr/cx/hadoop-2.7.1/etc/hadoop/mapred-site.xml

Open mapred-site.xml in vi:

vi /usr/cx/hadoop-2.7.1/etc/hadoop/mapred-site.xml

The file contents are as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- (note: the following properties need to be added here) -->
<!-- Hadoop ships three MapReduce runtime implementations; mapreduce.framework.name selects "classic", "yarn", or "local" -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- MapReduce JobHistory Server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>

</configuration>
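
Note that the start-all.sh script used later does not start the JobHistory server; for the two addresses configured above to be reachable, start it separately once Hadoop is running:

mr-jobhistory-daemon.sh start historyserver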

Create the NameNode and DataNode data directories:

mkdir -p /hdfs/namenode
mkdir -p /hdfs/datanode

Open hdfs-site.xml in vi:

vi /usr/cx/hadoop-2.7.1/etc/hadoop/hdfs-site.xml

The file contents are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- (note: the following properties need to be added here) -->
<!-- SecondaryNameNode HTTP address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9001</value>  <!-- use localhost for a local setup; otherwise use the actual hostname -->
</property>
<!-- NameNode data directory; must match the directory created above -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hdfs/namenode</value>
</property>
<!-- DataNode data directory; must match the directory created above -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hdfs/datanode</value>
</property>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Set dfs.webhdfs.enabled to true; otherwise WebHDFS operations that list file or directory status (LISTSTATUS, GETFILESTATUS) will fail, since that information is held by the NameNode -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

Open the slaves file in vi (its entries must match the hostname configured earlier; otherwise the Hadoop worker processes will not start correctly):

vi /usr/cx/hadoop-2.7.1/etc/hadoop/slaves

The file contents are as follows:

localhost   ---- use localhost for a local setup; otherwise use the worker node's hostname

If this machine is local, leave the entry as localhost; if the worker runs on a remote host (for example on the 易优云 platform), replace it with that host's name.

4. Format HDFS

Format the HDFS file system (in Hadoop 2.x, hdfs namenode -format is the preferred form of the older, still-working hadoop namenode -format):

hadoop namenode -format
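
Near the end of the output, look for a message confirming that the format succeeded, similar to the following (the exact wording varies between versions):

INFO common.Storage: Storage directory /hdfs/namenode has been successfully formatted.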

Running and Testing Hadoop

1. Start Hadoop

Start Hadoop (start-all.sh is deprecated in Hadoop 2.x; running start-dfs.sh followed by start-yarn.sh is the preferred equivalent):

start-all.sh

Check the running JVM processes to confirm that Hadoop started successfully:

jps
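
For this single-node setup the output should list the five Hadoop daemons plus jps itself; the PIDs below are illustrative:

2481 NameNode
2601 DataNode
2745 SecondaryNameNode
2890 ResourceManager
2987 NodeManager
3055 Jps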

2. Web UI test

In a browser, open http://localhost:8088 (the ResourceManager web UI configured in yarn-site.xml above) and http://localhost:50070 (the NameNode web UI) ---- if the machine is not local, replace localhost with its actual hostname.
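
As a final end-to-end test, you can submit the example job that ships with Hadoop (the jar path below assumes the hadoop-2.7.1 layout used in this article):

hadoop jar /usr/cx/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10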

License:  CC BY 4.0