MySQL as Hive metadata store
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
By Default Hive uses a Derby database to store its metadata. But in most of the clustered production environments it need to have more stable and shareable store to share the metadata between cluster nodes. For Hive to enable those multiuser , remote access features it has to configure MySQL database as its metadata store. Following will give you a step by step way to configure it successfully.
Software versions
Hadoop : 1.0.3
Hive : 0.9.0
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:8020</value> <!--
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/metastore_db</value> <!-- MySQL database url-->
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value> <!-- MySQL jdbc driver-->
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value> <!-- username of the respective database -->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value> <!-- password of the respective database -->
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///${HIVE_HOME/lib/hive-builtins-0.9.0.jar</value>
</property>
</configuration>
By Default Hive uses a Derby database to store its metadata. But in most of the clustered production environments it need to have more stable and shareable store to share the metadata between cluster nodes. For Hive to enable those multiuser , remote access features it has to configure MySQL database as its metadata store. Following will give you a step by step way to configure it successfully.
Software versions
Hadoop : 1.0.3
Hive : 0.9.0
Step 1 : Create hive-site.xml in ${HIVE_HOME}/conf directory
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:8020</value> <!--
The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. can be found on Hadoop_home/conf/core-site.xml
--></property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/metastore_db</value> <!-- MySQL database url-->
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value> <!-- MySQL jdbc driver-->
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value> <!-- username of the respective database -->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value> <!-- password of the respective database -->
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///${HIVE_HOME/lib/hive-builtins-0.9.0.jar</value>
</property>
</configuration>
Step 2: Create database in MySQL
mysql > create database metastore_db;
Step 3 : Create Metadata store tables;
mysql > use metastore_db;
Hive metadata related sql can be found on ${hive_home}/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql
mysql> source ${hive_home}/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql;
This will create the database for hive metastore.
mysql> show tables;
+-------------------------+
| Tables_in_metastore_db |
+-------------------------+
| BUCKETING_COLS |
| COLUMNS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| GLOBAL_PRIVS |
| IDXS |
| INDEX_PARAMS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_PRIVS |
| ROLES |
| ROLE_MAP |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SORT_COLS |
| TABLE_PARAMS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TYPES |
| TYPE_FIELDS |
+-------------------------+
+-------------------------+
| Tables_in_metastore_db |
+-------------------------+
| BUCKETING_COLS |
| COLUMNS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| GLOBAL_PRIVS |
| IDXS |
| INDEX_PARAMS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_PRIVS |
| ROLES |
| ROLE_MAP |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SORT_COLS |
| TABLE_PARAMS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TYPES |
| TYPE_FIELDS |
+-------------------------+
Step 4: Goto hive shell and play with it.
open two or more hive shells and create table from one and that can be access from another hive shell after mysql hive metastore configuration.
Comments
I am happy that you just shared this useful info with
us. Please keep us up to date like this. Thank you for sharing.
my webpage :: sci-fi book blog