MySQL as Hive metadata store

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

By Default Hive uses a Derby database to store its metadata. But in most of the clustered production environments it need to have more stable and shareable store to share the metadata between cluster nodes. For Hive to enable those multiuser , remote access features it has to configure MySQL database as its metadata store. Following will give you a step by step way to configure it successfully. 

Software versions
Hadoop  : 1.0.3
Hive       : 0.9.0

Step 1 : Create hive-site.xml in ${HIVE_HOME}/conf directory



<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at


       http://www.apache.org/licenses/LICENSE-2.0


   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->


<configuration>


 <property>
    <name>fs.default.name</name>
    <value>hdfs://127.0.0.1:8020</value>   <!-- 
The name of the default file system.  A URI whose
   scheme and authority determine the FileSystem implementation. can be found on Hadoop_home/conf/core-site.xml 
-->

 </property>


 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://127.0.0.1:3306/metastore_db</value> <!-- MySQL database url-->
 </property>

 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>    <!-- MySQL jdbc driver-->
  </property>

 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value> <!--  username of the respective database -->
 </property>

 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>  <!--  password of the respective database -->
 </property>

 <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
 </property>

 <property>
     <name>datanucleus.fixedDatastore</name>
     <value>true</value>
  </property>


   <property>
     <name>hive.aux.jars.path</name>
     <value>file:///${HIVE_HOME/lib/hive-builtins-0.9.0.jar</value>
   </property>
</configuration>


Step 2: Create database in MySQL

mysql > create database  metastore_db;

Step 3 : Create Metadata store tables;

mysql > use  metastore_db;

Hive metadata related sql can be found on ${hive_home}/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql

mysql> source ${hive_home}/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql;

This will create the database for hive metastore.

mysql> show tables;
+-------------------------+
| Tables_in_metastore_db  |
+-------------------------+
| BUCKETING_COLS          |
| COLUMNS                 |
| DATABASE_PARAMS         |
| DBS                     |
| DB_PRIVS                |
| GLOBAL_PRIVS            |
| IDXS                    |
| INDEX_PARAMS            |
| PARTITIONS              |
| PARTITION_KEYS          |
| PARTITION_KEY_VALS      |
| PARTITION_PARAMS        |
| PART_COL_PRIVS          |
| PART_PRIVS              |
| ROLES                   |
| ROLE_MAP                |
| SDS                     |
| SD_PARAMS               |
| SEQUENCE_TABLE          |
| SERDES                  |
| SERDE_PARAMS            |
| SORT_COLS               |
| TABLE_PARAMS            |
| TBLS                    |
| TBL_COL_PRIVS           |
| TBL_PRIVS               |
| TYPES                   |
| TYPE_FIELDS             |
+-------------------------+

Step 4: Goto hive shell and play with it.

open two or more hive shells and create table from one and that can be access from another hive shell after mysql hive metastore configuration.

Comments

Anonymous said…
LAS VEGAS, Nov. 15, 2012 -- /PRNewswire/ -- Caesars Entertainment Corporation (NASDAQ: CZR) today announced that Donald Colvin will join the company as Executive Vice President and Chief Financial Officer, subject to required regulatory approvals. In this role, Colvin will be responsible for Caesars' finance functions and report to Gary Loveman, Chairman, President and Chief Executive Officer. [url=http://NORWEGIAN-ONLINE-CASINOS.COM/]casino online
Anonymous said…
It is actually a great and helpful piece of information.
I am happy that you just shared this useful info with
us. Please keep us up to date like this. Thank you for sharing.
my webpage :: sci-fi book blog
Anonymous said…
you must read BXklnXUI [URL=http://www.hermes-birkinprice.weebly.com/]hermes birkin bag price[/URL] , for special offer EBPBGoHh [URL=http://www.hermes-birkinprice.weebly.com/ ] http://www.hermes-birkinprice.weebly.com/ [/URL]

Popular posts from this blog

PostgreSQL bytea and oid

Adding MySQL datasource to JBOSS AS 7

Microservices Architecture with Spring Boot in 15mins