All About Knows and Dont knows

MySQL as Hive metadata store

- July 15, 2012

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. The language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

By default, Hive uses an embedded Derby database to store its metadata. In most clustered production environments, however, Hive needs a more stable, shareable store so that the metadata can be shared between cluster nodes. To enable these multiuser, remote-access features, Hive must be configured to use an external database such as MySQL as its metadata store. The following steps walk through that configuration.

Software versions

Hadoop : 1.0.3
Hive : 0.9.0

Step 1 : Create hiv...
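For orientation, pointing Hive at MySQL usually comes down to four JDO connection properties in hive-site.xml. The sketch below is one possible way to generate that fragment, not the procedure from this post. The host `localhost`, the database name `metastore`, the user `hiveuser`, and the password are all hypothetical placeholders, and `build_hive_site` is a helper name invented for illustration. The property names and the `com.mysql.jdbc.Driver` class are the ones Hive and MySQL Connector/J of this era use.

```python
import xml.etree.ElementTree as ET


def build_hive_site(host, database, user, password):
    """Return hive-site.xml text holding the JDO connection properties for MySQL.

    All argument values are supplied by the caller; nothing here is taken
    from the original post.
    """
    props = {
        # JDBC URL of the MySQL database that will hold the Hive metadata.
        "javax.jdo.option.ConnectionURL":
            f"jdbc:mysql://{host}/{database}?createDatabaseIfNotExist=true",
        # Driver class shipped in the MySQL Connector/J jar.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_hive_site("localhost", "metastore", "hiveuser", "change-me"))
```

The printed XML would go into hive-site.xml in Hive's conf directory, and the MySQL Connector/J jar needs to be on Hive's classpath (typically its lib directory) for the driver class to load.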