How do successful businesses collect, use, and store data? How do data architects configure large data sets from vastly different data sources in anticipation of an ever-changing business environment?
Tuesday, August 2, 2016
Notes on Nutch crawler with indexing
Understanding Nutch with HBase starts with the HBase schema. In Nutch 2.x the schema (the table and its column families) is defined by the Gora mapping file shipped with Nutch:
http://svn.apache.org/repos/asf/nutch/tags/release-2.3.1/conf/gora-hbase-mapping.xml
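A quick way to see what this mapping produces is to look at the webpage table from the HBase shell. As a minimal sketch, assuming the default table name 'webpage' from the mapping file above (if the crawl was started with a crawl id, Nutch prefixes the table name, e.g. TestCrawl_webpage):

  hbase shell
  describe 'webpage'               # column families declared in gora-hbase-mapping.xml
  scan 'webpage', {LIMIT => 1}     # one row per URL, holding fetch, parse, link and metadata columns

Each field in the mapping file is stored under one of those families, so the linked gora-hbase-mapping.xml is the authoritative reference for where Nutch keeps each piece of crawl state.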
Data synchronization between RDBMS and Hive using Sqoop
Hive can serve as a backup environment for RDBMS data or simply as a data warehouse. It provides a solid architecture for bulk OLAP workloads, and it also makes a good workspace for data charting, where Hadoop technologies can be employed to crunch the data.
Because many organizations still run their data warehouse on RDBMS and SQL technology, it is often easier to export that data into Hive for bulk processing. Dumping files and re-importing them into Hive can be inefficient, so a data synchronization strategy built on JDBC is often the more logical choice. Sqoop is designed to replicate data between different databases by speaking the same 'JDBC language'.
Let's see how Sqoop works between SQL Server and Hive.
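As a rough sketch of what that sync can look like (the server, database, table, user, and path names below are placeholders, not values from this post, and the SQL Server JDBC driver jar must be on Sqoop's classpath), a bulk copy from SQL Server into Hive and a push of Hive results back to SQL Server could be run as:

  # Pull a SQL Server table into a managed Hive table
  sqoop import \
    --connect "jdbc:sqlserver://dbhost:1433;databaseName=SalesDW" \
    --username etl_user --password-file /user/etl/.mssql.pwd \
    --table Orders \
    --hive-import --hive-table salesdw.orders \
    --num-mappers 4

  # Push aggregated Hive data back to a SQL Server table
  sqoop export \
    --connect "jdbc:sqlserver://dbhost:1433;databaseName=SalesDW" \
    --username etl_user --password-file /user/etl/.mssql.pwd \
    --table OrdersSummary \
    --export-dir /user/hive/warehouse/salesdw.db/orders_summary \
    --input-fields-terminated-by '\001'

Because both sides are reached through JDBC, the same pattern works for other RDBMS engines by swapping the connection string, and the import can be scheduled (for example with Sqoop's --incremental options) to keep Hive in step with the source.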