- Which specific Hadoop projects do you integrate with (HDFS, Hive, HBase, Pig, Sqoop, and many others)?
- Do you work with the community edition software or with commercial distributions from MapR, EMC/Greenplum, Hortonworks, or Cloudera? Have these vendors certified your Hadoop implementations?
- Are you querying Hadoop data directly from your BI tools (reports, dashboards), or are you ingesting Hadoop data into your own DBMS? If the latter:
a. Are you selecting Hadoop result sets using Hive?
b. Are you ingesting Hadoop data using Sqoop?
c. Is your ETL generating MapReduce jobs and pushing them down to Hadoop? Are you generating Pig scripts?
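For reference, Sqoop-based movement of Hadoop data into a relational DBMS is typically a `sqoop export` invocation along these lines (the JDBC URL, user, table, and HDFS path here are placeholders, not real endpoints):

```shell
# Hypothetical example: export an HDFS directory into a relational staging table.
# Connection string, credentials, table name, and path are all placeholders.
sqoop export \
  --connect jdbc:mysql://dbhost/warehouse \
  --username etl_user -P \
  --table sales_staging \
  --export-dir /user/hive/warehouse/sales \
  --input-fields-terminated-by '\001'
```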
- Are you querying Hadoop data via SQL?
a. If yes, who provides the relational structures? Hive? If Hive:
b. Who translates SQL to HiveQL?
c. Who provides transactional controls, such as multi-phase commits?
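To make the Hive question concrete: Hive projects a relational schema onto raw HDFS files so that BI tools can issue SQL-like (HiveQL) queries against them. A minimal sketch, with hypothetical table, columns, and path:

```sql
-- Hypothetical example: Hive provides the relational structure over raw HDFS data.
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs (
  ts      STRING,
  user_id STRING,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/weblogs';

-- A BI tool's SQL is translated to HiveQL, which Hive compiles to MapReduce jobs.
SELECT user_id, COUNT(*) AS hits
FROM weblogs
GROUP BY user_id;
```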
- Do you need Hive to provide relational structures, or can you query HDFS data directly?
- Are you querying Hadoop data via MDX? If yes, please let me know what tools you are using, as I am not aware of any.
- Can you access NoSQL data in Hadoop? Which NoSQL DBMS: HBase, Cassandra? Since your queries are mostly SQL- or MDX-based, how do you access these key-value stores? If you do, please let me know what BI use cases you have for NoSQL, as I am not aware of any.
- Do you have the capability to explore HDFS data without a data model? We call this discovery or exploration.
- As Hadoop MapReduce jobs are running, who provides job control? Do you integrate with Oozie, Ambari, Chukwa, or ZooKeeper?
- Can you join Hadoop data with other relational or multidimensional data in federated queries? Is it pass-through federation, or do you persist the results? Where: in memory, in Hadoop, or on your own server?
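The pass-through vs. persisted distinction can be sketched conceptually. In the sketch below the "Hadoop result set" is a stand-in Python list rather than a real Hive/HDFS fetch, and all table and column names are hypothetical; SQLite stands in for the BI server's own store:

```python
import sqlite3

# Stand-in for a result set fetched from Hadoop (e.g. a Hive aggregate).
hadoop_rows = [("acme", 120), ("globex", 75)]

conn = sqlite3.connect(":memory:")                     # BI server's own store
conn.execute("CREATE TABLE customers (name TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("acme", "EMEA"), ("globex", "APAC")])

# Option 1: pass-through federation -- join in memory, persist nothing.
regions = dict(conn.execute("SELECT name, region FROM customers"))
passthrough = [(name, hits, regions[name]) for name, hits in hadoop_rows]

# Option 2: persist the Hadoop result set locally, then join with SQL.
conn.execute("CREATE TABLE hadoop_hits (name TEXT, hits INTEGER)")
conn.executemany("INSERT INTO hadoop_hits VALUES (?, ?)", hadoop_rows)
persisted = conn.execute(
    "SELECT c.name, h.hits, c.region "
    "FROM hadoop_hits h JOIN customers c ON c.name = h.name "
    "ORDER BY c.name").fetchall()
```

Both options yield the same joined rows; the difference is whether the Hadoop result set lives only for the duration of the query or is written into the vendor's own server for reuse.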