Pivotal rolls out Hadoop distro update, new query optimizer


At EMC World in Las Vegas this week, Pivotal rolled out enhancements to its big data suite, including major component updates to its Pivotal HD Hadoop distribution and performance improvements of up to 100x for Pivotal Greenplum Database and Pivotal HAWQ.

Just a few months ago, Pivotal announced that it would open source its entire big data stack: the Pivotal HD distribution, Pivotal Greenplum Database, Pivotal GemFire real-time distributed data store, Pivotal SQLFire (a SQL layer for the real-time distributed data store), Pivotal GemFire XD (in-memory SQL over HDFS) and the Pivotal HAWQ parallel query engine over HDFS. These updates, says Michael Cucchi, senior director of Outbound Product at Pivotal, underscore Pivotal's continued commitment to supporting that open source strategy.

At the heart of the performance upgrades is the new Pivotal Query Optimizer, an advanced cost-based optimizer for big data previously codenamed "Orca." The optimizer lets users run fully ANSI SQL-compliant queries against Hadoop.

Cucchi acknowledges that some basic queries will still execute faster with a standard planner. But for complex big data optimizations, he says, the Pivotal Query Optimizer is the world's most advanced cost-based optimizer, and it will help customers manage the ballooning data sets driven by mobile, cloud, social and the Internet of Things.

"We've advanced our analytical capabilities and our performance," Cucchi says. "And also our configurability. We need to be able to very granularly control that optimizer."

He notes that users can configure the Pivotal Query Optimizer down to the individual query level.

The Pivotal Query Optimizer is currently available as part of the Pivotal Big Data Suite subscription, but Cucchi notes that it will also be released as open source, probably within the next year or two.

The new version of Pivotal HD is the first release of Pivotal's Hadoop distribution to be built on an Open Data Platform (ODP) core. In February of this year, Pivotal, together with a host of other vendors, systems integrators and end users, shepherded ODP into existence in an effort to reduce the complexity of the Hadoop and big data ecosystem.

ODP is a big data kernel in the form of a tested reference core of Apache Hadoop, Apache Ambari and related Apache source artifacts. The idea was to create a "test once, use anywhere" core platform that would simplify upstream and downstream qualification efforts and eliminate growing fragmentation in the Hadoop market. Applications and tools built on the ODP kernel should integrate with and run on any compliant system. Since the launch of ODP, all major Hadoop distribution vendors have joined the effort.

The new version of Pivotal HD features the ODP core, which consists of Apache Hadoop 2.6 and Apache Ambari. It updates existing Hadoop components for scripting and query (Apache Pig and Apache Hive), non-relational database (Apache HBase), and basic coordination and workflow orchestration (Apache ZooKeeper and Apache Oozie). It also adds the Apache Spark core and machine learning library, plus additional Hadoop components for security (Apache Knox and Apache Ranger, which is still incubating), monitoring (Nagios and Ganglia, alongside Apache Ambari) and data processing (Apache Tez).

Follow Thor on Google+