CIO

10 Hot Big Data Startups to Watch

The Big Data market is heating up, and unlike some overhyped trends (social media), it's pretty easy to pinpoint ROI with these tools.

When we put out calls for nominees through the Story Source Newsletter, HARO, Twitter, and other channels, we received more than 100 recommendations. Usually, when we get that many, a good chunk of them can be dismissed out of hand. Some are clearly science projects; others have zero funding, no management pedigree and a dubious value proposition, while a few are clearly the product of malarial hallucinations.

Not so this time. Very few of the startups we looked at were wacky long shots. Most were decent ideas, backed by real VC money and seasoned management teams.

Recently, we've changed how the final 10 startups to watch are selected. First, a big list of nominees is compiled on Startup50.com. (The Big Data list had 42 nominees.) Then, we let readers vote on their favorites.

This time around another wrinkle was introduced. Startups left off the big list can challenge specific startups on it, trying to steal their spot away. If the challenge is deemed to have merit, we'll set up a separate vote. Sqrrl and DataStax both fought their way onto the list of nominees through challenges.

All told, more than 11,000 people voted for their favorite Big Data startups, with Cloudant winning, SiSense coming in a close second and SumAll finishing a strong third.

This time around we weighted voting more heavily than normal. Usually, voting is given a weight of about 30 percent, and then we turn to other factors, such as funding, the pedigree of the management team and the viability of the startup's roadmap.

However, the entire list of 42 Big Data nominees (plus several others that initially escaped our notice) is ridiculously strong.

Take Xplenty, for instance. They finished eighth in voting, but we considered bumping them because the startup is only a year old, hasn't raised significant funding and doesn't yet have big-name customers. All marks against it.

Balancing those negatives is the fact that voting does matter, and roundups like this are best if they include a mix of top startups well on the way to reaching their potential along with some startups that are pretty much all potential right now.

As we started looking at potential replacements, we realized that any of the top 25 or so vote-getters could make solid arguments for inclusion.

Frankly, we could have slotted Platfora, Cloudmeter, CloudPhysics, Sqrrl, RainStor, Rocket Fuel or several others in Xplenty's place. Big Data startups, unlike some other spaces, have real substance to them. They are building viable products that target real-world pain points (pain points businesses are willing to pay to solve--today), and most Big Data startups are well-funded with solid management teams. It is just a really strong space.

So, Xplenty stuck. Yes, they're more raw potential than giant killer at this stage, but their coding-free Hadoop Big Data service is simple, easy to use and affordable for even the mid-market.

Now, it's your turn. Vote for your favorite Big Data startups, and we'll rank the top 10 and crown an overall winner.

1. Cloudant

What they do: Provide Databases-as-a-Service.

Headquarters: Boston, Mass.

CEO: Derek Schoettle. Before Cloudant, he was vice president, CME Sales at Vertica Systems, which was acquired by HP in 2011.

Founded: 2008

Funding: Cloudant just closed a $12 million second round of funding in May. Devonshire Investors, Rackspace Hosting, and Toba Capital led the round, which included participation from current investors Avalon Ventures, In-Q-Tel, and Samsung Venture Investment Corporation. Cloudant has raised $16 million to date.

Why they're on this list: They finished first in Startup50.com voting, just upped their funding to $16 million and now claim more than 12,000 customers. According to Cloudant, the problem with databases is that if an application is successful, organizations often outgrow them. This is commonly referred to as the "App Store Effect." Even "scale-out" distributed databases and caches are limited by cluster hardware and partitioning schemes.

The Cloudant Database-as-a-Service (DBaaS) is a managed service purpose-built for data-driven Web and mobile application developers who want to handle Big Data workloads without ever having to deal with distributed database design, sharding, partitioning, backup, etc. Cloudant works by storing, analyzing, and distributing application data across a global network of data centers, delivering low-latency, highly available data layer performance, and pushing dynamic data closer to the edge.

Market Potential and Competitive Landscape: According to Market Research Media, the worldwide NoSQL market is expected to reach $3.4 billion by 2018, with a compound annual growth rate (CAGR) of 21 percent between 2013 and 2018. The NoSQL market is expected to generate $14 billion in revenues over the period 2013-2018.

Cloudant is rather uniquely positioned at the moment. While Oracle and MySQL have been available on AWS, there aren't that many NoSQL DBaaS offerings out there. Joyent rolled one out earlier this year, and AWS's DynamoDB is in beta.

Cloudant claims a customer base of more than 12,000 multi-tenant customers, including Samsung, DHL, Monsanto, Salesforce.com (Heroku), SourceFire, Hot Head Games, Flurry, AppAdvice, and LiveMocha.

2. Cloudera

What they do: Provide a Hadoop-based Big Data platform.

Headquarters: Palo Alto, Calif.

CEO: Mike Olson, who was formerly CEO of Sleepycat Software, an embedded database company that was acquired by Oracle in 2006. After the acquisition, Olson spent two years at Oracle as VP for Embedded Technologies.

Founded: 2008

Funding: Cloudera has raised $140 million in venture capital to date. Its investors include Accel Partners, Greylock Partners, Ignition Partners, In-Q-Tel and Meritech Capital Partners.

Why they're on this list: Big Data is hot, and Cloudera pioneered the Hadoop-based Big Data space. Moreover, they're sitting on a giant pile of VC cash and have a top-notch management team.

Frankly, we thought long and hard about leaving Cloudera off this list -- not because they don't belong, but because they've been doing well enough for long enough that we're not sure that the label "startup" really fits anymore.

However, they did well in Startup50.com voting (finishing in the top 10), and they pretty much proved the business case for Hadoop. Cloudera lets users query all of their structured and unstructured data to gain a view beyond what's available from relational databases. Cloudera recently released Impala, a new open-source interactive query engine for Hadoop that enables interactive querying on massive data sets in real time.

Market Potential and Competitive Landscape: Gartner forecasts that Big Data will drive $34 billion in IT spending this year, increasing to $232 billion by 2016. Gartner also predicts that by 2015, 65 percent of packaged analytic applications with "advanced analytics" will include embedded Hadoop.

Cloudera clearly has first-mover advantage, but competitors include EMC, Pivotal, Hortonworks and MapR. Intel just entered the fray, as well. Customers include CBS Interactive, eBay, Expedia, Monsanto and Samsung.

3. LucidWorks

What they do: Provide enterprise search tools to help navigate Big Data.

Headquarters: Redwood City, Calif.

CEO: Paul Doscher. Prior to LucidWorks, he was CEO of Exalead, an enterprise search company. Back in 2003, he became CEO and one of the principal founders for JasperSoft, an open-source business intelligence platform provider, and he later served as EVP of worldwide field operations for VMware.

Founded: 2008

Funding: Total venture funding stands at $16 million (from Granite Ventures, Walden International, In-Q-Tel and Shasta Ventures).

Why they're on this list: IT organizations are beginning to collect orders of magnitude more data than they gathered even a few years ago. Collecting data is one thing; however, making actual use of it is another. Enterprise search clearly has a role to play in terms of making Big Data accessible. The challenge is doing it in a way that other applications can utilize.

LucidWorks Search is designed to help developers build highly secure, scalable and cost-effective search applications, while providing a simple and comprehensive way to access open-source search technologies.

LucidWorks Big Data is an application development platform that integrates search capabilities into the foundational layer of Big Data implementations. The product is built on a foundation of key Apache open-source projects and enables organizations to quickly discover, access and evaluate large volumes of structured and unstructured data. LucidWorks Big Data and LucidWorks Search work hand-in-hand to accelerate and simplify the building of highly secure, scalable and cost-effective search applications.

Market Potential and Competitive Landscape: According to Wikibon, the total Big Data market reached $11.4 billion in 2012, ahead of Wikibon's 2011 forecast. Wikibon believes that the market will reach $18.1 billion in 2013, an annual growth of 61 percent. This puts it on pace to exceed $47 billion by 2017. That translates to a 31 percent compound annual growth rate over the five-year period 2012-2017.
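For readers who want to sanity-check growth figures like these, here is a minimal Python sketch of the standard compound annual growth rate arithmetic. The function names `cagr` and `project` are our own illustrative choices, not anything published by the analyst firms, and the analysts' quoted figures are rounded, so the results should be read as approximate.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.25 means 25 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Solve end = start * (1 + rate) ** years for rate.
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Project a value forward by compounding a fixed annual growth rate."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Figures quoted in the article, in billions of dollars.
    implied = cagr(11.4, 47.0, 5)
    print(f"Implied CAGR from $11.4B to $47B over five years: {implied:.1%}")
    print(f"$11.4B compounded at 31 percent for five years: ${project(11.4, 0.31, 5):.1f}B")
```

Small gaps between a quoted growth rate and the rate implied by the endpoints are common when forecasts are rounded or described as "exceeding" a threshold.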

Competitive Landscape: Competitors include Endeca, Autonomy and Elasticsearch.

ADP is a named customer.

4. MapR Technologies

What they do: Provide a Hadoop/NoSQL Big Data platform.

Headquarters: San Jose, Calif.

CEO: John Schroeder, who previously served as CEO of Calista Technologies, which was acquired by Microsoft. Before that, he was CEO of Rainfinity, which EMC purchased.

Founded: 2009

Funding: In March 2013, MapR Technologies raised $30 million in VC funding in a round led by new investor Mayfield Fund, with participation from existing investors Lightspeed Venture Partners, NEA and Redpoint Ventures. This brings total funding to $59 million.

Why they're on this list: MapR finished in the top 10 in Startup50.com voting, has impressive VC backing and a CEO who knows how to see startups through to successful exits.

MapR's platform merges Hadoop, NoSQL, database and streaming applications into one unified Big Data platform. Anyone with even a cursory knowledge of Hadoop knows that speed isn't one of its claims to fame. MapR claims to have overcome the speed obstacle, while also offering such enterprise-grade features as "high availability, business continuity, real-time streaming, standard file-based access through NFS, full database access through ODBC, and support for mission-critical SLAs."

Competitive Landscape: Competitors include Cloudera, EMC, Pivotal, Hortonworks, and Intel.

Named customers include Ancestry, Rebicon and comScore.

5. ParStream

What they do: Develop database technologies to enable "Fast Data."

Headquarters: Redwood City, Calif.

CEO: Mike Hummel, who previously co-founded Empulse, a portal solutions and software consulting company now specializing in Web 2.0 projects.

Founded: 2008

Funding: ParStream has secured $5.6 million in Series A funding from Khosla, Baker Capital, CrunchFund, Tola Capital and Data Collective.

Why they're on this list: Traditional databases just weren't designed for Big-Data-scale analytics, and they certainly aren't able to deliver those insights in real time. Traditional databases analyze data sequentially and aren't able to take advantage of advances in multi-core processing.

At CTIA 2013, CEO Michael Hummel noted that memory is a big bottleneck for traditional databases. Meanwhile, the Big Data database darling, Hadoop, has trouble scaling efficiently.

Hummel argues that ParStream's database was purpose-built for speed. Whereas many database platforms exist for the purpose of storing and analyzing large quantities of data, ParStream was designed to deliver faster response times and to reduce Big Data storage infrastructure costs in the process.

ParStream enables "Fast Data" by using a distributed architecture that processes data in parallel. ParStream was specifically engineered to deliver both big data and fast data, enabled by a unique High Performance Compressed Index (HPCI). The HPCI removes the extra step and time otherwise required to decompress data.

ParStream claims to provide sub-second response times on billions of data records while continuously importing new data.

Market Potential and Competitive Landscape: Analysts see the Big Data market reaching anywhere from $18 billion (Wikibon) to $34 billion (Gartner) in 2013. Competitors include SAP HANA, Apache platforms and Vertica Systems (HP). Searchmetrics is a named customer, but Hummel assured me that more will be going on the record soon.

6. ScaleArc

What they do: Provide database infrastructure software that simplifies the way database environments are deployed and managed.

Headquarters: Santa Clara, Calif.

CEO: Varun Singh. Singh previously helped create two of India's top online technology brands, TechTree and Tech2. He also currently hosts technology shows on CNBC, TV18, CNN-IBN and ET Now.

Founded: 2009

Funding: ScaleArc is backed by $18 million from Accel Partners, Trinity Ventures, Nexus Venture Partners, and angel investors.

Why they're on this list: ScaleArc finished sixth in Startup50.com voting. They've raised serious VC money and have a long string of customer wins.

At Interop last month Singh pointed out that the growth of online and mobile applications is straining traditional database infrastructures. For companies doing business online, application availability and performance are key determinants of the customer experience and, ultimately, revenue.

However, companies struggle with the complex challenge of growing their database infrastructures to handle increasing demand, without negatively impacting the customer experience, or consuming resources that may be better used elsewhere. Traditional SQL environments are bogged down by an increasing volume of database queries from a growing number of applications that need access to structured data -- leading to poor application performance and system outages.

And the problem is even worse for mobile applications, as performance takes an even bigger hit with increased latency.

Singh argues that companies need a way to optimize SQL query traffic without extensive modifications to existing applications or databases. To improve performance, they need to offload existing databases without investing in costly new infrastructure. Finally, they need full visibility into SQL traffic to more efficiently troubleshoot and resolve issues before they become major problems that impact revenue.

ScaleArc's flagship product, iDB, is software that sits transparently between applications and databases, requiring no modifications to either. ScaleArc claims that it can be deployed in about 15 minutes. Then, users gain visibility into all database traffic with granular real-time SQL analytics.

iDB provides instant scalability and higher availability for databases with dynamic clustering, load balancing and sharding capabilities, and it provides a transparent SQL-NoSQL hybrid caching engine, which lets any application use a NoSQL cache without any code changes or drivers.
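To make the general idea of a query cache sitting between an application and its database more concrete, here is a minimal Python sketch. It is not ScaleArc's implementation and says nothing about how iDB works internally. The class name `CachingQueryProxy` is hypothetical, the backing store is an in-memory SQLite database chosen only so the example runs anywhere, and the invalidation policy (clear everything on any write) is a deliberately crude assumption.

```python
import sqlite3


class CachingQueryProxy:
    """Hypothetical layer that caches SELECT results in front of a database."""

    def __init__(self, connection):
        self._conn = connection
        self._cache = {}  # (sql, params) -> list of result rows
        self.hits = 0
        self.misses = 0

    def execute(self, sql, params=()):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            # Any write may change results, so drop all cached reads.
            self._cache.clear()
            self._conn.execute(sql, params)
            self._conn.commit()
            return []
        key = (sql, tuple(params))
        if key in self._cache:
            self.hits += 1  # served without touching the database
            return self._cache[key]
        self.misses += 1
        rows = self._conn.execute(sql, params).fetchall()
        self._cache[key] = rows
        return rows


if __name__ == "__main__":
    db = CachingQueryProxy(sqlite3.connect(":memory:"))
    db.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    db.execute("INSERT INTO orders VALUES (?, ?)", (1, 9.5))
    db.execute("SELECT COUNT(*) FROM orders")  # miss: goes to the database
    db.execute("SELECT COUNT(*) FROM orders")  # hit: answered from the cache
    print(f"hits={db.hits} misses={db.misses}")
```

The application keeps issuing the same SQL it always did, which is the point of a transparent layer. A production system would need far finer-grained invalidation than this sketch uses.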

Market Potential and Competitive Landscape: ScaleArc estimates that this market space is worth more than $2 billion (they're far more conservative than most analysts).

Competitors include ScaleBase and ParElastic.

Existing customers include Demand Media, Disney UTV, KIXEYE, Sazze (dealspl.us), Flipkart, Weather Decision Technologies and others.

7. SiSense

What they do: Provide Big Data analytics platforms.

Headquarters: Redwood City, Calif., with an R&D Center in Tel Aviv, Israel

CEO: Amit Bendov. He was formerly CMO of Panaya and SVP of Worldwide Marketing at ClickSoftware.

Founded: 2010 (they were technically founded in 2004, but were really just a side project for the five founders until 2010, and their official launch was 2012)

Funding: In April, SiSense closed a $10 million Series B round of funding led by Battery Ventures with participation from Opus Capital and Genesis Partners. A $4 million Series A round was secured in 2010.

Why they're on this list: SiSense finished second in Startup50.com voting, has solid VC backing and a good-sized list of customers.

According to SiSense, traditional big data analytics solutions are like battleships: They're expensive, complicated to operate, and are actually overkill for most businesses, which just don't need that much processing. The typical business does not need to analyze petabytes of data. Rather, they'd be happy gaining insights on terabytes of data, but that's either too expensive or forces them to rely on in-memory solutions, which cannot later scale to handle massive amounts of data.

SiSense Prism is built to offer big data analytics technology to businesses of all sizes. With no coding or scripting required, business analysts can analyze data themselves, without having to draw IT or data scientists into the process. SiSense claims that Prism allows non-technical users to analyze 100 times more data than current in-memory analytics solutions, and it does so 10 times faster. There's no need to set up complex data warehouse systems or OLAP cubes.

Prism is powered by SiSense's Elasticube technology, which features a columnar data store, strong data compression, parallel processing, and advanced query optimization to offer analytical processing power previously available only with high-end solutions.
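For readers unfamiliar with columnar storage, the following Python sketch illustrates the general technique only. It makes no claim about how Elasticube is built. The sample data, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding as the compression scheme are all our own assumptions for illustration.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"region": "east", "units": 3},
    {"region": "east", "units": 5},
    {"region": "west", "units": 2},
    {"region": "west", "units": 7},
]


def to_columns(records):
    """Pivot row records into one list per field (a column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


if __name__ == "__main__":
    columns = to_columns(rows)
    # An aggregate over one field reads only that column, not whole records.
    print("total units:", sum(columns["units"]))
    # Columns with repeated values tend to compress well.
    print("encoded region column:", run_length_encode(columns["region"]))
```

Storing each field contiguously is why analytic queries that touch a few columns can skip most of the data, and why similar adjacent values compress well.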

Market Potential and Competitive Landscape: Wikibon believes the Big Data market will exceed $47 billion by 2017. SiSense competitors include Tableau, QlikView and SAP HANA.

Customers include NASA, ESPN, Target, eBay, fiverr, Online Commerce Group, Plastic Jungle and Magellan Vacations.

8. Skytree

What they do: Develop machine-learning-based platforms for Big Data analytics.

Headquarters: San Jose, Calif.

CEO: Martin Hack, who previously served as a director of marketing for GreenBorder Technologies (acquired by Google) and as a product line manager for SonicWALL.

Founded: 2012

Funding: In April 2013, Skytree secured $18 million in Series A funding. U.S. Venture Partners led the round and was joined by a new investor syndicate that includes UPS and Scott McNealy, co-founder and former CEO of Sun and Chairman of Wayin. Additional investors include Javelin Venture Partners and Osage University Partners. To date, Skytree has raised a total of $19.6 million.

Why they're on this list: Skytree finished in the top 10 in Startup50.com voting and has already lined up big-name customers.

According to Skytree, advanced analytics, contrary to popular belief, "is not a meat grinder into which you can dump data in one end and expect nuggets of wisdom to come out of the other end."

Skytree has created a general-purpose platform that lets data scientists focus on what Skytree says matters most, Mean Time to Insights (MTI), and on what they are good at: building and deploying analytic models rather than coding algorithms. Skytree is delivered as an application within a data center that can be used by many, as opposed to the traditional delivery model: an individual application used on a single PC.

Skytree argues that machine learning is the key that unlocks an entire treasure trove of predictions, customer recommendations, and anomaly detections that most people don't even know are possible. It does so by unleashing algorithms on massive amounts of data and finding patterns that data scientists didn't even know existed.

Competitive Landscape: Skytree says that most of the competition they run into is either from roll-your-own solutions or from legacy BI platforms from the likes of SAS and IBM, which potential customers may simply choose to stick with.

Customers include eHarmony, SETI, USGA and Adconion Media.

9. SumAll

What they do: Provide data analytics tools focused on delivering marketing, sales and social media insights.

Headquarters: New York, N.Y.

CEO: Dane Atkinson. He was formerly CEO of Squarespace.

Founded: 2011

Funding: SumAll is backed by two rounds of funding that total $7.5 million from Battery Ventures, Wellington Partners, Matrix and General Catalyst.

Why they're on this list: SumAll finished third in Startup50.com voting, and CEO Dane Atkinson has seen several startups through to successful exits.

SumAll's product is an analytics tool that helps businesses make more money by using their own data. SumAll tries to break down various data silos, from those associated with legacy apps to those involved with social media.

SumAll brings all the disparate revenue, payment, social and organic traffic data into one place so users can see the interactions across their business and understand whether a social campaign is driving traffic that converts into revenue. SumAll can help businesses figure out, say, the value of a "like" on Facebook or the value of a website visit.

Competitive Landscape: These aren't necessarily head-to-head comparisons, but SumAll will compete with Hootsuite, Nimble, GoodData and Kissmetrics.

Customers include Siemens, Diamond Candles and Urbio.

10. Xplenty

What they do: Provide Hadoop as a Service for Big Data analytics.

Headquarters: Tel Aviv, Israel

CEO: Yaniv Mor. Prior to founding Xplenty, Mor managed the NSW SQL Services practice at Red Rock Consulting.

Founded: 2012

Funding: They're backed by an undisclosed amount of seed funding raised from Magma Venture Capital in June 2012.

Why they're on this list: Hadoop is being hyped to the moon these days, but development, implementation and maintenance of Hadoop require a very specific and arcane skill set. Xplenty's goal is to eliminate your need to learn any of that.

Xplenty provides a data integration platform that processes Big Data. A drag-and-drop interface eliminates the need to write complex scripts or code of any kind.

Xplenty is cloud-based, so there is nothing to install on an end user's servers, and there is no software to download onto workstations. With automated server configuration, users simply point to a data source, configure the data transformation tasks and tell the platform where to write the results. Xplenty's platform uses SQL terminology, so for data analysts, the learning curve should be minimal.

Market Potential and Competitive Landscape: According to TechNavio, the Hadoop-as-a-Service market will top $19 billion by 2016. Xplenty's main competitor is Amazon Elastic MapReduce (EMR). Other Hadoop-as-a-Service competitors include Mortar Data, Qubole, and recently Microsoft with Hadoop on Azure. Rackspace is about to launch its own Hadoop-as-a-Service offering based on Hortonworks' distribution.

Jeff Vance is a freelance writer based in Santa Monica, Calif. Connect with him on Twitter @JWVance or by email at jeff@sandstormmedia.net.
