CIO

10 Big Data startups to watch

The Big Data space is heating up to the point that many pundits already see it as the over-hyped heir to "cloud." The hype may be a bit much, but Big Data is already living up to its potential, transforming entire business lines, such as marketing, pharmaceutical research, and cyber-security.

While the space is still fairly new, IDC, for one, sees big things ahead. The research firm predicts that the market for Big Data technologies will reach $32.4 billion by 2017, growing at about six times the rate of the overall information and communication technology market.

The startups below were chosen based on a mix of third-party validation (VC funding, named customers), experience (pedigree of the management team), and market potential (how unique is the product; how much pent-up demand is there for this sort of solution; how well are they positioned competitively). We also mixed slightly older startups on the brink of making it big with early stage startups exhibiting raw potential.

1. Sumo Logic

What they do: Apply machine learning to data center operations, using data analysis to pinpoint anomalies, predict and uncover potentially disruptive events, and identify vulnerabilities.

Headquarters: Redwood City, Calif.

CEO: Vance Loiselle, formerly VP of Global Services at BMC. He joined BMC via the acquisition of BladeLogic, which he co-founded. BMC acquired BladeLogic for $800 million.

Founded: 2010

Funding: $50 million from Accel Partners, Greylock Partners, and Sutter Hill Ventures.

Why they're on this list: Sumo Logic claims to address the "unknown unknown" problem of machine data: how do you get insights about data that you don't know anything about, or, worse, what do you do when you don't even know what you should be looking for?

Sumo Logic argues that managing machine data (the output of every application, website, server, and supporting IT infrastructure component in the enterprise) is the starting point for IT data analysis. Many IT departments hope they will be able to improve system or application availability, prevent downtime, detect fraud, and identify important changes in customer and application behavior by studying machine logs. However, traditional log management tools rely on pre-determined rules and thus fail to help users proactively discover events they don't anticipate.

Sumo Logic's Anomaly Detection attempts to solve this pain point by enabling enterprises to automatically detect events in streams of machine data, generating previously undiscoverable insights within a company's entire IT and security infrastructure and allowing remediation before an issue impacts key business services.

Sumo Logic uses pattern-recognition technology to distill hundreds of thousands of log messages into a page or two of patterns, dramatically reducing the time it takes to find a root cause of an operational or security issue.
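To make the idea of distilling log messages into patterns concrete, here is a minimal Python sketch. It is not Sumo Logic's algorithm, which the article does not describe; it assumes a much simpler technique, masking variable-looking tokens and counting the templates that remain. The masking rules, function names, and sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption, not Sumo Logic's method):
# replace tokens that look variable with fixed placeholders.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_pattern(message):
    """Reduce one log message to a template by masking variable parts."""
    for regex, placeholder in _MASKS:
        message = regex.sub(placeholder, message)
    return message


def summarize(messages):
    """Return (pattern, count) pairs, most frequent pattern first."""
    return Counter(to_pattern(m) for m in messages).most_common()


# Made-up sample log lines for illustration only.
logs = [
    "Connection from 10.0.0.1 timed out after 30 seconds",
    "Connection from 10.0.0.7 timed out after 45 seconds",
    "User 1042 logged in",
    "User 7 logged in",
    "Disk usage at 91 percent",
]
patterns = summarize(logs)
for pattern, count in patterns:
    print(count, pattern)
```

Five messages collapse into three patterns here. A production system would presumably learn its templates from the data rather than rely on hand-written masks.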

Customers include Netflix, McGraw-Hill, Orange, Pagerduty, and Medallia.

Competitive Landscape: Sumo Logic will compete with CloudPhysics, Splunk, and open-source alternatives like Elasticsearch and Kibana.

2. Ayasdi

What they do: Apply Big Data analysis in order to solve complex problems, including finding cures for cancers and other diseases, exploring new energy sources, and preventing terrorism and financial fraud.

Headquarters: Palo Alto, Calif.

CEO: Gurjeet Singh, who was previously a Research Scientist at Stanford.

Founded: The company was founded in 2008 but stayed in stealth-mode until its launch in January 2013.

Funding: Ayasdi has raised $43.4 million in VC funding from FLOODGATE, Khosla Ventures, Institutional Venture Partners, GE Ventures, and Citi Ventures. The company also received $1.2 million in DARPA and NSF grants.

Why they're on this list: According to Ayasdi, since the creation of SQL in the 1980s, data analysts have tried to find insights by asking questions and writing queries. The query-based approach has two fundamental flaws. First, all queries are based on human assumptions and biases. Second, query results only reveal slices of data and do not show relationships between similar groups of data. While this method can uncover clues about how to solve problems, it is a game of chance that usually results in weeks, months, and years of iterative guesswork.

Ayasdi believes a better approach is to look at the "shape" of the data. Ayasdi argues that large data sets have a distinct shape, or topology, and that shape has significant meaning. Ayasdi claims to help companies determine that shape in minutes so they can automatically discover insights from their data without ever having to ask questions, formulate queries, or write code.

Ayasdi's Insight Discovery platform uses Topological Data Analysis (TDA) in tandem with machine learning techniques to enable data scientists, domain experts, and business analysts to optimize their data without coding.

Customers include GE, Citi, Merck, USDA, Mt Sinai Hospital, the Miami Heat, and the CDC.

Competitive Landscape: The machine learning space is wide open. Ayasdi will compete against IBM's Watson, SAS, and Skytree.

3. Feedzai

What they do: Feedzai uses real-time, machine-based learning to help companies prevent fraud.

Headquarters: San Mateo, Calif.

CEO: Nuno Sebastião. Prior to Feedzai, he led the development of the European Space Agency's satellite simulation infrastructure.

Founded: 2013

Funding: Feedzai has raised $4.3 million from SAP Ventures, Data Collective, and other international investors.

Why they're on this list: It's no great revelation that online fraud is a major problem. However, its impact is often underestimated. For instance, the Target breach could end up costing as much as $680 million, according to the Ponemon Institute.

Feedzai claims that it can detect fraud in any commerce transaction, whether the credit card is present or not, in real-time. Feedzai uses artificial intelligence (AI) to build more robust predictive models and analyze consumer behavior in a way that mitigates risk, protects consumers and companies from fraud, and preserves consumer trust.

Feedzai's software attempts to understand the way consumers behave when they make purchases anywhere, online or off. Feedzai says that its fraud detection system aggregates both online and offline purchases for each consumer over a longer time-frame, which results in earlier, more reliable detection rates.

The software uses data to create profiles for each customer, merchant, location, and POS device, with up to a three-year history of data behind each one. Profiles are updated for each consumer after every transaction. As a result, Feedzai claims to be able to detect fraud up to 10 days earlier than traditional methods and expose up to 60 percent more fraudulent transactions.
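As a rough illustration of a profile that is updated after every transaction, the Python sketch below keeps a running mean and variance of purchase amounts per entity (using Welford's online update) and flags amounts far from that history. This is not Feedzai's model, which the article does not detail; the class, function names, threshold, and sample amounts are assumptions chosen for the example.

```python
import math
from collections import defaultdict


class Profile:
    """Running statistics for one entity (card, merchant, POS device...)."""

    def __init__(self):
        self.n = 0        # number of transactions seen so far
        self.mean = 0.0   # running mean of transaction amounts
        self.m2 = 0.0     # running sum of squared deviations

    def update(self, amount):
        """Fold one transaction into the profile (Welford's update)."""
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def is_unusual(self, amount, threshold=3.0, min_history=5):
        """Flag amounts more than `threshold` std devs from the mean.

        threshold and min_history are illustrative assumptions.
        """
        if self.n < min_history:
            return False  # too little history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return amount != self.mean
        return abs(amount - self.mean) / std > threshold


profiles = defaultdict(Profile)


def score_and_update(entity_id, amount):
    """Score a transaction against the profile, then update the profile."""
    profile = profiles[entity_id]
    flagged = profile.is_unusual(amount)
    profile.update(amount)
    return flagged


# Made-up purchase history for a hypothetical card.
history = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0]
flags = [score_and_update("card-123", a) for a in history]
suspicious = score_and_update("card-123", 950.0)
```

In this sketch, none of the routine purchases is flagged, while the 950.00 charge is. A real system would presumably combine many more signals than the amount alone.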

Clients include Coca-Cola, Logica, Vodafone, Ericsson, SIBs, Payment Solutions, and Servebase Credit Card Solutions.

Competitive Landscape: Competitors include SiftScience, Signifyd, Kount, and Retail Decisions (ReD).

4. CloudPhysics

What they do: Provide intelligent operations management for virtualized workloads.

Headquarters: Mountain View, Calif.

CEO: John Blumenthal, former director of product management at VMware.

Founded: 2011

Funding: CloudPhysics is backed by $12.5 million in VC funding, including a recent $10 million Series B round led by Kleiner Perkins Caufield & Byers, which adds to previously raised funds from angel investors and the Mayfield Fund.

Why they're on this list: Virtualization and cloud management platforms lack actionable information that admins can use to better design, configure, operate, and troubleshoot their systems. However, having lots of data points is not enough. Not all data is equally valuable. In order to go beyond basic data capture, decision makers need to be able to validate, evaluate, and assess information from a variety of perspectives in order to make real, impactful decisions.

CloudPhysics' goal is to analyze the world's IT data and use that information to transform computing, driving out machine and human costs in ways never before possible. Today, the company's servers receive a daily stream of 100+ billion samples of configuration, performance, failure, and event data from its global user base.

CloudPhysics' service combines Big Data analytics with data center simulation and resource management techniques. CloudPhysics argues that this approach uncovers hidden complexities in the infrastructure, discovers inefficiencies and risks that drain and endanger resources, and enables what-if analyses that can inform every data center decision.

Customers include Equinix, North Shore Credit Union, and United Technologies.

Competitive Landscape: This space is a land grab at the moment. Competitors include Splunk and Sumo Logic.

5. BloomReach

What they do: Provide Big Data marketing applications.

Headquarters: Mountain View, Calif.

CEO: Raj De Datta, formerly Entrepreneur-in-Residence at Mohr-Davidow Ventures and Director of Product Marketing at Cisco.

Founded: 2009; remained in stealth mode until February 2012.

Funding: BloomReach has raised $41 million in three rounds of funding from Bain Capital Ventures, NEWA, and Lightspeed Venture Partners.

Why they're on this list: Forrester Research estimates that the U.S. e-commerce market will hit $370 billion by 2017; worldwide, the market already topped $1 trillion, according to eMarketer.

Connecting these consumers with the products and content that they want and need means that smart businesses end up capturing an ever larger slice of that market. Companies like Amazon, Blue Nile, and even Walmart already leverage large-scale data and tech advantages. To compete with these companies, smaller retailers need to reach their audiences with increasing precision and accuracy.

BloomReach's Organic Search combines web-wide intelligence and site-level content knowledge with machine learning and natural language processing to predict demand and dynamically adapt pages to match consumer behavior and intent. This helps companies capture up to 60 percent of net-new users. BloomReach also takes a data-driven approach to m-commerce, more accurately matching consumers with content and products. This increases revenue-per-site-visit by up to 40 percent, and drives sales across all shopping channels.

Customers include Guess, Deb Shops, and Neiman Marcus.

Competitive Landscape: Big Data marketing platforms are popping up faster than weeds after the first spring rains. While behemoths like Google, Amazon, and IBM have similar technologies, they keep them in-house. Others providing similar services include Kontera, DataSong, and Persado.

6. Altiscale

What they do: Provide Hadoop-as-a-Service (HaaS).

Headquarters: Palo Alto, Calif.

CEO: Raymie Stata, who was previously CTO of Yahoo.

Founded: March 2012

Funding: Altiscale is backed by $12 million in Series A funding from General Catalyst and Sequoia Capital, along with investments from individual backers.

Why they're on this list: The market for Hadoop-as-a-Service is rapidly evolving. Hadoop is quickly becoming a key underlying technology for Big Data, but the problem is that Hadoop is both relatively new and rather complicated, making it difficult for organizations to find the talent to deploy and manage Hadoop-based applications.

Altiscale's service is intended to abstract the complexity of Hadoop. Altiscale's engineers set up, run, and manage Hadoop environments for their customers, allowing customers to focus on their data and applications. When customers' needs change, services are scaled to fit, one of the core advantages of a cloud-based service.

Customers include MarketShare and Internet Archive.

Competitive Landscape: Amazon Web Services (AWS) is the 800-pound gorilla, but Altiscale will also compete with Cloudera and Hortonworks.

7. Pursway

What they do: Pursway uses big data analytics and proprietary algorithms to help companies identify the customers who are most likely to influence how people in their social networks shop.

Headquarters: Herzliya, Israel; U.S. HQ: Waltham, Mass.

CEO: Dave Ellenberger, who previously served as CEO of 170 Systems.

Founded: 2009

Funding: $17 million from Battery Ventures and Globespan Capital Partners.

Why they're on this list: In an era of social-savvy, data-driven marketing initiatives, marketers are increasingly looking for ways to unlock the power of relationship-based marketing. Most consumer behavior is influenced by the opinions of people we know and trust: family, friends, and colleagues. While marketers have known this for quite a while, they have trouble acting on it.

Pursway's software is intended to improve customer acquisition, cross-selling opportunities, and retention. By imprinting a social graph onto existing customer and prospect data, identifying actual relationships between buyers, and identifying target customers who have a demonstrated influence over others' purchasing decisions, Pursway argues that it can help consumer-facing organizations close the gap between how businesses market and how people actually buy.

Customers include Sony, Orange, and Comcast.

Competitive Landscape: Competitors include Angoss, IBM, and SAS.

8. PlaceIQ

What they do: Provide a data-driven mobile advertising and consumer targeting platform.

Headquarters: New York, NY

CEO: Duncan McCall, who previously founded PublicEarth.

Founded: 2010

Funding: The company is backed by $27.75 million raised in three rounds of funding from IA Ventures, Social Leverage, kbs+ Ventures, Neu Venture Capital, US Venture Partners, Valhalla Partners, Harmony Partners, and Iris Capital.

Why they're on this list: Mobile advertising and marketing present a unique challenge. The typical way companies try to understand consumer behavior online is through cookies. On smartphones and tablets, cookies don't have as much traction. Even if cookies are enabled in mobile browsers, they aren't terribly useful, since browsers are giving way to apps.

However, a potentially better replacement is location. Just as cookies track your journeys through the Web, marketers can glean demographic information from the actual physical locations you've visited.

PlaceIQ says that it "provides a multidimensional depiction of consumers across location and time." This allows brands to define audiences and intelligently communicate with those audiences to support greater ROI. PlaceIQ's product, Audiences Now, focuses on targeting customers where they are, in real time, creating an immediacy to a brand's marketing strategy.

Customers include Mazda, Disney, and Montana Tourism.

Competitive Landscape: The competition includes Verve Mobile, xAd, Placed, Sense Networks, jiWire, 4INFO, and Millennial Media.

9. MemSQL

What they do: Provide in-memory database technology for real-time Big Data analytics.

Headquarters: San Francisco

CEO: Eric Frenkiel. Before MemSQL, he worked at Facebook on partnership development.

Founded: 2011

Funding: The company is backed by $45 million in funding from Accel Partners, Khosla Ventures, First Round Capital, and Data Collective. Its most recent funding was a $35 million Series B closed in January 2014.

Why they're on this list: Big Data and real-time analytics have the potential to profoundly impact the way organizations operate and how they engage with customers. However, there are challenges that prevent companies from fully extracting value from their data. Legacy database technologies are prone to latency, require complex and expensive architectures, and rely on slow disk-based technology.

The result is an outdated computing infrastructure that cannot handle the velocity and volume of data in the timeframe required of a true real-time solution.

MemSQL says that it solves this performance bottleneck with a distributed in-memory computing model that runs on cost-effective commodity servers. MemSQL's in-memory SQL database accelerates applications, powers real-time analytics, and combines structured and semi-structured data into a consolidated Big Data solution. MemSQL says that it empowers organizations to make data-driven decisions, which helps them to better engage customers, discover competitive advantages, and reduce costs.

Customers include Comcast, Zynga, Ziff Davis, and Shutterstock.

Competitive Landscape: Competitors include incumbents like SAP and Oracle, the open-source platform MongoDB, and startups such as Aerospike and Platfora.

10. Couchbase

What they do: Provide NoSQL database technology.

Headquarters: Mountain View, Calif.

CEO: Bob Wiederhold. He formerly served as chairman and CEO of Transitive, which was acquired by IBM in 2008.

Founded: 2011

Funding: Couchbase has raised a total of $56 million in funding from Adams Street Partners, Accel Partners, Mayfield Fund, North Bridge Venture Partners, Ignition Partners, and DoCoMo Capital.

Why they're on this list: The landscape for Big Data database technology is in flux. Hadoop and NoSQL seem to be the platforms most organizations favor, although plenty are still betting on SQL.

Couchbase is placing its bet on NoSQL. The startup argues that its NoSQL document-oriented database technology provides the scalability and flexible data modeling needed for Big Data-scale projects. Couchbase also claims to offer the first NoSQL database for mobile devices.

Customers include AOL, Cisco, Concur, LinkedIn, Orbitz, Salesforce.com, Zynga, Amadeus, McGraw-Hill Education, and Nielsen.

Competitive Landscape: Competitors include MongoDB and DataStax.

Jeff Vance is a Santa Monica-based writer. He's the founder of Startup50, a site devoted to emerging tech startups. Follow him on Twitter @JWVance, or reach him by email at jeff@sandstormmedia.net.
