In-memory technologies move databases to real time
- 24 March, 2014 17:12
Last week, application-performance monitoring service provider New Relic launched an offering that allows customers to mine its operational data for business intelligence.
The new beta offering, called Insights, has been a big hit, according to the company. The service has been fielding queries that, on average, consult 60 billion metrics a minute, with an average response time of 58 milliseconds.
In-memory database technology has been key in this new line of business, said New Relic CEO Lew Cirne. In-memory databases are relational databases that run entirely within the working memory, or RAM, of a server, or set of servers.
Cirne expects that the service, once live, could be used in all sorts of ways, such as for customer service, security and targeted marketing. As soon as a user has a question about some aspect of operation, the service can return detailed metrics on that topic, drawing from data that has been captured just seconds before.
New Relic built the database from scratch and assembled a large cluster that can muscle through terabytes of data to quickly arrive at an answer. In-memory technology allows the service to provide answers within milliseconds, even when finding the answers involves combing through large sets of machine-generated data.
Once a boutique item for well-funded fast-trading financial firms, in-memory database systems are starting to become more widely used, thanks to the falling costs of server memory and the demands on the part of customers who've come to expect speedy Internet services, such as Amazon's.
"Customers can transform their business by taking advantage of this technology," said Eron Kelly, Microsoft general manager of SQL Server marketing.
Microsoft has released to manufacturing SQL Server 2014, which has in-memory capabilities built in. Big-data software company Pivotal also released the first full commercial version of its in-memory database for Hadoop, Gemfire HD.
Microsoft's and Pivotal's new offerings join an increasing number of databases with in-memory capabilities, including IBM Blu for DB2, SAP Hana, VoltDB's eponymous database, and Oracle TimesTen, among others.
Add to this list a growing number of caching tools that allow organizations to keep much of their relational database content in memory, such as Redis and Memcache. This approach is favored by Facebook, for instance, which uses MySQL to store user data, but relies on Memcache to get material quickly to users.
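The caching pattern described here can be sketched in a few lines of Python. This is a minimal illustration, not Facebook's or Memcache's actual implementation: the class name `ReadThroughCache`, the plain dictionary standing in for the disk-based database, and the least-recently-used eviction rule are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small in-memory cache in front of a slower backing store (hypothetical)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stand-in for a disk-based database
        self.capacity = capacity            # maximum number of cached entries
        self.entries = OrderedDict()        # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Served from memory; mark the entry as recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: fall back to the slower store, then remember the result.
        self.misses += 1
        value = self.backing_store[key]
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Drop the least recently used entry to stay within capacity.
            self.entries.popitem(last=False)
        return value


database = {"user:1": "Ada", "user:2": "Grace", "user:3": "Edsger"}
cache = ReadThroughCache(database, capacity=2)
for key in ["user:1", "user:1", "user:2", "user:3", "user:1"]:
    cache.get(key)
```

Only repeated requests for data still held in memory avoid a trip to the backing store, which is why such a layer pays off mainly for the most frequently requested material.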
Traditionally, an enterprise database would be stored on disk, because it would be far too large to fit in memory. Also, storing data on nonvolatile disk helps ensure that the material is captured for posterity, even if the power to the storage is cut off. With volatile RAM, if power is interrupted, all of its contents are lost.
These assumptions are being challenged, however. Online transactional databases, in particular, are being moved to main memory.
"If your data does not fit in main memory now, wait a year or two and it probably will," said Michael Stonebraker, a pioneer in database development who is chief technology officer at VoltDB, the company overseeing his latest database of the same name. "An increasing fraction of the general database market will be main-memory deployable over time."
Stonebraker's opinion is echoed by others in the business.
"If you have a bit of a performance pain, and you have the budget to pay for a market leading, general purpose, relational database management system anyway, then it makes sense to manage the data in RAM," said Monash Research head analyst Curt Monash.
Stonebraker admits the approach wouldn't work for all databases. Still, with today's memory prices, keeping a 100GB or even a 1TB database in main memory is not prohibitively expensive, depending on the level of responsiveness required and the amount of money that an organization is willing to spend.
"It is still too early to call it commodity, but in-memory is becoming commodity in a way," said Nicola Morini Bianzino, managing director for the SAP Platform Solutions Group at Accenture.
Bianzino said he has seen a shift in the questions he gets from clients and potential clients about in-memory over the past six months. "The questions are shifting from 'What is it?' to 'How do I do it?'" Bianzino said.
"The message has gone to the market and has been assimilated by clients," Bianzino said. "This doesn't mean they will move everything tomorrow, but they are taking it for granted that they will have to move in that direction."
With SQL Server 2014, Microsoft's approach to in-memory is to bundle it into its standard relational database platform. "We're not talking about an expensive add-on, or buying new hardware or a high-end appliance," Kelly said.
The SQL Server in-memory component can be used for online transaction processing, business intelligence and data warehousing.
The interesting thing about in-memory is not only that it can expedite current database operations, but that it can create entirely new lines of business, Kelly said.
As an example, Kelly pointed to online auto parts reseller Edgenet.
Using a beta version of SQL Server 2014, "Edgenet has been able to transform its business to respond much faster to competitive threats, by enabling dynamic pricing on their website," Kelly said. The company can change prices of goods for customers in a given market, based on what the latest prices are from regional competitors, which may run spot sales on certain items.
Although dynamic pricing can be done with a standard relational database, the practice could lead to contention issues, in which updating the prices slows responses to the point where end users may not see the current price right away, Kelly said.
"In the way SQL Server does memory, it eliminates latching. So the user does not see any delays as they access the system," Kelly said. (Edgenet is currently going through Chapter 11 bankruptcy protection, so perhaps the use of dynamic pricing will help the company regain its footing with the regional competitors.)
Pivotal's newly launched Gemfire HD extends in-memory databases to big data, through its integration with the Apache Hadoop data-processing platform.
"Gemfire is essentially a SQL database in-memory that can pull data from the Hadoop File System [HDFS], or persist data down into HDFS," said Michael Cucchi, Pivotal senior director of product marketing.
Gemfire can ingest large amounts of data extremely quickly, even when the data is coming from multiple sources, Cucchi said.
"If you have to handle a 1,000 requests a second, you insert the in memory layer of Gemfire and it effectively offloads the real-time requirements for the application," Cucchi said.
This technology could be used, for instance, by a wireless telecommunication provider. Gemfire could be used to ensure all the calls it handles at any moment can be routed through the best network path at that time, Cucchi said.
Not everyone is convinced that special in-memory databases are needed. Many others have taken the approach of running an in-memory caching layer above the database, to serve the most requested fields.
"People have been doing in-memory databases for years, where you have terabytes of RAM across many machines, all fronting MySQL," said Bryan Cantrill, software engineer at the Joyent cloud service. Cantrill points to the growing popularity of Memcache, which is being widely used as a front end for relational databases.
What should an organization do if the amount of material it has on hand exceeds the available working memory?
Many in-memory technologies, such as Microsoft's and IBM's Blu, are actually hybrids, in that they can still store some material on disk, and only keep the most frequently consulted data in working memory.
Like other in-memory systems, SQL Server 2014 has a few tricks to preserve data should the power go out, causing data to disappear from the volatile RAM. The tables are never written back to disk, but any changes are written to data logs. Should the power go out, the database can rewrite the lost data using the logs.
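The general idea of logging changes and replaying them after a failure can be shown with a short Python sketch. It is not a description of SQL Server's internals: the `LoggedTable` class, the JSON-lines log format and the file name `changes.log` are assumptions chosen to keep the example small.

```python
import json
import os
import tempfile


class LoggedTable:
    """In-memory table whose changes are appended to a log file (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = {}  # the table itself lives only in memory
        if os.path.exists(log_path):
            self._replay()

    def _apply(self, record):
        if record["op"] == "put":
            self.rows[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.rows.pop(record["key"], None)

    def _replay(self):
        # Rebuild the in-memory rows by re-applying every logged change in order.
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _log_and_apply(self, record):
        # Make the change durable first, then update the in-memory rows.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(record)

    def put(self, key, value):
        self._log_and_apply({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._log_and_apply({"op": "delete", "key": key})


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "changes.log")
        table = LoggedTable(path)
        table.put("sku-1", 19.99)
        table.put("sku-2", 5.0)
        table.put("sku-1", 17.5)
        table.delete("sku-2")
        del table  # simulate losing the contents of RAM
        return LoggedTable(path).rows  # a fresh instance recovers from the log


recovered_rows = demo()
```

Because every change reaches the log before it is applied in memory, a new instance started against the same log ends up with the same rows the lost instance held.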
SQL Server 2014 comes with diagnostic technologies that can examine the user's databases and suggest the tables that should be moved to memory, based on how often they are consulted. "Just move the tables that are hot into memory. That will allow you to get big performance gains without having to buy new hardware," Kelly said.
No changes are needed at the application layer, which still sees the same database interface, Kelly said.
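How such a recommendation might be derived from access frequency can be sketched as follows. This is only an illustration of the idea, not Microsoft's diagnostic tooling: the function `suggest_hot_tables`, the access log, the table sizes and the memory budget are all invented for the example.

```python
from collections import Counter


def suggest_hot_tables(access_log, table_sizes_gb, memory_budget_gb):
    """Pick the most frequently consulted tables that fit in the memory budget."""
    access_counts = Counter(access_log)
    chosen = []
    remaining_gb = memory_budget_gb
    # Walk the tables from most to least frequently accessed.
    for table, _count in access_counts.most_common():
        size_gb = table_sizes_gb[table]
        if size_gb <= remaining_gb:
            chosen.append(table)
            remaining_gb -= size_gb
    return chosen


# Hypothetical workload: each entry is one query touching the named table.
access_log = ["prices", "prices", "orders", "prices", "orders", "archive"]
table_sizes_gb = {"prices": 4, "orders": 10, "archive": 50}
suggestion = suggest_hot_tables(access_log, table_sizes_gb, memory_budget_gb=16)
```

In this made-up workload the small, heavily used tables are selected and the large, rarely touched archive stays on disk.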
Researchers are experimenting with other approaches as well. Stonebraker is investigating an approach called reverse caching, or anti-caching. In traditional caching, the most frequently consulted data is kept live in memory. With reverse caching, only the data that is rarely consulted is written to disk.
"We have compared caching with anti-caching with a prototype at the [Massachusetts Institute of Technology] and anti-caching is a way better idea than caching," Stonebraker said.
Part of the problem with caching is that when a caching system, such as Memcache, pulls data into memory from disk, it pulls in the entire block of data on disk, which usually includes additional records. So the approach makes for inefficient use of memory. In contrast, a memory-first approach only writes to disk those specific entries that are rarely consulted.
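The memory-first idea can be illustrated with a small Python sketch. It is not the MIT prototype: the `AntiCacheStore` class, the dictionary standing in for disk and the least-recently-used choice of which records count as "rarely consulted" are assumptions made for the example.

```python
from collections import OrderedDict


class AntiCacheStore:
    """Memory-first store that pushes only its coldest records out (hypothetical)."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # maximum number of records kept in memory
        self.hot = OrderedDict()          # in-memory records, coldest first
        self.cold = {}                    # stand-in for records written to disk

    def put(self, key, value):
        self.cold.pop(key, None)          # a fresh write supersedes any evicted copy
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._evict_coldest()

    def get(self, key):
        if key not in self.hot:
            # Bring back just this one record, not a whole block of neighbors.
            self.hot[key] = self.cold.pop(key)
        self.hot.move_to_end(key)
        value = self.hot[key]
        self._evict_coldest()
        return value

    def _evict_coldest(self):
        # Move individual least recently used records out until memory fits.
        while len(self.hot) > self.memory_limit:
            cold_key, cold_value = self.hot.popitem(last=False)
            self.cold[cold_key] = cold_value


store = AntiCacheStore(memory_limit=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)   # "a" is now the coldest record and is moved out
store.get("b")
store.get("a")      # "a" returns to memory; "c" is moved out instead
```

Memory is the primary home of the data here, and eviction works record by record, so the space in RAM is spent only on entries that were actually requested.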
Whether through an enterprise-ready in-memory database system, or an open-source caching layer, the organization is now able to serve its customers and its business managers much more quickly.