
PostgreSQL devs lift open source database to enterprise heights

More fault-tolerance features on the roadmap

The release of the first beta of version 9.1 of the open source PostgreSQL database has ushered in enterprise-class reliability and data integrity features that can compete with the big names, say its developers.

PostgreSQL core team member Josh Berkus said the upcoming 9.1 release introduces a new level of fault tolerance through its synchronous replication feature: the database can now guarantee that a committed write will survive a failure of the master node.
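In practice, this is driven by a couple of configuration settings on the master; a minimal sketch, assuming a single standby registered under the hypothetical name 'standby1':

    # postgresql.conf on the master
    synchronous_standby_names = 'standby1'  # standby that must acknowledge each commit
    synchronous_commit = on                 # commits wait for the standby's confirmation

    # recovery.conf on the standby: application_name must match the name above
    primary_conninfo = 'host=master.example.com application_name=standby1'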

“With this version, we have completed the feature set offered by Oracle Data Guard, for example, which will allow users like NTT to migrate away from Oracle,” Berkus said.

There are more fault-tolerance and clustering features in the pipeline for 9.2, but the core team felt it was important to make the most basic synchronous replication case work as well as it could before moving on to more complex configurations.

Berkus said a number of improvements on this front have been discussed for 9.2, including quorum commit (where a write must persist to, say, three out of five replicas) and in-memory synch, so that synching to disk need not be a concern and durability can come from many replicas at network speed.

For “hot-hot” fault-tolerance in a cluster configuration, Berkus said there are some projects in development outside the core project.

“It’s already possible to construct a large fault-tolerant cluster of PostgreSQL using Skype's SkyTools framework if you are building a new application,” he said, adding that Skype has some 200 servers running PostgreSQL in production.

“EnterpriseDB and NTT are working on a project called Postgres-XC, which is intended to fill the use-case of Oracle RAC, although its design is somewhat different. Thirdly, the Postgres-R project, a group-communication-based clustered PostgreSQL, has recently become fully funded and may produce a release version this year.”

The 9.1 release is set to include a number of features aimed at helping database administrators, which Berkus said is a direct result of the development model.

“Through the mailing lists and chat, the PostgreSQL developers probably hear from 1000 PostgreSQL DBAs a month. So yes, a lot of our development is in response to direct user feedback,” he said, adding a “good number” of code contributors also use PostgreSQL in production.

Wisconsin Courts does Serializable Snapshot Isolation

The court system in the US state of Wisconsin is using some 100 PostgreSQL databases for the core application used by 72 county circuit courts.

About 3000 users directly connect to the production databases in the various court houses and there are dozens of Web applications receiving millions of hits each day.

Kevin Grittner, whose actual job title at the Wisconsin Court System (WCS) is “database administrator, expert”, said that with a lot of replication between the disparate courts and the central information repository, he is tasked with making sure performance is good, managing the security and integrity of the data, and working with programmers to ensure “sound database design” in new development.

“Our current application framework is over 10 years old, and we have decided to rework it using newer technology,” Grittner said. “Now that we have converted everything to PostgreSQL, we intend to eliminate some of the database portability code and use more PostgreSQL-specific features.”

In moving closer to the database, WCS has committed to PostgreSQL for the long term and needed to deal with the problems posed by snapshot isolation.

Grittner received approval to work on Serializable Snapshot Isolation (SSI) so PostgreSQL would maintain data integrity in this new environment “without excessive cost or performance problems”.

“The circuit court database schema has about 400 tables [and] we have over 20 programmers working full time on the applications which hit these databases, with frequent new releases which extend the schema and add new transaction types,” Grittner said.

“Trying to train all of these programmers in the tricks of maintaining data integrity under snapshot isolation amidst all of this flux is not really feasible.”

With SSI, Grittner says, the rule is simply “make sure this transaction will do the right thing if it is the only transaction running”.

“If they do that, it will do the right thing in any mix of transactions, or roll back with a serialisation failure,” he said. “This protects our data integrity while keeping down the development costs.”
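A minimal sketch of what that looks like in SQL, using a hypothetical fees table (the schema and values here are assumptions for illustration, not WCS's actual design):

    -- Each programmer writes the transaction as if it ran alone
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT sum(amount) FROM fees WHERE case_id = 42;
    INSERT INTO fees (case_id, amount) VALUES (42, 100.00);
    COMMIT;
    -- Under a conflicting concurrent mix, one transaction may abort with
    -- SQLSTATE 40001 (serialization_failure); the application simply retries it.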

Integrated extension management

Extension management is more associated with Web browsers than databases, but at PostgreSQL services company 2ndQuadrant, database architect and PostgreSQL contributor Dimitri Fontaine said the extension problem has been an itch he has been “wanting to scratch” for a long time.

“In my usage of PostgreSQL, add-on packages have been a customary choice for years, yet dump and restore was always problematic because of that,” Fontaine said. “It was high time for me to find the time to work on solving that, and I've spent two years talking about the problem and proposing a design that would be widely accepted.”

Fontaine, who is based in France, said no two people within PostgreSQL agreed at first on what an extension was, and because the extension feature touches many kinds of database objects, it has a big impact across the source code, so it needed strong community buy-in.


With funding help from the European Union's Seventh Framework Programme, Fontaine worked to come up with a solution for extensions.

“PostgreSQL code extensibility is awesome and has been put to great use already, but it lacked support for DBAs and daily operations,” he said. “Mainly, when you install an extension before 9.1, its install script turns up as-is in the dump, which means you have to have the exact same version already half-installed on the database where you want to restore your dump.”

With extensions in core, all the dump contains is a single dedicated SQL utility command, and DBAs can restore a database on a newer system with a newer version of the extension if it is compatible.
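As a rough illustration, using the stock hstore extension (any packaged extension works the same way):

    CREATE EXTENSION hstore;        -- install the packaged extension in one step
    -- pg_dump now records only this one command instead of the extension's objects,
    -- so the dump restores cleanly against whatever compatible version the new server ships
    ALTER EXTENSION hstore UPDATE;  -- later, upgrade the extension in place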

Tables unlogged for more speed

Robert Haas, database architect at commercial PostgreSQL software company EnterpriseDB and major contributor to the PostgreSQL project, is working on unlogged tables, which, he says, are useful where an organisation is willing to trade the possibility of losing data in a crash for more speed.

“For example, if you are writing a Web application, you might make the user-sessions table unlogged,” he said. “If the database crashes, everyone will have to log back in, but that shouldn't happen often enough to be a serious inconvenience.”
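A minimal sketch of that user-sessions case (the table definition is an assumption for illustration):

    -- Unlogged tables skip write-ahead logging entirely; after a crash the table
    -- is truncated, so it suits data that can be rebuilt cheaply, like sessions.
    CREATE UNLOGGED TABLE user_sessions (
        session_id text PRIMARY KEY,
        user_id    integer NOT NULL,
        last_seen  timestamptz NOT NULL DEFAULT now()
    );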

Haas was originally interested in implementing global temporary tables — a table where each session sees separate table contents, but the schema is shared among all sessions.

“In the process of thinking about that, the idea of unlogged tables occurred to me and seemed like it would be independently useful and simpler to implement,” he said.

Haas had wanted to include global temporary tables in 9.1 as well, but ran out of time.

“Unlogged tables are also useful for people who are using PostgreSQL for ETL or other batch-processing operations,” he said. “If the database crashes midway through the batch it can be restarted, so the powerful durability guarantees that PostgreSQL normally offers are not necessary in this case, and unlogged tables provide a way of relaxing those guarantees in a controlled way, without risking undetectable data corruption.”

Haas said the performance gain from avoiding write-ahead logging in these cases can be substantial.

The final release of PostgreSQL 9.1 is due around August 2011.

Follow Rodney Gedda on Twitter: @rodneygedda

Follow TechWorld Australia on Twitter: @Techworld_AU