New York Public Library reads up on the cloud
- 25 August, 2016 20:00
Four years ago, the New York Public Library began to move its web properties to the cloud.
Today, the library system has all of its approximately 80 websites in the cloud. The library has shrunk its number of on-premises servers by 40% and is running those web properties 95% more cheaply than if it had bought the hardware and software to do it all itself.
The library took a risk on the cloud, and on Amazon Web Services (AWS), and it paid off.
"We've grown but we've grown in the cloud," said Jay Haque, director of DevOps and Enterprise Computing at the library. "Today, we're primarily focused on the digital identity of the NYPL. How our properties look. How they merge and integrate. How our patrons use the site … Without the cloud, we wouldn't have the time to focus on the customer experience."
The NYPL is the largest public library system in the U.S. based on the size of its book collection and the amount of materials borrowed annually.
With more than 90 neighborhood branches, four research centers and about 67,000 free programs under the library's umbrella, the system houses more than 51 million items, ranging from books and e-books to research collections. The collection also includes a 1493 copy of Columbus's letter announcing his discovery of the New World and a collection of 40,000 restaurant menus dating as far back as 1843.
The library system serves more than 17 million patrons a year through all of its branches.
The NYPL is also focused on serving millions of people online, reaching out to a global audience that can't regularly walk through its doors.
With the library system's 80 websites, users can browse its immense collections, find a list of library-recommended international novels, download e-books, find blogs, watch videos of author interviews and view more than 800,000 digitized items, such as maps and photos.
Making all of that work seamlessly, quickly and without pause is Haque's job.
He started at the NYPL as a page in 1992, then went on to work in IT as a support technician until he left for another job in 1997. However, library work called him back 17 years ago, and he's stayed on since, designing web pages, taking care of servers and working his way up.
Early in its web development, the library's online presence wasn't so well used. Haque said if a website went down, no one would notice. They wouldn't even worry about getting it back online till the next morning.
"Now the website can't go down," he said. "If it goes down for two minutes, people are screaming."
The need for reliable websites was a big part of the push to move to the cloud in 2012.
"The major problem was getting to that sweet spot of having that high level of availability and resilience without spending money on the initial capital outlay to do it," Haque told Computerworld. "We realized that to do that on-premise we needed a significant amount of hardware and software at significant cost to meet the modern demands of a highly available, highly secure and automated system that we would need to be nimble."
What would that have cost? Likely between $1 million and $2 million.
The NYPL IT team wasn't unfamiliar with the cloud; it already used different cloud platforms for certain projects.
Between 2009 and 2010, the library system made its first cloud move, trading in IBM's on-premises Lotus Notes for Google Apps. The library also migrated from Oracle's PeopleSoft human resources software to the cloud-based Workday.
When it was time to think about moving its web infrastructure to the cloud, the library turned to AWS.
It wasn't an easy decision.
"Back four years ago, AWS was still fairly new," Haque said. "Most CIOs were on the fence about AWS at first. There wasn't much Fortune 500 on there yet."
However, AWS still had more traction than other cloud vendors, and the library system could easily find consultants to help with the cloud move.
The library's IT staff also liked the idea of the AWS pay-as-you-go model, so they got started by deploying one website on the AWS public cloud.
"We realized very quickly it was going to be cheaper than anything we could do," Haque said. "It was easier to manage and we realized the benefits of multiple data centers and the repeatability of the platform -- all without increased cost."
More than not increasing the cost, Haque said he figures it was in "the neighborhood of 95% cheaper."
Today, every website they build is on the AWS platform.
"Four years ago, yes, it was a smart idea for them," said Ezra Gottheil, an analyst with Technology Business Research. "Now, they could use any of them, but AWS and Google are probably best for what they need. Starting out, you just buy what you need. You don't need a strategy or a road map."
The NYPL's cloud move has gone so well that it doesn't buy on-premises servers anymore and has scaled back what it had by 40% in the past four years.
Today, it has about 300 servers in-house and about 300 in the AWS cloud.
It wasn't all smooth sailing, though.
The biggest challenges the library's IT staff had came down to training and changing expectations.
The library had traditional system administrators who were anxious about automating many of their regular processes and about how their job functions would change.
There was extensive initial training, particularly for the first six to eight weeks.
"Initially, people were resistant," Haque said. "Would they pick it up fast enough? But they did really well. We sent people to conferences. We spent a lot of attention helping people get new skill sets. The challenge, these days, is helping people keep up with the new services and the new skills that they need."
Now Haque is thinking about expanding to other cloud vendors.
For instance, he's considering Microsoft Azure to migrate some of the library's Windows-based applications to the cloud.
He's also thinking about where to store the 3 petabytes of data the library has built up through its digital collection program -- preserving maps, photos, illustrations, books and videos -- anything the library has digitized.
However, Haque wants that precious data to have serious redundancies.
He's considering using the AWS Glacier cloud storage service because it is a low-cost storage option for "dark" data, which is rarely used or accessed.
"But do we keep a second copy somewhere and a third copy somewhere else?" he asked, explaining that it would be optimal to be able to make a change to the data in one cloud and have it update across all three vendors. "We'd like to use Google, AWS and Azure. I don't know if that's the answer for us, but that's what we're thinking about … It will be interesting to see what they would be willing to do to make it easy to use. We'll talk with them about it at some point."
Gottheil said that should be doable.
"It's a matter of sending the same update to three different systems, not a matter of daisy-chaining the systems," Gottheil added. "He may have a problem getting any of the big three to do it themselves, but another actor, a systems integrator, could do it."
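The fan-out pattern Gottheil describes -- pushing the same update to every provider in parallel, rather than chaining one provider's copy to the next -- can be sketched in a few lines. This is a minimal, hypothetical illustration: the `VendorStore` class is a stand-in for a provider's object store, not a real AWS, Google or Azure SDK client.

```python
from concurrent.futures import ThreadPoolExecutor

class VendorStore:
    """Stand-in for one cloud provider's object store (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

def fan_out_put(stores, key, data):
    """Send the same update to every provider concurrently (fan-out),
    rather than daisy-chaining providers one after another."""
    with ThreadPoolExecutor(max_workers=len(stores)) as pool:
        futures = [pool.submit(s.put, key, data) for s in stores]
        for f in futures:
            f.result()  # re-raises if any single provider's write failed

# One update lands in all three vendors' stores.
stores = [VendorStore("aws"), VendorStore("google"), VendorStore("azure")]
fan_out_put(stores, "digitized/menu-1843.tiff", b"...item bytes...")
```

In practice, as Gottheil notes, a systems integrator would wrap each provider's real API behind an interface like `put` and add retry and consistency handling per provider.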