CIO

Aussie SkyMapper Telescope to “open new windows of exploration”

Southern Sky Survey to usher in new era of collaborative research

What do the universe, open source software, a 12,000-core supercomputer, a cool $2.5 million of high-grade silicon and one of the country’s largest data sets have in common? They all underpin a five-year Australian initiative to map and study the observable universe from the southern hemisphere.

The story begins in January 2003, when the Great Melbourne Telescope was destroyed by bushfires.

From the ashes rose the SkyMapper observatory, based at the Siding Spring Observatory in the safer central-west NSW location of Coonabarabran and tasked with scanning the night skies to create the Southern Sky Survey.

A deep digital map of the southern sky, the Southern Sky Survey -- with a little help from the National Computational Infrastructure (NCI) National Facility -- will allow astronomers to study celestial objects ranging from nearby asteroids to extremely distant quasars.

The data from SkyMapper will also be shared globally via the Virtual Observatory initiative, to allow astronomers all over the world to explore its every possibility. Experts say this advance heralds the arrival of a new era in astronomy -- one where researchers can draw on freely available online data about the universe instead of having to wait months, or even years, for a chance to observe the night sky through a billion-dollar physical telescope.

Southern Sky Survey

According to Stefan Keller, SkyMapper scientist and research fellow at the Australian National University’s Research School of Astronomy and Astrophysics, the underlying idea of the survey, and its significance, is that it will form the first digital, optical map of the southern skies.

Mapping about a billion objects, the survey will provide a fundamental resource for future astronomical studies in the near and distant universe.

“The southern sky has traditionally not been as observed as the northern sky, as there are fewer people, so there is the potential to find objects the size of Pluto drifting around out there [in our solar system] and as yet unseen,” Keller says.

By using different coloured glass filters in SkyMapper’s 268-megapixel camera, astronomers will be able to focus on particular parts of the stellar spectrum to help determine the temperature, density and chemical abundances of stars.

At the far edge of the optically observable universe, SkyMapper will also be able to pick up things such as ‘high red-shift’ quasi-stellar objects (QSOs), Keller says.

“Here we have galaxies powered by central black holes,” he says. “As they consume material they spit out jets of material and create a lot of light. Those objects form very valuable probes through the murk between us and them, and in that way we can determine what the material is along that line of sight.”

“SkyMapper is really about finding the needles in the haystacks -- the incredibly rare objects. That’s really the power of SkyMapper; by drawing in that many objects you can spot all the oddball ones.”

SkyMapper is also notable for the speed and breadth with which it can take images -- about 1000 square degrees of sky a night, according to Keller, or about 20 times the amount of data available through any other observatory in the southern hemisphere.

Unsurprisingly, the Southern Sky Survey will produce a large volume of raw data when complete -- about 470 terabytes, or roughly 100,000 DVDs’ worth -- according to Keller.

Each night’s scan produces about 0.7 terabytes of data, which is transferred from Coonabarabran to the NCI National Facility in Canberra using a data trickler and a secure gigabit link to Australia’s Academic and Research Network (AARNet).

The data is then stored using a hierarchical storage management system, which mixes disk and a large robotic tape library system to help preserve the data in a regular, categorised form and duplicate it in two separate locations for backup purposes.
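
The data volumes quoted above can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-envelope illustration: the decimal units, the 4.7 GB single-layer DVD capacity and the assumption that the link runs at its full 1 Gbit/s rate are ours, not figures from the project.

```python
# Back-of-envelope check of the data volumes quoted in the article.
# Assumptions (not from the article): decimal units (1 TB = 10**12 bytes),
# a 4.7 GB single-layer DVD, and a link running at its full 1 Gbit/s rate.

TB = 10**12
GB = 10**9

raw_survey_bytes = 470 * TB          # total raw data quoted for the survey
nightly_bytes = 0.7 * TB             # data quoted for one night's scan
dvd_bytes = 4.7 * GB                 # assumed DVD capacity
link_bits_per_second = 10**9         # assumed usable rate of a gigabit link

dvds_needed = raw_survey_bytes / dvd_bytes
nightly_transfer_hours = nightly_bytes * 8 / link_bits_per_second / 3600
nights_of_data = raw_survey_bytes / nightly_bytes

print(f"DVD equivalents: {dvds_needed:,.0f}")
print(f"Nightly transfer at line rate: {nightly_transfer_hours:.1f} hours")
print(f"Nights of scanning implied by the totals: {nights_of_data:,.0f}")
```

Under these assumptions the DVD comparison holds up, and a single night's data would take well under two hours to move at line rate, leaving plenty of headroom before the next night's scan.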

To increase the usability and accessibility of the data, the project will also shrink the raw data down to the actual numbers that are most important to researchers -- the shapes, sizes and brightness of the billion objects in the southern sky.

“We are going to image those one billion-odd objects in 36 images spaced in time over five years,” Keller says. “In that way we can look at objects that vary across the sky, objects such as pulsating stars and moving asteroids.

“Then we reduce the data and end up with a database of about 30 terabytes, which we then make available via the Web. As far as we know it will be Australia’s largest database.”
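
To illustrate how repeated images can reveal objects that vary, here is a minimal sketch that flags an object when the scatter in its brightness measurements is larger than measurement noise alone would explain. The function name, the noise level and the threshold are hypothetical choices for illustration, not SkyMapper's actual method.

```python
from statistics import pstdev


def is_variable(magnitudes, measurement_error=0.02, threshold=3.0):
    """Return True when brightness scatter exceeds `threshold` times the noise.

    `measurement_error` (in magnitudes) and `threshold` are illustrative
    assumptions, not values taken from the survey.
    """
    if len(magnitudes) < 2:
        return False  # a single measurement cannot show variation
    return pstdev(magnitudes) > threshold * measurement_error


# Made-up brightness measurements (magnitudes) for two hypothetical objects.
steady_star = [15.00, 15.01, 14.99, 15.00, 15.02, 14.98]
pulsating_star = [15.0, 15.4, 14.7, 15.3, 14.8, 15.2]

print("steady star variable?", is_variable(steady_star))
print("pulsating star variable?", is_variable(pulsating_star))
```

A real survey would use per-measurement uncertainties and more robust statistics, but the principle is the same: more epochs make it easier to separate genuine variation from noise.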

Automation

With around a billion objects to identify and catalogue over five years of scanning, SkyMapper relies on a high level of intelligent automation, Keller says.

Using an automation and scheduling application, SkyMapper is capable of independently assessing night sky conditions -- whether the moon is out, how bright the stars are, whether there are clouds -- and then progressing through the most suitable scientific program.

The cataloguing of one billion objects across more than 4000 survey fields is also automated, meaning that SkyMapper is able to discern objects based on factors such as brightness and shape, Keller says.

“We can cleanly extract all the stars and measure their brightness,” he says. “Galaxies are a bit harder as they can have spiral arms on them but we can still easily find the interesting objects that lie in that data set.”

Open Source

According to Keller, the SkyMapper project is heavily based on open source software, largely because of its low cost.

“We are on a very tight budget, so any expertise we can draw on in a shared way is extremely valuable to us,” Keller says.

“Most of our pipeline is comprised of components written by astronomers elsewhere in the world over many decades. We draw together those little units and basically script them all up together with Perl and Python, and that makes for an efficient coding process.”

For its databases SkyMapper uses PostgreSQL, fronted by a standard Web form through which relational searches can be run.

“You may be interested in a galaxy at a certain position, so you can get on to the Web page, download the data and image of that galaxy. In this way we can save astronomers a lot of time,” Keller says. “They don’t have to go and survey it themselves to decide if they’re interested in it for further research.”
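
A position-based search of the kind Keller describes might look something like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for the PostgreSQL database so that the example runs without a server, and the table name, column names and rows are all hypothetical.

```python
import sqlite3


def build_demo_catalogue():
    """Create an in-memory catalogue with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "object_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, magnitude REAL)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        [
            (1, 201.365, -43.019, 6.8),
            (2, 201.400, -43.050, 15.2),
            (3, 10.000, -60.000, 12.0),
        ],
    )
    return conn


def box_search(conn, ra_deg, dec_deg, half_width_deg):
    """Return objects inside a simple box around a sky position.

    Simplification: this ignores wrap-around at 0/360 degrees and the
    shrinking of right-ascension spacing towards the poles.
    """
    query = (
        "SELECT object_id, ra_deg, dec_deg, magnitude FROM objects "
        "WHERE ra_deg BETWEEN ? AND ? AND dec_deg BETWEEN ? AND ? "
        "ORDER BY magnitude"
    )
    bounds = (
        ra_deg - half_width_deg,
        ra_deg + half_width_deg,
        dec_deg - half_width_deg,
        dec_deg + half_width_deg,
    )
    return conn.execute(query, bounds).fetchall()


catalogue = build_demo_catalogue()
for row in box_search(catalogue, ra_deg=201.37, dec_deg=-43.02, half_width_deg=0.1):
    print(row)
```

The query is parameterised rather than built by string concatenation, which is the usual practice when search values arrive from a Web form.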

This is particularly important for the current generation of massive 20-30 metre telescopes, Keller says. These behemoths cost about a billion dollars each, so time on them is extremely valuable.

Keller says that with the increasing importance of online data as a reference for the sky, astronomy is on the verge of a paradigm shift. This new online data is served by the International Virtual Observatory Alliance, a consortium of international astronomical facilities that make their data freely available to researchers and scientists.

“SkyMapper will be a key component in the Virtual Observatory by providing coverage for the southern sky, allowing astronomers to cross match objects seen in gamma-rays through optical to radio wavelengths and open new windows of exploration,” Keller says.

High Performance Data Transfer

Ben Evans, head of the ANU Supercomputer Facility and manager at the NCI National Facility, says that data transfer between SkyMapper and the NCI National Facility site is performed using GridFTP, which is designed to provide more reliable, higher-performance file transfer for grid computing applications.

To handle data replication, the NCI National Facility uses a modified version of the data replication techniques in the Globus Alliance’s Globus Toolkit. These verify that a full copy of the data has been received in Canberra before the images created at Siding Spring Observatory are deleted, Evans says.
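
The verify-before-delete idea can be sketched in a few lines. The example below is not the Globus Toolkit or GridFTP: it uses a local file copy as a stand-in for the network transfer and a SHA-256 checksum as the verification step, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_then_delete(source: Path, archive_dir: Path) -> Path:
    """Copy `source` into `archive_dir`; delete the original only if the copy verifies."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    replica = archive_dir / source.name
    shutil.copyfile(source, replica)  # stand-in for the real network transfer

    if sha256_of(source) != sha256_of(replica):
        replica.unlink()  # discard the bad copy and keep the original
        raise OSError(f"checksum mismatch for {source.name}; original kept")

    source.unlink()  # safe to remove: a verified copy now exists
    return replica
```

The key design point is the ordering: the original is removed only after the copy has been checked, so a failed or partial transfer never leaves the data without an intact copy.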

“What’s unique in Australia is us expanding the way in which we manage data,” Evans says. “Normally, we would just manage data in the local domain, but the grid software allows us to push our management technique right out to the instrument. That’s not typically how grid software is being used in other parts of the world.”

According to Evans, the decision to use open source software as a control mechanism to manage the SkyMapper project’s large volumes of data came down to simplicity and flexibility -- and a lack of commercial software choices.

“We just wanted to adapt something that was already out there rather than develop our own,” he says. “There aren’t too many commercial software apps that do this and there is too much already available in the open source domain.”

Supercomputing

Evans says the bulk of the analysis of the SkyMapper data will be done on a brand new, next-generation Sun supercomputer kitted out with 12,000 cores. Due to be fully online by December, the supercomputer will offer a tenfold increase in performance over the facility’s current setup of two SGI machines, each with just under 3500 cores in total.

Along with processing data from SkyMapper, the new Sun machine will also be used for atmospheric and weather research as well as serving other high-performance computing needs around the country, Evans says.

Data hosting will be done on a data storage cloud hosted next to the supercomputer to allow for easy access for data processing. This cloud is based on a hybrid of software and hardware including SAN QFS software from Sun to help manage the storage domain; virtualization software from VMware; Linux as a core operating system; Solaris; and databases from MySQL and PostgreSQL.