IBM opened the doors to its Almaden Research Centre this week to show what its scientists are working on, including some advanced technologies for storage and data analysis. Located at the southern tip of Silicon Valley, Almaden claims to be the birthplace of the distributed relational database and the first data mining algorithms.
Fiddling with bits and bytes to improve how they're stored and analysed continues to be a focus, although the labs also work on areas like nanotechnology, spin physics and human-computer interaction.
The projects on show this week included Panache, a file system for use across wide area networks; Sage, a tool for moving data to different storage tiers automatically; and Cobra, which helps companies figure out what people are saying about them in online forums.
Panache is a clustered file system that gives applications high-speed access to a large, central pool of data even when those applications are far away, for example in data centres in different parts of the country or on different continents.
"Customers are asking us to give them a way, when data is created at one site, to make it available in other geographically distributed locations, so that users at those locations can access the data as if it were local," said Bruce Hillsberg, director of the Storage Systems research group.
The file system uses advanced caching techniques to keep the data at each location consistent. It combines push and pull mechanisms to replicate changes efficiently across multiple nodes in a wide area network, so that conflicts don't arise when changes are made to the data caches at individual nodes.
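IBM did not describe Panache's internals in detail, but the general push and pull idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `HomeSite` and `CacheSite` classes, the integer version counter and the write-through policy are hypothetical stand-ins, not Panache's actual design.

```python
class HomeSite:
    """Hypothetical central pool of data holding the authoritative copy."""

    def __init__(self):
        self.files = {}  # path -> (version, data)

    def version(self, path):
        # Version 0 means the file does not exist yet.
        return self.files.get(path, (0, b""))[0]

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        new_version = self.version(path) + 1
        self.files[path] = (new_version, data)
        return new_version


class CacheSite:
    """Hypothetical remote site that keeps a local cache of home data."""

    def __init__(self, home):
        self.home = home
        self.cache = {}  # path -> (version, data)

    def read(self, path):
        # Pull: fetch from home on a cache miss or when home is newer.
        cached = self.cache.get(path)
        if cached is None or cached[0] < self.home.version(path):
            cached = self.home.read(path)
            self.cache[path] = cached
        return cached[1]

    def write(self, path, data):
        # Push: send the change to home, then update the local cache.
        new_version = self.home.write(path, data)
        self.cache[path] = (new_version, data)


home = HomeSite()
site_a = CacheSite(home)
site_b = CacheSite(home)

site_a.write("design.txt", b"draft 1")   # pushed to home
print(site_b.read("design.txt"))         # pulled by site B
site_b.write("design.txt", b"draft 2")
print(site_a.read("design.txt"))         # site A notices its copy is stale
```

In this sketch every read checks the version at the home site, which would be expensive over a real wide area network; a production system would presumably use something cheaper, such as leases or change notifications.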
IBM says Panache could have several uses. Engineers working on a project in different countries, for instance, can access the same set of data and make changes to it locally, without worrying about the cached versions getting out of sync.
It could also reduce the time it takes to replicate virtual machines between data centres, researchers here said. Applications running inside a virtual machine access data from a virtual LUN, typically stored as a file in the data centre. When a new virtual machine is configured or restarted after a failure, the OS image and its virtual LUN have to be transferred between sites, causing delays before the application is ready for use.
Panache can maintain a cache of the OS and its virtual LUN at the remote site, so it's there when needed. IBM researchers say this would greatly reduce the time and complexity of configuring new virtual machines and moving them across a wide area network. It could also help companies to reduce data centre costs. Instead of hosting 20,000 virtual machines in one large data centre, the faster migration capabilities would provide the option of hosting the VMs across 20 smaller data centres.
Some large cluster file systems already exist, like IBM's GPFS (General Parallel File System) and Sun's Lustre, now maintained by Oracle. According to IBM, Panache is unique because of its high level of parallelism, which allows multiple nodes to read and write to their local data cache even when they are temporarily offline.
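The offline behaviour described above can be pictured with a short Python sketch. It is not IBM's implementation: the `OfflineCapableCache` class, its `pending` queue and the use of a plain dictionary as the central pool are all hypothetical, and the sketch ignores the harder problem of reconciling conflicting writes from several sites.

```python
from collections import deque


class OfflineCapableCache:
    """Hypothetical site cache that keeps working while disconnected."""

    def __init__(self, home):
        self.home = home        # dict standing in for the central pool
        self.local = {}         # locally cached files
        self.pending = deque()  # writes not yet sent to home
        self.online = True

    def read(self, path):
        # Serve from the local cache; fetch from home only if connected.
        if path not in self.local:
            if not self.online:
                raise KeyError(f"{path} is not cached and the site is offline")
            self.local[path] = self.home[path]
        return self.local[path]

    def write(self, path, data):
        # Always apply the write locally and queue it for home.
        self.local[path] = data
        self.pending.append((path, data))
        if self.online:
            self.flush()

    def flush(self):
        # Replay queued writes to home in the order they were made.
        while self.pending:
            path, data = self.pending.popleft()
            self.home[path] = data

    def reconnect(self):
        self.online = True
        self.flush()


home = {"report.txt": b"original"}
site = OfflineCapableCache(home)
site.read("report.txt")             # warm the local cache

site.online = False                 # the WAN link drops
site.write("report.txt", b"edited offline")
print(site.read("report.txt"))      # still served locally
print(home["report.txt"])           # home has not seen the change yet

site.reconnect()                    # queued writes are pushed to home
print(home["report.txt"])
```

Queuing writes in order and replaying them on reconnection is only one possible policy, chosen here because it is simple to follow.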