In this podcast, William Bain, founder and CEO of ScaleOut Software, discusses distributed data caches and how they can be used with data stored on the cloud. Visit the SearchSOA Podcast Library for more expert commentary.
Early usage of the cloud has been dominated by storage, said William Bain, founder and CEO of ScaleOut Software, Inc. Accessing data stored on the cloud, though, presents challenges. "What users primarily need is a very simple, easy to use software architecture that lets them access their data quickly," said Bain. Bain believes a new layer of storage, the distributed data cache, will help make data more accessible and make cloud computing a more appealing option for computation and analysis.
A distributed data cache, also called a distributed data grid, is a storage layer that sits between a database server and an application's own memory. Bain believes that it can improve application performance. "[Distributed data caches] host data so that it can be accessed very quickly, much more quickly than if it were kept just in the database server," said Bain.
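As a rough illustration of where such a layer sits, here is a minimal Python sketch of a cache placed in front of a database. It is not ScaleOut's product or API: the class names `SlowDatabase` and `CachedStore` are hypothetical, and a single in-process dictionary stands in for what would, in a real data grid, be memory spread across several servers.

```python
class SlowDatabase:
    """Hypothetical stand-in for a database server; counts its reads."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


class CachedStore:
    """Hypothetical cache layer between the application and the database."""

    def __init__(self, database):
        self._db = database
        self._cache = {}  # assumption: one dict stands in for the whole grid

    def get(self, key):
        # Serve from memory when possible; fall back to the database on a miss.
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database and keep the cached copy in step with it.
        self._db.put(key, value)
        self._cache[key] = value


db = SlowDatabase({"AAPL": 210.5})
store = CachedStore(db)
first = store.get("AAPL")   # miss: goes to the database
second = store.get("AAPL")  # hit: served from the cache
```

After the two reads above, the database has been queried only once. Repeated requests for the same data no longer reach the database server, which is the effect Bain describes.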
Bain sees several use cases for data analysis using distributed data grids on the cloud. He says that MapReduce, a method of analysis that divides a computation among several servers and then combines the results, can be more easily deployed through the use of distributed data grids. Bain also cites the financial services industry as a good place to use distributed data caches. The ability to run analysis across a large set of servers will help stock analysts make decisions and apply strategies more quickly, suggested Bain.
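The divide-then-combine pattern behind MapReduce can be sketched in a few lines of Python. This is an illustrative example only, not how any particular data grid implements it: the function names, the sample trades, and the use of plain lists as stand-ins for servers are all assumptions made for the sketch.

```python
from collections import Counter


def partition(items, num_nodes):
    """Split the data into one slice per (simulated) server."""
    return [items[i::num_nodes] for i in range(num_nodes)]


def map_phase(trades):
    """Work done independently on each server: total shares per symbol."""
    totals = Counter()
    for symbol, shares in trades:
        totals[symbol] += shares
    return totals


def reduce_phase(partials):
    """Combine the per-server partial totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)
    return combined


def run_map_reduce(trades, num_nodes=3):
    # In a real deployment each map_phase call would run on a different
    # server, in parallel; here they run one after another for simplicity.
    partials = [map_phase(part) for part in partition(trades, num_nodes)]
    return dict(reduce_phase(partials))


# Hypothetical sample data: (symbol, shares traded)
trades = [("AAPL", 100), ("MSFT", 50), ("AAPL", 25), ("IBM", 10), ("MSFT", 5)]
result = run_map_reduce(trades)
```

Because each partition is processed without reference to the others, the map step can be spread over as many servers as hold the data, and only the small partial totals need to be brought together at the end.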
While the primary use of distributed data caches is to store fast-changing data that is accessed by multiple servers, Bain believes that the duties of distributed data caches will grow over time. "At some point in the future, distributed data grids and database servers will probably merge," said Bain. For the next several years, though, Bain believes that distributed data grids will continue to provide a platform for performing parallel data analysis.
This was first published in January 2010