Using big data and Hadoop 2: New version enables new applications
Enterprises rank mobility support and productivity enhancement for mobile workers as their top-priority new application category, according to a recent survey by CIMI Corp. That means most companies that have adopted, or are adopting, Hadoop will likely have to integrate the framework with mobile applications.
The process of integrating Hadoop and mobile applications can be broken down into three segments:
- Recognizing Hadoop's inherent limitations in mobile use
- Creating realistic Hadoop application frameworks
- Supporting and troubleshooting Hadoop in mobile applications
Hadoop is an open source implementation of the MapReduce model, developed to deal with large distributed databases. Because Hadoop has been linked to the cloud and cloud deployment, most people overlook the fact that it has attributes that don't align with enterprise needs in general, and with mobile applications in particular. Here are some of those characteristics:
- Hadoop's value is greatest for databases that are 10 to 1,000 times larger than the typical databases used by mobile applications. For many, Hadoop is overkill.
- Hadoop has significant setup and processing overhead. A Hadoop job might take several minutes, even if the amount of data correlated is modest.
- Hadoop isn't particularly good at supporting data structures that have a multi-dimensional context. For example, a record that defines the value of variables for a given geography, and then links vertically to define those values over time, requires a more complex representation of data relationships than the simple key-value model used by Hadoop.
- Hadoop is less helpful for problems that have to be approached iteratively, as several sequential, dependent steps.
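To make the key-value point concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps. It is not Hadoop code and uses no Hadoop API. The sample readings, the field names and the helper functions are all hypothetical. It only illustrates how a record with two dimensions (geography and time) has to be flattened into a single composite key before a MapReduce-style job can group it.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Run the mapper over every record and collect the (key, value) pairs it emits."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in grouped.items()}

# Hypothetical sample data: one variable measured per region per day.
readings = [
    {"region": "east", "day": "2013-01-01", "value": 10},
    {"region": "east", "day": "2013-01-02", "value": 14},
    {"region": "west", "day": "2013-01-01", "value": 7},
]

def by_region(record):
    # One dimension fits the key-value model directly.
    yield record["region"], record["value"]

def by_region_and_day(record):
    # Two dimensions must be flattened into one composite key.
    yield (record["region"], record["day"]), record["value"]

def total(key, values):
    return sum(values)

region_totals = reduce_phase(shuffle(map_phase(readings, by_region)), total)
region_day_totals = reduce_phase(shuffle(map_phase(readings, by_region_and_day)), total)
```

Each question asked of the data (by region, by region and day) becomes its own pass with its own key, which is part of why relationships that span several dimensions are awkward to express in this model.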
As these points indicate, mobile applications generally should not be designed as new Hadoop applications. Instead, adapting Hadoop to meet the needs of mobile applications means exploiting existing Hadoop applications through mobile connections.
Using existing Hadoop applications
The most obvious way to adapt Hadoop data to mobile applications is to create a front-end database that is derived from Hadoop but is presented in a traditional form. Think of mobile-Hadoop applications as a Hadoop run that is executed on a scheduled basis, then creates another database (one using a relational database management system, for example) that a mobile application can query. This model won't normally require changes to the way Hadoop is used because most current uses of the framework are in batch mode. The result is a smaller database and an organization of information suitable to mobile response time requirements.
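The front-end database pattern can be sketched in a few lines. The example below assumes the scheduled Hadoop job has already written tab-separated key and value lines, a common text output layout, and that the results are small enough to load into a relational table. SQLite stands in for whatever RDBMS is actually used, and the table name, column names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical output of a scheduled Hadoop run: "key<TAB>value" per line.
batch_output = [
    "store-001\t1520",
    "store-002\t873",
    "store-003\t2210",
]

def load_summary(conn, lines):
    """Replace the serving table with the latest batch results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store_summary ("
        "store_id TEXT PRIMARY KEY, total_sales INTEGER NOT NULL)"
    )
    rows = []
    for line in lines:
        store_id, total = line.rstrip("\n").split("\t")
        rows.append((store_id, int(total)))
    # One transaction, so readers never see a half-loaded table.
    with conn:
        conn.execute("DELETE FROM store_summary")
        conn.executemany(
            "INSERT INTO store_summary (store_id, total_sales) VALUES (?, ?)",
            rows,
        )

def lookup(conn, store_id):
    """The kind of small, indexed query a mobile back end would issue."""
    row = conn.execute(
        "SELECT total_sales FROM store_summary WHERE store_id = ?",
        (store_id,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
load_summary(conn, batch_output)
print(lookup(conn, "store-002"))
```

The mobile application only ever touches the small summary table, so its response time depends on an indexed lookup and not on the duration of the Hadoop run that produced the data.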
Making Hadoop work for mobile applications is a job for a combination of enterprise, software and database architects. Not only must Hadoop data be pre-digested to fit into a more responsive format (perhaps through aggregation, multiple keying, etc.), but data sources outside Hadoop applications also should be examined to see whether other information should be added to improve mobile usability. This involves working backward from mobile user requirements to Hadoop's capabilities and information content. Architects must fill in the blanks from other sources to get an efficient and complete representation of mobile user needs.
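As a rough illustration of aggregation, multiple keying and filling in blanks from other sources, the sketch below builds two pre-digested views from the same detail rows and enriches one of them with reference data held outside Hadoop. The rows, the field names and the product lookup table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows produced by a Hadoop job.
detail_rows = [
    {"region": "east", "product": "A", "units": 5},
    {"region": "east", "product": "B", "units": 3},
    {"region": "west", "product": "A", "units": 4},
]

# Hypothetical reference data that lives outside Hadoop.
product_names = {"A": "Widget", "B": "Gadget"}

def build_views(rows, names):
    """Aggregate the same rows under two keys and enrich the product view."""
    by_region = defaultdict(int)
    by_product = defaultdict(int)
    for row in rows:
        by_region[row["region"]] += row["units"]
        by_product[row["product"]] += row["units"]
    # Attach a display name so the mobile client needs no second lookup.
    product_view = {
        code: {"name": names.get(code, "unknown"), "units": units}
        for code, units in by_product.items()
    }
    return dict(by_region), product_view

region_view, product_view = build_views(detail_rows, product_names)
print(region_view)
print(product_view)
```

Which keys to precompute is the design decision that comes from working backward: each screen or query the mobile user needs should map to one of these views.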
Those who use Hadoop for real-time applications and want to make the data available to a broad range of users likely know that, without some Hadoop overlay, it's difficult to make it an integral part of any IT operation. To get the most information out of Hadoop, users who are considering the framework for mobile applications should restructure their Hadoop use as a Hive project.
Hive is an Apache project that automates the process of converting queries from traditional SQL sources into something Hadoop can work with. The data warehouse system creates additional indexes and provides tools for real-time Hadoop access. Hive is not, however, a complete substitute for an effective database hierarchy built to insert summaries or abstractions between Hadoop and real-time users. Think of Hive as a tool for structuring a Hadoop-centered repository or warehouse, but not as a complete solution to real-time mobile problems.
Focus on Hadoop applications' strengths
Another suggestion for linking Hadoop to mobile applications is to think backward as part of your project. Start by asking, "What is Hadoop uniquely good at?" The answer is scheduled tasks that run at most daily and that target unstructured or semistructured data whose information can be represented as a series of traditional databases that are effectively abstractions of the raw data.
For users who really need Hadoop, this combination of strengths should already be the basis for the project. Where that isn't the case, it would be wise to review the project structure and make accommodations, even without the need to support mobile applications.
There are cases where the aforementioned recommendations will not work. Hadoop's power lies in its ability to perform a series of difficult tasks on specific types of data at certain volume levels and distributability. Where any of those attributes are not present, Hadoop may not pay off, given the effort needed to deploy and sustain it.
About the author:
Tom Nolle is president of CIMI Corp., a strategic consulting firm specializing in telecommunications and data communications since 1982.