Hadoop – Significance for India
Concept – The Hadoop concept is simple. In order to provide scalable, reliable and flexible capacity for storing and analyzing Big Data, a job is split into a number of small tasks that are allotted to a number of hardware devices running in parallel.
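The split-and-parallelize idea can be illustrated with a minimal sketch in ordinary Python (not Hadoop itself); the summing task, chunk size and worker count are only illustrative assumptions. A large job is cut into small chunks, independent worker processes handle the chunks in parallel, and the partial results are combined at the end:

```python
# Conceptual sketch: split a big job into small tasks and run them in parallel.
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    """One small task: sum a chunk of readings."""
    return sum(chunk)

def run_job(readings, workers=4, chunk_size=1000):
    # Split the large job into many small tasks (chunks of readings).
    chunks = [readings[i:i + chunk_size] for i in range(0, len(readings), chunk_size)]
    # Allot the tasks to worker processes running in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(process_chunk, chunks)
    # Combine the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    data = list(range(10_000))       # stand-in for a large data set
    print(run_job(data))             # 49995000
```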
The HDFS (Hadoop Distributed File System) layer does the splitting, distributing small packets of information across the hard disk storage of different machines while ensuring sufficient redundancy. The MapReduce system works in two steps. In the map step, the information packets scheduled by the job tracker are processed in key-value format, where the key indicates the location and type of information and the value represents the actual information; the map function converts the key-value input list into an output list according to predefined criteria. The reduce step aggregates the results from the output list to provide the desired summary information as the final output.
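A minimal sketch of these two steps in plain Python (not the Hadoop API) may help; the word-count task, the key-value layout (word → count) and the helper names are illustrative assumptions:

```python
# Conceptual sketch of the map and reduce steps on key-value records.
from collections import defaultdict

def map_step(input_list):
    """Convert each (key, value) input record into output (key, value) pairs:
    here the output key is a word and the value is one occurrence."""
    output_list = []
    for filename, text in input_list:   # key = location of data, value = content
        for word in text.split():
            output_list.append((word, 1))
    return output_list

def reduce_step(output_list):
    """Aggregate the mapped output list into the final summary:
    the total count for each word."""
    summary = defaultdict(int)
    for word, count in output_list:
        summary[word] += count
    return dict(summary)

if __name__ == "__main__":
    records = [("part-0001.txt", "big data big ideas"),
               ("part-0002.txt", "data in parallel")]
    print(reduce_step(map_step(records)))
    # {'big': 2, 'data': 2, 'ideas': 1, 'in': 1, 'parallel': 1}
```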
As information from Big Data sources is collected continuously, the information flow can be described in terms of a volume of data (bandwidth) varying with time. In that sense, Hadoop handles a flow of data, and the terms input list and output list can be interpreted as input and output data streams.
Significance for India – Thus the Hadoop system is based on the concept of parallel processing. Though it is employed for the collection and processing of Big Data using large clusters of computers, the same concept can be used to handle large-scale computational requirements such as monitoring countrywide development projects and a variety of large, complex environmental and social challenges.
India is planning the development of 100 smart cities. The administration of such cities needs web-based central control software to cater for infrastructural services, energy and ecosystem monitoring, health, education and business requirements. Cloud computing with the Hadoop system would be essential for such projects.
As regards the Digital India objective, Hadoop offers many avenues for progress. Major software companies in India such as Infosys, Wipro and TCS, as well as call centres, already use human resources in a similar distributed fashion. These companies can improve their functioning to achieve scalability, redundancy and cost optimization by adopting the Hadoop methodology.
The workload of projects can be split into small tasks and distributed to a large number of small software companies located in villages and small towns, with sufficient replication to safeguard the project execution timeline even if some providers fail to deliver output of the desired quality. As the workforce would be scattered rather than concentrated in costly urban centres, the salary burden can be greatly reduced.
Moreover, this will give a big boost to small software companies in semi-urban and rural areas, which are struggling to survive owing to transient staff and a paucity of good projects. Strengthening such distributed small software companies will help in achieving the goal of Digital India by providing live project training to educated but unemployed youth who cannot leave their homes due to agricultural or family commitments.
Hadoop relies on distributed parallel processing of Big Data using clusters of computers. The same concept can be used to handle large-scale projects of any type by splitting the work into a large number of small work packets and replacing computers with a skilled but scattered human workforce. Fortunately, widespread internet connectivity and the popularity of mobile and tablet devices have provided the necessary hardware support for integrating such tasks.
Thus the Hadoop system is not merely a Big Data computing system for large projects; its underlying concept can also provide a new way of distributed, sustainable development in India.