Analyzing and working with big data can be very difficult using classical means such as relational database management systems or desktop software packages for statistics and visualization.
Instead, big data requires large clusters with hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources.
One of the software tools widely and successfully used for storing and processing big data sets on clusters of commodity hardware is Hadoop.
The Hadoop framework contains libraries, a distributed file system, and a resource-management platform, and it implements a version of the MapReduce programming model for large-scale data processing.
In this paper we investigate the possibilities of integrating Hadoop with R, which is a popular software environment used for statistical computing and data visualization.
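To make the MapReduce programming model mentioned above more concrete, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is written in plain Python for illustration only and does not use Hadoop or R; the function names `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce`, as well as the sample input lines, are assumptions introduced here and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this across nodes; here it is
    an in-memory dictionary (an illustrative simplification).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce sequentially over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two short text lines standing in for a large data set.
counts = map_reduce(["big data needs clusters", "big clusters run Hadoop"])
```

In a distributed setting, the map and reduce functions would run in parallel on different nodes, and the shuffle would move data between them over the network; the sketch only shows how the three stages fit together logically.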