We took the dataset from the KDD Cup 2012, which is around 10 GB in size, and placed the training file on a Hadoop cluster. We first used Pig to transform the data. We then used Spark MLlib for dimensionality reduction and model building. Finally, we evaluated the models using several evaluation metrics.
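
The summary above does not name the evaluation metrics that were used. As an illustration only, the sketch below shows how two common metrics for a binary classifier, accuracy and AUC (area under the ROC curve), could be computed in plain Python. The choice of these two metrics, the assumption of binary 0/1 labels, and the function names `accuracy` and `auc` are all assumptions made for this example; they are not taken from the project, and they are not Spark MLlib APIs.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one example is required")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) formulation.

    Assumes binary labels (1 = positive, 0 = negative).
    Tied scores receive the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Sort example indices by ascending score, then assign 1-based ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current run of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum of positive ranks, minus the smallest possible sum, normalised
    # by the number of (positive, negative) pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, made up for this example only.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]
    print("accuracy:", accuracy(y_true, y_pred))
    print("AUC:", auc(y_true, y_score))
```

On a dataset of this size the same quantities would more likely be computed on the cluster rather than in a single Python process; the sketch is only meant to make the definitions concrete.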