Machine learning methodologies for paddy yield estimation in India: A case study
Abstract
In-season crop yield estimation has various applications, such as enabling farmers to take corrective measures to increase yield, optimizing the supply-demand chain of fertilizers, pesticides and agricultural commodities, predicting prices, and determining risk levels in agricultural insurance. Crop yield estimation models based on remotely sensed satellite data and weather data have been widely adopted because they do not require specific information about the crop and farming practices, and hence are scalable across crops, growing locations and conditions. In this paper, we present a case study of yield estimation modeling based on weather and soil data for the paddy crop at different spatial resolution (SR) levels, namely, the district (coarser SR) and taluk (finer SR) levels in India. We provide a detailed analysis of the accuracy of the yield estimation models across varied sets of features and different machine learning (ML) techniques. Further, we disaggregate district yield data by applying the ML models trained on district-level data to predict yields at the taluk level. Taluk-level yield prediction by disaggregation of district-level data has an average error of 6% and a maximum error of 25%.
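The disaggregation step described above (fit a model on district-level records, then apply it unchanged to taluk-level features) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three feature columns are placeholders for weather and soil variables, the helper names (`fit_linear`, `predict_linear`, `abs_pct_error`) are hypothetical, and ordinary least squares stands in for the ML techniques evaluated in the paper. The numbers it produces say nothing about the reported results.

```python
import numpy as np

def fit_linear(X, y):
    """Fit ordinary least squares with an intercept (a stand-in model, not the paper's)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients (intercept first) to a feature matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def abs_pct_error(y_true, y_pred):
    """Absolute percentage error per sample."""
    return 100.0 * np.abs(y_pred - y_true) / np.abs(y_true)

rng = np.random.default_rng(0)

# Hypothetical generating process: intercept plus three placeholder features
# (for example rainfall, temperature and a soil index, all scaled to [0, 1]).
true_coef = np.array([2.0, 0.8, -0.3, 0.5])

# Synthetic district-level (coarser SR) training data.
X_district = rng.uniform(0, 1, size=(40, 3))
y_district = predict_linear(true_coef, X_district) + rng.normal(0, 0.05, 40)

# Synthetic taluk-level (finer SR) data, used only for evaluation.
X_taluk = rng.uniform(0, 1, size=(15, 3))
y_taluk = predict_linear(true_coef, X_taluk) + rng.normal(0, 0.05, 15)

# Train on districts, predict on taluks, then summarize the error.
coef = fit_linear(X_district, y_district)
taluk_pred = predict_linear(coef, X_taluk)
errors = abs_pct_error(y_taluk, taluk_pred)
print(f"mean error: {errors.mean():.1f}%, max error: {errors.max():.1f}%")
```

The average and maximum of `errors` correspond to the two summary statistics quoted for the taluk-level predictions; with real data, the taluk yields would come from ground-truth records rather than from a simulated process.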