An Empirical Study of Starting Salaries and Employment Trends of Engineering Students in India
Abstract
This paper presents an empirical study of a recently compiled workforce analytics data-set that models the employment outcomes of Engineering students. The contributions reported in this paper won the data challenge of the ACM IKDD 2016 Conference on Data Science. Two problems are addressed: regression using heterogeneous information types, and the extraction of insights and trends from the data to make recommendations; both goals are supported by a range of visualizations. Although the data-set is specific to one nation, the underlying techniques and visualization methods are generally applicable. Gaussian processes (GPs) are proposed to model and predict salary as a function of heterogeneous independent attributes. The key novelties that the GP approach brings to workforce analytics are (a) a statistically sound, data-dependent notion of prediction uncertainty; (b) automatic relevance determination of the independent attributes with respect to the dependent variable (salary); (c) seamless incorporation of both numeric and string attributes within the same regression framework without dichotomization, where string attributes include single-word categorical (e.g. gender) or nominal (e.g. college tier) attributes and multi-word attributes (e.g. specialization); and (d) treatment of all data as correlated when making predictions. Insights from both the predictive modeling and the data analysis were used to suggest factors that, if improved, might lead to better starting salaries for Engineering students. A range of visualization techniques was used to extract key employment patterns from the data.
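To make points (a) to (c) concrete, the following is a minimal sketch of GP regression over mixed numeric and string attributes. It is an illustration under stated assumptions, not the implementation used in the study: the toy values, the function names, the hand-picked length-scales and the exact-match label kernel (a simple stand-in for a string kernel) are all hypothetical. In a full automatic relevance determination setup the per-attribute length-scales would be learned by maximizing the marginal likelihood rather than fixed by hand.

```python
import numpy as np

def ard_rbf(A, B, lengthscales):
    """Squared-exponential kernel with one length-scale per numeric attribute.
    A large length-scale makes the corresponding attribute less relevant."""
    A = np.asarray(A, dtype=float) / lengthscales
    B = np.asarray(B, dtype=float) / lengthscales
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))

def label_kernel(a, b):
    """Assumed stand-in for a string kernel: 1.0 for identical labels, 0.5 otherwise.
    No dummy/one-hot encoding of the string attribute is needed."""
    same = np.asarray(a)[:, None] == np.asarray(b)[None, :]
    return 0.5 + 0.5 * same

def kernel(Xn_a, Xs_a, Xn_b, Xs_b, lengthscales):
    # A product of valid kernels is itself a valid kernel.
    return ard_rbf(Xn_a, Xn_b, lengthscales) * label_kernel(Xs_a, Xs_b)

def gp_predict(Xn, Xs, y, Xn_new, Xs_new, lengthscales, noise_std=0.1):
    """Posterior mean and variance of the latent function at the new inputs."""
    y_mean = y.mean()
    K = kernel(Xn, Xs, Xn, Xs, lengthscales) + noise_std**2 * np.eye(len(y))
    K_star = kernel(Xn, Xs, Xn_new, Xs_new, lengthscales)  # shape (n_train, n_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    v = np.linalg.solve(L, K_star)
    mean = y_mean + K_star.T @ alpha
    var = 1.0 - (v**2).sum(axis=0)  # prior variance is 1 for this kernel
    return mean, var

# Made-up toy data: two numeric attributes and one string attribute per student.
Xn = np.array([[7.0, 60.0], [8.0, 70.0], [9.0, 80.0], [6.0, 50.0]])
Xs = ["cs", "cs", "ee", "ee"]
y = np.array([3.0, 3.5, 4.0, 2.5])  # salary in arbitrary units
lengthscales = np.array([1.0, 10.0])  # fixed by hand for this sketch

# One query identical to a training record, one far from all training data.
Xn_new = np.array([[8.0, 70.0], [20.0, 200.0]])
Xs_new = ["cs", "me"]
mean, var = gp_predict(Xn, Xs, y, Xn_new, Xs_new, lengthscales)
print(mean, var)
```

The predictive variance is small for the query that matches a training record and reverts to the prior variance for the query far from all observed students, which is the data-dependent uncertainty referred to in (a).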