Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions