The practical performance of stochastic gradient descent on large-scale machine learning tasks is often much better than what current theoretical tools can guarantee. This indicates that there is an inherent structure in these problems that could be exploited to strengthen the analysis. In this paper, we argue that data sparsity is such a property. We derive explicit expressions for how data sparsity affects the range of admissible step-sizes and the convergence factors of minibatch gradient descent. Our theoretical results are validated by solving least-squares support vector machine problems on both synthetic and real-life data sets. The experimental results demonstrate improved performance of our update rules compared to the traditional mini-batch gradient descent algorithm.