It is hard not to be enamoured by deep learning nowadays, watching neural networks show off their endless accumulation of new tricks. There are, as I see it, at least two good reasons to be impressed:
(1) Neural networks can learn to model many natural functions well, from weak priors.
The idea of marrying hierarchical, distributed representations with fast, GPU-optimised gradient calculations has turned out to be very powerful indeed. The early days of neural networks saw problems with local optima, but the ability to train deeper networks has solved this and allowed backpropagation to shine through. After baking in a small amount of domain knowledge through simple architectural decisions, deep learning practitioners now find themselves with a powerful class of parameterised functions and a practical way to optimise them.
The first such architectural decisions were the use of either convolutions or recurrent structure, to imbue models with spatial and temporal invariances. From this alone, neural networks excelled in image classification, speech recognition, machine translation, atari games, and many more domains. More recently, mechanisms for top-down attention over inputs have shown their worth in image and natural language tasks, while differentiable memory modules such as tapes and stacks have even enabled networks to learn the rules of simple algorithms from only input-output pairs.
(2) Neural networks can learn surprisingly useful representations
While the community still waits eagerly for unsupervised learning to bear fruit, deep supervised learning has shown an impressive aptitude for building generalisable and interpretable features. That is to say, the features learned when a neural network is trained to predict P(y|x) often turn out to be both semantically interpretable and useful for modelling some other related function P(z|x).