Chain rule for single variable functions
The chain rule is, intuitively, a product of two derivatives. Suppose a person walks at 2 m/s and a train moves at 20 m/s in the same direction. Measured relative to the ground, the train is 10 times faster than the person. Now imagine that the person walks at 2 m/s inside the train, which is moving at 20 m/s. What is the person's speed? This is a physics problem, because the answer depends on whether we measure the speed relative to the ground or relative to the train.
From the point of view of mathematics, the motion described above is modelled by a composite function. The person's speed is a ratio, distance / time, and each unit of it compounds with the motion of the train. In other words, we have a product of ratios. That is precisely the idea of the chain rule when written in Leibniz's notation:
[math]\displaystyle{ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} }[/math]
By definition, a function makes one variable depend on another. In a composite function, the value of the outer function depends on the value of the inner function. The same extends to rates of change: the rate of change of the outer function depends on the rate of change of the inner function. That is why, for [math]\displaystyle{ h(x) = f(g(x)) }[/math], we have:
[math]\displaystyle{ h'(x) = g'(x) \cdot f'(g(x)) }[/math]
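As an illustration (the specific functions are chosen only for this example), take [math]\displaystyle{ h(x) = \sin(x^2) }[/math]. The inner function is [math]\displaystyle{ g(x) = x^2 }[/math], with [math]\displaystyle{ g'(x) = 2x }[/math], and the outer function is [math]\displaystyle{ f(u) = \sin u }[/math], with [math]\displaystyle{ f'(u) = \cos u }[/math]. Applying the rule gives:
[math]\displaystyle{ h'(x) = g'(x) \cdot f'(g(x)) = 2x \cdot \cos(x^2) }[/math]
In Leibniz's notation, with [math]\displaystyle{ u = x^2 }[/math] and [math]\displaystyle{ y = \sin u }[/math], this is the same product: [math]\displaystyle{ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} = \cos u \cdot 2x }[/math].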
We can have any number of functions nested within one another. The rule still holds, and the name comes from the fact that we have a chain of operations and therefore a chain of derivatives.
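The claim for longer chains can be checked numerically. The sketch below is only an illustration: the three functions, the helper `numerical_derivative`, and the test point `x0` are all assumptions made for this example. It multiplies the three derivatives along the chain and compares the product with a finite-difference estimate of the derivative of the whole composition.

```python
import math


def numerical_derivative(func, x, step=1e-6):
    """Central-difference estimate of func'(x) (hypothetical helper)."""
    return (func(x + step) - func(x - step)) / (2 * step)


# Example functions chosen for illustration only.
def g(x):            # innermost function
    return x ** 2


def g_prime(x):
    return 2 * x


def f(u):            # middle function
    return math.sin(u)


def f_prime(u):
    return math.cos(u)


def k(v):            # outermost function
    return math.exp(v)


def k_prime(v):
    return math.exp(v)


def h(x):
    """The composition k(f(g(x)))."""
    return k(f(g(x)))


def h_prime_chain(x):
    """Chain rule: one derivative per link, each evaluated at its own input."""
    return k_prime(f(g(x))) * f_prime(g(x)) * g_prime(x)


x0 = 0.7  # arbitrary test point
chain_value = h_prime_chain(x0)
numeric_value = numerical_derivative(h, x0)
print("chain rule:        ", chain_value)
print("finite difference: ", numeric_value)
```

The two printed values should agree to several decimal places. The finite-difference estimate is itself an approximation, so a small discrepancy is expected.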