Chain rule for single variable functions: Difference between revisions

From Applied Science
(23 intermediate revisions by the same user not shown)
The chain rule is, intuitively, a product of two derivatives. Suppose that we have a person walking at a speed of 2 m/s. Now suppose we have a train moving at 20 m/s in the same direction as the person. The train is 10 times faster than the walking person, if we measure both speeds with respect to the static ground. Now imagine that the person is walking at 2 m/s while inside the train moving at 20 m/s. What's the speed of the person? We have a physics problem here, because it depends on whether we want the speed with respect to the ground or with respect to the train.
'''The chain rule is, intuitively, a product of two derivatives.''' Suppose we have three moving things, A, B and C, whose speeds satisfy A > B > C. If we know how many times faster A is than B, and how many times faster B is than C, then we also know how many times faster A is than C: we multiply the ratio between A and B by the ratio between B and C. This is the example in [https://en.wikipedia.org/wiki/Chain_rule Wikipedia's article].


From the point of view of mathematics, we have a composite function describing the motion mentioned above: the speed of the person is a ratio distance / time, and it compounds with the motion of the train. In other words, we have a product of ratios. That's precisely the idea of the chain rule when written in Leibniz's notation:
Another example: in meteorology there is the [https://en.wikipedia.org/wiki/Lapse_rate lapse rate], which is the variation in temperature according to how high we are in the atmosphere. It's the ratio °C / km. If we fly up or down, we experience changes in temperature because we are moving with respect to each level of temperature in the atmosphere. The other way around doesn't happen: if we stay put, the atmosphere won't move up or down with respect to us. Flying faster naturally yields a faster change in temperature. Our speed is a ratio km / time. If we want the ratio °C / time, we take the product <math>\frac{^{\circ}\text{C}}{\text{km}} \cdot \frac{\text{km}}{\text{time}} = \frac{^{\circ}\text{C}}{\text{time}}</math>.


<div style="text-align:center;">
<math>\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}</math>
</div>


The definition of a function states that we have one variable that depends upon another. In the case of composite functions, the value of one function depends on the value of the other function. Extending this to rates of change, the rate of change of one function depends on the rate of change of the other. That's why, for <math>h(x) = f(g(x))</math>, we have:
'''It's important to highlight one thing:''' the above example of the atmosphere's temperature is a linear case ''(assuming we're flying at constant speed)'', which translates to <math>T_1'(t) = T_2'(h) \cdot h'(t)</math>, where <math>T_1'(t)</math> is the variation in temperature over time and the right side is the product of the variation of temperature over height and the variation of height over time. Notice that the function nested inside <math>T_2</math> is <math>h(t)</math>, the height as a function of time, whose derivative <math>h'(t)</math> is the ratio height / time. Notice also that there are two different rates of change.


<div style="text-align:center;">
<math>h'(x) = g'(x) \cdot f'(g(x))</math>
</div>


We can have any number of functions nested within another. The rule still holds and the name comes from the fact that we have a chain of operations, a chain of derivatives.
 
'''Note:''' sometimes we have composite functions but we don't see them clearly. For example: <math>y = \sin^2(x)</math>. We can see it as a product <math>y = \sin(x) \sin(x)</math>, but we could also see it as <math>y = u^2</math> with <math>u = \sin(x)</math>. In a more conventional notation: <math>f(x) = x^2</math>, <math>g(x) = \sin(x)</math> and <math>(f \circ g)(x) = \sin^2(x)</math>. This is especially common with implicit differentiation.
 
==Graphical reasoning for the chain rule==
 
I don't know of any textbooks that show a graphical interpretation of the chain rule. Let's consider <math>f(x) = 3x</math> and <math>g(x) = x^2</math>. The graph of the former is a straight line whose constant factor is the slope (angular coefficient): a greater factor means a steeper line. The latter is a parabola. The first has a constant rate of change, the second does not.
 
<div style="text-align:center;">
[[file:chain_rule_graph.png]]
</div>
 
The graph of <math>g(f(x)) = (3x)^2</math> has a greater rate of change than the graph of <math>g(x) = x^2</math>. Think about this: if we choose <math>x = 2</math>, the rates of change of each function at that point are <math>f'(2) = 3</math> and <math>g'(2) = 4</math>. For the composite function, the outer derivative is evaluated at <math>f(2) = 6</math>, so <math>(g \circ f)'(2) = f'(2) \cdot g'(f(2)) = 3 \cdot (2 \cdot 6) = 36</math>. I did this simple example with positive numbers, but the chain rule holds for negative numbers and for more complicated functions.
 
'''Note:''' in this specific case we could have used the product rule. Or even faster, the power rule.
 
==Proof of the chain rule==
 
It's natural to think that the derivative of the composite function is the composition of the derivatives. It's the same mistaken intuition that commonly arises with the product and quotient rules. When we have a composition, the value of one function depends on the value of the other. We can be easily fooled into thinking that the derivative of <math>f(g(x))</math> is <math>f'(g'(x))</math>. Mathematically this doesn't make sense, because we just swapped each function for its derivative. Who said that it's right to replace a function with its derivative and expect the result of this operation to be meaningful? Who said that the rate of change of <math>f</math> depends on the rate of change of <math>g</math>?
 
The problem of finding the tangent line describes how a differentiable function can be seen as a linear function if we consider a small enough interval around a point. Let's begin by defining two affine functions:
 
<math>f(x) = ax + b</math><br />
<math>g(x) = cx + d</math>
 
Let's take a look at:
 
<math>f(g(x)) = ag(x) + b</math><br />
<math>f(cx + d) = a(cx + d) + b</math><br />
<math>f \circ g = acx + ad + b</math>
 
Did you notice the product of the slopes (angular coefficients), <math>a \cdot c</math>? If we differentiate the expression <math>acx + ad + b</math> with respect to <math>x</math>, the operation yields <math>ac</math>! Surprise! That's not a formal proof though. The fundamental idea behind it is that if a function is differentiable, then near any given point we can treat it as a linear function.
 
'''Links for the proof:'''
* https://tutorial.math.lamar.edu/classes/calci/DerivativeProofs.aspx
* https://en.wikipedia.org/wiki/Chain_rule
* https://math24.net/chain-rule.html
* https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-1b/a/proving-the-chain-rule
* https://www.cuemath.com/chain-rule-formula/

Latest revision as of 00:18, 24 August 2022

The chain rule is, intuitively, a product of two derivatives. Suppose we have three moving things, A, B and C, whose speeds satisfy A > B > C. If we know how many times faster A is than B, and how many times faster B is than C, then we also know how many times faster A is than C: we multiply the ratio between A and B by the ratio between B and C. This is the example in Wikipedia's article.

Another example: in meteorology there is the lapse rate, which is the variation in temperature according to how high we are in the atmosphere. It's the ratio °C / km. If we fly up or down, we experience changes in temperature because we are moving with respect to each level of temperature in the atmosphere. The other way around doesn't happen: if we stay put, the atmosphere won't move up or down with respect to us. Flying faster naturally yields a faster change in temperature. Our speed is a ratio km / time. If we want the ratio °C / time, we take the product [math]\displaystyle{ \frac{^{\circ}\text{C}}{\text{km}} \cdot \frac{\text{km}}{\text{time}} = \frac{^{\circ}\text{C}}{\text{time}} }[/math].

[math]\displaystyle{ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} }[/math]

It's important to highlight one thing: the above example of the atmosphere's temperature is a linear case (assuming we're flying at constant speed), which translates to [math]\displaystyle{ T_1'(t) = T_2'(h) \cdot h'(t) }[/math], where [math]\displaystyle{ T_1'(t) }[/math] is the variation in temperature over time and the right side is the product of the variation of temperature over height and the variation of height over time. Notice that the function nested inside [math]\displaystyle{ T_2 }[/math] is [math]\displaystyle{ h(t) }[/math], the height as a function of time, whose derivative [math]\displaystyle{ h'(t) }[/math] is the ratio height / time. Notice also that there are two different rates of change.
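The temperature example can be checked with a few lines of Python. This is only a sketch: the lapse rate, the climb rate and the ground temperature below are made-up illustrative numbers, and the function names are hypothetical.

```python
# Hypothetical numbers, chosen only for illustration.
LAPSE_RATE_C_PER_KM = -6.5    # dT/dh: assumed temperature change per km of height
CLIMB_RATE_KM_PER_MIN = 0.3   # dh/dt: assumed constant climb rate


def temperature_at_height(h_km, ground_temp_c=15.0):
    """Linear temperature profile T2(h), assuming a constant lapse rate."""
    return ground_temp_c + LAPSE_RATE_C_PER_KM * h_km


def height_at_time(t_min):
    """Height h(t) for a climb at constant speed, starting from the ground."""
    return CLIMB_RATE_KM_PER_MIN * t_min


def temperature_at_time(t_min):
    """Composite function T1(t) = T2(h(t))."""
    return temperature_at_height(height_at_time(t_min))


# Chain rule: dT/dt = (dT/dh) * (dh/dt), in degrees C per minute.
chain_rule_rate = LAPSE_RATE_C_PER_KM * CLIMB_RATE_KM_PER_MIN

# Direct check: temperature change over one minute of climbing.
measured_rate = temperature_at_time(1.0) - temperature_at_time(0.0)

print(chain_rule_rate, measured_rate)
```

Because both functions are linear here, the change measured over a whole minute agrees with the product of the two rates, up to floating point rounding.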

[math]\displaystyle{ h'(x) = g'(x) \cdot f'(g(x)) }[/math]

We can have any number of functions nested within another. The rule still holds and the name comes from the fact that we have a chain of operations, a chain of derivatives.
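To illustrate the "chain of derivatives" with more than two functions, here is a minimal sketch. The helper `chain_derivative` and the particular three functions are hypothetical choices made for this example; each link is a function paired with its known derivative.

```python
import math


def chain_derivative(links, x):
    """Derivative at x of the composition of `links`.

    `links` is a list of (function, derivative) pairs, innermost first.
    """
    value = x
    derivative = 1.0
    for func, func_prime in links:
        # Each link contributes its own derivative, evaluated at the
        # value produced by the links before it.
        derivative *= func_prime(value)
        value = func(value)
    return derivative


# y = (sin(3x))^2, built from three nested functions.
links = [
    (lambda x: 3 * x, lambda x: 3.0),      # innermost: 3x
    (math.sin, math.cos),                  # middle: sin(u)
    (lambda u: u ** 2, lambda u: 2 * u),   # outermost: u^2
]

x0 = 0.5
# Derivative worked out by hand: 2 * sin(3x) * cos(3x) * 3
expected = 2 * math.sin(3 * x0) * math.cos(3 * x0) * 3

print(chain_derivative(links, x0), expected)
```

The loop multiplies one derivative per link, which is exactly the product that the Leibniz notation suggests.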

Note: sometimes we have composite functions but we don't see them clearly. For example: [math]\displaystyle{ y = \sin^2(x) }[/math]. We can see it as a product [math]\displaystyle{ y = \sin(x) \sin(x) }[/math], but we could also see it as [math]\displaystyle{ y = u^2 }[/math] with [math]\displaystyle{ u = \sin(x) }[/math]. In a more conventional notation: [math]\displaystyle{ f(x) = x^2 }[/math], [math]\displaystyle{ g(x) = \sin(x) }[/math] and [math]\displaystyle{ (f \circ g)(x) = \sin^2(x) }[/math]. This is especially common with implicit differentiation.
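As a worked sketch of this example, writing the inner function as an intermediate variable u (the same letter used in the Leibniz form of the rule) gives:

```latex
\begin{aligned}
y &= u^2, \qquad u = \sin(x) \\
\frac{dy}{du} &= 2u, \qquad \frac{du}{dx} = \cos(x) \\
\frac{dy}{dx} &= \frac{dy}{du}\,\frac{du}{dx} = 2u\cos(x) = 2\sin(x)\cos(x)
\end{aligned}
```

Treating the same expression as the product sin(x) sin(x) and applying the product rule gives sin(x) cos(x) + cos(x) sin(x), which is the same result.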

Graphical reasoning for the chain rule

I don't know of any textbooks that show a graphical interpretation of the chain rule. Let's consider [math]\displaystyle{ f(x) = 3x }[/math] and [math]\displaystyle{ g(x) = x^2 }[/math]. The graph of the former is a straight line whose constant factor is the slope (angular coefficient): a greater factor means a steeper line. The latter is a parabola. The first has a constant rate of change, the second does not.

The graph of [math]\displaystyle{ g(f(x)) = (3x)^2 }[/math] has a greater rate of change than the graph of [math]\displaystyle{ g(x) = x^2 }[/math]. Think about this: if we choose [math]\displaystyle{ x = 2 }[/math], the rates of change of each function at that point are [math]\displaystyle{ f'(2) = 3 }[/math] and [math]\displaystyle{ g'(2) = 4 }[/math]. For the composite function, the outer derivative is evaluated at [math]\displaystyle{ f(2) = 6 }[/math], so [math]\displaystyle{ (g \circ f)'(2) = f'(2) \cdot g'(f(2)) = 3 \cdot (2 \cdot 6) = 36 }[/math]. I did this simple example with positive numbers, but the chain rule holds for negative numbers and for more complicated functions.
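The arithmetic for this example can be verified with a short sketch. The helper names `f_prime` and `g_prime` are introduced here only for illustration; they are the derivatives of the two functions written out by hand.

```python
def f(x):
    """Inner function f(x) = 3x."""
    return 3 * x


def g(u):
    """Outer function g(u) = u^2."""
    return u ** 2


def f_prime(x):
    """Derivative of f: constant slope 3."""
    return 3


def g_prime(u):
    """Derivative of g: 2u."""
    return 2 * u


x0 = 2

# Chain rule: the outer derivative is evaluated at f(x0) = 6, not at x0.
chain_rule_value = f_prime(x0) * g_prime(f(x0))

# Independent check: (3x)^2 = 9x^2, so the power rule gives 18x.
power_rule_value = 18 * x0

print(chain_rule_value, power_rule_value)  # 36 36
```

The key detail is where the outer derivative is evaluated: using g'(2) = 4 instead of g'(f(2)) = 12 would give 12, which does not match the power rule.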

Note: in this specific case we could have used the product rule. Or even faster, the power rule.

Proof of the chain rule

It's natural to think that the derivative of the composite function is the composition of the derivatives. It's the same mistaken intuition that commonly arises with the product and quotient rules. When we have a composition, the value of one function depends on the value of the other. We can be easily fooled into thinking that the derivative of [math]\displaystyle{ f(g(x)) }[/math] is [math]\displaystyle{ f'(g'(x)) }[/math]. Mathematically this doesn't make sense, because we just swapped each function for its derivative. Who said that it's right to replace a function with its derivative and expect the result of this operation to be meaningful? Who said that the rate of change of [math]\displaystyle{ f }[/math] depends on the rate of change of [math]\displaystyle{ g }[/math]?

The problem of finding the tangent line describes how a differentiable function can be seen as a linear function if we consider a small enough interval around a point. Let's begin by defining two affine functions:

[math]\displaystyle{ f(x) = ax + b }[/math]
[math]\displaystyle{ g(x) = cx + d }[/math]

Let's take a look at:

[math]\displaystyle{ f(g(x)) = ag(x) + b }[/math]
[math]\displaystyle{ f(cx + d) = a(cx + d) + b }[/math]
[math]\displaystyle{ f \circ g = acx + ad + b }[/math]

Did you notice the product of the slopes (angular coefficients), [math]\displaystyle{ a \cdot c }[/math]? If we differentiate the expression [math]\displaystyle{ acx + ad + b }[/math] with respect to [math]\displaystyle{ x }[/math], the operation yields [math]\displaystyle{ ac }[/math]! Surprise! That's not a formal proof though. The fundamental idea behind it is that if a function is differentiable, then near any given point we can treat it as a linear function.
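The affine case can also be checked numerically. In this sketch the coefficient values and the helpers `make_affine` and `slope_between` are arbitrary, hypothetical choices; any other numbers would do.

```python
def make_affine(slope, intercept):
    """Return the affine function x -> slope * x + intercept."""
    def affine(x):
        return slope * x + intercept
    return affine


# Arbitrary coefficients for f(x) = ax + b and g(x) = cx + d.
a, b = 2.0, 1.0
c, d = -3.0, 4.0

f = make_affine(a, b)
g = make_affine(c, d)


def composite(x):
    """f(g(x)) = a(cx + d) + b = acx + ad + b."""
    return f(g(x))


def slope_between(func, x_left, x_right):
    """Slope of the line through two points on the graph of func."""
    return (func(x_right) - func(x_left)) / (x_right - x_left)


composite_slope = slope_between(composite, 0.0, 1.0)

print(composite_slope, a * c)  # -6.0 -6.0
```

For affine functions the slope between any two points is the derivative, so the slope of the composition is exactly the product of the two slopes.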

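Links for the proof:
* https://tutorial.math.lamar.edu/classes/calci/DerivativeProofs.aspx
* https://en.wikipedia.org/wiki/Chain_rule
* https://math24.net/chain-rule.html
* https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-1b/a/proving-the-chain-rule
* https://www.cuemath.com/chain-rule-formula/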