Chain rule for multivariable functions
Latest revision as of 02:05, 17 May 2022

With a single variable function the chain rule tells us that [math]\displaystyle{ [f(g(x))]' = g'(x)f'(g(x)) }[/math]. For multivariable functions the idea is the same: the derivative is still a product of derivatives. Both functions have to be differentiable for the chain rule to work. Textbooks differ in their approach here. There are essentially two cases to treat: one is [math]\displaystyle{ f(g(t),h(t)) }[/math]; the other is [math]\displaystyle{ f(g(s,t),h(s,t)) }[/math]. One of the textbooks that I follow goes for a general form [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) }[/math] is a (vector valued) function of n variables.

I'm going to begin with the easiest case [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) = (x(t), y(t)) }[/math] is a vector function, that is, a (differentiable) curve; [math]\displaystyle{ x(t) }[/math] and [math]\displaystyle{ y(t) }[/math] are both differentiable. Before moving on to calculations, notice that any change in [math]\displaystyle{ t }[/math] changes the value of [math]\displaystyle{ f }[/math], which means that [math]\displaystyle{ f }[/math] depends, indirectly, on [math]\displaystyle{ t }[/math].

An increment [math]\displaystyle{ \Delta t }[/math] is going to produce the increments [math]\displaystyle{ \Delta x }[/math] and [math]\displaystyle{ \Delta y }[/math]. As such:

[math]\displaystyle{ \Delta f = f(x + \Delta x, y + \Delta y) - f(x,y) = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \eta r }[/math].
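The claim that [math]\displaystyle{ \eta \to 0 }[/math] can be checked numerically. A minimal sketch, using the hypothetical test function [math]\displaystyle{ f(x,y) = x^2 y }[/math] (any differentiable function would do):

```python
import math

# Hypothetical test function and its exact partial derivatives
def f(x, y):
    return x**2 * y

def fx(x, y):
    return 2 * x * y   # partial f / partial x

def fy(x, y):
    return x**2        # partial f / partial y

x0, y0 = 1.0, 2.0
# Shrink the increments and watch eta = (error of linear approximation) / r
for k in range(1, 6):
    dx = dy = 10.0**(-k)
    df = f(x0 + dx, y0 + dy) - f(x0, y0)
    linear = fx(x0, y0) * dx + fy(x0, y0) * dy
    r = math.sqrt(dx**2 + dy**2)
    eta = (df - linear) / r
    print(f"r = {r:.1e}, eta = {eta:.2e}")
```

As the table of printed values suggests, `eta` shrinks roughly in proportion to `r`, which is exactly what differentiability of [math]\displaystyle{ f }[/math] guarantees.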

where [math]\displaystyle{ \eta \to 0 }[/math] as [math]\displaystyle{ r = \sqrt{\Delta x^2 + \Delta y^2} \to 0 }[/math], because [math]\displaystyle{ f }[/math] is differentiable. The last term is the error of the linear approximation. Now, since we are differentiating with respect to [math]\displaystyle{ t }[/math], divide everything by [math]\displaystyle{ \Delta t }[/math]:

[math]\displaystyle{ \frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x} \cdot \frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y} \cdot \frac{\Delta y}{\Delta t} \pm \eta\sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2} }[/math]

where the sign of the last term is positive if [math]\displaystyle{ \Delta t \gt 0 }[/math] and negative if [math]\displaystyle{ \Delta t \lt 0 }[/math]. When we take the limit [math]\displaystyle{ \Delta t \to 0 }[/math], the increments [math]\displaystyle{ \Delta x }[/math] and [math]\displaystyle{ \Delta y }[/math] also go to zero, so [math]\displaystyle{ \eta \to 0 }[/math] and the last term vanishes. The resulting expression is:

[math]\displaystyle{ \frac{d}{dt}f(x(t),y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} }[/math]
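As a sanity check, the formula can be compared against a direct finite-difference derivative. A minimal sketch, with the hypothetical choices [math]\displaystyle{ f(x,y) = x^2 y }[/math] and [math]\displaystyle{ \gamma(t) = (\cos t, \sin t) }[/math]:

```python
import math

def f(x, y):
    return x**2 * y

# Curve gamma(t) = (cos t, sin t); its derivative is (-sin t, cos t)
def x(t): return math.cos(t)
def y(t): return math.sin(t)

t0 = 0.7
# Chain rule: df/dt = f_x * x'(t) + f_y * y'(t)
fx = 2 * x(t0) * y(t0)          # partial f / partial x = 2xy
fy = x(t0)**2                   # partial f / partial y = x^2
chain = fx * (-math.sin(t0)) + fy * math.cos(t0)

# Direct numerical derivative of F(t) = f(x(t), y(t))
h = 1e-6
direct = (f(x(t0 + h), y(t0 + h)) - f(x(t0 - h), y(t0 - h))) / (2 * h)

print(chain, direct)  # the two values should agree to many decimal places
```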

Taking a closer look, the chain rule here is a dot product between the gradient and another vector, [math]\displaystyle{ (x'(t), y'(t)) }[/math]. It is quite similar to the directional derivative, and that is no coincidence: to measure a rate of change in space, we need a direction.

Another way to see it:

[math]\displaystyle{ \frac{d}{dt}f(P(t)) = \nabla f \cdot P'(t) }[/math]

where [math]\displaystyle{ P(t) = (x(t),y(t)) }[/math] and [math]\displaystyle{ P'(t) = (x'(t),y'(t)) }[/math].

A natural question arises here: what can we infer from [math]\displaystyle{ \nabla f \cdot (x'(t), y'(t)) = 0 }[/math]? From analytic geometry we know that the dot product is zero when the vectors are perpendicular. We also know that the gradient is perpendicular to a level curve. Suppose that [math]\displaystyle{ \gamma(t) }[/math] is a level curve, a circle for example. As we walk over the circle we keep the same level of the function, the [math]\displaystyle{ z }[/math] coordinate in the case of a function of two variables. As we know, infinitely many [math]\displaystyle{ (x,y) }[/math] pairs correspond to the same level. More than that, [math]\displaystyle{ (x'(t), y'(t)) }[/math] is tangent to the level curve.

For each point on a level curve we have a tangent vector and the gradient. This means that we have a whole family of pairs of vectors for which the dot product is zero. In other words, composing [math]\displaystyle{ f }[/math] with the curve gives a function of [math]\displaystyle{ t }[/math]. The reasoning in the previous paragraph can be synthesized in the following equation:

[math]\displaystyle{ F(t) = f(x(t),y(t)) = k }[/math] for all [math]\displaystyle{ t }[/math]

Since [math]\displaystyle{ F(t) }[/math] is constant, [math]\displaystyle{ \frac{d}{dt}F(t) = 0 }[/math]. By the chain rule:

[math]\displaystyle{ F'(t) = \frac{\partial f}{\partial x}x'(t) + \frac{\partial f}{\partial y}y'(t) = \nabla f \cdot P'(t) = 0 }[/math].

Suppose that [math]\displaystyle{ P'(t) \neq 0 }[/math]. Then this shows that the directional derivative of [math]\displaystyle{ f }[/math] in the direction of [math]\displaystyle{ \overrightarrow{u} = \frac{P'(t)}{||P'(t)||} }[/math], tangent to the level curve, is zero:

[math]\displaystyle{ D_uf = \nabla f \cdot \overrightarrow{u} = 0 }[/math]

With this we have shown that [math]\displaystyle{ f(P) }[/math] is constant when we pick [math]\displaystyle{ (x,y) }[/math] pairs that belong to the same level curve.
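The argument can be illustrated concretely. A minimal sketch, assuming (hypothetically) that the level curve is the unit circle of [math]\displaystyle{ f(x,y) = x^2 + y^2 }[/math]:

```python
import math

def f(x, y):
    return x**2 + y**2

# Level curve: unit circle P(t) = (cos t, sin t), on which f = 1 everywhere
for t in [0.0, 0.5, 1.3, 2.9]:
    px, py = math.cos(t), math.sin(t)
    level = f(px, py)                       # stays at the level k = 1
    grad = (2 * px, 2 * py)                 # gradient of f at P(t)
    tangent = (-math.sin(t), math.cos(t))   # P'(t), tangent to the circle
    dot = grad[0] * tangent[0] + grad[1] * tangent[1]
    print(f"t = {t}: f = {level:.6f}, grad . P'(t) = {dot:.2e}")
```

At every sampled point the level stays at 1 and the dot product of the gradient with the tangent is zero (up to floating point), matching the derivation above.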

The natural extension of the previous rule is about functions where each variable is a function of two or more variables: [math]\displaystyle{ f(x(s,t),y(s,t)) }[/math]. For such cases substitution comes in handy to avoid losing track of the functions and variables in the process. If we set [math]\displaystyle{ x(s,t) = u }[/math] and [math]\displaystyle{ y(s,t) = v }[/math], then we know how to differentiate [math]\displaystyle{ f(u,v) }[/math] from the previous rule. In turn, we already know how to differentiate [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math].

If [math]\displaystyle{ F(s,t) = f(x(s,t),y(s,t)) }[/math]

Then [math]\displaystyle{ \frac{\partial F}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} }[/math]

And [math]\displaystyle{ \frac{\partial F}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} }[/math]
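Again a finite-difference check helps build confidence in the formula. A minimal sketch with the hypothetical choices [math]\displaystyle{ x(s,t) = st }[/math], [math]\displaystyle{ y(s,t) = s + t }[/math] and [math]\displaystyle{ f(x,y) = xy^2 }[/math]:

```python
def f(x, y):
    return x * y**2

def xf(s, t): return s * t    # x(s,t)
def yf(s, t): return s + t    # y(s,t)

s0, t0 = 1.5, 0.5
x0, y0 = xf(s0, t0), yf(s0, t0)

# Chain rule: dF/dt = f_x * dx/dt + f_y * dy/dt (partials in t, s held fixed)
fx = y0**2            # partial f / partial x
fy = 2 * x0 * y0      # partial f / partial y
dxdt = s0             # partial x / partial t
dydt = 1.0            # partial y / partial t
chain = fx * dxdt + fy * dydt

# Direct partial derivative of F(s,t) = f(x(s,t), y(s,t)) with respect to t
h = 1e-6
direct = (f(xf(s0, t0 + h), yf(s0, t0 + h))
          - f(xf(s0, t0 - h), yf(s0, t0 - h))) / (2 * h)

print(chain, direct)  # the chain-rule value matches the numerical partial
```

The same check works for [math]\displaystyle{ \partial F / \partial s }[/math] by perturbing [math]\displaystyle{ s }[/math] instead of [math]\displaystyle{ t }[/math].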

One may ask about [math]\displaystyle{ \frac{d F}{d(s,t)} }[/math]. Remember, a partial derivative really is partial: there is no meaning in differentiating with respect to two or more variables at the same time.