Integration and Mean Value Theorems

The problem posed in the basic study of integration is to find, for the graph of a particular function f(x), the area under the curve of f between two points. As seen here:

The process is to take a set of intervals (x_{i} - x_{i-1}) = \Delta x along a segment of the x-axis and treat each such interval as the base of a rectangle, with the value f(x_{i}) as the height of the rectangle; summing the areas of the rectangles then gives an approximate area under the curve. We gain exactitude as we shrink the base of each rectangle so that, intuitively, no rectangle exceeds or “dips” under the height of the curve.

As can be seen here, the rectangles can extend above and below the x-axis, but the principle is the same: the smaller we set the base of our rectangles, the more accurately we approximate the space between the axis and the curve. So at first glance this technique seems merely about calculating an area A. Formally we denote this as follows:

Let a, b be two points on the x-axis and f a function defined between them; then we divide the segment as follows:

a = x_{0} < x_{1} < x_{2} < x_{3} < \dots < x_{n-1} < x_{n} = b

and we set x_{i} - x_{i-1} = \Delta x as before. Now to calculate the total area of those rectangles we define the summation of the products:

A = \sum\limits_{i=1}^{n} f(x_{i}) \Delta x

But this will be inexact unless we shrink the base of these rectangles, i.e. \Delta x, as described above. To achieve this we simply take the limit of A as \Delta x approaches zero. This process is called integration. Formally:

\int\limits_{a}^{b} f(x) \, dx = \lim_{\Delta x \to 0} \sum\limits_{i=1}^{n} f(x_{i}) \Delta x

More specifically, this is a definite integral, since we have specified the bounds a, b; but we need not. If instead we treat the bounds as variable, then we still have a function, known as the indefinite integral.
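Before moving on, here is a minimal numerical sketch of this definition in Python, a sanity check rather than part of the theory. The integrand f(x) = x^2 on [0, 1] is an assumed example (its exact area is 1/3), not anything drawn from the discussion above:

```python
# A minimal sketch of the Riemann-sum definition above: approximate the area
# under f on [a, b] with n rectangles and watch the error shrink as the
# base Delta x shrinks. f(x) = x**2 on [0, 1] is an assumed example.

def riemann_sum(f, a, b, n):
    dx = (b - a) / n                      # the common base Delta x
    # height of each rectangle is taken at the right endpoint x_i
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

f = lambda x: x ** 2
for n in [10, 100, 1000, 10000]:
    approx = riemann_sum(f, 0.0, 1.0, n)
    print(n, approx, abs(approx - 1 / 3))  # error tends to 0 as Delta x -> 0
```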

The Summation/Difference Property:

We want to show that:

\int\limits_{a}^{b} [f(x) + g(x)] \, dx = \int\limits_{a}^{b} f(x) \, dx + \int\limits_{a}^{b} g(x) \, dx

but this is almost self-evident, as we can see:

\lim_{\Delta x \to 0} \Big[ \sum\limits_{i=1}^{n} f(x_{i}) \Delta x \Big] + \lim_{\Delta x \to 0} \Big[ \sum\limits_{i=1}^{n} g(x_{i}) \Delta x \Big]

is, by the additivity of limits, equivalent to:

\lim_{\Delta x \to 0} \Big( \Big[ \sum\limits_{i=1}^{n} f(x_{i}) \Delta x \Big] + \Big[ \sum\limits_{i=1}^{n} g(x_{i}) \Delta x \Big] \Big)

and by factoring out \Delta x and rearranging we get:

\lim_{\Delta x \to 0} \sum\limits_{i=1}^{n} [f(x_{i}) + g(x_{i})] \Delta x

which is exactly:

\int\limits_{a}^{b} [f(x) + g(x)] \, dx

as desired. An exactly analogous proof applies to show that \int distributes over the difference of two functions.
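Again, a quick numerical check rather than a proof; the functions f(x) = x^2 and g(x) = sin(x) on [0, 1] are assumed examples:

```python
# A numerical check of the summation property, reusing the Riemann-sum
# sketch from above; f and g are assumed, illustrative choices.
import math

def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

f = lambda x: x ** 2
g = lambda x: math.sin(x)
n = 100000
lhs = riemann_sum(lambda x: f(x) + g(x), 0.0, 1.0, n)  # integral of f + g
rhs = riemann_sum(f, 0.0, 1.0, n) + riemann_sum(g, 0.0, 1.0, n)
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```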

The Mean Value Theorem(s):

First note that any function continuous on a closed interval attains a maximum value M and a minimum value m on that interval, i.e. there exist points at the highest and lowest heights of the curve. Now suppose f is defined in the closed interval [a, b] and is differentiable in the open interval (a, b); we will show that if f(a) = f(b) = k, then there is a point c \in (a, b) such that f'(c) = 0. This result is known as Rolle’s Theorem.

 

There are three cases. In case (i), f(x) = k is a constant function with value k; the derivative of a constant function is 0 at all points, so this would prove the theorem.

If on the other hand (ii) m < k or (iii) M > k, then there is slightly more work to be done. The two sub-cases are analogous, so we prove (iii). Suppose M > k, and pick a point c such that f(c) = M. Since f(a) = f(b) = k < M, the point c lies in the open interval (a, b), so it remains to show that f'(c) = 0. We do this by reductio, but first we prove a small lemma.

Lemma: If f'(p) is positive (negative) then f is increasing (decreasing) in a neighborhood of p.

Assume f'(p) is positive (the negative case is analogous). We know by our previous work that the difference quotient minus the derivative tends to 0, so rewriting in terms of \epsilon, \delta we can take any \epsilon > 0 and see that:

-\epsilon < \dfrac{f(p + \Delta x) - f(p)}{\Delta x} - f'(p) < \epsilon

when 0 < |\Delta x| < \delta. Let \epsilon = f'(p), which is positive by assumption, then add f'(p) to all sides to get:

0 < \dfrac{f(p + \Delta x) - f(p)}{\Delta x} < 2f'(p)

which ensures that the difference quotient is positive, so numerator and denominator must have the same sign. It follows that, for small \Delta x > 0: f(p - \Delta x) < f(p) < f(p + \Delta x). Hence f is increasing in a neighborhood of p. This completes the proof.

Rolle’s Theorem

To complete Rolle’s theorem we need only consider the case that f'(c) \neq 0. Either f'(c) is positive or negative, and then by the above lemma, f is either increasing or decreasing in a neighborhood of c. In either case some nearby point would exceed f(c) = M, so M would not be the maximum on the interval. This is a contradiction. Hence f'(c) = 0 as desired. An analogous strategy works for case (ii). Geometrically this theorem states that the slope of the curve vanishes at some point in the interval between a and b, i.e. the tangent becomes briefly horizontal, as can be seen in the picture above.
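To make the theorem concrete, here is an illustrative check in Python; the function f(x) = x(1 - x), with f(0) = f(1) = 0, is an assumed example, and the derivative is only estimated by a finite difference:

```python
# An illustrative check of Rolle's theorem: f(x) = x*(1 - x) satisfies
# f(0) = f(1) = 0, so some c in (0, 1) should have f'(c) = 0. We scan a
# grid and report where a central-difference estimate of f' changes sign.

def fprime(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # central difference quotient

f = lambda x: x * (1 - x)
xs = [i / 1000 for i in range(1, 1000)]
for left, right in zip(xs, xs[1:]):
    if fprime(f, left) * fprime(f, right) <= 0:   # sign change brackets c
        print("f'(c) vanishes near c =", (left + right) / 2)  # c = 0.5
```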

The Mean Value Theorem (Differentiation)

Let f be continuous in the closed interval [a, b] and differentiable, with derivative f', in the open interval (a, b). Then there is a point c in (a, b) such that:

f'(c) = \dfrac{f(b) - f(a)}{b - a}, \quad \text{equivalently} \quad f(b) - f(a) = f'(c)(b - a)

This can be most intuitively thought of as saying that if f tracks the movement of an object between two points a and b, then there is a point on its trajectory where the instantaneous rate of change equals the average rate of change. If you travel between A and B, 100 miles apart, and arrive in an hour, then at some point on your journey you were traveling at one hundred miles an hour. Here the claim is that the instantaneous rate of change at the point c is the same as the average rate of change between the two points a and b. Geometrically it means that the slope of the curve at some point must match the slope of the secant line.

The Proof: 

Define a function g(x) = f(x) - kx, where k is a constant chosen so that g(a) = g(b). Hence, g(a) = f(a) - ka = f(b) - kb = g(b). We can solve for k. Take

f(a) - ka = f(b) - kb

adding kb to each side we get:

f(a) - ka + kb = f(b)

subtracting f(a) from both sides this becomes:

-ka + kb = f(b) - f(a)

from which it follows:

k(-a + b) = f(b) - f(a)

which is equivalent to:

k(b-a) = f(b) - f(a)

then dividing by (b - a) and canceling we get:

k = \dfrac{f(b) - f(a)}{b - a}

ensuring that k is equivalent to the slope of the secant line. To prove the mean value theorem we must show that there is a point on the curve of f where the tangent line has this slope k. That is, we must find a point c \in (a, b) such that f'(c) = k.

But note that g satisfies the hypotheses of Rolle’s theorem, so we have a point c \in (a, b) with g'(c) = 0. Differentiating we see that

g'(c) = [f'(c) - k(x)'] = 0

but the derivative of x is 1, so this collapses into

g'(c) = [f'(c) - k(1)] = 0

from which we can infer that

f'(c) = k = \dfrac{f(b) -f(a)}{b -a}

so we have shown that an appropriate point c \in (a, b) exists as desired.
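As before, a small numerical sketch may help; f(x) = x^3 on [0, 2] is an assumed example, and the point c is only located approximately by a grid search:

```python
# A numerical sketch of the mean value theorem. f(x) = x**3 on [0, 2] is an
# assumed example; the secant slope is k = (f(2) - f(0)) / 2 = 4, and we
# look for a point c where the estimated derivative matches k.

def fprime(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 3
a, b = 0.0, 2.0
k = (f(b) - f(a)) / (b - a)                       # slope of the secant line
xs = [a + i * (b - a) / 10000 for i in range(1, 10000)]
c = min(xs, key=lambda x: abs(fprime(f, x) - k))  # where f'(c) is closest to k
print(c, fprime(f, c), k)   # c ~ 2/sqrt(3) ~ 1.1547, f'(c) ~ 4
```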

The Mean Value Theorem: Integration

We want to prove that if f is continuous in the closed interval [a, b], then there is a point c \in [a, b] such that

\int_{a}^{b} f(x) \, dx = f(c)(b - a)

We begin the proof by letting m and M be the minimum and maximum values of f on [a, b], as in the last proof. We are looking for the average value of our integrand function defined on the interval [a, b]. First let:

\tau = \sum_{i = 1}^{n} f(x_{i}) \Delta x

then since

m \Delta x \leq f(x_{i}) \Delta x \leq M \Delta x

 it follows:

\sum\limits_{i=1}^{n} m \Delta x \leq \tau \leq \sum\limits_{i=1}^{n} M \Delta x

which, by distributing and noting that the bases \Delta x sum to (b - a), is just:

m(b - a) \leq \tau \leq M(b - a)

then taking the limit as \Delta x approaches zero preserves the inequalities, so:

m(b - a) \leq \int_{a}^{b} f(x) \, dx \leq M(b - a)

then dividing by (b-a) we get:

m \leq \dfrac{\int_{a}^{b} f(x) \, dx}{b - a} \leq M

which is equivalent to:

m \leq \dfrac{1}{b-a} \int_{a}^{b} f(x) \, dx \leq M

The middle quantity is thus a value in [m, M]; call it f(c). By the intermediate value theorem (proof easy and omitted), there is a point c \in [a, b] at which f actually takes this value. Geometrically the mean value theorem for integrals can be understood as stating that for any area under the curve there is a rectangle of height f(c), the average height of f with respect to [a, b], whose area is equivalent to the total area under the curve defined by f. In a picture:

The area in excess of the height f(c) is “compensated” for by the inclusion of the novel sections defined by our rectangle. To see this precisely: note that f(c)(b - a) = \Big[ \dfrac{1}{b-a} \int_{a}^{b} f(x) \, dx \Big](b - a), and the factors of (b - a) cancel to give us \int_{a}^{b} f(x) \, dx as desired.
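A hedged numerical illustration of this picture, with f(x) = x^2 on [0, 1] as an assumed example (the average height is 1/3, and the intermediate value theorem locates c near 1/\sqrt{3}):

```python
# An illustrative check of the integral mean value theorem: compute the
# average value (1/(b-a)) * integral of f over [a, b] by a Riemann sum,
# then locate a point c with f(c) equal to that average.

def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

f = lambda x: x ** 2
a, b = 0.0, 1.0
avg = riemann_sum(f, a, b, 100000) / (b - a)      # the height f(c)
xs = [a + i * (b - a) / 100000 for i in range(100001)]
c = min(xs, key=lambda x: abs(f(x) - avg))        # IVT guarantees such a c
print(c, f(c), avg)   # c ~ 1/sqrt(3) ~ 0.577, f(c) ~ 1/3
```

This concludes the section; in the next post we will prove the first fundamental theorem of calculus.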

Narrative and Characterisation in Proof – The Case against Sontag

We’ll argue that, like good narratives, the better proofs tend towards the surprising or the inevitable. The latter proofs have a pleasing finality. The former make you doubt your own capacities; so shocking are the methods or the result that you cannot quite believe another human mind formulated such a conclusion, never mind proved it!

In both cases the proof must make you confident of the connection between the premises and conclusion. We must be left with a concrete impression that the narrative has played out, and the players retired. The details of each step can be made more or less explicit so long as they are suitably suggestive of the narrative intended by the author, and its culmination is recognised as an ending.

In short, we want to “show not tell” but the details we do show have to be “telling”.

We owe an example here. Given that we have spent the last few posts proving properties of \lim, we shall now show that the underlying concept can be characterised in terms of converging sequences.

Converging Sequences

To do this we need to introduce a new character. A sequence \{x_{n}\} is an infinite list of numbers x_{1}, x_{2}, x_{3}, \dots. We say that \{x_{n}\} converges to x_{0} if and only if the distance between x_{n} and x_{0} tends to 0 as n gets increasingly larger.

This can be put as follows:

x_{0} is the accumulation point of a sequence \{x_{n}\} if and only if for every positive measure of closeness \epsilon to x_{0} (no matter how small) there exists a natural number N such that if n > N, we have |x_{n} - x_{0}| < \epsilon

since this definition ensures that the distance between x_{n} and x_{0} becomes vanishingly small as n increases.
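To make the new character concrete, here is a small Python sketch; the sequence x_{n} = 1/n converging to 0 is an assumed example:

```python
# A sketch of the convergence definition: for x_n = 1/n -> 0, given epsilon
# we exhibit an N past which every term is within epsilon of the
# accumulation point x_0 = 0.

def find_N(epsilon):
    n = 1
    while abs(1 / n - 0) >= epsilon:   # |x_n - x_0| >= epsilon
        n += 1
    return n                           # all terms beyond this index are close

for eps in [0.1, 0.01, 0.001]:
    print(eps, find_N(eps))            # N grows as epsilon shrinks
```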

First Steps: A Suggestive Fact

Now as sequences enter our narrative their features become relevant. So we might come to suggest that the accumulation point of the sum of two sequences is the sum of their accumulation points. Formally:

\{ x_{n} + y_{n} \} \rightarrow x+y

We calcify this suggestion with a quick proof using the triangle inequality for absolute values. Assume the accumulation points x, y exist for our sequences. See first that

|(x_{n} + y_{n}) - (x + y)| = |(x_{n} - x) + (y_{n} - y)| \leq |x_{n} - x|+ |y_{n} - y|,

then, appealing to the definition of convergence, we pick an \epsilon > 0. Now by assumption there are measures of closeness \epsilon_{1}, \epsilon_{2} for each sequence. We specify these so that their sum equals \epsilon, i.e. \epsilon_{1} = \epsilon_{2} = \epsilon/2. Then we have

|x_{n} - x| < \epsilon/2 (\forall n > N_{x}) \\ |y_{n} - y| < \epsilon/2 (\forall n > N_{y})

Now we pick N = \max\{N_{x}, N_{y}\}. We need to show that the distance between the sum of our nth terms and the sum of our accumulation points is strictly less than \epsilon. But by our choice of \epsilon_{1}, \epsilon_{2} we know this to be true, since

 |(x_{n} - x) + (y_{n} - y)| \leq |x_{n} - x|+ |y_{n} - y| < \epsilon/2 + \epsilon/2 = \epsilon

as desired.
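A quick numerical calcification of the same fact, with the assumed example sequences x_{n} = 1/n \rightarrow 0 and y_{n} = 1 + 1/n^2 \rightarrow 1:

```python
# Checking the suggestive fact numerically: x_n = 1/n -> 0 and
# y_n = 1 + 1/n**2 -> 1, so x_n + y_n should approach 0 + 1 = 1.
for n in [10, 100, 1000, 10000]:
    x_n, y_n = 1 / n, 1 + 1 / n ** 2
    print(n, abs((x_n + y_n) - 1))      # distance to x + y = 1 shrinks
```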

If this looks familiar, we’re on the right course. The lemma above should be suggestive of the summation property of limits, since both proofs rely on similar manipulations of the \epsilon distance measure. Only now, with this familiarity established, can we make a compelling suggestion about the relationship between limit points and accumulation points. This hint, like the revealed motive of a shadowy villain in a spy thriller, sheds light on the character.

The Characterisation Theorem

The most memorable characters tend to lodge themselves in our mind as personalities; we can ask what would Jesus do, and have concrete notions as to how he would act in any fanciful circumstance. This is to say that somewhere along the way we internalise the idea of Jesus the person. No matter what you think of that character, you think of the individual as a character equipped with various capacities and propensities. Whatever is true for gods is true for mathematical objects too.

A characterisation theorem supplies a way to describe the mathematical behavior of a poorly understood notion in terms of the limitations of a well understood notion. In this way we make new questions tractable. What would Jesus do? How do sequences behave under composition? Both questions presuppose characterisation.

Fortunately, in the case of sequences, we can provide one:

For f: X \rightarrow Y with x_{0} an accumulation point of X, \lim_{x \to x_{0}} f(x) = y_{0} if and only if \text{ for all } \{ x_{n} \} \rightarrow x_{0}, \{ f(x_{n}) \} \rightarrow y_{0}

The idea is to define the limit of a function in terms of the relationship between two convergent sequences. We prove each direction separately.

\Rightarrow

First suppose that (i) \lim_{x \to x_{0}}f(x) = y_{0} and (ii) \{ x_{n} \} \rightarrow x_{0} is an arbitrary sequence converging to x_{0}, with x_{n} \neq x_{0}.

We want to show that \{f(x_{n}) \} \rightarrow y_{0}. In other words, for any \epsilon > 0 there exists a natural number N such that if n > N, we have |f(x_{n}) - y_{0}| < \epsilon.

Let \epsilon > 0.

Then by (i) there is a \delta_{1} > 0 such that whenever 0 < |x - x_{0}| < \delta_{1} we have |f(x) - y_{0}| < \epsilon,

and by (ii) there is an N such that for all n > N we have 0 < |x_{n} - x_{0}| < \delta_{1}.

Combining these two facts: for all n > N, |f(x_{n}) - y_{0}| < \epsilon, as required. This proves the left to right direction.

\Leftarrow

For this direction we prove the contrapositive. That is to say we need to show that:

If \lim_{x \to x_{0}}f(x) \neq y_{0} then \exists \{ x_{n} \} \rightarrow x_{0} such that \{ f(x_{n}) \} \nrightarrow y_{0}

So assuming the antecedent we know that:

\neg\forall\epsilon(\epsilon > 0)\exists\delta(\delta > 0) such that if 0 < |x - x_{0}| < \delta, then |f(x) - y_{0}| < \epsilon

Rolling negation through the quantifiers we get:

\exists\epsilon(\epsilon > 0)\forall\delta(\delta > 0)\exists x \text{ such that } 0 < |x - x_{0}| < \delta \text{ and } |f(x) - y_{0}| \geq \epsilon

Fix such an \epsilon > 0. To see that a sequence \{ f(x_{n}) \} fails to converge to y_{0} we need to see that:

there is a positive measure of closeness \epsilon such that for all natural numbers N there exists an n > N with |f(x_{n}) - y_{0}| \geq \epsilon.

Now consider the sequence of \delta-distances \{ 1/n \} around x_{0}. For each n, the negated statement gives us a point x_{n} with 0 < |x_{n} - x_{0}| < 1/n and |f(x_{n}) - y_{0}| \geq \epsilon. By construction \{ x_{n} \} \rightarrow x_{0}, yet |f(x_{n}) - y_{0}| \geq \epsilon for every n, so \{ f(x_{n}) \} \nrightarrow y_{0}. This proves our claim and finishes the proof.
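To see the contrapositive in action, here is an illustrative sketch; the function f(x) = sin(1/x), which has no limit at 0, is an assumed example:

```python
# The contrapositive in action: f(x) = sin(1/x) has no limit at x_0 = 0.
# Two sequences converging to 0 along which f takes different constant
# values witness the failure of the sequence criterion.
import math

f = lambda x: math.sin(1 / x)
for n in [1, 10, 100]:
    a_n = 1 / (2 * math.pi * n + math.pi / 2)   # f(a_n) = 1 for every n
    b_n = 1 / (2 * math.pi * n)                 # f(b_n) = 0 for every n
    print(n, f(a_n), f(b_n))
```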

The Ramifications of the Reveal

This is not a difficult proof, but its primary benefit is that it confirms an implicit suspicion that most readers will have when encountering the \epsilon, \delta definition of limits: that there must be a more intuitive way! In the same way that you might find it difficult to relate to Darth Vader before the “Daddy” revelation, \epsilon, \delta proofs are harder to comprehend before their characterisation in terms of convergent sequences.

But once the reveal occurs, the behavior of sequences is more predictable. They are seen to be well behaved under the standard arithmetic operations. This is a direct corollary of the fact that \lim respects the standard algebraic operations. Similarly Vader’s temptation of his son is more excusable than a faceless bad guy’s attempted seduction of Skywalker. The insight of our proof is not surprising, but given our suggestive fact about the sum of sequences it has a pleasing inevitability and provides some conceptual clarification of the nature of \epsilon, \delta statements. Because characterisation determines expectation, we expect the Scorpion to sting the frog in Aesop’s fable and we are somehow sadistically pleased to see the result. Similarly, once a mathematical entity has been characterised it becomes familiar and reassuringly predictable.

Mathematics as Art

I have spent far too much ink on this result, but the point I want to make relates not to this particular proof, but to proof in general. While here I explained the relevance of each connection, it is neater and more engaging to leave a proof somewhat inexplicit. The sequence of statements should be suggestive and coaxing; they should inspire intellectual curiosity rather than simply satiate the reader’s desire for knowledge. Seen as a work of art, a proof demands engagement first, rather than mute appreciation.

In contrast, Susan Sontag has argued that works of art should not be interpreted; the choice and motivation of the artist are not particularly relevant to the experience of artistic appreciation. This is not true in the case of mathematical proof.

She writes:

…interpretation is the revenge of the intellect upon art. Even more. It is the revenge of the intellect upon the world. To interpret is to impoverish, to deplete the world – in order to set up a shadow world of “meanings”…It is always the case that interpretation of this type indicates a dissatisfaction (conscious or unconscious) with the work, a wish to replace it by something else. (Against Interpretation)

Considered in the case of mathematical beauty this view is strikingly silly. Firstly the quote assumes an availability of a plethora of “meanings”, but in mathematical proof the semantics of each statement are much more rigidly defined than, for instance, in a novel. The hunt for meaning in mathematics is simply the search for understanding. It is certainly not some dramatic pursuit of vengeance upon an artist presumptuous enough to try and imbue their own creation with a purely personal significance. It is a conversation with an artist who happens to be capable of cogently discussing their own work.

While I am sympathetic to the view that artistic criticism is often just windy rhetoric coupled with inane interpretive efforts, there is no call for the wholesale renunciation of interpretation as a method of artistic engagement.

Free Indirect Style

It’s perhaps unfair to Sontag, who wrote in a very different context, but her call for an “erotics of art” and her dismissal of interpretive endeavor risk ignoring those arts which are premised upon the audience’s interpretive engagement.

Where artists rely on the audience’s engagement, there can be a beautiful dynamic by which the artist coaxingly leads their audience to the intended revelation, and inspires the onset of understanding. This is true as much in good fiction as it is in mathematics, although the techniques are markedly different, and much more subtle in fiction.

The best argument to this effect is given by James Wood in the book How Fiction Works. He writes of the narrative technique called free indirect style, which allows the voice of the narrator to inhabit various roles (e.g. reliable or unreliable narrator, or foil of a particular character) as the narrative demands. Each role is freely adopted to better communicate an aspect of character or situation so as to make both more concrete and relatable.

Free indirect style is at its most powerful when hardly visible or audible: “Ted watched the orchestra through stupid tears.” In [this] example, the word “stupid” marks the sentence as written in free indirect style. Remove it and we have standard reported thought: “Ted watched the orchestra through tears.” The addition of the word “stupid” raises the question: Whose word is this? It’s unlikely [he] would want to call [his] character stupid for merely listening to some music in a concert hall. No, in a marvelous alchemical transfer the word now belongs partly to Ted. (How Fiction Works, pg. 10)

It is, in part, Ted’s judgement of his own tears that allows for this evaluation. So in this description the author hints at a greater emotional and intellectual depth to the character of Ted. It becomes true of him that he would be embarrassed by such needless emotional displays. We know more now about him than we did before. Ted is more predictable, more relatable, because we have discovered a truth about him. This kind of revelation is only inspired by a profound intellectual attention to technique and usage of language, and engagement with an imagined intent of the writer.

What is true of fiction is true too of proof. Sontag’s call for an “erotics of art” fails here, we suggest, simply because neither is particularly erotic. The primary aesthetic value comes from understanding a proof, not merely appreciating that it holds. This is not to say that understanding is the only aesthetic value, or even that understanding is the primary such value in all domains. As there are many kinds of art, there are many ways to enjoy art, but a proof is nothing but ugly scribblings without interpretation and understanding.

Conclusion

To write a stylistically good proof, we must write to be understood. We must write to inspire curiosity and engagement; we should aim for nothing less. As a consequence we are appealing to the desire of the reader to test their own suspicions. Whether they want to be confirmed or overthrown, a good proof will have to be surprising or tinged with a sense of inevitability. Proof should be written like noir fiction: there should be a hinted mystery that allows the reader to engage, and each subsequent step should prompt further speculation while simultaneously leading us to a reveal, inevitable in retrospect.

Limits at Infinity: Some Facts

We now wish to examine a special case of how \lim behaves with \infty. To see this we have to modify our traditional definition of \lim a little. We say that f(x) has an infinite limit if:

\lim_{x \to x_{0}}[f(x)] = \infty \leftrightarrow \forall m (m > 0) \exists\delta > 0\text{ such that } f(x) > m \text{ if } 0 < |x - x_{0}| < \delta

and we say f(x) has a limit at infinity if…

\lim_{x \to \infty}[f(x)] = A \leftrightarrow \forall \epsilon (\epsilon > 0) \exists m > 0 \text{ such that } |f(x) - A| < \epsilon \text{ if } x > m.

Natural analogues of these definitions hold for -\infty. Now we shall prove a few results to demonstrate the utility of these definitions with regard to asymptotic functions.
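Both definitions can be illustrated numerically; f(x) = 1/x^2, with an infinite limit at x_{0} = 0 and limit A = 0 at infinity, is an assumed example:

```python
# Illustrating both definitions with the assumed example f(x) = 1/x**2:
# f has an infinite limit at x_0 = 0 and the limit A = 0 at infinity.
f = lambda x: 1 / x ** 2
for x in [0.1, 0.01, 0.001]:
    print("near 0:", x, f(x))           # f(x) exceeds any bound m
for x in [10, 100, 1000]:
    print("at infinity:", x, f(x))      # |f(x) - 0| < epsilon eventually
```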

From here on out, assume that \lim_{x \to x_0}[f(x)] = \infty and \lim_{x \to x_{0}}[g(x)] = B.

Fact(1): Summation and Difference

\lim_{x \to x_{0}}[f(x) \pm g(x)] = \infty

To show this, note that our assumptions imply (a) that for every \epsilon_{1} > 0 there is a \delta_{1} > 0 such that f(x) > \epsilon_{1} if 0 < |x - x_{0}| < \delta_{1}, and (b) that for every \epsilon_{2} > 0 there is a \delta_{2} > 0 such that |g(x) - B| < \epsilon_{2} if 0 < |x - x_{0}| < \delta_{2}.

Let m > 0. We must exhibit a \delta > 0 such that if 0 < |x - x_{0}| < \delta, then f(x) + g(x) > m.

Set \epsilon_{1} = m - B + 1 (if this quantity is not positive, the bound f(x) > m - B + 1 holds automatically once f(x) is positive) and let \epsilon_{2} = 1. Pick \delta = \min(\delta_{1}, \delta_{2}); then by assumption we know that:

|g(x) - B| < 1

from which it follows:

-1 < g(x) - B < 1

then adding B throughout:

B - 1 < g(x) < B + 1

in particular:

g(x) > B - 1

Substituting this information, together with f(x) > \epsilon_{1} = m - B + 1, into our sum, we have:

f(x) + g(x) > (m - B + 1) + (B - 1)

which cancels to give us:

f(x)+g(x) > m

as desired. This completes the proof of addition. The proof for subtraction is almost identical.
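A quick numerical glance at this fact, with the assumed examples f(x) = 1/x^2 \rightarrow \infty and g(x) = \cos(x) \rightarrow B = 1 at x_{0} = 0:

```python
# A numerical glance at Fact (1): with x_0 = 0, f(x) = 1/x**2 -> infinity
# and g(x) = cos(x) -> B = 1, so f(x) + g(x) should also blow up.
import math
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x, 1 / x ** 2 + math.cos(x))   # grows without bound as x -> 0
```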

Fact (2): Products

Let m > 0, and let our assumptions be as above, with the additional assumption that B > 0. We must show that f(x)g(x) > m whenever 0 < |x - x_{0}| < \delta. We pick \epsilon_{1} = \dfrac{2m}{B} and \epsilon_{2} = \dfrac{B}{2} and \delta = \min(\delta_{1}, \delta_{2}). These specifications are positive numbers since B is positive by assumption.

We need to show that g(x) > \dfrac{B}{2}. The method is analogous to the method above since:

|g(x) - B| < \dfrac{B}{2} entails that -\dfrac{B}{2} < g(x) - B < \dfrac{B}{2}

Adding B to all sides gives \dfrac{B}{2} < g(x) < \dfrac{3B}{2}; in particular:

g(x) > \dfrac{B}{2}

Then substituting these restrictions into our product and canceling:

f(x)g(x) > \dfrac{2m}{B} \cdot \dfrac{B}{2} = m

as desired. This completes the proof.
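And the same glance at the product, with the same assumed examples (note B = 1 > 0):

```python
# A numerical glance at Fact (2): with f(x) = 1/x**2 -> infinity and
# g(x) = cos(x) -> B = 1 > 0, the product f(x)g(x) also exceeds any m.
import math
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x, (1 / x ** 2) * math.cos(x))  # grows without bound as x -> 0
```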