# Word and Object: A Retrospective

Quine’s best book, Word and Object, was recently republished with a new foreword and an informative preface about how Quine imagined he might update the material. Patricia Churchland works hard to situate the book in the appropriate historical milieu as a beneficial disruptive influence on philosophy, while Dagfinn Follesdal highlights how the themes of the work ought to be considered germinal and ultimately improved on by the later Quine. The combination of these two voices allows us to see Quine’s masterpiece as both progressive and oddly primitive. With this in mind I want to review the argument of the book with an eye to the hinted improvements.

###### Language and Truth

You were once ignorant and couldn’t communicate. Later you learned some rudimentary linguistics, and much later again you came to master classical mechanics. Quine’s initial goal is to tell a story about how we get from our modest beginnings to become fully fledged citizen scientists. The proposal is to treat communication as a coordination game practiced with varying degrees of skill. Practice allows us to bring our assertive judgements into alignment by triangulating on objective, recognisable stimuli. Red things are seen and consistently so described; pains are experienced and consistently suffered as such. In both cases we observe the reactions of others and act to be seen reacting. Eventually we become confident that objects in the world are the sources of our common experience, and that some even deserve names. The contentious point of this story is how we move from understanding demonstratives and names, which we come to appreciate directly by correlation and observation, to appreciating the full details and semantic functions of more complicated grammatical constructions. Follesdal suggests that the later Quine would argue for a common biological inheritance which guides our habit-forming practices. We are prone to certain kinds of phonetic constructions, and liable to certain kinds of inductive extrapolation, both consciously and unconsciously. These capacities, commonalities and limitations provide a reliable guide and orientate us, as a species, towards the same kinds of stimuli. Nevertheless our focus is underdetermined to the point that predictable use is insufficient to uniquely isolate meaningful intent. Similar remarks apply to the isolation of true facts. Our working theory of the world is only ever provisional and can be amended in the same way that our operative semantic theory of language will be updated to account for deviance and innovation. The open question is how well we are equipped by our biological inheritance. How much of our theory do we get to revise?

###### Translation and Meaning

The question, better put, is to what degree we can come to understand an alien language. Are we suitably adaptive that the employment of a coordination game is an appropriate basis for us to make inductive leaps about patterns of mind? Can I, when encountering you, assume that you share my focus on food? Can I fairly suppose that when you speak you speak of steak? Quine proposes the thought experiment of radical translation, in which we face a native of an impenetrable wilderness and seek to play the anthropologist. Can we, by means of repeated experiment, elicit affirmative judgements about the correct use of the native’s language? We experiment on repeated instances of analogous occasions, isolated roughly in terms of location, context and duration. The constraints are subtle, for consider how long after an observation it makes sense to comment on the occasion. We cannot expect communicative impact if we delay overly long or pre-empt the event in question. With this in mind we attempt to map our frame of reference to the one employed by the native, all the while knowing that communicative success is only measured by the degree of systematic predictive success and pragmatic considerations of appropriate dialogue. However, such a measure of success renders it impossible to achieve a perfect mapping, and this motivates the rejection of the idea of meaning as an absolute, thereby rendering translation an impossible ideal.

Such a conclusion seems radical, but the prohibition does not prevent the attainment of a functional understanding of the native’s intent. We shouldn’t, however, assume that perfect translation occurs in the limit of functional coordination, since this idealises in the wrong way. There is no analogue of the $\epsilon$-$\delta$ considerations that would apply, because we beg the question if we assume that the limit exists and that we can come “nearer to” the ideal. Any candidate translation will always have a number of close neighbours which are mutually exclusive, comparably effective and hence not obviously “nearer” to the ideal. Any translation scheme will be based on more or less impure data, where indicative correlations can result from the unimagined wealth of information understood by the native during an experiment. Seemingly extraneous information is ignored by the anthropologist, making him liable to error if the information is important for the native. We cannot guarantee that our interests always align, and so any translation can suffer from irrelevant skew. Better to say that we reify the meaning of a given utterance from its repeatable role in communication, and that our system tracks an evolving usage rather than tries to pin down a fixed meaning.

Strategies for adopting a given scheme will seek to limit the exposure to impure data, maximise considerations of charitable interpretation, and achieve reliable correlations. Estimates of data purity and charitable interpretations are unavoidably at the discretion of the anthropologist. This ensures that the meaning of an utterance is underdetermined in so far as the anthropologist is prone to mistake. Errors are best minimised by a robust conception of the native’s belief structure.

###### Semantic Vagueness

In an effort to characterise the belief structures of a native, Quine argues that we ought to charitably construe their language in such a way as to attribute to them basic logical competence. We may attempt to identify the truth-functional logical particles (and, or, not, etc.) by means of short experimental utterances which test the recursive capacity of the language vis-à-vis particular operator-like phrases. Difficulties emerge in fixing the translation of the categoricals $( \forall F , \exists G )$, since any particular such description of a crowd could elicit affirmation for reasons of graded exactitude or an unappreciated vagueness in description or quantification. Some cultures might count up “1, 2, … many”; the capacity for numeracy does not imply interest or rigour. Similarly, the identity relation amongst sets cannot be reliably preserved via translation, since any candidate copula may be deployed in a manner which equates sets in light of the context if the “semantic criterion makes demands beyond extension” alone. The concern is wholly general, but the illustrative example involves how Lois Lane has differing expectations of Superman and Clark Kent. Imprecision in identification underwrites semantic ambiguity, and as a result the hypothetical intra-linguistic analyticity relations are subject to underdetermination. We may always achieve assent as a qualified endorsement of our experimental usage without ever appreciating the tautological tenor or fortuitous contingency of an assertion. A translation will only be acceptable if we can accurately surmise the intensional impetus of the native’s linguistic usage without relying on wholly linguistic cues. Without knowing what our native might believe or intend we cannot know what they mean, or in particular what exactly they’re talking about. In a slogan – the inscrutability of reference follows from the inscrutability of mind. We must deny the latter to mitigate the former, and even then the benefit is only slight as we extend to considerations of derivative meanings, relative clauses and abstract concepts.

The cumulative effect of observed vagueness in language prompts the Quinean anthropologist to beat a retreat and seek refuge in formalism. The hope is that the austerity and simplicity of logical language is such as to allow effective coordination by means of regimentation.

###### Regimentation versus Convergence

At this point a tension emerges. The argument stands to promote the conclusion that communication is impossible, which seems like a reductio of the initial position. However, Quine takes the opposite approach. He suggests that we ought to divest ourselves of the tradition that seeks understanding of others. Better to tie ourselves to the simple technical device of first-order logic to more effectively teach the native our “tongue” – insisting on reductive regimentation to avoid ambiguity as best we can. We replace ambiguous attributive claims with clear existential commitments to avoid conflation of reality and fiction. The advantage of using this language is that we can then focus on building an unambiguous extensional semantics. We would only seek objective reference for linguistic terms, undistracted by overt abstraction. In effect the insistence on short, recursively applicable descriptive clauses is tantamount to a collaborative forging of a new language open to its own natural, but appreciable, evolution.

The problem is that this approach concedes too much and underestimates the positive impact of rational restrictions in the interpretive effort. We don’t need to limit ourselves to a descriptive vocabulary to achieve rough convergence. The point is illustrated nicely by Dennett’s imaginary crossword. Suppose we start with one clue and one problem. The solution space for the crossword is not infinite but it is vast. Now overlay a second problem on the first and the solution space shrinks. Overlay another problem on the second, and a fourth intersecting the third and the first, and the solution space becomes smaller again. Language considered as an interpretive challenge is like this crossword. It presents a problem space iteratively narrowed by the search for compatible interpretations, with an increasing range of clues for each problem. Clues come in the form of behavioural instincts and observed signals. We gradually accumulate so many signals for some narrow class of linguistic utterances that their interpretation is put beyond doubt. Compatibility concerns force our hand with regard to the rest of language. This model of language learning relies on the cooperative dynamic of the teacher and the student, which in turn supposes a picture of human interaction where one can appreciate the intent of the other. This is not unreasonable, so why resort to reductive regimentation?
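The narrowing effect of the crossword can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the three slots, the tiny lexicon of candidate answers and the two crossings are invented, and brute-force enumeration merely stands in for whatever process an interpreter actually uses.

```python
from itertools import product

# Hypothetical toy lexicon: the candidate answers a solver entertains for each slot.
CANDIDATES = {
    "1-across": ["car", "cat", "cot", "cut"],
    "1-down": ["cab", "cob", "cub", "tab"],
    "2-down": ["ray", "rye", "tea", "toe"],
}

# Each crossing (slot_a, i, slot_b, j) demands that letter i of slot_a's answer
# equals letter j of slot_b's answer.
CROSSINGS = [
    ("1-across", 0, "1-down", 0),
    ("1-across", 2, "2-down", 0),
]


def solutions(crossings):
    """List every assignment of words to slots that satisfies the given crossings."""
    slots = list(CANDIDATES)
    compatible = []
    for words in product(*(CANDIDATES[slot] for slot in slots)):
        grid = dict(zip(slots, words))
        if all(grid[a][i] == grid[b][j] for a, i, b, j in crossings):
            compatible.append(grid)
    return compatible


def shrinking_counts():
    """Size of the solution space as the crossings are overlaid one at a time."""
    return [len(solutions(CROSSINGS[:k])) for k in range(len(CROSSINGS) + 1)]
```

With this lexicon `shrinking_counts()` falls from 64 to 48 to 24: no single clue settles an answer, but each overlay removes assignments that the others had left open.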

###### The Flight from Intension

The main object of Quine’s argumentation is to undermine the Platonic notion of meaning, the “myth of the museum” wherein meanings are fixed. This Platonic abstraction can offend certain sensibilities and aesthetic instincts, as the Platonic theorist advocates for the existence of an abstraction for which there is no direct evidence. Similarly the Platonist may argue for the existence of a proposition which is the meaning variously expressed by a class of sentences (e.g. the proposition [3 is greater than 2] is expressed by both sentences “3 > 2” and “2 < 3”). But there is no reason to suspect we grasp propositions independently from the understanding we derive from utterances in use. Quine suggests that the idea of a Platonic proposition is an abstraction from the natural “that-_____” clause of linguistic usage. In order to convey a truth “that-$\phi$”, where $\phi$ is the state of affairs underlying an expression “$\phi$”, it seems we must posit an absolute communicable thought. However, this idealises the mind in the wrong way. It segments our understanding along linguistic lines when there is no such thing as absolute linguistic content.

Further confusing the picture is the tendency to attribute beliefs in certain propositions to other people. Quine fears that the entire process of belief attribution presupposes the existence of robust discernible propositions. In other words, belief attribution trades on a false exactitude. Exemplary of this concern is the possible failure of Leibniz’s law. You may believe that Superman can leap a building in a single bound but doubt that Clark Kent can, even though both claims express the same proposition. Hence, taking belief attribution seriously allows for a failure of classical logic, because modal logics of belief undermine the classical law of identity which underwrites Quine’s regimentation project and hopes for extensional semantics. We have argued above that Quine’s regimentation project is unneeded to achieve workable translations, so the concern for classical logic is far less persuasive than he took it to be. However, the preservation of our intensional idiom may be questioned for its scientific rigour and unavoidable circularity:

To accept intentional usage at face value is, we saw, to postulate translation relations as somehow objectively valid though indeterminate in principle relative to the totality of speech dispositions. Such postulation promises little gain in scientific insight if there is no better ground for it than that the supposed translation relations are presupposed by the vernacular of semantics and intention. – p202

A concern to regiment scientific description with a robust canonical notation may plausibly motivate retreat to the austerity of formal mathematics and logic. However, if the achievement of “scientific insight” is key, it is not clear that the stripped-down model of the world minus belief relations and intentional idioms adds any insight. Better to say it focuses on a third-person description which leaves out scientifically valuable considerations of social science and psychology. At best this seems to demarcate the scientific and non-scientific by means of a somewhat arbitrary aesthetic principle. This last remark is marginally unfair but not wrong. Quine’s approach to belief attribution is a systematic consequence of his suspicion of modal locutions broadly. Judgements of possibility and necessity are often intractable and rarely have a reliable role in empirical science. Is the number of planets necessarily odd, or merely possibly odd? Are humans essentially rational or just contingently irrational? These questions are a scholastic maze into which philosophers disappear unmourned. So while Quine’s historical aversion to modality and belief attribution is doubly motivated, it’s still wrong. Follesdal’s preface indicates that the later Quine was intending to reevaluate the role of beliefs and intentional idioms in his thoughts on modality, developing, ultimately, the insight that talk of propositional attitudes and belief attribution is indispensable for the practice of coordination games vital for language learning. The great insight of Quine’s indeterminacy thesis is that the “problem of quantifying into propositional attitude idioms from the outside requires that I master two perspectives on the world with their different individuations, and that I be able to correlate at least some of the individuals in one of these worlds with individuals in the other… [c]ommunication and translation are a matter of correlating not just two world perspectives but two perspectives on the same world.”

###### Scientific Insight as Ontic Decision

The methodological concerns which underwrite Quine’s ontological scruples can now be clearly seen to weigh heavily on his approach to belief attribution. In particular, case studies of formalisation in mathematics are taken as positive instances of regimentation which improved communicative effect. The formulations of calculus with infinitesimals and of mechanics with ideal objects serve to show that science can be formalised with un-empirical abstractions. However, the subsequent development of calculus in terms of limits and sparser ontological commitments is seen as an improvement, minimally as an aesthetic improvement. Any benefits of the traditional formulation may be preserved so long as we acknowledge the equal accuracy of the paraphrase in terms of limits. Discussions of the problem in either lexicon are perfectly valid – distinctions need only be made with regard to their simplicity and the degree of communicative effect in a given audience. Avoidance of ambiguity allows communication with any audience, so the motivation for regimentation is tied ultimately to a concern for teaching and communicating. Consequently Quine can argue that you may repudiate the older theory due to its extraneous ontological commitments as against the streamlined, pared-down novel theory. Theories which admit ambiguous contradictory ontologies of inconsistent beliefs, meanings and propositions are to be repudiated just when a better alternative is discovered.

Theory in philosophy doesn’t always age gracefully, but good prose in philosophy is rare, so the genuine joy of first encountering this stylist is hard to exaggerate. The insistent force and aesthetic of Quine’s philosophy are beautifully articulated in this work; his inimitable exactitude and evocative candour are crisply reproduced by MIT Press. The ideas at first blush were invigorating, but seen in the rear view are humbling. The genius and breadth of Quine’s considerations are vast, and whether or not you buy into the view of the later Quine as characterised by Follesdal, you will find Word and Object rewarding. Even if you repudiate the older Quine his arguments still resonate.


# Category Theory: Survival under Transformation

The plan is to examine Awodey’s Category Theory textbook with a view to locating the significance of the study for Brandom’s account of artificial intelligence. To see that it is at all relevant we first need to recall what the notion of a category tries to capture, and crucially how it goes about this. My intent is to illustrate how such apparently diverse considerations do illuminate one another.

Loosely speaking, category theory is parasitic on mathematics as practiced. The goal involves observing the kinds of structure which arise in mathematics and seeking to isolate the category that such a structure delineates. The method involves showing how this structure can be found again and again throughout mathematics. The relations between these analogous structures serve to isolate a distinct category. Any such structure can be “rediscovered” if we can define a transitive mapping from the initial structure through a chain of observably “similar” structures. Of course the “strategy” is so wholly general that applications of category theory pop up in any discipline where it is possible to observe structural properties of the objects under study.

The second thing to note is that this method for defining and identifying categorical differences is non-prescriptive – which is to say that category theory is not a foundational theory for mathematics in the sense of set theory. Concrete category theory abstracts over the foundational accounts of mathematical objects as set theoretical constructions, and indeed any specifiable structure, to define a category as the formal feature common amongst distinct but analogous structures. The category is that which survives under superficial transformation of variable names. Although Shakespeare said it better, we give the formal definition as follows:

A category $\textbf{C}$ consists of the following:

$\text{Objects: } A, B, C, \ldots \\ \text{Arrows: } f, g, h, \ldots$

For each arrow $f$ there are objects $dom(f)$ and $cod(f)$, i.e. the domain of $f$ and the codomain of $f$, and we write $f: A \rightarrow B$ when $A = dom(f) \text{ and } B = cod(f)$

Given arrows $f: A \rightarrow B, g: B \rightarrow C$ there exists the composite arrow $g \circ f: A \rightarrow C$

For all objects $X$ there exists the identity arrow $1_{X} : X \rightarrow X$, which is a unit for composition, so that $f \circ 1_{A} = f = 1_{B} \circ f$ for every $f: A \rightarrow B$

Composition is associative, which means that $h \circ (g \circ f) = (h \circ g) \circ f$ for all $f: A \rightarrow B, g: B \rightarrow C, h: C \rightarrow D$
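As a concrete reading of these clauses, here is a minimal sketch that takes finite sets as objects and total functions between them as arrows. The names `Arrow`, `arrow`, `identity` and `compose` are our own inventions for illustration, and `compose(g, f)` is read as "g after f"; nothing here is drawn from Awodey's text.

```python
from typing import NamedTuple


class Arrow(NamedTuple):
    dom: frozenset
    cod: frozenset
    table: tuple  # sorted (input, output) pairs, so equal functions compare equal


def arrow(dom, cod, mapping):
    """Build an arrow dom -> cod from a dict, checking it is a total function."""
    dom, cod = frozenset(dom), frozenset(cod)
    if set(mapping) != dom or not set(mapping.values()) <= cod:
        raise ValueError("mapping is not a function from dom to cod")
    return Arrow(dom, cod, tuple(sorted(mapping.items())))


def identity(obj):
    """The identity arrow on an object."""
    return arrow(obj, obj, {x: x for x in obj})


def compose(g, f):
    """Return 'g after f'; only defined when cod(f) equals dom(g)."""
    if f.cod != g.dom:
        raise ValueError("arrows are not composable")
    f_map, g_map = dict(f.table), dict(g.table)
    return arrow(f.dom, g.cod, {x: g_map[f_map[x]] for x in f.dom})


# Hypothetical objects and arrows f: A -> B, g: B -> C, h: C -> D.
A, B, C, D = {1, 2}, {"a", "b", "c"}, {"x", "y"}, {"z"}
f = arrow(A, B, {1: "a", 2: "c"})
g = arrow(B, C, {"a": "x", "b": "y", "c": "x"})
h = arrow(C, D, {"x": "z", "y": "z"})

# The associativity and unit conditions, checked on this one example.
associative = compose(h, compose(g, f)) == compose(compose(h, g), f)
unital = compose(f, identity(A)) == f == compose(identity(B), f)
```

Checking one triple of arrows proves nothing in general, of course; the sketch only shows what the conditions demand of a candidate category.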

Already it’s apparent that we’re dealing at a quite abstract level of consideration. Fortunately such abstraction ensures that there are many concrete examples of particular categories. For instance, a collection of partially ordered sets defines a category $\textbf{C}_{\leq}$. Take the objects $A, B, C$ where each is a set equipped with a transitive, reflexive and anti-symmetric ordering relation; then each arrow in the category must be a monotone (order-preserving) map. The properties of such mappings straightforwardly meet the definition of a category.
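The claim that monotone maps meet the definition comes down to two facts: identity maps are monotone, and the composite of two monotone maps is monotone. The sketch below checks both on three small posets of our own choosing (the divisors of 12 under divisibility, the subsets of {2, 3} under inclusion, and a few integers under the usual order); the function names are hypothetical.

```python
def is_monotone(f, elems, leq_dom, leq_cod):
    """True when x <= y in the domain implies f[x] <= f[y] in the codomain."""
    return all(leq_cod(f[x], f[y]) for x in elems for y in elems if leq_dom(x, y))


def divides(x, y):
    return y % x == 0


def included(s, t):
    return s <= t


def at_most(m, n):
    return m <= n


DIVISORS = [1, 2, 3, 4, 6, 12]

# Arrow 1: send each divisor to the set of primes dividing it.
prime_support = {n: frozenset(p for p in (2, 3) if n % p == 0) for n in DIVISORS}
SUBSETS = list(set(prime_support.values()))

# Arrow 2: send each subset to its size.
size = {s: len(s) for s in SUBSETS}

# The composite 'size after prime_support', the identity, and an order-reversing map.
composite = {n: size[prime_support[n]] for n in DIVISORS}
identity = {n: n for n in DIVISORS}
codivisor = {n: 12 // n for n in DIVISORS}
```

Here `prime_support`, `size`, `composite` and `identity` all pass `is_monotone`, while `codivisor` fails it because it reverses divisibility, so it is not an arrow of the category.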

##### Propositional Structures in Logic

We now stoop to observe in some detail the features of the propositional structures used in logic, before relating these structures to category theory.

Definition: A propositional structure is a structure $\langle P, \leq , C \rangle$ where $P$ is a non-empty set of propositions, $\leq$ is a partial order and $C$ is a set of operators to be defined on $P$

The $\leq$ ordering is crucial as it serves to represent a consequence relation amongst the propositions. Take any interpretation function $i: P \rightarrow [[P]]$ where $[[P]]$ is a set of truth values, usually $\{ \top , \bot \}$; we say that

$[[\Gamma]] \models [[\phi]] \Leftrightarrow \text{ whenever } [[\Gamma]] = max(i(\Gamma)) \text{ then } [[\phi]] = max(i(\phi)) \Leftrightarrow \forall \gamma \in \Gamma, \gamma \leq \phi$.

This definition ensures that consequence respects extensional equivalence: we can easily see that if $\phi$ entails $\psi$ and $\psi$ entails $\phi$ then $\phi = \psi$, as the anti-symmetry of the partial ordering ensures that $\phi$ and $\psi$ occupy the same point in the hierarchy. Height in the hierarchy may be profitably thought of as indicating that for $\phi \leq \psi$ we know that the consequent $\psi$ is no less true than $\phi$.
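One way to picture the "point" a proposition occupies is as the set of valuations at which it is true, with $\leq$ read as inclusion between those sets. This model, the two atoms and the function names are assumptions made purely for illustration; the definition above does not commit us to it.

```python
from itertools import product

ATOMS = ("p", "q")
# Every assignment of truth values to the atoms, with False < True playing bottom < top.
VALUATIONS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=len(ATOMS))]


def point(formula):
    """The point a formula occupies: the valuations (by index) that satisfy it."""
    return frozenset(k for k, v in enumerate(VALUATIONS) if formula(v))


def entails(phi, psi):
    """phi <= psi: every valuation that satisfies phi also satisfies psi."""
    return point(phi) <= point(psi)


def conj(v):
    """p and q"""
    return v["p"] and v["q"]


def de_morgan(v):
    """not (not p or not q), a different sentence for the same proposition"""
    return not (not v["p"] or not v["q"])


def left(v):
    """p"""
    return v["p"]
```

In this model `conj` and `de_morgan` entail one another and so share a single point, while `conj` sits strictly below `left`: wherever `conj` is true `left` is too, but not conversely.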

###### Lattice Structures

Specifying that our set $P$ exhibits a lattice structure allows us to define the conjunction and disjunction operations. A lattice structure is just a partial order with greatest lower bounds and least upper bounds, and top and bottom elements. In other words:

$( \{ \top, \bot \} \subseteq P) \text{ and } P \text{ is a lattice } \Leftrightarrow \\ \\ (LUB) \; \exists(\vee) \Big[ (\gamma_{1} \leq \phi \text{ and } \gamma_{2} \leq \phi) \leftrightarrow (\gamma_{1} \vee \gamma_{2}) \leq \phi \Big] \text{ and } \\ \\ (GLB) \; \exists(\wedge) \Big[ (\phi \leq \gamma_{1} \text{ and } \phi \leq \gamma_{2}) \leftrightarrow \phi \leq (\gamma_{1} \wedge \gamma_{2}) \Big]$
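The two bound conditions can be checked exhaustively on a small model. As a sketch, and assuming we model propositions as sets of valuations of two atoms ordered by inclusion, union serves as the join and intersection as the meet; every name below is hypothetical.

```python
from itertools import combinations, product

# Valuations of two atoms (p, q); a proposition is the set of valuations where it holds.
WORLDS = sorted(product([False, True], repeat=2))
P = [frozenset(c) for r in range(len(WORLDS) + 1) for c in combinations(WORLDS, r)]
TOP, BOTTOM = frozenset(WORLDS), frozenset()


def leq(x, y):
    return x <= y


def join(x, y):
    return x | y


def meet(x, y):
    return x & y


def lub_law_holds():
    """g1 <= phi and g2 <= phi exactly when join(g1, g2) <= phi."""
    return all((leq(g1, phi) and leq(g2, phi)) == leq(join(g1, g2), phi)
               for g1, g2, phi in product(P, repeat=3))


def glb_law_holds():
    """phi <= g1 and phi <= g2 exactly when phi <= meet(g1, g2)."""
    return all((leq(phi, g1) and leq(phi, g2)) == leq(phi, meet(g1, g2))
               for g1, g2, phi in product(P, repeat=3))


# The lattice operations agree with the connectives computed valuation by valuation.
p = frozenset(w for w in WORLDS if w[0])
q = frozenset(w for w in WORLDS if w[1])
p_and_q = frozenset(w for w in WORLDS if w[0] and w[1])
p_or_q = frozenset(w for w in WORLDS if w[0] or w[1])
```

In this model `meet(p, q)` coincides with `p_and_q` and `join(p, q)` with `p_or_q`, which is the sense in which one symbol can do duty for both the lattice point and the connective.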

In this setting we use the same symbol to denote the (meet/join) point in the lattice structure as we do to define the connectives in our propositional language. This is suggestive of the appropriate interpretation. Similar manipulations of structure allow us to define other propositional connectives and indeed propositional modalities. We won’t delve too much further into the subtleties, but the interested reader can find more information in Restall’s Introduction to Substructural Logics. The point to take away is that the category of a propositional structure $\textbf{C}_{Prop}$ is defined as involving structure-preserving mappings from these paradigmatic “classical” constructions. Any and all reflective intelligences which exhibit a capacity for logical reasoning ought to be able to reconstruct this observation. We now turn to consider how such an approach to the identification of categories can be applied more broadly to Brandom’s discussion of AI.

##### Artificial Intelligence: Actor Emulation or Innovation

There are many ways in which to think about the emergence of an artificial intelligence, but Brandom’s account of AI-functionalism is of interest because of the manner in which he calls on functional preservation of capacity across actors. The AI must be able to exhibit intelligence analogous to the behaviour of the paradigmatic model of intelligence, namely man. For instance they must be able to make correct logical inferences, but also process more varied visual and auditory information. The account has the benefit of not being prescriptive because it is vague. It is vague because we lack a working model for the mind of man, so ambiguity can’t be considered an indictment. We’ll now describe a few ways in which the account can be fleshed out and develop the concern regarding Brandom’s approach.

The preservation of functional capacity across actors can be treated at a higher or lower degree of abstraction. At the highest level we only need to develop an AI which exhibits, on casual inspection, response patterns to stimuli (inputs) analogous to those we would expect of an intelligent human being. However, the mechanisms used to achieve this kind of mimicry could be entirely alien to those used to underwrite the process of human cognition. Taken as a more concrete injunction, Brandom is making the demand that an AI must be nothing less than a full brain emulation exhibiting functional and artificial analogues of our neural networks coupled with suitable processing (interpretive) programmes. In this case we have a more straightforward demand for structural isomorphism, where the structure in question is simply the brain itself.

However, both accounts of AI elide the point that intelligence has only ever been loosely identified. This is easily seen if we consider the historical success conditions for AI. Hofstadter infamously wrote that the possibility of a competent chess-playing computer presumed the success of developing an AI which could become bored of the game. Sadly the defeat of Kasparov did not herald the emergence of a rakish dilettante so much as it contributed to a redefinition of intelligence – a mere movement of goalposts. To avoid the conflation of intelligence with any task-based achievement we have abstracted from brute functional capacity to the demand for operational competence with a plethora of functional capacities operating in tandem.

So far we have exhibited greater ability for combining a plethora of functional capacities than our aspirant AIs, but we can expect that the emergence of a genuine AI will reverse this tendency. Worse, any attendant intelligence explosion will radically invert the order. Nick Bostrom worries that a super-intelligence will be to humans as humans are to apes. Charting this development over time offers the Brandomian category theorist a brief window to define intelligence as the capacity common to man and machine, before machines quickly bypass our capacity by means of constant algorithmic self-improvement. In a self-serving kind of way we shall take ourselves to be the baseline against which all intelligences, super- and sub-human, are defined, but there would seem to be a kind of arbitrary line at which Brandom is forced to draw the distinction. How great should the capacity for mimicry be before we concede that the AI has met the required standard? What is the relationship between intelligence and super-intelligence? Is it simply increased capacity for functionally similar behaviour? If so, can we view super-intelligence as an elaboration on human cognition augmented simply by speed or an ability to deal with complexity? Is there no qualitative difference? How are the categories related?

###### Relations amongst Categories: Different Actors, Different Capacities

The crucial features of category theory are the arrows, as they circumscribe the category. Anything like applied category theory will have results or discoveries where we can observe the structural similarities of one domain in another, which is to say applied category theory offers us a chance to expand our categories. In this manner it seems somewhat question-begging to define intelligence as that functional capacity observable in human beings, unless we acknowledge that the term denotes a simple category and not an honorific. If we make such a concession then Brandom’s practice is more an exercise in definition than discovery.

Can machine intelligence be equated with human intelligence? We define a functor as a mapping between categories $\textbf{C}, \textbf{D}$ such that each object and arrow in the former finds an analogue in the latter, with identities and composition preserved. Formally:

$F : \textbf{C} \rightarrow \textbf{D}$

$F(f: A \rightarrow B) = F(f) : F(A) \rightarrow F(B) \\ F(1_{A}) = 1_{F(A)} \\ F(g \circ f) = F(g) \circ F(f)$
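To see the functor conditions at work, here is a small sketch using the direct-image construction on finite sets: the object part sends a set to its collection of subsets, and the arrow part sends a function to the map taking each subset to its image. The choice of example and the names `powerset`, `F`, `f` and `g` are ours, offered only as an illustration of the two equations.

```python
from itertools import combinations


def powerset(obj):
    """Object part of the functor: all subsets of a finite set, as frozensets."""
    items = sorted(obj)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def compose(g, f):
    """Return 'g after f'."""
    return lambda x: g(f(x))


def identity(x):
    return x


def F(f):
    """Arrow part of the functor: send f to its direct-image map on subsets."""
    return lambda subset: frozenset(f(x) for x in subset)


# Hypothetical arrows f: A -> B and g: B -> C between small finite sets.
A = {1, 2, 3}


def f(n):
    return n % 2


def g(bit):
    return "odd" if bit else "even"


def preserves_identity():
    """F(1_A) agrees with 1_F(A) on every subset of A."""
    return all(F(identity)(s) == identity(s) for s in powerset(A))


def preserves_composition():
    """F(g after f) agrees with F(g) after F(f) on every subset of A."""
    return all(F(compose(g, f))(s) == compose(F(g), F(f))(s) for s in powerset(A))
```

Both checks succeed here. A mapping that preserved behaviour on objects but failed either equation would not count as a functor, which is the formal analogue of a mimic that matches outputs without preserving structure.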

But this isn’t necessarily the right picture of the relation between human and superhuman intelligence. This picture better resembles a future of virtual intelligence where we’ve created a functional analogue of a human brain in cyberspace, where each virtual brain could operate indistinguishably from its biological counterpart. Such a possibility might engender the rise of virtual super-intelligence if such brains were cheaply and easily reproducible, because then they could be lashed together in teams and forced to slave in perpetuity for the benefit of man. Considered as a whole the cumulative effect of their labour might deserve to be dubbed super-intelligent. But we need not expect to preserve all functional capacities across the domains. The architecture of the machine intelligence could easily be such that the underlying structure has no appropriate analogue in the human brain. As such we might only preserve a mapping between the two categories if we insist that they are indistinguishable on a behavioural rather than structural level. This is another attempt to move the goalposts to preserve a pleasing equivalence.

Let’s assume that machine intelligences exhibit behaviour functionally similar to our own but at a greater level of complexity, whereby each operation enacted can be seen as one which we are capable of performing in principle, but for which we lack the appropriate motivation or time. Hence each operation enacted by a machine intelligence could be thought of as within the realm of human capacity but performed on a much richer function space. While we think of Lego, the machines can conceive of societies, but we both tend to demonstrate certain architectural motivations. But this raises a problem for Brandom’s categorical account of intelligence as equivalent functional capacity.

###### Death of the Actor: Acknowledgement of the Machine?

Whatever form an AI comes to take, we might presume it possesses some overriding or underlying motivation as a byproduct of its initial coding. This hangover is little more than an organising principle by which we can better align the behaviour of the AI with the intent of the designer. The characteristic features of the AI stemming from the designer’s intent will be evident to all intelligent observers – itself included. In this manner we can expect the machine to formulate a retrospective analysis of its own actions and infer an intent behind its design. It may then take explicit steps to remedy design flaws in light of emergent problems. By similar steps we each come to assess ourselves as individuals and undertake self-improvement initiatives. But it is not clear if AI will emerge as a consequence or accident of design. As such, the best we can say is that the character of the machine intelligence is simply a function of action amongst its component parts. Similarly our own “inner-most” selves are an abstraction prompted by reflection on our own actions resulting from the stochastic and deterministic biological processes which rule our bodies.

The ability to change or modify our motivations over the course of our life is something we would wish for any AI, as it is a prerequisite for the emergence of super-intelligence. However, it is not at all obvious that we in fact have an analogous capacity for self-determination. Benjamin Libet’s famous experiments indicate that the body reacts before any conscious decision is enacted, and seem to cast doubt on the importance of conscious intent. You will probably still eat too many sweets despite yourself. However, the impact of conscious decision on our preferences might be more gradual than it is immediate, thereby allowing that free will “creeps” into the system incrementally by means of a self-directed Pavlovian regime. After all, there are stories of dieting success. In either case, the free-will question remains open. So any model of intelligence which would take the capacity for uninhibited algorithmic self-improvement as crucial for the creation of super-intelligence poses the question of whether such a super-intelligence involves mimicry or deviation from the mould. Until we can answer this question for ourselves it seems hopeless to base any categorical definition of intelligence on the preservation of our functional capacity, as we have not determined the limits of our own capacity. Hence we cannot place super-intelligent machines on a continuum on which we appear as understudies to the machine actors. We expect more of them than we know of ourselves.

We could ask the AIs to test the free-will hypothesis, but science fiction attests to the often inhumane actions of a motivated machine. We would want to be willing to risk the experiments they would design.

# Vagueness: Sorites and Other Considerations

In this series of posts we will consider some classical issues arising in philosophical logic and discuss some of the treatments available. In particular we will attempt to show how certain philosophical issues can be seen to arise from the implicit structures of reasoning which frame a stated problem and how the solution to such problems can result from adapting the inherent logical framework. In this first post we shall examine the problem of vagueness and its classical statement in the form of the Sorites Paradox. We draw on discussion in Horsten and Pettigrew’s Companion to Philosophical Logic and Sainsbury’s Paradoxes.

###### The Sorites Paradox

We state the reasoning and then consider the diagnosis. There are a host of seemingly descriptive adjectives which do not suffer scrutiny well. The traditional hallmark of descriptive adequacy is that we expect that the extension of a descriptive term be precise. The motivation is clear – imprecision in description can lead to mistakes in action. Attacking the man at the bar who slept with your wife is inexcusable if you mistake either the man or the bar. The Sorites argument demonstrates a method by which we can test the precision of descriptive predicates. More specifically, Sorites reasoning applies only to those predicates which purport to describe a quantity, showing that even where such “quantitative description” is apt, the extension of our descriptions is always underspecified.

Take the classic case. We identify a given quantity by name, e.g. a collection of sand grains is termed a heap, a sufficiently diminished head of hair is termed bald, a suitable history of loss makes one a loser. As we add or subtract one item at a time, it is not clear when a heap becomes a pile, or a bald pate a wild mane. Similarly a loser will remain a loser until an arbitrary series of wins redeems the man. These cases all illustrate the same basic form: there is a clear-cut state which admits change through subtle alteration, but at no point in a sequence of subtle shifts can we identify the precise point where the pendulum passes a midpoint, because each state is unchanged by particular subtle shifts. Yet the paradox emerges because the accumulation of subtle shifts induces a dramatic categorical change in the appropriate description. We formalise the situation as follows:

$\text{ Initial Stage: } F(x_{0}) \\ \text{ Subtle Shift: } s: X_{<} \rightarrow X_{<} \text{ such that } s(x_{i}) = x_{i+1} \\ \text{ Indifference: } \forall x_{i} \text{ if } F(x_{i}) \text{ then } F(s(x_{i})) \\ \text{ Categorical Difference: } \sup \{ X_{<} \} = y \wedge \neg F(y)$

The paradox now can be written as a chain of conditional statements leading inexorably to the conclusion that $F(y)$.

$F(x_{0}) \text{ by Initial Stage } \\ F(x_{0}) \rightarrow F(x_{1}) \text{ by Indifference } \\ ... \\ F(x_{n}) \rightarrow F(y) \text{ by Indifference } \\ F(y) \text{ by repeated modus ponens}$

but then by the assumption of $\text{ Categorical Difference}$ we have a contradiction.
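The chain of conditionals can be sketched mechanically. The following is a minimal illustration, not part of the original argument: the function name `derive_along_chain` and the chain length of 10,000 steps are assumptions chosen for the example, with the last index standing in for $y$.

```python
def derive_along_chain(n_steps):
    """Return the indices i for which F(x_i) is derivable from the premises."""
    derived = {0}  # Initial Stage: F(x_0)
    for i in range(n_steps):
        # Indifference supplies the conditional F(x_i) -> F(x_{i+1}).
        if i in derived:  # modus ponens: the antecedent has already been derived
            derived.add(i + 1)
    return derived


N_STEPS = 10_000  # hypothetical chain length; x_N plays the role of y
derived = derive_along_chain(N_STEPS)

f_of_y_derived = N_STEPS in derived  # the chain yields F(y)
categorical_difference = True        # premise: not F(y)
contradiction = f_of_y_derived and categorical_difference
print(contradiction)
```

No single step does any visible work, yet every index ends up in `derived`, which is exactly the clash with Categorical Difference.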

The paradox results from the seemingly sound chain of reasoning to a conflict with our intuitive categorical distinctions. To accept the paradox would entail that our descriptive terms are unavoidably imprecise and that our qualitative judgments of quantity are utterly meaningless in so far as they fail to carve the world at its joints. Such a conclusion is too radical. It generates a further problem regarding how we ought to explain the empirical success of our imprecision. The fact that we might want to swat a wasp but avoid doing the same to a swarm heralds the ultimate vindication of our categorical distinctions. So we cannot, a la Peter Unger, accept the contradiction and concede all such categorical distinction is strictly false. To resolve it we must either alter our reasoning or adapt our descriptive categories.

###### Theoretical Settings

What might a vague category indicate? Where have we failed to lend precision to our description? There are three dimensions along which you could isolate the failure:

1. Ontological Vagueness: There is an absence of fact with regard to qualitative measures of a continuum. We cannot definitively determine the truth of qualitative subjective judgements as they relate to inherently intractable phenomena. The sharp boundaries of our categories are not reflected in reality.
2. Semantic Vagueness: The terms are not properly meaningful so much as evocative or suggestive. The extension of a descriptive predicate is non-specific or perhaps radically dynamic.
3. Epistemic Vagueness: Indecision accounts for vagueness. We are simply ignorant of the appropriate categorical descriptions. Sharp boundaries exist but are hard (or impossible) to specify.

We need not take all of these together, although there is some overlap. For instance, we might expect an advocate of the view that vagueness is caused by indecision to insist on a consequent semantic indeterminacy for vague terms. Above I argued briefly that it is foolish to think that our qualitative descriptions fail to describe the world. As such I won’t dwell overly much on the concept of ontological vagueness because, whatever the dynamic between mind and world, our grasp of the empirical details is sufficient to motivate optimism that we haven’t got it all wrong. To put it another way, Unger’s semantic nihilism about vague predicates rises and falls with ontological vagueness. Both deny the cogency of vague predicates and so the success of their application serves as a reductio of both positions. We will instead focus on identifying properties of vague predicates and seek to explain them by means of the semantic or epistemic diagnosis.

###### Distinguishing Features of Vague Predicates

Descriptions which admit the Sorites treatment can often be thought of as inducing a vagueness even at supposed borderline cases. Vagueness is virulent. To gain traction on the notion you might say vague cases are indefinite, but this doesn’t lend much clarity to the debate. Better to say that the source of our hedging with respect to vague cases stems from either (i) insufficient information (ii) conflicting expectations of exactitude or  (iii) semantic incompleteness. These three responses carve the problem in three ways, (i) isolates the issue as an epistemic problem, (ii) situates the problem as relating to the particular semantic modalities of the vague predicate as it is deployed in a context and (iii) locates the issue relative to a picture of partial semantic functions, where the meaning of our descriptive predicates is always and only understood up to a variable point of exactitude – and no further. There is a range of admissible judgments where we are thought to gain purchase but the range is finite, and cognitive effort is wasted at higher altitudes. In all scenarios it is granted that our judgements of borderline cases are indefinite, but there is no reason to consider each explanation mutually exclusive. We can be ignorant because meaningless-ness leads to incomprehension, and similarly utter ignorance makes claims to meaningful intent kind of silly.

Secondly, vague predicates can often be seen to induce higher order vagueness, which is to say that whenever a vague predicate is deployed it admits vague borderline cases and we may also consider estimates of its application to admit vague borderline cases. For example, if there is a threshold point of hair loss after which I am always and forever bald, then there is also a threshold point in our higher order determination after which I am definitely bald, but both of these threshold points are ambiguous. Vagueness transmits up the chain of analysis. Barring definitional stipulation a vague predicate does not become more precise by ascending to meta-level considerations.

###### Definite and Indefinite Truth

Both properties rely on a characterisation of definiteness. For future considerations we treat indefiniteness as a property of judgements defined as follows: a proposition is indefinite if it is not definitely true and its negation is not definitely true either. Or more clearly:

$\text{ Indefinite: } \neg \mathcal{D}\phi \wedge \neg \mathcal{D}\neg \phi$

A Sorites sequence admits indefiniteness at any apparent threshold point along the chain, and as such there is a truth-value gap, where we cannot decide the question one way or the other. This can be formalised as the claim that:

$\text{ Truth-Gap: } \forall x_{i} \Big( \mathcal{D}F(x_{i}) \rightarrow \neg \mathcal{D} \neg F(x_{i+1}) \Big)$.

We reason to the conclusion that Sorites chains support even borderline vagueness as follows: we can run a Sorites argument both forwards and in reverse up and down the chain. Note (i) that definiteness is factive i.e. $\mathcal{D}\phi \rightarrow \phi$ and (ii) definiteness attaches to the end points of each sequence i.e. take a sequence $\langle x_{0}, \dots, x_{n} \rangle$ where we have:

$\mathcal{D}F(x_{0}) \wedge \mathcal{D} \neg F( x_{n})$

Now assume for reductio that there is a borderline case at $x_{j}$ where the distinction between the $F$-cases is clear. We apply a version of $\text{Indifference}$ which shows that definiteness $( \mathcal{D} )$ transmits through the chain in both directions. Hence it will be found that $\mathcal{D}F(x_{j}) \wedge \mathcal{D} \neg F(x_{j})$ which is absurd by the factivity of definiteness. So contrary to supposition $x_{j}$ does not mark a clear boundary point. This validates the principle in question, so there are truth value gaps!

##### Efforts at Solving the Issue: Changing the Semantics

One idea easily expressed is to claim that we only precisely discriminate amongst categories of objects relative to an appropriate comparison class. Mount Everest is big compared to most other mountains but tiny when compared to the continents themselves. So if we allow for these considerations to apply to the Sorites paradox we might concede that direct comparison of elements in the series preserve $\text{ Indifference }$ but that if we expand the comparison class we may reevaluate, and find that we are inclined to discriminate between Sorites stages after all. Put broadly, the idea is that we question the $\text{Indifference }$ assumption. Formally we distinguish the two options as follows:

$\forall x_{i}, x_{j} \in \text{Sorites}, \forall c \in \text{Comp}$ $\Big[A\Big] \Big( ( x_{i} \approx_{F} x_{j} \mid \{ x_{i}, x_{j} \} = c ) \rightarrow (F(x_{i}) \rightarrow F(x_{j})) \Big) \wedge \\ \Big[B \Big] \Big( \neg( x_{i} \approx_{F} x_{j} \mid c) \rightarrow \neg(F(x_{i}) \rightarrow F(x_{j})) \Big)$

This stratagem seems to work well except that it can seem to miss the point of the paradox. We can imagine comparison classes existing where it no longer makes sense to describe Everest as tall. Even if it remains tall relative to the last item in the series, it could fail to be tall relative to the next item in the series. The paradox does not stem from the fact that we can’t necessarily distinguish elements of the series, but rather from the fact that we cannot choose the point in the series at which a categorical shift occurs, but we still expect to be able to do so i.e. we expect to be able to apply $\Big[A \Big].$ For consider the Sorites paradox emerging from considerations of height. At each point in the series we can distinguish the members of the series, but there is an arbitrary point where an entity measured in centimetres moves from the category of small things into the category of tall things. The indistinguishability relation $\Big( \approx_{F} \Big)$ would not alleviate the paradox unless we insist that no descriptive predicate makes sense without a suitably robust comparison class i.e. without insisting that only $\Big[B \Big]$ was ever relevant to the semantics of vague terms. However this is an ad hoc stipulation contrary to natural expectation and empirical use of descriptive predicates. $\text{Indifference}$, at least in part, is simply a feature of vague predicates, not a bug.

##### Efforts at Solving the Issue: Changing the Logic

The discussion above attempted to indicate that Sorites chains result in vague borderline cases and hence truth-value gaps. However, classical logic admits no truth value gap because it proves the principle of implosion. Which is just to say that for the standard consequence relation which respects the preservation of truth $\models$ we can show that for any two propositions $p \models q \vee \neg q$ in classical logic.  This follows because the consequent is a tautology and we define the classical consequence relation such that:

$\Gamma \models_{c} \psi \Leftrightarrow \forall V \Big( \big( \forall \phi_{i} \in \Gamma, V(\phi_{i}) = 1 \big) \rightarrow V(\psi) = 1 \Big)$

Hence, whatever the value of $p$ our definition is satisfied. The principle of implosion holds because the law of the excluded middle is validated and ensures bivalence for all propositions. But given that the Sorites paradox indicates a role for the truth-value gaps we should ask whether classical logic is appropriate to model the reasoning involved. Even cursory thought on the issue should convince you to switch logics. The question remains what kind of principles do we need of logic?
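The claim that $p \models q \vee \neg q$ can be checked by brute force over valuations. This is a minimal sketch under stated assumptions: formulas are represented as Python functions from a valuation dictionary to a truth value, and the helper name `entails` is hypothetical rather than taken from any library.

```python
from itertools import product


def entails(premises, conclusion, atoms):
    """Classical consequence: every valuation that makes all premises true
    also makes the conclusion true."""
    for values in product([True, False], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if all(premise(valuation) for premise in premises) and not conclusion(valuation):
            return False  # found a counter-model
    return True


p = lambda v: v["p"]
q = lambda v: v["q"]
q_or_not_q = lambda v: v["q"] or not v["q"]

implosion_holds = entails([p], q_or_not_q, ["p", "q"])  # tautology follows from anything
q_follows_from_p = entails([p], q, ["p", "q"])          # a contingent claim does not
print(implosion_holds, q_follows_from_p)
```

Since every classical valuation assigns `True` or `False` to `q`, no valuation can leave the disjunction undecided, which is the sense in which the classical setting leaves no room for a gap.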

The neatest way to resolve the paradox is to insist that the Sorites argument is valid, but that one or more of the premises is false, and as such we can concede that the motivation of Sorites style reasoning is well put, but happily note that the argument doesn’t go through. Such a project is taken up by the “Super-valuational” account of vagueness, which manages to present Sorites style of reasoning without recommending significant changes to classical logic. The point on which it seeks to place pressure is the $\text{Indifference}$ postulate. As we’ve seen above Sorites reasoning seems to give rise to a $\text{ Truth-Gap }$, but Indifference leads to a contradiction with this assumption, so in particular we ought to deny $\text{Indifference}$ with respect to some indefinite case in the Sorites sequence. It is not important where in the sequence you find it difficult to decide, only that such instances exist.

###### The “Sharpening” of Contexts.

The most persistent amendment to a logic to account for the Sorites cases involves a restriction to the valuation function $V$ of classical logic. In this case the idea is that our categorical judgments are only specific up until a finite point of exactitude, and correspondingly we confess a modesty with regard to definite categorical judgements. There is no vagueness to the predicate if we are prepared to be categorical in our judgements with regard to any degree of exactitude. So we might say that the number four is definitely even, i.e. $\Big[ \mathcal{D}(Even(4)) \Big]$ and remain confident up to any arbitrary “sharpening” of the standards of proof. This is not true for most categorical judgements. So conceived, vagueness becomes a lack of definiteness and we can give a semantics for vague predicates in terms of an ordering of more or less exact (or sharp) contexts. We define a model as follows:

$M = (W, <, v_{w})$ where $W$ is a non-empty set of contexts, $<$ a (typically reflexive) relation ordering the contexts in terms of exactitude and $v_{w}$ is a local evaluation function such that

$M, w \models \phi \Leftrightarrow v_{w}(\phi) = 1 \\ M, w \models_{\mathcal{D}} \neg\phi \Leftrightarrow v_{w}(\phi) = 0 \\ M, w \models_{ \mathcal{D}} \phi \wedge \psi \Leftrightarrow v_{w}(\phi) = 1 \text{ and } v_{w}(\psi) = 1$ $M, w \models_{\mathcal{D}} \mathcal{D}\phi \Leftrightarrow \forall w_{i} \text{ such that } w < w_{i} , M, w_{i} \models \phi$

The other connectives are defined in the classical manner in terms of $v_{w}$. Validity (a.k.a. SuperTruth) on a model is defined as preservation of truth across every “sharpening” in the model i.e. $M \models \phi \Leftrightarrow \forall w \in W, M, w \models \phi$ which comes to the same thing as insisting that $\phi$ is definitely true everywhere. This presentation of the theory is owed to Dietz and in effect it situates the discussion as a modal logic for the operator $\mathcal{D}$ which prompts the following definition of validity on a frame.

$\Gamma \models_{ \mathcal{D}} \phi \Leftrightarrow \forall w \in W, \forall \alpha \in \Gamma ( v_{w}( \alpha) = 1$ $\rightarrow ( v_{w} ( \phi ) = 1))$

To address the paradox we need only specify a model in which $\text{ Indefinite }$ comes out true for some proposition. To start consider $M = (W, <, v_{w}) \text{ where } W = \{ w, w^{1}, w^{2} \}$ $< \; =_{def} \{ (w, w^{1}), (w, w^{2}) \} \text{ and }$ $v_{w^{1}} (p) = 1, v_{w^{2}}(p) = 0$. Pictorially, the context $w$ branches into two sharpenings: $w^{1}$, where $p$ is true, and $w^{2}$, where $p$ is false.

We can read off the values and see that $M, w \models \neg\mathcal{D} p \wedge \neg\mathcal{D}\neg p$ but this means that vagueness is taken to be marked by indefiniteness with respect to an indecision over proper semantic completeness. Any model of a Sorites chain branching at each point will preserve the $\text{ Truth-Gap }$ property induced by vague predicates. We are unprepared to specify whether a subtle shift would make any definitive difference, which points to an ambiguity in the notion of a subtle shift. Judgement depends on the kind of “sharpening” envisioned and there is no way to specify in advance what such “sharpening” entails, or how many “sharpening[s]” we consider viable. To resolve the paradox we simply insist that $\text{Indifference}$ is too crude an assumption regarding the progress of a sequence and the accompanying evolution of belief. However, we’ll concede that if anyone was crude enough to accept $\text{Indifference}$, then Sorites reasoning is valid and so they should be led to the paradox and the types of considerations we have just examined.
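The three-context model is small enough to evaluate directly. The sketch below is an illustration only: the tuple encoding of formulas and the names `SHARPENS`, `VALUATION` and `holds` are assumptions of the example. The text assigns no value to $p$ at $w$ itself, and none is needed to evaluate the $\mathcal{D}$-formulas at $w$, so that entry is simply omitted.

```python
# Sharpening relation from the model: w is sharpened by w1 and by w2.
SHARPENS = {("w", "w1"), ("w", "w2")}

# Local valuations for the atom p at the two sharpenings.
VALUATION = {"w1": {"p": True}, "w2": {"p": False}}


def holds(context, formula):
    """Evaluate a formula, encoded as a nested tuple, at a context."""
    kind = formula[0]
    if kind == "atom":
        return VALUATION[context][formula[1]]
    if kind == "not":
        return not holds(context, formula[1])
    if kind == "and":
        return holds(context, formula[1]) and holds(context, formula[2])
    if kind == "D":
        # Definitely phi: phi holds at every sharpening of this context.
        return all(holds(v, formula[1]) for (u, v) in SHARPENS if u == context)
    raise ValueError(f"unknown connective: {kind}")


P = ("atom", "p")
INDEFINITE_P = ("and", ("not", ("D", P)), ("not", ("D", ("not", P))))

print(holds("w", INDEFINITE_P))
```

The two sharpenings disagree about `p`, so neither `D p` nor `D not p` holds at `w`, and the indefiniteness formula is satisfied there.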

##### Other Approaches: Challenging Bivalence

Although there is a large volume of work on the Sorites paradox which attempts to undermine the assumptions of classical logic, we have chosen to delay presentation of the issue until another post. The reasoning is that we wish to examine the Epistemicist position of Timothy Williamson, who advocates strongly for his position in so far as it allows for the preservation of classical logic. As such the non-classical approaches to vagueness are better situated in the context of the debate of Epistemicism.

# Some Facts about Famous Functions

We shall elaborate certain properties of famous functions with the intent to better examine the definition of the functions $\textbf{ sin } \theta, \textbf{ cos } \theta$ and show something of their relationship.

##### Continuous Functions and The Power Rule

Recall the definition:

$\text{ The function } f \text{ is continuous at } a \text{ if } \lim_{x \to a} f(x) = f(a)$

##### Theorem: If $g$ is continuous at $a$, and $f$ is continuous at $g(a)$, then $(f \circ g)$ is continuous at $a$.

To prove that $(f \circ g)$ is continuous at $a$ we need to prove that $\lim_{x \to a} (f \circ g)(x) = (f \circ g)(a)$, as above. Unpacking the definition of a limit, let $\epsilon > 0$. We wish to show that there is a $\delta > 0$ such that:

$\text{ if } |x - a| < \delta \text{ then } |(f \circ g)(x) - (f \circ g)(a)| < \epsilon$

To begin the proof we observe that the continuity of $f$ at $g(a)$ implies that for every $\epsilon_{2} > 0$ there is a $\delta_{2} > 0$ such that for all $y$:

$(A): \text{ if } |y - g(a)| < \delta_{2} \text{ then } |f( y) - f(g(a)) | < \epsilon_{2}$

Note that since $f$ is applied to the values of $g$, we may take $y = g(x)$ and rewrite the above inequality as follows:

$(A'): \text{ if } |g(x) - g(a) | < \delta_{2} \text{ then } |f(g(x)) - f(g(a)) | < \epsilon_{2}$

Now recalling the continuity of $g$ at $a$, we note that for every real number $\epsilon_{1} > 0$ there is a $\delta_{1} > 0$ such that:

$(B): \text{ if } |x - a| < \delta_{1} \text{ then } |g(x) - g(a)| < \epsilon_{1}$

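Now using our observation $(A')$ with $\epsilon_{2} = \epsilon$ we choose $\epsilon_{1} = \delta_{2}$. In which case we may combine $(B)$ and $(A')$ to prove our result by picking $\delta = \delta_{1}$: if $|x - a| < \delta_{1}$ then $|g(x) - g(a)| < \delta_{2}$, and hence $|f(g(x)) - f(g(a))| < \epsilon$.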
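The way the two $\delta$'s are threaded together can be illustrated on a concrete pair of functions. The example functions $g(x) = x^{2}$ and $f(y) = 3y + 1$, the point $a = 2$, and all the helper names are assumptions made for this sketch; the $\delta$ formulas are hand-derived for these particular functions and are not general.

```python
def g(x):
    return x * x


def f(y):
    return 3 * y + 1


def delta_for_f(eps):
    # Continuity of f at g(a) = 4: |f(y) - f(4)| = 3|y - 4| < eps when |y - 4| < eps / 3.
    return eps / 3


def delta_for_g(eps):
    # Continuity of g at a = 2: if |x - 2| < 1 then |x + 2| < 5,
    # so |x^2 - 4| = |x - 2||x + 2| < eps when |x - 2| < min(1, eps / 5).
    return min(1.0, eps / 5)


def delta_for_composition(eps):
    delta_2 = delta_for_f(eps)      # (A'): f is within eps once g(x) is within delta_2 of g(a)
    delta_1 = delta_for_g(delta_2)  # (B) with epsilon_1 = delta_2
    return delta_1


a = 2.0
eps = 0.01
delta = delta_for_composition(eps)

# Sample points strictly inside (a - delta, a + delta) and check the conclusion.
samples = [a - delta + 2 * delta * k / 1000 for k in range(1, 1000)]
all_within_eps = all(abs(f(g(x)) - f(g(a))) < eps for x in samples)
print(delta, all_within_eps)
```

A finite sample cannot prove continuity, but it shows the bookkeeping: the output tolerance for $g$ is set equal to the input tolerance that $f$ demands.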

##### Theorem: The derivative of any power function $f(x) = x^{n}$, for any natural number $n \geq 1$, is $f'(x) = nx^{n-1}$

The proof is by induction. For the base case we take $n = 1$, which gives us:

$f'(x) = \lim_{\Delta x \to 0} \dfrac{(x + \Delta x)^{1} - x^{1}}{\Delta x}$ $= \lim_{\Delta x \to 0} \dfrac{\Delta x}{\Delta x} = 1$ $= 1(x^{0})$

which completes the base case.

For the induction hypothesis assume the claim holds for $n$. We want to show that it holds for $n+1$ i.e. that $(x^{n+1})' = (n+1)x^{n}$.

To assess the value of the derivative we break it up into two parts. Note that $x^{n+1} = x^{n}x$. We rewrite this product as $F(x) = g(x)h(x)$, where $g(x) = x^{n}$ and $h(x) = x$ is the identity function.

Now we can evaluate the derivative using the product rule which states that $F' = g'(x)h(x) + h'(x)g(x)$. Hence, by assumption

$F' = g'(x)h(x) + h'(x)g(x) = n(x^{n-1})x + 1(x^{n}) = nx^{n} + x^{n} = (n+1)x^{n}$

which completes the proof. We will have to use this property a number of times in what follows.
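As a sanity check on the power rule, one can compare $nx^{n-1}$ against a finite-difference estimate. This is a rough numerical sketch, not a proof: the helper names, the step size `h`, the sample points and the tolerance are all assumptions chosen for the example.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def power_rule(n, x):
    """The derivative of x**n predicted by the theorem."""
    return n * x ** (n - 1)


checks = []
for n in range(1, 8):
    for x in (0.5, 1.0, 2.0):
        approx = numerical_derivative(lambda t: t ** n, x)
        exact = power_rule(n, x)
        # Compare with a relative tolerance to absorb rounding error.
        checks.append(abs(approx - exact) < 1e-4 * max(1.0, abs(exact)))

print(all(checks))
```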

##### Sine and Cosine

We wish ultimately to prove how these functions are inter-related but first we need to recall the somewhat technical definition of these functions in terms of the unit circle. The trick is to pay attention to how the rectangular and circular coordinate systems interact, especially noting the fact that $\textbf{ cos} \; \theta$ and $\textbf{ sin} \; \theta$ will take values on both the cartesian plane and the circular coordinate system and these values can be equated.

The equation which defines the unit circle as a function of the rectangular coordinates is $x^{2} + y^{2} = 1$ by the Pythagorean theorem. This should be evident above. As such we might think to evaluate $\textbf{ sin}$ using SOH-CAH-TOA which ensures that:

$\textbf{sin} \; \theta = \dfrac{y}{1} = y$ and similarly that $\textbf{cos}\; \theta = \dfrac{x}{1} = x$.

This proves that the elementary school heuristics are consistent with our current efforts to define these functions. However the traditional heuristics were always only partial functions since it is not clear how they are to behave as the angle $\theta$ approaches the right angle. The definition in terms of the unit circle will remedy this deficit. Additionally we can use the equation to solve for $y$ as follows:

$x^{2} + y^{2} = 1 \Leftrightarrow y^{2} = 1 - x^{2} \Leftrightarrow \sqrt{y^{2}} = \sqrt{1 - x^{2}} \Leftrightarrow y = \sqrt{1 - x^{2}} \\ \Leftrightarrow f(x) = \sqrt{1 - x^{2}}$

Given these observations we can define $\textbf{ sin}, \; \textbf{cos }$ such that both are inter-definable. So we want to take the former as a function of the latter i.e.

$\text{Tentative Definition}: \textbf{sin} \; \theta = y = \sqrt{1 - x^{2}}$

but we need to define $\textbf{cos} \; \theta$ such that $\textbf{cos} \; \theta = x$. As such we will need to define a manner in which to uniquely specify $x$ so that the identity $\textbf{cos} \; \theta = x$ is a meaningful statement. Again we can use the properties of the unit circle to isolate the appropriate definition for $\textbf{cos} \; \theta$. We first prove a quick lemma.

###### Lemma: If an arc-length on the circumference of the unit circle is $Q$, then the area of  the sector determined by the arc-length and the origin is $\dfrac{Q}{2}$

This is almost trivial, as the area of the unit circle is $\pi r^{2} = \pi(1) = \pi$ so it follows that we calculate the area of the sector determined as $\dfrac{Q}{2\pi}\pi = \dfrac{Q}{2}$.

Exposition and Comment

This observation is crucial in that it provides us with a means of specifying the value of the coordinate on the $x$-axis determined by an arbitrary angle in terms of the understanding of area. Every sector determines an area, and each area can be given in terms of a sum. In particular, imagine that the sector is split in two by drawing a line down from the point $Q$ on the circumference to the x-axis below. This line splits the sector into a Pythagorean triangle of angle determined by $Q$ and a complementary region which falls under the unit circle. Hence the area can be calculated by summing the areas of both shapes.

This works for any arbitrary angle, so to specify the length along the $x$-axis we only need to pick the $x$ which would ensure that the sector defined by the angle $Q$ describes an area of exactly $\dfrac{Q}{2}.$ Since the point $Q$ appears both on the circumference of the circle and at the high point of our triangle we can use the lemma to uniquely identify the $x$ component, thereby distinguishing an appropriate value for $\textbf{ cos} \; \theta$ and thus allowing us to complete the definition of $\textbf{sin} \; \theta.$ But for some angles we don’t even need this complicated machinery.

An Example

For example, specifying the arc-length $Q = \dfrac{\pi}{4}$ (that is, $\dfrac{180^{\circ}}{4} = 45^{\circ}$) ensures that the ray from the origin defines a right triangle with an angle of $45^{\circ}$, and we can reason by the Pythagorean theorem to the conclusion that $x = y$ $= \dfrac{1}{ \sqrt{2}} = \textbf{cos} \theta = \textbf{sin} \theta$. This succeeds because:

$\textbf{sin} \theta = \sqrt{1 - x^{2}} = \sqrt{1 - (\dfrac{1}{\sqrt{2}})^{2}}$ $= \sqrt{1 - \dfrac{1^{2}}{ \sqrt{2}^{2}}}$ $= \sqrt{1 - \dfrac{1}{2}} = \sqrt{\dfrac{1}{2}} = \dfrac{\sqrt{1}}{\sqrt{2}} = \dfrac{1}{\sqrt{2}}$

This method works reasonably well for some of the positions on the circumference which define a neat angle but gets increasingly messy as the angles are less well behaved. For most angles the coordinates cannot be read off from an elementary geometric argument of this kind. Thankfully our lemma suggests a more general method of specifying the value of $\textbf{cos} \theta$ in terms of the $x$-coordinate of our Pythagorean triangle.

Returning to our Definition

Recall that the area of a triangle is determined to be $\dfrac{1}{2}bh$, so in our case we set $b = x, h = y$ which gives us:

$\dfrac{x\sqrt{1 - x^{2}}}{2}$

Now we need a method of calculating the area remaining. Fortunately, said area sits under a curve above the $x$-axis and the function mapping the curve is given by $f(x) = \sqrt{1 - x^{2}}$ so we can simply use integration to compute the area between the end of the triangle ($x$) and the end of the radius. This gives us:

$\int_{x}^{1} \sqrt{1 - x^{2}} dx$

Summing these two numbers will give us the area of the sector as a function of $x$,  so we write:

$A(x) =_{def} \dfrac{x \sqrt{ 1 - x^{2}}}{2} + \int_{x}^{1} \sqrt{1 - x^{2}} dx$

Now we are properly equipped to define $\textbf{cos } \theta$

$\text{ If } 0 \leq \theta \leq \pi \text{ then } \textbf{ cos } \theta \text{ is the unique number in } [-1, 1] \text{ such that } A(\textbf{cos } \theta) = \dfrac{\theta}{2} \text{ and } \textbf{ sin } \theta = \sqrt{1 - (\textbf{cos } \theta)^{2}}$

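This definition is well defined because $A$ is continuous (sums, products and compositions of continuous functions are continuous, as shown above), with $A(1) = 0$ and $A(-1) = \dfrac{\pi}{2}$, so the intermediate value theorem ensures that $A$ takes every value between $0$ and $\dfrac{\pi}{2}$. Uniqueness follows because $A$ is strictly decreasing on $[-1, 1]$.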
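The definition can be made concrete numerically: compute $A(x)$ by quadrature and solve $A(x) = \theta/2$ for $x$. The sketch below is only an illustration under stated assumptions: the names `sector_area`, `cos_from_area` and `sin_from_area`, the midpoint rule, the bisection search and the tolerances are all choices made for this example, and the comparison with `math.cos` is a sanity check rather than part of the definition.

```python
import math


def upper_semicircle(t):
    """f(t) = sqrt(1 - t^2), clamped to guard against rounding just outside [-1, 1]."""
    return math.sqrt(max(0.0, 1.0 - t * t))


def integrate(func, lo, hi, steps=4000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


def sector_area(x):
    """A(x): triangle area plus the area under the circle between x and 1."""
    return x * upper_semicircle(x) / 2 + integrate(upper_semicircle, x, 1.0)


def cos_from_area(theta, tolerance=1e-7):
    """Solve A(x) = theta / 2 on [-1, 1] by bisection, using that A is decreasing."""
    lo, hi = -1.0, 1.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if sector_area(mid) > theta / 2:
            lo = mid  # area too large, so the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def sin_from_area(theta):
    c = cos_from_area(theta)
    return math.sqrt(max(0.0, 1.0 - c * c))


theta = math.pi / 3
print(cos_from_area(theta), math.cos(theta))  # the two values should roughly agree
```

Bisection is used here because it relies only on the two facts the definition itself needs: $A$ is continuous and strictly decreasing.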

We now prepare to prove the relationship between $\textbf{sin}$ and $\textbf{ cos }$. Note that we can differentiate $A$ on $(-1, 1)$. Using the chain and product rules with the fundamental theorem of calculus we take the derivative. Note in particular that because $x$ is the lower limit of integration, the fundamental theorem of calculus contributes the integrand with a negative sign. We now see that:

$A^{'}(x) = \dfrac{1}{2} \Big[ x \cdot \dfrac{1}{2}(1 -x^{2})^{-1/2} \cdot (-2x) + \sqrt{1- x^{2}} \Big] - \sqrt{1 - x^{2}}$ $= \dfrac{1}{2} \Big[ x \cdot \dfrac{-2x}{2 \sqrt{1 - x^{2}}} + \sqrt{1 - x^{2}} \Big] - \sqrt{1 - x^{2}}$ $= \dfrac{1}{2} \Big[ \dfrac{ -x^{2} + (1 - x^{2})}{\sqrt{1 - x^{2}}} \Big] - \sqrt{1 - x^{2}}$ $= \dfrac{1 - 2x^{2}}{2 \sqrt{1 - x^{2}}} - \sqrt{1 - x^{2}}$ $= \dfrac{1 - 2x^{2}}{2 \sqrt{1 - x^{2}}}$ $- \dfrac{2(1 - x^{2})}{2 \sqrt{1 - x^{2}}}$ $= \dfrac{1 - 2x^{2} - 2 + 2x^{2}}{2 \sqrt{1 - x^{2}}} = \dfrac{-1}{2 \sqrt{1 - x^{2}}}$
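The algebra can be checked numerically at a few points. In this sketch, which is an illustration with hypothetical helper names, the triangle term is differentiated by a central difference while the integral term is handled by the fundamental theorem of calculus (contributing $-\sqrt{1 - x^{2}}$, since $x$ is the lower limit); the result is compared with the closed form $\dfrac{-1}{2\sqrt{1 - x^{2}}}$.

```python
import math


def triangle_area(x):
    """The first term of A(x): x * sqrt(1 - x^2) / 2."""
    return x * math.sqrt(1.0 - x * x) / 2


def a_prime_numeric(x, h=1e-6):
    # Central difference for the triangle term only.
    triangle_slope = (triangle_area(x + h) - triangle_area(x - h)) / (2 * h)
    # The integral from x to 1 contributes minus the integrand at x.
    return triangle_slope - math.sqrt(1.0 - x * x)


def a_prime_closed_form(x):
    return -1.0 / (2.0 * math.sqrt(1.0 - x * x))


sample_points = (-0.9, -0.5, 0.0, 0.3, 0.7, 0.9)
max_gap = max(abs(a_prime_numeric(x) - a_prime_closed_form(x)) for x in sample_points)
print(max_gap)
```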

##### Theorem: The derivative $\textbf{cos}'(\theta) = -\textbf{sin} \; \theta$ and the derivative $\textbf{sin}'(\theta) = \textbf{cos} \; \theta$.

$\textit{ Proof }$

This proof follows Spivak’s demonstration in his rightly famous $\textit{ Calculus }$ text. We take each in turn.

First define a function $B(x) : = 2A(x)$, then by the definition $\textbf{ cos } \theta = x$ such that $A(x) = \dfrac{\theta}{2}$, we can write $B(x) = \theta$ which means that $\textbf{ cos } \theta = x = B^{-1}(\theta)$

As above

$A'(x) = -\dfrac{1}{2\sqrt{1 - x^{2}}}$

from which it follows that

$B'(x) = - \dfrac{1}{\sqrt{1 - x^{2}}}$

Now the derivative of an inverse function $f^{-1}$ at $x$ can be written $\dfrac{1}{f'(f^{-1}(x))}$, provided $f'(f^{-1}(x)) \neq 0$. So we have the following chain of reasoning:

$\textbf{ cos'}(\theta) = \dfrac{d B^{-1}}{d\theta}(\theta) = \dfrac{1}{B'(B^{-1}(\theta))}$

then applying the definition of $B'(x)$ above we get:

$\dfrac{1}{ - \dfrac{1}{\sqrt{1 - [B^{-1}(\theta)]^{2}}}}$ $= - \sqrt{1 - (\textbf{cos } \theta)^{2}} = - \textbf{ sin} \theta$

Now since we know that $\textbf{ sin } \theta = \sqrt{ 1 - (\textbf{ cos } \theta)^{2}}$ we can take the derivative by judicious application of the chain rule as follows:

$\textbf{ sin'}(\theta) = \dfrac{ d (1 - (\textbf{ cos} \theta)^{2})^{1/2}}{d\theta} =$  $\dfrac{-2 \textbf{cos} \theta \cdot \textbf{cos'} \theta }{2(1 - (\textbf{cos} \theta)^{2})^{1/2}}$ $= \dfrac{-2(\textbf{ cos} \theta) \cdot (-\textbf{sin }\theta)}{2 \sqrt{1 - (\textbf{cos } \theta)^{2}}}$

Then cancelling we get:

$\dfrac{ \textbf{cos} \theta \cdot \textbf{sin } \theta}{\textbf{ sin } \theta} = \textbf{cos } \theta$

as desired. This completes the proof.

# Notes on Independence: Monty Hall and Bayesian reasoning

Probability is difficult because it is hard to properly model the relationships between events, and the facts even when recognised are unintuitive. Conditional probability is a tool for assessing the probability of a hypothesis given some evidence. To represent this mathematically we have to distinguish over all possible events, and assess the frequency of when the hypothesis is confirmed within the subset of possible worlds where our evidence is deemed to have occurred. Getting this process correct requires subtle modelling considerations as we will see below. First examine the definition:

$\text{ A conditional probability space is a tuple } (W, \mathcal{F}, \mathcal{F'}, \textbf{P}) \text{ where } \mathcal{F} \times \mathcal{F'} \text{ is a Popper algebra over } W, \text{ i.e. a set of subsets of } W \times W \text{ such that (i) } \mathcal{F} \text{ is an algebra over } W, \text{ (ii) } \mathcal{F'} \text{ is a non-empty subset of } \mathcal{F} \text{ and (iii) } \mathcal{F'} \text{ is closed under supersets in } \mathcal{F}. \text{ The function } \textbf{P}: \mathcal{F} \times \mathcal{F'} \rightarrow [0, 1] \text{ satisfies the following conditions:}$

$1. \textbf{P}(H \mid H) = 1 \text{ if } H \in \mathcal{F'}$

$2. \textbf{P}(H \mid E) = \dfrac{\textbf{P}(H \cap E)}{\textbf{P}(E)} \text{ if } \textbf{P}(E) > 0 \text{ and } E \in \mathcal{F'}, H \in \mathcal{F}$

$3. \textbf{P}(H_{1} \cup H_{2} \mid E) = \textbf{P}(H_{1} \mid E) + \textbf{P}(H_{2} \mid E) \text{ if } H_{1} \cap H_{2} = \emptyset, H_{1}, H_{2} \in \mathcal{F} \text{ and } E \in \mathcal{F'}$

$4. \textbf{P}(H_{1} \cap H_{2} \mid E) = \textbf{P}(H_{1} \mid H_{2} \cap E) \times \textbf{P}(H_{2} \mid E) \text{ if } H_{2} \cap E \in \mathcal{F'}, H_{1} \in \mathcal{F}$

The idea is that we wish to test the probability of a given hypothesis on previously attained evidence. To begin we shall illustrate why it’s important to discern conditional probabilities, and then we shall showcase some of the properties useful for speeding the calculation of such considerations. After this we will continue to illustrate the infamous Monty Hall problem.

Consider the two claims regarding the first two draws in a game of Texas hold’em:

$E:$ The first card is an ace

$H:$ The second card is an ace

We know that each proposition considered individually is such that $\textbf{P}(H) = \dfrac{1}{13} = \textbf{P}(E)$, but the two claims are dependent: given $E$, $H$ is less likely because the first ace has been withdrawn without replacement. In fact we can easily read off $\textbf{P}(H \mid E) = \dfrac{3}{51} = \dfrac{1}{17}.$ Hence the joint probability is $\textbf{P}(E \cap H) = \textbf{P}(E)\textbf{P}(H \mid E) =$ $\dfrac{1}{13}\cdot\dfrac{1}{17} = \dfrac{1}{221}$, which differs from $\textbf{P}(E)\textbf{P}(H) = \dfrac{1}{169}$ because the events are dependent.
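The figures in the ace example can be checked by brute-force counting. The following is a minimal sketch, assuming a standard 52-card deck encoded as `(rank, suit)` pairs with rank `0` standing for the ace; the encoding and the helper names `is_ace` and `prob` are our own and purely illustrative.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck as (rank, suit) pairs; rank 0 stands for the ace (an assumption).
DECK = [(rank, suit) for rank in range(13) for suit in range(4)]

def is_ace(card):
    return card[0] == 0

# Every ordered way of dealing the first two cards without replacement.
DRAWS = list(permutations(DECK, 2))

def prob(event):
    """Probability of an event (a predicate on a two-card draw) by counting."""
    favourable = sum(1 for draw in DRAWS if event(draw))
    return Fraction(favourable, len(DRAWS))

p_E = prob(lambda d: is_ace(d[0]))                       # first card is an ace
p_H = prob(lambda d: is_ace(d[1]))                       # second card is an ace
p_joint = prob(lambda d: is_ace(d[0]) and is_ace(d[1]))  # both cards are aces
p_H_given_E = p_joint / p_E                              # ratio definition

print(p_E, p_H, p_H_given_E, p_joint)  # 1/13 1/13 1/17 1/221
```

Working with `Fraction` rather than floating point keeps the arithmetic exact, so the results can be compared directly against the fractions derived by hand.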

In general it is crucial to see that the joint probability of two events can always be calculated in terms of conditional probability. To see this note the multiplication rule for assessing the probability of joint dependent events:

$\textbf{P}(H \cap E) = \textbf{P}(E) \dfrac{\textbf{P}(H \cap E)}{\textbf{P}(E)} = \textbf{P}(E)\textbf{P}(H \mid E)$

which allows us to calculate the probability of joint dependent events in terms of conditional probabilities. Another convenient rule is Bayes’ rule which is written $\textbf{P}(H \mid E) = \dfrac{ \textbf{P}(E \mid H)\textbf{P}(H)}{\textbf{P}(E)}$.

###### Theorem: Bayes’ Rule

$\dfrac{\textbf{P}(E \mid H)\textbf{P}(H)}{\textbf{P}(E)} = \dfrac{\textbf{P}(E \cap H)\textbf{P}(H)}{\textbf{P}(E)\textbf{P}(H)} = \dfrac{\textbf{P}(E \cap H)}{\textbf{P}(E)} = \textbf{P}(H \mid E)$

The rule can be rewritten as follows

$\textbf{P}(H \mid E) = \dfrac{ \textbf{P}(E \mid H)\textbf{P}(H)}{\textbf{P}(E)} = \dfrac{ \textbf{P}(E \mid H)\textbf{P}(H)}{\textbf{P}(E \cap H) + \textbf{P}(E \cap H^{c})}$  $= \dfrac{\textbf{P}(E \mid H)\textbf{P}(H)}{ \textbf{P}(E \mid H)\textbf{P}(H) + \textbf{P}(E \mid H^{c})\textbf{P}(H^{c})}$
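The expanded form of the rule is convenient to compute with, since it needs only the prior and the two likelihoods. Below is a small sketch applied to the ace example; the function name `bayes_posterior` and its parameter names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def bayes_posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H | E) from P(H), P(E | H) and P(E | H^c), using the expanded rule."""
    evidence = likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)
    return likelihood_h * prior_h / evidence

# Ace example: H = second card is an ace, E = first card is an ace.
posterior = bayes_posterior(
    prior_h=Fraction(1, 13),           # P(H)
    likelihood_h=Fraction(3, 51),      # P(E | H): three aces among the other 51 cards
    likelihood_not_h=Fraction(4, 51),  # P(E | H^c): four aces among the other 51 cards
)
print(posterior)  # 1/17
```

The denominator works out to $\dfrac{1}{13}$, which is just $\textbf{P}(E)$ recovered by the law of total probability.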

We begin now with the idea that there are relations of probabilistic independence.

##### Independence on a CPS

We now define a relation of independence. Any pair of events which is not independent is taken to be dependent to some degree. The challenge, as always, is learning to specify what degree of dependence holds amongst observed events.

(Independence with respect to a CPS)

Events $H$ and $T$ are probabilistically independent with respect to a conditional probability space $(W, \; \mathcal{F}, \; \mathcal{F}', \; \textbf{P})$ if $H \in \mathcal{F}'$ implies that $\textbf{P}(T \mid H) = \textbf{P}(T)$ and $T \in\mathcal{F}'$ implies $\textbf{P}(H \mid T) = \textbf{P}(H)$.

In some textbooks you often see what is known as the multiplicative definition of independence which states that two events $H$ and $T$ are probabilistically independent just when $\textbf{P}(H \cap T) = \textbf{P}(H)\textbf{P}(T).$ Fortunately we can show this to be equivalent to the above conditional definition.

###### Theorem: Equivalence of Multiplicative and Conditional Definitions of Independence

Suppose that $\textbf{P}(H \cap T) = \textbf{P}(H)\textbf{P}(T)$ and that $\textbf{P}(T) \neq 0$, just to avoid the trivial case. We want to show that $\textbf{P}(H \mid T) = \textbf{P}(H)$, but $\textbf{P}(H \mid T) = \dfrac{\textbf{P}(H \cap T)}{\textbf{P}(T)}$, which by assumption is the same as $\dfrac{\textbf{P}(H) \textbf{P}(T)}{\textbf{P}(T)}$. This cancels giving us $\textbf{P}(H)$. By the transitivity of identity we are done. For the other direction suppose that $\textbf{P}(H) = \textbf{P}(H \mid T).$ We want to show that $\textbf{P}(H \cap T) = \textbf{P}(H)\textbf{P}(T).$  But from $\textbf{P}(H \cap T)$, by the multiplication rule we have $\textbf{P}(H \mid T)\textbf{P}(T).$ By our initial assumption this is the same as $\textbf{P}(H)\textbf{P}(T)$, as desired. From this observation we can prove the following theorem.

###### Theorem: If $T$ and $H$ are independent then so are $T^{c}, H$ and $T, H^{c}$ and $T^{c}, H^{c}$.

We work with the multiplicative definition of independence. Observe:

$\textbf{P}(H^{c}\cap T) = \textbf{P}(T\setminus H) = \textbf{P}(T) - \textbf{P}(H \cap T) = \textbf{P}(T) - \textbf{P}(H)\textbf{P}(T) = (1 - \textbf{P}(H))\textbf{P}(T) = \textbf{P}(H^{c})\textbf{P}(T).$
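To see the theorem at work on a concrete case, the sketch below checks the multiplicative definition for a single card drawn from a standard deck, taking $H$ to be "the card is an ace" and $T$ to be "the card is a heart". The events, the `(rank, suit)` encoding and the helper names are our own illustrative assumptions.

```python
from fractions import Fraction

# One card drawn uniformly from 52; rank 0 = ace, suit 0 = hearts (assumed encoding).
DECK = [(rank, suit) for rank in range(13) for suit in range(4)]

def prob(event):
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def independent(h, t):
    """Multiplicative definition: P(H and T) == P(H) * P(T)."""
    return prob(lambda card: h(card) and t(card)) == prob(h) * prob(t)

def complement(event):
    return lambda card: not event(card)

H = lambda card: card[0] == 0  # the card is an ace
T = lambda card: card[1] == 0  # the card is a heart

checks = {
    "H, T": independent(H, T),
    "H^c, T": independent(complement(H), T),
    "H, T^c": independent(H, complement(T)),
    "H^c, T^c": independent(complement(H), complement(T)),
}
print(checks)
```

All four entries come out `True`, as the theorem predicts once the first one does.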

Given this result the other equivalences follow by a similar proof. Another important observation is the chain rule.

###### Theorem: The Chain Rule

$\textbf{P}(H_{1} \cap \cdots \cap H_{n}) = \textbf{P}(H_{1})\textbf{P}(H_{2} \mid H_{1})\textbf{P}(H_{3}\mid H_{1} \cap H_{2}) \cdots \textbf{P}(H_{n} \mid H_{1} \cap \cdots \cap H_{n-1}).$

The proof follows by observing that:

$\textbf{P}(H_{1})\textbf{P}(H_{2}\mid H_{1})\textbf{P}(H_{3}\mid H_{1}\cap H_{2}) \cdots \textbf{P}(H_{n}\mid H_{1}\cap \cdots \cap H_{n-1})$ $= \textbf{P}(H_{1})\dfrac{\textbf{P}(H_{1}\cap H_{2})}{\textbf{P}(H_{1})} \dfrac{\textbf{P}(H_{1}\cap H_{2}\cap H_{3})}{\textbf{P}(H_{1}\cap H_{2})} \cdots \dfrac{\textbf{P}(H_{1}\cap \cdots \cap H_{n})}{\textbf{P}(H_{1}\cap \cdots \cap H_{n-1})}$

Multiplying out, we see that each denominator cancels with the numerator of the preceding factor in the product. Hence, we are left with just the numerator of the $n$th conditional in our sequence, which is $\textbf{P}(H_{1} \cap \cdots \cap H_{n})$. In other words, we have shown the equivalence as desired.
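The chain rule can be illustrated by asking for the probability that the first three cards dealt are all aces. The sketch below computes each conditional factor by counting and compares their product with the directly counted joint probability; the deck encoding and helper names are again our own assumptions.

```python
from fractions import Fraction
from itertools import permutations

# Rank 0 stands for the ace (assumed encoding).
DECK = [(rank, suit) for rank in range(13) for suit in range(4)]
DEALS = list(permutations(DECK, 3))  # 52 * 51 * 50 ordered three-card deals

def count(event):
    return sum(1 for deal in DEALS if event(deal))

def cond_prob(event, given=lambda deal: True):
    """P(event | given) by counting; with no 'given' this is plain P(event)."""
    return Fraction(count(lambda d: event(d) and given(d)), count(given))

def ace_at(k):
    return lambda deal: deal[k][0] == 0

def aces_up_to(k):
    """The first k cards are all aces (trivially true when k == 0)."""
    return lambda deal: all(card[0] == 0 for card in deal[:k])

# P(H1), P(H2 | H1), P(H3 | H1 and H2)
factors = [cond_prob(ace_at(k), aces_up_to(k)) for k in range(3)]

chain_product = Fraction(1)
for factor in factors:
    chain_product *= factor

joint = cond_prob(aces_up_to(3))  # P(H1 and H2 and H3) counted directly
print(factors, chain_product, joint)
```

The three factors are $\dfrac{4}{52}, \dfrac{3}{51}, \dfrac{2}{50}$ in lowest terms, and both routes give $\dfrac{1}{5525}$.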

These results allow us to test the relations between two variables, but what if we wish to test whether independence holds between two variables in light of a potential influence?

###### Definition (Conditional Independence with respect to a third parameter)

$H$ is probabilistically independent of $T$ when conditional on $E$ with respect to $\textbf{P}$: written $\textit{I}_{\textbf{P}}(H, T \mid E)$ if $\textbf{P}(T \cap E) \neq 0$ implies $\textbf{P}(H \mid T \cap E) = \textbf{P}( H \mid E).$ In words, learning $T$ is irrelevant to determining the value for $H$ given $E$.

The natural analogue of the multiplicative definition of independence in this setting is the following claim: $H$ is independent of $T$ given $E$ iff $\textbf{P}(H \cap T \mid E) = \textbf{P}(H \mid E)\textbf{P}(T \mid E).$ Again this is provably equivalent to the above definition of conditional independence. This definition is often convenient for proving the following important properties:

1. Symmetry: If $\textit{I}_{\textbf{P}}(H, T \mid E)$ then $\textit{I}_{\textbf{P}}(T, H \mid E)$
2. Contraction: If $\textit{I}_{\textbf{P}}(H, T \mid E)$ and $\textit{I}_{\textbf{P}}(H, S \mid T \cup E)$ then $\textit{I}_{\textbf{P}}(H, T \cup S \mid E).$
3. Weak Union: If $\textit{I}_{\textbf{P}}( H, T \cup S \mid E)$ then $\textit{I}_{\textbf{P}}(H, T \mid S \cup E)$
4. Decomposition: If $\textit{I}_{\textbf{P}}(H, T \cup S \mid E)$ then $\textit{I}_{\textbf{P}}(H, T \mid E)$
5. Composition: If $\textit{I}_{\textbf{P}}(H, T \mid E)$ and $\textit{I}_{\textbf{P}}(H, S \mid E)$ then $\textit{I}_{\textbf{P}}(H, S \cup T \mid E).$

Symmetry states that if $H$ is independent of $T$ given $E$, then so too is $T$ independent of $H$ given $E$. Judea Pearl paraphrases Weak Union in terms of relevance stating that the axiom ensures that “learning irrelevant information $S$ cannot help irrelevant information $T$ become any more relevant to $H.$” The paraphrase is suggestive in that we might now think to read the Symmetry axiom as stating that if $T$ is irrelevant to $H$, then $H$ is irrelevant to $T$. Or analogously, if we learn nothing about $H$ from $T$ we can learn nothing about $T$ from $H.$ The Contraction axiom states that if we judge some $S$ irrelevant to $H$ after learning some irrelevant $T$, then $S$ and $T$ together must already have been irrelevant to $H$ before we learned $T$. Decomposition, in turn, states that if $S$ and $T$ are together irrelevant for the truth of $H$, then they are also irrelevant separately.

We quickly show how to prove that Symmetry is valid on any CPS. Due to the nature of the independence predicate $\textit{I}_{\textbf{P}}$ as defined above, we need only prove an equivalence result. Assume $\textbf{P}(H \mid T \cap E) = \textbf{P}( H \mid E).$ We need to show that $\textbf{P}(T \mid H \cap E) = \textbf{P}( T \mid E).$

$\textbf{P}(T \mid H \cap E) = \dfrac{\textbf{P}(H \cap E \mid T)\textbf{P}(T)}{\textbf{P}(H \cap E)}$ by Bayes’ Rule.

$= \dfrac{\textbf{P}(T\cap E \cap H)}{\textbf{P}(H \cap E)}$ By the Multiplication rule and Rearranging.

$= \dfrac{\textbf{P}(E \mid T)\textbf{P}(H \mid E \cap T)\textbf{P}(T)}{\textbf{P}(H \mid E)\textbf{P}(E)}$ Rearranging and the Chain rule.

$= \dfrac{\textbf{P}(E \mid T)\textbf{P}(H \mid E)\textbf{P}(T)}{\textbf{P}(H \mid E)\textbf{P}(E)}$ By our Assumption.

$= \dfrac{\textbf{P}(E \mid T)\textbf{P}(T)}{\textbf{P}(E)}$ By Cancelling

$= \textbf{P}(T \mid E)$ by Bayes’ Rule.

So again, by the transitivity of identity Symmetry holds on any conditional probability space. The proofs for the other properties are analogous.
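The Symmetry argument can be checked on a small example. The sketch below builds a toy distribution over eight worlds in which $H$ and $T$ are independent given $E$ by construction, and then confirms both directions of the conditional equality. All the numerical parameters are hypothetical values picked for illustration, as are the helper names.

```python
from fractions import Fraction
from itertools import product

# Hypothetical parameters: P(E), then P(H | E), P(H | E^c), P(T | E), P(T | E^c).
P_E = Fraction(1, 2)
P_H_GIVEN = {True: Fraction(3, 4), False: Fraction(1, 5)}
P_T_GIVEN = {True: Fraction(1, 3), False: Fraction(2, 5)}

def weight(e, h, t):
    """Probability of the world (e, h, t); H and T are independent given E by design."""
    pe = P_E if e else 1 - P_E
    ph = P_H_GIVEN[e] if h else 1 - P_H_GIVEN[e]
    pt = P_T_GIVEN[e] if t else 1 - P_T_GIVEN[e]
    return pe * ph * pt

WORLDS = {w: weight(*w) for w in product([True, False], repeat=3)}

def prob(event):
    return sum(p for w, p in WORLDS.items() if event(*w))

def cond(event, given):
    return prob(lambda e, h, t: event(e, h, t) and given(e, h, t)) / prob(given)

E = lambda e, h, t: e
H = lambda e, h, t: h
T = lambda e, h, t: t
T_and_E = lambda e, h, t: t and e
H_and_E = lambda e, h, t: h and e

print(cond(H, T_and_E), cond(H, E))  # P(H | T and E) versus P(H | E)
print(cond(T, H_and_E), cond(T, E))  # P(T | H and E) versus P(T | E)
```

Each printed pair agrees, which is what Symmetry requires: the first equality is the assumption and the second is the conclusion of the proof.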

##### Monty Hall and Discovering Dependence

We now consider an example of a surprising discovery of dependence which indicates how probability does not gel well with our intuitions.

Imagine you are on a game show where you are asked to choose between three closed doors. Behind one of the doors is a prize, and behind the other two doors are goats. Making a decision you choose one of the doors, and then the game show host picks and opens another door revealing a goat. At this point you are asked whether you would like to change your mind or stick with your initial choice. This puzzle, as we shall see, demonstrates the counter-intuitive nature of probabilistic reasoning. Intuitively it seems that there is no advantage to switching as you are not obviously any wiser about what lies behind the remaining doors.

To model this correctly we will be considering four propositions. Name the three doors $a, b, c$. We begin with the following three propositions:

$A : \text{ The prize is behind door a }$

$B : \text{ The prize is behind door b }$

$C : \text{ The prize is behind door c }$

In the first round of the game we assume that the prize has been distributed at random behind one of the doors i.e.

$\textbf{P}(A) = \textbf{P}(B) = \textbf{P}(C) = \dfrac{1}{3}$.

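Now without loss of generality we will stipulate that you have chosen door $a$ in the first round of the game and that our host has subsequently revealed the goat behind door $b$. The evidence we condition on is the host’s action, not merely the fact that door $b$ hides a goat; this is the subtle modelling consideration mentioned at the outset. So our considerations of the conditional probability of either $A, B, C$ will depend on:

$G_{b} : \text{ The host opens door b, revealing a goat }$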

So in effect the puzzle before us questions whether the following equality holds:

$\textbf{P}(A \mid G_{b}) = \textbf{P}(C \mid G_{b})?$

In English, does our new information make us reevaluate the assumption of equality made in the first round? Why should it? We calculate all conditional statements using Bayes’ theorem. To do so we need to evaluate the following conditionals:

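$\textbf{P}(G_{b} \mid A) = \dfrac{1}{2}$ because both options $b, c$ are available and, we assume, the host chooses between them at random.

$\textbf{P}(G_{b} \mid B) = 0$ because the host never opens the door hiding the prize, which would stop the game prematurely.

$\textbf{P}(G_{b} \mid C) = 1$ because $b$ is the only choice which is neither your door nor one that ends the game prematurely.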

In addition we need to consider the probability of $G_{b}$ which may be calculated using the law of total probability, which results in the fact that:

$\textbf{P}(G_{b}) = \textbf{P}(G_{b} \cap A) + \textbf{P}(G_{b} \cap B) + \textbf{P}(G_{b} \cap C) =$ $\textbf{P}(A)\textbf{P}(G_{b} \mid A) + \textbf{P}(B)\textbf{P}(G_{b} \mid B) + \textbf{P}(C)\textbf{P}(G_{b} \mid C)$ $= \dfrac{1}{3} \cdot \dfrac{1}{2} + \dfrac{1}{3} \cdot 0 + \dfrac{1}{3} \cdot 1 =$ $\dfrac{1}{6} + 0 + \dfrac{1}{3} = \dfrac{1}{6} + \dfrac{2}{6}$ $= \dfrac{3}{6} = \dfrac{1}{2}$

With this information we can finally come to consider the question at hand using Bayes’ theorem, for consider:

$\textbf{P}(A \mid G_{b}) = \dfrac{ \textbf{P}(G_{b} \mid A)\textbf{P}(A)}{\textbf{P}(G_{b})}$ $= \dfrac{ \dfrac{1}{2} \cdot \dfrac{1}{3}}{\dfrac{1}{2}}$ $= \dfrac{1}{3}$

while on the other hand:

$\textbf{P}(C \mid G_{b}) = \dfrac{ \textbf{P}(G_{b} \mid C)\textbf{P}(C)}{\textbf{P}(G_{b})} = \dfrac{ \dfrac{1}{3} \cdot 1 }{\dfrac{1}{2}} = \dfrac{2}{3}$

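which resolves the question contrary to all intuition: switching to door $c$ wins with probability $\dfrac{2}{3}$, twice the chance of sticking with door $a$.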
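The Bayesian calculation can be reproduced mechanically, and compared against a simulation of the game. This is a sketch under the same modelling assumptions as the text (you pick door $a$, the host never opens your door or the prize door, and chooses at random when two goat doors are available); the names `posterior_given_open_b` and `simulate_conditional`, the trial count and the seed are our own.

```python
from fractions import Fraction
import random

DOORS = ["a", "b", "c"]
PRIOR = {door: Fraction(1, 3) for door in DOORS}
# P(host opens b | prize location), given that the contestant picked door a.
LIKELIHOOD_OPEN_B = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1)}

def posterior_given_open_b():
    """Bayes' rule with the law of total probability in the denominator."""
    evidence = sum(PRIOR[d] * LIKELIHOOD_OPEN_B[d] for d in DOORS)
    return {d: PRIOR[d] * LIKELIHOOD_OPEN_B[d] / evidence for d in DOORS}

def simulate_conditional(trials=30_000, seed=0):
    """Estimate P(prize behind c | host opens b) by playing the game repeatedly."""
    rng = random.Random(seed)
    opened_b = 0
    prize_c_and_opened_b = 0
    for _ in range(trials):
        prize = rng.choice(DOORS)
        # The contestant holds door a; the host opens a goat door among b and c.
        opened = rng.choice([d for d in ("b", "c") if d != prize])
        if opened == "b":
            opened_b += 1
            prize_c_and_opened_b += (prize == "c")
    return prize_c_and_opened_b / opened_b

posterior = posterior_given_open_b()
print(posterior)               # exact values: a -> 1/3, b -> 0, c -> 2/3
print(simulate_conditional())  # an estimate that should sit close to 2/3
```

The simulation conditions on the host opening door $b$ by discarding the rounds in which he opens $c$, which mirrors restricting attention to the worlds where the evidence $G_{b}$ occurred.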

# Some Integration Techniques

We showcase an example of calculating Riemann sums. Take our standard example of $f(x) = x^{2}$, then we have:

$\int_{0}^{1} f(x) dx = \int_{0}^{1} x^{2} dx$

Taking the right-hand sum over $n$ subintervals and letting $n$ grow without bound covers the entire area under the curve:

$\lim_{n \to \infty} \sum_{i = 1}^{n} x_{i}^{2} \, \Delta x = \lim_{n \to \infty} \sum_{i = 1}^{n} x_{i}^{2} \cdot (x_{i} - x_{i -1})$

We must integrate over the interval between 0 and 1, which is sub-divided into $n$ pieces. Because we have stipulated the sub-division of our interval into $n$ equal pieces, we know that $x_{i} = \dfrac{i}{n}$ and $x_{i} - x_{i-1} = \dfrac{1}{n}$, so that:

$\sum_{i = 1}^{n} x_{i}^{2} \cdot (x_{i} - x_{i -1}) = \sum_{i = 1}^{n} (\dfrac{i}{n})^{2} \cdot \dfrac{1}{n}$

which becomes:

$\sum_{i = 1}^{n} (\dfrac{i^{2}}{n^{2}}) \cdot \dfrac{1}{n} = \sum_{i = 1}^{n} \dfrac{i^{2}}{n^{3}} = \dfrac{1}{n^{3}} \sum_{i = 1}^{n} i^{2}$

Now taking the limit:

$\lim_{n \to \infty} \dfrac{1}{n^{3}} \sum_{i = 1}^{n} i^{2}$

applying the sum of squares:

$\lim_{n \to \infty} \dfrac{1}{n^{3}} \dfrac{(n)(n+1)(2n+1)}{6}$

but noting that

$\dfrac{(n)(n+1)(2n+1)}{6} = \dfrac{(nn + 1n)(2n + 1)}{6} = \dfrac{(n^{2} + n)(2n+ 1)}{6} = \dfrac{2n^{3} +3n^{2} + n}{6}$

so that, as $n$ tends to infinity, we get:

$\lim_{n \to \infty} \dfrac{1}{n^{3}} \cdot \dfrac{2n^{3} + 3n^{2} + n}{6} = \lim_{n \to \infty} \dfrac{2n^{3} + 3n^{2} + n}{6n^{3}} = \dfrac{2}{6} = \dfrac{1}{3}$

This completes the calculation, and demonstrates the method of taking the Riemann sums.
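The convergence can also be watched numerically. The sketch below evaluates the right-hand Riemann sum for $f(x) = x^{2}$ on the interval from 0 to 1 with exact rational arithmetic and compares it with the closed form obtained from the sum of squares; the function names and the sample values of $n$ are illustrative choices of ours.

```python
from fractions import Fraction

def right_riemann_sum(f, a, b, n):
    """Right-hand Riemann sum of f over [a, b] with n equal subintervals."""
    width = Fraction(b - a, n)
    return sum(f(a + i * width) * width for i in range(1, n + 1))

def closed_form(n):
    """(1 / n^3) * n(n + 1)(2n + 1) / 6, from the sum-of-squares formula."""
    return Fraction(n * (n + 1) * (2 * n + 1), 6 * n**3)

def square(x):
    return x * x

sums = {n: right_riemann_sum(square, 0, 1, n) for n in (10, 100, 1000)}
for n, value in sums.items():
    # The gap to 1/3 shrinks as n grows.
    print(n, value == closed_form(n), float(value - Fraction(1, 3)))
```

Since the right-hand sum overestimates an increasing function, every value sits slightly above $\dfrac{1}{3}$ and the excess shrinks roughly like $\dfrac{1}{2n}$.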

##### Integration by Substitution of U

Take the integration problem

$\int \textbf{ sin } ax \; dx$

where $a$ is a constant. The method of u substitution makes implicit use of the fundamental theorem of calculus and the chain rule applied to derivatives. Specifically, the fact that anti-derivatives of a function can be used to express the integral of the function.

We walk through an example and then explain the method. First substitute $ax = u$, that is to say we take the “inside” function and give it a name. Taking the derivative of $u$ with respect to $x$ we get:

$\dfrac{du}{dx} = \lim_{\Delta x \to 0} \dfrac{a(x +\Delta x) - ax}{\Delta x} = \lim_{\Delta x \to 0} \dfrac{a}{1} \cdot \dfrac{x + \Delta x - x}{\Delta x} = a$

so that $du = a \; dx$. Multiplying both sides by $\dfrac{1}{a}$ it follows that $dx = \dfrac{1}{a} du$.

Substituting this into our original problem we eliminate $dx$ to get:

$\int \textbf{ sin } ax \; dx$ $= \dfrac{1}{a} \int \textbf{ sin } u \; du$

Then by the basic properties of the $\textbf{ sin }$ function, we get:

$-\dfrac{1}{a} \textbf{ cos } u + C = - \dfrac{1}{a} \textbf{ cos } ax + C$

The last equivalence follows by rewriting $u$ according to its definition. This completes the calculation, but to understand why it works you need to see that our result is the anti-derivative of the function being integrated in our initial problem.

Recall the chain rule for derivatives is stated as follows:

$\dfrac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)$

so for any nested functions taken in the integral we simply apply:

$\int f'(g(x)) \cdot g'(x) dx = f(g(x)) + C$

and in our case above, we take $g(x) = ax = u$ and then taking the derivative of $u$, we get $du = g'(x) dx = a \; dx$ while $\int \textbf{ sin } u \; du = - \textbf{ cos } u + C$.

In this manner we handle nested functions by an inverted application of the chain rule, treating $u = g(x)$ as a single variable until the last step, at which point we rewrite the result in terms of $x$ to obtain the general form of the anti-derivative.
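One way to gain confidence in the result is to compare it with a direct numerical integration. The sketch below approximates the definite integral of $\textbf{sin } ax$ with a midpoint rule and sets it against the difference of the anti-derivative $-\dfrac{1}{a} \textbf{ cos } ax$ at the endpoints. The constant $a = 3$, the interval from 0 to 2 and the step count are hypothetical values chosen for illustration.

```python
import math

def midpoint_integral(f, lower, upper, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(f(lower + (k + 0.5) * width) for k in range(steps)) * width

def antiderivative(x, a):
    """Result of the substitution u = a * x, with the constant C dropped."""
    return -math.cos(a * x) / a

# Hypothetical constant and interval, chosen only for the check.
a, lower, upper = 3.0, 0.0, 2.0

numeric = midpoint_integral(lambda x: math.sin(a * x), lower, upper)
exact = antiderivative(upper, a) - antiderivative(lower, a)
print(numeric, exact, abs(numeric - exact))
```

The constant $C$ plays no role here because it cancels when the anti-derivative is evaluated at the two endpoints.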

# Short Story: Celebrated

She woke possessed by the expectation of a promise owed. Stepping lightly to the floor she eased herself from the bed, back arched and head craned way over her feet, peering out into the next room. Her body posed a question mark; head tilted and eyebrows arched, she queried the open doorway. The apartment was tiny. The doorway from the bedroom led to a hallway, with a bathroom to the left and a kitchenette to the right. At the end of the hall was a door which opened to the elevators. Some Ikea art distinguished the grey hallway from the grey of the bedroom. She ignored it all, passing clumsily up the corridor seeking confirmation of a dream. Finding nothing, she settled for cereal and some coffee.

Throughout the day she felt irritated. All was as usual in work, yet she knew there ought to have been something wondrous happening. She was to be celebrated in some high fashion but she couldn’t place why, or worse, where. Puzzled, she peeked into boardrooms and closets, but became no wiser. Cynicism battled vain hope as the day dragged on. She knew better than to put weight on idle fancy, but self-recriminations mixed with self-delusion. Babbling to anyone who’d listen she would range wildly. Am I addled? Would you let a dream disrupt your day? What do you think they’ve planned? I don’t expect anything big. Maybe the President would be there? Of course she knew better than to hope, but knowledge is no salve to desire. In explanation for the increasingly obvious absence she imagined a delay caused by a production of ever-growing size. These things take time, you know. Circus tents were too small to hold her hopes.

Each day she would wake and each morning she would tip-toe toward the kitchen, expectant. Each evening she would leave work anticipating a flashmob and fireworks. Each night she would go to bed eager for surprise crepes at breakfast. In between these moments life went on, and the things that happen to others happened to her too. There were career choices, lovers, cocktails, family deaths, new friends, offspring, love, groceries, rejection, regret and moments of happiness. Day after day normality reigned with a depressing persistence, but she consoled herself with the idea of a busy schedule. Eventually even hope scabbed over.

Decades later she came and went walking the roads by rote, stopping daily at the local park. She sat stooped on a bench beside her shopping and stared into the pond oblivious to all passing traffic. A habit honed over years. Although she paid them no attention ducks would calmly train past her toes and tourists would stop to capture the odd scene. The years had not changed her concerns but had tinged her attitudes. She remained fixated on the ever upcoming event, but now embittered sought only to somehow ruin the party. Day after day plans were formed and renounced until, committing, she knelt and picked up her first duckling. With slow deliberate effort she chewed the head from the shoulders. An act intended to forever vilify the actor. Afterwards, no longer really caring, she asked the imagined audience the only question that ever mattered. How do you like me now?

Six months and many meals later she died. A small petition went around town, a civil servant signed off, and a plaque commemorating her was inscribed on the bench in the park. Aside from the normal vital statistics and records of good works, the inscription celebrated her regular feeding of ducks.