C H A P T E R  16
First-Order Logic


16.1 Introduction

In the preceding chapter, we saw that it is often convenient to have synonyms in logical language. For example, we sometimes have nicknames for the same person - Michael, Mike. And, in elementary arithmetic, we frequently use different terms to refer to the same number - 2+2, 2*2, and s(s(s(s(0)))). In other words, we often have multiple names for the same object.

The opposite is also true. We sometimes find that there are objects in an application area for which we have no names at all. For example, while we have names for many real numbers - integers, decimal numbers, and some transcendental numbers, like pi and e, we do not have names for all of them because we have only countably many names and there can be uncountably many objects.

In this chapter, we consider a different logic that avoids this apparent limitation by allowing the universe of objects to vary independently of the space of ground terms in our language. The resulting logic is called First-Order Logic. We start by introducing the idea of a language-independent space of objects. Then we define a semantics that gives meaning to sentences without fixing in advance the space of objects. We also discuss how the limitation of a fixed universe can be circumvented without leaving the framework of Herbrand Logic.

16.2 Conceptualization

As we shall see, the semantics of First-Order Logic is based on the notion of a conceptualization of the world in terms of objects, functions, and relations.

The notion of an object used here is quite broad. Objects can be concrete (e.g. this book, Confucius, the sun) or abstract (e.g. the number 2, the set of all integers, the concept of justice). Objects can be primitive or composite (e.g. a circuit that consists of many sub-circuits). Objects can even be fictional (e.g. a unicorn, Sherlock Holmes, Miss Right). In short, an object can be anything about which we want to say something.

Not all knowledge-representation tasks require that we consider all the objects in the world; in some cases, only those objects in a particular set are relevant. For example, number theorists usually are concerned with the properties of numbers and usually are not concerned with physical objects such as resistors and transistors. Electrical engineers usually are concerned with resistors and transistors and usually are not concerned with buildings and bridges. The set of objects about which knowledge is being expressed is often called a universe of discourse.

As an example, consider the Blocks World scene in Figure 1. Most people looking at this figure interpret it as a configuration of toy blocks. Some people conceptualize the table on which the blocks are resting as an object as well; but, for simplicity, we ignore it here. The universe of discourse corresponding to this conceptualization is the set consisting of the five blocks in the scene.

A
B
D
C
E
 
Figure 1 - One state of Blocks World.

Although in this example there are finitely many elements in our universe of discourse, this need not always be the case. It is common in mathematics, for example, to consider the set of all integers or the set of all real numbers or the set of all n-tuples of real numbers, as universes with infinitely many elements.

An interrelationship among the objects in a universe of discourse is called a relation. Although we can invent many relations for a set of objects, in conceptualizing a portion of the world we usually emphasize some relations and ignore others. The set of relations emphasized in a conceptualization is called the relational basis set.

In a spatial conceptualization of the Blocks World, there are numerous meaningful relations. For example, it makes sense to think about the on relation that holds between two blocks if and only if one is immediately above the other. We might also think about the above relation that holds between two blocks if and only if one is anywhere above the other. The stack relation holds of three blocks that are stacked one on top of the other. The clear relation holds of a block if and only if there is no block on top of it. The bottom relation holds of a block if and only if that block is resting on the table.

One way of giving meaning to these intuitive notions in the context of a specific conceptualization of the world is to list the combinations of objects in the universe of discourse that satisfy each relation in the relational basis set.

For example, the on relation can be characterized as a set of all pairs of blocks that are on-related to each other. If we use the symbols a, b, c, d, and e to stand for the blocks in the scene shown above, then the table below encodes the on relation. There is a separate row in the table for each pair of blocks in which the first element is on the second.

ab
bc
de

The clear relation is a property of a block in and of itself, not with respect to other blocks. In this case, the table has just one column, and each row represents a block that is clear of other blocks.

a
d

The stack relation is a relationship among three blocks; and so the table requires three columns, as shown below. In this case, there is just one stack in the scene, and so there is just one row in the table.

a b c

The generality of relations can be determined by comparing their elements. Thus, the on relation is less general than the above relation, since, when viewed as a set of tuples, it is a subset of the above relation. Of course, some relations are empty (e.g. the unsupported relation), whereas others consist of all n-tuples over the universe of discourse (e.g. the block relation).

Logic allows us to capture the contents of such tables; and, more importantly, it allows us to assert relationships between these tables. For example, it allows us to say that, if one block is on a second block, then that block is above the second block. Also, the first block is not on the table. Also, the second block in not clear. Moreover, it allows us to say these things in general, i.e. out of the context of any particular arrangement of blocks.

One thing that our tabular representation for relations makes clear is that, for a finite universe of discourse, there is an upper bound on the number of possible n-ary relations. In particular, for a universe of discourse of size b, there are bn distinct n-tuples. Every n-ary relation is a subset of these bn tuples. Therefore, an n-ary relation must be one of at most 2^(bn) possible sets.

A function is a special kind of relation among the objects in a universe of discourse. For each combination of n objects in a universe of discourse (called the arguments), a function associates a single object (called the value) of the function applied to the arguments. Note that there must be one and exactly one value for each combination of arguments. Functions that are not defined for all combinations of arguments are often called partial functions; and functions for which there can be more than one value for a given combination of arguments are often called multi-valued functions. By and large, we will ignore partial and multi-valued functions in what follows.

Mathematically, a function can be viewed as a set of tuples of objects, one for each combination of possible arguments paired with the corresponding value. In other words, an n-ary function is a special kind of (n+1)-ary relation. For example, consider the unary function shown on the left below. We can view this function as a binary relation shown on the right.

a b
b a
c d
d c
e e
 
ab
ba
cd
dc
ee

In an analogous way, a binary function can be treated as a ternary relation; a ternary function can be treated as a 4-ary relation; and so forth.

Although the tabular representation for functions and relations is intuitively satisfying, it consumes a lot of space. Therefore, in what follows, we use a different way of encoding relational tables, viz. as sets of tuples, each tuple representing a single row of the corresponding table.

As an example, consider the on relation shown earlier. In what follows, we will treat the relation as the set of three 2-tuples shown below.

{[a, b], [b, c], [d, e]}

Since functions are relations, we could use the same representation. However, to emphasize the special properties of functions, we use a slight variation, viz. the use of an arrow in each tuple to emphasize the fact that the last element is the value of the function applied to the element or elements before the arrow.

Using this notation, we can rewrite the functional table shown earlier as the set shown below.

{ab, ba, cd, dc, ee}

Finally, before we get on with the details of First-Order Logic, it is worthwhile to make a few remarks about conceptualization. No matter how we choose to conceptualize the world, it is important to realize that there are other conceptualizations as well. Furthermore, there need not be any correspondence between the objects, functions, and relations in one conceptualization and the objects, functions, and relations in another.

In some cases, changing one's conceptualization of the world can make it impossible to express certain kinds of knowledge. A famous example of this is the controversy in the field of physics between the view of light as a wave phenomenon and the view of light in terms of particles. Each conceptualization allowed physicists to explain different aspects of the behavior of light, but neither alone sufficed. Not until the two views were merged in modern quantum physics were the discrepancies resolved.

In other cases, changing one's conceptualization can make it more difficult to express knowledge, without necessarily making it impossible. A good example of this, once again in the field of physics, is changing one's frame of reference. Given Aristotle's geocentric view of the universe, astronomers had great difficulty explaining the motions of the moon and other planets.

The data were explained (with epicycles, etc.) in the Aristotelian conceptualization, although the explanation was extremely cumbersome. The switch to a heliocentric view quickly led to a more perspicuous theory.

This raises the question of what makes one conceptualization more appropriate than another for knowledge formalization. Currently, there is no comprehensive answer to this question. However, there are a few issues that are especially noteworthy.

One such issue is the grain size of the objects associated with a conceptualization. Choosing too small a grain can make knowledge formalization prohibitively tedious. Choosing too large a grain can make it impossible. As an example of the former problem, consider a conceptualization of our Blocks World scene in which the objects in the universe of discourse are the atoms composing the blocks in the picture. Each block is composed of enormously many atoms, so the universe of discourse is extremely large. Although it is in principle possible to describe the scene at this level of detail, it is senseless if we are interested in only the vertical relationship of the blocks made up of those atoms. Of course, for a chemist interested in the composition of blocks, the atomic view of the scene might be more appropriate. For this purpose, our conceptualization in terms of blocks has too large a grain.

Finally, there is the issue of reification of functions and relations as objects in the universe of discourse. The advantage of this is that it allows us to consider properties of properties. As an example, consider a Blocks World conceptualization in which there are five blocks, no functions, and three unary relations, each corresponding to a different color. This conceptualization allows us to consider the colors of blocks but not the properties of those colors.

We can remedy this deficiency by reifying various color relations as objects in their own right and by adding a partial function - such as color - to relate blocks to colors. Because the colors are objects in the universe of discourse, we can then add relations that characterize them, e.g. pretty, garish, etc.

Note that, in this discussion, no attention has been paid to the question of whether the objects in one's conceptualization of the world really exist. We have adopted neither the standpoint of realism, which posits that the objects in one's conceptualization really exist, nor that of nominalism, which holds that one's concepts have no necessary external existence. Conceptualizations are our inventions, and their justification is based solely on their utility. This lack of commitment indicates the essential ontological promiscuity of logic: Any conceptualization of the world is accommodated, and we seek those that are useful for our purposes.

16.3 Semantics

An interpretation i is a mapping from the constants of the language into the elements of a conceptualization. We use the expression ∀i to refer to the universe of discourse corresponding to the conceptualization. Each object constant in our language is mapped into an element of the universe of discourse. Each function constant with arity n is mapped into an n-ary function on the universe of discourse. Each relation constant with arity n is mapped into an n-ary relation on the universe of discourse.

As an example, consider a First-Order language with object constants a, b, c, unary function constant f, and binary relation constant r.

As our universe of discourse, we take the set consisting of the numbers 1, 2, and 3. Admittedly, this is a very abstract universe of discourse. It is also very small. This has the advantage that we can write out examples in their entirety. Of course, in practical situations, universes of discourse are likely to be more concrete and much larger.

The equations shown below define one interpretation of our language in terms of this universe of discourse. As we did with Propositional Logic, we are using metalevel statements here. Remember that these are not themselves sentences in First-Order Logic.

i = {1, 2, 3}
ai = 1
bi = 2
ci = 2
fi = {1→2, 2→1, 3→3}
ri = {[1, 1], [1, 2], [2, 2]}

Note that more than one object constant can refer to the same object. In this case, both b and c refer to the number 2. Also, not every object in the universe of discourse need have a constant that refers to it. For example, there is no object constant that denotes the number 3. Note that the function denoted by f assigns one and only one value to every number in the universe of discourse. By contrast, in the relation r, some objects are paired with several other objects, whereas some objects do not appear at all.

A variable assignment for a universe of discourse ∀i is a mapping from the variables of the language into ∀i.

xi = 1
yi = 1
zi = 2

A value assignment based on interpretation i and variable assignment v is a mapping from the terms of the language into ∀i. The mapping must agree with i on constants, and it must agree with v on variables. The value of a functional term is the result of applying the function corresponding to the function constant to the objects designated by the terms.

aiv = 1
ziv = 2
f(a)iv = 2
f(z)iv = 1
f(f(z))iv = 2

The truth assignment based on interpretation i and variable assignment v is a mapping from the sentences of the language to true or false defined as follows.

Given an interpretation i and a variable assignment v, a relational sentence is true if and only if the tuple of objects denoted by the arguments is contained in the relation denoted by the relation constant.

For example, given the interpretation and the variable assignment presented above, we have the truth assignments shown below. The relational sentence r(a,b) is true since [1, 2] is a row in ri. The sentence r(z,a) is false since [2, 1] is not in ri.

r(a, b)iv = true
r(z, a)iv = false

The conditions for truth of logical sentences in First-Order Logic are analogous to those for the truth of logical sentences in Propositional Logic. A negation ¬φ is true if and only if φ is false. A conjunction (φ ∧ ψ) is true if and only if both φ and ψ are true. A disjunction (φ ∨ ψ) is true if and only if φ is true or ψ is true (or both). An implication is true unless the antecedent is true and the consequent is false. A reduction is true unless the antecedent is true and the consequent is false. An equivalence is true if and only if the truth values of the two embedded sentences are the same.

Intuitively, a universally quantified sentence is satisfied if and only if the embedded sentence is satisfied for all values of the quantified variable. Intuitively, an existentially quantified sentence is satisfied if and only if the embedded sentence is satisfied for some values of the quantified variable.

Sadly, expressing this intuition rigorously is a little tricky. First, we need the notion of a version of a variable assignment. Then, we can give a rigorous definition to our intuition by talking about some variations or all variations of a given variable assignment.

A version v:ν/x of a variable assignment v is a variable assignment that assigns object x as value of the variable ν and agrees with v on all other variables.

Suppose, for example, we had the variable assignment v shown below.

xv = 1
yv = 1
zv = 2

The version v:y/3 is shown below.

xv:y/3 = 1
yv:y/3 = 3
zv:y/3 = 2

With this notation, we can formalize the semantics of quantified sentences as follows. An interpretation i and a variable assignment v satisfy a universally quantified sentence ∀ν.φ if and only if i and v satisfy φ for all versions of v that can be defined over the same universe of discourse of i. An interpretation i and a variable assignment v satisfy an existentially quantified sentence ∃ν.φ if and only if i and v satisfy φ for at least one version of v that can be defined over the same universe of discourse of i.

An interpretation i and a variable assignment v are said to satisfy a sentence φ (written |=i φ[v]) if and only if the sentence φ is assigned the value true.

A sentence is satisfiable if and only if there is an interpretation and a variable assignment that satisfy it. A sentence is valid if and only if it is satisfied by every interpretation and variable assignment. A sentence is unsatisfiable if and only if it is satisfied by no interpretation and variable assignment.

An interpretation i is a model of a sentence φ (written |=i φ) if and only if the interpretation satisfies the sentence for every variable assignment (in other words, if |=i φ[v] for every variable assignment v). Note that, if an interpretation satisfies a closed sentence for one variable assignment, then it satisfies the sentence for every variable assignment (i.e. it is a model). Note also that, if an interpretation i satisfies an open sentence φ with free variable ν for every variable assignment, then it is a model of ∀ν.φ.

16.4 Blocks World

Let's revisit the Blocks World example from section 6.5. The typical Blocks World scene from chapter 6 is shown again in Figure 1.

A
B
D
C
E
 
Figure 1 - One state of Blocks World.

As in section 6.4, we adopt a vocabulary with five object constants.

{a, b, c, d, e}

Also as in chapter 6, let's consider the following literals that states whether the on relation holds for each pair of names. We will refer to the following set of sentences as Σ. (We have written the positive literals in black and the negative literals in grey in order to distinguish the two more easily.)

¬on(a,a)   on(a,b)   ¬on(a,c)   ¬on(a,d)   ¬on(a,e)
¬on(b,a)   ¬on(b,b)   on(b,c)   ¬on(b,d)   ¬on(b,e)
¬on(c,a)   ¬on(c,b)   ¬on(c,c)   ¬on(c,d)   ¬on(c,e)
¬on(d,a)   ¬on(d,b)   ¬on(d,c)   ¬on(d,d)   on(d,e)
¬on(e,a)   ¬on(e,b)   ¬on(e,c)   ¬on(e,d)   ¬on(e,e)

Here we illustrate one of the main differences between First-Order Logic and Herbrand Logic. Under Herbrand Logic, the set of five object constants fixes the universe of five objects. As a result, the above set of sentences completely defines the arrangement of blocks. That is, there is exactly one Herbrand Logic truth assignment over the language {a,b,c,d,e, on} that satisfies the above set of sentences. In contrast, under First-Order Logic, the set of five object constants are simply five names that refer to objects in a universe but do not constrain the universe in any way. As a result, under First-Order Logic, the set of sentences is satisfied by more than one interpretation. That is, the set of sentences accurately describes more than one arrangement of blocks and does not specify which one.

Here we look at several examples of interpretations that satisfy the sentences Σ.

An interpretation corresponding to arrangement shown in figure 1.
i= {o1, o2, o3, o4, o5}
ai = o1
bi = o2
ci = o3
di = o4
ei = o5
oni = {[o1,o2], [o2,o3], [o4,o5]}

Same five objects. Some objects have more than one name. (o1 and o2). Some objects have no names (o4 and o5).
i= {o1, o2, o3, o4, o5}
ai = o1
bi = o2
ci = o3
di = o1
ei = o2
oni = {[o1,o2], [o2,o3]}

Three objects.
i= {o1, o2, o3}
ai = o1
bi = o2
ci = o3
di = o1
ei = o2
oni = {[o1,o2], [o2,o3]}

Eight objects.
i= {o1, o2, o3, o4, o5, o6, o7, o8}
ai = o1
bi = o2
ci = o3
di = o1
ei = o2
oni = {[o1,o2], [o2,o3]}

Each of the three interpretations corresponds to a different world of blocks. But they all share a common feature; all three worlds have at least three blocks arranged in a stack. In fact, all the interpretations that satisfy Σ share this common feature.

Below are several example interpretations which do not satisfy set of sentences Σ.

i= {o1, o2, o3, o4, o5}
ai = o1
bi = o2
ci = o3
di = o4
ei = o5
oni = {[o2,o3], [o4,o5]}

This interpretation does not satisfy Σ because on(a,b) evaluates to false.

i= {o1, o2, o3, o4, o5}
ai = o1
bi = o2
ci = o3
di = o4
ei = o5
oni = {[o1,o1], [o1,o2], [o2,o3], [o4,o5]}

This interpretation does not satisfy Σ because ¬on(a,a) evaluates to false.

16.5 Arithmetic

In the introduction to this chapter, we mentioned as motivation the desire to write down properties satisfied by a class of arithmetic structures without first fixing the space of objects. Here we consider several properties of addition. We then examine some interpretations that satisfy the properties and some interpretations that do not.

Consider the language with the object constant 0, the function constant s, and the relation constant plus. Below are the the properties of addition we saw in section 6.7 (except that the same relation is replaced with equality). We call this set of sentences Γ

y.plus(0,y,y)
x.∀y.∀z.(plus(x,y,z) ⇒ plus(s(x),y,s(z)))
x.∀y.∀z.∀w.(plus(x,y,z) ∧ ¬(z=w) ⇒ ¬plus(x,y,w))

Below are some examples of interpretations that satisfy Γ.

Modular arithmetic with modulus 2
i = {0,1}
0i = 0
si = {0→1,1→0}
plusi = {[0,0,0],[1,0,1],[0,1,1],[1,1,0]}

Modular arithmetic with modulus 4
i = {0,1,2,3}
0i = 0
si = {0→1,1→2,2→3,3→0}
plusi = {[0,0,0],[1,0,1],[2,0,2],[3,0,3],[0,1,1],[1,1,2],[2,1,3],[3,1,0],[0,2,2],[1,2,3],[2,2,0],[3,2,1],[0,3,3],[1,3,0],[2,3,1],[3,3,2]}

Addition over the natural numbers
i = {0,1,2,3,4,...} (the natural numbers N)
0i = 0
si = {0→1,1→2,2→3,3→4,...} ({m→m+1 : m ∈ N)
plusi = {[m,n,k] : m,n,k ∈ N and k is the sum of m and n}

Nonstandard arithmetic over two objects, where 1+1=1
i = {0,1}
0i = 0
si = {0→1,1→1}
plusi = {[0,0,0],[1,0,1],[0,1,1],[1,1,1]}

Below are some examples of interpretations that do not satisfy Γ.

i = {0,1}
0i = 0
si = {0→1,1→0}
plusi = {[1,0,1],[0,1,1],[1,1,0]} (normal addition modulus 2, except that [0,0,0] is missing)
This interpretation does not satisfy Γ because ∀y.plus(0,y,y) evaluates to false.

i = {0,1}
0i = 0
si = {0→1,1→0}
plusi = {[0,0,0],[0,0,1],[1,0,1],[0,1,1],[1,1,0]} (normal addition modulus 2, except that [0,0,1] is added)
This interpretation does not satisfy Γ because (among other things), ∀x.∀y.∀z.∀w.(plus(x,y,z) ∧ ¬(z=w) ⇒ ¬plus(x,y,w)) evaluates to false.

16.6 Properties

Although the semantics of Herbrand Logic and First-Order Logic are different, many of the key concepts of the two logics are the analogous. Notably, the concepts of validity, contingency, unsatisfiability, and so forth have essentially the same definitions in First-Order Logic as in Herbrand Logic.

A sentence is satisfiable if and only if it is satisfied by at least one interpretation. A sentence is falsifiable if and only if there is at least one interpretation that makes it false. A sentence is unsatisfiable if and only if it is not satisfied by any interpretation, i.e. no matter what interpretation we take, the sentence is always false. A sentence is contingent if and only if it is both satisfiable and falsifiable, i.e. it is neither valid nor unsatisfiable.

16.7 Logical Entailment

A set of First-Order Logic sentences Δ logically entails a sentence φ (written Δ |= φ) if and only if every interpretation that satisfies Δ also satisfies φ.

As with validity and contingency and satisfiability, this definition is essentially the same for First-Order Logic as for Propositional Logic and Herbrand Logic.