Kinematic relations from symmetries in space and time

[Note: A PDF of the treatment below is available here. Some of the formatting might be cleaner in the PDF, since dedicated LaTeX compilers compile LaTeX code a little more nicely than WordPress.]

1.    Not-so-special relativity


1.1  Assumptions


Following Einstein, traditional derivations of kinematic relations in special relativity rely on the assumption that the speed of light is constant in vacuum. However, it is possible to derive the kinematic results of special relativity from a thought experiment with only the following assumptions:

(1) Experiments that are rotated with respect to each other (but are otherwise identical) give the same results.

(2) Experiments conducted at different times (but which are otherwise identical) give the same results.

(3) Experiments conducted in uniform relative motion (but which are otherwise identical) give the same results.

(4) Physical laws are expressed in terms of continuous functions.

(5) Physical laws are expressed in reference to a suitable coordinate system such that all of the above postulates are explicitly satisfied.

Property (1) is sometimes called “isotropy of space,” while property (2) is sometimes called “homogeneity of time.” Both of these properties are necessary in any derivation of special relativity, but are often considered so obvious as to not be indicated explicitly.

Statement (3) is called the “principle of relativity,” and is referred to as the “first postulate” in traditional derivations of special relativity. The veracity of this postulate accounts for our ability to pour drinks, push carts, and flush toilets while moving in an airplane at speeds of hundreds of miles per hour (relative to the ground), just as if we were sitting at home. Further, like assumptions (1) and (2), the assumption (4) is often not indicated explicitly.

Statement (5) bears some similarity to Newton’s first law, which states, “An object in an inertial frame moves at constant velocity unless acted on by an external force.” In other words, the Newtonian approach requires the existence of a special class of reference frames where the laws of physics take a particularly simple form. There are ways to avoid working with a specific coordinate system so that (5) can be dropped, but we won’t get into them here.

Absent from the list above is the usual “second postulate” of special relativity: that all observers in uniform motion measure the same speed for light in vacuum. Though this postulate is a well-established experimental fact, it strikes most people as strange at first glance. By contrast, the five postulates given above seem relatively easy to swallow. Because special relativity can be derived from just these assumptions, it is actually a very general model of kinematics.

Yet, none of the above have to be true a priori; in fact, postulate (5) is abandoned in general relativity, and it is essentially the restriction to certain coordinate systems that makes special relativity “special.” Having made ourselves aware of this, we’ll restrict our attention for the following discussion to the important cases in which all five of the above postulates hold.

1.2  Homogeneity of space


I will show that assumption (1) implies another important fact:

(1′) Experiments performed in different places are governed by the same physical laws.

This property is usually referred to as “homogeneity of space.” Before the proof of this statement, I would like to emphasize that (1′) does not imply (1). As an informal example, a uniform river might be described as homogeneous. However, it is not isotropic because downstream is fundamentally different from upstream.

The simplest proof of (1′) goes as follows: first suppose that some physical law is described in terms of a function f of position L. For any two points a and b, we can imagine an observer placed directly between them. By the isotropy of space (1), that observer shouldn’t see any difference between the two points, and will find that f(a) = f(b). Since this applies for any pair of points, it must be true that f(L) is a constant with respect to L. Therefore, experiments performed in different locations will give the same results.

This simple proof relies only on (1). However, it will be useful to use another picture to establish (1′), relying also on (4). Later, we’ll meet a similar situation where simple arguments like the one above are not adequate, and it will help to have already seen the approach required there in a more accessible context.

Suppose again that some physical law is described in terms of a function f of the position L. For simplicity, initially imagine that f is defined only along a single axis. If an observer is located at any position p along this axis, then isotropy requires that the observer find

f(p-x) = f(p+x)

for any x. In particular, for an observer at position p = a/2 measuring f a distance x = a/2 away, we see that f(a/2 -a/2) = f(a/2+a/2), or

f(0) = f(a).

By applying the same argument to an observer at position 3a/2, we find that

f(a) = f(2a).

Combining these results, we see that f(0) = f(a) = f(2a). In fact, we can keep moving the observer right or left by an amount a to obtain

              f(a) = f(na),          [1]

where n is any integer. Thus f(L) is periodic in a for all a.


Fig. 1: As a decreases, the set of points at which f(L) is constrained to be f(d) by Eq. 1 approaches a dense horizontal line

Supposing we know f(d) for some d \neq 0, Eq. 1 greatly constrains f(L). One might suspect from Fig. 1 that as a \rightarrow 0, periodicity in a will require that f(L) converge to a constant line. To confirm this, first consider L = \frac{n}{m} d with n and m nonzero integers, so that L is a rational multiple of d. Then

f(d) = f(nd) = f(mL) = f(L),

where we used Eq. 1 in the first and third equalities. Thus f(L) = f(d) for L any rational multiple of d. We would still like to know whether f(L) = f(d) for L any real multiple of d. To resolve this, we must use assumption (4) and require that f(L) is continuous in real values of the relative position L/d.

Any real number r can be defined by a sequence of rational numbers q_{1}, q_{2}, q_{3}, \ldots, q_{n}, \ldots in the following way:

r = \lim_{n \rightarrow \infty} q_{n}.

The most familiar example is the decimal expansion, where a real number is represented as c_{0}.c_{1}c_{2}\ldots, with c_{0} any integer and c_{1},c_{2},\ldots integers between 0 and 9, inclusive. In this example,

q_{n} = \sum_{k=0}^{n} \frac{c_{k}}{10^{k}}.

Since f is continuous by assumption (4), we can write

 f(r d)=\lim_{n \rightarrow \infty} f(q_{n} d) =\lim_{n \rightarrow \infty} f(d) = f(d),    [2]

where the second equality follows because we just showed that f(q_{n}d)=f(d) for q_{n} a rational number.
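As a concrete illustration of the limiting procedure, the following Python sketch builds the decimal truncations q_{n} as exact fractions. The function name `decimal_truncations` and the choice of \sqrt{2} as the real number r are illustrative assumptions; nothing in the argument depends on them.

```python
from fractions import Fraction


def decimal_truncations(digits):
    """Yield q_n = sum_{k=0}^{n} c_k / 10**k for the digit list [c_0, c_1, ...]."""
    q = Fraction(0)
    for k, c in enumerate(digits):
        q += Fraction(c, 10**k)
        yield q


# Leading digits of sqrt(2) = 1.41421356..., chosen only as an example of r.
digits = [1, 4, 1, 4, 2, 1, 3, 5, 6]
approximations = list(decimal_truncations(digits))

# Every q_n is rational, and 2 - q_n**2 shrinks toward zero as n grows.
errors = [2 - q**2 for q in approximations]
for n, (q, err) in enumerate(zip(approximations, errors)):
    print(n, q, float(err))
```

Each q_{n} is a rational number, so f(q_{n} d) = f(d) by the previous step, and continuity then carries the result over to the limit r.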

Thus f(x) = f(d) for x any real multiple of d and f(L) is a constant along the chosen axis. Since this axis was arbitrary, applying the same argument to all axes through a given point in space shows that f must be completely independent of position. We have now shown that, given assumptions (1) and (4), physical laws are the same at all positions in space, so statement (1′) holds.

In summary, the requirement that any experiment should be governed by the same physical laws whether it is oriented to the right or to the left quickly results in the constraint Eq. 1. At this point, we can intuitively see in Fig. 1 that the functions describing physical laws f(L) ought to be independent of position L. Rigorously demonstrating the validity of this intuition requires some care, but no new ideas are introduced in the process. We’ll use this procedure two more times in the remainder of this paper.

2.  The thought experiment


2.1  Defining and relating quantities


This treatment is taken from Achin Sen’s article “How Galileo could have derived the special theory of relativity,” American Journal of Physics 1994.

Consider a train moving in the +x-direction at constant speed relative to a platform. By statement (3), observers on the train and platform will agree on the magnitude of this relative speed v. Further, there is a fly moving in the +x-direction. We have two events: (1) The back of the train and the fly are at the back of the platform at the same instant in time; and (2) the front of the train and the fly are at the front of the platform at the same instant in time. The table below names important quantities, as they are measured by an observer A on the platform and an observer B on the train. Note that an observer always writes the lengths of objects at rest in her own frame without a prime. A diagram of this is shown in Fig. 2.


Fig. 2: Diagram of the observers and the fly


Observer | Time interval | Platform length | Train length | Fly speed
A (platform) | T | L | l' | w
B (train) | t | L' | l | u

By definition, the speeds are

w = \frac{L}{T}

             u = \frac{l}{t}.       [3]

Further, the length of the train l' as measured by the platform observer A is just the length of the platform minus the distance covered by the train in the time interval (as measured by A). Similarly, the length of the platform L' as measured by B is just the length of the train plus the distance covered as the platform moves past the train (as measured by B). Thus

l' = L - vT

         L' = l + vt.        [4]

Readers already familiar with special relativity might be more comfortable with notation that consistently applies primes to measurements made by a single observer. Unfortunately, such notation would not be as illuminating in the present treatment since it would break the symmetry between observers A and B.


2.2  Relating length measurements


It is not immediately obvious how L and L', or l and l', are related to each other. A plausible guess would be L = L' and l = l'. As we will see, this is permitted, but only as a special case of the general result.

To constrain the relationship between L' and L, we will use assumption (3). Since the laws of physics are the same for all observers in uniform relative motion, any disagreement in the measurements by A and B must be reciprocal. In particular, if A obtains a larger value than B for the length of an object at rest with respect to her, then we expect that B obtains a correspondingly larger value than A for the length of an object at rest with respect to him. We may generally write this requirement as

           \frac{L'}{L}=\frac{l'}{l}=f(L,T,v,x,\tau),      [5]

where x is the position of the left-most edge of the platform and \tau is the time at which event (1) occurs, as determined by observer A. Using the homogeneity of space and time (principles (1′) and (2)), we can immediately remove the dependence of f on x and \tau. Then f = f(L,T,v). Bear in mind that we have written Eq. 5 from the platform observer’s perspective, but can also express f in terms of l and t when convenient.

A remark is in order before we continue. When we express a physical law in terms of a particular coordinate system and then require that the law take the same form everywhere, we are using a very particular notion of the law being “the same everywhere” (in particular, one consistent with postulate (5)). For example, in the usual spherical coordinates, the area carved out on the surface of a unit sphere in the region with polar coordinate between \theta and \theta + \Delta\theta and azimuthal coordinate between \phi and \phi + \Delta\phi is approximately \sin\theta \, \Delta\theta \, \Delta\phi for small \Delta\theta and \Delta\phi. Since this depends explicitly on the coordinate \theta, we will not consider this law to be “the same everywhere.” However, there is a very meaningful sense, which we will not develop here, in which the geometry of a sphere is the same everywhere. By requiring that physical laws expressed in terms of a particular coordinate system take a form that doesn’t explicitly depend on the coordinates, we are effectively requiring that each observer’s coordinates describe Euclidean space. Our arguments can be developed in a way that avoids this restriction, but the procedure for handling this more general case is considerably more cumbersome.

We may determine the L-dependence of f by comparing measurements of L' made in two different ways. (1) First, measure L and transform it to the train frame using L' = L f. (2) Alternatively, split L into n pieces of size a = L/n and transform the length of each piece to the train frame. In this case, the total distance L' must be the sum of the transformed lengths of each piece. The first method allows us to write

L' = Lf(L) = na f(na),

where I am suppressing the explicit dependence of f on T and v. The second method gives

L' = a f(a) + a f(a) + \ldots + af(a) = na f(a).

Equating the right-most sides of the two equations above and dividing by na, we find that f(a) = f(na). Thus f is periodic in the length a, for all positive a. We have already seen this condition in Section 1.2. Following the same argument given there leads to the conclusion that f(L,T,v) = f(T,v) is in fact independent of length.
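To make the constraint f(a) = f(na) concrete, here is a small Python sketch comparing the two ways of transforming a length. Both candidate functions are hypothetical examples invented for this check (the constant 4/5 and the form 1/(1+L) have no physical significance); the point is only that a length-dependent f makes the two methods disagree.

```python
from fractions import Fraction


def transform_whole(f, L):
    """Method (1): transform the full length at once, L' = L f(L)."""
    return L * f(L)


def transform_pieces(f, L, n):
    """Method (2): split L into n pieces of size a = L/n and sum the results."""
    a = L / n
    return n * a * f(a)


def constant_f(length):
    # Hypothetical length-independent factor.
    return Fraction(4, 5)


def length_dependent_f(length):
    # Hypothetical factor that depends on the length being transformed.
    return 1 / (1 + length)


L = Fraction(12)

# A constant f gives the same L' either way.
print(transform_whole(constant_f, L), transform_pieces(constant_f, L, 3))

# A length-dependent f gives two different answers for the same L'.
print(transform_whole(length_dependent_f, L), transform_pieces(length_dependent_f, L, 3))
```

Exact fractions are used so that agreement or disagreement is not obscured by rounding.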

We can apply the same sort of reasoning to show that f is independent of T. Notice that dividing the lines in Eq. 4, together with Eq. 5, we obtain

           \frac{t}{T} = \frac{L'-l}{L-l'} = \frac{Lf-l}{L-lf}.       [6]

Since the laws of physics are the same for all observers in uniform relative motion, we must be able to write f(L,T,v) = f(l,t,-v) with the same functional form on the left and right, where the velocity is reversed on the right because the platform is moving in the -x-direction relative to the train. Thus the independence of f(L,T,v) from L implies that f(l,t,-v) is independent of l. Then for purposes of determining f, we might as well let l = 0 in Eq. 6 so that t = T f. This equation has the same form as L' = L f. Applying an argument analogous to the one used in that case, we find that f(T,v) = f(v) is also independent of T. Consequently, f is a function of at most v.

In summary, the fact that measurements of a length may be made all at once or bit by bit quickly results in the content displayed in Fig. 1, from which point it is clear that f should be a constant with respect to L. We might therefore say that the independence of f from L follows from our ability to physically demarcate parts of a length that add up to a whole length (for example, by painting benchmarks along the length). We can similarly demarcate parts of a time interval (for example, by breaking the time interval up into the intervals between reaching painted benchmarks) so that the same kind of argument demonstrates the independence of f from T. However, no such demarcation is possible in the case of a speed. This is already suggestive that f might depend nontrivially on v.


2.3  Velocity-dependence of f


We still need to determine the speed-dependence of f. Notice that we still have not used assumption (1) in its strongest form. Since we are considering a one-dimensional problem, a rotation of the experiment can only mean a reversal of directions. The only thing distinguishing right from left in this problem is the direction of the velocities v, w, and u. Clearly, these three speeds are not independent. By investigating the relationship between these quantities and insisting that this relationship is unchanged by a reversal of direction for every speed, we might find an additional constraint on f.

We may think of the speed w as being defined either as (1) L / T, or alternatively as (2) a composition of the speeds u and v. From this latter perspective, we would like to know the speed V_{FP} = w of the fly relative to the platform given the speed V_{FT} = u of the fly relative to the train and the speed V_{TP} = v of the train relative to the platform. In other words, we want to find the function \Phi such that

          w = V_{FP} = \Phi(V_{FT}, V_{TP}) = \Phi(u,v).      [7]

This same function will relate the speed V_{PF} = -w to the speeds V_{PT} = -v and V_{TF} = -u. Explicitly,

          -w = V_{PF} = \Phi(V_{PT},V_{TF}) = \Phi(-v,-u).      [8]

By the isotropy of space, we are free to reverse the velocities of all objects in any equation describing a physical law. Doing this in Eq. 8 yields

          w = \Phi(v,u).      [9]

Comparing Eq. 9 to Eq. 7, we see that w must depend symmetrically on u and v:

          w = \Phi(u,v) = \Phi(v,u).      [10]

We can now consider investigating w from the perspective of Eq. 3, w = L / T. In order to compare this definition with Eq. 10, we need to manipulate L / T into a form that explicitly displays dependence on u and v. Since w should be a symmetric function of u and v, we should be able to pull out a factor of u+v, with some remaining unitless factor. This unitless factor must be a function of f(v) and u/v, so we compute the following useful relations:

u + v = \frac{l+vt}{t} = \frac{Lf}{t}

          \frac{u}{v} = \frac{l}{vt} = \frac{l}{Lf-l}.      [11]

We now begin by factoring out u+v from w:

w = \frac{L}{T}

           = \frac{Lf}{t}\frac{t}{f T}

                                       = (u+v)\frac{Lf - l}{Lf - lf^{2}},      [12]

where in the last line I used Eq. 6 to remove t and T. To simplify the remaining unitless factor, we can substitute L f = l(1+(v/u)) from Eq. 11. We can see that all factors of l will cancel in the unitless factor, as they must:

w = (u+v)\frac{v/u}{1-f^{2}+(v/u)}

   = (u+v)\frac{1}{\frac{u}{v}(1-f^{2})+1}

= \frac{u+v}{1+\frac{u}{v}(1-f^{2})}.

Since w must be a symmetric function of u and v, and f is independent of u, the denominator in the last line must be of the form 1 + \frac{u}{v}(1-f^{2}) = 1+Kuv, where K is an experimentally determined constant with units of (speed)^{-2}. Thus 1-f^{2} = Kv^{2}, and

f(v) = \pm \sqrt{1-Kv^2}.

In the special case that observers A and B are at rest with respect to each other, all measurements must agree, so f(0) = 1. We therefore take the positive solution for the square root above. Defining the speed c by c^{2}=1/|K| (for K \neq 0), we may write

           f(v) = \sqrt{1 \pm \left(\frac{v}{c}\right)^{2}}.      [13]

For infinite c, we obtain the intuitive result f = 1 regardless of our choice in the \pm.

Plugging Eq. 13 into the final expression for w above, we obtain

           w_{\pm} = \frac{u+v}{1\mp \frac{u v}{c^{2}}},      [14]

where w_{+} corresponds to a choice of + in Eq. 13 and w_{-} corresponds to a choice of - in Eq. 13. Note that the sign in the denominator is opposite the sign in the expression for f(v).
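The three cases (K = 0, K > 0, and K < 0) can be compared side by side with a short Python sketch. The function names `compose` and `f_squared`, the sample speeds, and the choice of units with |K| = 1 are illustrative assumptions. The checks confirm that 1 + \frac{u}{v}(1-f^{2}) reduces to 1 + Kuv and that the resulting w is symmetric in u and v.

```python
from fractions import Fraction


def f_squared(v, K):
    """f(v)**2 = 1 - K v**2, from the symmetry requirement on w."""
    return 1 - K * v**2


def compose(u, v, K):
    """w = (u + v) / (1 + K u v), the symmetric composition law."""
    return (u + v) / (1 + K * u * v)


u, v = Fraction(1, 2), Fraction(1, 3)

for K in (Fraction(0), Fraction(1), Fraction(-1)):
    # The denominator derived from lengths matches the symmetric form 1 + K u v.
    assert 1 + (u / v) * (1 - f_squared(v, K)) == 1 + K * u * v
    # Exchanging u and v leaves w unchanged.
    assert compose(u, v, K) == compose(v, u, K)
    print(K, compose(u, v, K))
```

Here K = 0 reproduces the Galilean sum u + v, K = 1 corresponds to w_{-} with c = 1, and K = -1 corresponds to w_{+} with c = 1.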

We will first consider w_{+}. If we let u = c, then w is undefined for v = c. We therefore insist that all observers must have speeds less than c. However, if u = v = c/2, then w = c / (3/4) > c, and we have found a way to compose velocities less than c to obtain a velocity greater than c. In order to escape a contradiction, we must insist that there can be no observers with velocity greater than c/2. Repeating this argument indefinitely, we see that observers have a maximal speed of 0, provided c is finite. Oddly, it appears that a universe would be self-consistent if it behaved in this way and there were objects with speeds w > 0, so long as those objects were not allowed to function as observers themselves. In any event, this appears to be a relatively uninteresting case, which obviously does not describe our universe. We will not study it further.
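The runaway behavior of w_{+} can be seen numerically. The sketch below, in units where c = 1, composes a speed with itself repeatedly under the w_{+} rule and counts how many compositions it takes to exceed c. The function names and the starting speed of c/1000 are illustrative assumptions.

```python
from fractions import Fraction


def compose_plus(u, v, c=1):
    """The w_+ rule: w = (u + v) / (1 - u v / c**2)."""
    return (u + v) / (1 - u * v / c**2)


def compositions_to_exceed(s, c=1):
    """Compose the speed s with itself until the result is no longer below c."""
    steps = 0
    while s < c:
        s = compose_plus(s, s, c)
        steps += 1
    return steps


# The example from the text: two speeds of c/2 compose to 4c/3.
print(compose_plus(Fraction(1, 2), Fraction(1, 2)))

# Lowering the bound to c/2 does not help: two speeds of c/4 give 8c/15 > c/2.
print(compose_plus(Fraction(1, 4), Fraction(1, 4)))

# Even a very slow observer reaches a speed above c after finitely many steps.
steps = compositions_to_exceed(Fraction(1, 1000))
print(steps)
```

Because composing s with itself under this rule always gives more than 2s, any nonzero observer speed eventually leads past c, which is why the only consistent bound is zero.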

The only remaining possibility is

f(v) = \sqrt{1-\left(\frac{v}{c}\right)^{2}} = 1/\gamma

w(u,v) = \frac{u+v}{1+\frac{uv}{c^{2}}},                   [15]

where I have simply defined the Lorentz factor \gamma = 1/\sqrt{1-(v/c)^{2}} \geq 1.
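As a consistency check on the algebra, the following Python sketch rebuilds w = L/T directly from the length and time relations (Eqs. 3 through 5) and compares it with Eq. 15. The function names, the units with c = 1, and the sample values u = 0.5 and v = 0.6 are illustrative assumptions.

```python
import math

C = 1.0  # units in which c = 1 (an illustrative choice)


def contraction_factor(v):
    """f(v) = sqrt(1 - (v/c)**2)."""
    return math.sqrt(1 - (v / C) ** 2)


def fly_speed_from_lengths(u, v, l=1.0):
    """Compute w = L/T using only Eqs. 3, 4, and 5."""
    f = contraction_factor(v)
    t = l / u                 # Eq. 3: u = l / t
    L = (l + v * t) / f       # Eq. 4, L' = l + v t, with L' = L f from Eq. 5
    l_prime = l * f           # Eq. 5: l' = l f
    T = (L - l_prime) / v     # Eq. 4: l' = L - v T
    return L / T              # Eq. 3: w = L / T


u, v = 0.5, 0.6
w_from_lengths = fly_speed_from_lengths(u, v)
w_from_formula = (u + v) / (1 + u * v / C**2)

print(w_from_lengths, w_from_formula)
```

The two values agree to floating-point precision, and the result does not depend on the train length l, as expected from the cancellation noted above.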

From this, all of the usual results of special relativity follow. For example, Lorentz contraction is simply L' = L/\gamma, which tells us that moving rods are shorter by a factor of \gamma. Time dilation t=T/\gamma is given by Eq. 6 when l = 0, so that a moving observer present at two events measures a period between the events smaller by a factor of \gamma than an observer using synchronized stationary clocks. This is often summarized with the phrase, “moving clocks run slow.”
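For a sense of scale, here is a short numerical example in Python. The relative speed v = 0.6c, the platform length of 100, and the interval T = 10 are arbitrary illustrative values, and `lorentz_factor` is a helper name introduced only for this sketch.

```python
import math


def lorentz_factor(v, c=1.0):
    """gamma = 1 / sqrt(1 - (v/c)**2)."""
    return 1 / math.sqrt(1 - (v / c) ** 2)


gamma = lorentz_factor(0.6)           # relative speed v = 0.6 c

L = 100.0                             # platform length measured by A
L_prime = L / gamma                   # platform length measured by B

T = 10.0                              # interval measured by A's synchronized clocks
t = T / gamma                         # interval measured by B, present at both events

print(gamma, L_prime, t)
```

At this speed \gamma is 1.25, so the moving platform is measured at 80 percent of its rest length and the moving observer records 80 percent of the time interval.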

As a final example, we will establish the existence of a “universal speed limit” and compare its value for observers and non-observers. First, suppose u = c. Then Eq. 15 gives

w = \frac{c+v}{1+\frac{v}{c}} = c

independent of v. This means that if something travels at speed c with respect to one observer, then it must travel at speed c with respect to all observers. We have therefore derived the constancy of “the speed of light” c, with only the assumptions stated at the beginning and the additional assumption that the undetermined parameter c is finite. I emphasize that even though we must defer to measurements to determine the value of c, the structure of special relativity falls into place from only the stated assumptions about the way physical laws behave for different observers.
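The invariance of c is easy to confirm with exact arithmetic. In the Python sketch below, the function name `compose`, the units with c = 1, and the list of sample train speeds are illustrative assumptions.

```python
from fractions import Fraction


def compose(u, v, c=1):
    """Eq. 15: w = (u + v) / (1 + u v / c**2)."""
    return (u + v) / (1 + u * v / c**2)


c = Fraction(1)

# Train speeds from -0.9 c to 0.9 c in steps of 0.1 c.
train_speeds = [Fraction(k, 10) for k in range(-9, 10)]

# Something moving at c relative to the train, seen from the platform.
results = [compose(c, v, c) for v in train_speeds]
print(set(results))
```

Every entry equals c exactly, whether the train moves toward or away from the thing travelling at speed c.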

Now, what if we instead choose v = c? Since w is symmetric with respect to u and v, we already know that we must find w = c. However, we haven’t imposed any conditions on what serves as the “fly” for this scenario, since we did not specify u. This means that if an observer B travels at speed c with respect to another observer A, then everything must travel at speed c with respect to observer A. This bizarre behavior should perhaps not be surprising. After all, the Lorentz factor diverges when v = c so that length measurements are infinitely contracted and time measurements are infinitely dilated. It seems that observations do not really make sense when v = c. However, no paradoxes arise from supposing that non-observers can travel at speed c. In conclusion, observers must remain at speeds strictly less than c at all times, while non-observers may travel at speed c.

Is there a speed limit for non-observers? We can answer this by supposing that there were something moving relative to observer A with speed w > c. Then we can consider another observer moving at speed v = c^{2}/w < c and use the velocity composition formula

w = \frac{u+c^{2}/w}{1+u/w}.

Cross-multiplying, we see that w + u = u + c^{2}/w. Cancelling u on each side and multiplying by w, we find w^{2} = c^{2}. Since we initially assumed that w > c, this is a contradiction. It is therefore impossible that anything could travel at a speed greater than c relative to any observer.
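The same contradiction can be seen from a slightly different angle by solving Eq. 15 for u. In the Python sketch below, `relative_speed` is a helper introduced for this check, and the speed w = 3c/2 is a hypothetical value chosen only to trigger the problem; units are such that c = 1.

```python
from fractions import Fraction


def relative_speed(w, v, c=1):
    """Eq. 15 solved for u: u = (w - v) / (1 - w v / c**2)."""
    return (w - v) / (1 - w * v / c**2)


c = Fraction(1)

# Sanity check with ordinary speeds: w = 5/7 and v = 1/3 give u = 1/2.
print(relative_speed(Fraction(5, 7), Fraction(1, 3), c))

# Hypothetical faster-than-c object and the observer speed used in the text.
w = Fraction(3, 2)
v = c**2 / w                 # v = 2/3, which is below c

try:
    u = relative_speed(w, v, c)
except ZeroDivisionError:
    # The denominator 1 - w v / c**2 vanishes, so no finite u exists.
    u = None

print(v, u)
```

An observer moving at v = c^{2}/w would be unable to assign any finite speed to the object, which is another way of seeing that the assumption w > c cannot be maintained.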

In summary, we have just shown that (1) something travelling at speed c with respect to one observer travels at speed c with respect to all observers; (2) no observer may travel at or faster than speed c; and (3) nothing may travel faster than speed c.

The last result is sometimes called the “cosmic speed limit.” It is typically established by assuming that the laws of physics preserve causality (a special time-ordering of events that signals can travel between). It is interesting to note that our argument relied only on the velocity composition formula; causality need not be invoked.


3.  Comments and suggested reading


In this section I will make several general comments and refer the reader to various extensions of the discussion above.

The reader interested in pursuing special relativity from this perspective is encouraged to start with Sen, “How Galileo could have derived the special theory of relativity,” American Journal of Physics 1994. The approach used here is essentially identical to the one used there, but with different emphasis and more elaboration than in Sen’s paper.

It is worth noting that the value of c can be experimentally determined from measurements of length contraction and similar relativistic effects alone; no reference to light must be made for the theory of special relativity to be a complete and self-consistent theory of kinematics. In fact, given certain additional assumptions, the theory of electrodynamics can be viewed as a consequence of special relativity. For an especially clear discussion of this, see R. S. Elliott, “Relativity and Electricity,” IEEE Spectrum, March 1966.

As mentioned previously, we could relax our notion of homogeneity of space to allow for a more powerful and meaningful class of homogeneous spaces. This is described at length in Cacciatori, et al., “Special Relativity in the 21st Century,” Annalen der Physik, September-October 2008. The thorough historical introduction is very illuminating, but the reader is warned that the mathematical treatment is rather complex. Amazingly, the result of relaxing our notion of homogeneity is that we recover special relativity, with the additional feature of a cosmological constant (if you don’t know what that is, then don’t worry about it!).

We could also drop the assumption of isotropy of space. For instance, see Sonego and Pin, “Foundations of anisotropic relativistic mechanics,” J. Math. Phys., 2009. As with the previous example, dropping the assumption of isotropy complicates things extraordinarily. Though conceptually interesting, these results do not appear to have much broader application because space is apparently isotropic (at least where general relativity is not relevant).

I am not aware of any introductory textbooks that treat special relativity from this perspective. There are many good textbooks on special relativity that use a more historical approach. I originally learned the subject from Unit R in Thomas Moore’s Six Ideas that Shaped Physics series and highly recommend it.

I am grateful to William Sweeney for pointing out that the w_{+} velocity composition formula results in an upper speed limit of 0 for observers, and to Eric Dodds for providing insightful feedback on this discussion. Remaining errors are, of course, my own.
