Statistical Science

1994, Vol. 9, No. 3, 429-438 (abridged)

**Equidistant Letter Sequences in the Book of Genesis**

**Doron Witztum, Eliyahu Rips and Yoav Rosenberg**

Abstract. It has been noted that when the Book of Genesis is written as two-dimensional arrays, equidistant letter sequences spelling words with related meanings often appear in close proximity. Quantitative tools for measuring this phenomenon are developed. Randomization analysis shows that the effect is significant at the level of 0.00002.

Key words and phrases: Genesis, equidistant letter sequences, cylindrical representations, statistical analysis.


**1. INTRODUCTION**

The phenomenon discussed in this paper was first discovered
several decades ago by Rabbi Weissmandel [7]. He found some
interesting patterns in the Hebrew Pentateuch (the Five Books of Moses),
consisting of words or phrases expressed in the form of equidistant letter
sequences (ELS's)--that is, by selecting sequences of equally spaced letters in
the text.

As impressive as these seemed, there was no rigorous way of determining if
these occurrences were not merely due to the enormous quantity of combinations
of words and expressions that can be constructed by searching out arithmetic
progressions in the text. The purpose of the research reported here is to study
the phenomenon systematically. The goal is to clarify whether the phenomenon in
question is a real one, that is, whether it can or cannot be explained purely on
the basis of fortuitous combinations.

The approach we have taken in this research can be illustrated by the
following example. Suppose we have a text written in a foreign language that we
do not understand. We are asked whether the text is meaningful (in that foreign
language) or meaningless. Of course, it is very difficult to decide between
these possibilities, since we do not understand the language. Suppose now that
we are equipped with a very partial dictionary, which enables us to recognise a
small portion of the words in the text: "hammer" here and "chair" there, and
maybe even "umbrella" elsewhere. Can we now decide between the two
possibilities?

Not yet. But suppose now that, aided with the partial dictionary, we can
recognise in the text a pair of conceptually related words, like "hammer" and
"anvil." We check if there is a tendency of their appearances in the text to be
in "close proximity." If the text is meaningless, we do not expect to see such a
tendency, since there is no reason for it to occur. Next, we widen our check; we
may identify some other pairs of conceptually related words: like "chair" and
"table," or "rain" and "umbrella." Thus we have a sample of such pairs, and we
check the tendency of each pair to appear in close proximity in the text. If the
text is meaningless, there is no reason to expect such a tendency. However, a
strong tendency of such pairs to appear in close proximity indicates that the
text might be meaningful.

Note that even in an absolutely meaningful text we do not expect that,
deterministically, every such pair will show such tendency. Note also, that we
did not decode the foreign language of the text yet: we do not recognise its
syntax and we cannot read the text.

This is our approach in the research described in the paper. To test whether
the ELS's in a given text may contain "hidden information," we write the text in
the form of two-dimensional arrays, and define the distance between ELS's
according to the ordinary two-dimensional Euclidean metric. Then we check
whether ELS's representing conceptually related words tend to appear in "close
proximity."

Suppose we are given a text, such as Genesis *(G)*. Define an
equidistant letter sequence (ELS) as a sequence of letters in the text whose
positions, not counting spaces, form an arithmetic progression; that is, the
letters are found at the positions

*n*, *n* + *d*, *n* + 2*d*, ..., *n* + (*k* - 1)*d*.

We call *d* the *skip*, *n* the *start* and
*k* the *length* of the ELS. These three parameters uniquely
identify the ELS, which is denoted (*n*,*d*,*k*).
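The definition above can be illustrated with a short brute-force search. This is a minimal sketch, not the authors' program: the function names and the `max_skip` bound are assumptions introduced for illustration, positions are 1-based to match the notation (*n*,*d*,*k*), and negative skips are allowed because the text later refers to |*d*|.

```python
def strip_spaces(text):
    """Keep only letters, since ELS positions are counted without spaces."""
    return "".join(ch for ch in text if ch.isalpha())


def find_els(text, word, max_skip):
    """Return every (n, d, k) whose letters at positions
    n, n+d, ..., n+(k-1)d spell `word`.

    Positions are 1-based. `max_skip` is a hypothetical bound on |d|
    chosen only to keep the search finite.
    """
    letters = strip_spaces(text)
    k = len(word)
    total = len(letters)
    hits = []
    for d in range(-max_skip, max_skip + 1):
        if d == 0:
            continue
        for n in range(1, total + 1):
            last = n + (k - 1) * d
            if last < 1 or last > total:
                continue  # the progression would leave the text
            if all(letters[n - 1 + i * d] == word[i] for i in range(k)):
                hits.append((n, d, k))
    return hits
```

For example, `find_els("a b c d e f g", "ace", 3)` reports the single ELS with start 1, skip 2 and length 3, while searching for `"eca"` finds the same letters read with skip -2.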

Let us write the text as a two-dimensional array--that is, on a single large page--with rows of equal length, except perhaps for the last row. Usually, then, an ELS appears as a set of points on a straight line. The exceptional cases are those where the ELS "crosses" one of the vertical edges of the array and reappears on the opposite edge. To include these cases in our framework, we may think of the two vertical edges of the array as pasted together, with the end of the first line pasted to the beginning of the second, the end of the second to the beginning of the third and so on. We thus get a cylinder on which the text spirals down in one long line.
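The geometry of the cylinder can be sketched in a few lines. The code below is only an illustration of the pasted-edge picture described above; it is not the distance used in the paper, which is a variant defined in Section A.1 of the Appendix. The function names and the `row_length` parameter are assumptions.

```python
import math


def to_row_col(position, row_length):
    """Map a 1-based text position to (row, column), both 0-based,
    in an array whose rows hold `row_length` letters."""
    return divmod(position - 1, row_length)


def cylinder_offset(pos_a, pos_b, row_length):
    """Shortest (rows, columns) displacement from pos_a to pos_b when the
    end of each row is pasted to the beginning of the next.

    A displacement of `delta` letters can be read as (r, c) or as
    (r + 1, c - row_length); both satisfy r * row_length + c == delta.
    """
    delta = pos_b - pos_a
    rows, cols = divmod(delta, row_length)  # 0 <= cols < row_length
    candidates = [(rows, cols), (rows + 1, cols - row_length)]
    return min(candidates, key=lambda rc: rc[0] ** 2 + rc[1] ** 2)


def cylinder_distance(pos_a, pos_b, row_length):
    """Euclidean length of the shortest displacement on the cylinder."""
    rows, cols = cylinder_offset(pos_a, pos_b, row_length)
    return math.hypot(rows, cols)
```

With rows of 10 letters, positions 10 and 11 sit at opposite edges of the flat array, yet `cylinder_distance(10, 11, 10)` is 1.0 because the edges are pasted together. An ELS with skip 11 steps by `cylinder_offset(1, 12, 10)`, that is (1, 1), so it appears as a diagonal line.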

Figures 2 and 3

It has been noted that when Genesis is written in this way, ELS's spelling
out words with related meanings often appear in close proximity. In Figure 1 we
see the example of 'patish-פטיש' (hammer) and 'sadan-סדן' (anvil); in Figure 2,
'Zidkiyahu-צדקיהו' (Zedekia) and 'Matanya-מתניה' (Matanya), which was the
original name of King Zedekia (Kings II, 24:17). In Figure 3 we see yet another
example, 'hachanuka-החנוכה' (the Chanuka) and 'chashmonaee-חשמונאי'
(Hasmonean), recalling that the Hasmoneans were the priestly family that led the
revolt against the Syrians whose successful conclusion the Chanuka feast
celebrates.

Indeed, ELS's for short words, like those for 'patish-פטיש' (hammer) and
'sadan-סדן' (anvil), may be expected on general probability grounds to appear
close to each other quite often, in any text. In Genesis, though, the phenomenon
persists when one confines attention to the more "noteworthy" ELS's, that is,
those in which the skip |*d*| is *minimal* over the whole text or over large
parts of it. Thus for 'patish-פטיש' (hammer), there is no ELS with a smaller
skip than that of Figure 1 in all of Genesis; for 'sadan-סדן' (anvil), there is
none in a section of text comprising 71% of *G*; the other four
words are minimal over the whole text of *G*. On the face of it, it is
not clear whether or not this can be attributed to chance. Here we develop a
method for testing the significance of the phenomenon according to accepted
statistical principles. After making certain choices of words to compare and
ways to measure proximity, we perform a randomization test and obtain a very
small *p*-value, that is, we find the results highly statistically
significant.

**2. OUTLINE OF THE PROCEDURE**

In this section we describe the test in outline. In the
Appendix, sufficient details are provided to enable the reader to repeat the
computations precisely, and so to verify their correctness. The authors will
provide, upon request, at cost, diskettes containing the program used and the
texts *G*, *I*, *R*, *T*, *U*, *V* and
*W* (see Section 3).

We test the significance of the phenomenon on samples of pairs of related words (such as hammer-anvil and Zedekia-Matanya). To do this we must do the following:

(i) define the notion of "distance" between any two words, so as to lend meaning to the idea of words in "close proximity";

(ii) define statistics that express how close, "on the whole," the words making up the sample pairs are to each other (some kind of average over the whole sample);

(iii) choose a sample of pairs of related words on which to run the test;

(iv) determine whether the statistics defined in (ii) are "unusually small" for the chosen sample.

Task (i) has several components. First, we must define the notion of
"distance" between two given ELS's in a given array; for this we use a
convenient variant of the ordinary Euclidean distance. Second, there are many
ways of writing a text as a two-dimensional array, depending on the row length;
we must select one or more of these arrays and somehow amalgamate the results
(of course, the selection and/or amalgamation must be carried out according to
clearly stated, systematic rules). Third, a given word may occur many times as
an ELS in a text; here again, a selection and amalgamation process is called
for. Fourth, we must correct for factors such as word length and composition.
All this is done in detail in Sections A.1 and A.2 of the Appendix.

We stress that our definition of distance is not unique. Although there are
certain general principles (like minimizing the skip *d*), some of the
details can be carried out in other ways. We feel that varying these details is
unlikely to affect the results substantially. Be that as it may, we chose one
particular definition and have used *only* it throughout; that is, the
function *c*(*w*,*w*') described in Section A.2 of the
Appendix had been defined before any sample was chosen, and it underwent no
changes. [Similar remarks apply to choices made in carrying out task (ii).]

Next, we have task (ii), measuring the overall proximity of pairs of words in
the sample as a whole. For this, we used two different statistics
*P*_{1} and *P*_{2}, which are defined
and motivated in the Appendix (Section A.5). Intuitively, each measures overall
proximity in a different way. In each case, a small value of
*P*_{*i*} indicates that the words in the sample pairs are, on the
whole, close to each other. No other statistics were

In task (iii), identifying an appropriate sample of word pairs, we strove for
uniformity and objectivity with regard to the choice of pairs and to the
relation between their elements. Accordingly, our sample was built from a list
of personalities (*p*) and the dates (Hebrew day and month) (*p*')
of their death or birth. The personalities were taken from the *Encyclopedia
of Great Men in Israel* [5].

At first, the criterion for inclusion of a personality in the sample was
simply that his entry contain at least three columns of text and that a date of
birth or death be specified. This yielded 34 personalities (the *first
list*--Table 1).
In order to avoid any conceivable appearance of having fitted the tests to the
data, it was later decided to use a fresh sample, without changing anything
else. This was done by considering all personalities whose entries contain
between 1.5 and 3 columns of text in the *Encyclopedia*; it yielded 32
personalities (the *second list*--Table 2). The
significance test was carried out on the second sample only.

Note that personality-date pairs (*p*,*p*') are not word pairs.
The personalities each have several appellations, there are variations in
spelling and there are different ways of designating dates. Thus each
personality-date pair (*p*,*p*') corresponds to several word pairs
(*w*,*w*'). The precise method used to generate a sample of word
pairs from a list of personalities is explained in the Appendix (Section A.3).

The measures of proximity of word pairs (*w*,*w*') result in
statistics *P*_{1} and *P*_{2}. As
explained in the Appendix (Section A.5), we also used a variant of this method,
which generates a smaller sample of word pairs from the same list of
personalities. We denote the statistics *P*_{1} and
*P*_{2}, when applied to this smaller sample, by
*P*_{3} and *P*_{4}.

Finally, we come to task (iv), the significance test itself. It is so simple
and straightforward that we describe it in full immediately.

The second list consists of 32 personalities. For each of the 32!
permutations π of these personalities, we define the
statistic *P*_{1}^{π}
obtained by permuting the personalities in accordance with π, so that Personality *i* is matched with the dates
of Personality π(*i*). The 32! numbers
*P*_{1}^{π} are ordered, with
possible ties, according to the usual order of the real numbers. If the
phenomenon under study were due to chance, it would be just as likely that
*P*_{1} occupies any one of the 32! places in this order as any
other. Similarly for *P*_{2}, *P*_{3} and
*P*_{4}. This is our null hypothesis.

To calculate significance levels, we chose 999,999 random permutations π of the 32 personalities; the precise way in which this was
done is explained in the Appendix (Section A.6). Each of
these permutations π determines a statistic
*P*_{1}^{π}; together with
*P*_{1}, we thus have 1,000,000 numbers. Define the *rank
order* of *P*_{1} among these 1,000,000 numbers as the number
of *P*_{1}^{π} not exceeding
*P*_{1}; if *P*_{1} is tied with other
*P*_{1}^{π}, half of these
others are considered to "exceed" *P*_{1}. Let ρ_{1} be the rank order of *P*_{1},
divided by 1,000,000; under the null hypothesis, ρ_{1} is the probability that *P*_{1}
would rank as low as it does. Define ρ_{2},
ρ_{3} and ρ_{4} similarly (using the same 999,999 permutations
in each case).
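The rank-order rule, including its treatment of ties, can be sketched as follows. This is a hedged illustration rather than the authors' program: the proximity statistics themselves are defined in Section A.5 and are not reproduced here, so `statistic` is a placeholder supplied by the caller. Counting the observed value itself as one place in the ordering is an assumption made here so that the rank is at least 1.

```python
import random


def rank_order(observed, permuted_values):
    """Rank of `observed` among itself and the permuted values.

    Permuted values strictly below `observed` count in full; of those
    tied with it, half are treated as exceeding it, so ties count half.
    The leading 1 counts the observed value itself (an assumption).
    """
    below = sum(1 for v in permuted_values if v < observed)
    tied = sum(1 for v in permuted_values if v == observed)
    return 1 + below + tied / 2


def permutation_rho(statistic, names, dates, n_perm, rng):
    """Estimate rho: the rank order divided by the number of values.

    `statistic(names, dates)` is a caller-supplied placeholder for a
    proximity statistic; smaller values mean closer pairs.
    """
    observed = statistic(names, dates)
    permuted = []
    for _ in range(n_perm):
        shuffled = list(dates)
        rng.shuffle(shuffled)  # person i gets the dates of person pi(i)
        permuted.append(statistic(names, shuffled))
    return rank_order(observed, permuted) / (n_perm + 1)
```

With `n_perm = 999_999`, the divisor is the 1,000,000 used in the text. Python's `random.Random` stands in for the generator described in Section A.6, so the sketch does not reproduce the paper's permutations.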

After calculating the probabilities ρ_{1}
through ρ_{4}, we must make an overall decision
to accept or reject the research hypothesis. In doing this, we should avoid
selecting favorable evidence only. For example, suppose that ρ_{3} = 0.01, the other ρ_{*i*} being higher. There is then the
temptation to consider ρ

More generally, for any given δ, the probability
that at least one of the four numbers ρ_{*i*} is less than or equal to δ is at most 4δ. This is known as
the Bonferroni inequality. Thus the overall significance level (or
**3. RESULTS AND CONCLUSIONS**

In Table 3, we list the rank order of each of the four
*P*_{*i*} among the 1,000,000 corresponding

We conclude that the proximity of ELS's with related meanings in the Book of Genesis is not due to chance.


**APPENDIX: DETAILS OF THE PROCEDURE**

In this Appendix we describe
the procedure in sufficient detail to enable the reader to repeat the
computations precisely. Some motivation for the various definitions is also
provided.

In Section A.1, a "raw" measure of distance between words is defined. Section
A.2 explains how we normalize this raw measure to correct for factors like the
length of a word and its composition (the relative frequency of the letters
occurring in it). Section A.3 provides the list of personalities *p* with
their dates *p*' and explains how the sample of word pairs (*w*,
*w*') is constructed from this list. Section A.4 identifies the precise
text of Genesis that we used. In Section A.5, we define and motivate the four
summary statistics *P*_{1}, *P*_{2},
*P*_{3} and *P*_{4}. Finally, Section A.6
provides the details of the randomization.

Sections A.1 and A.3 are relatively technical; to gain an understanding of the process, it is perhaps best to read the other parts first.

**A.3 The Sample of Word Pairs**

The reader is referred to Section 2, task (iii), for a
general description of the two samples. As mentioned there, the significance
test was carried out only for the second list, set forth in Table 2. Note that
each personality may have several appellations (names), and there are
different ways of designating dates. The sample of word pairs (*w*,
*w*') was constructed by taking each name of each personality and pairing
it with each designation of that personality's date. Thus when the dates are
permuted, the total number of word pairs in the sample may (and usually will)
vary.
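The pairing rule, and the reason the sample size changes under permutation, can be sketched as below. The data layout (one list of names and one list of date designations per personality) and the function name are assumptions made for illustration; the real appellations and dates come from Table 2.

```python
from itertools import product


def word_pairs(names_by_person, dates_by_person, permutation=None):
    """Pair every name of personality i with every date designation of
    personality permutation[i] (the identity if no permutation is given).
    """
    count = len(names_by_person)
    if permutation is None:
        permutation = list(range(count))
    pairs = []
    for i in range(count):
        pairs.extend(product(names_by_person[i],
                             dates_by_person[permutation[i]]))
    return pairs


# Hypothetical toy data: two personalities with 2 and 1 names,
# and with 1 and 3 date designations respectively.
toy_names = [["name_a1", "name_a2"], ["name_b1"]]
toy_dates = [["date_a1"], ["date_b1", "date_b2", "date_b3"]]
```

With the toy data, the unpermuted sample has 2·1 + 1·3 = 5 word pairs, while swapping the two date lists gives 2·3 + 1·1 = 7, which is why the total varies from one permutation to the next.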

We have used the following rules with regard to Hebrew spelling:

1. For words in Hebrew, we always chose what is called the grammatical orthography--"ktiv dikduki." See the entry "ktiv" in Even-Shoshan's dictionary [1].

2. Names and designations taken from the Pentateuch are spelled as in the original.

3. Yiddish is written using Hebrew letters; thus, there was no need to transliterate Yiddish names.

4. In transliterating foreign names into Hebrew, the letter "alef-א" is often used as a *mater lectionis*; for example, "Luzzatto" may be written "לוצטו" or "לוצאטו." In such cases we used both forms.

In designating dates, we used three fixed variations of the format of the
Hebrew date. For example, for the 19th of Tishri, we used י'ט תשרי, בי'ט תשרי and י'ט בתשרי. The 15th and 16th of
any Hebrew month can be denoted as י'ה or ט'ו and י'ו or ט'ז, respectively. We used both
alternatives.

The list of appellations for each personality was provided by Professor S. Z.
Havlin, of the Department of Bibliography and Librarianship at Bar Ilan
University, on the basis of a computer search of the "Responsa" database at that
university.

Our method of rank ordering of ELS's based on (*x*, *y*,
*z*)-perturbations requires that words have at least five letters to
apply the perturbations. In addition, we found that for words with more than
eight letters, the number of (*x*, *y*, *z*)-perturbed
ELS's which actually exist for such words was too small to satisfy our criteria
for applying the corrected distance. Thus the words in our list are restricted
in length to the range 5-8. The resulting sample consists of 298 word pairs (see
Table 2).

**A.4 The Text**

We used the standard, generally accepted text of Genesis
known as the *Textus Receptus*. One widely available edition is that of
the Koren Publishing Company in Jerusalem. The Koren text is precisely the same
as that used by us.

**A.6 The Randomizations**

The 999,999 random permutations of the 32 personalities
were chosen in accordance with Algorithm *P* of Knuth [4], page 125. The
pseudorandom generator required as input to this algorithm was that provided by
Turbo Pascal 5.0 of Borland International, Inc. This, in turn, requires a seed consisting
of 32 binary bits; that is, an integer with 32 digits when written to the base
2. To generate this seed, each of three prominent scientists was asked to
provide such an integer, just before the calculation was carried out. The first
of the three tossed a coin 32 times; the other two used the parities of the
digits in widely separated blocks in the decimal expansion of π. The three resulting integers were added modulo
2^{32}. The resulting seed was 01001 10000 10011 11100 00101 00111 11.
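The two mechanical steps described above, combining the contributed integers modulo 2^{32} and shuffling with Knuth's Algorithm *P*, can be sketched as follows. The function names are assumptions, and Python's `random.Random` is used only as a stand-in for the Turbo Pascal generator, so this sketch will not reproduce the permutations used in the paper.

```python
import random


def combine_seeds(contributions, bits=32):
    """Add the contributed integers modulo 2**bits."""
    return sum(contributions) % (1 << bits)


def parse_bits(bit_string):
    """Read a binary string such as '01001 10000 ...' as an integer."""
    return int(bit_string.replace(" ", ""), 2)


def algorithm_p(items, uniform):
    """Shuffle in the manner of Knuth's Algorithm P.

    `uniform` is a zero-argument callable returning a float in [0, 1).
    For t items, exactly t - 1 uniform numbers are consumed.
    """
    x = list(items)
    for j in range(len(x), 1, -1):
        k = int(j * uniform()) + 1          # 1 <= k <= j
        x[k - 1], x[j - 1] = x[j - 1], x[k - 1]
    return x


# The seed reported in the text, read as a 32-bit integer.
reported_seed = parse_bits("01001 10000 10011 11100 00101 00111 11")
rng = random.Random(reported_seed)          # stand-in generator (assumption)
one_permutation = algorithm_p(range(32), rng.random)
```

Repeating the last line 999,999 times with the same generator yields the sequence of permutations needed for the test, under the stated assumption about the generator.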

The control text *R* was constructed by permuting the 78,064 letters
of *G* with a single random permutation, generated as in the previous
paragraph. In this case, the seed was picked arbitrarily to be the decimal
integer 10 (i.e., the binary integer 1010). The control text *W* was
constructed by permuting the words of *G* in exactly the same way and
with the same seed, while leaving the letters within each word unpermuted. The
control text *V* was constructed by permuting the verses of *G* in
the same way and with the same seed, while leaving the letters within each verse
unpermuted.

The control text *U* was constructed by permuting the words within
each verse of *G* in the same way and with the same seed, while leaving
unpermuted the letters within each word, as well as the verses. More precisely,
the Algorithm *P* of Knuth [4] that we used
requires *n* - 1 random numbers to produce a random permutation of
*n* items. The pseudorandom generator of Borland that we used produces,
for each seed, a long string of random numbers. Using the binary seed 1010, we
produced such a long string. The first six numbers in this string were used to
produce a random permutation of the seven words constituting the first verse of
Genesis. The *next* 13 numbers (i.e., the 7th through the 19th random
numbers in the string produced by Borland) were used to produce a random
permutation of the 14 words constituting the second verse of Genesis, and so on.
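The construction of *U* draws consecutive numbers from one random stream, verse after verse. The sketch below illustrates that bookkeeping with a counter on the stream. The names are assumptions, the verses are placeholder strings rather than the text of Genesis, and Python's generator replaces Borland's, so the resulting permutations differ from those of the paper.

```python
import random


def counting_stream(rng, counter):
    """Yield uniform numbers in [0, 1), recording in counter[0] how many
    have been drawn so far."""
    while True:
        counter[0] += 1
        yield rng.random()


def shuffle_with_stream(items, stream):
    """Algorithm P style shuffle taking its n - 1 numbers from `stream`."""
    x = list(items)
    for j in range(len(x), 1, -1):
        k = int(j * next(stream))            # 0 <= k <= j - 1
        x[k], x[j - 1] = x[j - 1], x[k]
    return x


def permute_words_within_verses(verses, stream):
    """Shuffle the words inside each verse, keeping the verse order and
    the letters within each word unchanged."""
    return [" ".join(shuffle_with_stream(verse.split(), stream))
            for verse in verses]


# Placeholder verses with 7 and 14 words, as in the first two verses.
verse_1 = " ".join(f"a{i}" for i in range(7))
verse_2 = " ".join(f"b{i}" for i in range(14))
draws = [0]
stream = counting_stream(random.Random(10), draws)
control = permute_words_within_verses([verse_1, verse_2], stream)
```

After the call, `draws[0]` is 19: six numbers for the seven-word verse and the next 13 for the fourteen-word verse, matching the count given above.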


**REFERENCES**

[1] EVEN-SHOSHAN, A. (1989). *A New
Dictionary of the Hebrew Language*. Kiriath Sefer, Jerusalem.

[2] FCAT (1986). The Book of Isaiah, file
ISAIAH.MT. Facility for Computer Analysis of Texts (FCAT) and Tools for
Septuagint Studies (CATSS), Univ. Pennsylvania, Philadelphia. (April 1986.)

[3] FELLER, W. (1966). *An Introduction
to Probability Theory and Its Applications* **2**. Wiley, New
York.

[4] KNUTH, D. E. (1969). *The Art of
Computer Programming* **2**. Addison-Wesley, Reading, MA.

[5] MARGALIOTH, M., ed. (1961).
*Encyclopedia of Great Men in Israel; a Bibliographical Dictionary of Jewish
Sages and Scholars from the 9th to the End of the 18th Century*
**1-4**. Joshua Chachik, Tel Aviv.

[6] TOLSTOY, L. N. (1953). *War and
Peace*. Hebrew translation by L. Goldberg, Sifriat Poalim, Merhavia.

[7] WEISSMANDEL, H. M. D. (1958).
*Torath Hemed*. Yeshivath Mt. Kisco, Mt. Kisco.