Zipf's Law


5 results

The Art of Computer Programming: Sorting and Searching
by Donald Ervin Knuth
Published 15 Jan 1998

A more typical sequence of probabilities, called "Zipf's law," has p_1 = c/1, p_2 = c/2, ..., p_N = c/N, where c = 1/H_N. (8) This distribution was popularized by G. K. Zipf, who observed that the nth most common word in natural language text seems to occur with a frequency approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.: Houghton Mifflin, 1935); Human Behavior and the Principle of Least Effort (Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf's law governs the frequency of the keys in a table, we have immediately C_N = N/H_N; (9) searching such a file is about ½ ln N times faster than searching the same file with randomly ordered records.
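
The speedup claimed in (9) is easy to check numerically. Below is a minimal sketch (not from the book; plain Python, standard library only) that computes the expected number of sequential-search comparisons when records are stored in decreasing Zipf probability and compares it with the (N + 1)/2 average for randomly ordered records; the ratio grows like ½ ln N.

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def zipf_search_cost(n):
    """Expected comparisons for sequential search when key i is requested
    with Zipf probability p_i = c/i, c = 1/H_n, and the records are stored
    in decreasing-probability order.  The sum of i * p_i collapses to n/H_n."""
    c = 1.0 / harmonic(n)
    return sum(i * (c / i) for i in range(1, n + 1))

for n in (10, 100, 1000, 10_000):
    zipf_cost = zipf_search_cost(n)
    random_cost = (n + 1) / 2            # average for randomly ordered records
    print(f"N={n:6d}  C_N={zipf_cost:8.2f}  N/H_N={n / harmonic(n):8.2f}  "
          f"speedup={random_cost / zipf_cost:5.2f}  (1/2) ln N={0.5 * math.log(n):5.2f}")
```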

Notice that this probability distribution is very similar to that of Zipf's law (8); as θ varies from 1 to 0, the probabilities vary from a uniform distribution to a Zipfian one. Applying (3) to (13) yields C_N = H_N^(−θ) / H_N^(1−θ) = θN/(1 + θ) + O(N^(1−θ)) ≈ 0.122N (14) as the mean number of comparisons for the 80-20 law (see exercise 8). A study of word frequencies carried out by E. S. Schwartz [see the interesting graph on page 422 of JACM 10 (1963)] suggests that distribution (13) with a slightly negative value of θ gives a better fit to the data than Zipf's law (8). In this case the mean value is substantially smaller than (9) as N → ∞.
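
The excerpt does not quote distribution (13) itself. Assuming the form the passage describes, p_i proportional to i^(θ−1) with θ = log 0.80 / log 0.20 ≈ 0.1386 (uniform at θ = 1, Zipfian as θ → 0), the 0.122N figure in (14) can be checked with a short sketch like the following (illustrative only, not from the book):

```python
import math

theta = math.log(0.80) / math.log(0.20)     # ≈ 0.1386 for the 80-20 rule

def mean_comparisons(n, theta=theta):
    """Mean sequential-search comparisons with records in decreasing-probability
    order, assuming p_i proportional to i**(theta - 1) -- an assumed form that is
    uniform at theta = 1 and Zipfian as theta -> 0."""
    weights = [i ** (theta - 1.0) for i in range(1, n + 1)]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights, start=1)) / total

for n in (1000, 100_000):
    c_n = mean_comparisons(n)
    print(f"N={n:7d}  C_N={c_n:10.1f}  C_N/N={c_n / n:.4f}  "
          f"theta/(1+theta)={theta / (1 + theta):.4f}")
```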

In fact, C̄_N is always less than π/2 times the optimal value C_N [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 (1988), 148–157]; this ratio is the best possible constant in general, since it is approached when p_j is proportional to 1/j². Let us see how well the self-organizing procedure works when the key probabilities obey Zipf's law (8). We have

    C̄_N = 1/2 + Σ p_i p_j / (p_i + p_j)        (sum over 1 ≤ i, j ≤ N)
        = 1/2 + c Σ 1/(i + j)
        = 1/2 + c ((2N + 1) H_{2N} − 2N − 2(N + 1) H_N + 2N)
        = 1/2 + c (N ln 4 − ln N + O(1)) ≈ 2N / lg N,               (18)

by Eqs. 1.2.7-(8) and 1.2.7-(3). This is substantially better than ½N when N is reasonably large, and it is only about ln 4 ≈ 1.386 times as many comparisons as would be obtained in the optimum arrangement; see (9).
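
A short simulation sketch (not from the book) of the move-to-front self-organizing strategy under Zipf-distributed requests shows the empirical average agreeing with the closed form in (18) and sitting roughly ln 4 times above the optimal N/H_N:

```python
import itertools
import math
import random

def zipf_probs(n):
    """Zipf's law (8): p_i = c/i with c = 1/H_n."""
    h_n = sum(1.0 / k for k in range(1, n + 1))
    return [1.0 / (i * h_n) for i in range(1, n + 1)]

def simulate_move_to_front(n, requests=200_000, seed=1):
    """Average comparisons for sequential search with the move-to-front
    self-organizing heuristic, keys requested with Zipf probabilities."""
    rng = random.Random(seed)
    cum = list(itertools.accumulate(zipf_probs(n)))
    keys = list(range(n))                        # arbitrary initial order
    total = 0
    for _ in range(requests):
        key = rng.choices(range(n), cum_weights=cum)[0]
        pos = keys.index(key)                    # pos + 1 comparisons
        total += pos + 1
        keys.insert(0, keys.pop(pos))            # move the found key to the front
    return total / requests

n = 200
h_n = sum(1.0 / k for k in range(1, n + 1))
c = 1.0 / h_n
exact = 0.5 + c * sum(1.0 / (i + j) for i in range(1, n + 1) for j in range(1, n + 1))
print("simulated       :", round(simulate_move_to_front(n), 2))
print("closed form (18):", round(exact, 2))
print("2N / lg N       :", round(2 * n / math.log2(n), 2))
print("optimal N/H_N   :", round(n / h_n, 2))
```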

pages: 299 words: 92,782

The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing
by Michael J. Mauboussin
Published 14 Jul 2012

More formally, a power law is expressed in the form p(x) = Cx^−α, where C and α are constants. The exponent α is usually quoted as a positive number, although the exponent on x in the formula is negative. Since x is raised to a power, the distribution is called a power law. The value of the exponent is typically 2 < α < 3. See M. E. J. Newman, “Power Laws, Pareto Distributions, and Zipf's Law,” Contemporary Physics 46, no. 5 (September–October 2005): 323–351; and Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman, “Power-law Distributions in Empirical Data,” SIAM Review 51, no. 4 (2009): 661–703. 11. Matthew 13:12 from the King James version, www.kingjamesbibleonline.org/matthew-13-12. 12.
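
The Clauset–Shalizi–Newman paper cited here derives a maximum-likelihood estimator for the exponent of a continuous power law, α̂ = 1 + n / Σ ln(x_i / x_min). A minimal sketch (illustrative only, assuming a continuous power law with known x_min) that draws samples by inverse-transform sampling and recovers α:

```python
import math
import random

def sample_power_law(n, alpha, x_min=1.0, seed=7):
    """Inverse-transform samples from the continuous power law
    p(x) = ((alpha - 1) / x_min) * (x / x_min)**(-alpha), for x >= x_min."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(xs, x_min=1.0):
    """MLE for the continuous case from Clauset, Shalizi & Newman (2009):
    alpha_hat = 1 + n / sum(ln(x_i / x_min))."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)

data = sample_power_law(100_000, alpha=2.5)
print("true alpha = 2.5, estimated alpha =", round(mle_alpha(data), 3))
```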

Muller, Thor, and Lane Becker. Get Lucky: How to Put Planned Serendipity to Work for You and Your Business. San Francisco, CA: Jossey-Bass, 2012. Myers, David G. Intuition: Its Powers and Perils. New Haven, CT: Yale University Press, 2002. Newman, M. E. J. “Power Laws, Pareto Distributions, and Zipf's Law.” Contemporary Physics 46, no. 5 (September–October 2005): 323–351. Nisbett, Richard, and Lee Ross. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall, 1980. Odean, Terrance. “Are Investors Reluctant to Realize Their Losses?” Journal of Finance 53, no. 5 (October 1998): 1775–1798.

pages: 339 words: 112,979

Unweaving the Rainbow
by Richard Dawkins
Published 7 Aug 2011

In Barlow's terms derived from the theory of codes, we could say that the nervous system uses short, economical words for messages that occur frequently and are expected; long, expensive words for messages that occur rarely and are not expected. It is a bit like language, in which (the generalization is called Zipf's Law) the shortest words in the dictionary are the ones most often used in speech. To push the idea to an extreme, most of the time the brain does not need to be told anything because what is going on is the norm. The message would be redundant. The brain is protected from redundancy by a hierarchy of filters, each filter tuned to remove expected features of a certain kind.
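
Barlow's "economical words for frequent messages" is the core idea of variable-length source coding. A toy Huffman-coding sketch (hypothetical word frequencies, not from the book) makes the point concrete: the most frequent symbols end up with the shortest codewords.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code: frequent symbols get short codewords,
    rare symbols get long ones (the 'economical words' idea above)."""
    # Heap of (weight, tie-breaker, {symbol: codeword-so-far}) entries.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

# Hypothetical toy frequencies, roughly Zipf-shaped.
freqs = {"the": 700, "of": 350, "and": 230, "which": 40, "sesquipedalian": 2}
for word, code in sorted(huffman_code(freqs).items(), key=lambda kv: len(kv[1])):
    print(f"{word:15s} freq={freqs[word]:4d}  codeword={code}")
```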

pages: 349 words: 114,038

Culture & Empire: Digital Revolution
by Pieter Hintjens
Published 11 Mar 2013

The biggest cost is probably the paper form one has to fill in, and the front office that types it in, and takes a copy of your ID "for security purposes." Now let's look at competitors. The largest competitor to Western Union is MoneyGram International, one tenth the size. There is a mathematical "power law" called Zipf's Law that models the distribution of sizes in natural systems such as free markets, earthquakes, cities in a country, and words in a language. Yes, all these follow the same rules of distribution. Under Zipf's law, you'd expect the largest firm to be twice the size of its next competitor, three times the size of the one after, and so on.
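
The "twice the size of its next competitor, three times the size of the one after" pattern is the rank-size form of Zipf's law: size(rank) proportional to 1/rank. A small sketch (illustrative only) of the idealized shares shows how far a 10-to-1 leader-to-runner-up gap departs from a Zipfian market:

```python
def zipf_rank_sizes(n, total=100.0):
    """Idealized Zipf rank-size shares: the size at rank r is proportional to 1/r,
    scaled so the shares sum to `total` (e.g. percent of the market)."""
    weights = [1.0 / r for r in range(1, n + 1)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

shares = zipf_rank_sizes(5)
for rank, share in enumerate(shares, start=1):
    print(f"rank {rank}: {share:5.1f}%  (leader/this = {shares[0] / share:.1f}x)")
# Under Zipf's law the leader is only ~2x its nearest rival;
# a 10x gap like Western Union vs. MoneyGram is far steeper than Zipfian.
```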

pages: 651 words: 180,162

Antifragile: Things That Gain From Disorder
by Nassim Nicholas Taleb
Published 27 Nov 2012

French, Roger, 2003, Medicine Before Science: The Rational and Learned Doctor from the Middle Ages to the Enlightenment. Cambridge: Cambridge University Press. Froot, K. A., 2001, “The Market for Catastrophe Risk: A Clinical Examination,” Journal of Financial Economics 60(2–3): 529–571. Fujiwara, Y., 2004, “Zipf Law in Firms Bankruptcy.” Physica A: Statistical and Theoretical Physics 337: 219–30. Fukumoto, S., and T. J. Martin, 2009, “Bone as an Endocrine Organ.” Trends in Endocrinology and Metabolism 20: 230–236. Fuller, Steve, 2005, The Intellectual. Icon Books. García-Ballester, Luis, 1995, “Health and Medical Care in Medieval Galenism.”