File:Zipf-heot-0 Hebrew - Books of the Torah.svg

This is a file from the Wikimedia Commons
From Wikipedia, the free encyclopedia

Original file (SVG file, nominally 512 × 504 pixels, file size: 3.14 MB)

Summary

Description
English: Zipf law plot (frequency as a function of frequency rank) for the first five books (the Torah, or Pentateuch) of the Hebrew Bible. The original text is the Hebrew-language version (the Masoretic text), with vowel points but without cantillation marks. That text is a 10th-century compilation of texts probably written around 500 BCE, based on even earlier texts. The file was obtained from the Sacred Texts site, maintained by John B. Hare, and was converted to an ad-hoc single-byte encoding designed to look vaguely phonetic under an ISO-Latin-1 font.

The books and the respective word frequency files are:

  • Book 1, Bereis (Genesis). Sample: b¤°rë¡s¹ïy± b¤ârâ¡ ¡°êlöhïym ¡ë± häs¤¹âmäyïm w°¡ë± hâ¡ârêþ w°hâ¡ârêþ [...] wäy¤äçän°tw¤ ¡ö±wö wäy¤ïys²êm b¤â¡ârwön b¤°mïþ°râyïm. File hebr/tav/gen.1/gud.wfr (17211 words, N = 7212 distinct).
  • Book 2, Shmot (Exodus). Sample: w°¡ël¤êh s¹°mwö± b¤°nëy yïs²°râ¡ël häb¤â¡ïym mïþ°rây°mâh ¡ë± yä¿°äqöb [...] b¤wöl°¿ëynëy kâlb¤ëy±yïs²°râ¡ël b¤°kâlmäs°¿ëyhêm. File hebr/tav/exo.1/gud.wfr (13870 words, N = 5711 distinct).
  • Book 3, Vaykra (Leviticus). Sample: wäy¤ïq°râ ¡êlmös¹êh wäy°däb¤ër y°hwâh ¡ëlâyw më¡öhêl mwö¿ëd lë¡mör [...] ¡ê±mös¹êh¡êlb¤°nëy yïs²°râ¡ël b¤°här sïynây. File hebr/tav/lev.1/gud.wfr (9650 words, N = 3860 distinct).
  • Book 4, Bamidbar (Numbers). Sample: wäy°däb¤ër y°hwâh ¡êlmös¹êh b¤°mïd°b¤är sïynäy b¤°¡öhêl mwö¿ëd b¤°¡êçâd [...] b¤°¿är°bö± mwö¡âb ¿äl yär°d¤ën y°rëçwö. File hebr/tav/num.1/gud.wfr (13573 words, N = 5306 distinct).
  • Book 5, Devarim (Deuteronomy). Sample: ¡ël¤êh häd¤°bârïym ¡°äs¹êr d¤ïb¤êr mös¹êh ¡êlk¤âlyïs²°râ¡ël b¤°¿ëbêr [...] häm¤wörâ¡ häg¤âdwöl ¡°äs¹êr ¿âs²âh mös¹êh l°¿ëynëy k¤âlyïs²°râ¡ël. File hebr/tav/deu.1/gud.wfr (12007 words, N = 5455 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts are in the companion files */*/org/main.src. The extracted texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
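As a rough illustration of how the rank/frequency data behind such a plot can be derived, the following minimal sketch starts from a one-word-per-line file like the gud.tlw files described above. The layout of the gud.wfr files is not described here, so the sketch does not try to parse them. The function names `read_words` and `rank_frequency` are hypothetical, and reading the file as ISO-Latin-1 is an assumption based on the single-byte encoding mentioned in the description; this is not necessarily how the author produced the plot.

```python
from collections import Counter


def read_words(path, encoding="latin-1"):
    """Read a one-word-per-line file, skipping blank lines.

    The encoding is an assumption: any single-byte file decodes
    without error as latin-1, which keeps each byte as one character.
    """
    with open(path, encoding=encoding) as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, frequency, word) tuples, most frequent word first."""
    counts = Counter(words)
    # Sort by descending frequency; ties are broken alphabetically
    # so that the output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, freq, word)
            for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Tiny made-up sample; a real run would use read_words("<some .tlw file>").
    sample = ["w", "et", "w", "el", "w", "et"]
    for rank, freq, word in rank_frequency(sample):
        print(rank, freq, word)
```

For a full text, the number of tuples returned corresponds to the number of distinct words (the N given for each book), and plotting frequency against rank on logarithmic axes gives a Zipf plot of the kind shown in the file.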
Date
Source Own work
Author Jorge Stolfi

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

Zipf plot for the five books of the Hebrew Torah

Items portrayed in this file

depicts

15 May 2023

image/svg+xml

File history


Date/Time | Dimensions | User | Comment
current: 00:58, 16 May 2023 | 512 × 504 (3.14 MB) | Jorge Stolfi | Uploaded own work with UploadWizard
