The Words That Don't Move


Count to ten in Bengali: এক দুই তিন চার পাঁচ ছয় সাত আট নয় দশ (ek dui tin char panch chhoy shat at noy dosh).

Now count in Sanskrit: eka, dvi, tri, catur, pañca, ṣaṭ, sapta, aṣṭa, nava, daśa.

The family resemblance is obvious. Bengali numbers are Indo-Aryan, descended cleanly from Sanskrit through the Prākrits. Nobody disputes this.

Now count in Santali, the Austroasiatic language of the Santal people of Jharkhand, Odisha, and western Bengal: mit’, bar, pe, pon, mɔnɛ, turui, ɛae, irɛl, arɛ, gɛl.

A completely different system. No cognates with Bengali. No overlap. The counting was not shared.

But now ask: what is the Bengali word for unhusked rice? ধান (dhān). And in Sanskrit? Vrīhi. And in Santali? — close enough to notice.

The numbers did not travel across the language boundary. The word for rice may have. This is not a coincidence. It is a principle of how languages change under contact — a principle that, once you understand it, lets you read Bengali as a record of deep history.


A Language Is Not a Single Thing

When linguists talk about language change, they are really talking about several different phenomena happening at very different speeds in very different parts of a vocabulary.

The most useful tool for thinking about this is the Swadesh list, named for the American linguist Morris Swadesh (the surname is not Indian, despite its resemblance to the Bengali word swadesh, “one’s own country”; his parents were Jewish immigrants), who in the 1950s proposed a list of roughly 200 meanings that appear in every human language and are relatively resistant to borrowing. The list includes body parts (hand, eye, mouth, blood), basic kinship terms (mother, father, child), basic environmental terms (sun, moon, water, fire, stone), basic verbs (eat, drink, sleep, die, walk), and small numbers (one, two, three).

You do not need to know the term “Swadesh list” to use the concept. The idea is simple: some things get named very early in a language’s life, and the names are resistant to replacement because the concepts are so basic, so universal, so cognitively prior to everything else, that borrowing a new word feels almost impossible. Every human community has a word for “hand.” That word almost never gets replaced by a foreign borrowing, because the hand is so present, so constantly referenced, that a foreign word has little chance of displacing the existing one.

Bengali numbers are clean Indo-Aryan because counting is cognitively prior to language contact. You do not adopt another counting system unless you have been so completely absorbed into a new community that your old system is gone. The Munda-speaking people who were in Bengal before the Indo-Aryan arrival did not pass their counting system to the new language. But their words for what they were counting — the crops they grew, the pots they used, the ground they farmed — those traveled.


The Stable Layer: What Bengali Inherited Clean from Sanskrit

Bengali’s most stable vocabulary is its Sanskrit and Prākrit inheritance. These words arrived with the Indo-Aryan expansion, were reinforced by prestige and religion and text, and have remained essentially unchanged for two thousand years:

Numbers: এক, দুই, তিন (ek, dui, tin) — pure Indo-Aryan.

Basic body parts: হাত (hat, “hand”), চোখ (chokh, “eye”), মুখ (mukh, “mouth”), রক্ত (rakto, “blood”).

The sky, basic elements: আকাশ (akash, “sky”), জল (jol, “water”), আগুন (agun, “fire”), পাথর (pathor, “stone”).

Basic kinship: মা (ma, “mother”), বাবা (baba, “father”) — though baba is actually Persian, which tells you something about how completely Persian saturated the domestic vocabulary after the Sultanate.

These words are the Indo-Aryan skeleton of Bengali. They came with the language and have stayed with it. When you meet someone who speaks Bengali as a second language and wants to sound educated, these are the words they reach for first, because they are the words that feel “correct” in formal contexts — they are the prestige layer.

But the prestige layer is not where the story is.


The Semi-Stable Layer: Where the Substrate Survives

Now we reach the interesting territory. There is a set of Bengali words that are not Indo-Aryan, not Persian, not anything with a clean traceable etymology — and they cluster in very specific semantic domains.

[Figure: Where the immovable words come from: Munda language zones]

The first domain is agricultural subsistence and food production. This is where the words from the first post live: ঢেঁকি (dheki, “rice-husking lever”), হাঁড়ি (hari, “earthen pot”), ঝিঙে (jhinge, “ridge gourd”), ডাঙা (danga, “raised ground above flood level”). These words have probable Munda-language cognates and no convincing Sanskrit etymology. They survived because they were learned not from texts or teachers but from doing — from watching the ঢেঁকি work at dawn, from cooking in the হাঁড়ি every day, from knowing which ডাঙা to head for when the river came.

The Sanskrit word for the rice-husking apparatus is ulūkhala (for the mortar) and musala (for the pestle). These words survived in texts, in Sanskrit dictionaries, in Brahminical ritual contexts where rice is purified by Vedic procedure. But every woman who pounded rice every morning called the whole thing a ঢেঁকি (dheki) — and dheki is what her daughter learned, and her daughter’s daughter. The text word was replaced when people stopped reading texts in Sanskrit. The kitchen word was replaced by nothing, because no one replaced the kitchen.

This principle — word clusters resist change in proportion to how functionally embedded they are in daily life, not in proportion to prestige — is the key to reading Bengali’s layered vocabulary. Prestige vocabulary lives in texts. Texts are fragile. Kitchen vocabulary lives in bodies. Bodies are conservative.

The second domain is topography. The Rāṛh landscape — ডাঙা (danga, “highland”), ঝাড় (jhar, “forest, bush”), ধু (dhu, “flat expanse”), বিল (bil, “shallow seasonal lake”) — needed words when Austroasiatic-speaking people first inhabited it. Those words were learned by the landscape’s next inhabitants, and the next. The Sanskrit vocabulary for geographic features is generic and abstract; the deshi vocabulary of Bengal is specific to this particular alluvial-laterite-deltaic landscape in a way that Sanskrit was not, and never could be, because Sanskrit was not composed here.


The Onomatopoeia Zoo

There is a third domain of semi-stable vocabulary that deserves special attention: sound-mimicry, or onomatopoeia.

Bengali has an astonishing density of words that are pure acoustic representations of the world. Beyond the classic examples, ঝমঝম (jhomjhom, heavy rain on a tin roof) and টিপটিপ (tiptip, light drizzle), there is an entire vocabulary of texture and atmosphere:

খটখট (khotkhot): the sound of a wooden door being knocked on, or a cart on a paved road.

ঘরঘর (ghorghor): a rattling, grinding rumble; the sound of a bad engine, or of snoring.

কলকল (kolkol): the sound of moving water, a stream over stones.

ছলছল (chholchhol): glistening, shimmering; eyes full of tears about to spill.

ঝনঝন (jhonjhon): the sound of metal on metal, coins in a jar, bangles on a wrist.

These words are not borrowed from Sanskrit. They are not borrowed from Persian or English. They are made — constructed from sound to represent sound — and they are among the most stable words in the language precisely because they bypass abstract meaning. A word like dharma can be replaced by another word for moral duty. The sound of rain on a tin roof cannot be replaced: ঝমঝম (jhomjhom) is that sound, and any replacement would just be a worse version of the same word.
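One thing the sound-words quoted in this section have in common is their shape: each is written as a short unit said twice. That is an observation about these particular examples, not a claim about all Bengali sound-words. A minimal Python sketch of a check for that shape, where the function name is hypothetical and the test compares Unicode code points rather than syllables:

```python
import unicodedata


def is_full_reduplication(word: str) -> bool:
    """Return True if the word is one unit written twice, e.g. ঝম + ঝম."""
    # Normalise first so visually identical strings compare equal.
    w = unicodedata.normalize("NFC", word)
    half, remainder = divmod(len(w), 2)
    return remainder == 0 and half > 0 and w[:half] == w[half:]


# Sound-words quoted in this section, plus one ordinary noun as a control.
examples = ["ঝমঝম", "টিপটিপ", "খটখট", "ঘরঘর", "কলকল", "ছলছল", "ঝনঝন", "ঢেঁকি"]
for w in examples:
    print(w, is_full_reduplication(w))
```

The control word ঢেঁকি (dheki) fails the check, as it should. A check like this says nothing about meaning; it only flags the doubled form.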

One hypothesis worth holding (not asserting) is that Bengali’s unusual density of onomatopoeic vocabulary reflects its position at the intersection of three substrate traditions — Austroasiatic (Munda), Tibeto-Burman (Bodo/Koch), and North Dravidian — each independently rich in oral and sonic culture. The people who lived in this landscape before Bengali arrived had developed elaborate acoustic vocabularies for their environments: the monsoon, the river, the forest. Bengali absorbed these not as conscious loans but as the natural vocabulary of a place, transmitted along with the knowledge of how to live there.


Echo-Words: Morphology as Fossil

There is a grammatical feature of Bengali that is, in linguistic terms, a fossil — a structural remnant of an earlier language preserved inside the later one.

Bengali has a productive echo-word formation. Any noun can be roughly doubled with a phonologically altered second copy to produce a meaning of “X and things of that type”:

ঘোড়াটোড়া (ghora-tora), from ঘোড়া (ghora, “horse”): “horses and such things, animals of that kind”

কাপড়চোপড় (kapor-chopor), from কাপড় (kapor, “cloth”): “clothes and such things”

মাছটাছ (mach-tach), from মাছ (mach, “fish”): “fish and similar things”

ওষুধটোষুধ (oshudh-toshudh), from ওষুধ (oshudh, “medicine”): “medicines and such things”

This is not a Sanskrit feature. Hindi has it weakly; Bengali has it with remarkable productivity: speakers generate new echo-pairs spontaneously and immediately. Linguists identify echo compounding as a characteristic feature of Austroasiatic languages, found across the Munda and Mon-Khmer branches of the family from eastern India through Southeast Asia. It entered Bengali as a grammatical pattern, a structural loan rather than a loanword, and was absorbed so completely that it now feels natively Bengali.
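The regular pattern in most of the examples above can be sketched as a tiny rule: copy the word and swap whatever precedes its first vowel for a fixed consonant. The Python below is a rough illustration working on the romanized forms used in this post, not a description of Bengali phonology. The function name, the `onset` parameter, and the simple five-vowel test are all assumptions made for the sketch.

```python
VOWELS = set("aeiou")


def echo_word(word: str, onset: str = "t") -> str:
    """Build 'X-and-such' by copying the word with a replaced initial onset."""
    # Everything before the first vowel counts as the onset to be replaced.
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return f"{word}-{onset}{word[i:]}"


for noun in ["ghora", "mach", "oshudh"]:
    print(echo_word(noun))
# ghora-tora
# mach-tach
# oshudh-toshudh
```

Lexicalized pairs that use a different echo consonant, like the “cloth” example above, would need a lookup table rather than this rule, and the sketch does not handle nouns that already begin with t.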

The Munda speakers did not just leave words in Bengali. They left grammar.


Numeral Classifiers: A Grammar Fingerprint from the Northeast

Bengali has a feature that Hindi does not: numeral classifiers.

In English, you say “three fish.” In Bengali, you must say তিনটে মাছ (tinte mach) — literally “three [classifier] fish.” The particle টা / টি (ta/ti) is obligatory when counting most objects. There are several different classifiers, each for a different semantic category:

| Classifier | Phonetic | Used for |
| --- | --- | --- |
| টা / টি | ṭā / ṭi | general objects (informal/formal) |
| খানা | khānā | flat objects: paper, cloth, tiles |
| জন | jon | people |
| গাছা | gāchā | long/rope-like objects |
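The classifier choice listed above amounts to a lookup keyed on the kind of thing being counted. Here is a minimal Python sketch of that idea, using simplified romanizations without diacritics. The category labels, the function name, and the hyphen marking the boundary between numeral and classifier are conventions invented for the illustration.

```python
# Simplified romanizations of the four classifiers listed above.
CLASSIFIERS = {
    "general": "ta",
    "flat": "khana",
    "person": "jon",
    "long": "gacha",
}


def counted_phrase(numeral: str, noun: str, category: str = "general") -> str:
    """Join numeral + classifier, then the noun: 'three [classifier] fish'."""
    try:
        classifier = CLASSIFIERS[category]
    except KeyError:
        raise ValueError(f"unknown category: {category!r}") from None
    return f"{numeral}-{classifier} {noun}"


print(counted_phrase("tin", "mach"))  # tin-ta mach
```

The sketch only captures the obligatory slot between numeral and noun. It does not model the colloquial surface forms, such as tinte for tin plus the general classifier.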

This system is not Sanskrit. Classical Sanskrit has no classifier system. Modern Hindi has none. The classifier system is characteristic of Tibeto-Burman languages — languages like Bodo, Garo, and Koch — and also appears in Austroasiatic (Mon-Khmer) languages of Southeast Asia.

Bengali’s classifier system is evidence of prolonged contact with Tibeto-Burman speakers along the northern and northeastern frontier of the Bengal delta — the Bodo, Garo, Koch, and related communities who inhabited what is now North Bengal, Assam, and Meghalaya. Their grammatical influence was absorbed into Bengali so early and so completely that it is now invisible as a borrowing: every Bengali speaker uses classifiers automatically, without being aware that Sanskrit speakers never did.

The same classifier logic appears in Odia, though with somewhat different particles — a sign that the contact with Tibeto-Burman speakers was not narrowly Bengali but extended across the eastern Indo-Aryan zone. Maithili, by contrast, preserves more archaic features of the Māgadhī Prākrit ancestor that all these languages share: fuller case endings, a retained -a terminal vowel that Bengali long ago dropped. Hearing Maithili is a little like hearing what Bengali’s great-grandmother might have sounded like.


The Permeable Layer: What Gets Replaced

To complete the picture: the vocabulary domains that do get replaced under language contact are the prestige domains — the ones where a new elite wants to signal its authority.

Religious and ritual vocabulary was the first major replacement. Sanskrit flooded this domain when the Brahmanical order expanded eastward: dharma, karma, puja, mantra, yoga, brahmin, temple terminology. Earlier religious vocabulary — whatever the Munda and Dravidian and Tibeto-Burman speakers used for their spiritual practices — was largely displaced. Some survived in the tribal religious traditions that remained outside the Brahmanical synthesis, but they did not pass into the shared Bengali vocabulary.

Administrative and legal vocabulary was replaced again after 1204 CE, when the Sultanate established Persian as the language of governance in Bengal. দফতর (daftar, “office/record,” from Persian), জমিদার (zamindar, “landowner,” from Persian), সিপাহি (sipahi, “soldier,” from Persian), খাজনা (khajna, “tax,” from Persian) — the whole apparatus of state moved into Persian-derived vocabulary. This layer is still visible in the formal registers of Bengali bureaucratic speech.

Modern material culture is being replaced by English in real time: মোবাইল (mobile), ইন্টারনেট (internet), কম্পিউটার (computer). The same process, running at its current speed.

The kitchen remains, as always, the most conservative domain.


Reading Bengali as Archaeology

The image I want to leave you with is this: Bengali is a geological section.

If you cut through a hillside, you see the strata — the layers of deposition, each one a different era, each one sitting on top of what came before. Some layers are thick, some thin. Some are clearly defined at their boundaries; others blur into the layer below. A geologist can read the history of a landscape from the section.

Bengali vocabulary is the same. The substrate layer — the deshi words, the onomatopoeia, the echo-words, the classifiers — is the oldest visible stratum, the layer that predates the Indo-Aryan arrival. Above it: the Indo-Aryan layer, the Prākrit transformation, the Sanskrit reinforcement. Above that: the Persian-Arabic layer of the Sultanate and Mughal periods. Above that: the Portuguese colonial layer (ফর্সা, forsa, “fair-skinned”; আলমারি, almari, “cupboard”; বালতি, balti, “bucket”). Above that: the English layer we are still accumulating.

Every word in Bengali lives at a particular depth in this section. When you know which depth you are reading, you know something about when that concept entered this landscape, and from whom.

The words that don’t move — ঢেঁকি (dheki), ডাঙা (danga), হাঁড়ি (hari), ঝমঝম (jhomjhom) — are the oldest stratum. They are the floor. They are what was here before anything we recognize as “Bengali” arrived.

They are the words that have been patiently waiting for someone to ask about them.

Vocabulary Stratigraphy: Six Layers of Bengali
| Layer | Origin | Examples | Stability |
| --- | --- | --- | --- |
| Substrate (Deshi) | Munda / Austroasiatic | ঢেঁকি, হাঁড়ি, ডাঙা, ঝিঙে | Highest: kitchen & body words |
| Indo-Aryan core | Sanskrit via Prakrit | মা, চোখ, জল, এক দুই তিন | Very high: basic vocabulary |
| Sanskrit prestige | Literary Sanskrit | ধর্ম, কর্ম, বৃক্ষ | High: textual tradition |
| Persian-Arabic | Sultanate / Mughal | দফতর, জমিদার, সিপাহি | Medium: administrative layer |
| Portuguese | Colonial trade | আলমারি, বালতি, ফর্সা | Low-medium: domestic objects |
| English | British rule / global | মোবাইল, ইন্টারনেট, কম্পিউটার | Lowest: still arriving |
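The six layers can also be held as an ordered list, so that asking which stratum a word sits in becomes a simple lookup. The Python below is a sketch under stated assumptions: the depth numbers (0 for the oldest layer) and the names `Layer`, `STRATA`, and `layer_of` are invented for illustration, and the only words it knows are the examples listed for each layer.

```python
import unicodedata
from typing import NamedTuple, Optional, Tuple


class Layer(NamedTuple):
    name: str
    origin: str
    examples: Tuple[str, ...]


# Ordered oldest (depth 0) to newest, following the six layers above.
STRATA = [
    Layer("Substrate (Deshi)", "Munda / Austroasiatic", ("ঢেঁকি", "হাঁড়ি", "ডাঙা", "ঝিঙে")),
    Layer("Indo-Aryan core", "Sanskrit via Prakrit", ("মা", "চোখ", "জল", "এক", "দুই", "তিন")),
    Layer("Sanskrit prestige", "Literary Sanskrit", ("ধর্ম", "কর্ম", "বৃক্ষ")),
    Layer("Persian-Arabic", "Sultanate / Mughal", ("দফতর", "জমিদার", "সিপাহি")),
    Layer("Portuguese", "Colonial trade", ("আলমারি", "বালতি", "ফর্সা")),
    Layer("English", "British rule / global", ("মোবাইল", "ইন্টারনেট", "কম্পিউটার")),
]


def _nfc(word: str) -> str:
    # Bengali letters can be encoded more than one way; normalise before comparing.
    return unicodedata.normalize("NFC", word)


# Map each example word to the depth of the layer that lists it.
_INDEX = {_nfc(w): depth for depth, layer in enumerate(STRATA) for w in layer.examples}


def layer_of(word: str) -> Optional[Tuple[int, Layer]]:
    """Return (depth, layer) for a listed word, or None if it is not listed."""
    depth = _INDEX.get(_nfc(word))
    return None if depth is None else (depth, STRATA[depth])


for w in ["ঢেঁকি", "জমিদার", "মোবাইল"]:
    depth, layer = layer_of(w)
    print(w, depth, layer.name)
```

A real etymological lookup would need a dictionary with sourced etymologies behind it; this only shows the shape of the data.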

Swadesh List Comparison: Numbers and Core Vocabulary

The numbers confirm the Indo-Aryan inheritance cleanly. But look down the water, fire, and mother columns — the eastern languages (highlighted) share forms that diverge sharply from both Sanskrit and the western Indo-Aryan branch.

Core Swadesh vocabulary across eastern Indian language families
| Language / ভাষা | Family | 1 | 2 | 3 | water | fire | mother | hand | eye |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bengali | Indo-Aryan | এক (ek) | দুই (dui) | তিন (tin) | জল (jol) | আগুন (agun) | মা (ma) | হাত (hat) | চোখ (chokh) / আঁখি (ankhi) |
| Sanskrit | Indo-Aryan | eka | dvi | tri | jala | agni | mātṛ | hasta | cakṣu / akṣi |
| Hindi | Indo-Aryan | ek | do | tīn | pānī | āg | | hāth | ānkh (akṣi) / nayan |
| Odia | Indo-Aryan | eka | dui | tini | jala | agni | | hāta | āṅkhi (akṣi) / cakṣu |
| Santali | Austroasiatic (Munda) | mit' | bar | pe | dak | sengel | ayo | bahu | mẽt |
| Ho | Austroasiatic (Munda) | mɪʔ | baria | pia | daa | siŋgel | ayo | ba | met |
| Oraon / Kurukh | North Dravidian | ondo | irind | muund | pani | ci | ayo | ba | ɳeʈ |
| Tibetan | Sino-Tibetan | gcig | gnyis | gsum | chu | me | a-ma | lag-pa | mig |
| Burmese | Sino-Tibetan | tiq | hniq | thounq | yei | mi | a-mé | leq | myeq |
| Vietnamese | Austroasiatic (Mon-Khmer) | một | hai | ba | nước | lửa | mẹ | tay | mắt |

Rows 5–7 (Santali, Ho, Oraon — highlighted) are Austroasiatic and Dravidian — the substrate languages that shaped Bengali before and during the Indo-Aryan contact period. Tibetan, Burmese, and Vietnamese are included for wider comparison: Tibeto-Burman languages contributed Bengali’s classifier system; Vietnamese shares Austroasiatic roots with Santali and Ho.
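The contrast in the numeral columns can be made roughly visible with a crude string comparison. The Python sketch below scores character overlap between the romanized numerals for one, two, and three, using the standard library's `difflib.SequenceMatcher`. This is only a proxy, and a loud assumption: surface similarity is not how linguists establish cognates, and the function names are invented for the illustration.

```python
from difflib import SequenceMatcher
from statistics import mean

# Romanized numerals 1-3, copied from the comparison table.
NUMERALS = {
    "Bengali": ["ek", "dui", "tin"],
    "Sanskrit": ["eka", "dvi", "tri"],
    "Hindi": ["ek", "do", "tīn"],
    "Odia": ["eka", "dui", "tini"],
    "Santali": ["mit'", "bar", "pe"],
    "Ho": ["mɪʔ", "baria", "pia"],
    "Tibetan": ["gcig", "gnyis", "gsum"],
}


def similarity(a: str, b: str) -> float:
    """Character-overlap ratio between two forms, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity(lang_a: str, lang_b: str) -> float:
    """Average the ratio over the numerals 1, 2, 3 for two languages."""
    pairs = zip(NUMERALS[lang_a], NUMERALS[lang_b])
    return mean(similarity(x, y) for x, y in pairs)


for other in ["Sanskrit", "Hindi", "Odia", "Santali", "Ho", "Tibetan"]:
    print(f"Bengali vs {other}: {mean_similarity('Bengali', other):.2f}")
```

On this data the Indo-Aryan rows score well above the Munda rows, and Bengali against Santali scores zero because the forms share no letters at all. Treat the numbers as a prompt for looking more closely, not as evidence in themselves.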

Sources

  1. Morris Swadesh. "Towards Greater Accuracy in Lexicostatistic Dating". International Journal of American Linguistics (1955). Vol. 21, No. 2, pp. 121–137. doi:10.1086/464321
  2. Suniti Kumar Chatterji. The Origin and Development of the Bengali Language (1926). Calcutta University Press. 2 vols.
  3. Colin Masica. The Indo-Aryan Languages (1991). Cambridge University Press. ISBN 978-0521234207
  4. Franklin Southworth. Linguistic Archaeology of South Asia (2005). RoutledgeCurzon, London. doi:10.4324/9780203412916
  5. George van Driem. Languages of the Himalayas (2001). Brill, Leiden. 2 vols. On Tibeto-Burman contact. doi:10.1163/9789004492530

Next in this series: Three Ways of Knowing the Same People