The Words That Don't Move
Count to ten in Bengali: এক দুই তিন চার পাঁচ ছয় সাত আট নয় দশ (ek dui tin char panch chhoy shat at noy dosh).
Now count in Sanskrit: eka, dvi, tri, catur, pañca, ṣaṭ, sapta, aṣṭa, nava, daśa.
The family resemblance is obvious. Bengali numbers are Indo-Aryan, descended cleanly from Sanskrit through the Prākrits. Nobody disputes this.
Now count in Santali, the Austroasiatic language of the Santal people of Jharkhand, Odisha, and western Bengal: mit’, bar, pe, pon, mɔnɛ, turui, ɛae, irɛl, arɛ, gɛl.
A completely different system. No cognates with Bengali. No overlap. The counting was not shared.
But now ask: what is the Bengali word for unhusked rice? ধান (dhān). And in Sanskrit? Vrīhi, although Sanskrit also has dhānya, “grain,” which most etymologists take as the source of ধান. And in Santali? Dā, close enough to notice.
The numbers did not travel across the language boundary. The word for rice may have. This is not a coincidence. It is a principle of how languages change under contact — a principle that, once you understand it, lets you read Bengali as a record of deep history.
A Language is Not a Single Thing
When linguists talk about language change, they are really talking about several different phenomena happening at very different speeds in very different parts of a vocabulary.
The most useful tool for thinking about this is the Swadesh list, named for the American linguist Morris Swadesh, who in the 1950s proposed a list of roughly 200 meanings (later trimmed to 100) that are found in every human language and are relatively resistant to borrowing. The list includes body parts (hand, eye, mouth, blood), basic kinship terms (mother, father, child), basic environmental terms (sun, moon, water, fire, stone), basic verbs (eat, drink, sleep, die, walk), and small numbers (one, two, three).
You do not need to know the term “Swadesh list” to use the concept. The idea is simple: some things get named very early in a language’s life, and those names resist replacement because the concepts are so basic, so universal, so cognitively prior to everything else, that borrowing a new word feels almost pointless. Every human community has a word for “hand.” That word is rarely replaced by a foreign borrowing, because the hand is referenced so constantly that the existing word always has the advantage over any newcomer.
Bengali’s numerals are cleanly Indo-Aryan because low numerals are among the most borrowing-resistant words in any vocabulary. Whole counting systems do get borrowed, but usually only under intense, prolonged contact, when one community is being absorbed into another. The Munda-speaking people who were in Bengal before the Indo-Aryan arrival did not pass their counting system to the new language. But their words for what they were counting (the crops they grew, the pots they used, the ground they farmed) traveled.
The Stable Layer: What Bengali Inherited Clean from Sanskrit
Bengali’s most stable vocabulary is its Sanskrit and Prākrit inheritance. These words arrived with the Indo-Aryan expansion, were reinforced by prestige, religion, and text, and have stayed in continuous use for some two thousand years, changing in sound (Sanskrit hasta became হাত, hat) but not in meaning:
Numbers: এক, দুই, তিন (ek, dui, tin) — pure Indo-Aryan.
Basic body parts: হাত (hat, “hand”), চোখ (chokh, “eye”), মুখ (mukh, “mouth”), রক্ত (rakto, “blood”).
The sky, basic elements: আকাশ (akash, “sky”), জল (jol, “water”), আগুন (agun, “fire”), পাথর (pathor, “stone”).
Basic kinship: মা (ma, “mother”), বাবা (baba, “father”) — though baba is actually Persian, which tells you something about how completely Persian saturated the domestic vocabulary after the Sultanate.
These words are the Indo-Aryan skeleton of Bengali. They came with the language and have stayed with it. Because Sanskrit stands behind them, they are also the words that feel “correct” in formal contexts, and the ones a second-language speaker who wants to sound educated reaches for first. They are the prestige layer.
But the prestige layer is not where the story is.
The Semi-Stable Layer: Where the Substrate Survives
Now we reach the interesting territory. There is a set of Bengali words that are not Indo-Aryan, not Persian, not anything with a clean traceable etymology — and they cluster in very specific semantic domains.
The first domain is agricultural subsistence and food production. This is where the words from the first post live: ঢেঁকি (dheki, “rice-husking lever”), হাঁড়ি (hari, “earthen pot”), ঝিঙে (jhinge, “ridge gourd”), ডাঙা (danga, “raised ground above flood level”). These words have probable Munda-language cognates and no convincing Sanskrit etymology. They survived because they were learned not from texts or teachers but from doing — from watching the ঢেঁকি work at dawn, from cooking in the হাঁড়ি every day, from knowing which ডাঙা to head for when the river came.
The Sanskrit words for the rice-husking apparatus are ulūkhala (the mortar) and musala (the pestle). These words survived in texts, in Sanskrit dictionaries, and in Brahminical ritual, where rice is purified by Vedic procedure. But every woman who pounded rice every morning called the whole thing a ঢেঁকি (dheki), and dheki is what her daughter learned, and her daughter’s daughter. The text words faded wherever Sanskrit stopped being read. The kitchen word was replaced by nothing, because no one replaced the kitchen.
This principle — word clusters resist change in proportion to how functionally embedded they are in daily life, not in proportion to prestige — is the key to reading Bengali’s layered vocabulary. Prestige vocabulary lives in texts. Texts are fragile. Kitchen vocabulary lives in bodies. Bodies are conservative.
The second domain is topography. The Rāṛh landscape — ডাঙা (danga, “highland”), ঝাড় (jhar, “forest, bush”), ধু (dhu, “flat expanse”), বিল (bil, “shallow seasonal lake”) — needed words when Austroasiatic-speaking people first inhabited it. Those words were learned by the landscape’s next inhabitants, and the next. The Sanskrit vocabulary for geographic features is generic and abstract; the deshi vocabulary of Bengal is specific to this particular alluvial-laterite-deltaic landscape in a way that Sanskrit was not, and never could be, because Sanskrit was not composed here.
The Onomatopoeia Zoo
There is a third domain of semi-stable vocabulary that deserves special attention: sound-mimicry, or onomatopoeia.
Bengali has an astonishing density of words that are pure acoustic representation of the world. Not just the classic examples — ঝমঝম (jhomjhom, heavy rain on a tin roof), টিপটিপ (tip tip, light drizzle) — but an entire vocabulary of texture and atmosphere:
খটখট (khatkhot) — the sound of a wooden door being knocked on, or a cart on a paved road.
ঘরঘর (ghorghor) — a rattling, grinding rumble; the sound of a bad engine, or of snoring.
কলকল (kolkol) — the sound of moving water, a stream over stones.
ছলছল (chhol-chhol) — glistening, shimmering; eyes full of tears about to spill.
ঝনঝন (jhon-jhon) — the sound of metal on metal, coins in a jar, bangles on a wrist.
These words are not borrowed from Sanskrit. They are not borrowed from Persian or English. They are made — constructed from sound to represent sound — and they are among the most stable words in the language precisely because they bypass abstract meaning. A word like dharma can be replaced by another word for moral duty. The sound of rain on a tin roof cannot be replaced: ঝমঝম (jhomjhom) is that sound, and any replacement would just be a worse version of the same word.
One hypothesis worth holding (not asserting) is that Bengali’s unusual density of onomatopoeic vocabulary reflects its position at the intersection of three substrate traditions — Austroasiatic (Munda), Tibeto-Burman (Bodo/Koch), and North Dravidian — each independently rich in oral and sonic culture. The people who lived in this landscape before Bengali arrived had developed elaborate acoustic vocabularies for their environments: the monsoon, the river, the forest. Bengali absorbed these not as conscious loans but as the natural vocabulary of a place, transmitted along with the knowledge of how to live there.
Echo-Words: Morphology as Fossil
There is a grammatical feature of Bengali that is, in linguistic terms, a fossil — a structural remnant of an earlier language preserved inside the later one.
Bengali has productive echo-word formation. Almost any noun can be doubled with a phonologically altered second copy, usually by replacing the first consonant with ট (ṭ), or adding ট before an initial vowel, to mean “X and things of that type”:
ঘোড়াটোড়া (ghora-tora): from ঘোড়া (ghora, “horse”) — “horses and such things, animals of that kind”
কাপড়চোপড় (kapor-chopor): from কাপড় (kapor, “cloth”), “clothes and such things”
মাছটাছ (mach-tach): from মাছ (mach, “fish”) — “fish and similar things”
ওষুধটোষুধ (oshudh-toshudh): from ওষুধ (oshudh, “medicine”) — “medicines and such things”
This is not a Sanskrit feature. Hindi has a version of it (chāy-vāy, “tea and the like”), as do the Dravidian languages, but Bengali uses it with remarkable productivity: speakers generate new echo-pairs spontaneously and immediately. Echo formation is best described as a South Asian areal feature whose ultimate origin is debated; many linguists connect its spread to the Austroasiatic languages, where similar patterns are common across the Munda and Mon-Khmer families from eastern India through Southeast Asia. On that account, it entered Bengali as a grammatical pattern, a structural loan rather than a loanword, and was absorbed so completely that it now feels natively Bengali.
The Munda speakers did not just leave words in Bengali. They left grammar.
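Because the echo rule is so regular, it can be sketched as a small string transformation. The Python below is a toy model under loud assumptions: it works on a simple romanization in which t stands for ট (ṭ), it treats everything before the first vowel as the consonant onset to be swapped, and it handles fixed pairs such as kapor-chopor (which uses ch- rather than ṭ-) through a hypothetical lookup table, `LEXICALIZED`, rather than a rule. The names `split_onset` and `echo_word` are illustrative only and do not come from any published grammar or library.

```python
# Toy model of Bengali echo-word formation on romanized input.
# Assumptions (illustrative, not from any standard grammar or library):
#   - 't' in the output stands for Bengali ট (ṭ)
#   - the echo copy swaps the word's initial consonant cluster for 't',
#     or prefixes 't' when the word starts with a vowel
#   - fixed, lexicalized pairs are looked up instead of generated

VOWELS = set("aeiou")

# Hypothetical override table for pairs that do not follow the t-rule.
LEXICALIZED = {"kapor": "chopor"}


def split_onset(word: str) -> tuple[str, str]:
    """Split a romanized word into (initial consonants, remainder from first vowel)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel at all: treat the whole word as onset


def echo_word(word: str) -> str:
    """Return 'word-echo', e.g. 'mach' -> 'mach-tach'."""
    word = word.lower()
    if word in LEXICALIZED:
        return f"{word}-{LEXICALIZED[word]}"
    _onset, rest = split_onset(word)
    # Vowel-initial words have an empty onset, so this simply prefixes 't'.
    return f"{word}-t{rest}"


if __name__ == "__main__":
    for noun in ["ghora", "mach", "oshudh", "kapor"]:
        print(echo_word(noun))
```

Run as a script, it prints ghora-tora, mach-tach, oshudh-toshudh, and kapor-chopor, one per line. The sketch deliberately ignores words that already begin with ṭ and all dialect variation; real speakers handle those cases in ways this model does not attempt to capture.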
Numeral Classifiers: A Grammar Fingerprint from the Northeast
Bengali has a feature that Hindi does not: numeral classifiers.
In English, you say “three fish.” In Bengali, you must say তিনটে মাছ (tinte mach), literally “three [classifier] fish.” The particle টা / টি (ta/ti), which fuses with the numeral (তিনটে is a colloquial form of তিনটা), is obligatory when counting most objects. There are several different classifiers, each for a different semantic category:
| Classifier | Phonetic | Used for |
|---|---|---|
| টা / টি | ṭā / ṭi | general objects (informal/formal) |
| খানা | khānā | flat objects — paper, cloth, tiles |
| জন | jon | people |
| গাছা | gāchā | long/rope-like objects |
This system is not Sanskrit. Classical Sanskrit has no classifier system. Modern Hindi has none. The classifier system is characteristic of Tibeto-Burman languages — languages like Bodo, Garo, and Koch — and also appears in Austroasiatic (Mon-Khmer) languages of Southeast Asia.
Bengali’s classifier system is evidence of prolonged contact with Tibeto-Burman speakers along the northern and northeastern frontier of the Bengal delta — the Bodo, Garo, Koch, and related communities who inhabited what is now North Bengal, Assam, and Meghalaya. Their grammatical influence was absorbed into Bengali so early and so completely that it is now invisible as a borrowing: every Bengali speaker uses classifiers automatically, without being aware that Sanskrit speakers never did.
The same classifier logic appears in Odia, though with somewhat different particles — a sign that the contact with Tibeto-Burman speakers was not narrowly Bengali but extended across the eastern Indo-Aryan zone. Maithili, by contrast, preserves more archaic features of the Māgadhī Prākrit ancestor that all these languages share: fuller case endings, a retained -a terminal vowel that Bengali long ago dropped. Hearing Maithili is a little like hearing what Bengali’s great-grandmother might have sounded like.
The Permeable Layer: What Gets Replaced
To complete the picture: the vocabulary domains that do get replaced under language contact are the prestige domains — the ones where a new elite wants to signal its authority.
Religious and ritual vocabulary was the first major replacement. Sanskrit flooded this domain when the Brahmanical order expanded eastward: dharma, karma, puja, mantra, yoga, brahmin, temple terminology. Earlier religious vocabulary — whatever the Munda and Dravidian and Tibeto-Burman speakers used for their spiritual practices — was largely displaced. Some survived in the tribal religious traditions that remained outside the Brahmanical synthesis, but they did not pass into the shared Bengali vocabulary.
Administrative and legal vocabulary was replaced again after 1204 CE, when the Sultanate established Persian as the language of governance in Bengal. দফতর (daftar, “office/record,” from Persian), জমিদার (zamindar, “landowner,” from Persian), সিপাহি (sipahi, “soldier,” from Persian), খাজনা (khajna, “tax,” from Persian) — the whole apparatus of state moved into Persian-derived vocabulary. This layer is still visible in the formal registers of Bengali bureaucratic speech.
Modern material culture is being replaced by English in real time: মোবাইল (mobile), ইন্টারনেট (internet), কম্পিউটার (computer). The same process, running at its current speed.
The kitchen remains, as always, the most conservative domain.
Reading Bengali as Archaeology
The image I want to leave you with is this: Bengali is a geological section.
If you cut through a hillside, you see the strata — the layers of deposition, each one a different era, each one sitting on top of what came before. Some layers are thick, some thin. Some are clearly defined at their boundaries; others blur into the layer below. A geologist can read the history of a landscape from the section.
Bengali vocabulary is the same. The substrate layer — the deshi words, the onomatopoeia, the echo-words, the classifiers — is the oldest visible stratum, the layer that predates the Indo-Aryan arrival. Above it: the Indo-Aryan layer, the Prākrit transformation, the Sanskrit reinforcement. Above that: the Persian-Arabic layer of the Sultanate and Mughal periods. Above that: the Portuguese colonial layer (ফর্সা, forsa, “fair-skinned”; আলমারি, almari, “cupboard”; বালতি, balti, “bucket”). Above that: the English layer we are still accumulating.
Every word in Bengali lives at a particular depth in this section. When you know which depth you are reading, you know something about when that concept entered this landscape, and from whom.
The words that don’t move — ঢেঁকি (dheki), ডাঙা (danga), হাঁড়ি (hari), ঝমঝম (jhomjhom) — are the oldest stratum. They are the floor. They are what was here before anything we recognize as “Bengali” arrived.
They are the words that have been patiently waiting for someone to ask about them.
| Layer | Origin | Examples | Stability |
|---|---|---|---|
| Substrate (Deshi) | Munda / Austroasiatic | ঢেঁকি, হাঁড়ি, ডাঙা, ঝিঙে | Highest — kitchen & body words |
| Indo-Aryan core | Sanskrit via Prakrit | মা, চোখ, জল, এক দুই তিন | Very high — basic vocabulary |
| Sanskrit prestige | Literary Sanskrit | ধর্ম, কর্ম, বৃক্ষ | High — textual tradition |
| Persian-Arabic | Sultanate / Mughal | দফতর, জমিদার, সিপাহি | Medium — administrative layer |
| Portuguese | Colonial trade | আলমারি, বালতি, ফর্সা | Low-medium — domestic objects |
| English | British rule / global | মোবাইল, ইন্টারনেট, কম্পিউটার | Lowest — still arriving |
Swadesh List Comparison: Numbers and Core Vocabulary
The numbers confirm the Indo-Aryan inheritance cleanly. But look down the water, fire, and mother columns: the Munda and North Dravidian languages diverge sharply from Sanskrit and from the whole Indo-Aryan branch, and Santali, Ho, and Kurukh share a single word for “mother,” ayo.
| Language / ভাষা | Family | 1 | 2 | 3 | water | fire | mother | hand | eye |
|---|---|---|---|---|---|---|---|---|---|
| Bengali | Indo-Aryan | এক (ek) | দুই (dui) | তিন (tin) | জল (jol) | আগুন (agun) | মা (ma) | হাত (hat) | চোখ (chokh) / আঁখি (ankhi) |
| Sanskrit | Indo-Aryan | eka | dvi | tri | jala | agni | mātṛ | hasta | cakṣu / akṣi |
| Hindi | Indo-Aryan | ek | do | tīn | pānī | āg | mā | hāth | ānkh (akṣi) / nayan |
| Odia | Indo-Aryan | eka | dui | tini | jala | agni | mā | hāta | āṅkhi (akṣi) / cakṣu |
| Santali | Austroasiatic (Munda) | mit' | bar | pe | dak | sengel | ayo | ti | mẽt |
| Ho | Austroasiatic (Munda) | mɪʔ | baria | apia | daa | siŋgel | ayo | ti | met |
| Oraon / Kurukh | North Dravidian | ondo | irind | muund | pani | ci | ayo | xekkhā | xann |
| Tibetan | Sino-Tibetan | gcig | gnyis | gsum | chu | me | a-ma | lag-pa | mig |
| Burmese | Sino-Tibetan | tiq | hniq | thounq | yei | mi | a-mé | leq | myeq |
| Vietnamese | Austroasiatic (Mon-Khmer) | một | hai | ba | nước | lửa | mẹ | tay | mắt |
Rows 5–7 (Santali, Ho, Oraon) are modern Austroasiatic and North Dravidian languages, relatives of the substrate languages that shaped Bengali before and during the Indo-Aryan contact period. Tibetan, Burmese, and Vietnamese are included for wider comparison: Tibeto-Burman languages contributed Bengali’s classifier system, and Vietnamese shares Austroasiatic roots with Santali and Ho.
Sources
- Morris Swadesh. "Towards Greater Accuracy in Lexicostatistic Dating". International Journal of American Linguistics (1955). Vol. 21, No. 2, pp. 121–137. doi:10.1086/464321
- Suniti Kumar Chatterji. The Origin and Development of the Bengali Language (1926). Calcutta University Press. 2 vols.
- Colin Masica. The Indo-Aryan Languages (1991). Cambridge University Press. ISBN 978-0521234207
- Franklin Southworth. Linguistic Archaeology of South Asia (2005). RoutledgeCurzon, London. doi:10.4324/9780203412916
- George van Driem. Languages of the Himalayas (2001). Brill, Leiden. 2 vols. On Tibeto-Burman contact. doi:10.1163/9789004492530
Next in this series: Three Ways of Knowing the Same People