The Words That Feel Most Bengali


There is a dish that Bengalis eat at home that I cannot translate.

মাছের টক ঝোল (maacher tok jhol, “fish in sour thin broth”) — the name itself dissolves into something pre-linguistic when I try to explain it to someone who did not grow up eating it. Not a curry, which implies thickness and richness. Not a soup, which implies Europe and a spoon held the wrong way.

A jhol is thin. Almost clear. A little turmeric, a little chili — cumin tempering for rohu or katla, mustard tempering for pabda or hilsa. Sour if it’s a tok jhol: the sourness from tamarind or raw mango or sometimes just the sharpness of mustard oil. The fish — always freshwater, always a little bony — barely holds together in it. And underneath everything, in east Bengal, there is a fermented dried-fish paste called শিদল (shidal) that functions the way fish sauce functions in Vietnamese cooking: invisible, essential, unremarkable to those who grew up with it.

I mention maacher tok jhol because it is where this whole series starts for me. Not with grammar or cognates or scholarly apparatus, but with a bowl of something that has no Indo-Aryan explanation. What strikes me is the structural resemblance to things I have eaten elsewhere in Southeast Asia — the sour-thin-fish logic that appears in Vietnamese canh chua, or the rice-noodle broth of Burmese mohinga with its ngapi fermented-fish paste doing exactly what shidal does. I cannot prove these are related. But the pattern is notable enough to sit with.

Living in Berlin, when north Indian restaurants no longer satisfy, I find myself reaching for tom yum soup — the closest approximation available. The same thin-sour logic, the same fish-forward transparency. Only the souring agent differs: lemongrass and lime instead of raw mango.

The thin-sour-fish food tradition is not an adaptation of anything that came from the northwest, from Sanskrit-speaking priests, from the plains of the Gangetic heartland. It is older than Bengali. It predates every text we have. And শিদল (shidal) never once appears in a Brahminical cooking manual. Because it was not their food. It was the food of the people who were here before their words arrived.

The deepest layers of Bengali culture are not where we usually look for them. They are in the kitchen. They are in the body. They are in the words that feel the most Bengali — which are, as I will argue, probably the oldest non-Bengali words in the language.


The Comfort Words

Let me give you an inventory. These are words I grew up with, words that feel bone-deep, words I did not learn so much as absorb:

ঢেঁকি (dheki, “rice-husking lever”): the tall wooden contraption, worked by foot, that village women used every morning to pound rice. I was a townie, but I did grow up with the proverb: ঢেঁকি স্বর্গে গেলেও ধান ভানে (dheki shorge geleo dhaan bhaane, “even if the rice-husking lever goes to heaven, it pounds paddy there”). Some things do not stop being what they are.

পটল (potol, “pointed gourd”) — a vegetable so Bengali it barely exists anywhere else. Try explaining it to someone from Delhi. (Sanskrit paṭola exists in Ayurvedic texts, so potol may be a tadbhava form rather than a deshi word; I include it for its cultural feel, not its etymological opacity.)

ঝিঙে (jhinge, “ridge gourd”): another one. You can gesture at it. You cannot translate the way it tastes on a Tuesday.

ডাঙা (danga, “highland, raised ground above the flood line”): we used to play কুমির-ডাঙা (kumir-danga, “crocodile-land”) in the schoolyard. One child is the kumir (the crocodile, “it”), and everyone else is safe only on danga (raised ground, any elevated surface). The whole hydrological anxiety of Bengal (floods, crocodiles, the treacherous lowland) is compressed into a children’s game.

হাঁড়ি (hari, “earthen pot”): the vessel for cooking, for carrying water, for storing grain. The most domestic object in the Bengali universe has this name.

And then the onomatopoeia, which is its own world. Bengali has an astonishing number of words that are pure sound mimicry:

ঝমঝম (jhomjhom) — the sound of heavy monsoon rain on a tin roof.

টিপটিপ (tip tip) — the sound of gentle drizzle, almost apologetic.

গুড়গুড় (gurgur) — the low rumble of distant thunder.

These are not words you find in Sanskrit dictionaries. These are not words that Indo-Aryan etymology explains. They feel irreducibly Bengali — impossible to trace, impossible to fully translate, inexplicable to a Hindi speaker who encounters them for the first time.

I sometimes wonder if this is why Bengali doctors are such valuable exports to the Western world. The language has an extraordinary vocabulary for differentiating kinds of discomfort — every gradation of pain, fatigue, and minor suffering has its own precise word. That granularity shapes how a doctor listens. A quantitative and qualitative empathy for pain, if you will, built into the mother tongue.

And that, it turns out, is exactly the clue.


The Provocation

Here is the argument I want to make in this first post, because it reframes everything that follows in this series:

The words that feel most Bengali are probably the oldest non-Bengali words in the language.

The comfort words, the kitchen words, the words for topography and tools and rain: these are the residue. They are what survives when a language is replaced but a people are not. They are the ghost vocabulary of populations who lived in Bengal for thousands of years before any Indo-Aryan speaker arrived, whose languages were gradually displaced and then swallowed whole, but whose everyday words refused to leave: the ones used every morning in the kitchen, at the ঢেঁকি (dheki), at the ঢাক and ঢোল (dhak and dhol, the festival drums), in the fields, in the rain.

To understand why this happens, you need to understand something about how vocabulary works under language contact. Technical terms from prestige domains — religion, law, philosophy, astronomy — get replaced fast. The elites adopt the new language’s terms, and the old words disappear within a generation or two. But kitchen vocabulary is different. Kitchen vocabulary is transmitted from the older cook to their younger family members in the middle of cooking, not in classrooms or temples. The word for the earthen pot is what you say when you tell a child to pick it up. It is not taught; it is demonstrated. Prestige vocabulary lives in texts. Kitchen vocabulary lives in bodies.

The Sanskrit words for the rice-husking apparatus are ulūkhala (the mortar) and musala (the pestle). These words survived in texts, in Sanskrit dictionaries, in Brahminical ritual contexts. But every Bengali villager who pounded rice every morning called the whole contraption a ঢেঁকি (dheki). And dheki is not Sanskrit. It has no Sanskrit etymology. It is, in the best current assessment of historical linguistics, a word inherited from the Austroasiatic-speaking peoples who were in Bengal before the Indo-Aryan languages arrived. They spoke languages related to modern Munda languages like Santali, Mundari, and Ho, which survive today in the highlands of Jharkhand and western Odisha, spoken by communities who call themselves the original inhabitants of the land. They have oral traditions that say so. Genetic work by David Reich’s lab on ancient South Asian population history, read alongside Paul Sidwell’s 2018 linguistic synthesis on Austroasiatic origins, is beginning to corroborate the broad outline: a male-biased migration of Austroasiatic speakers into eastern India, arriving from the east, roughly 4,000 years ago.


The Vocabulary Breakdown

The linguist Suniti Kumar Chatterji published his monumental Origin and Development of the Bengali Language (abbreviated ODBL) in 1926 — still the foundational scholarly work on Bengali historical linguistics. His count of the Bengali lexicon produced a figure that is worth sitting with:

Approximately 44% of Bengali vocabulary is tatsama — তৎসম (tôtsôm, “same as that”) — words taken directly from Sanskrit with minimal alteration. Dharma. Karma. বৃক্ষ (briksha, “tree”). These are the prestige loans, the Brahminical layer, the words that came with the priests and the texts.

The remaining roughly 56% falls into two other categories: tadbhava — তদ্ভব (tôdbhôb, “derived from that”) — words that started in Sanskrit or Prakrit but have been modified beyond clean recognition — and deshi — দেশী (deşi, “of the land”) — words of unknown or non-Aryan origin that simply exist in Bengali without any Sanskrit explanation.

[Figure: The Bengali Lexicon by Origin]

That deshi category is where the comfort words live. ঢেঁকি (dheki). হাঁড়ি (hari). ঝিঙে (jhinge). ডাঙা (danga). The words your grandmother used every day.


The Munda Layer

Linguists use the term “substrate” for the vocabulary, sounds, and grammatical features that survive in a language from an earlier language that was displaced. Bengali has a substantial Austroasiatic substrate, specifically from Munda languages, and a secondary substrate from Tibeto-Burman languages (the Bodo/Koch family) in the north and northeast of the region.

“Austroasiatic” is a language family that today includes the Munda languages of eastern India (Santali, Mundari, Ho) and the Mon-Khmer languages of Southeast Asia (Vietnamese, Khmer, Mon). The name signals the hypothesis that these languages once occupied a much larger territory stretching from India through Southeast Asia, before being displaced by later migrations. The Munda speakers of Jharkhand and Odisha are their westernmost living remnant in South Asia.

One structural feature worth noting: both Austroasiatic and Tibeto-Burman languages lack grammatical gender entirely. And the eastern Indian languages born from Magadhi Prakrit — Bengali, Assamese, Odia, Bhojpuri — all shed grammatical gender too, in stark contrast to Hindi and the western Indo-Aryan languages. Whether this is substrate influence or coincidence is debated, but the pattern runs deep.

[Figure: Austroasiatic languages, the substrate beneath Bengali]

The following words are high-confidence candidates for Munda loanwords in Bengali. They appear in serious scholarly work on Indo-Aryan substrate studies, with no plausible Sanskrit etymology and with phonological or semantic parallels in Munda languages:

Bengali | Phonetic | Meaning | Semantic domain
ঢেঁকি | dheki | rice-husking lever | Food production
ঠাঙি / টাঙ্গি | tangi | axe | Tools
ঝাড় | jhar | forest, bush | Topography
ডাঙা | danga | highland, raised ground | Topography
হাঁড়ি | hari | earthen pot | Domestic objects
ঝিঙে | jhinge | ridge gourd | Food

Notice the semantic domains. These are not words for abstract concepts. They are not words for divine attributes or celestial bodies or philosophical categories. They are words for: where the trees are, what the ground is like, what you cook in, what you eat, what you use to process grain, what you cut with. These are the words of daily survival — and they are the words that did not get replaced.


The jh- Pattern

There is a phonological observation worth making here, with appropriate caution attached. Several proposed Munda loanwords and deshi words in Bengali share an initial jh- onset, the breathy-voiced affricate written ঝ (a single sound, not a cluster): ঝাড় (jhar), ঝিঙে (jhinge), and the food words ঝাল (jhal, “spicy/sharp”) and ঝোল (jhol, “thin broth”) that appear everywhere in Bengali cooking vocabulary.

The pattern is plausible as an Austroasiatic phonological inheritance. But I want to be direct about what the scholarly situation actually is: the specific etymologies of jhal and jhol are genuinely unresolved. They are not from Sanskrit. They are possibly substrate. Their precise origin is not yet demonstrated to scholarly satisfaction. What can be said is that these words sit in a phonological neighborhood consistent with what Munda substrate words look like. That is a pattern observation, not an etymology.

The same epistemic care applies to the onomatopoeia. ঝমঝম (jhomjhom), টিপটিপ (tip tip), গুড়গুড় (gurgur) — these are not loanwords in the ordinary sense. Sound-mimicry vocabulary is often independently invented in every language. But the density of this vocabulary in Bengali is notable. One plausible reading is that Bengali sits at the intersection of three substrate traditions — Austroasiatic (Munda), Tibeto-Burman (Bodo/Koch), and traces of North Dravidian — each independently rich in oral and sonic culture, and that this convergence produced a language with an unusual appetite for capturing texture and sound in words.


Echo-Words: A Morphological Fingerprint

One feature of Bengali that linguists point to as likely Austroasiatic in origin is the echo-word or reduplication pattern. Bengali has a productive construction where any noun can be approximately doubled with a phonetically altered second copy to mean “X and things of that type”:

ঘোড়াটোড়া (ghora-tora, from ghora, “horse”): “horses and such things”

কাপড়চোপড় (kapor-chopor, from kapor, “cloth”): “clothes and such things”

মাছটাছ (mach-tach, from mach, “fish”): “fish and similar things”

This pattern, technically called echo compounding, is highly characteristic of Austroasiatic languages and is found across both the Munda and the Mon-Khmer branches of the family. It is not inherited from Sanskrit. It got absorbed so deeply into Bengali morphology that speakers generate new echo-pairs spontaneously, without thinking. It is not a loan of vocabulary; it is a loan of grammatical structure. The Munda speakers left their morphological DNA inside Bengali even after their own languages were gone from the region.
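
The mechanics of the construction are regular enough to sketch in a few lines of Python. What follows is a toy model on romanized forms, not a description of Bengali phonology: it assumes the echo copy swaps the base's initial consonants for t-, and the function name echo_word, the five-letter vowel set, and the ph- fallback for bases that already begin with t- are my own illustrative assumptions. A lexicalized pair like কাপড়চোপড়, whose echo begins with ch-, falls outside this simple rule.

```python
# Toy model of echo-word formation on romanized Bengali (illustrative only).
VOWELS = set("aeiou")  # assumption: plain ASCII romanization, no diacritics


def echo_word(base: str, echo_onset: str = "t", fallback_onset: str = "ph") -> str:
    """Return 'base-echo', where the echo copy has its initial consonants replaced."""
    # Split the base into its initial consonants (the onset) and the remainder.
    i = 0
    while i < len(base) and base[i] not in VOWELS:
        i += 1
    onset, rest = base[:i], base[i:]
    # If the base already starts with the echo onset, the copy would be
    # identical to the base, so switch to the fallback onset (an assumption).
    new_onset = fallback_onset if onset == echo_onset else echo_onset
    return f"{base}-{new_onset}{rest}"


for word in ["ghora", "mach"]:
    print(echo_word(word))
# prints:
# ghora-tora
# mach-tach
```

The point of the sketch is that the rule takes any base as input, including a word coined yesterday, which is what makes it grammar rather than vocabulary.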

Bengali also has what linguists call numeral classifiers — the particles you must use when counting certain types of nouns. In English, you say “three fish.” In Bengali, you say তিনটে মাছ (tinte mach) — literally “three [classifier for objects] fish.” This classifier system (ṭā, ṭi, khānā, jon, and others) is another feature not found in Sanskrit but common across Tibeto-Burman and Austroasiatic language families. One more structural inheritance from the substrate.


The Odia Connection

A brief note that will matter more in later posts: the thin-sour-spicy food axis I described — jhol, jhal, tok (টক, “sour”), ombol (অম্বল, “sourness; a sour preparation”) — has structural cognates in Odia as well. The same thin broth, the same fermented-fish sensibility, the same balance of sourness and heat appear in Odia cooking with similar vocabulary. This is not a case of one language borrowing from the other recently. It is a shared inheritance — a food culture that was already present across the Bengal-Odisha-northeastern coastal zone before either Bengali or Odia existed as distinct literary languages, and that persisted in kitchens long after the prestige languages changed. The sour-thin-fish axis is an eastern substrate tradition, visible across Odia, Bengali, and northeastern cuisines, older than any of them.


The Familiar Is the Ancient

Let me end here, with the inversion at the heart of this series.

When we think about what is “authentically Bengali,” we typically reach for the high cultural markers: Tagore, the Charyapadas (the approximately tenth- to twelfth-century Buddhist devotional songs that are the oldest surviving Bengali literature), the literary tradition, the philosophical vocabulary. These feel like the deep Bengali. But linguistically, they are the recent layer — they are the Indo-Aryan stratum, the Sanskritic inheritance, the prestige vocabulary that arrived with a particular cultural-religious formation and was recorded in texts because texts survive.

The deepest Bengali, the oldest and the most continuous, is in the proverb about the ঢেঁকি (dheki) that keeps pounding paddy even in heaven. It is in the kumir-danga game where the flood is always coming and you scramble for high ground. It is in the হাঁড়ি (hari) and the ঝিঙে (jhinge) and the thin sour fish broth that has no name in any language that came from the northwest. These words are older than the language itself. They are the vocabulary of people whose names we do not know, whose own languages are gone, but whose daily routines (pounding rice, cooking fish in thin sour broth, reading the ground for where the flood would not reach) survived inside the language that displaced them.

The words that feel most Bengali are probably the least Bengali in origin.

That is where this series begins.

Sources

  1. Suniti Kumar Chatterji. The Origin and Development of the Bengali Language (1926). Calcutta University Press. 2 vols.
  2. F.B.J. Kuiper. Aryans in the Rigveda (1991). Rodopi, Amsterdam. Vol. 1. ISBN 978-9051833072
  3. Michael Witzel. "Early Sources for South Asian Substrate Languages". Mother Tongue (ASLIP) (1999). Special Extra Number, Oct. 1999, pp. 1–70
  4. Franklin Southworth. Linguistic Archaeology of South Asia (2005). Routledge Curzon, London doi:10.4324/9780203412916
  5. David Reich et al. "Reconstructing Indian Population History". Nature (2009). Vol. 461, pp. 489–494 doi:10.1038/nature08365
  6. Paul Sidwell. "The Austroasiatic Language Family". The Oxford Handbook of the Languages of South Asia (2018). Oxford University Press. Ed. Cardona & Jain

Next in this series: Before We Were Bengali — The People of the Red Earth