Indian journalists have processed nearly 22 million voter records, constructed interactive election results interfaces, and deployed low-cost sensors to measure worker heat exposure – all with significant assistance from large language models (LLMs), according to a presentation delivered at the AI in Media Forum Bangalore 2026.
Srinivasan Ramani, Deputy National Editor and Senior Associate Editor at The Hindu, detailed how LLMs are being integrated into the newsroom’s data journalism workflow, not to automate writing, but to accelerate investigations and expand the scale of reporting. “AI,” Ramani stated, “is a highly sophisticated intern. You tell it exactly what to do. It does it. But you remain in control.”
One major undertaking involved analyzing data from India’s Special Intensive Revision (SIR) of voter rolls. Authorities released records detailing voter deletions and the stated reasons. The data, however, arrived as image-based PDFs in Hindi, requiring substantial processing. The Hindu team tackled approximately 90,000 files covering 6.5 million records in Bihar, 78,000 files and 9.7 million records in Tamil Nadu, and 80,000 files encompassing 5.8 million records in West Bengal, totaling roughly 22 million records across the three states.
The process involved using optical character recognition (OCR) to convert the images into machine-readable text, translating the text into English, and storing the data in databases. Ramani’s team then used LLMs to generate SQL queries from natural-language prompts rather than writing the queries by hand. This analysis revealed anomalies, including a higher rate of female voter deletions in Bihar despite documented male out-migration, and instances where a significant proportion of deleted voters were marked as deceased despite being under the age of 50.
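To give a sense of the kind of query such a prompt might produce, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name, column names, and sample rows are all hypothetical and made up for illustration; the presentation did not describe The Hindu's actual schema, database engine, or queries.

```python
import sqlite3

# Hypothetical schema and made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deleted_voters (state TEXT, gender TEXT, age INTEGER, reason TEXT)"
)
sample_rows = [
    ("Bihar", "F", 34, "deceased"),
    ("Bihar", "F", 61, "shifted"),
    ("Bihar", "M", 45, "deceased"),
    ("Bihar", "F", 72, "deceased"),
    ("Tamil Nadu", "M", 29, "duplicate"),
]
conn.executemany("INSERT INTO deleted_voters VALUES (?, ?, ?, ?)", sample_rows)


def deletions_by_gender(conn, state):
    """Count deleted voters in one state, grouped by gender."""
    query = """
        SELECT gender, COUNT(*)
        FROM deleted_voters
        WHERE state = ?
        GROUP BY gender
        ORDER BY gender
    """
    return conn.execute(query, (state,)).fetchall()


def deceased_under_age(conn, state, age_limit=50):
    """Count deletions marked 'deceased' for voters younger than age_limit."""
    query = """
        SELECT COUNT(*)
        FROM deleted_voters
        WHERE state = ? AND reason = 'deceased' AND age < ?
    """
    return conn.execute(query, (state, age_limit)).fetchone()[0]
```

A prompt such as "how many deletions in Bihar were women versus men?" would map to something like deletions_by_gender(conn, "Bihar"). As in the workflow Ramani described, a journalist would still need to check any generated query against the data before relying on the result.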
The release of full deletion records was prompted by a directive from the Supreme Court of India. The Hindu responded by creating a searchable database of deleted names and reasons and publishing state-level investigations based on the findings. These investigations were subsequently discussed in parliamentary proceedings and court cases, and led to some corrections of voter rolls in Bihar following public scrutiny and on-the-ground reporting.
Beyond document processing, LLMs assisted in building interactive maps for the 2019 and 2024 general elections, allowing users to filter results by region, state, rural-urban classification, and urban clusters. Ramani said he did not write a single line of code for these applications, relying instead on prompts to ChatGPT, Gemini, and Claude to generate annotated code for each interface component over a two-week period. Previously, such projects would have required dedicated in-house engineers or external volunteers.
The team also used AI-assisted guidance to assemble low-cost, Arduino-based devices to measure heat stress experienced by workers in Chennai. Four devices were deployed with a cook, a fisherman, an industrial worker, and an autorickshaw driver, recording temperature and humidity every 10 seconds. The data revealed a heat index peaking at 69°C (156.2°F) in one instance, highlighting disparities in heat exposure. Following publication, the Tamil Nadu government announced a heat management plan and is exploring the use of similar devices for further study.
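A heat index combines air temperature and relative humidity into a single "feels like" figure. The presentation did not say which formula the team used; the sketch below assumes the Rothfusz regression published by the US National Weather Service, which works in degrees Fahrenheit and is intended for heat index values of roughly 80°F and above. The NWS adjustments for very dry or very humid conditions are omitted, and the function names are illustrative.

```python
def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def heat_index_f(temp_f, rh):
    """Rothfusz regression: temp_f in degrees F, rh as a percentage (0-100).

    Assumption: this may not be the formula used in the reporting.
    """
    t, r = temp_f, rh
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * r
        - 0.22475541 * t * r
        - 0.00683783 * t * t
        - 0.05481717 * r * r
        + 0.00122874 * t * t * r
        + 0.00085282 * t * r * r
        - 0.00000199 * t * t * r * r
    )


def heat_index_c(temp_c, rh):
    """Same regression, taking and returning degrees Celsius."""
    return f_to_c(heat_index_f(c_to_f(temp_c), rh))
```

With a sensor logging every 10 seconds, each reading of temperature and humidity could be passed through heat_index_c to build the time series from which a peak is read off.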
Ramani emphasized that AI tools are integrated into an existing data journalism pipeline – hypothesis formation, data collection, cleaning, analysis, visualization, and publication. He categorized the team’s work into five types: simple trend analysis, correlation studies, factor analysis, causal investigations, and deep-dive accountability reporting. AI assists in web scraping, document processing, query suggestion, and front-end development, but human oversight remains crucial. He cited an instance where an AI-generated script slowed analysis due to sequential processing, requiring human intervention to implement multi-threading for improved efficiency.
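The difference between sequential and multi-threaded processing can be sketched in Python with the standard-library concurrent.futures module. This is an illustration only: process_file is a hypothetical stand-in for whatever per-file step was slow, the file names are invented, and the presentation did not describe the actual script or its language.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_file(path):
    """Hypothetical per-file step; the sleep stands in for I/O wait."""
    time.sleep(0.01)
    return f"text from {path}"


def run_sequential(paths):
    """Process files one after another."""
    return [process_file(p) for p in paths]


def run_threaded(paths, max_workers=8):
    """Process files concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_file, paths))


# Invented file names, for illustration only.
paths = [f"roll_{i:03d}.pdf" for i in range(16)]
```

Threads help most when each task spends its time waiting on disk or network. For CPU-heavy work in Python, a process pool is often the better fit, so the right choice depends on where the script actually spends its time.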
Ramani traced the evolution of data journalism at The Hindu over the past decade, from adding visual enhancements to traditional reporting to running a dedicated function staffed by data journalists, designers, and editorial coders. A significant project involved an analysis of excess deaths during the COVID-19 pandemic, estimating that official death counts were underreported by a factor of five to six, a finding that was initially contested but later supported by analyses from the World Health Organization and revised official data. Data-driven reporting is now integrated across both print and digital operations, contributing to increased subscriptions and engagement for premium stories.
“We want a more informed audience. This kind of work helps us move in that direction. Across projects, AI does not replace journalistic judgement. It expands the scale at which it can operate,” Ramani said.