Research Blog · Consumer Analytics

From Grocery Cart to Health Dashboard: What Your Purchases Predict

Sakira Afrose Toma  ·  2025  ·  sakiraatoma.com

Every time you scan a loyalty card at a grocery store, you generate a data point. Multiply that by the 90 million American households enrolled in retail loyalty programs, and you have one of the most granular, near-real-time records of dietary behavior available. Yet almost no one is using it for public health surveillance.

This is one of the most striking gaps I have encountered as a marketing analytics researcher: the tools that know the most about American consumer behavior are almost entirely disconnected from the agencies that are trying to understand American health behavior. My research proposes to bridge that gap.

The Problem with Current Health Surveillance

The CDC's primary obesity and chronic disease surveillance tools, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health and Nutrition Examination Survey (NHANES), are invaluable. But they have fundamental limitations: BRFSS relies on self-reported answers, NHANES pairs its interviews with physical examinations but only for a comparatively small national sample, and both carry reporting lags of 12 to 18 months. By the time the data are published, the underlying behavior patterns may already have shifted.

Consumer purchase data, by contrast, is updated in real time. It is not self-reported — it is behavioral. And it covers dietary behavior at a scale and granularity that no survey could match.

"Surveillance systems built on what people say they eat will always lag behind surveillance systems built on what people actually buy. The data exists. The question is whether we will use it."

The Consumer Health Analytics Dashboard (CHAD) Framework

My research proposes the Consumer Health Analytics Dashboard (CHAD), a machine learning pipeline that uses regional consumer purchase data to generate real-time metabolic disease risk scores for each metropolitan statistical area (MSA) in the United States.

The CHAD framework constructs a Dietary Health Score for each metropolitan statistical area (MSA) from four purchase behavior dimensions: the ratio of fresh produce to processed food spending; the sugar-sweetened beverage purchase rate; the fast food expenditure share; and the organic product adoption rate. These scores are then used to train machine learning classifiers that predict regional obesity and diabetes risk with significantly shorter temporal lags than current survey-based methods.
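To make the scoring step concrete, here is a minimal sketch of how a composite score could be assembled from the four dimensions. It is an illustration under stated assumptions, not the CHAD specification: the field names, the equal weighting, the z-score standardization, and the sign convention (produce ratio and organic adoption count as positive, sugar-sweetened beverages and fast food as negative) are all hypothetical, and the numbers are made up.

```python
from statistics import mean, pstdev

# Hypothetical sign convention: +1 where a higher value is assumed to indicate
# a healthier diet, -1 where it is assumed to indicate a less healthy one.
DIMENSIONS = {
    "produce_to_processed_ratio": +1,
    "ssb_purchase_rate": -1,
    "fast_food_share": -1,
    "organic_adoption_rate": +1,
}


def dietary_health_scores(msa_rows):
    """Return {msa: score}: the equal-weight mean of signed z-scores."""
    scores = {msa: 0.0 for msa in msa_rows}
    for dim, sign in DIMENSIONS.items():
        values = [row[dim] for row in msa_rows.values()]
        mu, sigma = mean(values), pstdev(values)
        for msa, row in msa_rows.items():
            # A dimension with no variation across MSAs contributes nothing.
            z = 0.0 if sigma == 0 else (row[dim] - mu) / sigma
            scores[msa] += sign * z / len(DIMENSIONS)
    return scores


# Made-up illustrative values, not real purchase data.
example_msas = {
    "MSA_A": {"produce_to_processed_ratio": 0.9, "ssb_purchase_rate": 0.10,
              "fast_food_share": 0.15, "organic_adoption_rate": 0.20},
    "MSA_B": {"produce_to_processed_ratio": 0.6, "ssb_purchase_rate": 0.18,
              "fast_food_share": 0.25, "organic_adoption_rate": 0.10},
    "MSA_C": {"produce_to_processed_ratio": 0.3, "ssb_purchase_rate": 0.30,
              "fast_food_share": 0.40, "organic_adoption_rate": 0.05},
}

scores = dietary_health_scores(example_msas)
for msa, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{msa}: {score:+.2f}")
```

Because each dimension is standardized across MSAs, the score is relative: it ranks regions against one another rather than against a clinical threshold. In a full pipeline the four standardized dimensions would more likely be passed to the classifier as separate features than collapsed into one number.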

Why This Is a Marketing Analytics Contribution

The CHAD framework is not a public health study in the traditional sense. It is a marketing analytics methodology applied to a public health challenge. The core innovation — using consumer purchase behavior as a predictive health risk signal — comes directly from the marketing analytics literature on consumer behavioral modeling and CRM analytics.

This is exactly the kind of cross-domain translation that I believe defines the most important research opportunities in marketing analytics today. The methods are not new. The application is.

Proposed Research

Paper 2 of the health analytics program examines whether machine learning models trained on MSA-level consumer purchase data can predict 2022–2023 CDC chronic disease rates from 2019–2021 purchase patterns, and whether those predictions become available sooner than estimates from current survey-based methods. Target journals: Journal of the American Medical Informatics Association (JAMIA), Preventive Medicine.
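The evaluation design described here can be sketched in a few lines. This is only an illustration, assuming one purchase-based score per MSA: a one-variable least-squares line stands in for the machine learning model, the variable names are hypothetical, and the values are made up so that the fit is exact. The point is the structure: features come only from the earlier window, outcomes only from the later one, and error is measured on MSAs the model never saw.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observed values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up values. Features are drawn only from the earlier window and
# outcomes only from the later one, so no future information leaks in.
score_2019_2021 = {"MSA_A": 1.0, "MSA_B": 0.0, "MSA_C": -1.0,
                   "MSA_D": 0.5, "MSA_E": -0.5}
obesity_rate_2022_2023 = {"MSA_A": 25.0, "MSA_B": 30.0, "MSA_C": 35.0,
                          "MSA_D": 27.5, "MSA_E": 32.5}

train_msas = ["MSA_A", "MSA_B", "MSA_C"]
test_msas = ["MSA_D", "MSA_E"]  # held out from fitting

slope, intercept = fit_line(
    [score_2019_2021[m] for m in train_msas],
    [obesity_rate_2022_2023[m] for m in train_msas],
)
predictions = [slope * score_2019_2021[m] + intercept for m in test_msas]
holdout_mae = mean_absolute_error(
    predictions, [obesity_rate_2022_2023[m] for m in test_msas]
)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, holdout MAE={holdout_mae:.2f}")
```

Comparing against survey-based methods would mean computing the same hold-out error for a survey-derived baseline, which this sketch does not attempt.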

The Equity Dimension

One critical question in this research is whether a purchase-based health prediction model performs equally well across racial and income subgroups. If it is less accurate for minority communities, that would point to a systemic gap in retail data coverage, which is itself a form of data inequity that reflects and reinforces broader structural inequalities. The fairness audit of the CHAD model is as important as its accuracy metrics.
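A fairness audit of this kind can start very simply: compute the same error metric separately for each subgroup and look at the gap. The sketch below assumes each evaluated record already carries a subgroup label. The labels "group_1" and "group_2" are placeholders, the values are made up, and mean absolute error is only one of several metrics an audit might use.

```python
from collections import defaultdict


def subgroup_mae(records):
    """Return {group: mean absolute error} for (group, predicted, actual) records."""
    errors = defaultdict(list)
    for group, predicted, actual in records:
        errors[group].append(abs(predicted - actual))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}


# Made-up values; "group_1" and "group_2" are placeholder subgroup labels.
records = [
    ("group_1", 30.0, 31.0),
    ("group_1", 28.0, 27.0),
    ("group_2", 35.0, 32.0),
    ("group_2", 33.0, 36.0),
]

per_group = subgroup_mae(records)
gap = max(per_group.values()) - min(per_group.values())

for group, err in sorted(per_group.items()):
    print(f"{group}: MAE = {err:.1f}")
print(f"largest gap between subgroups: {gap:.1f}")
```

A large gap does not by itself explain why the model does worse for one group. It flags where to look next, for example at how well loyalty-program data covers that group's shopping.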

The grocery cart knows more about America's health than most doctors. It is time we built the analytical infrastructure to listen to it.

About the Author

Sakira Afrose Toma is a Marketing Analytics researcher at Wright State University. Her research focuses on consumer behavior analytics, health-linked data science, workforce analytics, and consumer data privacy.
