How to evaluate whether your business problem needs ML or just better SQL

Most projects pitched to us as AI engagements are really SQL problems with a logistic regression on top. A four-question framework for telling the difference, with worked examples — and the discipline of giving the smaller answer when it is the right one.

A meeting we have had a dozen times: a director of analytics or a VP of product describes a business problem that is keeping them up at night. Halfway through, they say, “so we want to build an AI to predict X.” We let them finish. Then we ask the four questions in this post.

In our experience, eight times out of ten, the answer to the four questions is that they do not need an AI to predict X. They need a SQL query, a dashboard, a reasonable baseline, and someone to read the result. The remaining two times out of ten, they usually have a real ML problem, and we are happy to take it on. But we tell people the SQL answer when it is the right one, even though it means a smaller engagement for us, because the alternative (selling someone a six-figure ML project they did not need) is the kind of thing that ends careers.

The four questions are about the structure of the problem, not the buzzwords used to describe it. Run them in order.

1. Is the relationship genuinely nonlinear?

Most business relationships, in the data, look like a line with noise. Revenue goes up roughly proportionally to spend. Conversion goes down roughly linearly with price. Churn goes up roughly with the time since the last engagement. There is variance around the trend; the trend itself is approximately linear over the operating range.

When the relationship is approximately linear, a linear regression — five lines of code, runs in milliseconds, produces coefficients with confidence intervals you can defend — does the entire job. There is no ML problem to solve.
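For a single input, the whole computation fits in a few lines. A library such as statsmodels reports coefficients and confidence intervals directly. To keep this sketch dependency-free, here is the same ordinary-least-squares arithmetic done by hand. The function name `fit_line` and the synthetic spend/revenue data are illustrative. The interval uses a normal approximation to the t distribution, which is reasonable once you have a few dozen rows.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_line(xs, ys, level=0.95):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (estimate, lower, upper) for each coefficient. The interval uses a
    normal approximation to the t distribution, fine once n is a few dozen.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with n - 2 degrees of freedom (two fitted coefficients).
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))

    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return {
        "intercept": (intercept, intercept - z * se_intercept, intercept + z * se_intercept),
        "slope": (slope, slope - z * se_slope, slope + z * se_slope),
    }


# Illustrative synthetic data: revenue rises about 3 units per unit of spend, plus noise.
random.seed(0)
spend = [random.uniform(0, 100) for _ in range(200)]
revenue = [50 + 3 * s + random.gauss(0, 20) for s in spend]
print(fit_line(spend, revenue)["slope"])  # (estimate, lower, upper)
```

The slope's interval is the number you defend in the meeting. It tells the stakeholder how much revenue one more unit of spend buys, and how sure you are.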

The way to know whether the relationship is genuinely nonlinear is to look at the data. Plot the input against the output. If you see a roughly straight line with scatter, you are done. If you see a clear S-curve, a step change at some threshold, a saturation effect, or strongly different behavior in different regions of the input space, you might want a more flexible model, such as gradient-boosted trees. If you see something that looks like noise, you do not have a predictable relationship at all, and no model will save you.
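The plot is the check. A numeric complement is to compare a straight-line fit against a deliberately flexible one. The sketch below, with illustrative names, scores both fits by R². The flexible fit predicts each point with the mean of its x-decile.

How to read the two scores:

- The binned fit has more parameters, so it usually scores a little higher in-sample.
- A large gap suggests curvature, a threshold, or saturation.
- Two near-zero scores suggest there is no predictable relationship to model.

```python
from statistics import fmean


def r_squared(ys, preds):
    """Share of the variance in ys explained by preds."""
    y_bar = fmean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - ss_res / ss_tot


def linear_vs_binned(xs, ys, n_bins=10):
    """Return (R² of a straight line, R² of per-bin means over x-quantile bins)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    linear_preds = [intercept + slope * x for x in xs]

    # Split points into roughly equal-sized bins by the rank of x.
    order = sorted(range(n), key=lambda i: xs[i])
    binned_preds = [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_mean = fmean(ys[i] for i in idx)
        for i in idx:
            binned_preds[i] = bin_mean
    return r_squared(ys, linear_preds), r_squared(ys, binned_preds)


xs = list(range(100))
saturating = [min(x, 50) for x in xs]  # flat once x reaches 50
print(linear_vs_binned(xs, saturating))  # (linear R², binned R²)
```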

The mistake that turns linear problems into “AI projects” is skipping the plot. People reach for ML because the problem feels complex, not because they have looked at the data and seen complexity.

2. Is the input structured?

The single most important distinction in modern ML is between structured input and unstructured input.

Structured input is what lives in a database: rows and columns, numbers and categories, a clean schema. Almost everything that matters in a business is structured input. Customer demographics. Transaction history. Order quantities. Server response times. Marketing-channel attribution.

For structured input, classical methods — logistic regression, gradient-boosted trees, the small handful of techniques you can fit in a hundred lines of scikit-learn — get you most of the way to the achievable accuracy on most problems. Sometimes a more sophisticated model adds three or four percentage points; rarely does it add ten.

Unstructured input is text, images, audio, video. Free-text fields in a CRM. Photographs of products. Phone-call recordings. Free-form chat history. For unstructured input, you have a real ML problem, because the only way to do anything useful with the input is to first extract a representation of it — embeddings, in the modern toolkit — and that extraction is the part that requires non-trivial models.

The discipline question is: which kind of input is your problem actually using? If your “predict customer churn” problem uses tenure, plan tier, last login, and number of support tickets, that is a structured-input problem. The data is in your database. The right first model is logistic regression on a few well-chosen features. If you discover that the support tickets, as free-text bodies, contain a strong signal that the structured fields don’t capture, then you have an unstructured-input problem worth attacking with a model that reads the text. Most teams never check whether the structured signal is enough before reaching for the unstructured one.
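In the structured case, the feature table is one SQL query. The sketch below uses Python's built-in sqlite3 module. The two-table schema (`customers`, `tickets`), the column names, and the as-of date parameter are hypothetical. The query counts only tickets opened on or before the as-of date, so later events do not leak into the features.

```python
import sqlite3

# Hypothetical schema: one row per customer, one row per support ticket.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    signup_date TEXT NOT NULL,  -- ISO dates such as '2024-01-15'
    plan_tier   TEXT NOT NULL,  -- e.g. 'basic', 'pro'
    last_login  TEXT NOT NULL
);
CREATE TABLE tickets (
    ticket_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    opened_at   TEXT NOT NULL
);
"""

# One row of structured churn features per customer, as of a given date.
FEATURES_SQL = """
SELECT
    c.customer_id,
    CAST(julianday(:as_of) - julianday(c.signup_date) AS INTEGER) AS tenure_days,
    c.plan_tier,
    CAST(julianday(:as_of) - julianday(c.last_login) AS INTEGER) AS days_since_login,
    COUNT(t.ticket_id) AS ticket_count
FROM customers AS c
LEFT JOIN tickets AS t
    ON t.customer_id = c.customer_id AND t.opened_at <= :as_of
GROUP BY c.customer_id, c.signup_date, c.plan_tier, c.last_login
ORDER BY c.customer_id
"""


def churn_features(conn, as_of):
    return conn.execute(FEATURES_SQL, {"as_of": as_of}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "2024-01-01", "basic", "2024-05-01"), (2, "2023-06-01", "pro", "2024-05-30")],
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, 1, "2024-02-10"), (2, 1, "2024-04-02"), (3, 2, "2024-07-01")],
)
print(churn_features(conn, "2024-06-01"))
# [(1, 152, 'basic', 31, 2), (2, 366, 'pro', 2, 0)]
```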

// a useful rule of thumb

Build the SQL-and-logistic-regression baseline first. Measure how well it does. If it solves the business problem to a tolerable level, ship it. If it falls short, then you know what the gap is — and you can make a defensible case for the ML investment that closes it. If you skip the baseline and build the ML model first, you have no idea whether the additional capability of the model is what is solving the problem, or whether the SQL would have done it for free.
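The baseline-first loop can be sketched in plain Python:

1. Fit a logistic regression by batch gradient descent.
2. Score a holdout set by AUC.
3. Compare the result against a target the business sets in advance.

`fit_logistic`, `auc`, `baseline_verdict`, the synthetic data, and the 0.80 target are illustrative assumptions. In practice you would likely use a library implementation such as scikit-learn's `LogisticRegression` and `roc_auc_score`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent.

    X is a list of feature rows (ideally standardized); y holds 0/1 labels.
    Returns (intercept, weights).
    """
    n, d = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, target in zip(X, y):
            err = sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - target
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        intercept -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


def predict(intercept, weights, rows):
    return [sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) for row in rows]


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def baseline_verdict(holdout_auc, target_auc):
    if holdout_auc >= target_auc:
        return "baseline meets the target: ship it"
    return f"baseline is {target_auc - holdout_auc:.3f} AUC short: that gap is the case for ML"


# Illustrative synthetic data: two standardized features drive the outcome.
random.seed(1)
rows = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
labels = [1 if random.random() < sigmoid(1.5 * x1 - 1.0 * x2) else 0 for x1, x2 in rows]
intercept, weights = fit_logistic(rows[:300], labels[:300])
holdout = auc(predict(intercept, weights, rows[300:]), labels[300:])
print(round(holdout, 3), baseline_verdict(holdout, target_auc=0.80))
```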

3. Are you predicting, or explaining?

People ask for predictive models when what they really want is to understand why something is happening.

“We want to predict which customers will churn” is often, on closer inspection, “we want to understand what causes customers to churn so we can stop it.” Those are different problems. A predictive model gives you a score for each customer; an explanatory model gives you the answer to “why.”

A logistic regression with carefully chosen features gives you both: a churn probability, plus coefficients that say “customers on the basic tier have 3.2× the odds of churning, customers without the integration installed have 2.1× the odds, and customers who haven’t logged in for 14+ days have 4.5× the odds,” each holding the other features fixed. The coefficients are the explanation. They are also the thing the product team can act on.
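A logistic regression's coefficients are changes in log-odds. The “×” multipliers above come from exponentiating them, which makes them odds ratios. When churn is rare, the odds ratio is close to the ratio of probabilities. When churn is common, the two diverge, and that matters when you present the numbers.

This is a minimal sketch. The coefficient values are hypothetical, chosen to reproduce the multipliers quoted above.

```python
import math

# Hypothetical fitted coefficients (log-odds), chosen to match the multipliers in the text.
coefficients = {
    "basic_tier": math.log(3.2),
    "no_integration": math.log(2.1),
    "inactive_14_days": math.log(4.5),
}


def odds_ratios(coefs):
    """exp(beta): multiplicative change in the odds when a 0/1 feature switches on,
    holding the other features fixed."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def risk_ratio(baseline_prob, odds_ratio):
    """Convert an odds ratio into a ratio of probabilities at a given baseline churn rate."""
    new_odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return new_odds / (1 + new_odds) / baseline_prob


print(odds_ratios(coefficients))  # approximately 3.2, 2.1 and 4.5
print(risk_ratio(0.05, 3.2))      # about 2.9x as likely when baseline churn is 5%
print(risk_ratio(0.30, 3.2))      # about 1.9x as likely when baseline churn is 30%
```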

A black-box model (a deep neural network, or an XGBoost ensemble with a hundred features and no constraints) gives you a marginally better churn score and no explanation worth acting on. SHAP values approximate one, but the answer is usually “it’s complicated.”

The question to ask the business stakeholder, before you start: when you say you want to predict X, do you want the score, or do you want to know what to do about it? If they want to know what to do, the answer is almost always a model so simple it has interpretable coefficients. ML is not the right tool when the goal is understanding.

4. How much error can the decision tolerate?

Models have error rates. Even very good models have error rates. The question is whether the decision the model feeds into can absorb the error.

Decisions made at human speed — where a person reads the model’s output, looks at the underlying record, and decides what to do — can absorb high error rates. A churn-risk score that’s wrong half the time is still useful if a human looks at the top 10% of scored customers and decides which ones to call.
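The arithmetic behind “wrong half the time is still useful” is precision in the top slice compared with the base rate. This minimal sketch uses hypothetical names and synthetic data: 1,000 customers, a 5% churn rate, and a score whose top 10% is only half churners.

```python
def top_fraction_precision(scores, labels, fraction=0.10):
    """Share of actual positives among the highest-scored `fraction` of records."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift(scores, labels, fraction=0.10):
    """How many times more concentrated positives are in the top slice than overall."""
    base_rate = sum(labels) / len(labels)
    return top_fraction_precision(scores, labels, fraction) / base_rate


# Synthetic example: scores descend with index. Every other customer in the
# top 100 churns and nobody else does, so the base rate is 50 / 1000 = 5%.
scores = [1000 - i for i in range(1000)]
labels = [1 if i < 100 and i % 2 == 0 else 0 for i in range(1000)]
print(top_fraction_precision(scores, labels))  # 0.5: wrong half the time
print(lift(scores, labels))                    # about 10x the base rate
```

Someone working down that list reaches a churner on every second call. Calling customers at random, they would reach one in twenty.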

Decisions made by automation cannot absorb high error rates. A fraud-detection model that auto-blocks transactions has to be very accurate, because every false positive is a customer who calls support angry, and every false negative is fraud you didn’t catch. The acceptable error budget is small enough that the model has to be carefully calibrated, monitored, and continuously retrained — and that is a real ML engagement, not a SQL query.
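When automation acts on the score, the threshold is an economic decision. This minimal sketch assumes you can:

- put a cost on each false positive (an angry legitimate customer);
- put a cost on each false negative (missed fraud);
- optionally, cap the share of legitimate transactions you are willing to block.

`pick_threshold`, the costs, the cap, and the scores are illustrative. The sketch also assumes the scores are already reasonably calibrated, which is part of the ongoing work described above.

```python
def pick_threshold(scores, labels, cost_fp, cost_fn, max_fpr=None):
    """Choose the auto-block threshold that minimizes cost_fp * FP + cost_fn * FN.

    A transaction is blocked when its score >= threshold. If max_fpr is given,
    thresholds that block more than that share of legitimate transactions are skipped.
    Returns (threshold, total_cost); a threshold of infinity means "block nothing".
    """
    n_legit = labels.count(0)
    best = None
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if max_fpr is not None and n_legit and fp / n_legit > max_fpr:
            continue
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Illustrative holdout scores; label 1 means confirmed fraud.
scores = [0.05, 0.10, 0.30, 0.55, 0.60, 0.70, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(pick_threshold(scores, labels, cost_fp=50, cost_fn=500))               # (0.55, 50)
print(pick_threshold(scores, labels, cost_fp=50, cost_fn=500, max_fpr=0.0))  # (0.7, 500)
```

Forbidding every false positive raises the cost here tenfold. That is the error budget made concrete.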

The rule: the further down the decision chain a model’s output goes — the closer to “this transaction is auto-blocked” or “this loan is auto-approved” — the harder the engineering problem and the more the investment is justified. Models that feed into “a human reads the top of the list” are usually solvable to 80% of the achievable accuracy with classical methods.

Worked examples

“We want to predict which leads will convert.” Almost always, ranking leads by three or four observable features — title seniority, company size, recency of engagement, source channel — gets you 70–80% of the achievable accuracy. The remaining 20–30% is in unstructured inputs (the content of their inbound message, the LinkedIn profile) and is genuinely useful only at scale. If you have ten thousand leads a month, the ML model is worth it. If you have a hundred, sorting on three features in SQL is the right answer and the salesperson should be the model.
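Sorting on a handful of features in SQL can be as simple as a points score. The table, columns, and point weights below are hypothetical. The weights are hand-set for illustration, not fitted, and the salesperson's judgment is the check on them. The sketch runs against Python's built-in sqlite3.

```python
import sqlite3

# Hypothetical leads table; point weights are hand-set for illustration, not fitted.
LEAD_SCORE_SQL = """
SELECT
    lead_id,
    (CASE WHEN title_seniority IN ('vp', 'c-level') THEN 3
          WHEN title_seniority = 'director' THEN 2 ELSE 0 END)
  + (CASE WHEN company_size >= 1000 THEN 2
          WHEN company_size >= 100 THEN 1 ELSE 0 END)
  + (CASE WHEN julianday(:today) - julianday(last_engaged) <= 7 THEN 2
          WHEN julianday(:today) - julianday(last_engaged) <= 30 THEN 1 ELSE 0 END)
  + (CASE WHEN source_channel = 'referral' THEN 2 ELSE 0 END) AS score
FROM leads
ORDER BY score DESC, lead_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, title_seniority TEXT,"
    " company_size INTEGER, last_engaged TEXT, source_channel TEXT)"
)
conn.executemany(
    "INSERT INTO leads VALUES (?, ?, ?, ?, ?)",
    [
        (1, "director", 50, "2024-05-30", "ads"),
        (2, "vp", 5000, "2024-03-01", "referral"),
        (3, "analyst", 200, "2024-05-20", "webinar"),
    ],
)
ranked = conn.execute(LEAD_SCORE_SQL, {"today": "2024-06-01"}).fetchall()
print(ranked)  # [(2, 7), (1, 4), (3, 2)]
```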

“We want to forecast demand for next quarter.” Look at the historical data. If demand is roughly a linear trend plus seasonality, a moving average plus a seasonal decomposition is the baseline, and you can implement it in SQL or in a thirty-line Python script. ARIMA may buy an incremental improvement. A hierarchical Bayesian model with promotional effects is genuinely better when you have enough data and the promotional effects are large and structured. Pick the simplest model that beats the naive baseline by a meaningful margin.
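Here is a sketch of the naive baseline and the moving-average-plus-seasonality baseline, in plain Python. `seasonal_naive` repeats last cycle's values. `ma_seasonal_forecast` is one simple variant of the idea, built from three pieces:

- the level, from a trailing one-cycle moving average;
- the seasonal offsets, from the average gap between the series and that moving average, by season;
- the drift, from the recent change in the moving average.

The function names, the monthly period of 12, and the data are illustrative assumptions. Comparing the two forecasts on a held-out cycle shows whether the extra structure earns its keep.

```python
from statistics import fmean


def seasonal_naive(history, period, horizon):
    """Forecast each future step with the value from the same season one cycle earlier."""
    return [history[len(history) - period + (h % period)] for h in range(horizon)]


def ma_seasonal_forecast(history, period, horizon):
    """Trailing moving-average level, plus drift, plus additive seasonal offsets."""
    n = len(history)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    # ma[k] is the mean of the cycle ending at time index k + period - 1.
    ma = [fmean(history[t - period + 1:t + 1]) for t in range(period - 1, n)]
    # Average gap between the series and its moving average, per season position.
    offsets = [
        fmean(history[t] - ma[t - period + 1] for t in range(period - 1, n) if t % period == s)
        for s in range(period)
    ]
    drift = (ma[-1] - ma[-1 - period]) / period  # per-step change over the last cycle
    return [ma[-1] + drift * (h + 1) + offsets[(n + h) % period] for h in range(horizon)]


def mae(actual, forecast):
    return fmean(abs(a - f) for a, f in zip(actual, forecast))


# Illustrative monthly demand: upward trend plus a repeating seasonal pattern.
season = [-20, -10, 0, 5, 10, 15, 20, 15, 5, 0, -15, -25]
demand = [500 + 4 * t + season[t % 12] for t in range(48)]
train, holdout = demand[:36], demand[36:]
print(mae(holdout, seasonal_naive(train, 12, 12)))        # 48.0: lags the trend by a year
print(mae(holdout, ma_seasonal_forecast(train, 12, 12)))  # near zero on this clean series
```

Real demand is noisier than this series, so the gap between the two forecasts will be smaller. That gap is the thing to measure.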

“We want to classify customer support tickets.” Free-text input — this is a real ML problem. But: a regex-and-keyword classifier built in two days is often the right first step, because half the tickets fall into categories that the regex catches reliably. Then a fine-tuned classifier or a prompt-driven LLM handles the rest. Skipping the regex and going straight to the LLM is more expensive, harder to maintain, and not noticeably more accurate on the easy cases.
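A regex-first pass might look like the sketch below. The categories and patterns are hypothetical; real ones come from reading a sample of your own tickets. Rules are checked in order, and anything no rule catches is returned separately so it can go to a trained classifier or an LLM.

```python
import re

# Hypothetical categories and patterns. Order matters: the first matching rule wins.
RULES = [
    ("password_reset", re.compile(r"\b(reset|forgot|forgotten)\b.*\bpassword\b", re.IGNORECASE)),
    ("billing", re.compile(r"\b(invoice|refund|charged?|billing)\b", re.IGNORECASE)),
    ("cancellation", re.compile(r"\b(cancel|unsubscribe)\b", re.IGNORECASE)),
]


def classify_ticket(text):
    """Return the first matching category, or None to route the ticket onward."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


def split_by_rules(tickets):
    """Separate tickets the rules handle from those left for a model or a human."""
    handled, remaining = [], []
    for ticket in tickets:
        label = classify_ticket(ticket)
        if label is None:
            remaining.append(ticket)
        else:
            handled.append((ticket, label))
    return handled, remaining


tickets = [
    "I forgot my password and can't log in",
    "Why was I charged twice this month?",
    "Please cancel my subscription",
    "The export button does nothing in Safari",
]
handled, remaining = split_by_rules(tickets)
print(len(handled) / len(tickets))  # 0.75 of tickets covered by rules
print(remaining)                    # ['The export button does nothing in Safari']
```

Tracking the coverage figure over time shows how much of the ticket volume actually needs the more expensive model.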

“We want to predict churn.” Logistic regression on five well-chosen features, trained on three years of historical data, typically gets you to 75–85% of the AUC ceiling. The XGBoost model adds two to four points; the neural network adds zero to one. Build the simple model first, decide whether the gap to the achievable ceiling is worth pursuing, and only then commit to the engineering of the more complex model.
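Before committing to the bigger build, you can check whether a two-to-four-point AUC gain is real or just noise in the holdout. The sketch below uses a paired bootstrap over holdout rows. Both models are scored on the same resampled rows each time, so the interval reflects the difference between the models rather than two independent noisy numbers. The function names, the synthetic scores, and the 1,000 resamples are illustrative assumptions. The AUC is the standard rank-based (Mann-Whitney) formulation, which keeps the resampling loop fast.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_gain(base_scores, new_scores, labels, n_boot=1000, level=0.95, seed=0):
    """Paired bootstrap interval for AUC(new) - AUC(base) on the same holdout rows."""
    rng = random.Random(seed)
    n = len(labels)
    gains = []
    while len(gains) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if 0 < sum(sample_labels) < n:  # AUC needs both classes in the resample
            gains.append(
                auc([new_scores[i] for i in idx], sample_labels)
                - auc([base_scores[i] for i in idx], sample_labels)
            )
    gains.sort()
    tail = (1 - level) / 2
    return gains[round(tail * (n_boot - 1))], gains[round((1 - tail) * (n_boot - 1))]


# Illustrative holdout: the challenger sees a little more of the signal than the baseline.
rng = random.Random(42)
signal = [rng.gauss(0, 1) for _ in range(500)]
labels = [1 if s + rng.gauss(0, 1) > 0 else 0 for s in signal]
baseline = [s + rng.gauss(0, 1.0) for s in signal]
challenger = [s + rng.gauss(0, 0.8) for s in signal]
print(bootstrap_auc_gain(baseline, challenger, labels))  # (lower, upper) for the AUC gain
```

If the interval straddles zero, the baseline has not actually been beaten. If it clearly excludes zero, the remaining question is whether the gain pays for the extra engineering and maintenance.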

The discipline of the smaller answer

A consultant who tells you that you do not need their expensive engagement is giving up revenue in the short term. We do it because clients who get told the truth on the small problem come back with the big one, and the big one is sometimes a ₹50 lakh engagement that we would not have been considered for if we had not earned the trust on the ₹2 lakh question.

Before you spend money on an ML engagement, build the SQL-and-spreadsheet baseline. In our experience, about eight times out of ten, the baseline is sufficient.

Of the remaining cases, most are real ML problems and worth solving with real ML. The rest are problems where the data simply does not support a useful prediction, and no amount of model sophistication will save them. The hardest call we make for clients is the second one: telling them that the data they have cannot answer the question they are asking. We make it anyway, in writing, with the analysis behind it.

If you have a problem you have been calling an “AI project” and you would like an honest second opinion on whether it is a SQL problem, a real ML problem, or an unanswerable one, write to us with one paragraph describing it. We will tell you which it is, in writing, within two business days, free.