You bought Copilot licenses. You gave the team ChatGPT or Claude access. You sent a few people to a half-day workshop. Six months on, productivity has not visibly moved.
We have walked into this exact situation a dozen times now, in companies of every size, and the pattern is always the same. The licenses are paid for. The tools are open in browser tabs. People say nice things about them in surveys. And the work is shipping at exactly the same pace it was a year ago.
The problem is rarely that the tools are bad. The problem is that “having access” and “using effectively” are two different states, separated by an investment most teams never make.
What shallow adoption looks like
When we audit a team’s actual AI usage — and we mean audit: pull the analytics out of the admin console, sit beside engineers for an hour, look at the prompt history — the picture is consistent.
Copilot is open in the IDE and produces autocomplete suggestions. Most engineers accept maybe one in five. They use it the way they used IntelliSense in 2014: as a fancy completion of the next line. The number of cases where it generated a function from scratch and the engineer kept that function is low.
ChatGPT is open in a tab. It is used for three things: drafting an email or Slack message, summarizing a meeting transcript or a long document, and explaining an unfamiliar error message or library API. The conversations average three messages.
Claude or Claude Code, if installed, is used for slightly deeper sessions — engineers who try it tend to push the model harder — but is rarely woven into a recurring workflow. It is the chatbot they ask once or twice a week.
In every case, what is missing is the thing that turns a model from a clever assistant into a productivity multiplier: someone on the team has sat down with a specific recurring workflow, figured out how to make the model genuinely good at it, written down the prompts and scaffolding, and changed how the team does the work.
That work is not free. It costs maybe two engineering days to do well for a single workflow. Most teams have not paid that cost for a single workflow yet, which is why they have not seen the gain that justifies the license spend.
Where depth comes from
The teams that do see real gains from AI tools all share one feature: someone — a tech lead, a senior engineer, an internal advocate — owns the integration. They have decided that for their team, AI is going to make a specific workflow demonstrably faster, and they have done the unglamorous work of making that true.
The pattern is roughly:
- Pick one recurring workflow that the team does at least weekly.
- Spend half a day prototyping AI-assisted versions of it. Most attempts will be worse than the human baseline.
- Find one or two attempts that are genuinely better. Write down what made them work — the system prompt, the context bundle, the human review step.
- Package it. A custom slash command, a snippet, an MCP server, an internal doc — whatever the team will actually use.
- Roll it out, watch what happens, tune.
This is the work. It does not look like AI; it looks like writing carefully and integrating tools. But the output is that the team has a reliably good way to do something they used to do laboriously.
Without an owner doing this work, the team gets exactly the productivity gain you would expect from a slightly better autocomplete: real but small.
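One way to keep the prototyping step honest is to record each attempt against the human baseline and keep only the ones that clearly beat it. The sketch below is a minimal illustration in Python. The `Attempt` record, its fields, and the 30-minute baseline in the usage note are hypothetical, not a tool the teams described here use.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One prototype of an AI-assisted version of the workflow (hypothetical record)."""
    name: str
    minutes: float        # time to a reviewed, usable result
    passed_review: bool   # did the output meet the team's normal bar?


def better_than_baseline(attempts: list[Attempt], baseline_minutes: float) -> list[Attempt]:
    """Keep attempts that passed review and were faster than the human baseline."""
    keepers = [
        a for a in attempts
        if a.passed_review and a.minutes < baseline_minutes
    ]
    # Fastest first, so the strongest candidate for packaging is at the top.
    return sorted(keepers, key=lambda a: a.minutes)
```

Called with a handful of attempts and `baseline_minutes=30`, this returns only the attempts worth writing up. Anything that failed review is dropped however fast it was.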
Four examples of the depth gap
We have run this exercise enough times to know which workflows tend to repay the investment. Four examples, in increasing order of integration depth.
// automated test generation
The shallow version: an engineer writes a function, then asks Copilot to “generate tests for this.” Copilot produces three tests that exercise the happy path. The engineer accepts them, commits, moves on.
The deep version: the team has a slash command — /tests-for — that pulls in the function, the surrounding module, the test conventions document, and one or two example test files from elsewhere in the codebase. It produces a test file that uses the team’s actual fixtures, follows the team’s actual naming conventions, includes the edge cases that an experienced reviewer would have asked for, and runs cleanly without manual fixup. The engineer reviews and edits in three minutes instead of writing in twenty.
The difference between the two versions is one afternoon of work by someone who knows the codebase.
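Most of that afternoon goes into deciding what context the model sees. As a rough sketch, assuming the command is backed by a small Python helper, the context bundle could be assembled like this. `PromptContext`, `build_tests_prompt`, and the section titles are illustrative names, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContext:
    """The four context sources named above (hypothetical container)."""
    function_source: str
    module_source: str
    conventions: str
    example_tests: list[str] = field(default_factory=list)


def build_tests_prompt(ctx: PromptContext, max_examples: int = 2) -> str:
    """Assemble one prompt from the context sources, in a fixed order."""
    sections = [
        ("Team test conventions", ctx.conventions),
        ("Surrounding module", ctx.module_source),
        ("Function under test", ctx.function_source),
    ]
    # Cap the examples: one or two real test files are used, not the whole suite.
    for i, example in enumerate(ctx.example_tests[:max_examples], start=1):
        sections.append((f"Example test file {i}", example))

    parts = [f"## {title}\n{body.strip()}" for title, body in sections]
    parts.append(
        "## Task\nWrite a test file for the function under test. "
        "Use the fixtures and naming shown in the examples, and cover edge cases."
    )
    return "\n\n".join(parts)
```

The returned string is what gets sent to whichever model the team uses. The fixed ordering and the cap on examples are design choices for this sketch, and a real command would tune both.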
// code review with the team’s actual conventions
The shallow version: an engineer pastes a diff into ChatGPT and asks “is this code good?” The model responds with bullet points about variable naming and adding type hints. Roughly half of the suggestions are wrong for this codebase.
The deep version: the team has Claude Code (or an equivalent) configured with a CLAUDE.md that names the team’s conventions, the parts of the codebase that are load-bearing and shouldn’t be touched casually, the testing approach, and the deploy story. They have a review-PR command that takes a diff, runs the relevant tests, reads the changed files in context, and produces a review comment that points at the three things a senior engineer on this team would point at.
This first-pass review catches roughly 60–70% of what would have been raised in human review. It does not replace the human reviewer; it makes the human reviewer’s first read much faster, and it catches things that human reviewers consistently miss because they are tired or in a rush. Teams that adopt it report PR cycle time dropping by a third.
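Reading the changed files in context starts with knowing which files the diff touches. A minimal sketch in Python, assuming the diff is in git's default unified format. `changed_files` is a hypothetical helper, not part of Claude Code or any other tool.

```python
def changed_files(diff_text: str) -> list[str]:
    """Return the paths a git-style unified diff touches, in order of appearance."""
    prefix = "+++ b/"
    paths: list[str] = []
    for line in diff_text.splitlines():
        # Git marks the post-change path with "+++ b/<path>".
        # Deleted files show "+++ /dev/null" and are skipped here.
        if line.startswith(prefix):
            path = line[len(prefix):]
            if path not in paths:
                paths.append(path)
    return paths
```

A review command could read each returned path in full and hand it to the model alongside the diff and the team's conventions. This simple line scan is an assumption-laden shortcut: an added source line whose text itself begins with `++ b/` would be misread as a file header, so a production version would parse hunk headers properly.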
// data-pipeline scaffolding
The shallow version: a data engineer starts a new dbt model. They open the file. They use Copilot to autocomplete CTEs. The dbt model takes them an hour and a half to write.
The deep version: the team has a workflow (implemented as a Claude Code agent, or a custom CLI) that takes a description of the desired output, the relevant upstream tables, and the team’s modeling style guide, and produces a full dbt model with tests, schema docs, and a freshness check. The engineer reviews the generated SQL, makes targeted edits, and runs dbt build. The dbt model is in code review in twenty minutes.
The reason this one repays so quickly: dbt models are stylistically homogeneous within a team. Once you have shown the AI model what your team’s dbt models look like (three or four good examples), it can produce the next one to a quality very close to what a senior engineer would produce by hand. The unique thinking is in deciding what the dbt model should do, not in typing it.
// document workflows
The shallow version: someone in operations gets an invoice, opens ChatGPT, pastes the invoice text, asks for a summary, copies the summary into a ticketing system, files the invoice, and moves on.
The deep version: there is no ChatGPT in the loop. The invoice arrives in a shared inbox. A workflow — implemented as a Lambda, or in n8n, or in Make, or as a small bespoke script — picks it up, runs it through a model that has been specifically prompted for the team’s vendors and approval rules, populates the ticketing system, attaches the document, and either auto-approves it (for small recurring vendors below a threshold) or routes it for human review with the relevant context already attached.
This is the workflow integration most teams never reach. It requires plumbing. It requires someone to think about edge cases. It requires monitoring, because the model will occasionally extract the wrong amount and someone has to notice. But once it is in place, the routine cases run at near-zero marginal human cost, and the human attention that used to go into invoice triage goes elsewhere.
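The routing decision at the end of that pipeline is ordinary code, not model output. A minimal sketch in Python follows. The threshold, the vendor list, and the `ExtractedInvoice` fields are all hypothetical, and the line-item cross-check is just one possible guard against a misread amount.

```python
from dataclasses import dataclass
from decimal import Decimal

AUTO_APPROVE_LIMIT = Decimal("500.00")                   # hypothetical threshold
RECURRING_VENDORS = {"Acme Hosting", "City Utilities"}   # hypothetical allowlist


@dataclass(frozen=True)
class ExtractedInvoice:
    """Fields the model is assumed to have extracted from the document."""
    vendor: str
    amount: Decimal
    line_item_total: Decimal  # sum of line items, extracted separately


def route_invoice(inv: ExtractedInvoice) -> tuple[str, str]:
    """Return (decision, reason); decision is 'auto_approve' or 'human_review'."""
    # Cheap guard against a wrong extraction: the total must match the line items.
    if inv.amount != inv.line_item_total:
        return ("human_review", "total does not match line items")
    if inv.amount <= 0:
        return ("human_review", "non-positive amount")
    if inv.vendor not in RECURRING_VENDORS:
        return ("human_review", "vendor is not on the recurring list")
    if inv.amount >= AUTO_APPROVE_LIMIT:
        return ("human_review", "amount at or above auto-approve limit")
    return ("auto_approve", "small recurring vendor")
```

Every path that is not clearly safe falls through to human review, and the reason string travels with the ticket so the reviewer sees why it was routed. Logging each decision is what makes the monitoring practical.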
What we do in training engagements
The reason teams stall at shallow adoption is not laziness or lack of skill. It is that nobody has been given the time, the budget, and the explicit charter to build the integrations. The team is shipping product. The team’s leadership has signed off on the licenses but has not signed off on someone spending two weeks building internal tooling around them.
Our training engagements exist to close that loop. We come in with a working knowledge of the tools — Copilot, Claude Code, Cursor, the major MCP servers, the workflow runners — and the discipline of someone who has done the workflow-integration work many times. Over a two- or three-day engagement, we sit with your team, pick three or four recurring workflows that look promising, and build the deep version of each one with your team in the room.
What you get out of the engagement is not certifications. It is three or four AI-integrated workflows that survive after we leave, owned by named people on your team, with the documentation, the prompts, and the scaffolding that make them durable. You also get an internal advocate or two, people who usually emerge naturally from the cohort, who have now seen what the deep version of AI adoption looks like and can build the next one without us.
If the Copilot licenses you are paying for are not producing visible results, the issue is almost never the tool. It is the missing layer of integration work.
We are happy to come in and do that integration work with your team, and to leave you with the patterns to do it yourselves. Book a discovery call.