AI hits classic SaaS reality: limits, capacity squeeze, and "run it at night"

Get A Free Assessment

Leave your phone number and we will contact you!

Or you can call us:

+1 (516) 942-2021

By clicking the button you agree to our Privacy Policy

AI & SaaS

AI hits classic SaaS reality: limits, capacity squeeze, and "run it at night"

AI & SaaS

AI hits classic SaaS reality: limits, capacity squeeze, and "run it at night"

3 min read

We are starting to run into service availability problems. What SaaS startups went through in their time is starting to show up in the current period of AI hype as well. If you cannot estimate how many tokens you will need to complete a task — because your revision cycles are not under your control and cannot be predicted — the problem can grow into systematic budget or schedule overruns. And that is before anyone even starts implementing Huang's advice about $250,000 worth of tokens for each developer.

Anthropic changed how Claude limits work so that during peak hours a "five-hour session" is consumed faster, and off-peak it is consumed more slowly. This does not look like a direct "they cut the limit," because the weekly limits did not formally change — what changed is the burn rate at different times of day.

The key window is weekday peak hours: 05:00–11:00 PT (13:00–19:00 GMT). In that window, users can burn through their session limit faster than before; but in the remaining hours, the same plan lets you get more work done. For Europe, that is exactly the unpleasant part of the working day.

The change, according to Anthropic employee Tarik Shihipar, will affect about 7% of users — primarily on paid plans, especially those who run token-heavy background tasks. The recommendation is predictable: move such tasks to off-peak hours, where sessions stretch further.

Why is this even possible? Because subscription plans (Free/Pro/Max) describe limits as five-hour windows and a weekly budget, but Anthropic does not disclose the exact number of tokens inside those limits: consumption depends on the length and complexity of the dialogue, the functions used, the chosen model, and the plan. As a result, the limit is effectively elastic — during peak hours the company provides less compute for the same session, and off-peak, more.

It is also worth noting a known tendency toward AI degradation when models are trained further on their own generated content. In the process of writing code and training on the results, that is exactly what happens. The more degradation, the more tokens are consumed.

Sergei Isakov | COO, Co-Founder