Microsoft Graph API Rate Limits: What Every Enterprise Architect Needs to Know
If you are building a Slack-to-Teams bridge, a compliance archive pipeline, or any system that writes to Microsoft Teams at volume, you will eventually hit a Graph API rate limit. The experience is always the same: messages start arriving out of order, some are silently dropped, and the investigation takes a day you did not budget.
This post is the documentation Microsoft does not provide in one place.
The Three Rate Limit Tiers
Graph API rate limiting operates on three distinct tiers, each with different throttling behavior:
Per-app-per-tenant limits are the most common throttling vector for enterprise integrations. When a single Azure AD application makes too many requests against a single tenant within a rolling time window, Microsoft returns HTTP 429 with a Retry-After header.
Per-tenant limits apply to the total traffic against a tenant regardless of which application is generating it. In practice, this matters during migrations — when multiple integration tools are simultaneously reading and writing to the same tenant, aggregate traffic can exceed the per-tenant ceiling even if each individual app is well within its own limits.
Service-level limits are global Microsoft back-end throttles that activate during infrastructure incidents or unexpected regional load spikes. These are non-deterministic and cannot be architected around — the only mitigation is exponential backoff with jitter.
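The backoff-with-jitter mitigation can be sketched in a few lines. This uses the "full jitter" variant; the base delay and cap values here are illustrative choices, not Microsoft-documented numbers:

```typescript
// Exponential backoff with full jitter: the delay for attempt n is a uniform
// random value in [0, min(cap, base * 2^n)]. Randomizing over the full range
// prevents many workers from retrying in lockstep.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```

Attempt 0 waits up to 1 s, attempt 5 up to 32 s, and every later attempt is clamped at the 60 s cap.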
Specific Limits for Messaging Operations
For Teams messaging operations specifically, the documented limits are:
- Send message to channel: 4 requests per second per tenant (as of 2026)
- Send message to chat: 10 requests per second per app per tenant
- List channel messages: 30 requests per 30 seconds
- Get channel message replies: 30 requests per 30 seconds
- Delta query (message sync): 10 requests per 10 seconds
These numbers sound generous until you run the math on a 10,000-employee enterprise where the Slack-to-Teams bridge is routing messages from 200 active channels simultaneously.
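To make that math concrete: at a 4 requests/second tenant-wide ceiling, a single burst of one message from each of those 200 channels takes 50 seconds to drain, which leaves almost no headroom before the next minute's traffic arrives.

```typescript
// Back-of-envelope drain time for a burst of messages against a fixed
// per-second ceiling: 200 messages at 4 req/s takes 50 seconds.
function drainSeconds(messages: number, ratePerSecond: number): number {
  return messages / ratePerSecond;
}
```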
The Silent Drop Problem
The dangerous failure mode is not the HTTP 429 response itself — that is handleable. The dangerous failure mode is when the calling code treats a 429 as a terminal error and discards the request rather than retrying it.
We have seen production integrations where a library's internal retry logic was silently failing: the retry was firing, but the payload had already been evicted from the in-memory queue (and garbage-collected) before the retry timer fired.
The result: perfectly healthy HTTP exchange logs (429 → retry → 200) but missing messages in Teams channels, with no application-level error emitted.
Correct Retry Architecture
The correct pattern for Graph API messaging at enterprise scale:
```typescript
interface MessageJob {
  teamId: string;
  channelId: string;
  payload: SendMessagePayload;
  retries: number;
  nextRetryAt: number;
}

async function sendWithRetry(job: MessageJob): Promise<void> {
  try {
    await graphClient
      .api(`/teams/${job.teamId}/channels/${job.channelId}/messages`)
      .post(job.payload);
  } catch (err: any) {
    // The Graph client throws on non-2xx responses rather than returning
    // them, so throttling is handled in the catch path.
    if (err.statusCode !== 429) throw err;

    // Retry-After is expressed in seconds; default conservatively when the
    // header is unavailable (how headers are surfaced varies by client version).
    const retryAfter = Number(err.headers?.get('Retry-After') ?? 5);
    const jitter = Math.random() * 2000; // up to 2s of jitter

    if (job.retries < MAX_RETRIES) {
      await persistentQueue.schedule({
        ...job,
        retries: job.retries + 1,
        nextRetryAt: Date.now() + retryAfter * 1000 + jitter,
      });
    } else {
      await deadLetterQueue.push(job);
      metrics.increment('graph_api.dead_letter');
    }
  }
}
```
Three things are non-negotiable:
- Persistent queue — not in-memory. If the process restarts during a throttle window, the message must survive.
- Dead letter queue — after MAX_RETRIES, the message is preserved for manual review, not silently dropped.
- Jitter — without randomization on the retry delay, all workers in a distributed system will fire their retries simultaneously, causing a retry storm that re-triggers the throttle immediately.
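The persistent queue's scheduling contract can be sketched with an in-memory stand-in (the `schedule`/`popDue` names are illustrative; a real deployment backs this with Redis, a database, or a durable message broker precisely so jobs survive restarts):

```typescript
interface QueuedJob {
  payload: unknown;
  retries: number;
  nextRetryAt: number; // epoch ms; job is held until this time passes
}

// Minimal scheduler facade: only jobs whose nextRetryAt has passed are
// handed to the dispatch worker; everything else stays queued.
class RetryQueue {
  private jobs: QueuedJob[] = [];

  schedule(job: QueuedJob): void {
    this.jobs.push(job);
  }

  popDue(nowMs: number): QueuedJob[] {
    const due = this.jobs.filter((j) => j.nextRetryAt <= nowMs);
    this.jobs = this.jobs.filter((j) => j.nextRetryAt > nowMs);
    return due;
  }
}
```

A dispatch worker polls `popDue` on a short interval and calls the send function for each returned job; anything still throttled is re-scheduled with an updated `nextRetryAt`.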
Burst Handling: Token Bucket vs Leaky Bucket
For sustained throughput, a leaky bucket implementation that drips requests at a constant rate is safer than a token bucket for Teams integration. A token bucket allows instantaneous bursts up to the bucket capacity — which is exactly the behavior that triggers throttling.
The leaky bucket approach smooths traffic, accepting that individual messages may be delayed by 200–500ms in exchange for never generating a throttle event.
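A leaky bucket scheduler reduces to tracking the next free send slot. This sketch returns the delay a caller must wait, with a 250 ms interval corresponding to the 4 req/s channel-message ceiling:

```typescript
// Leaky bucket: requests drain at a fixed rate regardless of arrival bursts.
// acquire() returns the delay (ms) the caller must wait before sending, so a
// burst of calls is spread one interval apart instead of hitting the API at once.
class LeakyBucket {
  private nextSlotMs = 0;

  constructor(private readonly intervalMs: number) {} // 250ms ≈ 4 req/s

  acquire(nowMs: number): number {
    const slot = Math.max(this.nextSlotMs, nowMs);
    this.nextSlotMs = slot + this.intervalMs;
    return slot - nowMs; // 0 when a slot is immediately free
  }
}
```

Three simultaneous calls yield delays of 0, 250, and 500 ms — the 200–500 ms latency cost described above.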
At SyncRivo, we implement an adaptive rate controller that measures actual 429 frequency over a 60-second rolling window and dynamically adjusts the send rate, tightening headroom when throttle events increase and expanding when the window clears.
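SyncRivo's controller is not public, but the idea can be shown in toy form (the 25% step and 4x cap are invented tuning parameters, not production values):

```typescript
// Adaptive rate control sketch: track 429s in a 60s rolling window and widen
// the send interval while throttle events persist, easing back to the base
// rate once the window clears.
class AdaptiveRate {
  private throttleEvents: number[] = [];

  constructor(private readonly baseIntervalMs: number, private readonly windowMs = 60_000) {}

  record429(nowMs: number): void {
    this.throttleEvents.push(nowMs);
  }

  intervalMs(nowMs: number): number {
    this.throttleEvents = this.throttleEvents.filter((t) => nowMs - t < this.windowMs);
    // Each recent 429 adds 25% headroom, capped at 4x the base interval.
    const factor = Math.min(4, 1 + 0.25 * this.throttleEvents.length);
    return this.baseIntervalMs * factor;
  }
}
```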
Monitoring Recommendations
Add these to your observability stack before you go live on any Graph API integration:
- graph_api.throttle_rate — percentage of requests returning 429 over 5m windows
- message_queue.depth — messages waiting to be dispatched; should be near zero at steady state
- dead_letter_queue.depth — any growth here requires immediate investigation
- p99_end_to_end_latency — time from Slack message received to Teams message confirmed
A p99 above 3,000ms during normal load is an early signal that your send rate is approaching the throttle ceiling.
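The throttle-rate metric itself is simple to emit; this sketch accumulates one 5-minute bucket (the `record`/`ratePct` names are illustrative, not a specific metrics library's API):

```typescript
// graph_api.throttle_rate: percentage of requests in the current window that
// returned 429. reset() is called at each 5-minute bucket boundary.
class ThrottleRate {
  private total = 0;
  private throttled = 0;

  record(status: number): void {
    this.total += 1;
    if (status === 429) this.throttled += 1;
  }

  ratePct(): number {
    return this.total === 0 ? 0 : (100 * this.throttled) / this.total;
  }

  reset(): void {
    this.total = 0;
    this.throttled = 0;
  }
}
```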
See SyncRivo's engineering approach to rate limit handling → | Read the Teams architecture deep dive →