
Designing APIs That Scale: 150+ Project Lessons

Patterns for scalable APIs: versioning, pagination, rate limiting and idempotency. Lessons from 150+ production projects.

Javier Manzano
CEO & Co-founder • June 28, 2026

After more than 150 projects with APIs in production, we have identified patterns that repeat over and over. These are not academic theories or best practices from a tutorial. They are lessons learned in the field, sometimes from production incidents, sometimes from painful refactorings we could have avoided with better initial decisions.

This article compiles the API design principles we have distilled at Soamee over years of building systems that process millions of daily requests, from the digital certification API for eEvidence to the advertising data API for InfoAdex and the integration APIs for Cawa and Xceed.

Principle 1: Design the contract before writing code

The most common error we see is starting to write code without a defined API contract. The result: inconsistent endpoints, different response formats depending on who programmed each part, and a chaotic integration experience.

We follow the API-first approach:

  1. Discovery workshop with stakeholders: who needs what data, how frequently, and which operations are required
  2. Complete OpenAPI 3.1 specification: endpoints, request/response schemas, error codes, examples
  3. Review with consumers: the frontend team or integrators validate the contract before development
  4. Development against the spec: code is automatically validated against the specification in CI

This approach reduces integration time by 40% and eliminates post-development “that is not what I expected” moments.
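
As a rough illustration of step 4, the sketch below compares a response payload against a schema fragment and reports every mismatch. It is a deliberately tiny stand-in for a real OpenAPI validator: USER_SCHEMA, JSON_TYPES and contract_violations are hypothetical names, and only required fields and primitive types are checked.

```python
# Hypothetical schema fragment, shaped like an OpenAPI "schema" object.
USER_SCHEMA = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "name": {"type": "string"},
    },
}

# Maps the JSON type names used above to Python types.
JSON_TYPES = {"integer": int, "string": str, "boolean": bool}


def contract_violations(payload: dict, schema: dict) -> list[str]:
    """Return human-readable differences between a payload and the schema."""
    problems = []
    for field in schema.get("required", []):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, value in payload.items():
        spec = schema["properties"].get(field)
        if spec is None:
            problems.append(f"undocumented field: {field}")
            continue
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or is_stray_bool:
            problems.append(f"wrong type for {field}: expected {spec['type']}")
    return problems
```

In a CI job, a test suite could call a check like this on recorded responses and fail the build when the returned list is not empty.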

Principle 2: Correct pagination makes the difference

We have seen APIs in production with endpoints that return arrays of 50,000 elements without pagination. The server handles it… until it does not.

Cursor-based vs offset-based pagination

Offset-based pagination (?page=5&limit=20) is simple but has serious problems when data changes between requests: duplicated or lost records if items are inserted or deleted.

Cursor-based pagination (?after=cursor_abc&limit=20) solves these problems and is more efficient on large databases (the database seeks directly to the cursor position through an index instead of scanning and discarding every skipped row):

{
  "data": [...],
  "pagination": {
    "has_next": true,
    "next_cursor": "eyJpZCI6MTIzNH0=",
    "has_prev": true,
    "prev_cursor": "eyJpZCI6MTIwMH0="
  }
}

In InfoAdex, with over 55 million records, cursor-based pagination was the difference between 2-second queries (offset with large OFFSET) and 50ms queries (cursor doing a WHERE id > last_seen_id).
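
A minimal sketch of the cursor approach, using an in-memory SQLite table as a stand-in for the real database. The records table, the fetch_page function and the choice of a base64-encoded JSON cursor are assumptions for illustration; the cursor format simply mirrors the next_cursor value in the sample response.

```python
import base64
import json
import sqlite3


def encode_cursor(last_id: int) -> str:
    """Pack the last seen id into an opaque, URL-safe cursor."""
    raw = json.dumps({"id": last_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]


def fetch_page(conn, after=None, limit=20):
    last_id = decode_cursor(after) if after else 0
    # Ask for one extra row: if it comes back, there is a next page.
    rows = conn.execute(
        "SELECT id, name FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit + 1),
    ).fetchall()
    has_next = len(rows) > limit
    rows = rows[:limit]
    return {
        "data": [{"id": r[0], "name": r[1]} for r in rows],
        "pagination": {
            "has_next": has_next,
            "next_cursor": encode_cursor(rows[-1][0]) if has_next else None,
        },
    }


# Usage: five rows, read two at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO records (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1, 6)],
)
first = fetch_page(conn, limit=2)
second = fetch_page(conn, after=first["pagination"]["next_cursor"], limit=2)
```

The query cost depends on the page size rather than on how deep into the dataset the client has paged, because the primary key index locates the starting row directly.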

Rule: always define a maximum limit

Even with pagination, always define a max_limit. If the client requests ?limit=1000000, your API should respond with the maximum allowed (e.g., 100) and a header indicating truncation. Never trust that the consumer will be reasonable.
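
The rule can be sketched as a small helper that clamps the requested page size. The default of 20 and the X-Limit-Truncated header name are assumptions; the text only specifies a maximum (100 in its example) and "a header indicating truncation".

```python
DEFAULT_LIMIT = 20  # assumed default when the client sends no limit
MAX_LIMIT = 100     # example maximum from the text


def resolve_limit(requested):
    """Return (effective_limit, extra_headers) for a requested page size."""
    if requested is None:
        return DEFAULT_LIMIT, {}
    limit = max(1, min(requested, MAX_LIMIT))
    headers = {}
    if requested > MAX_LIMIT:
        # Hypothetical header name; any documented header would do.
        headers["X-Limit-Truncated"] = f"max={MAX_LIMIT}"
    return limit, headers
```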

Principle 3: Idempotency or you will regret it

In distributed systems, requests can be retried (network timeouts, proxy failures, client retries). If your POST /payments endpoint is not idempotent, a retry can generate a duplicate charge.

The solution: Idempotency Keys.

The client sends an Idempotency-Key: unique-request-id header. The server:

  1. Checks if it already processed that key
  2. If yes: returns the original stored response
  3. If no: processes the request and stores the response alongside the key

This pattern was popularized by Stripe, and we apply it to all endpoints with side effects (create, update, delete). Idempotency keys expire after 24-48 hours.
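
The three server steps can be sketched with an in-memory store. IdempotencyStore and create_payment are hypothetical names, and this version is not safe under concurrent requests: a production store would need an atomic "insert if absent" (for example in a database or Redis) so that two simultaneous retries cannot both run the handler.

```python
import time


class IdempotencyStore:
    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._responses = {}  # key -> (stored_at, response)

    def run(self, key, handler):
        """Run handler once per key; replay the stored response on retries."""
        now = self._clock()
        entry = self._responses.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # already processed: return the original response
        response = handler()
        self._responses[key] = (now, response)
        return response


# Usage: the retry does not create a second charge.
charges = []


def create_payment():
    charges.append(50)
    return {"status": 201, "payment_id": len(charges)}


store = IdempotencyStore()
first = store.run("unique-request-id", create_payment)
retry = store.run("unique-request-id", create_payment)
```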

In the eEvidence project, idempotency was critical: a digital certificate cannot be issued twice for the same evidence, even if the client retries due to a network timeout.

Principle 4: Multi-dimensional rate limiting

Rate limiting is not just “100 requests per minute.” In real APIs you need multiple dimensions:

  • Per user/API key: Consumer’s global limit
  • Per endpoint: Expensive endpoints (reports, exports) with lower limits
  • Per tier: Free (100 req/min), Pro (1000 req/min), Enterprise (10000 req/min)
  • Per IP: Abuse protection even without authentication

We use the token bucket algorithm with sliding windows implemented in Redis. Response headers always include:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1719302400
Retry-After: 30

Retry-After only appears when the limit has been exceeded (HTTP 429). This allows clients to implement automatic backoff without guessing how long to wait.
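
A minimal, single-process sketch of a plain token bucket that produces these headers. It leaves out Redis, the sliding-window part and X-RateLimit-Reset, and the TokenBucket class with its parameters is hypothetical. One bucket per dimension (API key, endpoint, IP) could be kept in a dictionary keyed by that dimension.

```python
import math
import time


class TokenBucket:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def check(self):
        """Consume one token if available; return (allowed, headers)."""
        now = self._clock()
        elapsed = now - self._updated
        # Refill proportionally to elapsed time, never above capacity.
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        self._updated = now
        allowed = self._tokens >= 1
        if allowed:
            self._tokens -= 1
        headers = {
            "X-RateLimit-Limit": str(self.capacity),
            "X-RateLimit-Remaining": str(int(self._tokens)),
        }
        if not allowed:
            # Seconds until one full token is available again (sent with a 429).
            wait = (1 - self._tokens) / self.refill_per_second
            headers["Retry-After"] = str(math.ceil(wait))
        return allowed, headers
```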

Principle 5: Errors that help, not frustrate

A generic error {"error": "Bad request"} is useless for an integrator. Our standard error format:

{
  "error": {
    "code": "VALIDATION_FAILED",
    "message": "The request body contains validation errors",
    "details": [
      {
        "field": "email",
        "code": "INVALID_FORMAT",
        "message": "Must be a valid email address",
        "value": "not-an-email"
      },
      {
        "field": "amount",
        "code": "OUT_OF_RANGE",
        "message": "Must be between 1 and 1000000",
        "value": 0
      }
    ],
    "request_id": "req_abc123def456",
    "documentation_url": "https://docs.api.com/errors/VALIDATION_FAILED"
  }
}

Each error includes:

  • code: Machine-parseable code (integrators can do switch/case)
  • message: Human description for debugging
  • details: Array with specific field errors
  • request_id: Unique ID to correlate with server logs
  • documentation_url: Direct link to error documentation
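
One way to produce this format is sketched below. The validate_payment and validation_error helpers are hypothetical, the email check is deliberately loose, and the rules simply mirror the two sample details shown in the JSON example.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose check, not RFC-complete


def field_error(field, code, message, value):
    return {"field": field, "code": code, "message": message, "value": value}


def validate_payment(body):
    """Collect every field error instead of stopping at the first one."""
    details = []
    email = body.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        details.append(field_error(
            "email", "INVALID_FORMAT", "Must be a valid email address", email))
    amount = body.get("amount")
    if not isinstance(amount, int) or not 1 <= amount <= 1000000:
        details.append(field_error(
            "amount", "OUT_OF_RANGE", "Must be between 1 and 1000000", amount))
    return details


def validation_error(details, request_id):
    """Return (http_status, body) in the standard error format."""
    code = "VALIDATION_FAILED"
    return 422, {
        "error": {
            "code": code,
            "message": "The request body contains validation errors",
            "details": details,
            "request_id": request_id,
            "documentation_url": f"https://docs.api.com/errors/{code}",
        }
    }
```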

HTTP status codes follow the standard strictly: 400 for malformed requests, 401 for missing or invalid authentication, 403 for insufficient permissions, 404 for not found, 409 for conflicts, 422 for semantic validation failures, 429 for rate limiting, and 500 for unexpected server errors.

Principle 6: Evolve without breaking

Versioning is inevitable. Your API will change. The question is how to manage it without breaking existing integrations.

Our evolution strategy:

  1. Additive changes without new version: Adding optional fields, new endpoints, new enum values. These are safe as long as clients ignore unknown fields and tolerate unknown enum values.

  2. Breaking changes = new major version: Removing fields, changing types, renaming endpoints, modifying behavior. Only in /v2/, /v3/…

  3. 12-month deprecation period: When a version is deprecated, consumers have 12 months to migrate. Deprecation and Sunset headers communicate dates.

  4. Automated changelog: Each deploy generates a changelog describing what changed, with links to migration documentation.

  5. Feature flags, not versions: For optional functionality (new response format, new sorting algorithm), we use opt-in headers instead of creating new versions.

In Xceed, we maintain v1 and v2 of the ticketing API in parallel during partner migration. Over 200 integrators cannot migrate overnight.
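
As a sketch of how the Deprecation and Sunset headers from step 3 might be built: the function below assumes the formats defined in RFC 9745 (Deprecation as an "@" followed by a Unix timestamp) and RFC 8594 (Sunset as an HTTP date). The deprecation_headers name, the dates and the migration URL are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_at, sunset_at, migration_url):
    """Build response headers announcing a deprecated API version.

    Both datetimes must be timezone-aware and in UTC.
    """
    return {
        # RFC 9745: structured-field date, i.e. "@" + Unix timestamp.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # RFC 8594: HTTP-date after which the version may stop responding.
        "Sunset": format_datetime(sunset_at, usegmt=True),
        # Link to the migration guide, using the "deprecation" relation.
        "Link": f'<{migration_url}>; rel="deprecation"',
    }


# Usage: a 12-month window between deprecation and sunset.
headers = deprecation_headers(
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    datetime(2027, 1, 1, tzinfo=timezone.utc),
    "https://example.com/docs/migrate-to-v2",  # hypothetical URL
)
```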

Principle 7: Design for observability

Each request generates a unique request_id that travels through the entire system (distributed tracing). This enables:

  • Correlating an error reported by the client with server logs
  • Following a request across multiple microservices
  • Measuring latency of each component in the chain
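
A minimal sketch of the propagation step, assuming the ID travels in an X-Request-ID header (a common convention, not something the text specifies). Each service reuses the incoming ID when there is one and attaches it to every downstream call; the helper names are hypothetical.

```python
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name


def ensure_request_id(incoming_headers):
    """Reuse the caller's request ID if present, otherwise create one."""
    existing = incoming_headers.get(REQUEST_ID_HEADER)
    return existing or f"req_{uuid.uuid4().hex[:12]}"


def outgoing_headers(request_id, extra=None):
    """Headers for a downstream call, carrying the same request ID."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = request_id
    return headers
```

Logging the same value on every log line is what makes the correlation in the first bullet possible.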

Additionally, we instrument each endpoint with metrics:

  • Latency (p50, p95, p99)
  • Error rate by status code
  • Requests per second
  • Response size
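
To make the latency percentiles concrete, here is a small nearest-rank calculation over raw samples. Real monitoring systems usually approximate percentiles from histograms instead; the function names are hypothetical and p is assumed to be an integer between 1 and 100.

```python
def percentile(samples, p):
    """Nearest-rank percentile for a non-empty list and integer p in 1..100."""
    ordered = sorted(samples)
    # Integer form of ceil(p * n / 100), avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Usage: 98 fast requests and 2 very slow ones.
summary = latency_summary([10] * 98 + [900, 950])
```

In this sample the two slow requests leave p50 and p95 at 10 ms and only show up in p99, which is why tail percentiles are the ones worth alerting on.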

Grafana dashboards raise automatic alerts when an endpoint degrades. On more than one occasion, p99 latency alerts have let us detect problems before users reported them.

Principle 8: Protect against yourself

The most dangerous APIs are not those attacked by third parties, but those an integrator uses incorrectly (or that your own frontend abuses without realizing).

Protections we always implement:

  • Strict timeouts: Each operation has a timeout. If the database does not respond in 5s, we return 503 instead of blocking the thread indefinitely.
  • Circuit breakers: If an external service fails, we stop calling it temporarily instead of accumulating timeouts.
  • Request size limits: Maximum 10MB per request. Large uploads go via signed URLs direct to S3.
  • Query complexity limits: In GraphQL, queries with depth > 5 or cost > 1000 are automatically rejected.
  • Mandatory pagination: List endpoints have no option to return everything. Always paginated.
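
Of these protections, the circuit breaker is the least obvious to implement, so here is a minimal single-threaded sketch. The CircuitBreaker class, its threshold of 3 failures and its 30-second cool-down are illustrative assumptions, not values from our projects.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of each waiting for its own timeout against a service that is already known to be down.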

Principle 9: Webhooks as first-class citizens

In modern APIs, webhooks are as important as endpoints. Integrators should not have to poll to find out whether something changed.

Our webhook design includes:

  • Automatic retry with exponential backoff (up to 72 hours of retries)
  • HMAC signature on each payload so the receiver can verify authenticity
  • Idempotency: The receiver may receive the same event more than once (due to retries). We include a unique event_id.
  • Delivery logs: Dashboard where the integrator sees delivery history, failures, and retries
  • Test mode: Endpoint to simulate event delivery without real actions
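
The signature and idempotency bullets can be sketched from the receiver's side. This assumes HMAC-SHA256 over the raw request body with a shared secret, sent as a hex string; the text does not specify the hash algorithm or header, and WebhookReceiver is a hypothetical in-memory example.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Sender side: hex HMAC-SHA256 of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    expected = sign_payload(secret, body)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


class WebhookReceiver:
    def __init__(self, secret: bytes):
        self._secret = secret
        self._seen_event_ids = set()
        self.processed = []

    def handle(self, body: bytes, signature: str) -> int:
        """Return the HTTP status the receiver would answer with."""
        if not verify_signature(self._secret, body, signature):
            return 401
        event = json.loads(body)
        if event["event_id"] in self._seen_event_ids:
            return 200  # duplicate delivery: acknowledge without reprocessing
        self._seen_event_ids.add(event["event_id"])
        self.processed.append(event)
        return 200
```

Verifying against the raw bytes matters: parsing and re-serializing the JSON before hashing can change whitespace or key order and make a valid signature fail.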

Principle 10: Documentation IS the API

An API without documentation does not exist. But documentation that becomes outdated is worse than no documentation, because it generates false confidence.

Our documentation stack:

  • OpenAPI 3.1 as source of truth: Everything is generated from here
  • Redoc or Swagger UI: Auto-generated interactive documentation
  • Postman collection: Automatic export for manual testing
  • Auto-generated SDKs: TypeScript, Python, Go from the OpenAPI spec
  • Quick-start guides: In under 5 minutes, the integrator makes their first call
  • Living changelog: Automatically updated with each deploy

Documentation is validated in CI: if code does not match the OpenAPI spec, the deploy fails. This guarantees documentation is always up to date.

Conclusion: APIs are contracts, not endpoints

Designing APIs that scale is not exclusively a technical problem. It is a product design problem: your API is a product that other developers consume. The integration experience determines whether those developers (internal or external) are productive or frustrated.

The 10 principles we have described are not optional for serious APIs. They are the baseline that every team should implement before exposing an endpoint to the world. The difference between a mediocre API and an excellent one is not in the underlying technology, but in the attention to detail in contract design, error handling, documentation, and production operations.

If you want us to review your current API architecture or design one from scratch with these principles, we offer free technical consulting where we analyze your case and propose concrete improvements.
