PUBLISHED: MAY 2026
#fastapi#next.js#redis#system design

How I Built JanSamadhan: Serving 300+ Citizens with 99.9% Uptime

A deep dive into the engineering choices, performance bottlenecks, and database optimizations required to run a live municipal service pilot.

// summary (tldr)

How proper database indexing, connection pooling, and asynchronous task offloading kept a civic tech pilot performant under peak load in mock stress tests.

The Reality of Civic Deployments

Many software engineering tutorials show how to build clean dashboards or make simple mock APIs. However, when you deploy a system to be used by actual people in a live environment, you face real-world constraints: database crashes, webhook payload spikes, resource exhaustion, and memory leaks.

JanSamadhan is a civic grievance routing system deployed as a live pilot in a Delhi ward. Serving 300+ registered citizens and routing hundreds of municipal complaints required a highly resilient backend architecture.

This article details the optimizations implemented to ensure 99.9% sustained uptime and stable performance under peak load.

---

1. Webhook Concurrency and Event Queueing

In a civic system, user actions are bursty. When a localized incident occurs (e.g., a power line sparks or a local road collapses), reports are sent in rapid succession.

Initially, our FastAPI endpoint received the webhook request from the WhatsApp Cloud API, queried the database to find duplicates, contacted the LLM API, and sent a confirmation message back, all within a single request-response lifecycle. This resulted in:

  • High response latencies (>2.5 seconds).
  • Database connection pool exhaustion under concurrent requests.
  • Connection timeouts from the WhatsApp webhook gateway.

To solve this, we decoupled the reception from the processing:

  • FastAPI acts as a thin routing layer. When a webhook hits, it validates the signature, pushes the raw body to Redis, and returns `200 OK` immediately (taking <15ms).
  • A background worker pool (implemented via an async process queue) consumes messages from Redis, processes them, calls the LLM for routing taxonomy, and writes to the DB asynchronously.

This shift isolated the ingestion API, allowing us to absorb huge spikes in user traffic without database pressure.
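
The split between ingestion and processing described above might look roughly like the following sketch. It is not the production code: an in-process `asyncio.Queue` stands in for Redis so the example runs on its own, and `APP_SECRET`, `ingest`, `worker`, `handle`, and `demo` are hypothetical names. The signature check assumes the `sha256=<hex digest>` value that Meta webhooks send in the `X-Hub-Signature-256` header.

```python
import asyncio
import hashlib
import hmac

APP_SECRET = b"example-app-secret"  # hypothetical; load the real secret from config


def signature_is_valid(raw_body: bytes, header: str, secret: bytes = APP_SECRET) -> bool:
    """Compare the HMAC-SHA256 of the raw body with the signature header."""
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())


async def ingest(raw_body: bytes, header: str, queue: asyncio.Queue) -> int:
    """Thin ingestion step: verify, enqueue, and return an HTTP status code."""
    if not signature_is_valid(raw_body, header):
        return 403
    await queue.put(raw_body)  # stand-in for the push to Redis
    return 200


async def worker(queue: asyncio.Queue, handle) -> None:
    """Consume queued payloads; `handle` does the slow work (LLM call, DB write)."""
    while True:
        raw_body = await queue.get()
        try:
            await handle(raw_body)
        finally:
            queue.task_done()


async def demo() -> list:
    """Send one validly signed and one badly signed payload through the pipeline."""
    queue = asyncio.Queue()
    processed = []

    async def handle(raw_body: bytes) -> None:
        processed.append(raw_body)

    task = asyncio.create_task(worker(queue, handle))
    body = b'{"text": "streetlight out"}'
    good = "sha256=" + hmac.new(APP_SECRET, body, hashlib.sha256).hexdigest()
    statuses = [await ingest(body, good, queue), await ingest(body, "sha256=bad", queue)]
    await queue.join()  # wait until the worker has drained the queue
    task.cancel()
    return [statuses, processed]
```

Running `asyncio.run(demo())` accepts the correctly signed payload and rejects the other one. The point of the design is that `ingest` never waits on the slow work: the request path only verifies and enqueues, and the worker absorbs the latency.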

---

2. SQL Optimization and Spatial Indexes

With location-aware features like PostGIS radius matching, query optimization is paramount. Checking for duplicates within 50 meters of every new grievance means comparing its coordinates against those of existing records.

Without a spatial index, an `ST_DWithin` query scans the entire table sequentially. At a few hundred rows, that is fast. At thousands of entries, it becomes a bottleneck.

We added a spatial index (GiST) on our geometry column:

    CREATE INDEX idx_grievances_geom ON grievances USING GIST (geom);

This transformed sequential scans into index scans, reducing the duplicate checking query time from 240ms to 8ms.
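
For illustration, here is one way the 50-meter duplicate check could be expressed as a parameterized query. This is a sketch under assumptions, not the project's actual query: it assumes `geom` is stored in a projected SRID whose unit is meters (EPSG:32643, the UTM zone that covers Delhi, is used as an example), and the `id` column, `STORAGE_SRID`, and `duplicate_query` are hypothetical names. The `:name` placeholders follow the bind-parameter style used by SQLAlchemy's `text()`.

```python
from typing import Any, Dict, Tuple

# Assumption: `geom` is stored in a projected SRID whose unit is meters.
# EPSG:32643 (UTM zone 43N, which covers Delhi) is used here as an example.
STORAGE_SRID = 32643

# The incoming point arrives as WGS 84 longitude/latitude (SRID 4326) and is
# transformed into the storage SRID so the radius is measured in meters.
DUPLICATE_SQL = """
SELECT id
FROM grievances
WHERE ST_DWithin(
    geom,
    ST_Transform(ST_SetSRID(ST_MakePoint(:lon, :lat), 4326), :srid),
    :radius_m
)
"""


def duplicate_query(lat: float, lon: float, radius_m: float = 50.0) -> Tuple[str, Dict[str, Any]]:
    """Return SQL text and bound parameters for the radius-based duplicate check."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    params = {"lat": lat, "lon": lon, "srid": STORAGE_SRID, "radius_m": radius_m}
    return DUPLICATE_SQL, params
```

If `geom` were instead stored as longitude/latitude in SRID 4326, `ST_DWithin` on the geometry type would interpret the radius in degrees rather than meters, so the column type or the query would need to change. Prefixing the query with `EXPLAIN ANALYZE` is a quick way to confirm that the planner actually uses the index.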

---

3. Advanced Connection Pooling

We configured SQLAlchemy's engine with explicit pool tuning parameters to prevent connection starvation:

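    from sqlalchemy import create_engine

    engine = create_engine(
        DATABASE_URL,        # connection string, defined elsewhere
        pool_size=20,        # persistent connections kept open
        max_overflow=10,     # extra connections allowed during surges
        pool_timeout=30,     # seconds to wait for a free connection
        pool_recycle=1800,   # replace connections older than 30 minutes
    )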
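  • `pool_size=20` holds 20 persistent connections ready to use.
  • `max_overflow=10` allows up to 10 additional connections during peak surges.
  • `pool_timeout=30` makes a request wait up to 30 seconds for a free connection before raising a timeout error.
  • `pool_recycle=1800` replaces connections older than 30 minutes when they are next checked out, so stale connections are not reused.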
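
A quick way to sanity-check these settings is to compute the worst-case number of connections they can open. The sketch below is illustrative only: the helper names and the process counts are hypothetical, and the limit of 100 is PostgreSQL's default `max_connections`, which may differ from the pilot's actual server configuration.

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100  # PostgreSQL's default `max_connections`


def peak_connections(pool_size: int, max_overflow: int, processes: int) -> int:
    """Worst case: every process fills both its pool and its overflow."""
    return (pool_size + max_overflow) * processes


def fits_budget(pool_size: int, max_overflow: int, processes: int,
                server_limit: int = POSTGRES_DEFAULT_MAX_CONNECTIONS) -> bool:
    """True if the worst-case connection count stays within the server limit."""
    return peak_connections(pool_size, max_overflow, processes) <= server_limit


# Each engine is per process, so the budget scales with the process count.
print(peak_connections(20, 10, processes=1))  # 30
print(peak_connections(20, 10, processes=4))  # 120
print(fits_budget(20, 10, processes=4))       # False
```

Because every API or worker process creates its own pool, the per-process settings have to be divided across however many processes share the database.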

---

Conclusion

Deploying civic systems requires a shift in engineering mindset: prioritize reliability, isolate ingress from processing, and tune indexes early. These basic optimizations allowed JanSamadhan to operate with 99.9% availability.