Backend & Cloud Infrastructure Engineer

Bangalore | Hyderabad | Full-time

Mission

Build the data backbone, analytics engine, and SaaS platform that stores every inspection, computes every trend, and serves every dashboard. The SaaS model (₹3L/unit/year) depends on analytics delivering ongoing value – not just a one-time MR experience.

System Ownership

  • Primary: Cloud API layer (REST + WebSocket APIs for all client-server communication)
  • Primary: Inspection data storage (point clouds, deviation maps, QC results, audit trails)
  • Primary: Multi-tenant SaaS architecture (workspace isolation, subscription management, enterprise SSO)
  • Primary: Analytics data pipeline (raw inspection data → aggregated trend metrics)
  • Secondary interface: Edge AI team (you provide ingestion APIs for edge-to-cloud sync)
  • Secondary interface: MR team (you serve session state and multi-user anchor data)
  • Secondary interface: Full Stack team (you provide APIs they build dashboards on)
  • Does NOT own: ML model training (AI team), on-device inference (Edge AI team), MR rendering (MR team), front-end dashboards (Full Stack team)

What You Will Build

  • Cloud inspection storage – Store LiDAR point clouds (100MB–2GB per scan), deviation maps, QC pass/fail results, and operator metadata. Handle ≥ 50GB new scan data per day across all tenants.
  • Analytics engine – Deviation trend analysis over time. Which machines/structures show degrading quality? Which tolerance bands are consistently violated? Compute and cache aggregated metrics.
  • Digital twin update pipeline – As new scans arrive, update the digital twin representation. Version both CAD revisions and scan history. Support rollback and historical comparison.
  • Audit trail & compliance reporting – Generate tamper-evident inspection logs. Export as PDF and JSON for ISO/ASME compliance. Every QC decision must be traceable to a specific scan, alignment, and operator.
  • Multi-tenant SaaS platform – Workspace isolation (no tenant can see another's data), role-based access, subscription tiers, usage metering, enterprise SSO (SAML/OIDC).
  • Enterprise security – Data encryption at rest and in transit, API authentication (JWT + API keys), audit logging, SOC 2 readiness foundations, GDPR-aware data retention policies.
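To make the workspace-isolation idea above concrete, here is a minimal in-memory sketch of the "architecturally impossible" approach: handlers receive a tenant-scoped handle rather than raw storage access, so the tenant ID is injected by construction and never supplied per-query. The class names (`InspectionStore`, `TenantHandle`) are illustrative, not part of any existing codebase; a production system would enforce the same invariant at the database layer (e.g. PostgreSQL row-level security).

```python
class InspectionStore:
    """Shared backing storage. Request handlers never touch this object
    directly; they only ever hold a tenant-scoped handle."""

    def __init__(self):
        self._rows = {}  # keyed by (tenant_id, row_id)

    def for_tenant(self, tenant_id):
        return TenantHandle(self, tenant_id)


class TenantHandle:
    """Every read and write is bound to one tenant at construction time,
    so a caller cannot address another tenant's rows by mistake."""

    def __init__(self, store, tenant_id):
        self._store = store
        self._tenant_id = tenant_id

    def put(self, row_id, row):
        self._store._rows[(self._tenant_id, row_id)] = row

    def get(self, row_id):
        # tenant_id is injected here, never passed in by the caller
        return self._store._rows.get((self._tenant_id, row_id))
```

With this shape, the "new developer writes a cross-tenant query" failure mode cannot compile into existence: there is simply no API surface that accepts a foreign tenant ID.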

Core Technical Responsibilities

  • Design and implement the REST + WebSocket API layer using FastAPI or equivalent – strict OpenAPI spec, versioned endpoints, rate limiting
  • Build the point cloud storage layer: evaluate columnar storage (Parquet), spatial databases (PostGIS), and object storage (S3/MinIO) tradeoffs for query patterns vs. cost
  • Implement the edge-to-cloud sync ingestion pipeline: receive compressed point clouds from edge devices, decompress, validate, store, trigger downstream processing
  • Build the multi-tenant data isolation layer: schema-per-tenant vs. row-level security – choose based on tenant count and query patterns, implement and test isolation guarantees
  • Implement the audit trail system: append-only log, cryptographic hash chain for tamper evidence, export to PDF/JSON with digital signatures
  • Design the analytics aggregation pipeline: scheduled batch jobs + real-time incremental updates for deviation trend metrics
  • Set up infrastructure: containerised microservices (Docker), orchestration (Kubernetes), infrastructure-as-code (Terraform), CI/CD pipelines
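The append-only hash-chain audit log named above can be sketched in a few lines of standard-library Python. This is a teaching sketch, not the production design (which would add digital signatures and durable storage); the `AuditLog` class and its field layout are assumptions for illustration.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry's hash covers the previous
    entry's hash, so any retroactive edit breaks verification from
    that point forward."""

    GENESIS = "0" * 64  # sentinel "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # list of (record_json, chained_hash)

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        payload = json.dumps(record, sort_keys=True)  # canonical form
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append((payload, digest))
        return digest

    def verify(self):
        """Recompute the whole chain; False means tampering somewhere."""
        prev = self.GENESIS
        for payload, digest in self.entries:
            if hashlib.sha256((prev + payload).encode()).hexdigest() != digest:
                return False
            prev = digest
        return True
```

This is exactly the property the compliance auditor scenario demands: modifying any stored QC record invalidates every later hash, so "prove this result hasn't changed" reduces to re-running `verify()` against an externally anchored head hash.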

Required Technical Mastery

  • API design: REST with OpenAPI, WebSocket for real-time updates, gRPC for internal service communication. Versioning, pagination, error handling standards
  • Cloud platforms: AWS (primary) or GCP – production experience with EC2/ECS/EKS, S3, RDS, SQS/SNS, CloudFront, IAM, VPC networking
  • Databases: PostgreSQL (primary), TimescaleDB or equivalent for time-series analytics, Redis for caching and session state, understanding of spatial query patterns (PostGIS)
  • Microservices: Service decomposition, inter-service communication (sync + async), distributed tracing, circuit breakers, saga pattern for multi-service transactions
  • Containerisation: Docker (multi-stage builds, security scanning), Kubernetes (deployments, services, ingress, HPA, resource limits, persistent volumes)
  • Infrastructure as Code: Terraform or Pulumi – not ClickOps
  • Security: OAuth 2.0 / OIDC, JWT management, SAML for enterprise SSO, encryption (AES-256 at rest, TLS 1.3 in transit), secrets management (Vault or AWS Secrets Manager)
  • Languages: Python (primary – FastAPI, SQLAlchemy, Celery), Go (desirable for high-throughput services), SQL
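As a flavour of the JWT fluency expected, here is a minimal HS256 sign/verify sketch using only the standard library. In production you would use a vetted library (PyJWT or similar) and add expiry/claims validation; this only illustrates the signing-input and base64url mechanics candidates should be able to reason about.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data):
    """Base64url without padding, per RFC 7515."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims, secret):
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims, sort_keys=True).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_jwt(token, secret):
    """Return the claims dict on success, None on any failure."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```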

Production Challenges You'll Solve

  1. 50GB/day scan ingestion – 100 edge devices each uploading 500MB of compressed point cloud data daily. Your ingestion pipeline must decompress, validate, store, and trigger processing – without dropping data or blocking the edge device's sync queue.
  2. Schema migration with production data – You need to add a new column to the inspection results table. 50 million rows. 12 tenants. Zero downtime. No data corruption. Build a migration strategy that handles this.
  3. Tenant data leakage – A new developer writes a query that accidentally returns Tenant B's data to Tenant A's API call. Your isolation layer must make this architecturally impossible, not just "please be careful."
  4. Audit trail tampering – A customer's compliance auditor asks: "prove this inspection result hasn't been modified since recording." Your append-only hash-chain audit log must answer this definitively.
  5. Cost explosion – Uncompressed point cloud storage is growing at 1.5TB/month. At S3 standard pricing, this becomes unsustainable. Design a tiered storage strategy: hot (recent scans) → warm (last 90 days) → cold (archive), with transparent access across tiers.
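The tiered storage strategy in challenge 5 reduces, at its core, to an age-based policy function like the sketch below. The 30-day hot window is an assumption for illustration (the role description fixes only the 90-day warm window), and the tier-to-storage-class mapping in the comments is one plausible AWS choice, not a requirement.

```python
from datetime import date


def storage_tier(scan_date, today, hot_days=30, warm_days=90):
    """Map a scan's age to a storage tier. hot_days is an assumed
    threshold; warm_days=90 matches the stated 90-day warm window."""
    age = (today - scan_date).days
    if age <= hot_days:
        return "hot"   # e.g. S3 Standard: frequent dashboard access
    if age <= warm_days:
        return "warm"  # e.g. S3 Infrequent Access: occasional queries
    return "cold"      # e.g. Glacier: compliance archive, rare retrieval
```

In practice the policy would be enforced by S3 lifecycle rules rather than application code, with a read path that transparently restores cold objects on demand; the function above is just the decision logic made explicit.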

Success KPIs

KPI | Target | Measurement
API availability | 99.9% uptime | External health checks, monthly
API latency (P95) | < 200 ms for data queries, < 2 s for analytics | Application performance monitoring
Data ingestion throughput | ≥ 50 GB/day | Measured at peak load across all tenants
Dashboard query latency | < 2 s P95 | Measured on analytics aggregation queries
Tenant data isolation | 0 cross-tenant data leaks | Automated isolation tests in CI/CD
Audit trail integrity | 100% verifiable | Hash chain validation on every export
Deployment frequency | ≥ 2 production deploys/week | CI/CD pipeline metrics

Failure If Underperforming

  • API goes down → every edge device queues unsyncable data, every dashboard is blank, every MR multi-user session fails. Single point of failure for the entire platform.
  • Tenant data leaks → immediate contract termination, potential legal liability, destroyed enterprise trust. One incident can kill the company at seed stage.
  • Analytics are slow or wrong → the SaaS value proposition (ongoing insights, not just MR) collapses. Customers question why they're paying ₹3L/year/unit.
  • Audit trail is tamperable → ISO/ASME compliance fails. Cannot sell to regulated industries (energy, aerospace, automotive). Addressable market shrinks by 70%.

Collaboration Interfaces

With | Interface
Edge AI Engineer | They send compressed scans + telemetry. You provide ingestion APIs + sync acknowledgement. Protobuf schema jointly defined.
MR Systems Engineer | You serve multi-user session state and spatial anchor persistence. WebSocket API for real-time sync.
Full Stack Engineer | They build dashboards on your APIs. OpenAPI spec is the contract. You own the data; they own the presentation.
Applied AI Engineer | You provide historical inspection data for model training. Data export format and access patterns jointly defined.
DevOps Engineer | They manage infrastructure. You define resource requirements, scaling policies, and deployment configurations.

Why This Role Is Mission-Critical

Our SaaS revenue model depends on the backend delivering continuous value. The MR experience is the hook – analytics, compliance reporting, and digital twin updates are the retention engine. Without a reliable, secure, performant backend, every edge device is an isolated tool (not a platform), every customer churns after the novelty wears off, and the ₹3L/unit/year subscription cannot be justified.

About Us

Building the D2R (Design-to-Reality) platform – sub-millimetre CAD alignment + edge AI + mixed-reality overlay for industrial field workers. Venture-backed, seed-stage, < 20 engineers.

  • Location: Bangalore / Hyderabad
  • Stage: Seed / Pre-Series A (venture-backed)
  • Industries: Construction, Manufacturing, Infrastructure, Energy