Mission
Design and own the on-device AI engine that runs perception and alignment models on AR headsets and ruggedised edge hardware – in environments with unreliable or zero connectivity. Factories have bad Wi-Fi. Construction sites have none. If your system fails offline, the product is dead.
System Ownership
- Primary: On-device inference runtime (model loading, execution, resource management)
- Primary: Model optimisation pipeline (quantization, pruning, graph optimisation → deployment-ready artefacts)
- Primary: Offline-first architecture (local queue, retry sync, conflict resolution)
- Primary: Edge data compression (point cloud + telemetry compression before cloud sync)
- Secondary interface: CV team (you deploy and optimise their perception models on-device)
- Secondary interface: Cloud/Backend team (you sync data to their ingestion APIs when connectivity resumes)
- Does NOT own: Model training/architecture design (CV/AI teams), MR rendering (MR team), cloud infrastructure (Backend team)
What You Will Build
- On-device inference engine – Serve TensorRT/CoreML/ONNX models on heterogeneous edge hardware (ARM SoCs, Apple Neural Engine, Qualcomm Hexagon DSP). Manage model lifecycle: load, warm-up, execute, swap.
- Model quantization & optimisation pipeline – Take FP32 trained models → INT8/FP16 quantized, graph-optimised, target-specific artefacts. Maintain accuracy within 1% of FP32 baseline while hitting latency targets.
- Offline-first data architecture – Local SQLite/LevelDB store for scan data, inspection results, telemetry. Queue-based sync with exponential backoff. Conflict resolution when offline edits collide with cloud state.
- Edge data compression – Compress point clouds (Draco, custom octree encoding) and telemetry streams before upload. Target: 10× compression ratio with < 0.5mm geometric loss.
- Latency reduction architecture – Profile end-to-end inference pipeline (pre-processing → inference → post-processing). Identify and eliminate bottlenecks. Hit the 200ms total budget.
- Thermal management – Monitor device temperature during sustained inference. Implement throttling strategies that degrade gracefully (reduce resolution, increase skip frames) rather than crash.
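The graceful-degradation idea in the thermal bullet above can be sketched as a small policy function: map SoC temperature to a quality level, with hysteresis so the engine doesn't oscillate at a threshold. This is an illustrative sketch only – the `QualityLevel` names and the 72°C/80°C thresholds are assumptions, not product constants.

```cpp
// Hypothetical quality levels for graceful thermal degradation:
// full resolution -> reduced resolution -> frame skipping.
enum class QualityLevel { Full, ReducedResolution, FrameSkip };

// Map SoC temperature (deg C) to a quality level. Hysteresis keeps the
// engine from flip-flopping between levels near a threshold.
// Thresholds are illustrative, not measured values.
QualityLevel select_quality(double soc_temp_c, QualityLevel current) {
    constexpr double kHot = 80.0;        // enter frame-skip
    constexpr double kWarm = 72.0;       // enter reduced resolution
    constexpr double kHysteresis = 3.0;  // headroom required to step back up
    if (soc_temp_c >= kHot) return QualityLevel::FrameSkip;
    // Only recover from FrameSkip once clearly below the hot threshold.
    if (current == QualityLevel::FrameSkip && soc_temp_c >= kHot - kHysteresis)
        return QualityLevel::FrameSkip;
    if (soc_temp_c >= kWarm) return QualityLevel::ReducedResolution;
    // Likewise, only recover to Full with some headroom.
    if (current != QualityLevel::Full && soc_temp_c >= kWarm - kHysteresis)
        return QualityLevel::ReducedResolution;
    return QualityLevel::Full;
}
```

The key design point is degrading *before* the hardware throttles: the policy steps down at 72°C, well under the 85°C point where the SoC takes over and latency spikes.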
Core Technical Responsibilities
- Build the inference serving layer that abstracts across TensorRT (NVIDIA), CoreML (Apple), ONNX Runtime (cross-platform), and potential future runtimes
- Implement INT8 quantization with calibration dataset management – track accuracy regression per quantization profile
- Design the offline-first sync protocol: local writes are always available, cloud sync is opportunistic, conflict resolution is deterministic
- Build the data compression pipeline for point clouds: evaluate Draco, custom octree, and lossy compression with configurable geometric tolerance
- Profile and optimise CUDA kernel performance for inference pre/post-processing on NVIDIA Jetson-class hardware
- Implement cross-compilation pipelines (ARM64 targets) with deterministic builds and automated hardware-in-the-loop testing
- Build the device health monitoring system: battery, temperature, memory, storage, connectivity status – exposed as telemetry to cloud
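To make the sync-protocol responsibility concrete, here is a minimal sketch of the exponential-backoff retry schedule an opportunistic sync queue might use. The 500ms base and 60s cap are illustrative defaults, not specified values; a real implementation would also add random jitter so a fleet of devices doesn't retry in lockstep.

```cpp
#include <algorithm>
#include <cstdint>

// Capped exponential backoff for the offline-first sync queue.
// Delay doubles per failed attempt: base * 2^attempt, clamped to cap.
// base_ms and cap_ms are illustrative, not product constants.
std::uint64_t backoff_ms(unsigned attempt,
                         std::uint64_t base_ms = 500,
                         std::uint64_t cap_ms = 60'000) {
    std::uint64_t delay = base_ms;
    for (unsigned i = 0; i < attempt; ++i) {
        // Bail out before overflowing past the cap.
        if (delay >= cap_ms / 2) return cap_ms;
        delay *= 2;
    }
    return std::min(delay, cap_ms);
}
```

Pairing this schedule with idempotent upload operations (each queued write carries a stable operation ID) is what makes retries safe: re-sending after an ambiguous failure cannot duplicate data.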
Required Technical Mastery
- Inference runtimes: TensorRT, CoreML, ONNX Runtime – not just API usage, but understanding of graph optimisation passes, layer fusion, memory planning
- Quantization: Post-training quantization (PTQ), quantization-aware training (QAT), calibration dataset selection, accuracy/latency tradeoff analysis
- ARM optimisation: NEON intrinsics, memory hierarchy awareness, cache-friendly data layout for ARM Cortex-A / Apple M-series / Qualcomm Snapdragon
- Languages: C++ (primary – systems programming, memory management, RAII, no exceptions), Rust (desirable for safety-critical paths), Swift (Apple ecosystem), Python (tooling/scripting)
- Embedded systems: Cross-compilation (CMake, Bazel), hardware-in-the-loop testing, deterministic memory allocation (no runtime malloc in hot paths)
- Offline-first patterns: Local-first databases, CRDTs or operational transform for conflict resolution, queue-based sync with idempotent operations
- Compression: Point cloud compression algorithms, protocol buffer / flatbuffer serialisation, delta encoding for time-series telemetry
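As a small example of the last bullet, delta encoding for time-series telemetry: store the first sample verbatim, then each successive sample as a signed difference. Slowly varying signals (temperature, battery level) produce runs of small deltas that downstream entropy coding compresses well. A minimal round-trip sketch:

```cpp
#include <cstdint>
#include <vector>

// Delta-encode a telemetry series: out[0] = samples[0],
// out[i] = samples[i] - samples[i-1] for i > 0.
std::vector<std::int64_t> delta_encode(const std::vector<std::int64_t>& samples) {
    std::vector<std::int64_t> out;
    out.reserve(samples.size());
    std::int64_t prev = 0;
    for (std::int64_t s : samples) {
        out.push_back(s - prev);
        prev = s;
    }
    return out;
}

// Invert the encoding by accumulating the deltas.
std::vector<std::int64_t> delta_decode(const std::vector<std::int64_t>& deltas) {
    std::vector<std::int64_t> out;
    out.reserve(deltas.size());
    std::int64_t acc = 0;
    for (std::int64_t d : deltas) {
        acc += d;
        out.push_back(acc);
    }
    return out;
}
```

The transform is lossless and exactly invertible, so it can sit in front of any general-purpose compressor in the telemetry pipeline.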
Production Challenges You'll Solve
- Thermal throttling – The AR headset runs sustained inference for 2 hours on a factory floor at 35°C ambient. The SoC hits 85°C and starts throttling. Your inference latency spikes from 180ms to 400ms. Build adaptive quality control that reduces model resolution before the device throttles.
- Offline for 6 hours – A construction site has zero connectivity for an entire shift. 50 inspection scans are completed offline. Connectivity resumes. Your sync system must upload 8GB of compressed scan data, resolve 3 conflicting edits, and update cloud state – without data loss or duplication.
- Model swap without restart – The CV team ships a new perception model. Your system must hot-swap the model on-device without restarting the application or losing the current session state.
- Memory pressure – The device has 6GB RAM. Your inference engine, the MR renderer, and the OS are all competing. You have a 1.5GB budget. The perception model alone is 800MB in FP32. Quantize, shard, or page – but hit your latency target within the memory budget.
- Heterogeneous hardware – Today it's an iPad Pro with LiDAR. Tomorrow it's a Meta Quest Pro. Next quarter it's a custom ruggedised device with a Jetson Orin. Your inference engine must abstract across all three with a single model pipeline.
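One common shape for the "model swap without restart" challenge is double-buffered publication: in-flight inferences hold a reference to the current model, a swap atomically publishes the new one, and the old model is freed only when its last in-flight user finishes. A sketch using `std::shared_ptr` atomics – `Model` and `ModelSlot` are hypothetical names standing in for a loaded, warmed-up runtime artefact:

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a loaded, warmed-up inference artefact.
struct Model {
    std::string version;
};

// Hot-swap slot: readers pin the model for the duration of one inference;
// writers publish a replacement without blocking readers.
class ModelSlot {
public:
    // Publish a new model. Inferences that already acquired the old
    // pointer keep using it until they finish the current frame; the old
    // model is destroyed when its last holder releases it.
    void swap(std::shared_ptr<const Model> next) {
        std::atomic_store(&current_, std::move(next));
    }

    // Acquire the current model for one inference call.
    std::shared_ptr<const Model> acquire() const {
        return std::atomic_load(&current_);
    }

private:
    std::shared_ptr<const Model> current_;
};
```

Session state lives outside the slot, so swapping the model leaves the current session untouched. (The `std::atomic_load`/`std::atomic_store` free-function overloads for `shared_ptr` shown here are C++17; later standards offer `std::atomic<std::shared_ptr<T>>` instead.)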
Success KPIs
| KPI | Target | Measurement |
|---|---|---|
| Inference latency | < 200ms end-to-end | Measured on target hardware, P95 over 1000 inferences |
| Quantization accuracy loss | < 1% vs. FP32 baseline | mAP/alignment error measured on validation set |
| Offline sync reliability | 100% data delivery | No scan data loss after offline period, verified by audit log |
| Point cloud compression | ≥ 10× ratio, < 0.5mm loss | Measured on standard test point clouds |
| Device uptime | ≥ 8 hours continuous operation | No crashes or OOM during full-shift field test |
| Thermal headroom | Sustained operation at < 80°C | SoC temperature monitored over 4-hour stress test |
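The latency KPI above is a P95 over a batch of inferences; a nearest-rank percentile over collected timings is one straightforward way to measure it. `p95_latency_ms` is a hypothetical helper, not an existing API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank P95 over a batch of per-inference timings (ms),
// matching a "P95 over N inferences" measurement methodology.
double p95_latency_ms(std::vector<double> samples) {
    if (samples.empty()) return 0.0;
    // Nearest-rank percentile: index = ceil(0.95 * n) - 1 (0-based).
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(0.95 * static_cast<double>(samples.size()))) - 1;
    // Partial sort: only the element at `rank` needs to be in place.
    std::nth_element(samples.begin(), samples.begin() + rank, samples.end());
    return samples[rank];
}
```

Measuring on target hardware matters: percentiles collected on a workstation say nothing about a thermally constrained ARM SoC mid-shift.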
Failure If Underperforming
- Inference exceeds 200ms → perception pipeline stalls, MR overlay lags, operator experience degrades instantly.
- Offline data lost → an entire shift of inspection work vanishes. Operator must redo everything. Customer churn guaranteed.
- Thermal crash → device shuts down mid-inspection on a factory floor. Operator loses trust. Device goes back in the box permanently.
- Memory leak → application crashes after 90 minutes. Unreliable for production shifts. Cannot sign enterprise contracts.
Collaboration Interfaces
| With | Interface |
|---|---|
| Lead CV Engineer | They train models. You deploy them. Joint responsibility for quantization accuracy validation. Shared latency budget (their preprocessing + your inference). |
| MR Systems Engineer | You share the same device hardware. GPU/CPU resource allocation must be coordinated – inference vs. rendering budget. |
| Backend Engineer | Your sync protocol talks to their ingestion APIs. Data format contract (protobuf schemas) must be jointly owned. |
| DevOps Engineer | They manage fleet-wide OTA updates. You define the update payload format and rollback strategy for model + runtime updates. |
Why This Role Is Mission-Critical
The D2R platform is designed for industrial environments – places where cloud connectivity is a luxury, not a given. If the system only works when connected, it's a demo, not a product. You ensure that every core function (scan, align, measure, inspect) works at full capability with zero connectivity. You also own the 200ms inference budget that makes real-time perception possible. Without you, the system is either too slow or too fragile for production use.
About Us
Building the D2R (Design-to-Reality) platform – sub-millimetre CAD alignment + edge AI + mixed-reality overlay for industrial field workers. Venture-backed, seed-stage, < 20 engineers.
- Location: Bangalore / Hyderabad
- Stage: Seed / Pre-Series A (venture-backed)
- Industries: Construction, Manufacturing, Infrastructure, Energy