Edge AI & Embedded Systems Engineer

Bangalore | Hyderabad | Full-time

Mission

Design and own the on-device AI engine that runs perception and alignment models on AR headsets and ruggedised edge hardware – in environments with unreliable or zero connectivity. Factories have bad WiFi. Construction sites have none. If your system fails offline, the product is dead.

System Ownership

  • Primary: On-device inference runtime (model loading, execution, resource management)
  • Primary: Model optimisation pipeline (quantization, pruning, graph optimisation → deployment-ready artefacts)
  • Primary: Offline-first architecture (local queue, retry sync, conflict resolution)
  • Primary: Edge data compression (point cloud + telemetry compression before cloud sync)
  • Secondary interface: CV team (you deploy and optimise their perception models on-device)
  • Secondary interface: Cloud/Backend team (you sync data to their ingestion APIs when connectivity resumes)
  • Does NOT own: Model training/architecture design (CV/AI teams), MR rendering (MR team), cloud infrastructure (Backend team)

What You Will Build

  • On-device inference engine – Serve TensorRT/CoreML/ONNX models on heterogeneous edge hardware (ARM SoCs, Apple Neural Engine, Qualcomm Hexagon DSP). Manage model lifecycle: load, warm-up, execute, swap.
  • Model quantization & optimisation pipeline – Take FP32 trained models → INT8/FP16 quantized, graph-optimised, target-specific artefacts. Maintain accuracy within 1% of FP32 baseline while hitting latency targets.
  • Offline-first data architecture – Local SQLite/LevelDB store for scan data, inspection results, telemetry. Queue-based sync with exponential backoff. Conflict resolution when offline edits collide with cloud state.
  • Edge data compression – Compress point clouds (Draco, custom octree encoding) and telemetry streams before upload. Target: 10× compression ratio with < 0.5mm geometric loss.
  • Latency reduction architecture – Profile end-to-end inference pipeline (pre-processing → inference → post-processing). Identify and eliminate bottlenecks. Hit the 200ms total budget.
  • Thermal management – Monitor device temperature during sustained inference. Implement throttling strategies that degrade gracefully (reduce resolution, increase skip frames) rather than crash.
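The graceful-degradation behaviour described in the last bullet can be pictured as a quality ladder keyed on SoC temperature. A minimal sketch in Python, where the thresholds, resolution levels, and frame-skip counts are illustrative assumptions, not product specifics:

```python
# Illustrative thermal-degradation ladder: each rung trades inference quality
# for lower compute load before the SoC reaches its hardware throttle point.
DEGRADATION_LADDER = [
    # (soc_temp_celsius, input_resolution, frames_skipped)
    (0,  "full",    0),  # normal operation
    (75, "half",    1),  # approaching throttle point: halve input resolution
    (80, "half",    3),  # near limit: also skip more frames
    (85, "quarter", 5),  # last resort before OS-level throttling kicks in
]

def select_quality(soc_temp_c):
    """Pick the most aggressive rung whose temperature threshold is crossed."""
    resolution, skip = "full", 0
    for threshold, res, frame_skip in DEGRADATION_LADDER:
        if soc_temp_c >= threshold:
            resolution, skip = res, frame_skip
    return resolution, skip
```

The point of the ladder is that degradation is monotonic and predictable: the operator sees a slightly coarser overlay, never a crashed device.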

Core Technical Responsibilities

  • Build the inference serving layer that abstracts across TensorRT (NVIDIA), CoreML (Apple), ONNX Runtime (cross-platform), and potential future runtimes
  • Implement INT8 quantization with calibration dataset management – track accuracy regression per quantization profile
  • Design the offline-first sync protocol: local writes are always available, cloud sync is opportunistic, conflict resolution is deterministic
  • Build the data compression pipeline for point clouds: evaluate Draco, custom octree, and lossy compression with configurable geometric tolerance
  • Profile and optimise CUDA kernel performance for inference pre/post-processing on NVIDIA Jetson-class hardware
  • Implement cross-compilation pipelines (ARM64 targets) with deterministic builds and automated hardware-in-the-loop testing
  • Build the device health monitoring system: battery, temperature, memory, storage, connectivity status – exposed as telemetry to cloud
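The offline-first sync protocol above (local writes always accepted, opportunistic upload, exponential backoff, idempotent operations) can be sketched roughly as follows. `OfflineSyncQueue`, `upload_fn`, and the retry parameters are assumptions for illustration only:

```python
import time

class OfflineSyncQueue:
    """Sketch of queue-based sync with exponential backoff. Each operation
    carries a client-generated op_id so server-side handling can be made
    idempotent: a retried upload never duplicates cloud state."""

    def __init__(self, upload_fn, base_delay_s=1.0, max_delay_s=300.0):
        self.upload_fn = upload_fn      # callable(op) -> bool, True on success
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.pending = []               # local writes are always accepted here

    def enqueue(self, op_id, payload):
        """Local write: always succeeds, regardless of connectivity."""
        self.pending.append({"op_id": op_id, "payload": payload})

    def flush(self, max_attempts=5, sleep_fn=time.sleep):
        """Opportunistic sync: retry each op with exponential backoff,
        keeping anything that still fails for the next connectivity window."""
        synced, remaining = 0, []
        for op in self.pending:
            delay = self.base_delay_s
            for _ in range(max_attempts):
                if self.upload_fn(op):
                    synced += 1
                    break
                sleep_fn(min(delay, self.max_delay_s))
                delay *= 2
            else:
                remaining.append(op)
        self.pending = remaining
        return synced
```

Injecting `sleep_fn` keeps the backoff testable without real waits; a production version would also persist the queue to the local store so it survives a device restart.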

Required Technical Mastery

  • Inference runtimes: TensorRT, CoreML, ONNX Runtime – not just API usage, but understanding of graph optimisation passes, layer fusion, memory planning
  • Quantization: Post-training quantization (PTQ), quantization-aware training (QAT), calibration dataset selection, accuracy/latency tradeoff analysis
  • ARM optimisation: NEON intrinsics, memory hierarchy awareness, cache-friendly data layout for ARM Cortex-A / Apple M-series / Qualcomm Snapdragon
  • Languages: C++ (primary – systems programming, memory management, RAII, no exceptions), Rust (desirable for safety-critical paths), Swift (Apple ecosystem), Python (tooling/scripting)
  • Embedded systems: Cross-compilation (CMake, Bazel), hardware-in-the-loop testing, deterministic memory allocation (no runtime malloc in hot paths)
  • Offline-first patterns: Local-first databases, CRDTs or operational transform for conflict resolution, queue-based sync with idempotent operations
  • Compression: Point cloud compression algorithms, protocol buffer / flatbuffer serialisation, delta encoding for time-series telemetry
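As one concrete instance of the last bullet, delta encoding for time-series telemetry can be sketched in a few lines (function names are illustrative):

```python
def delta_encode(samples):
    """Store the first value, then successive differences. Slowly varying
    signals (temperature, battery level) yield many small deltas, which
    entropy-code far better than the raw values."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out
```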

Production Challenges You'll Solve

  1. Thermal throttling – The AR headset runs sustained inference for 2 hours on a factory floor at 35°C ambient. The SoC hits 85°C and starts throttling. Your inference latency spikes from 180ms to 400ms. Build adaptive quality control that reduces model resolution before the device throttles.
  2. Offline for 6 hours – A construction site has zero connectivity for an entire shift. 50 inspection scans are completed offline. Connectivity resumes. Your sync system must upload 8GB of compressed scan data, resolve 3 conflicting edits, and update cloud state – without data loss or duplication.
  3. Model swap without restart – The CV team ships a new perception model. Your system must hot-swap the model on-device without restarting the application or losing the current session state.
  4. Memory pressure – The device has 6GB RAM. Your inference engine, the MR renderer, and the OS are all competing. You have a 1.5GB budget. The perception model alone is 800MB in FP32. Quantize, shard, or page – but hit your latency target within the memory budget.
  5. Heterogeneous hardware – Today it's an iPad Pro with LiDAR. Tomorrow it's a Meta Quest Pro. Next quarter it's a custom ruggedised device with a Jetson Orin. Your inference engine must abstract across all three with a single model pipeline.
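The memory arithmetic behind challenge 4 is worth making explicit: uniform quantization shrinks weight storage in proportion to bit width, so the 800MB FP32 model drops to roughly 200MB at INT8. A back-of-envelope helper (ignoring activation memory and per-tensor scale/zero-point overhead):

```python
def quantized_size_mb(fp32_size_mb, bits):
    """Approximate weight footprint after uniform quantization of 32-bit
    FP32 weights down to the given bit width."""
    return fp32_size_mb * bits / 32

# Challenge 4's 800 MB FP32 perception model:
#   INT8 -> ~200 MB, FP16 -> ~400 MB
# Either fits the 1.5 GB budget, but activations, the runtime itself, and
# pre/post-processing buffers still have to share what remains.
```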

Success KPIs

  • Inference latency – Target: < 200ms end-to-end. Measured on target hardware, P95 over 1000 inferences.
  • Quantization accuracy loss – Target: < 1% vs. FP32 baseline. mAP/alignment error measured on validation set.
  • Offline sync reliability – Target: 100% data delivery. No scan data loss after offline period, verified by audit log.
  • Point cloud compression – Target: ≥ 10× ratio with < 0.5mm loss. Measured on standard test point clouds.
  • Device uptime – Target: ≥ 8 hours continuous operation. No crashes or OOM during full-shift field test.
  • Thermal headroom – Target: sustained operation at < 80°C. SoC temperature monitored over 4-hour stress test.
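For clarity on how the latency KPI is computed, a nearest-rank P95 over a batch of measured latencies looks like the following (nearest-rank is one common percentile definition; the team may standardise on another):

```python
import math

def p95(latencies_ms):
    """Nearest-rank P95: the smallest sample value at or below which
    95% of the measured latencies fall."""
    s = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(s))  # 1-based rank of the 95th percentile
    return s[rank - 1]
```

P95 rather than the mean is the right target here: a single 400ms outlier barely moves the average but is exactly the stall the operator notices.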

Failure If Underperforming

  • Inference exceeds 200ms → perception pipeline stalls, MR overlay lags, operator experience degrades instantly.
  • Offline data lost → an entire shift of inspection work vanishes. Operator must redo everything. Customer churn guaranteed.
  • Thermal crash → device shuts down mid-inspection on a factory floor. Operator loses trust. Device goes back in the box permanently.
  • Memory leak → application crashes after 90 minutes. Unreliable for production shifts. Cannot sign enterprise contracts.

Collaboration Interfaces

  • Lead CV Engineer – They train models; you deploy them. Joint responsibility for quantization accuracy validation. Shared latency budget (their preprocessing + your inference).
  • MR Systems Engineer – You share the same device hardware. GPU/CPU resource allocation must be coordinated – inference vs. rendering budget.
  • Backend Engineer – Your sync protocol talks to their ingestion APIs. Data format contract (protobuf schemas) must be jointly owned.
  • DevOps Engineer – They manage fleet-wide OTA updates. You define the update payload format and rollback strategy for model + runtime updates.

Why This Role Is Mission-Critical

The D2R platform is designed for industrial environments – places where cloud connectivity is a luxury, not a given. If the system only works when connected, it's a demo, not a product. You ensure that every core function (scan, align, measure, inspect) works at full capability with zero connectivity. You also own the 200ms inference budget that makes real-time perception possible. Without you, the system is either too slow or too fragile for production use.

About Us

Building the D2R (Design-to-Reality) platform – sub-millimetre CAD alignment + edge AI + mixed-reality overlay for industrial field workers. Venture-backed, seed-stage, < 20 engineers.

  • Location: Bangalore / Hyderabad
  • Stage: Seed / Pre-Series A (venture-backed)
  • Industries: Construction, Manufacturing, Infrastructure, Energy