Applied AI | Anomaly Detection Engineer

Bangalore | Hyderabad | Full-time

Mission

Build the automated pass/fail logic that turns raw deviation data into actionable QC decisions. Your anomaly detection models sit at the decision boundary – they determine whether a weld passes, a beam is misaligned, or a machined part is out of spec. This drives the upsell from "AR inspection viewer" to "AI-powered QC automation."

System Ownership

  • Primary: Anomaly detection models (deviation classification: pass / marginal / fail)
  • Primary: Time-series quality trend analysis (degradation detection across inspection history)
  • Primary: Automated QC decision engine (tolerance-aware pass/fail from 3D deviation data)
  • Primary: Model training and evaluation pipeline (data prep, training, validation, A/B testing)
  • Secondary interface: CAD Geometry Engineer (they provide GD&T tolerances and feature maps you evaluate against)
  • Secondary interface: Backend Engineer (they store inspection data you train on and serve predictions through)
  • Does NOT own: Point cloud registration (CV team), edge inference runtime (Edge AI team), dashboard visualisation (Full Stack team), data storage (Backend team)

What You Will Build

  • Deviation classification model – Given a 3D deviation map (per-point distances between aligned scan and CAD reference), classify each region as pass, marginal, or fail. Account for GD&T tolerances: a 2mm deviation on a rough surface is pass; a 0.5mm deviation on a precision bore is fail.
  • Anomaly detection for inspection time-series – Track deviation patterns across serial inspections. Detect: systematic drift (a machine tool wearing down), sudden shifts (fixture misalignment), seasonal patterns (thermal expansion in summer).
  • Root cause suggestion engine – When an anomaly is detected, correlate with contextual data (operator, shift, machine, material batch, ambient temperature) to suggest probable root causes.
  • Model training pipeline – Automated pipeline: data collection from inspection history → feature engineering → model training → offline evaluation → A/B test deployment → production rollout. MLOps-grade, not notebook-grade.
  • Confidence scoring and explainability – Every pass/fail decision must include a confidence score and a spatial heatmap showing which regions drove the decision. Operators and QC managers need to understand and trust the output.
  • Feedback loop integration – Operators can override AI decisions (mark false positives/negatives). These overrides feed back into training data with human-verified labels. Active learning loop.
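The tolerance-aware classification in the first bullet can be sketched minimally. This is an illustrative assumption, not the platform's actual logic: the feature names, tolerance values, and the 80% marginal band are all hypothetical placeholders for the GD&T map the CAD Geometry Engineer would supply.

```python
import numpy as np

# Hypothetical per-feature tolerances in mm (illustrative values only;
# in production these come from the CAD Geometry Engineer's GD&T map).
TOLERANCES = {"rough_surface": 2.5, "flange_face": 0.5, "precision_bore": 0.1}

def classify_region(max_deviation_mm: float, feature_type: str,
                    marginal_band: float = 0.8) -> str:
    """Classify a region against its feature-specific tolerance.

    Deviations at or below `marginal_band` * tolerance pass outright;
    those between the band and the full tolerance are flagged marginal
    for human review; anything beyond the tolerance fails.
    """
    tol = TOLERANCES[feature_type]
    if max_deviation_mm <= marginal_band * tol:
        return "pass"
    if max_deviation_mm <= tol:
        return "marginal"
    return "fail"

classify_region(2.0, "rough_surface")   # → "pass": loose surface tolerance
classify_region(0.5, "precision_bore")  # → "fail": tight bore tolerance
```

The same deviation value lands on opposite sides of the boundary depending on the feature type, which is the core of the "2mm passes on a rough surface, 0.5mm fails on a precision bore" requirement.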

Core Technical Responsibilities

  • Design and train deviation classification models: input is per-point deviation vectors + feature type labels + GD&T tolerance bands → output is region-level pass / marginal / fail with confidence
  • Build the time-series anomaly detection system using statistical methods (CUSUM, Shewhart charts) and ML methods (isolation forests, autoencoders, LSTM-based anomaly detectors) – compare and select based on real data
  • Implement GD&T-aware evaluation: a deviation is not just a number – it must be interpreted against the specific tolerance for that feature type (flatness vs. concentricity vs. position vs. runout)
  • Build the model training pipeline: data versioning (DVC), experiment tracking (MLflow or Weights & Biases), automated retraining triggers, model registry
  • Implement model serving for both cloud and edge: cloud inference for batch analytics, ONNX/TFLite export for edge deployment
  • Build the explainability layer: SHAP values or attention maps projected back onto 3D geometry for spatial explainability
  • Design and run A/B tests: shadow mode deployment, statistical significance testing, automated rollback on regression
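On the statistical side of the time-series system, a two-sided tabular CUSUM for drift detection can be sketched as follows. The slack `k` and decision threshold `h` are conventional SPC starting points, not tuned values, and the step-change input is a toy stand-in for real inspection history.

```python
import numpy as np

def cusum_drift(x, target, sigma=1.0, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on standardised deviations.

    k is the slack and h the decision threshold, both in units of sigma.
    Returns the index of the first alarm, or None if none fires.
    """
    z = (np.asarray(x, dtype=float) - target) / sigma
    hi = lo = 0.0
    for i, zi in enumerate(z):
        hi = max(0.0, hi + zi - k)   # accumulates upward drift
        lo = max(0.0, lo - zi - k)   # accumulates downward drift
        if hi > h or lo > h:
            return i
    return None

# Toy example: 50 in-control samples, then the mean shifts by 1.5 sigma
# (e.g. a machine tool wearing down).
x = np.concatenate([np.zeros(50), np.full(20, 1.5)])
cusum_drift(x, target=0.0)  # → 55: alarms five samples after the shift
```

A Shewhart chart would need a single 3-sigma excursion to fire; CUSUM instead accumulates small persistent shifts, which is why it suits the gradual tool-wear case.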

Required Technical Mastery

  • Deep learning for industrial data: CNNs for 2D inspection images, PointNet/PointNet++ for 3D point clouds, autoencoders for unsupervised anomaly detection. Production deployment experience, not just paper reproduction
  • Time-series and anomaly detection: Statistical process control (SPC), CUSUM, EWMA, isolation forests, LSTM autoencoders, variational autoencoders. Understanding of false positive / false negative tradeoffs in industrial QC
  • MLOps: Model training pipelines (not notebooks), experiment tracking (MLflow, W&B), data versioning (DVC), model registry, CI/CD for models, automated retraining, A/B testing, canary deployments
  • GD&T interpretation: ASME Y14.5 / ISO 1101 tolerance types (form, orientation, location, runout). Understanding which tolerance type applies to which geometric feature, and how to evaluate conformance computationally
  • Frameworks: PyTorch (primary), scikit-learn (classical ML), ONNX Runtime (edge deployment), Open3D or PyTorch3D (3D data processing)
  • Languages: Python (primary), SQL (data extraction), basic C++ (for edge-optimised inference if needed)
  • Data engineering: Pandas, Apache Arrow, Parquet for large-scale tabular data. Understanding of data drift, label noise, class imbalance in industrial datasets
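The false positive / false negative tradeoff above is normally managed by choosing an operating threshold from the precision-recall curve rather than optimising accuracy. A minimal sketch on simulated scores (the score distributions and the 99% recall target are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Simulated anomaly scores: 1 = critical failure (rare), 0 = pass.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.zeros(950), np.ones(50)])
scores = np.concatenate([rng.normal(0.2, 0.10, 950),   # passing parts
                         rng.normal(0.7, 0.15, 50)])   # failures

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# recall[:-1] aligns with thresholds and decreases as thresholds rise,
# so the qualifying thresholds form a prefix; take the highest one that
# still meets the recall target (maximising precision under it).
ok = recall[:-1] >= 0.99
best = thresholds[ok][-1] if ok.any() else thresholds[0]
```

Fixing recall first and then taking the strictest threshold that achieves it is one way to encode "catch the rare critical failures without drowning operators in false alarms" as a concrete model-selection rule.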

Production Challenges You'll Solve

  1. Class imbalance – 95% of inspections pass. Only 5% fail. Of those, 1% are critical failures. Your model must catch the rare critical failures while not drowning operators in false alarms. Precision-recall tradeoff, not accuracy.
  2. No labelled data at launch – New customer onboarding: zero historical labelled inspections. Build unsupervised anomaly detection that works from day 1, then transitions to supervised models as the operator feedback loop generates labels.
  3. GD&T-aware thresholding – A 1mm deviation means different things for different features. Your model can't use a single threshold. It must query the GD&T tolerance map from the CAD Geometry Engineer's output and apply feature-specific thresholds.
  4. Explainability for non-technical operators – A QC manager asks: "Why did the AI fail this part?" Your system must show a 3D heatmap with regions highlighted and a plain-language explanation: "The flange flatness deviation of 0.8mm exceeds the specified 0.5mm tolerance at these locations."
  5. Model drift – After 6 months, a customer's manufacturing process changes (new supplier, new machine). Your model's accuracy drops. Build automated drift detection that triggers retraining and alerts before accuracy degrades below acceptable thresholds.
  6. Edge inference constraints – The anomaly model runs on-site for real-time pass/fail during inspection. NVIDIA Jetson Orin: 40 TOPS. Model must run in < 500ms per scan segment. Quantise and optimise without sacrificing accuracy on critical failures.

Success KPIs

KPI | Target | Measurement
Pass/Fail classification accuracy | ≥ 95% (precision and recall balanced) | Validated against human QC inspector decisions
Critical failure detection recall | ≥ 99% | Zero missed critical failures in production
False positive rate | < 5% | Operator override rate (false alarms)
Inference latency (edge) | < 500ms per scan segment | Measured on Jetson Orin
Time to first anomaly detection | Day 1 (unsupervised) | Functional without labelled data
Model retraining cycle | ≤ 24 hours from trigger to deployment | MLOps pipeline metrics
Explainability coverage | 100% of fail decisions include spatial heatmap | Automated check on every prediction

Failure If Underperforming

  • Missed critical failure → defective part ships → customer's customer files a warranty claim or, in infrastructure/energy, a safety incident. Single missed defect can destroy trust in the entire platform.
  • Too many false positives → operators start ignoring AI decisions → adoption drops → the system becomes shelfware. If operators don't trust it, they won't use it.
  • No explainability → QC managers can't justify AI decisions to auditors → cannot sell into regulated industries → addressable market shrinks.
  • Without AI-powered QC automation, the platform is just an AR overlay tool. The upsell from ₹1L to ₹3L per unit per year depends entirely on automated QC intelligence.

Collaboration Interfaces

With | Interface
CAD Geometry Engineer | They provide GD&T tolerance maps and feature classifications. You consume these to define per-feature pass/fail thresholds.
Lead CV Engineer | They provide registered point clouds and deviation maps. Your models consume their alignment output.
Edge AI Engineer | They manage the on-device inference runtime. You provide ONNX/TFLite models that fit within their compute budget.
Backend Engineer | You provide model predictions via API. They store results and serve them to dashboards. Training data comes from their inspection database.
Full Stack Engineer | Your explainability outputs (heatmaps, confidence scores, root cause suggestions) must be API-ready for dashboard rendering.

Why This Role Is Mission-Critical

D2R without AI-driven QC is an expensive tape measure with AR goggles. The anomaly detection layer converts raw deviation data into actionable intelligence: pass/fail decisions, trend warnings, root cause hypotheses. This is the layer that justifies the SaaS subscription, drives customer retention, and enables the platform to replace manual QC workflows entirely. Every upsell conversation with enterprise buyers starts with: "Your AI caught 14 deviations last month that your inspectors missed."

About Us

Building the D2R (Design-to-Reality) platform – sub-millimetre CAD alignment + edge AI + mixed-reality overlay for industrial field workers. Venture-backed, seed-stage, < 20 engineers.

  • Location: Bangalore / Hyderabad
  • Stage: Seed / Pre-Series A (venture-backed)
  • Industries: Construction, Manufacturing, Infrastructure, Energy