Edge AI: How it works

  1. Infrastructure: Kubernetes and containers were the obvious choices for high availability, ultra-low latency and fast deployment of AI/ML models to the edge. Infrastructure agnostic, Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services. Our containers are based on the Docker platform, an efficient way to package and deliver software, and run on managed Kubernetes services from leading cloud providers such as AWS, Microsoft Azure and Google Cloud. (A minimal deployment sketch follows this list.)
  2. Data ingestion: For AI/ML models to evolve and achieve their potential, data must flow from ingest to multiple downstream systems, such as a dashboard for analytics and monitoring or Apache Hadoop-based file storage for model training. For this function, we're using Apache Kafka, which offers real-time data ingestion, data integration, messaging and pub/sub at scale. The resulting multi-party data ingestion layer provides millisecond latency, guaranteed delivery and support for throttling. (See the producer sketch after this list.)
  3. Low-latency data storage: Edge AI needs a data storage layer with sub-second latency, high throughput and a small footprint, along with the ability to sync back to various cloud platforms for storage and historical insights. Here, we turned to the Redis NoSQL database system. NoSQL databases such as Redis are less structured than relational databases; they are also more flexible and scale better, making them the ideal solution for this application. (See the Redis sketch after this list.)
  4. Data processing: Real-time stream processing is required in Edge AI to capture events from diverse sources, detect complex conditions and publish to diverse endpoints in real time. We're using the Siddhi Complex Event Processor (CEP), an open source, cloud-native, scalable micro-streaming CEP system for building event-driven applications for use cases such as real-time analytics, data integration, notification management and adaptive decision-making. (See the windowing sketch after this list.)
  5. AI/ML serving: The Edge AI platform provides complete AI/ML deployment and lifecycle management across cloud and edge infrastructure in real time using the Seldon.io open source framework, which supports multiple heterogeneous toolkits and languages. (See the model-wrapper sketch after this list.)
  6. Data visualization: Visualizations for real-time analytics and dashboarding are built with Grafana dashboards and custom-developed Node.js REST services that query the Redis datastores in real time. (See the REST sketch after this list.)
  7. ML training and use cases: The Edge AI platform supports the most popular ML frameworks, including scikit-learn, TensorFlow, Keras and PyTorch, and provides complete model lifecycle management. Once models are developed and tested, they are trained on large data sets, packaged, and deployed seamlessly to the edge. (See the training sketch after this list.)
  8. Security and governance: Security is built in across the entire Edge AI platform. The platform accommodates customizable security frameworks, is agnostic to customer deployment scenarios and remains interoperable across a multi-cloud strategy.
  9. Monitoring and orchestration: We orchestrate from the cloud to the edge via the CI/CD pipeline using tools such as Argo CD, a continuous delivery tool for Kubernetes. Our objective was to make Edge AI application deployment and lifecycle management automated, auditable and easy to understand. (See the Argo CD sketch after this list.)
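
The short sketches below illustrate several of the components above. All hostnames, image names, topics and keys are placeholders, not the platform's actual configuration. To make the infrastructure choice in item 1 concrete, here is a minimal sketch, using the official Kubernetes Python client, of deploying a containerized model server; the image name, namespace and port are assumptions.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
apps = client.AppsV1Api()

container = client.V1Container(
    name="edge-model-server",
    image="registry.example.com/edge/model-server:1.0",  # hypothetical image
    ports=[client.V1ContainerPort(container_port=9000)],
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="edge-model-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # two replicas for availability on the edge cluster
        selector=client.V1LabelSelector(match_labels={"app": "edge-model-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "edge-model-server"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="edge", body=deployment)
```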
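
For the ingestion layer (item 2), a minimal Kafka producer sketched with the kafka-python library; the broker address and topic are assumptions. Setting acks="all" is what provides the guaranteed-delivery behavior mentioned above, and a small linger window keeps latency in the millisecond range.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["edge-broker:9092"],  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",    # wait for full acknowledgment: guaranteed delivery
    linger_ms=5,   # small batching window keeps latency low
)

producer.send("sensor-events", {"device_id": "d-17", "temp_c": 81.4})
producer.flush()
```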
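
For the storage layer (item 3), one plausible Redis layout for time-indexed sensor readings, sketched with redis-py; the key scheme is illustrative, not the platform's actual schema.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store one reading as a hash, and index it by timestamp in a sorted set.
r.hset("reading:d-17:1700000000", mapping={"temp_c": "81.4", "rpm": "1490"})
r.zadd("readings:d-17", {"reading:d-17:1700000000": 1700000000})

# Sub-millisecond range query over the ten most recent readings.
latest = r.zrevrange("readings:d-17", 0, 9)
```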
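
Siddhi rules (item 4) are written in SiddhiQL rather than Python, so the sketch below only illustrates the pattern a CEP rule implements: aggregate over a sliding window, test a condition, publish to an endpoint. The window size and threshold are arbitrary.

```python
from collections import deque

WINDOW = 10        # sliding window of the last 10 readings
THRESHOLD = 85.0   # fire when the window average crosses this value

window = deque(maxlen=WINDOW)

def publish_alert(avg: float) -> None:
    # Stand-in for publishing to a downstream endpoint (e.g., a Kafka topic).
    print(f"ALERT: rolling average {avg:.1f} C exceeds {THRESHOLD} C")

def on_event(temp_c: float) -> None:
    """Mimics a CEP rule: aggregate over a window, fire on a condition."""
    window.append(temp_c)
    if len(window) == WINDOW and sum(window) / WINDOW > THRESHOLD:
        publish_alert(sum(window) / WINDOW)
```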
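
For serving (item 5), Seldon Core's Python wrapper conventionally wraps a user-defined class and exposes its predict() method over REST/gRPC; a minimal sketch, assuming a scikit-learn artifact whose file name is a placeholder.

```python
import joblib

class EdgeModel:
    """Model class following Seldon Core's Python wrapper convention:
    Seldon loads the class and serves predict() as an endpoint."""

    def __init__(self):
        # Load a previously trained model artifact (path is a placeholder).
        self.model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # Return class probabilities for a batch of feature rows.
        return self.model.predict_proba(X)
```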
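
The platform's dashboard services (item 6) are written in Node.js; for consistency with the other sketches, here is a Python/Flask equivalent of a small REST endpoint that Grafana could poll, reusing the hypothetical key layout from the Redis sketch above.

```python
from flask import Flask, jsonify
import redis

app = Flask(__name__)
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

@app.route("/devices/<device_id>/latest")
def latest(device_id):
    # Return the ten most recent readings for a device; a Grafana
    # JSON datasource can query this for real-time panels.
    keys = r.zrevrange(f"readings:{device_id}", 0, 9)
    return jsonify([r.hgetall(k) for k in keys])

if __name__ == "__main__":
    app.run(port=8080)
```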
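
For training and packaging (item 7), a minimal scikit-learn sketch with synthetic stand-in data; a real pipeline would train on historical edge telemetry and track the run as part of lifecycle management.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for historical edge telemetry.
X, y = make_classification(n_samples=10_000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")

# Package the artifact for serving (the same file the Seldon sketch loads).
joblib.dump(model, "model.joblib")
```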
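
For orchestration (item 9), an Argo CD Application is itself a Kubernetes custom resource, so it can be registered with the Kubernetes Python client; the repository URL, path and names below are hypothetical.

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Hypothetical Argo CD Application pointing at a Git repo of edge manifests.
app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "edge-ai", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://github.com/example/edge-ai-manifests.git",
            "path": "overlays/edge-site-1",
            "targetRevision": "HEAD",
        },
        "destination": {
            "server": "https://kubernetes.default.svc",
            "namespace": "edge",
        },
        # Automated sync keeps the edge cluster converged on Git state.
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}
api.create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace="argocd", plural="applications", body=app,
)
```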

The platform's machine learning pipeline spans three stages:
  1. Data ingestion and processing
  2. Model training
  3. Model deployment and serving

Edge AI supports three workflows:
  1. Real-time streaming workflow: This is where the main function of the application takes place. A CEP captures and processes streaming data and intelligently scans for insights or error conditions. The CEP extracts features or noteworthy information from the raw stream of incoming data and sends it to the trained models for analysis. In real time, predictions are sent back to the CEP rules engine for aggregation. If certain conditions are met, actions are taken, such as shutting down an external system or alerting a machine operator of a potential failure. All real-time predictions and inferences are passed to the offline cloud for further monitoring and evaluation. Features are also updated as data evolves, enabling customers to integrate feature engineering with the machine learning pipeline described in Figure 4 below. (A sketch of this loop follows the list.)
  2. On-demand workflow with batches of data: Models for external systems, such as recommendation or personalization engines, can be embedded within the edge platform. These are exposed as REST or gRPC endpoints through an embedded API gateway, allowing real-time inference calls and predictions. (A client sketch follows the list.)
  3. Historical insights workflow: All data (raw, aggregated and predictions) is stored in an in-memory store on the edge platform and synchronized periodically to cloud platforms via cloud connectors. Once the data lands in the cloud, it's used to retrain and evolve models for continuous improvement. Retrained models follow a complete model lifecycle, from training to tracking to publishing, on the cloud. Published models are then seamlessly served back to the edge platform through continuous deployment. Historical insights and batch inferencing are done in the cloud.
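
A minimal sketch of the real-time loop in workflow 1, combining the earlier pieces; it assumes a binary failure-prediction model trained on two features, and reuses the placeholder broker and topics from the sketches above.

```python
import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("model.joblib")  # trained artifact from the ML pipeline

consumer = KafkaConsumer(
    "sensor-events",                          # placeholder topic
    bootstrap_servers=["edge-broker:9092"],   # placeholder broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=["edge-broker:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for event in consumer:
    # Feature extraction from the raw event (the CEP's role in the platform).
    features = [[event.value["temp_c"], event.value["rpm"]]]
    failure_prob = model.predict_proba(features)[0][1]  # real-time inference
    # CEP-style rule: if the condition is met, take an action downstream,
    # e.g., alert a machine operator of a potential failure.
    if failure_prob > 0.9:
        producer.send("alerts", {
            "device_id": event.value["device_id"],
            "failure_prob": failure_prob,
        })
    # In the platform, every prediction is also synced to the cloud
    # for monitoring and evaluation.
```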
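
For workflow 2, a client-side call against an embedded model endpoint. The path follows Seldon Core's REST prediction convention, but the gateway host, namespace and deployment name are assumptions.

```python
import requests

# Hypothetical endpoint exposed through the platform's API gateway.
url = ("http://edge-gateway.example.com"
       "/seldon/edge/edge-model/api/v1.0/predictions")
payload = {"data": {"ndarray": [[81.4, 1490.0]]}}  # one feature row

resp = requests.post(url, json=payload, timeout=2.0)
print(resp.json())  # prediction returned in real time
```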
