If someone who is not too deep into analytics is given a very simple definition of streaming edge data (i.e. it is a real-time, continuous stream of data) and asked to identify the critical component(s) you need to decide on, to get meaningful insights from that data, the answer in all probability, will be: The critical decision that now needs to be taken is which algorithm needs to be leveraged on the data to generate the insights.

However, that is not entirely true.

Let us try to understand the concept of latency in real-time analytics. We generally think about latency in terms of the time it takes to capture the data. However, remember that this data must go through an algorithm and get processed to generate insight(s). This step introduces another type of latency, insights latency.

We need to minimize latency, not only the data-capture latency but also the insight generation latency. If you keep technical details apart, the more data you feed to an algorithm, the more time it will take to process and spit out results. However, more data may, in some cases, mean more meaningful insights.

Obviously, we need to then limit the data that is being fed to the algorithm in such a way that:

  • The time it takes to generate insights is minimized but also,
  • The data size is large enough to generate meaningful insights

While I have simplified it above, it is no simple task. And the difficulty increases with continuous real-time digital signal streams. These continuous streams mean that dividing data into chunks is imperative. But because of the nature of these signals, you also need to ensure that you define chunks in a way that does not distort the embedded insights.

The rate at which we capture and process the data for these systems is known as the frame rate of the system. Note that these frames need not always be sequential. We can have overlapping frames as well. The key is, the information we seek to obtain should not get distorted due to our frame selection approach.

So essentially, there is a triad of components that need to be balanced:

  • Processing time limit (Window size)
  • Frame rate
  • Algorithm

And these three need to be optimized in tandem and should not be determined in silos. The lower the latency, the more data windows you can analyze in a given period. And the more windows you analyze, the more your results become more precise.

For example, imagine using a machine learning model to recognize an irregular pattern of vibrations from machinery on a shop floor. If the windows are too far apart (frame rate latency), the algorithm might miss critical vibration parts and may not recognize it.

As indicated in a previous section, if the window is large, processing the data within that window will take longer. But a larger window contains more information, making signal processing and insights generation easier.

There are a plethora of algorithms that can be used for edge analytics. Some AI algorithms are more sensitive to window size, whereas some (those that maintain an internal memory of what is occurring in a signal) work well with very small window sizes. For some algorithms, the signal size must be large to correctly parse a signal. As we know, the choice of the algorithm impacts insight generation latency, which in turn constrains window size.

We are looking at a complex triad of trade-offs between window size, latency, and algorithm. Many solutions help you determine the optimal triad, but if you are building something from scratch, you must determine these yourself, depending on the project. For example, if you build a vibration detection solution, the choices will be customized to the vibration signals being generated and captured.

This is just one example of why edge analytics excites me. It opens the doors to new challenges that analytics professionals face. And with every challenge comes the gratification of solving that challenge.

References:

https://ocw.mit.edu/courses/res-6-008-digital-signal-processing-spring-2011/

https://www.sciencedirect.com/science/article/pii/S2352864822002255


Leave a comment