When we say that data is the foundation of any successful AI implementation, or indeed any digital transformation, we mean good, quality data.

The good news is that AI can help with data quality challenges as well. This paper is a good example.

In this paper (https://lnkd.in/gSsiX5_D), researchers address the common problem of missing data in datasets (e.g., sensor failures, incomplete records) and the bias/inefficiency that results.

They propose a novel method called MMISVAE (Missing-data Multiple Importance Sampling Variational Autoencoder). The method has two key steps:

Step 1 (learning): They train a VAE architecture with multiple separate encoder networks (mixture components) and average them to get a more expressive latent representation under missing-data conditions.

Step 2 (imputation): They impute by drawing from multiple proposal distributions (one per encoder) and weighting the samples with multiple importance sampling, estimating missing values as a conditional expectation, all in an unsupervised way.
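The core mechanism behind step 2 can be illustrated with a toy sketch. The snippet below is not the authors' code: it just shows how multiple importance sampling with the balance heuristic combines samples from several proposal distributions to estimate an expectation. The fixed Gaussian proposals, the target density, and the function f are all illustrative assumptions; in MMISVAE the proposals would come from the multiple encoder networks and the expectation would be over missing pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(z, mu, sigma):
    # Density of a univariate Gaussian N(mu, sigma^2).
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative target: a standard normal "posterior"; we estimate
# E[f(z)] with f(z) = z**2, whose true value is 1.
target = lambda z: normal_pdf(z, 0.0, 1.0)
f = lambda z: z ** 2

# K = 2 proposal distributions, standing in for the multiple encoders.
proposals = [(-0.5, 1.2), (0.7, 0.9)]  # (mean, std) per component
n_per = 5000

samples, weights = [], []
for mu, sigma in proposals:
    z = rng.normal(mu, sigma, n_per)
    # Balance heuristic: treat the equal-weight mixture of all
    # proposals as the effective sampling density.
    mix = np.mean([normal_pdf(z, m, s) for m, s in proposals], axis=0)
    samples.append(z)
    weights.append(target(z) / mix)

z = np.concatenate(samples)
w = np.concatenate(weights)

# Self-normalized multiple-importance-sampling estimate of E[f(z)].
estimate = np.sum(w * f(z)) / np.sum(w)
print(round(estimate, 2))  # close to 1.0
```

The same weighted-average construction, applied to decoder outputs instead of f(z), is how the conditional expectation of the missing values would be estimated.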

The research shows that this method outperforms existing VAE-based imputation approaches (like MIWAE) in experiments on several image-based datasets (binary versions of MNIST, Omniglot, Fashion MNIST) under missing-completely-at-random (MCAR) and missing-at-random (MAR) settings.

Just another illustration of how AI can help in your data strategy.