One of my hobbies is collecting toys, specifically superhero action figures. The child inside me refuses to die, it seems. If you wonder if I play with these toys, maybe I do. If posing these toys to mimic scenes in movies and taking pictures is akin to playing, then I do, as you can see in the picture below.

But isn’t this article about Generative AI? It is. But I could not resist sharing the picture above, which I thought was great. Also, the genesis of this article is an experience I had during the shipping process of one of the toys I ordered. So now let us get into the other area that excites me: linking day-to-day observations with how technology can help alleviate the issues we observe daily.
Many of these action figures that I buy are shipped from China and Japan. I recently ordered one that was shipped from China. Eagerly waiting for the toy to arrive, I would check the status every morning using the tracking number. I noticed something unusual: a data entry error once the shipment was in the United States. Look at the screengrab of the tracking history, and you will figure out the error.

Now, assume that you are a logistics service provider that plots the routes of your parcels on a map. A program parses the country field from the tracking data to show how the parcels travel. Imagine how weird this transit will look when plotted: a path from China to the U.S., then to Israel, and then delivered back in the U.S. For large carriers that handle millions of shipments, even a tiny percentage of errors adds up to a large absolute number. And this is where Generative AI can help, in real time, to keep the data clean and accurate.
Before we touch upon all the areas where Generative AI can help, let us explore how it can help in this specific example, since it fits perfectly within the core “learning” aspect of Gen AI. An extremely simplified way to picture how Large Language Models (LLMs) work is as a “fill in the blank” exercise. If you ask a Gen AI model to complete a sentence with a word missing, it will leverage its “learning” to score candidate words and pick the best fit. Roughly speaking, that score reflects how often each candidate has appeared alongside the other words in the sentence, in that specific context.
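To make the “fill in the blank” idea concrete, here is a minimal Python sketch. It is a deliberate oversimplification: real LLMs learn with neural networks over enormous amounts of text, whereas this toy just counts words. The corpus and the function name `fill_blank` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; a real model learns from vastly more text.
CORPUS = [
    "the parcel arrived in chicago",
    "the parcel arrived in memphis",
    "the parcel landed in chicago",
    "the parcel departed from shanghai",
]

def fill_blank(left, right, corpus=CORPUS):
    """Rank words seen between `left` and `right` by how often they occur there."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        # Look at every word that has both a left and a right neighbor.
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                counts[words[i]] += 1
    return counts.most_common()

# "the parcel ___ in ..." -> candidates ranked by frequency in this context
print(fill_blank("parcel", "in"))  # [('arrived', 2), ('landed', 1)]
```

The word that shows up most often in that exact slot wins, which is the intuition behind the “best-fit” score described above.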
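In the tracking example, if you plot the final list of countries as a travel path, the path becomes: China > U.S. > Israel > U.S. Think of the third stop as the blank to fill: a model that has seen enough historical routes would find “Israel” a very poor fit between two U.S. scans.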
An LLM trained on historical data can flag and correct the discrepancy almost instantly. But it is not just about looking at the tracking data once the shipment has been delivered. The model can monitor tracking events in real time. So before a customer sees that a shipment that arrived in the U.S. is now showing as having arrived in Chicago D.C. in Israel, the LLM can flag it or, ideally, fix it.
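Here is a rough sketch of that flag-or-fix loop in Python. To be clear, this is not an LLM: it swaps in a simple table of country-to-country transition counts as a stand-in for what a trained model would have learned. The `HISTORY` data and the function names are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical historical routes (one country per scan event); real data
# would come from the carrier's own tracking history.
HISTORY = [
    ["China", "U.S.", "U.S.", "U.S."],
    ["China", "U.S.", "U.S."],
    ["Japan", "U.S.", "U.S.", "U.S."],
    ["China", "Israel", "Israel"],
]

def build_transitions(routes):
    """Count how often each country follows another in historical routes."""
    transitions = defaultdict(Counter)
    for route in routes:
        for prev, nxt in zip(route, route[1:]):
            transitions[prev][nxt] += 1
    return transitions

def check_event(transitions, prev_country, new_country):
    """Return (is_suspicious, suggestion) for a new scan event.

    An event is suspicious when that transition never appears in the history;
    the suggestion is the most frequent follower of `prev_country`.
    """
    followers = transitions.get(prev_country)
    if not followers or followers[new_country] > 0:
        return False, None
    return True, followers.most_common(1)[0][0]

def review_route(transitions, route):
    """Check events one by one, as they would arrive in real time."""
    flags = []
    prev = route[0]
    for i, country in enumerate(route[1:], start=1):
        suspicious, suggestion = check_event(transitions, prev, country)
        if suspicious:
            flags.append((i, country, suggestion))
            prev = suggestion  # carry the corrected value forward
        else:
            prev = country
    return flags

transitions = build_transitions(HISTORY)
print(review_route(transitions, ["China", "U.S.", "Israel", "U.S."]))
# [(2, 'Israel', 'U.S.')]
```

Each flag is a tuple of event position, reported country, and suggested correction. Whether to fix the value automatically or only raise it for review is a design choice; for anything customer-facing, flagging first is probably the safer default.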
Data quality is not the only area where Generative AI or LLMs can make a difference. Some other areas include:
- Exploration: No matter how large your dataset is, you can leverage LLMs to explore the data in a jiffy. You may also uncover insights that the typical exploration methods miss. The current way of exploration is very much “bookish,” where we apply a particular set of statistical approaches or look for a standard set of patterns. LLMs can explore a much wider range of patterns, correlations, and distributions embedded within the data. This may also uncover data quality issues.
- Masking: If the data contains confidential information, but you need to share it with external partners, LLMs can help. Using LLMs, you can build a representative dataset that preserves the properties partners need while masking personal or confidential information.
- Augmentation: This is a well-known and more widely used application of LLMs. You can use LLMs to create synthetic data when you do not have sufficient data for analysis.
In addition to these, other applications include:
- Data Governance
- Data Compression
- Data Cataloging
There is so much discussion these days about which data architecture is optimal. So many buzzwords are being thrown around: lake, mesh, fabric, etc. Regardless of your architecture strategy, you can use LLMs to turbocharge your data architecture. They can help you manage your data in a customized way that aligns with your specific data needs.

