Small language models will play a key role in tomorrow’s Enterprise AI systems because they let us place computation closer to end users and devices, rather than relying exclusively on centralized cloud servers, which opens the door to more innovative and practical solutions.
A robust end-to-end Enterprise AI solution will be a portfolio of LLMs and SLMs, alongside a range of other deep learning and ML models.
The combination of generative AI + mobile edge computing has the potential to enable far more interactive, privacy-friendly, and real-time AI capabilities in consumer and industrial contexts (e.g., AR/VR, smart sensors, mobile assistants).
The research community and industry need better tools, benchmarks, and standards for deploying GenAI applications at the edge. Performance metrics beyond just accuracy are central: latency, energy, privacy, cost, and reliability.
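To make the point concrete, here is a minimal sketch of what measuring one of those metrics, latency, might look like for an edge deployment. The `model_fn` stub and the percentile choices are illustrative assumptions, not from the study; accuracy alone says nothing about whether a model fits an edge latency budget.

```python
import time
import statistics

def measure_latency(model_fn, inputs, warmup=3):
    """Measure per-request latency (ms) for any inference callable.

    model_fn is a stand-in for an on-device SLM or a cloud LLM call;
    warmup requests are issued first so caches/JIT don't skew results.
    """
    for x in inputs[:warmup]:
        model_fn(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        model_fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": sorted(samples)[int(0.95 * (len(samples) - 1))],
    }

# Usage with a toy workload standing in for a real model:
stats = measure_latency(lambda x: sum(range(1000)), list(range(50)))
```

Tail latency (p95) rather than the mean is what usually decides whether an interactive edge experience feels real-time; energy, cost, and privacy would need their own harnesses.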
Here is an interesting study in this arena: https://lnkd.in/gcxprA6d
The paper outlines the challenges of deploying GenAI on edge devices and proposes solutions to address them.
It also introduces an intelligent mobile edge computing paradigm that may reduce response latency, improve efficiency, strengthen security and privacy preservation, and conserve energy.
There will likely be a tradeoff continuum: some deployments will accept simpler models in exchange for low-latency, on-device operation, while others will rely heavily on cloud offload. Glossing over this tradeoff can lead to suboptimal designs.
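One way to picture that continuum is as a routing policy deciding, per request, whether the on-device SLM or the cloud LLM should answer. The sketch below is a toy heuristic; the token threshold and the privacy rule are illustrative assumptions, not a policy from the study.

```python
from dataclasses import dataclass

@dataclass
class Route:
    target: str   # "on_device" (SLM) or "cloud" (LLM)
    reason: str

def route_request(prompt_tokens: int,
                  privacy_sensitive: bool,
                  device_budget_tokens: int = 512) -> Route:
    """Toy edge/cloud routing policy (thresholds are hypothetical).

    Privacy-sensitive or small requests stay on-device with the SLM;
    large/complex requests are offloaded to the cloud LLM.
    """
    if privacy_sensitive:
        return Route("on_device", "privacy: keep data local")
    if prompt_tokens <= device_budget_tokens:
        return Route("on_device", "fits device latency/compute budget")
    return Route("cloud", "exceeds on-device budget; offload")
```

A real router would also weigh battery state, network conditions, and model capability, which is exactly why the tradeoff deserves explicit design rather than a fixed default.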
Empowering GenAI With Edge Computing

