Large language models (LLMs) use vast amounts of data and computing power to create answers to queries that look, and sometimes even feel, “human”. LLMs can also generate music, images or video, write code, and scan for security breaches, among a host of other tasks.

This capability has led to the rapid adoption of generative artificial intelligence (GenAI) and a new generation of digital assistants and “chatbots”. GenAI has been adopted faster than any previous consumer technology: ChatGPT, the best-known LLM-based service, reached 100 million users in just two months, according to the investment bank UBS. It took the mobile phone 16 years to reach that scale.

LLMs, however, are not the only way to run GenAI. Small language models (SLMs), usually defined as using no more than 10 to 15 billion parameters, are attracting interest, both from commercial enterprises and in the public sector.

Small, or smaller, language models should be more cost-effective to deploy than LLMs, and offer greater privacy and – potentially – greater security. While LLMs have become popular due to their wide range of capabilities, SLMs can perform better than LLMs, at least for specific or tightly defined tasks.

At the same time, SLMs avoid some of the disadvantages of LLMs. These include the vast resources they demand, whether on-premise or in the cloud, and the associated environmental impact; the mounting costs of a “pay-as-you-go” service; and the risks of moving sensitive information to third-party cloud infrastructure.

Less is more

SLMs are also becoming more capable and can rival LLMs in some use cases, while running on far less powerful infrastructure – some models can even run on personal devices, including phones and tablets.

“In the small language space, we are seeing small getting smaller,” says Birgi Tamersoy, a member of the AI strategy team at Gartner. “From an application perspective, we still see the 10 to 15 billion range as small, and there is a mid-range category.

“But at the same time, we are seeing a lot of billion parameter models and subdivisions of fewer than a billion parameters. You might not need the capability [of an LLM], and as you reduce the model size, you benefit from task specialisation.”

For reference, GPT-4, the model behind ChatGPT, is estimated to use around 1.8 trillion parameters.

Tamersoy is seeing smaller, specialist models emerging to handle Indic languages, reasoning, or vision and audio processing. But he also sees applications in healthcare and other areas where regulations make it harder to use a cloud-based LLM, adding: “In a hospital, it allows you to run it on a machine right there.”

SLM advantages

A further distinction is that LLMs are trained largely on publicly available information, whereas SLMs can be trained on private, and often sensitive, data. Even where data is not confidential, using an SLM with a tailored data source avoids some of the errors, or hallucinations, that can affect even the best LLMs.

“For a small language model, they have been designed to absorb and learn from a certain area of knowledge,” says Jith M, CTO at technology consulting firm Hexaware.

“If someone wants an interpretation of legal norms in North America, they could go to ChatGPT, but instead of the US, it could give you information from Canada or Mexico. But if you have a foundation model that is small, and you train it very specifically, it will respond with the right data set because it doesn’t know anything else.”

A model trained on a more limited data set is less likely to produce some of the ambiguous and occasionally embarrassing results attributed to LLMs.
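
As a rough illustration of that approach, the sketch below fine-tunes a small, openly available model on a handful of domain-specific sentences using the Hugging Face transformers and datasets libraries. The model (distilgpt2 is used purely as a compact stand-in), the example texts and the training settings are all illustrative assumptions rather than a production recipe.

```python
# Minimal sketch: fine-tuning a small causal language model on a narrow,
# domain-specific corpus. Model, example texts and hyperparameters are
# illustrative assumptions, not a production recipe.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"  # small stand-in; swap in any compact causal language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A tiny, hypothetical domain corpus (for example, internal notes on legal norms).
texts = [
    "Example note: in this jurisdiction, retention periods for client records are five years.",
    "Example note: breach notifications in this jurisdiction must be filed within 72 hours.",
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetune",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()  # after training, the model's outputs skew towards the domain corpus
```

In practice, an organisation would use a much larger curated corpus and evaluate the tuned model against the specific task it is meant to serve.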

Performance and efficiency can also favour the SLM. Microsoft, for example, trained its Phi-1 transformer-based model to write Python code with a high level of accuracy – by some estimates, matching or beating models up to 25 times its size.

Although Microsoft refers to its Phi series as large language models, Phi-1 used only 1.3bn parameters, and Microsoft says its latest Phi-3 models outperform LLMs twice their size. The China-based DeepSeek is also, by some measures, a smaller language model: researchers believe it has around 70bn parameters, but its design activates only 37bn at a time.

“It’s the Pareto principle: 80% of the gain for 20% of the work,” says Dominik Tomicevik, co-founder at Memgraph. “If you have public data, you can ask large, broad questions to a large language model in various different domains of life. It’s kind of a personal assistant.

“But a lot of the interesting applications within the enterprise are really constrained in terms of domain, and the model doesn’t need to know all of Shakespeare. You can make models much more efficient if they are suited for a specific purpose.”

Another factor driving the interest in small language models is their lower cost. Most LLMs operate on a pay-as-you-go, cloud-based model, and users are charged per token (a chunk of text, typically a few characters long) sent or received. As LLM usage increases, so do the fees paid by the organisation. And if that usage is not tied into business processes, it can be hard for CIOs to determine whether it is value for money.

With smaller language models, the option to run on local hardware brings a measure of cost control. The up-front costs are capital expenditure on hardware, plus development and training. But once the model is built, costs should not rise significantly with usage.
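
To make that trade-off concrete, here is a back-of-the-envelope comparison of pay-per-token cloud charges against a one-off local deployment. Every figure (the token price, request volume, hardware and development costs) is an invented assumption for illustration, not a quoted rate.

```python
# Back-of-the-envelope comparison: pay-per-token cloud LLM vs. a locally hosted
# SLM. Every figure below is an invented assumption, not a real price.
PRICE_PER_1K_TOKENS = 0.01          # hypothetical blended rate per 1,000 tokens
TOKENS_PER_REQUEST = 1_500          # prompt plus response, assumed average
REQUESTS_PER_MONTH = 200_000        # assumed usage volume

HARDWARE_CAPEX = 15_000             # one-off local inference hardware (assumed)
DEV_AND_TRAINING = 20_000           # one-off development and fine-tuning effort (assumed)
LOCAL_RUNNING_COST_PER_MONTH = 400  # power, hosting and maintenance (assumed)

cloud_monthly = REQUESTS_PER_MONTH * TOKENS_PER_REQUEST / 1_000 * PRICE_PER_1K_TOKENS
local_upfront = HARDWARE_CAPEX + DEV_AND_TRAINING

print(f"Cloud LLM, pay-as-you-go: {cloud_monthly:,.0f} per month")
for months in (6, 12, 24):
    cloud_total = cloud_monthly * months
    local_total = local_upfront + LOCAL_RUNNING_COST_PER_MONTH * months
    print(f"After {months:2d} months - cloud: {cloud_total:>9,.0f}, local SLM: {local_total:>9,.0f}")
```

Under these made-up numbers, the local model costs more in its first year but works out cheaper from roughly month 14 onwards; the crossover point shifts with usage volume, which is exactly the calculation a CIO would need to make.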

“There is a need for cost evaluation. LLMs tend to be more costly to run than SLMs,” says Gianluca Barletta, a data and analytics expert at PA Consulting. He expects to see a mix of options, with LLMs working alongside smaller models.

“The experimentation on SLMs is really around the computational power they require, which is much less than an LLM. So, they lend themselves to the more specific, on the edge uses. It can be on an IoT [internet of things] device, an AI-enabled TV, or a smartphone as the computational power is much less.”

Deploying SLMs at the edge

Tal Zarfati, lead architect at JFrog, a software supply chain supplier making use of AI, agrees. But Zarfati also draws a distinction between smaller models running in a datacentre or on private cloud infrastructure and those running on an edge device. This includes both personal devices and more specialist equipment, such as security appliances and firewalls.

“My experience from discussing small language models with enterprise clients is they differentiate by whether they can run that model internally and get a similar experience to a hosted large language model,” says Zarfati. “When we are talking about models with millions of parameters, such as the smaller Llama models, they are very small compared to ChatGPT 4.5, but still not small enough to run fully on edge devices.”

Moore’s Law, though, is pushing SLMs to the edge, he adds: “Smaller models can be hosted internally by an organisation and the smallest will be able to run on edge devices, but the definition of ‘small’ will probably become larger as time goes by.”
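
As an illustration of what hosting a model internally can look like, the short sketch below loads a small instruction-tuned model with the Hugging Face transformers library and generates text entirely on local hardware, with no cloud API involved. The model ID and prompt are illustrative assumptions; any similarly sized SLM could stand in.

```python
# Minimal sketch: running a small language model entirely on local hardware
# with the Hugging Face transformers library. The model ID and prompt are
# illustrative assumptions, not recommendations.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # ~3.8bn-parameter SLM (assumed; any compact instruction-tuned model works)
)

prompt = "Summarise the data-protection duties of a hospital when sharing patient records."
output = generator(prompt, max_new_tokens=120, do_sample=False)

# Nothing leaves the machine: the weights are cached locally and no cloud
# inference API is called.
print(output[0]["generated_text"])
```

On a machine with a GPU or NPU, the same pipeline can be pointed at the accelerator; on a plain CPU it still runs, just more slowly.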

Hardware suppliers are investing in “AI-ready” devices such as desktops and laptops, adding neural processing units (NPUs) to their products. As Gartner’s Tamersoy points out, companies such as Apple have patents on a number of specialist AI models, adding: “We are seeing some examples on the mobile side of being able to run some of these algorithms on the device itself, without going to the cloud.”

This is driven both by regulatory requirements to protect data and by the need to carry out processing as close to the data as possible, to minimise connectivity issues and latency. This approach has been adopted by SciBite, a division of Elsevier focused on life sciences data.

“We are seeing a lot of focus on generative AI throughout the drug discovery process. We are talking about LLMs and SLMs, as well as machine learning,” says Tamersoy.

“In what scenario would you want to use an SLM? You’d want to know there is a specific problem you can define. If it’s a broad, more complex task where there is heavy reasoning required and a need to understand context, that is maybe where you would stick to an LLM.

“If you have a specific problem and you have good data to train the model, you need it to be cheaper to run, where privacy is important and potentially efficiency is more important than accuracy, that is where you would be looking at an SLM.” Tamersoy is seeing smaller models being used in early-stage R&D, such as molecular property prediction, right through to analysing regulatory requirements.

PA Consulting, meanwhile, has worked with the Sellafield nuclear site to help it keep up to date with regulations.

“We built a small language model to help them reduce the administrative burden,” says Barletta. “There’s constant regulatory changes that need to be taken into account. We created a model to reduce that from weeks to minutes. The model determines which changes are relevant and which documents are affected, giving the engineers something to evaluate. It is a classic example of a specific use case with limited data sets.”
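
PA Consulting has not published how its Sellafield model works, so purely as an illustration of the kind of relevance check Barletta describes, the sketch below uses a small embedding model from the sentence-transformers library to score how closely a new regulatory change matches a set of internal documents. The model name, document snippets and threshold are all assumptions.

```python
# Illustrative sketch only: one way to flag which internal documents a new
# regulatory change might affect, using a small embedding model and cosine
# similarity. The model, documents and threshold are assumptions; this is
# not the actual Sellafield system.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # compact (~22m-parameter) embedding model

regulatory_change = (
    "Updated guidance on the storage and periodic inspection of "
    "intermediate-level waste containers."
)
documents = {
    "waste-handling-procedure.docx": "Procedures for packaging and storing intermediate-level waste.",
    "fire-safety-plan.docx": "Evacuation routes and fire marshal responsibilities for site buildings.",
    "container-inspection-schedule.xlsx": "Inspection intervals and records for waste storage containers.",
}

change_vec = model.encode(regulatory_change, convert_to_tensor=True)
doc_vecs = model.encode(list(documents.values()), convert_to_tensor=True)
scores = util.cos_sim(change_vec, doc_vecs)[0]

THRESHOLD = 0.4  # assumed cut-off; in practice it would be tuned on labelled examples
for name, score in zip(documents, scores):
    score = score.item()
    verdict = "flag for review" if score >= THRESHOLD else "ignore"
    print(f"{name:35s} similarity={score:.2f} -> {verdict}")
```

The flagged documents would then go to an engineer for assessment, matching the “giving the engineers something to evaluate” step Barletta describes.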

As devices grow in power and SLMs become more efficient, the trend is to push more powerful models ever closer to the end user.

“It’s an evolving space,” says Hexaware’s Jith M. “I wouldn’t have believed two years ago that I could run a 70 billion parameter model on a footprint that was just the size of my palm…personal devices will have NPUs to accelerate AI. Chips will allow us to run local models very fast. You will be able to take decisions at wire speed.”


By itnews