Managing Generative AI Cost with Open Source LLMs


When talking about Generative AI (GenAI) and Large Language Models (LLMs), the most typical scenario is conversational: you interact with the model as you would with a chatbot. It is very natural, organic, and understandable. However, given how expensive implementing GenAI can be in its current state, committing to that level of investment makes the most sense as part of high-value flows. Using public, commercial LLMs such as those from OpenAI can be very compelling because of the ability to create high-impact flows very quickly, but it also creates technical dependencies and the risk of vendor lock-in.

Bringing GenAI to other scenarios beyond the initial, high-value flows raises several critical questions around risk mitigation:

  • How much will it cost if we start to use it for more volume-intensive scenarios like data enrichment or search?
  • What happens if a commercial LLM dramatically changes pricing, making it no longer commercially viable for some of the scenarios we’ve already implemented?
  • Do we retain strategic control if commercial LLM providers sunset some of their models in the future?

This brings to light an important balancing act across the utility, development cost, and maintenance expense of your LLM implementation, all while mitigating the impact of limited control over the future of those models. All of these considerations can have severe impacts on your products and business.

When the current wave of GenAI started, open-source LLMs were very new, came with usage limitations, and were not very powerful. Now, a year later, they are quickly catching up to commercial models in functionality and capabilities. The constant evolution of open-source LLMs also brings a steady stream of inference engine improvements, making even existing models on the current generation of architectures perform better. This makes it much more compelling to adopt updates whenever a new version is released, which rapidly accelerates the improvement of open-source LLM capabilities.

With open-source LLMs, you can safely expand beyond chatbot-type use cases into scenarios like data enrichment or search, where the amount of data and the number of LLM calls can be huge, driving up your utilization. In these scenarios, the cost to run GenAI infrastructure powered by open-source LLMs is manageable and predictable. You have full control over the source code and can make strategic decisions that aren't limited by the cost or terms of use of a commercially licensed LLM. This level of control and predictability creates opportunities for more use cases to adopt significant LLM usage.
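To make the cost and control argument concrete, here is a minimal sketch of calling a self-hosted open-source model through an OpenAI-compatible HTTP endpoint, which inference servers such as vLLM and Ollama can expose. The endpoint URL, model name, and enrich_description helper are illustrative assumptions, not any particular product's API:

```python
# A minimal sketch, assuming a self-hosted inference server that speaks the
# OpenAI-compatible chat completions protocol (e.g., vLLM or Ollama).
import requests

INFERENCE_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local server
MODEL = "mistral-7b-instruct"  # placeholder: any open-source model you host

def enrich_description(raw_description: str) -> str:
    """Ask the local model to rewrite a product description."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "Rewrite product descriptions to be clear and compelling."},
            {"role": "user", "content": raw_description},
        ],
        "temperature": 0.3,
    }
    response = requests.post(INFERENCE_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

Because the server runs on your own hardware, each call carries no per-token price tag; cost scales with the infrastructure you provision, not with usage.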

Some of the business scenarios that become viable with an open-source model include:

  • Processing the entire product catalog with an LLM to create a database for GenAI
  • Enriching product catalog data at full scale to make descriptions more compelling
  • Introducing an LLM as part of key search flows

These additional scenarios work on significantly larger datasets, and their potential impact is proportionally bigger; a sketch of such a full-catalog pass follows below.
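As a rough illustration of the first two scenarios, here is a hypothetical batch pass over a product catalog. The CSV column names and the injected enrich callable (for example, the enrich_description helper sketched earlier) are assumptions about your setup:

```python
# A hypothetical full-catalog enrichment job. With a self-hosted model, the
# total cost is bounded by your own hardware rather than per-token pricing,
# which is what makes passes over the entire catalog viable.
import csv
from typing import Callable

def enrich_catalog(input_path: str, output_path: str,
                   enrich: Callable[[str], str]) -> None:
    """Read a product catalog CSV and write an enriched copy of it."""
    with open(input_path, newline="") as src, \
         open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=["sku", "description", "enriched_description"])
        writer.writeheader()
        for row in reader:  # column names are assumptions about your schema
            writer.writerow({
                "sku": row["sku"],
                "description": row["description"],
                "enriched_description": enrich(row["description"]),
            })

# Example: enrich_catalog("catalog.csv", "catalog_enriched.csv", enrich_description)
```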

While the use of open-source LLMs for e-commerce scenarios is compelling, there are still some limitations. Performance is good, but not great, which means extra steps have to be taken to coordinate access to the inference engine, such as scheduling heavy batch workloads during idle times. And because answers from the LLM are not yet blazingly fast, extra optimizations like semantic caching must be engineered before including LLMs in fast flows like search.
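Here is a minimal sketch of the semantic-caching idea, assuming you run an embedding model of your own: embed each incoming query, and if it is close enough in embedding space to a previously answered one, reuse that answer instead of calling the LLM. The similarity threshold is a tuning assumption:

```python
# A minimal semantic cache sketch: reuse a previous LLM answer when a new
# query is close enough in embedding space. Embeddings are assumed to come
# from whatever embedding model you host; the threshold needs tuning per model.
import numpy as np

class SemanticCache:
    """Reuse prior LLM answers for semantically similar queries."""

    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []  # (query embedding, answer)

    def lookup(self, query_embedding: np.ndarray) -> str | None:
        """Return a cached answer if any stored query is similar enough."""
        for cached_embedding, answer in self.entries:
            similarity = float(np.dot(query_embedding, cached_embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(cached_embedding)))
            if similarity >= self.threshold:
                return answer  # cache hit: the LLM call is skipped entirely
        return None

    def store(self, query_embedding: np.ndarray, answer: str) -> None:
        """Record a new (embedding, answer) pair after an LLM call."""
        self.entries.append((query_embedding, answer))
```

In a search flow, you would call lookup before the model and store on a miss; a production version would typically back the cache with a vector index rather than this linear scan.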

A simple way to understand the value and potential impact of GenAI for e-commerce is to start with smaller, less critical scenarios using open-source LLMs. Choose scenarios like product discovery or data enrichment, where an open-source LLM can already provide high-quality data and guidance. These scenarios can be isolated from existing functionality and key revenue-generating flows, giving you an opportunity to specifically measure the value returned by GenAI. Those gains can then be reinvested to fund the expansion of GenAI into other critical flows.
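One way to keep such a scenario isolated and measurable, sketched under assumptions about your stack, is a deterministic traffic split: route a small share of sessions through the GenAI-backed path and compare outcomes against the existing flow. The rollout percentage and session-ID hashing are illustrative choices:

```python
# A hypothetical traffic split for trialing a GenAI-backed flow in isolation.
import hashlib

GENAI_ROLLOUT_PERCENT = 5  # assumption: start small, expand as value is proven

def use_genai_path(session_id: str) -> bool:
    """Deterministically bucket sessions so each user sees a stable experience."""
    bucket = int(hashlib.sha256(session_id.encode()).hexdigest(), 16) % 100
    return bucket < GENAI_ROLLOUT_PERCENT
```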
