8 Key Insights on AI Infra from the co-founder of SiliconFlow

Nov 14, 2025

Pan Yang, co-founder of SiliconFlow, delivered a speech entitled “AI Infra: For Whom and Why?” at the Real-Time AI Infra Session of Convo AI & RTE 2025. The talk distilled 8 core insights into the field of AI Infra.

TL;DR

8 key insights from Pan Yang’s speech on AI Infrastructure:

  1. Inference First — The shift toward inference computing is driven by exponential growth in AI customers and computation needs.

  2. Open-Source Opportunities — Open-source models are catching up, trailing closed-source models by only 3–5 months, with breakthrough potential in multimodal areas.

  3. The Call for MaaS — One-stop platforms that provide access to many models through a single API.

  4. Three Major MaaS Challenges — Availability issues, performance variations, and the cost reduction illusion.

  5. Do the Difficult but Right Thing — SiliconFlow’s commitment to delivering faster, better, and more cost-effective AI Infra.

  6. Four AI Scenarios for 2025 — Content generation, Agentic AI (the Year of the Agent), Coding, and Multimodal applications.

  7. AI is Work, Not Tool — Jensen Huang’s paradigm shift emphasizing building for Agents rather than humans.

  8. AI Infra: No Bubble — Massive unfulfilled demand shows there is no bubble, only a supply shortage.

Inference first

In 2023, SiliconFlow predicted that “in the future, the vast majority of computing power will be used for inference, rather than training.” This trend is becoming a reality in 2025, driven mainly by two factors: exponential growth in the number and usage of AI customers, and exponential growth in the amount of computation required to complete a single task.

The opportunities of open-source models

Open-source models are rapidly catching up with closed-source models, with the gap now fluctuating around 3–5 months. For LLMs, the open-source ecosystem is close to state-of-the-art (SOTA); for multimodal models such as image, audio, and video, there are still significant opportunities for breakthroughs.

The call for Model as a Service (MaaS)

This year has brought frequent model updates, diverse specifications, varied architectures, and multiple modalities; no single company can independently deploy and maintain every model. A one-stop MaaS platform that integrates these models has therefore become an indispensable entry point for developers. This is precisely the direction SiliconFlow continues to focus on: letting users quickly try out a wide range of models with just one API.
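
As a rough illustration of the “one API, many models” idea, the sketch below switches between models by changing only the model field of an OpenAI-compatible chat-completions call. The base URL, environment variable name, and model IDs are placeholder assumptions, not details from the speech; check the SiliconFlow documentation for the exact values.

```python
# Sketch of the "one API, many models" pattern on an OpenAI-compatible MaaS
# endpoint. Base URL, env var, and model IDs below are placeholder assumptions.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SILICONFLOW_API_KEY"],   # hypothetical env var name
    base_url="https://api.siliconflow.cn/v1",    # assumed OpenAI-compatible endpoint
)

# Switching models is just a matter of changing the `model` field.
for model in ["deepseek-ai/DeepSeek-V3", "Qwen/Qwen2.5-72B-Instruct"]:  # placeholder IDs
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize MaaS in one sentence."}],
    )
    print(model, "->", resp.choices[0].message.content)
```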

MaaS platforms currently face three major challenges

  • Availability and reliability challenges: issues such as insufficient capacity and 429/503 errors still occur in practice; see the retry sketch after this list.

  • Performance and quality vary widely: the same open-source model served by different providers can perform very differently, reflecting varying levels of quantization and optimization that directly affect the model’s final capabilities.

  • The illusion of decreasing costs: although the price of a given model may drop tenfold annually, users always move to the latest and most powerful state-of-the-art (SOTA) models, whose invocation prices remain relatively stable. Meanwhile, the number of tokens consumed to complete a task grows exponentially, so actual application costs do not decrease significantly.
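
To make the availability point above concrete, here is a minimal, generic sketch of client-side handling for 429/503 responses with exponential backoff. The URL, headers, and payload are hypothetical; nothing here is SiliconFlow-specific code.

```python
# Generic client-side handling for 429/503 responses with exponential backoff.
# URL, headers, and payload are hypothetical; nothing here is provider-specific.
import time

import requests


def post_with_retry(url: str, headers: dict, payload: dict, max_retries: int = 5):
    """POST a JSON payload, backing off when the provider is rate-limited or overloaded."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code not in (429, 503):
            resp.raise_for_status()   # surface other errors immediately
            return resp.json()
        # Honor Retry-After if the server sends one, otherwise back off exponentially.
        time.sleep(float(resp.headers.get("Retry-After", delay)))
        delay *= 2
    raise RuntimeError(f"Still receiving {resp.status_code} after {max_retries} attempts")
```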

Do the difficult but right thing

SiliconFlow has been rooted in the AI Infra field from the start, understands these challenges firsthand, and remains committed to putting solutions into practice so that users get faster, better-performing, and lower-cost AI Infra services.

Four high-consensus AI scenarios in 2025

  • Content generation: whether generating an article, providing customer service through a chatbot, or building a knowledge base, everything revolves around language.

  • Agentic AI: this year has been called the Year of the Agent. Definitions of the Agent concept still vary and continue to shift; Manus, for example, has worked hard to promote a clearer definition of what an Agent is.

  • Coding: the first thing mainstream models released this year did was align with Agent and Coding capabilities. The industry broadly agrees that Agent and Coding are the workloads that consume the most tokens.

  • Multimodal: especially in the Chinese Internet environment, consumption of multimodal models far exceeds that of other forms.

“AI is Work, Not Tool”

Jensen Huang’s proposition that “AI is Work, Not Tool” is essentially a paradigm shift: AI will proactively operate tools to complete tasks rather than passively respond to instructions. The consequence is building for agents rather than for humans. Humans will increasingly delegate tasks to agents and operate software interfaces directly less and less.

AI Infra — No Bubble

The AI infrastructure industry is not in a bubble; supply is in fact far from sufficient. The world’s top technology companies have already committed to purchasing hundreds of billions of dollars of infrastructure that has not yet been delivered. The current bottlenecks are chip production capacity and energy supply. Demand far exceeds supply capacity, proving the market’s authenticity and enormous potential.

Ready to accelerate your AI development?
