
The Era of Large AI Models Calls for Next-Generation Infrastructure: DataOps 2.0 and Scheduling Orchestration Grow Increasingly Important


In the age of AI, DataOps 2.0 represents a new data management and operations model that accelerates the data analysis and decision-making process through automated data pipelines, real-time data processing and cross-team collaboration. It incorporates artificial intelligence and machine learning technologies to make data acquisition, cleansing and analysis more efficient, driving organizations to maintain a competitive edge in a rapidly changing marketplace.

On the other hand, scheduling technology is becoming increasingly important in the AI era, serving as the core of efficient resource management and task automation. Through intelligent algorithms and machine learning, schedulers can analyze system load in real time, optimize resource allocation, and dynamically adjust the order of task execution according to demand. This not only improves the operational efficiency of the system, but also reduces the need for human intervention and improves responsiveness and flexibility.

Bessemer Venture Partners has a long history of infrastructure investments. Based on its long-term observations, the firm has found that a new infrastructure paradigm tailored for AI is emerging to support the next wave of enterprise data software in the AI era. Within it, the development of DataOps 2.0 and scheduling/orchestration technologies and industries is also in the spotlight.

The following are Bessemer Venture Partners' observations and predictions about the state of the art of new infrastructures in the age of AI, for informational purposes only:

Contents:

I. The AI revolution is catalyzing the evolution of the data stack

II. Emerging Infrastructure Stacks Tailored for AI

1. Innovations in scaling, novel model architectures, and specialized foundation models

2. Innovations in model deployment and inference

3. Cutting-edge model training and development techniques

4. DataOps 2.0 in the Age of AI

5. Next-Generation Observability

6. Orchestration

III. Huge Opportunities in the AI Infrastructure Business

I. The AI revolution is catalyzing the evolution of the data stack

Machine learning has made remarkable progress in recent years. Since the breakthrough 2017 paper "Attention Is All You Need" (arxiv.org/abs/1706.03762) laid the foundations of the transformer deep learning architecture, we have seen a "Cambrian explosion" of AI research, with new papers published every day and accumulating at an alarming rate.

[Figure: chart of AI-related arXiv papers published over time]

This structural shift in AI innovation is catalyzing the evolution of data infrastructure in many ways.

  • First, AI is driving the modern data stack, and existing data infrastructure companies have begun to integrate AI capabilities into the synthesis, retrieval, and enrichment aspects of data management.

In addition, recognizing the strategic importance of the AI wave as a business opportunity, some incumbents have even released entirely new products to support AI workloads and AI-first users.

For example, many database companies now support embedding as a data type, either as a new feature or as a standalone product.

  • Second, data and AI are inextricably linked. Data is growing at an extraordinary rate and is pushing the limits of current infrastructure tools.

In particular, the generation of unstructured data is projected to soar to 612 zettabytes by 2030 (one zettabyte equals one trillion gigabytes, or one billion terabytes).

This growth is driven by the machine learning/AI boom and the synthetic data generated by generative models in all types of modalities; in addition to the volume of data, the complexity and diversity of data types and sources is increasing.

Companies are addressing these challenges by developing new hardware, including more powerful processors (e.g., GPUs, TPUs), better networking hardware to facilitate efficient data transfer, and next-generation storage devices.

  • Finally, a new wave of AI-native and AI-embedded startups is emerging based on recent advances in machine learning and hardware - companies that are either leveraging AI/ML from the get-go or augmenting existing capabilities with it.

Unfortunately, much of the current data infrastructure and tools are still not optimized for AI use cases. Like forcing a square peg into a round hole, AI engineers have to create workarounds or tricks within the existing infrastructure.

II. Emerging Infrastructure Stacks Tailored for AI

As multiple "why now" drivers have accumulated in recent years, the lack of native, purpose-built tools has given rise to a new AI infrastructure stack built for AI-native and AI-embedded companies.

 

We are in the midst of a massive technological change - innovation within this emerging AI infrastructure stack is advancing at an unprecedented pace.

Even as we write this roadmap and develop our ideas, researchers are publishing new papers every day, rendering previous ideas obsolete.

The rapidly changing environment is daunting, but despite the unknown variables, the potential and opportunities for startups are vast.

As the AI revolution unfolds, so does our investing. With new cutting-edge research released daily, it sometimes feels like the ground is shifting beneath our feet. We are constantly incorporating the latest developments into our theses. Here are a few topics that interest us:

1. Innovations in scaling, novel model architectures, and specialized foundation models

The modeling layer is becoming the most dynamic and competitive layer in the AI infrastructure stack.

Base models are the new "oil," and given the strategic importance of this part of the stack, the winners here could define the future of downstream applications for years to come, as more and more companies build applications on top of these models.

We have seen a surge of activity in the modeling layer - from open source models to small language models. A significant amount of activity and capital has been focused on scaling transformer-based models (e.g., via data and model parallelism, mixed modalities, etc.) or on pushing these models along various performance attributes (e.g., cost, latency, deployment, memory footprint, context windows, etc.).

For example, several teams are improving the building blocks (primitives) of generative models, such as attention and convolution mechanisms, to create more powerful and efficient AI techniques.

Since model training requires significant capital, many of these efforts need to be funded by venture capital. Beyond the cost of training, research talent, engineering talent, and specialized resources must also be available to innovate at this layer.

But "attention is not all you need" - researchers are also developing non-transformer architectures and pushing the boundaries of what is possible with the underlying models.

For example, state-space models (SSMs), such as Mamba, as well as various recurrent architectures, are expanding the frontier of foundation models; they are less computationally intensive, have lower latency, and may provide cheaper and faster alternatives to traditional transformers for training and inference.

SSMs, which focus on dynamic, continuous systems, have existed since the 1960s but have only recently been applied to discrete, end-to-end sequence modeling.

Their linear complexity also makes SSMs a great choice for long-context modeling, and we see several companies thriving in this area.

While early results show impressive efficiencies on a variety of properties, researchers still need to demonstrate a variety of properties now taken for granted in the transformer ecosystem (e.g., control, alignment, reasoning).
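To make the linear-time intuition behind SSMs concrete, below is a minimal numpy sketch of a discretized linear state-space recurrence (h_t = A·h_{t-1} + B·x_t, y_t = C·h_t). The matrices are random placeholders rather than a real Mamba parameterization; the sketch only illustrates why cost grows linearly with sequence length.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Discrete linear state-space recurrence:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    One fixed-size update per token, so cost is linear in sequence length
    (unlike self-attention's quadratic cost).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

# Toy example with random (untrained) matrices and a 1,000-step sequence.
rng = np.random.default_rng(0)
d_state, d_in, d_out, seq_len = 16, 8, 8, 1000
A = rng.normal(scale=0.1, size=(d_state, d_state))
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
x = rng.normal(size=(seq_len, d_in))
print(ssm_scan(A, B, C, x).shape)  # (1000, 8)
```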

In addition, groundbreaking research in the field of geometric deep learning, including categorical deep learning and graph neural networks, is providing researchers with methods for structured reasoning.

While this field has been around for quite some time, it has seen renewed interest in this new wave of AI, as geometric approaches often enable deep learning algorithms to take into account geometric structures embedded in real-world data (e.g., abstract syntax trees in code, biological pathways, etc.) and can be applied to a variety of domains.

Furthermore, in addition to general-purpose models, there are many teams currently training models for specific purposes such as code generation, biology, video, images, speech, robotics, music, physics, brain waves, etc., which adds another vector of diversity and flexibility to the modeling layer.

2. Innovations in model deployment and inference

The compute layer is one of the most complex layers in the AI infrastructure stack. Large enterprises and startups alike are innovating at the compute layer, adding to its complexity. The compute layer is complex not only because it is a core layer, but also because it powers the rest of the stack:

It incorporates innovations and interactions between hardware (e.g., GPUs and customized hardware), software (e.g., operating systems, drivers, configuration tools, frameworks, compilers, and monitoring and management software), and business models.

At the hardware level, GPU costs are falling as supply chain shortages ease. Next-generation GPUs, such as NVIDIA's H100 and B100 series, combined with advances in interconnect technology, are scaling data and model parallelism across GPUs.

In addition to hardware, various algorithmic and infrastructure innovations are enabling new AI capabilities. For example, the self-attention mechanism in the transformer architecture has become a key bottleneck due to its high computational requirements, especially its quadratic time and space complexity.

To address these challenges, the machine learning systems community has published research on various models and infrastructure layers: evolution of self-attention mechanisms (e.g., Ring Attention), KV Cache optimizations (e.g., channel quantization, pruning, approximation), and so on.

These innovations reduce the memory footprint of the LLM decoding step, enabling faster inference, longer contexts, and cost-effectiveness.
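As a rough illustration of why caching keys and values speeds up decoding, here is a minimal single-head numpy sketch (random weights, not any particular library's implementation): past keys and values are computed once and reused, so each new token only attends against the cache instead of re-encoding the entire prefix.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

d = 64                                   # head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []                # one entry per previously decoded token

def decode_step(x_t):
    """Attend the new token against cached keys/values, then grow the cache."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)             # only the new token's K/V are computed
    v_cache.append(x_t @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    attn = softmax(K @ q / np.sqrt(d))
    return attn @ V                      # context vector for this step

for _ in range(5):                       # toy decoding loop over random "tokens"
    out = decode_step(rng.normal(size=d))
print(out.shape)                         # (64,)
```

The cache trades memory for compute, which is exactly why the KV cache optimizations mentioned above (quantization, pruning, approximation) target its memory footprint.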

There are still many questions to be answered as we move towards a personalized, cheaper approach to fine-tuning.

Methods such as LoRA have freed up memory for cost-effective fine-tuning, but it has proven difficult to manage GPU resources in a scalable way when serving fine-tuned models (GPU utilization is often low, and copying weights into and out of memory reduces arithmetic intensity).
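For intuition about why LoRA is memory-efficient, here is a minimal numpy sketch of the low-rank update it adds to a frozen weight matrix (the shapes, scaling, and values are illustrative placeholders, not a training loop).

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 1024, 1024, 8, 16

# Frozen pretrained weight: never updated during fine-tuning.
W = rng.normal(scale=d_in ** -0.5, size=(d_out, d_in))

# Trainable low-rank adapter: r * (d_in + d_out) parameters instead of
# d_out * d_in, which is what shrinks gradient and optimizer memory.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))       # zero-init so the model starts exactly at W

def lora_forward(x):
    """y = W x + (alpha / r) * B (A x): the adapter is a cheap additive path."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(lora_forward(x).shape)   # (1024,)
```

Serving many such adapters against one shared W is where the GPU-utilization and weight-swapping problems mentioned above arise.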

While improvements higher up the stack in batching, quantization, and serverless offerings have made the infrastructure easier and more accessible, there are still many open questions.

Projects like SkyPilot and vLLM, as well as companies like Modal, Together AI, Fireworks, and Databricks, are pushing the envelope.

Vendors in this tier have had a significant impact on the unit economics (especially gross margins) of AI application companies utilizing their services, and we expect these dynamics to continue to drive innovation based on downstream application needs.

3. Cutting-edge model training and development techniques

As mentioned earlier, AI research is advancing at an alarming rate, especially since we are in an exciting period where new AI methods and technologies are booming in terms of pre-training, training, and development.

New methods are being developed every day, parallel to the evolution of existing methods, meaning that the AI infrastructure stack is being dynamically defined and redefined.

We are seeing a proliferation of these technologies across the spectrum, advancing the output of LLMs and diffusion models from foundational performance parameters such as accuracy and latency to the limits of new frontiers (e.g., reasoning, multimodality, vertical-specific knowledge, and even agentic AI or emergent capabilities).

We highlighted some architectural paradigms in Section 1, but other examples of techniques are listed below:

  • Fine-tuning and alignment: supervised feedback, specialized training data, or refinement of weights to fit specific tasks (e.g., RLHF, constitutional AI, PEFT)

  • Retrieval-Augmented Generation (RAG): connecting LLMs to external knowledge sources through a retrieval mechanism, combining generation with the ability to search and/or integrate relevant knowledge-base data (see the minimal sketch after this list)

  • Prompting paradigms: an interactive process in which the LLM is instructed and guided toward a desired outcome (e.g., few-shot learning, many-shot in-context learning, reverse prompting, CoT, ToT)

  • Model blending and merging: machine learning methods that blend separate sub-networks of AI models to perform tasks together (e.g., MoE, SLERP, DARE, TIES, frankenmerging)

  • Training stability: decisions about normalization methods (e.g., LayerNorm vs. RMSNorm), regularization, activation functions, and other attributes affect training stability and performance

  • Parameter efficiency: Various methods that affect model capability and efficiency, such as efficient continuous pretraining
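As a concrete, if simplified, illustration of the RAG pattern listed above, the sketch below retrieves the most relevant documents with a stand-in bag-of-words embedding and stuffs them into a prompt; `call_llm` is a hypothetical placeholder for whatever model endpoint is actually used.

```python
import numpy as np
from collections import Counter

DOCS = [
    "LoRA adds low-rank adapters to a frozen pretrained weight matrix.",
    "A KV cache stores past keys and values to speed up decoding.",
    "State-space models offer linear-time sequence modeling.",
]
VOCAB = sorted({w for doc in DOCS for w in doc.lower().split()})

def embed(text):
    """Stand-in embedding: bag-of-words counts (a real system would use a learned model)."""
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in VOCAB], dtype=float)

def retrieve(query, k=2):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    def score(doc):
        d = embed(doc)
        return float(d @ q) / (np.linalg.norm(d) * np.linalg.norm(q) + 1e-9)
    return sorted(DOCS, key=score, reverse=True)[:k]

def call_llm(prompt):
    """Hypothetical placeholder for a real hosted-model call."""
    return f"[answer grounded in a {len(prompt)}-character prompt]"

query = "How does LoRA reduce fine-tuning memory?"
context = "\n".join(retrieve(query))
print(call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```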

Despite the tradeoffs between experimental simplicity and effectiveness of these approaches, we predict that these techniques will inspire new developments as researchers iterate more quickly and address real-world scalability and applicability issues.

Additionally, it is common to deploy a mix or combination of technologies in Applied AI, but ultimately, the approach that delivers the greatest benefits is likely to dominate the Applied AI space.

In addition, the landscape is dynamically evolving as the underlying models continue to improve and more AI-driven solutions are deployed in production and within real-world constraints.

Ultimately, we think it is still early days and no single approach has yet established dominance, especially in the enterprise AI space.

As a result, we are excited to work with companies developing, enabling, or commercializing these technologies as they reshape and reimagine how we build, develop, operate, and deploy AI models and applications in reality, and form a critical layer of tools for AI companies.

4. DataOps 2.0 in the Age of AI

We mentioned at the beginning of the article that data and AI output are inextricably linked.

We see this reflected in many ways, from data quality impacting AI output (garbage in garbage out), to recent AI innovations unlocking insights from previously untapped data sources (e.g., unstructured data), to proprietary data serving as a competitive advantage and moat for AI-native companies.

We explored this relationship in our Data Shift Right article and highlighted new data strategies companies are utilizing to optimize AI for competitive advantage in our recent Data Guide.

Given these catalysts, DataOps is facing new demands, leading to the emergence of new methods and frameworks for storage, labeling, pipelining, preparation, and transformation. Some exciting examples include:

  • In the preprocessing phase, we have seen the rise of data management and ETL solutions designed specifically to manipulate LLM data.
  • The emergence of new data types (e.g., embeddings) has inspired entirely new classes of data operations, such as vector databases (see the minimal similarity-search sketch after this list).
  • Data annotation continues to evolve in the age of AI, incorporating advanced data-centric approaches, which accelerates previous manual or weakly supervised methods and attracts more non-technical end users.

  • The AI revolution has driven the mainstream adoption of tools for processing a wide range of data modalities (especially unstructured data, such as video and images). Many of these state-of-the-art tools are now integrated into everyday workflows. Previously, processing these modalities was challenging and often customized, resulting in organizations not being able to fully derive value from these rich data sources.

  • As organizations leverage innovations in model training and inference techniques (see subsections 2 and 3 above), new enterprise tool chains and data workflows (e.g., RAG stacks) are emerging.
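To make the vector-database bullet above concrete, here is a minimal in-memory nearest-neighbor lookup over embedding vectors in numpy; the embeddings are random placeholders, and real vector databases add approximate-nearest-neighbor indexing, persistence, and metadata filtering, and scale far beyond this.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items = 128, 10_000

# Pretend corpus embeddings (in practice these come from an embedding model).
corpus = rng.normal(size=(n_items, dim))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)   # unit-norm rows

def top_k(query_vec, k=5):
    """Exact cosine-similarity search; vector DBs approximate this at much larger scale."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = corpus @ q                     # cosine similarity, since rows are unit-norm
    idx = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    order = np.argsort(-scores[idx])        # rank just those k by score
    return idx[order], scores[idx][order]

ids, sims = top_k(rng.normal(size=dim))
print(ids, sims)
```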

Just as the modern data stack fueled the rise of iconic decacorns (defined as companies less than 10 years old but with a market capitalization of more than $10 billion) in the DataOps space, we believe that a new generation of DataOps giants focused on AI workflows will emerge.

5. Next-generation observability

With each new wave of technology, observability has taken various forms (e.g., data observability in modern data stacks, APM for cloud application development).

Similarly, we're seeing observability evolve in the age of AI - a new set of vendors are emerging to help companies monitor the performance of models and AI applications.

We have seen many companies enter the market to address a single key issue, whether in pre-production (e.g., LLM evaluation, testing), in post-production (e.g., monitoring, catching drift and bias, interpretability), or in adjacent functionality such as model security and compliance, intelligent routing, and caching.

We anticipate (and have already seen) that the long-term roadmaps of these companies will converge into end-to-end observability platforms that create a single source of truth for model performance across both pre- and post-production environments.

We're excited about the prospect of Datadog-like outcomes in the AI observability space - however, given the ever-changing landscape, new models, new training/fine-tuning techniques, and new types of applications, winning in the observability space may require a team that can deliver with high product velocity, perhaps more so than in other areas.

As we learned from Datadog's rise, the company was able to stand out among a dozen or so similar competitors because of its focus on:

  • Rapidly shipping a broad set of products and capabilities;

  • Building deep coverage of the systems Datadog can monitor;

  • Providing extensive integration support to bring as many adjacent systems as possible into its ecosystem.

We're excited to work with this generation of startups who are taking on such tasks in the AI stack.

6. Orchestration

As emerging LLM and generative AI application companies continue to grow, we see significant opportunities for companies in the orchestration layer to become pillars of AI development.

As the "conductor of the band" in the AI development lifecycle, and responsible for ensuring and coordinating the development, deployment, integration, and general management of AI applications, orchestrating the vendors is a key (and importantly, vendor-neutral, i.e., all information is absolutely safe and secure on a neutral emulation platform, and any party in the collaborative project can only access information that is relevant to them). access the information that is relevant to them) The centralized hub that orchestrates the extensions of the various AI tools that developers encounter.

Companies such as LangChain and LlamaIndex emerged early in the LLM space, and a strong open-source ecosystem drove their adoption.

They created frameworks that provide developers with a set of best practices and toolkits for developing their own LLM applications, abstracting away the complexity of connecting the right data sources to the model, implementing retrieval methods, and more.
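To illustrate, at a toy level, the kind of abstraction such frameworks provide (this is not LangChain's or LlamaIndex's actual API), here is a minimal "chain" that composes a prompt-templating step with a model call; `fake_llm` is a hypothetical stand-in for a real endpoint.

```python
from typing import Callable

Step = Callable[[str], str]

def chain(*steps: Step) -> Step:
    """Compose steps left to right; real frameworks layer retries, tracing,
    retrieval, routing, and tool use on top of this idea."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run

def prompt_template(question: str) -> str:
    return f"You are a helpful assistant.\nQuestion: {question}\nAnswer:"

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a hosted model call.
    return f"[answer to a {len(prompt)}-character prompt]"

qa_pipeline = chain(prompt_template, fake_llm)
print(qa_pipeline("What does an orchestration layer do?"))
```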

In addition to LLM, we are seeing an ecosystem of vendors creating orchestration solutions for agent-based applications, further simplifying the development process for new and innovative agent-based AI applications.

Similar to React's success in simplifying web development, we anticipate similar opportunities for AI orchestration vendors to simplify development and empower far more people to build a variety of AI applications (e.g., LLM-based, agent-based, computer vision, etc.).

III. Huge Opportunities in the AI Infrastructure Business

As Mark Twain once said, "When everyone is looking for gold, it's a good time to be in the pick and shovel business."

We believe there is a huge opportunity to build "picks and shovels" for machine learning, which will lead to a large number of multi-billion-dollar companies that will provide enterprises with the tools and infrastructure to operationalize AI.