MosaicML is a San Francisco-based startup focused on making efficient training of machine learning models accessible to organizations of all sizes. Founded in 2021 by AI pioneer Naveen Rao and led by technologists from places like Intel, Google Brain, and OpenAI, MosaicML aims to democratize access to the latest advancements in generative AI.
The Origin Story
MosaicML was founded by Naveen Rao, former corporate vice president and general manager of Intel’s Artificial Intelligence Products Group. Rao has an extensive background in AI, having previously founded the neural network chip startup Nervana Systems, which Intel acquired in 2016.
Seeing how costly and computationally intensive training modern deep learning models had become, Rao aimed to tackle the inefficiencies with MosaicML. The company’s core belief is that every organization should be able to benefit from AI and train models on their own data.
To realize this vision, MosaicML has assembled a team of experts across machine learning, high performance computing, and infrastructure. The founding team includes engineers from Intel, Google Brain, OpenAI, Uber AI Labs, and other leading organizations.
At its core, MosaicML is developing techniques to dramatically improve the efficiency of training large AI models. The company says its methods can deliver up to 15x cost savings compared with training models using standard frameworks like PyTorch and TensorFlow.
MosaicML’s technology builds on recent research into areas like model parallelism, optimizer improvements, and hardware-aware training techniques. By combining many incremental efficiency gains, MosaicML unlocks new levels of performance.
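To see why many small gains can add up to a large one, here is a minimal sketch of the arithmetic. It assumes the individual speedups are independent and simply multiply, which is an idealization; in practice techniques can interact. The technique names and numbers below are hypothetical illustrations, not figures published by MosaicML.

```python
from math import prod

# Hypothetical per-technique speedups. These are illustrative numbers only,
# not measurements from MosaicML.
speedups = {
    "mixed_precision": 1.8,
    "optimizer_tuning": 1.3,
    "data_loading": 1.2,
    "architecture_tweaks": 1.25,
}


def combined_speedup(factors):
    """Multiply speedup factors, assuming they are independent of each other."""
    return prod(factors)


def cost_after(baseline_cost, factors):
    """Cost of the same training run once every speedup is applied."""
    return baseline_cost / combined_speedup(factors)


total = combined_speedup(speedups.values())
new_cost = cost_after(100_000, speedups.values())
print(f"combined speedup: {total:.2f}x")
print(f"a $100,000 run would cost about ${new_cost:,.0f}")
```

No single factor here exceeds 2x, yet the product is well above 3x. That compounding is the idea behind stacking incremental improvements.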
The company also employs a modular approach, allowing users to pick and choose components like:
- Composer – MosaicML’s library for optimizing PyTorch code to train models faster. It includes a collection of speedup methods such as gradient accumulation, floating point optimizations, and dynamic loss scaling.
- StreamingDataset – A tool for streaming data directly from cloud storage instead of storing the full dataset locally. It enables multi-node distributed training on cloud-hosted data.
- Model parallelism – Splits a model across multiple GPUs or machines to train bigger models and speed up training.
- Foundation models – Pretrained models like MPT-7B and MPT-30B, which users can download, customize, and deploy.
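Gradient accumulation, one of the methods named in the Composer entry above, is easy to illustrate without any framework. The sketch below is not Composer's implementation. It is a plain-Python toy using a one-parameter model, with made-up data, that shows the core property: summing suitably weighted micro-batch gradients reproduces the full-batch gradient, so a large batch can be processed in pieces that fit in memory.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each weighted by its share of the batch."""
    n = len(xs)
    grad = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight by len(mx) / n so uneven final micro-batches are handled correctly.
        grad += mse_grad(w, mx, my) * len(mx) / n
    return grad


# Toy data, roughly y = 2x (hypothetical values for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(abs(full - accum) < 1e-9)

# The optimizer steps once, only after every micro-batch has contributed.
w_new = w - 0.01 * accum
```

The trade-off is time for memory: the update is the same as with the large batch, but the micro-batches are processed one after another.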
MosaicML open sources many of these technologies so the whole community can benefit. Companies can then build on MosaicML’s stack to create proprietary IP and customized models.
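The streaming idea from the component list can also be sketched in a few lines. This is not the StreamingDataset API. It is a hypothetical illustration of the general pattern, assuming the dataset is split into shards and each node reads a disjoint subset lazily. The helper names and the fake `fetch_shard` function are invented for the example.

```python
from typing import Iterator, List


def shards_for_rank(shards: List[str], rank: int, world_size: int) -> List[str]:
    """Give this rank every world_size-th shard (round-robin assignment)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return shards[rank::world_size]


def fetch_shard(name: str) -> List[str]:
    """Stand-in for a cloud download; returns three fake samples per shard."""
    return [f"{name}/sample{i}" for i in range(3)]


def stream_samples(shards: List[str], rank: int, world_size: int) -> Iterator[str]:
    """Yield samples one shard at a time, so only one shard is held at once."""
    for name in shards_for_rank(shards, rank, world_size):
        yield from fetch_shard(name)


all_shards = [f"shard-{i:03d}" for i in range(8)]
node0 = list(stream_samples(all_shards, rank=0, world_size=2))
node1 = list(stream_samples(all_shards, rank=1, world_size=2))

# Each node sees half the data, and no sample is read twice.
print(len(node0), len(node1), set(node0).isdisjoint(node1))
```

Because the generator pulls a shard only when it is needed, no node has to hold the full dataset on local disk.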
While MosaicML open sources many core libraries for public use, its main commercial product is the MosaicML Platform. This platform gives enterprises an easy way to train and deploy large AI models on their own data, within their secure environment.
The managed platform handles all the infrastructure, optimizations, and machine learning engineering required for production model training. This allows companies to skip straight to building with large language models and generative AI.
With the MosaicML Platform, enterprises can:
- Quickly pretrain models on their private data, with full control and visibility.
- Orchestrate training across low-cost spot instances and multiple cloud providers.
- Track experiments and model lineage through integrations with popular MLOps tools.
- Deploy trained models for inference while maintaining data privacy and compliance.
- Scale elastically while only paying for resources used.
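Training on spot instances only works if a run can survive being interrupted. The sketch below is a hypothetical toy, not the MosaicML Platform's mechanism. It assumes the usual pattern of saving a checkpoint after each step and resuming from it when a new instance starts. The JSON checkpoint format and the `preempt_at` parameter are invented for the illustration.

```python
import json
import os
import tempfile


def train(total_steps, ckpt_path, preempt_at=None):
    """Run a toy training loop that resumes from ckpt_path if it exists."""
    state = {"step": 0, "loss": 10.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            return state, False  # simulated spot-instance reclaim
        state["step"] += 1
        state["loss"] *= 0.9  # stand-in for a real optimizer step
        # Write to a temp file, then swap it in, so a crash mid-write
        # never leaves a corrupt checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state, True


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "ckpt.json")
    _, finished_first = train(10, path, preempt_at=4)   # interrupted run
    resumed_state, finished_second = train(10, path)    # picks up at step 4

print(finished_first, finished_second, resumed_state["step"])
```

The second call does not repeat the first four steps. It loads the saved state and continues, which is what makes cheap, interruptible capacity usable for long training jobs.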
The platform architecture separates storage, compute, and service layers for flexibility across on-prem, multi-cloud, hybrid, and edge environments.
MosaicML handles the complex model training workflows so enterprises can focus on their data and use cases. The platform’s ease of use, combined with MosaicML’s efficiency breakthroughs, opens up access to generative AI.
MosaicML Use Cases
Enterprises across many industries leverage MosaicML to build AI solutions tailored to their specific data. Example use cases include:
- Financial services – Banks train models on historical customer transaction data to power next-generation chatbots and personalization.
- Healthcare – Clinicians create diagnostic models based on medical images and electronic health records unique to their facility.
- Manufacturing – Factories optimize predictive maintenance models with data from their equipment sensors and IoT devices.
- Technology – Tech companies enhance code intelligence with LLMs trained on internal codebases and documentation.
- Retail – Retailers better forecast demand and analyze supply chain disruptions using store and inventory data.
- Cybersecurity – Security teams develop anomaly detection models trained on network activity and threat intelligence.
- Government – Agencies like the NIH use MosaicML to create biomedical language models trained on research literature.
Across verticals, MosaicML delivers the performance needed to make building with large models practical.
MosaicML integrates with the popular tools enterprises already use for data and model management. This includes support for:
- MLOps – Tools like Weights & Biases, Comet, and Neptune allow tracking experiments run on MosaicML. Models can be registered, compared, and promoted.
- Data infrastructure – Bring your own data from internal databases, data lakes on AWS S3 or GCS, and other storage systems.
- AI frameworks – Interoperable with all the major frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
- Cloud providers – Train across low-cost spot instances on AWS, GCP, and Microsoft Azure.
- AI deployment – Package trained models and serve predictions through SageMaker, Azure Machine Learning, and other tools.
These integrations provide flexibility to use MosaicML as part of existing infrastructure.
While MosaicML is a relatively new startup, it already counts numerous enterprises as customers. Early adopters span several industries and include:
- Replit – Used MosaicML to train its own code generation model to power its AI coding assistant.
- Anthem – Leading health insurance company using MosaicML to build clinical language models.
- Twilio – Communications platform company improving customer support models with MosaicML.
- Karius – Life science startup training pathogen detection models for infection testing.
- Allen Institute for AI – Nonprofit research institute using MosaicML to create biomedical LLMs.
- Samsung Next – MosaicML investor and customer focused on bringing AI to next-generation consumer devices.
These customers highlight the breadth of enterprise use cases MosaicML can support. The startup is poised for rapid growth as more companies embrace AI and leverage its platform.
Funding and Investors
As an enterprise startup selling to major corporations, MosaicML has attracted significant investor interest:
- June 2022 – $38M Series A led by AME Cloud Ventures with participation from DCVC, Playground Global, and Samsung Next.
- October 2021 – $9M Seed led by Lux Capital and Frontline Capital.
Additional investors include OpenAI co-founder Greg Brockman, Segment co-founder Calvin French-Owen, and angel investor Elad Gil.
Top venture capital firms are eager to back MosaicML’s vision of empowering every company to benefit from AI. The substantial capital allows MosaicML to continue expanding its platform capabilities.
The Future of MosaicML
MosaicML sits at the intersection of two major trends – the democratization of generative AI and the rise of custom corporate language models. Enterprises want to leverage large language models but struggle with the difficulty of training them.
MosaicML introduces efficiencies that allow every organization to develop tailored LLMs with full control of their data. This aligns with growing concerns over ethics, bias, and privacy in AI applications.
As more businesses embrace AI-first strategies, MosaicML provides the tools they need to be AI leaders. Backed by world-class technologists and investors, MosaicML is poised to power the next generation of enterprise AI.
By making large language model training accessible, MosaicML aims to put every company in the driver’s seat of the AI revolution.
In summary, MosaicML combines cutting-edge R&D with an enterprise-focused platform for efficient LLM training. Its technology and expertise help organizations operationalize generative AI with their own data. As AI adoption grows, expect MosaicML to play an integral role in empowering businesses to customize models that align with their values.