ML Model Hosting: Deploying Simple ML Models on Vercel or Serverless Functions
Author: Wasif Ali
Introduction
Deploying Machine Learning (ML) models on Vercel or serverless functions enables scalable, low-latency predictions without managing infrastructure. This guide covers:
- Packaging an ML model.
- Deploying it as a serverless API.
- Using it from a Next.js app.
- 1. Choosing the Right Model for Deployment
- 2. Converting ML Models to a Deployable Format
- 3. Deploying the Model on Vercel Serverless Functions
- 4. Calling the API from a Next.js App
- 5. Optimizing for Serverless Performance
- Conclusion
- Support
- License
1. Choosing the Right Model for Deployment
Serverless platforms work best with lightweight models due to execution time limits. Suitable models include:
✔️ Linear Regression (e.g., sales forecasting).
✔️ Image Classification (lightweight CNN models).
✔️ Sentiment Analysis (text-based models like Naïve Bayes).
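As a quick gate before committing to serverless, it helps to check how large the serialized model actually is. A minimal sketch, assuming the model has been saved to model.pkl as in the next section:
import os
# Models in the low-megabyte range (or smaller) are a comfortable fit
# for serverless bundle-size and memory limits.
size_kb = os.path.getsize("model.pkl") / 1024
print(f"model.pkl is {size_kb:.1f} KB")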
2. Converting ML Models to a Deployable Format
Most ML models are trained using Python frameworks like TensorFlow, Scikit-Learn, or PyTorch. Convert them to a format compatible with serverless functions.
Example: Saving a Scikit-Learn Model with joblib
from sklearn.linear_model import LogisticRegression
import joblib
# Train a simple model
model = LogisticRegression()
X, y = [[0], [1], [2]], [0, 1, 1]
model.fit(X, y)
# Save the model
joblib.dump(model, "model.pkl")
🔹 Alternative: Use ONNX format for TensorFlow/PyTorch models.
import torch
# Export a minimal PyTorch model to ONNX
model = torch.nn.Linear(1, 1)
torch.onnx.export(model, torch.randn(1, 1), "model.onnx")
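The scikit-learn model above can also be converted to ONNX with the skl2onnx package. A minimal sketch, assuming skl2onnx is installed (pip install skl2onnx) and model is the fitted LogisticRegression from earlier:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Declare the input shape: batches of single-feature float vectors
onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 1]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
Either format works for the deployment below; this guide sticks with the joblib pickle for simplicity.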
3. Deploying the Model on Vercel Serverless Functions
Step 1: Install Dependencies
Create a Next.js project and install the Vercel CLI. Flask and NumPy are Python packages, so they are not installed with npm; Vercel installs them for the serverless function from a requirements.txt file at the project root (see the sketch below).
npm install -g vercel
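A minimal requirements.txt for the handler in Step 2 (version pins are omitted here; pinning them in a real project keeps cold-start installs reproducible):
flask
numpy
joblib
scikit-learn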
Step 2: Create a Serverless API Route (api/predict.py)
Inside the api/ directory, create predict.py:
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load the serialized model once, at cold start
model = joblib.load("model.pkl")

@app.route("/api/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # Reshape into a column vector so single and batched inputs both work
    prediction = model.predict(np.array(data["input"]).reshape(-1, 1))
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(debug=True)
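Before moving on, you can sanity-check the endpoint locally, for example with vercel dev or by running the Flask app directly. A minimal client sketch, assuming the requests package is available and the function is served on localhost port 3000:
import requests

resp = requests.post(
    "http://localhost:3000/api/predict",
    json={"input": [1.5]},
)
print(resp.json())  # e.g. {"prediction": [1]}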
Step 3: Configure vercel.json
{
  "functions": {
    "api/predict.py": {
      "runtime": "python3.9"
    }
  }
}
Step 4: Deploy to Vercel
vercel deploy
Your API will be live at:
https://your-app.vercel.app/api/predict
4. Calling the API from a Next.js App
Modify your Next.js frontend to send requests to the ML model.
import { useState } from "react";

export default function PredictPage() {
  const [input, setInput] = useState(0);
  const [prediction, setPrediction] = useState(null);

  async function getPrediction() {
    const res = await fetch("/api/predict", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: [input] }),
    });
    const data = await res.json();
    setPrediction(data.prediction);
  }

  return (
    <div>
      <h1>ML Prediction</h1>
      <input
        type="number"
        value={input}
        onChange={(e) => setInput(Number(e.target.value))}
      />
      <button onClick={getPrediction}>Predict</button>
      {prediction !== null && <p>Prediction: {prediction}</p>}
    </div>
  );
}
🔹 Benefit: Users can interact with the ML model in real time.
5. Optimizing for Serverless Performance
- Reduce Model Size → Convert models to ONNX.
- Cache Predictions → Use Redis or local storage.
- Optimize Cold Starts → Keep the model lightweight.
- Batch Inference → Process multiple inputs in one request (see the sketch below).
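The last point falls out of the handler from Step 2 almost for free: the reshape(-1, 1) call accepts any number of values, so a client can send several inputs in a single request. A minimal sketch, reusing the deployed URL from Step 4:
import requests

resp = requests.post(
    "https://your-app.vercel.app/api/predict",
    json={"input": [0.2, 1.7, 3.4]},  # one request, three inputs
)
print(resp.json())  # {"prediction": [...]}, one prediction per input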
Conclusion
By deploying ML models on Vercel or serverless functions, you achieve scalable, real-time predictions with minimal infrastructure management.
Key Takeaways:
✅ Convert ML models to ONNX or lightweight formats.
✅ Deploy on Vercel using serverless functions.
✅ Optimize performance with caching and batch inference.
Support
If you found this guide helpful, consider sharing it with your network!