ML Model Hosting: Deploying Simple ML Models on Vercel or Serverless Functions

Introduction

Deploying Machine Learning (ML) models on Vercel or serverless functions enables scalable, low-latency predictions without managing infrastructure. This guide covers:

  • Packaging an ML model.
  • Deploying it as a serverless API.
  • Using it from a Next.js app.

1. Choosing the Right Model for Deployment

Serverless platforms work best with lightweight models because of execution time, memory, and deployment-size limits. Suitable models include (a quick size check is sketched after this list):

✔️ Linear Regression (e.g., sales forecasting).
✔️ Image Classification (lightweight CNN models).
✔️ Sentiment Analysis (text-based models like Naïve Bayes).
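One way to judge "lightweight" is to measure the model's serialized size before deploying, since serverless bundles have tight size limits. A minimal sketch using a toy Naïve Bayes sentiment classifier as a stand-in (the training texts and filename here are illustrative, not part of the deployment below):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
import joblib
import os

# Train a tiny sentiment-style classifier on toy data
texts = ["great product", "terrible service", "love it", "awful experience"]
labels = [1, 0, 1, 0]
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

# Serialize and check the on-disk size; small models keep cold starts fast
joblib.dump(clf, "sentiment.pkl")
print(f"Model size: {os.path.getsize('sentiment.pkl') / 1024:.1f} KB")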


2. Converting ML Models to a Deployable Format

Most ML models are trained using Python frameworks like TensorFlow, Scikit-Learn, or PyTorch. Convert them to a format compatible with serverless functions.

Example: Saving a Scikit-Learn Model with joblib

from sklearn.linear_model import LogisticRegression
import joblib

# Train a simple model
model = LogisticRegression()
X, y = [[0], [1], [2]], [0, 1, 1]
model.fit(X, y)

# Save the model
joblib.dump(model, "model.pkl")
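As a quick sanity check, load the file back and run a test prediction before wiring it into an API:

import joblib

# Reload the serialized model and query it
model = joblib.load("model.pkl")
print(model.predict([[2]]))  # expected output for this toy data: [1]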

🔹 Alternative: Use ONNX format for TensorFlow/PyTorch models.

import torch
import torch.onnx as onnx

# A minimal single-layer model; real models export the same way
model = torch.nn.Linear(1, 1)

# Trace the model with a dummy input and write the ONNX graph to disk
onnx.export(model, torch.randn(1, 1), "model.onnx")
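A minimal sketch of running the exported graph with onnxruntime (an assumption here: the onnxruntime package is installed separately, e.g. pip install onnxruntime):

import numpy as np
import onnxruntime as ort

# Load the exported graph and run a single inference
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
result = session.run(None, {input_name: np.random.randn(1, 1).astype(np.float32)})
print(result)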

3. Deploying the Model on Vercel Serverless Functions

Step 1: Install Dependencies

Create a Next.js project and install the Vercel CLI:

npm install -g vercel

Note that Flask and NumPy are Python packages, so they cannot be installed with npm; declare them for the serverless runtime instead, as shown below.
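Vercel's Python runtime installs dependencies from a requirements.txt at the project root, so list everything the inference code imports (scikit-learn is needed to unpickle the model):

flask
numpy
scikit-learn
joblib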

Step 2: Create a Serverless API Route (api/predict.py)

Inside the api/ directory, create predict.py:

from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load the model once at import time so warm invocations reuse it.
# model.pkl must be committed alongside the function so it ships with the deployment.
model = joblib.load("model.pkl")

@app.route("/api/predict", methods=["POST"])
def predict():
    data = request.get_json()
    prediction = model.predict(np.array(data["input"]).reshape(-1, 1))
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # Local testing only; in production Vercel serves the WSGI app directly
    app.run(debug=True)

Step 3: Configure vercel.json (optional)

Vercel detects Python files inside api/ automatically, so this file is often unnecessary. To pin the builder explicitly, the legacy configuration looks like this:

{
  "builds": [
    { "src": "api/predict.py", "use": "@vercel/python" }
  ]
}

Step 4: Deploy to Vercel

vercel deploy

Your API will be live at:

https://your-app.vercel.app/api/predict
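Once it is live, you can smoke-test the endpoint from a Python shell (substitute your real deployment URL for the placeholder):

import requests

# POST a payload in the shape the Flask handler expects
resp = requests.post(
    "https://your-app.vercel.app/api/predict",
    json={"input": [2]},
)
print(resp.json())  # e.g. {"prediction": [1]}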

4. Calling the API from a Next.js App

Modify your Next.js frontend to send requests to the ML model.

import { useState } from "react";

export default function PredictPage() {
  const [input, setInput] = useState(0);
  const [prediction, setPrediction] = useState(null);

  async function getPrediction() {
    const res = await fetch("/api/predict", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: [Number(input)] })
    });
    const data = await res.json();
    setPrediction(data.prediction);
  }

  return (
    <div>
      <h1>ML Prediction</h1>
      <input type="number" value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={getPrediction}>Predict</button>
      {prediction !== null && <p>Prediction: {prediction.join(", ")}</p>}
    </div>
  );
}

🔹 Benefit: Users can interact with the ML model in real time.


5. Optimizing for Serverless Performance

  • Reduce Model Size → Convert models to ONNX or quantize them so deployments stay small.
  • Cache Predictions → Reuse results for repeated inputs via Redis or an in-process cache (see the sketch below).
  • Optimize Cold Starts → Load the model at module scope and keep dependencies minimal.
  • Batch Inference → Process multiple inputs in one request to amortize invocation overhead (also sketched below).
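A minimal sketch combining caching and batching, reusing the joblib model from section 3 (predict_batch is a hypothetical helper, not part of the earlier handler). An in-process dict only lives as long as a warm function instance; a shared store like Redis is needed for reuse across instances:

import joblib
import numpy as np

model = joblib.load("model.pkl")
_cache = {}  # persists only while this function instance stays warm

def predict_batch(inputs):
    # One model call covers every uncached input; the rest is served from the cache
    misses = [x for x in inputs if x not in _cache]
    if misses:
        preds = model.predict(np.array(misses).reshape(-1, 1))
        _cache.update(zip(misses, preds.tolist()))
    return [_cache[x] for x in inputs]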

Conclusion

By deploying ML models on Vercel or serverless functions, you achieve scalable, real-time predictions with minimal infrastructure management.

Key Takeaways:

✅ Convert ML models to ONNX or lightweight formats.
✅ Deploy on Vercel using serverless functions.
✅ Optimize performance with caching and batch inference.

Support

If you found this guide helpful, consider sharing it with your network!

License

MIT