Background
In a privatized large-model system, the Embedding model is deployed locally to avoid external dependencies, reduce the risk of data leakage, and improve controllability. On top of it, a standardized interface service exposes the embedding functionality to external systems over HTTP, with fault tolerance, scalability, and concurrency support.
Tech Stack Choice
FastAPI + Uvicorn + EmbeddingModel (e5-large)
Project Structure (Local Embedding Model Wrapped with FastAPI)
embedding-service/
│
├── app/
│   ├── main.py        # FastAPI entry point
│   ├── model.py       # Model loading
│   ├── schemas.py     # Input/output schemas
│   ├── router.py      # Model routing
│   ├── services.py    # Request services
│   └── config.py      # Configuration
│
├── requirements.txt
└── run.sh
Detailed Explanation
config.py
Contains configuration settings for the embedding microservice, such as the model name (MODEL_NAME) and the device (DEVICE).
MODEL_NAME = "intfloat/e5-large"
DEVICE = "cpu" # Change to "cuda" if you have a GPU
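As an optional variant (not part of the original config.py), the device can be selected automatically when PyTorch is installed, which sentence-transformers already pulls in as a dependency:

```python
# Optional variant of config.py: pick the device automatically.
# Assumes PyTorch is installed (sentence-transformers depends on it).
import torch

MODEL_NAME = "intfloat/e5-large"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
```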
router.py
Defines API routes for the model: /embed, /embed_batch
from fastapi import APIRouter

from .schemas import (
    EmbeddingRequest,
    EmbeddingResponse,
    BatchEmbeddingRequest,
    BatchEmbeddingResponse,
)
from .services import embed_query, embed_batch

router = APIRouter()

@router.get("/")
def root():
    return {"message": "Welcome to the E5 Embedding Service!"}

@router.post("/embed", response_model=EmbeddingResponse)
def embed(request: EmbeddingRequest):
    vector = embed_query(request.text)
    return {"vector": vector}

@router.post("/embed_batch", response_model=BatchEmbeddingResponse)
def batch_embed(request: BatchEmbeddingRequest):
    vectors = embed_batch(request.texts, is_query=request.is_query)
    return {"vectors": vectors}
main.py
The application entry point. Routes are mounted under the OpenAI-style path prefix /api/v1/embeddings.
from fastapi import FastAPI
from .router import router
app = FastAPI(title="E5 Embedding Service")
app.include_router(router, prefix="/api/v1/embeddings")  # the prefix must start with "/"
model.py
Loads the embedding model and wraps its encoding methods.
from sentence_transformers import SentenceTransformer

from .config import MODEL_NAME, DEVICE

class EmbeddingModel:
    def __init__(self):
        self.model = SentenceTransformer(MODEL_NAME, device=DEVICE)

    def embed_query(self, text: str):
        # ⚠️ E5 requires adding a "query: " prefix for queries
        text = "query: " + text
        return self.model.encode(text, normalize_embeddings=True).tolist()

    def embed_passage(self, text: str):
        # ...and a "passage: " prefix for documents
        text = "passage: " + text
        return self.model.encode(text, normalize_embeddings=True).tolist()

    def embed_batch(self, texts: list[str], is_query: bool = False):
        prefix = "query: " if is_query else "passage: "
        return self.model.encode(
            [prefix + t for t in texts],
            normalize_embeddings=True,
        ).tolist()
# Singleton (to avoid reloading multiple times)
embedding_model = EmbeddingModel()
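The E5 prefix convention used by the methods above can be isolated into a small standalone helper (a sketch; with_e5_prefix is invented here for illustration and is not part of the project):

```python
# Sketch of the E5 input convention: queries and passages are marked
# with different prefixes so the model can tell them apart.
def with_e5_prefix(texts: list[str], is_query: bool = False) -> list[str]:
    prefix = "query: " if is_query else "passage: "
    return [prefix + t for t in texts]

print(with_e5_prefix(["how to cook rice"], is_query=True))
# → ['query: how to cook rice']
print(with_e5_prefix(["Rinse the rice, then simmer."]))
# → ['passage: Rinse the rice, then simmer.']
```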
schemas.py
Defines the input/output schemas for the Embedding model.
from typing import List

from pydantic import BaseModel

class EmbeddingRequest(BaseModel):
    text: str

class BatchEmbeddingRequest(BaseModel):
    texts: List[str]
    is_query: bool = False

class EmbeddingResponse(BaseModel):
    vector: List[float]

class BatchEmbeddingResponse(BaseModel):
    vectors: List[List[float]]
services.py
Defines embedding model request methods.
from .model import embedding_model

def embed_query(text: str):
    return embedding_model.embed_query(text)

def embed_passage(text: str):
    return embedding_model.embed_passage(text)

def embed_batch(texts: list[str], is_query: bool = False):
    return embedding_model.embed_batch(texts, is_query)
run.sh
Starts the service with Uvicorn; the --reload flag enables hot reloading during development:
uvicorn app.main:app --reload
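Once the service is up, it can be called from any HTTP client. Below is a stdlib-only Python sketch, assuming the default Uvicorn address localhost:8000; build_request is a helper invented here for illustration:

```python
import json
import urllib.request

# Assumed default Uvicorn address; adjust if the service binds elsewhere.
BASE_URL = "http://localhost:8000/api/v1/embeddings"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the embedding service."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Embed a single query text:
req = build_request("/embed", {"text": "what is vector search?"})
# vector = json.loads(urllib.request.urlopen(req).read())["vector"]

# Embed a batch of passages:
batch = build_request("/embed_batch", {"texts": ["doc one", "doc two"], "is_query": False})
# vectors = json.loads(urllib.request.urlopen(batch).read())["vectors"]
```

The urlopen calls are commented out so the sketch does not require a running server.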
Summary
In local model deployment, whether for an Embedding model, a reranker model, or a local LLM, the project structure remains the same; this article has focused on the components that differ for the Embedding model.
Future Work
The next step will cover high concurrency modules, which I will discuss in future articles.