What Is a Resource Scheduling Strategy for the AI Infrastructure Layer? - Louis

Background

In a private large-model system, I deployed an Embedding model, a Reranker model, and an LLM locally and implemented high-concurrency batch processing.

Configuration (config.py)

When several services (an Embedding model, a Reranker model, an LLM, and so on) are deployed locally, how can config.py be used to implement a resource scheduling strategy for the requests that users send to these microservices?

How the services differ

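Service	Characteristics	Parameter tuning strategy
embedding	High throughput, lightweight compute	Large batches + many workers
reranker	Moderate compute load	Medium-sized batches
LLM	Very heavy compute	Small batches + rate limiting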
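The table recommends rate limiting for the LLM. A common way to enforce a `rate_limit` value is a token bucket; the sketch below is one possible approach, not the author's implementation, and it assumes `rate_limit` means requests per second, which the article does not specify. `TokenBucket` and `LIMITERS` are hypothetical names, and the clock is injectable so the limiter can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allows about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        # Default burst size equals one second's worth of requests.
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume a token and return True if one is available; otherwise return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-service limiters using the rate_limit values from config.yaml,
# assuming rate_limit is requests per second.
LIMITERS = {
    "embedding": TokenBucket(rate=100),
    "reranker": TokenBucket(rate=50),
    "llm": TokenBucket(rate=10),
}


if __name__ == "__main__":
    llm = LIMITERS["llm"]
    accepted = sum(llm.try_acquire() for _ in range(15))
    print(f"accepted {accepted} of 15 back-to-back LLM requests")
```

Rejecting immediately (rather than waiting for a token) keeps the limiter from adding latency; a caller that prefers to wait could retry after roughly `1 / rate` seconds.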

Configuration parameters

Below is an example of the configuration parameters for a locally deployed model (here, the LLM service):

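LLM_SERVICE_CONFIG = {
    "llm_model_name": "xxx",

    # Batching: up to 4 requests per batch, waiting at most batch_timeout to fill a batch
    "batch_size": 4,
    "batch_timeout": 0.05,

    # Concurrency: bounded request queue served by a single inference worker
    "max_queue_size": 50,
    "worker_count": 1,

    # Timeout budget (presumably seconds): queue wait, inference, and end-to-end limit
    "queue_timeout": 0.2,
    "inference_timeout": 2.0,
    "total_timeout": 3.0,

    # Protection: request rate limit and response-cache switch
    "rate_limit": 10,
    "enable_cache": False,
}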
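One way the batching, queue, and timeout fields above could drive a scheduler is asyncio micro-batching: requests wait in a bounded queue, and a worker collects up to `batch_size` of them or stops after `batch_timeout`, whichever comes first. This is a minimal sketch under stated assumptions, not the author's implementation: `MicroBatcher` and the `infer` callback are hypothetical, timeouts are assumed to be seconds, and `queue_timeout`, `total_timeout`, and `rate_limit` are left out to keep it short.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

# Subset of LLM_SERVICE_CONFIG above; timeouts assumed to be in seconds.
CONFIG = {"batch_size": 4, "batch_timeout": 0.05, "max_queue_size": 50,
          "worker_count": 1, "inference_timeout": 2.0}


class MicroBatcher:
    """Groups requests into batches of at most batch_size, waiting at most batch_timeout."""

    def __init__(self, infer: Callable[[List[Any]], Awaitable[List[Any]]], cfg: dict) -> None:
        self.infer = infer  # hypothetical batched model call: list of inputs -> list of outputs
        self.cfg = cfg
        # Create inside a running event loop (asyncio.Queue may bind to a loop on older Pythons).
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=cfg["max_queue_size"])
        self.workers: List[asyncio.Task] = []

    def start(self) -> None:
        for _ in range(self.cfg["worker_count"]):
            self.workers.append(asyncio.create_task(self._worker()))

    async def submit(self, item: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        # Raises asyncio.QueueFull once max_queue_size requests are waiting (backpressure).
        self.queue.put_nowait((item, fut))
        return await fut

    async def _worker(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.cfg["batch_timeout"]
            # Keep filling the batch until it is full or batch_timeout has elapsed.
            while len(batch) < self.cfg["batch_size"]:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            items, futs = zip(*batch)
            try:
                results = await asyncio.wait_for(self.infer(list(items)),
                                                 self.cfg["inference_timeout"])
                for fut, res in zip(futs, results):
                    if not fut.done():
                        fut.set_result(res)
            except Exception as exc:  # timeout or model error: fail every request in the batch
                for fut in futs:
                    if not fut.done():
                        fut.set_exception(exc)


async def _demo() -> None:
    async def double(batch: List[int]) -> List[int]:  # stand-in for a real model
        return [x * 2 for x in batch]

    batcher = MicroBatcher(double, CONFIG)
    batcher.start()
    print(await asyncio.gather(*(batcher.submit(i) for i in range(10))))
    for w in batcher.workers:
        w.cancel()


if __name__ == "__main__":
    asyncio.run(_demo())
```

Each caller awaits its own future, so results stay per request even though the model is called once per batch. In this design `worker_count` is the number of batches that can be in flight at the same time.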

Enterprise architecture (config.yaml)

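Decoupled design (separating the Model Layer from the Service Layer)

The models section describes what is loaded (type, model name, device), while each entry under services points to a model by key and carries its own runtime scheduling parameters. Scheduling can therefore be tuned per service without touching the model definitions.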

models:
  embedding_v1:
    type: embedding
    model_name: "xxxxxembeddingModelxxx"
    device: "cpu"

  reranker_v1:
    type: reranker
    model_name: "xxxxrerankerxxxx"

  llm_v1:
    type: llm
    model_name: "xxx"  

services:
  embedding_service:
    model: embedding_v1
    
    runtime:
      batch_size: 64
      batch_timeout: 0.01
      max_queue_size: 500
      worker_count: 4
      queue_timeout: 0.05
      inference_timeout: 0.3
      total_timeout: 0.5
      rate_limit: 100
      enable_cache: true

  reranker_service:
    model: reranker_v1
    
    runtime:
      batch_size: 16
      batch_timeout: 0.02
      max_queue_size: 200
      worker_count: 2
      queue_timeout: 0.1
      inference_timeout: 0.5
      total_timeout: 0.8
      rate_limit: 50
      enable_cache: false

  llm_service:
    model: llm_v1
    
    runtime:
      batch_size: 4
      batch_timeout: 0.05
      max_queue_size: 50
      worker_count: 1
      queue_timeout: 0.2
      inference_timeout: 2.0
      total_timeout: 3.0
      rate_limit: 10
      enable_cache: false
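Because services reference models by key, a loader has to join the two sections before anything is started. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and repeats a subset of it inline so it runs on its own. It adds one check that is an assumption rather than something the article states: `queue_timeout + inference_timeout` should fit within `total_timeout`. All three services above satisfy it. `Runtime`, `ResolvedService`, and `resolve_services` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Runtime:
    batch_size: int
    batch_timeout: float
    max_queue_size: int
    worker_count: int
    queue_timeout: float
    inference_timeout: float
    total_timeout: float
    rate_limit: int
    enable_cache: bool


@dataclass(frozen=True)
class ResolvedService:
    name: str
    model_key: str
    model: dict
    runtime: Runtime


def resolve_services(cfg: dict) -> Dict[str, ResolvedService]:
    """Join each service to its model entry and sanity-check its timeout budget."""
    models = cfg["models"]
    resolved = {}
    for name, svc in cfg["services"].items():
        key = svc["model"]
        if key not in models:
            raise ValueError(f"{name}: unknown model {key!r}")
        # Runtime(**...) raises TypeError on missing or misspelled runtime keys.
        rt = Runtime(**svc["runtime"])
        # Assumption: a request first waits in the queue, then runs inference,
        # so both phases together should fit inside total_timeout.
        if rt.queue_timeout + rt.inference_timeout > rt.total_timeout:
            raise ValueError(f"{name}: queue_timeout + inference_timeout exceeds total_timeout")
        resolved[name] = ResolvedService(name, key, models[key], rt)
    return resolved


# Subset of the config.yaml above, written as the dict yaml.safe_load would return.
CONFIG = {
    "models": {
        "embedding_v1": {"type": "embedding", "model_name": "xxxxxembeddingModelxxx", "device": "cpu"},
        "llm_v1": {"type": "llm", "model_name": "xxx"},
    },
    "services": {
        "embedding_service": {"model": "embedding_v1", "runtime": {
            "batch_size": 64, "batch_timeout": 0.01, "max_queue_size": 500, "worker_count": 4,
            "queue_timeout": 0.05, "inference_timeout": 0.3, "total_timeout": 0.5,
            "rate_limit": 100, "enable_cache": True}},
        "llm_service": {"model": "llm_v1", "runtime": {
            "batch_size": 4, "batch_timeout": 0.05, "max_queue_size": 50, "worker_count": 1,
            "queue_timeout": 0.2, "inference_timeout": 2.0, "total_timeout": 3.0,
            "rate_limit": 10, "enable_cache": False}},
    },
}

if __name__ == "__main__":
    for svc in resolve_services(CONFIG).values():
        print(svc.name, "->", svc.model_key, svc.runtime.batch_size, svc.runtime.worker_count)
```

Failing at load time on an unknown model key or an impossible timeout budget is cheaper than discovering it when the first request times out.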

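About the author

I'm Louis, an engineer with long hands-on experience in iOS and AI engineering, and an aspiring founder exploring what is possible in product and business.

The articles here are mostly about things I have actually used in projects, run into pitfalls with, and verified repeatedly, not "quick content" written for traffic.

☕ Tips

If this article helped you, feel free to buy me a coffee ☕️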

PayPal
https://www.paypal.me/luochuan188

PayPay

You can support my work via PayPay by searching my PayPay ID:

PayPay ID: luochuan

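WeChat Pay

Alipay

Your support gives me more time to keep organizing and sharing what I learn from real projects.

Not tipping is completely fine too. Thanks for reading this far.

Contact and collaboration

If you:

· are working on an iOS app, AI, or automation project

· have questions about technology choices, architecture design, or bringing a product to market

· or would like to exchange technical ideas or discuss collaboration

feel free to reach me at: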

luochuanad@gmail.com
