Deploying nlp_structbert_siamese-uninlu_chinese-base: Batch Deployment and Task Scheduling on a SLURM Cluster

1. Model Positioning and Core Value

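nlp_structbert_siamese-uninlu_chinese-base is a general-purpose natural language understanding (NLU) model optimized for Chinese text. Rather than reusing an off-the-shelf architecture unchanged, it builds on StructBERT and puts extra emphasis on modeling semantic structure. Its defining feature is that it handles many NLU tasks inside one unified framework instead of training a separate model for each task.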

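You have probably run into this problem: named entity recognition needs one model, sentiment analysis needs another, and relation extraction has to be adapted all over again. Each time you tune data, change code, and wait for training, which is slow. SiameseUniNLU takes a different approach: it uses a "prompt + text" input so that different tasks all become the same kind of question. To extract person and place names, you tell the model: "Find the people and geographic locations in the text." To judge sentiment, you write: "Decide whether the sentiment of this passage is positive or negative." Internally, a pointer network locates the answer span in the original text, so a single model serves many tasks.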
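To make the pointer idea concrete, here is a minimal sketch of span selection from start and end scores. It is an illustration only, not the model's actual decoding code: the function name `best_span`, the `max_len` limit, and all scores are made up for the example.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 10) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical per-token scores for the prompt "人物" (person)
tokens = ["谷", "爱", "凌", "在", "北", "京", "夺", "冠"]
start = [2.0, 0.1, 0.0, -1.0, 0.5, 0.0, -0.5, -0.5]
end = [0.0, 0.2, 1.8, -1.0, 0.1, 0.9, -0.5, -0.3]

s, e = best_span(start, end)
print("".join(tokens[s:e + 1]))
```

Changing the prompt changes the scores, not the decoding step, which is why one extraction head can serve several tasks.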

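The practical benefit is concrete: one deployment supports nine task types, namely named entity recognition, relation extraction, event extraction, aspect-based sentiment extraction, sentiment classification, text classification, text matching, natural language inference, and reading comprehension. For a team, this means lower model management cost, a more uniform service interface, and easier extension to new tasks later.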

2. SLURM Cluster Preparation and Base Dependencies

2.1 Recommended Node Roles

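In a real production environment, we do not recommend piling every function onto one node. A SLURM cluster lends itself to a division of labor, and we suggest the following roles: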

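Login node: used only for submitting jobs, checking status, and uploading files; it does not run the model service
Compute nodes: the GPU nodes that actually host the model service; the number depends on the expected concurrency (at least two is recommended)
Scheduler node: runs SLURM's slurmctld service and handles task dispatch (usually the same machine as the login node)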

Confirm that SLURM is installed on every node and that the nodes can communicate:

    # Run on the login node to check the cluster state
    sinfo -l

    # List the available GPU nodes (assuming NVIDIA GPUs)
    scontrol show node | grep -E "(NodeName|Gres)"
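If you want the same check in a script, the sketch below parses the text printed by `scontrol show node` into a node-to-GRES mapping. The function name `parse_gres` and the sample output are illustrative assumptions; real output contains many more fields per node.

```python
import re
from typing import Dict


def parse_gres(scontrol_output: str) -> Dict[str, str]:
    """Map each NodeName to its Gres value ("(null)" if none is listed)."""
    gres_by_node: Dict[str, str] = {}
    current = None
    for line in scontrol_output.splitlines():
        name = re.search(r"\bNodeName=(\S+)", line)
        if name:
            current = name.group(1)
            gres_by_node[current] = "(null)"
        gres = re.search(r"\bGres=(\S+)", line)
        if gres and current is not None:
            gres_by_node[current] = gres.group(1)
    return gres_by_node


# Abbreviated, hypothetical sample of `scontrol show node` output
sample = """NodeName=node01 Arch=x86_64 CoresPerSocket=8
   Gres=gpu:2
   State=IDLE
NodeName=node02 Arch=x86_64 CoresPerSocket=8
   Gres=(null)
   State=IDLE
"""

gpu_nodes = [node for node, gres in parse_gres(sample).items() if "gpu" in gres]
print(gpu_nodes)
```

On a real cluster the input could come from `subprocess.run(["scontrol", "show", "node"], capture_output=True, text=True).stdout`.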

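2.2 Uniform Environment Setup (run on all compute nodes)

To keep inconsistent Python environments from causing model loading failures, we manage the environment with conda: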

    # Install Miniconda (if not already installed)
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
    source $HOME/miniconda3/bin/activate
    conda init bash

    # Create a dedicated environment (Python 3.9 is used here; Transformers 4.35 needs Python 3.8 or newer)
    conda create -n uninlu-env python=3.9 -y
    conda activate uninlu-env

    # Install the core dependencies (the PyTorch build must match the CUDA version)
    # This assumes CUDA 11.8 on the cluster (confirm with the nvidia-smi output)
    pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
    pip install transformers==4.35.2 datasets==2.15.0 scikit-learn==1.3.2 requests==2.31.0

    # Verify that the GPU is usable
    python -c "import torch; print(f'GPU available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"

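Important: every compute node must use exactly the same conda environment name (uninlu-env) and Python version. The SLURM job scripts activate this environment by name, so a mismatch makes the job fail at startup.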
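One way to catch drift between nodes is to compare installed package versions against the pinned ones. The sketch below uses the standard library's `importlib.metadata`; the `EXPECTED` table mirrors the versions pinned in this tutorial, and the helper name `find_mismatches` is an assumption for illustration.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions pinned in the installation step of this tutorial
EXPECTED = {
    "torch": "2.1.0+cu118",
    "transformers": "4.35.2",
    "datasets": "2.15.0",
}


def find_mismatches(expected: Dict[str, str],
                    lookup: Callable[[str], str] = metadata.version) -> Dict[str, str]:
    """Return {package: found_version} for every package that differs from expected."""
    mismatches: Dict[str, str] = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[package] = found
    return mismatches


# An empty dict means this interpreter matches the pinned versions
print(find_mismatches(EXPECTED))
```

Run it inside uninlu-env on each node (for example over ssh) and investigate any node that prints a non-empty result.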

3. Model File Distribution and Cache Warm-up

3.1 Central Storage and Synchronization of Model Files

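The model files are fairly large (about 390 MB). Downloading them separately on every compute node wastes bandwidth and is error-prone, so we recommend shared storage (such as NFS) or a batch copy:

Option A: NFS share (recommended)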

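    # Mount the NFS export on every node that needs the model (assuming the NFS server is 192.168.1.100)
    sudo mkdir -p /mnt/ai-models
    sudo mount -t nfs 192.168.1.100:/export/ai-models /mnt/ai-models

    # Create a symlink at a uniform path (run on all nodes)
    sudo mkdir -p /root/ai-models
    sudo ln -sfn /mnt/ai-models/iic /root/ai-models/iic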

Option B: batch copy over SSH (when NFS is not available)

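    # Package the model on the login node
    cd /root
    tar -czf nlp_structbert_siamese-uninlu_chinese-base.tar.gz nlp_structbert_siamese-uninlu_chinese-base/

    # Copy to every compute node (replace node01..node04 with your node names)
    # Note: this unpacks to /root/nlp_structbert_siamese-uninlu_chinese-base,
    # so point model_path in later scripts there if you do not use Option A
    for node in node01 node02 node03 node04; do
        echo "Copying to $node..."
        scp nlp_structbert_siamese-uninlu_chinese-base.tar.gz "$node":/root/
        ssh "$node" "tar -xzf /root/nlp_structbert_siamese-uninlu_chinese-base.tar.gz -C /root/"
    done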
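Since a batch copy can fail silently on one node, it may help to compare checksums after distribution. The following is a small sketch using the standard library; the function name `sha256_of` and the chunk size are illustrative choices.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (path from this tutorial; run on the login node and on each compute node):
# print(sha256_of("/root/nlp_structbert_siamese-uninlu_chinese-base.tar.gz"))
```

If the digest printed on a compute node differs from the one on the login node, copy the archive to that node again before unpacking.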

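3.2 Cache Warm-up: Avoiding a Slow First Request

The first model load has to read the weights from disk and initialize the model, which can take 30 seconds or more. A warm-up script run before the service starts loads the model once, so the files are already in each node's file cache: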

Save the following as /root/nlp_structbert_siamese-uninlu_chinese-base/warmup.py (the file must be readable from every compute node):

    import torch
    from transformers import AutoModel, AutoTokenizer

    model_path = "/root/ai-models/iic/nlp_structbert_siamese-uninlu_chinese-base"

    print("Loading model...")
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Run one forward pass on a simple input
    text = "今天天气很好"
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)

    print("Warm-up finished!")

Then run it on every compute node inside the conda environment. The -N flag makes sinfo print one hostname per line instead of a compressed range such as node[01-04]:

    for node in $(sinfo -h -N -r -o "%N" | sort -u); do
        ssh "$node" "source ~/miniconda3/bin/activate && conda activate uninlu-env && python /root/nlp_structbert_siamese-uninlu_chinese-base/warmup.py"
    done

4. SLURM Job Scripts and Service Startup

4.1 Single-Node Service Script (slurm_uninlu.sh)

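    #!/bin/bash
    #SBATCH --job-name=uninlu-service
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --time=24:00:00
    #SBATCH --output=/root/nlp_structbert_siamese-uninlu_chinese-base/slurm-%j.out
    #SBATCH --error=/root/nlp_structbert_siamese-uninlu_chinese-base/slurm-%j.err
    #SBATCH --requeue

    # Load the environment
    source ~/miniconda3/bin/activate
    conda activate uninlu-env

    # Switch to the model directory
    cd /root/nlp_structbert_siamese-uninlu_chinese-base

    # Start the service in the background and redirect its log
    nohup python3 app.py > server.log 2>&1 &
    SERVICE_PID=$!

    # Record the PID for later management
    echo "$SERVICE_PID" > service.pid

    # Wait for the service port to open (up to 60 seconds)
    if ! timeout 60 bash -c 'until nc -z localhost 7860; do sleep 1; done'; then
        echo "Service did not open port 7860 within 60 seconds" >&2
        exit 1
    fi

    # Report the service status
    echo "Service started, listening on port 7860"
    echo "Log path: /root/nlp_structbert_siamese-uninlu_chinese-base/server.log"

    # Keep the batch script alive: SLURM ends the job and kills its processes when the script exits
    wait "$SERVICE_PID"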
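The port check in the job script relies on `nc`, which is not installed on every system. As an alternative, here is a minimal Python sketch of the same wait loop using only the standard library; the function name `wait_for_port` and its defaults are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example: wait_for_port("localhost", 7860) returns True once the service accepts connections
```

A successful connection only shows that the port is open; it does not prove that the model has finished loading, so a request to a health endpoint is a stronger check where one exists.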

4.2 Batch Deployment to All Compute Nodes

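    # List idle nodes in the gpu partition, one hostname per line (at most 5 nodes)
    NODES=$(sinfo -h -N -p gpu -t idle -o "%N" | sort -u | head -n 5)

    # Submit an independent job for each node
    for node in $NODES; do
        echo "Submitting the service job for node $node..."
        sbatch --nodelist="$node" slurm_uninlu.sh
    done

    # Check the submitted jobs
    squeue -u $USER -n uninlu-service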

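Why not use --ntasks?
Because each node has to run its own independent web service instance (listening on port 7860) rather than one MPI-style parallel task. Using --nodelist pins each job to a specific node, so exactly one instance runs per node.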

5. Task Dispatch and Load Balancing

5.1 Request Distribution with an Nginx Reverse Proxy

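Starting the services with SLURM is not enough, because users need a single entry point. We deploy Nginx on the login node and let it distribute requests across the compute nodes: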

    # Install Nginx (run on the login node)
    sudo apt update && sudo apt install nginx -y

Write the following to /etc/nginx/conf.d/uninlu.conf, replacing the IP addresses with those of your compute nodes:

    upstream uninlu_backend {
        least_conn;
        server 192.168.1.101:7860 max_fails=3 fail_timeout=30s;
        server 192.168.1.102:7860 max_fails=3 fail_timeout=30s;
        server 192.168.1.103:7860 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 7860;
        server_name _;

        location / {
            proxy_pass http://uninlu_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 300;
            proxy_connect_timeout 300;
        }

        location /api/ {
            proxy_pass http://uninlu_backend/api/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 300;
            proxy_connect_timeout 300;
        }
    }

Then validate the configuration and restart Nginx:

    sudo nginx -t && sudo systemctl restart nginx

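Users now only need to visit http://YOUR_LOGIN_NODE_IP:7860. With least_conn, Nginx sends each request to the backend that currently has the fewest active connections, and max_fails / fail_timeout take a backend out of rotation for 30 seconds after three failed attempts. Note that the login node itself remains a single point of entry.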
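The selection rule behind `least_conn` can be sketched in a few lines. This is a simplified model rather than Nginx's implementation: it ignores server weights and failure handling, and it breaks ties by taking the first backend, whereas Nginx rotates among tied servers.

```python
from typing import Dict, List


def pick_least_conn(active: Dict[str, int]) -> str:
    """Return the backend with the fewest active connections (first one wins ties)."""
    return min(active, key=lambda backend: active[backend])


def simulate(backends: List[str], num_requests: int) -> Dict[str, int]:
    """Assign requests one by one, assuming none of them finishes in the meantime."""
    active = {backend: 0 for backend in backends}
    for _ in range(num_requests):
        active[pick_least_conn(active)] += 1
    return active


backends = ["192.168.1.101:7860", "192.168.1.102:7860", "192.168.1.103:7860"]
print(simulate(backends, 4))
```

The rule matters for this service because inference times vary with input length: a node stuck on a long request keeps a higher connection count and receives fewer new requests than it would under plain round-robin.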

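5.2 Job-Level Scheduling with SLURM (advanced)

For large offline batch workloads (for example, analyzing 100,000 customer service conversations per day), you can bypass the Nginx entry point and let SLURM schedule a Python script that calls the service on its own node. Because the script talks to localhost:7860, the job has to run on a node where the service is already listening.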

Save the following as /root/nlp_structbert_siamese-uninlu_chinese-base/batch_process.py:

    import json
    import sys

    import requests

    # Read the input and output file paths from the command line
    input_file = sys.argv[1]
    output_file = sys.argv[2]

    # Read the texts to process, skipping empty lines
    with open(input_file, 'r', encoding='utf-8') as f:
        texts = [line.strip() for line in f if line.strip()]

    # Call the service on this node directly (not through Nginx)
    url = "http://localhost:7860/api/predict"
    results = []

    for i, text in enumerate(texts):
        # Example: named entity recognition
        payload = {
            "text": text,
            "schema": '{"人物": null, "地理位置": null}'
        }
        try:
            resp = requests.post(url, json=payload, timeout=60)
            resp.raise_for_status()
            results.append(resp.json())
        except Exception as e:
            # Keep the failed text so it can be retried later
            results.append({"error": str(e), "text": text})
        if i % 10 == 0:
            print(f"Processed {i}/{len(texts)}")

    # Save the results
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    print(f"Done, results saved to {output_file}")

Save the SLURM batch job script as /root/nlp_structbert_siamese-uninlu_chinese-base/batch_job.sh:

    #!/bin/bash
    #SBATCH --job-name=uninlu-batch
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH --cpus-per-task=2
    #SBATCH --mem=8G
    #SBATCH --time=02:00:00
    #SBATCH --output=batch-%j.out
    #SBATCH --error=batch-%j.err

    source ~/miniconda3/bin/activate
    conda activate uninlu-env
    cd /root/nlp_structbert_siamese-uninlu_chinese-base
    # The service must already be listening on port 7860 on the node that runs this job
    python batch_process.py input.txt output.json

Submit the batch job (SLURM picks a free GPU node in the partition):

    sbatch batch_job.sh
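For very large inputs, one option is to split the work across a SLURM job array so that each array task handles one shard. The sketch below shows a possible split; the helper name `split_into_shards` is an assumption, while `SLURM_ARRAY_TASK_ID` is the environment variable SLURM sets for each task of a job submitted with `sbatch --array`.

```python
import os
from typing import List


def split_into_shards(lines: List[str], num_shards: int) -> List[List[str]]:
    """Distribute lines round-robin over num_shards lists of nearly equal size."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [lines[i::num_shards] for i in range(num_shards)]


# Inside an array job (e.g. sbatch --array=0-3), each task picks its own shard.
# Outside SLURM the variable is unset, so the default of 0 is used.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
texts = [f"text {n}" for n in range(10)]  # placeholder input
shards = split_into_shards(texts, num_shards=4)
my_shard = shards[task_id % len(shards)]
print(f"task {task_id} handles {len(my_shard)} of {len(texts)} texts")
```

Round-robin splitting keeps shard sizes within one item of each other, which helps array tasks finish at roughly the same time.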

6. Service Monitoring and Self-Healing

6.1 Health Check Script (health_check.sh)

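    #!/bin/bash
    # Check whether the service on this node is alive; restart it automatically if it is down
    SERVICE_URL="http://localhost:7860/health"
    APP_DIR=/root/nlp_structbert_siamese-uninlu_chinese-base

    if curl -s --head --fail "$SERVICE_URL" >/dev/null; then
        echo "$(date): service healthy"
        exit 0
    else
        echo "$(date): service unhealthy, restarting..."
        # Kill the old process
        pkill -f "app.py"
        # cron starts in $HOME without conda, so enter the model directory
        # and activate the environment before touching any files
        cd "$APP_DIR" || exit 1
        source ~/miniconda3/bin/activate
        conda activate uninlu-env
        # Remove the stale log
        rm -f server.log
        # Restart the service
        nohup python3 app.py > server.log 2>&1 &
        # Wait for the port to open
        timeout 60 bash -c 'until nc -z localhost 7860; do sleep 1; done'
        echo "$(date): service restarted"
    fi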

6.2 Scheduling the Health Check (all compute nodes)

    # Add to crontab: check every 5 minutes
    (crontab -l 2>/dev/null; echo "*/5 * * * * /root/nlp_structbert_siamese-uninlu_chinese-base/health_check.sh >> /root/nlp_structbert_siamese-uninlu_chinese-base/health.log 2>&1") | crontab -

6.3 Key Metrics Monitoring (Prometheus integration)

Add a simple health endpoint to app.py if it does not already provide one:

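    # Imports needed by the endpoint (skip any that app.py already has)
    import time

    import torch
    from flask import jsonify

    # Append to the Flask application in app.py
    @app.route('/health')
    def health_check():
        return jsonify({
            "status": "healthy",
            "timestamp": int(time.time()),
            "model_loaded": True,
            "gpu_available": torch.cuda.is_available()
        })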

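Then configure Prometheus to collect from each node. Prometheus scrapes metrics in its own text format, so this JSON endpoint has to be checked through an HTTP probe (for example the Blackbox exporter), or the app has to expose a separate metrics endpoint. With that in place, Grafana can display per-node service status and response latency; tracking error rates additionally requires request counters in the application.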
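As a rough sketch of what a Prometheus-readable version of the health data could look like, the function below converts the JSON fields from the health endpoint into the Prometheus text exposition format. The metric names `uninlu_up` and `uninlu_gpu_available`, the `node` label, and the function name are hypothetical choices, not part of the original service.

```python
from typing import Any, Dict


def to_prometheus(health: Dict[str, Any], node: str) -> str:
    """Render a /health JSON payload as Prometheus text-format gauge metrics."""
    up = 1 if health.get("status") == "healthy" else 0
    gpu = 1 if health.get("gpu_available") else 0
    lines = [
        "# HELP uninlu_up Whether the service reported a healthy status (1) or not (0).",
        "# TYPE uninlu_up gauge",
        f'uninlu_up{{node="{node}"}} {up}',
        "# HELP uninlu_gpu_available Whether the service can see a GPU (1) or not (0).",
        "# TYPE uninlu_gpu_available gauge",
        f'uninlu_gpu_available{{node="{node}"}} {gpu}',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical payload in the shape returned by the /health endpoint
sample = {"status": "healthy", "timestamp": 1700000000,
          "model_loaded": True, "gpu_available": False}
print(to_prometheus(sample, node="node01"), end="")
```

Serving this text from a `/metrics` route with the `text/plain` content type would let Prometheus scrape the node directly; the official `prometheus_client` package is the more common way to do this in production.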