news 2026/2/11 15:30:25

[Qwen] train() Function Reference

张小明

Front-end development engineer

train() function documentation

train(attn_implementation='flash_attention_2')

Runs the main training loop for Qwen VL (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE) instruction tuning.
Parses command-line arguments for the model, data, and training configuration; loads the appropriate model class and processor; optionally applies LoRA or configures which modules to tune (vision encoder, MLP merger, LLM); builds the supervised data module and the Hugging Face Trainer; runs training (with optional resume); then saves the final model and processor to output_dir.
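The tuning choice in the description above can be pictured as a small decision function. This is only a sketch: `trainable_groups` and the group labels it returns are hypothetical names chosen for illustration, not identifiers from the training script. It assumes that enabling LoRA takes precedence over the `tune_mm_*` flags.

```python
def trainable_groups(tune_mm_vision, tune_mm_mlp, tune_mm_llm, lora_enable=False):
    """Return labels of the parameter groups that would receive gradients.

    Hypothetical helper: the labels are illustrative, not real module names.
    """
    if lora_enable:
        # Assumption: with LoRA on, only adapter weights train and the
        # tune_mm_* flags are ignored for the base model.
        return ["lora_adapters"]

    groups = []
    if tune_mm_vision:
        groups.append("vision_encoder")
    if tune_mm_mlp:
        groups.append("mlp_merger")
    if tune_mm_llm:
        groups.append("llm_and_lm_head")
    return groups


# Freeze the vision encoder, train the merger and the language model.
print(trainable_groups(tune_mm_vision=False, tune_mm_mlp=True, tune_mm_llm=True))
```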

Parameters

Name | Type | Default | Description
attn_implementation | str | "flash_attention_2" | Attention implementation passed to the model (e.g. "flash_attention_2" for Flash Attention 2).

Command-line arguments (parsed via HfArgumentParser)

  • ModelArguments

    • model_name_or_path (str) – Hugging Face model id or path (e.g. Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen3-VL-8B-Instruct). Used to select the model class (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, or Qwen3-VL-MoE).
    • tune_mm_llm (bool) – Whether to train the language model (and lm_head).
    • tune_mm_mlp (bool) – Whether to train the vision merger (MLP).
    • tune_mm_vision (bool) – Whether to train the vision encoder.

  • DataArguments

    • dataset_use (str) – Comma-separated dataset names (with optional %N sampling, e.g. dataset1%50).
    • data_flatten (bool) – Whether to flatten/concatenate batch sequences.
    • data_packing (bool) – Whether to use packed data (requires preprocessing with pack_data.py).
    • max_pixels (int) – Max image pixels (default 28*28*576).
    • min_pixels (int) – Min image pixels (default 28*28*16).
    • video_max_frames, video_min_frames, video_max_pixels, video_min_pixels, video_fps – Video sampling and resolution settings.

  • TrainingArguments (extends transformers.TrainingArguments)

    • cache_dir (str, optional) – Cache directory for model/processor.
    • model_max_length (int) – Maximum sequence length for the tokenizer.
    • lora_enable (bool) – If True, apply LoRA and ignore tune_mm_* for the base model.
    • lora_r, lora_alpha, lora_dropout – LoRA rank, alpha, and dropout.
    • mm_projector_lr, vision_tower_lr – Optional learning rates for the projector and vision tower.
    • Plus standard Trainer args: output_dir, bf16, per_device_train_batch_size, gradient_accumulation_steps, learning_rate, num_train_epochs, save_steps, gradient_checkpointing, deepspeed, etc.
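To make the `dataset_use` syntax and the pixel defaults concrete, here is a minimal sketch. `parse_dataset_use` is a hypothetical helper, not a function from the training script, and it assumes that `%N` means "sample N percent of the dataset" (so `dataset1%50` keeps half). The two constants just evaluate the default expressions listed above.

```python
import re


def parse_dataset_use(dataset_use):
    """Split a string like 'a%50,b' into (name, sampling_rate) pairs.

    Hypothetical helper. Assumption: a trailing %N is a percentage.
    """
    parsed = []
    for item in dataset_use.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.search(r"%(\d+)$", item)
        if match:
            name = item[: match.start()]
            rate = int(match.group(1)) / 100.0
        else:
            # No suffix: use the whole dataset.
            name, rate = item, 1.0
        parsed.append((name, rate))
    return parsed


# Default pixel budgets, written as counts of 28x28 blocks.
MAX_PIXELS = 28 * 28 * 576  # 451584
MIN_PIXELS = 28 * 28 * 16   # 12544

print(parse_dataset_use("dataset1%50,dataset2"))
print(MAX_PIXELS, MIN_PIXELS)
```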

Returns

None. Model and processor are saved under training_args.output_dir.

Notes

  • If output_dir already contains checkpoint-* directories, training is resumed with resume_from_checkpoint=True.
  • When data_flatten or data_packing is enabled, the Qwen2 VL attention class is replaced for compatibility.
  • Qwen3-VL MoE models use Qwen3VLMoeForConditionalGeneration; other Qwen3-VL models use Qwen3VLForConditionalGeneration; Qwen2.5-VL and Qwen2-VL use the corresponding classes inferred from model_name_or_path.
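The first and last notes can be sketched as two small helpers. Both are hypothetical and only approximate the behavior described. `has_checkpoints` looks for `checkpoint-*` subdirectories. `pick_model_class_name` guesses a class name from the model id, assuming that MoE checkpoints carry an active-parameter tag such as `-A3B` in their name. The script's actual matching rule may differ. The helper returns class names as strings so the sketch runs without `transformers` installed.

```python
import re
import tempfile
from pathlib import Path


def has_checkpoints(output_dir):
    """True if output_dir contains at least one checkpoint-* directory."""
    return any(p.is_dir() for p in Path(output_dir).glob("checkpoint-*"))


def pick_model_class_name(model_name_or_path):
    """Guess the model class name from the id or path (hypothetical rule)."""
    name = Path(model_name_or_path.rstrip("/")).name.lower()
    if "qwen3" in name:
        # Assumption: MoE ids include a tag like "-A3B" (active parameters).
        if re.search(r"-a\d+b", name):
            return "Qwen3VLMoeForConditionalGeneration"
        return "Qwen3VLForConditionalGeneration"
    if "qwen2.5" in name:
        return "Qwen2_5_VLForConditionalGeneration"
    return "Qwen2VLForConditionalGeneration"


with tempfile.TemporaryDirectory() as out:
    print(has_checkpoints(out))            # empty directory
    (Path(out) / "checkpoint-100").mkdir()
    print(has_checkpoints(out))            # now resumable

print(pick_model_class_name("Qwen/Qwen3-VL-8B-Instruct"))
```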

Example

# Typical usage: arguments are passed via command line (e.g. from scripts/sft_qwen3_4b.sh)
torchrun --nproc_per_node=4 qwenvl/train/train_qwen.py \
    --model_name_or_path Qwen/Qwen3-VL-8B-Instruct \
    --dataset_use my_dataset \
    --data_flatten True \
    --tune_mm_vision False --tune_mm_mlp True --tune_mm_llm True \
    --output_dir ./output \
    --bf16 --per_device_train_batch_size 4 --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 --num_train_epochs 0.5

# Programmatic call (still requires sys.argv or explicit parse for HfArgumentParser)
from qwenvl.train.train_qwen import train
train(attn_implementation="flash_attention_2")
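Because a programmatic call still reads its configuration from `sys.argv`, one way to drive it from Python is to swap `sys.argv` temporarily. The sketch below is an assumption about how one might do this, not part of the training script: `patched_argv` is a hypothetical helper, and the program name `train_qwen.py` placed in slot 0 is only a placeholder. The actual `train` call is left as a comment since it needs the repository and its dependencies.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(args):
    """Temporarily replace sys.argv so an argv-based parser sees `args`."""
    saved = sys.argv
    sys.argv = ["train_qwen.py", *args]  # slot 0 is a placeholder program name
    try:
        yield
    finally:
        # Always restore the caller's argv, even if training raises.
        sys.argv = saved


cli_args = [
    "--model_name_or_path", "Qwen/Qwen3-VL-8B-Instruct",
    "--dataset_use", "my_dataset",
    "--output_dir", "./output",
]

with patched_argv(cli_args):
    # from qwenvl.train.train_qwen import train
    # train(attn_implementation="flash_attention_2")
    print(sys.argv[1:])
```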