Run the following command to merge the LoRA-trained model:
python src/export_model_lora.py \
    --model_name_or_path /home/ctq/Huggingface/Qwen2.5-1.5B-Instruct \
    --adapter_name_or_path /home/ctq/Pyproject/LLaMA-Factory/outmoxi/Qwen2.5-1.5B-1 \
    --template qwen \
    --export_dir ./outmoxi/Qwen2.5-1.5B-lora
Generic format:
python src/export_model_lora.py \
    --model_name_or_path (path to the base model) \
    --adapter_name_or_path (path to the trained LoRA adapter) \
    --template qwen \
    --export_dir (output path for the merged model)
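The merge can also be done directly with transformers + peft; a minimal sketch reusing the paths from the first command (this mirrors what a LoRA merge does, not necessarily the script's exact behavior):

# Merge a LoRA adapter into its base model and save the result.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/home/ctq/Huggingface/Qwen2.5-1.5B-Instruct"
adapter_path = "/home/ctq/Pyproject/LLaMA-Factory/outmoxi/Qwen2.5-1.5B-1"
export_dir = "./outmoxi/Qwen2.5-1.5B-lora"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()
merged.save_pretrained(export_dir)
AutoTokenizer.from_pretrained(base_path).save_pretrained(export_dir)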
Activate the llamafactory environment and run the training script:
conda activate llamafactory
bash qw15.sh
Switch to the vllm environment and start the vLLM server:
conda activate vllm
vllm serve /home/ctq/Huggingface/Qwen2.5-1.5B-Instruct
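vLLM exposes an OpenAI-compatible API, by default on localhost:8000, and the model name defaults to the path passed to vllm serve. A quick smoke test (assumes the server above is running with default host/port):

# Query the vLLM OpenAI-compatible chat endpoint.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "/home/ctq/Huggingface/Qwen2.5-1.5B-Instruct",  # defaults to the served path
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])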
Set the HTTP proxy:
export http_proxy=192.168.2.218:2023
export https_proxy=192.168.2.218:2023
Unset the proxy:
unset http_proxy
unset https_proxy
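To confirm which proxy settings Python-based tooling will actually pick up from the environment:

# Print the proxy settings visible to the current process (standard library only).
import urllib.request
print(urllib.request.getproxies())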
Launch the web UI (here with a multimodal model):
python src/webui.py --model_name_or_path /home/ctq/Huggingface/Qwen2-VL-7B-Instruct
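The web UI is Gradio-based and listens on port 7860 by default; a quick reachability check (the port is an assumption if GRADIO_SERVER_PORT was overridden):

# Check that the Gradio web UI is up (7860 is Gradio's default port).
import requests
print(requests.get("http://localhost:7860", timeout=5).status_code)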
Download the model from ModelScope:
modelscope download Qwen/Qwen2.5-3B-Instruct --local_dir Qwen2.5-3B-Instruct
Or use the generic format:
modelscope download [model ID] --local_dir [local directory name]
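The download can also be scripted through ModelScope's Python API; a sketch mirroring the CLI example above (local_dir support depends on the installed modelscope version):

# Download a model with ModelScope's Python API instead of the CLI.
from modelscope import snapshot_download

snapshot_download("Qwen/Qwen2.5-3B-Instruct", local_dir="Qwen2.5-3B-Instruct")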
Full-parameter SFT training command:
CUDA_VISIBLE_DEVICES=0 python src/train.py \
    --model_name_or_path /home/ctq/Huggingface/DeepSeek-R1-Distill-Qwen-1.5B \
    --stage sft \
    --do_train \
    --finetuning_type full \
    --dataset ctqtest \
    --template deepseek3 \
    --cutoff_len 4096 \
    --overwrite_cache \
    --preprocessing_num_workers 16 \
    --output_dir ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B \
    --logging_steps 1 \
    --save_steps 10 \
    --plot_loss true \
    --overwrite_output_dir yes \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.05 \
    --num_train_epochs 3.0 \
    --lr_scheduler_type cosine \
    --fp16 true \
    --eval_steps 10 \
    --val_size 0.01 \
    --per_device_eval_batch_size 2 \
    --evaluation_strategy steps \
    --load_best_model_at_end true
LoRA SFT training command (identical to the above except --finetuning_type lora; note it writes to the same --output_dir, so change the path if both runs should be kept):
CUDA_VISIBLE_DEVICES=0 python src/train.py \
    --model_name_or_path /home/ctq/Huggingface/DeepSeek-R1-Distill-Qwen-1.5B \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset ctqtest \
    --template deepseek3 \
    --cutoff_len 4096 \
    --overwrite_cache \
    --preprocessing_num_workers 16 \
    --output_dir ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B \
    --logging_steps 1 \
    --save_steps 10 \
    --plot_loss true \
    --overwrite_output_dir yes \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 \
    --warmup_ratio 0.05 \
    --num_train_epochs 3.0 \
    --lr_scheduler_type cosine \
    --fp16 true \
    --eval_steps 10 \
    --val_size 0.01 \
    --per_device_eval_batch_size 2 \
    --evaluation_strategy steps \
    --load_best_model_at_end true
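Both training commands assume the dataset name ctqtest is registered in LLaMA-Factory's data/dataset_info.json. A minimal registration sketch (the file name ctqtest.json and the Alpaca-style format are assumptions about how the data was prepared):

# Register the custom dataset so --dataset ctqtest resolves.
# "ctqtest.json" is an assumed file name under LLaMA-Factory's data/ directory;
# without a "formatting" key, LLaMA-Factory treats the file as Alpaca-style.
import json, pathlib

info_path = pathlib.Path("data/dataset_info.json")
info = json.loads(info_path.read_text(encoding="utf-8"))
info["ctqtest"] = {"file_name": "ctqtest.json"}
info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")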