Run the following command to merge the LoRA adapter into its base model:
python src/export_model_lora.py \
--model_name_or_path /home/ctq/Huggingface/Qwen2.5-1.5B-Instruct \
--adapter_name_or_path /home/ctq/Pyproject/LLaMA-Factory/outmoxi/Qwen2.5-1.5B-1 \
--template qwen \
--export_dir ./outmoxi/Qwen2.5-1.5B-lora
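Once the merge completes, the export directory should contain a standard Hugging Face checkpoint. A quick sanity check (the exact file list varies by model, so this is only what to expect):
ls ./outmoxi/Qwen2.5-1.5B-lora
# expect config.json, tokenizer files, and one or more *.safetensors shards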
General format:
python src/export_model_lora.py \
--model_name_or_path (path to the original base model) \
--adapter_name_or_path (path to the trained LoRA adapter) \
--template qwen \
--export_dir (output path for the merged model)
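If you merge adapters often, the general format is easy to wrap in a small helper script. A minimal sketch (merge_lora.sh is a hypothetical name, not part of LLaMA-Factory):
#!/usr/bin/env bash
# Usage: bash merge_lora.sh <base_model_path> <adapter_path> <export_dir>
set -euo pipefail
python src/export_model_lora.py \
  --model_name_or_path "$1" \
  --adapter_name_or_path "$2" \
  --template qwen \
  --export_dir "$3"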
Activate the llamafactory environment and run the training script:
conda activate llamafactory
bash qw15.sh
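The contents of qw15.sh are not reproduced here; presumably it wraps a src/train.py call like the LoRA command at the end of these notes, pointed at the Qwen2.5-1.5B paths used in the merge step. A hypothetical minimal version, for orientation only:
#!/usr/bin/env bash
# qw15.sh (assumed contents): LoRA SFT of Qwen2.5-1.5B-Instruct
CUDA_VISIBLE_DEVICES=0 python src/train.py \
  --model_name_or_path /home/ctq/Huggingface/Qwen2.5-1.5B-Instruct \
  --stage sft --do_train --finetuning_type lora \
  --dataset ctqtest --template qwen \
  --output_dir ./outmoxi/Qwen2.5-1.5B-1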
Switch to the vllm environment and start the vLLM server:
conda activate vllm
vllm serve /home/ctq/Huggingface/Qwen2.5-1.5B-Instruct
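vllm serve exposes an OpenAI-compatible API, by default on port 8000. You can smoke-test it with curl; the model field must match the path passed to vllm serve:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/ctq/Huggingface/Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'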
Set the proxy:
export http_proxy=192.168.2.218:2023
export https_proxy=192.168.2.218:2023
Unset the proxy:
unset http_proxy
unset https_proxy
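To confirm the proxy variables are set (or gone), list them:
env | grep -i proxy   # prints nothing once both variables are unset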
Launch the LLaMA-Factory web UI (a Gradio app, served on port 7860 by default):
python src/webui.py --model_name_or_path /home/ctq/Huggingface/Qwen2-VL-7B-Instruct
Download a model from ModelScope:
modelscope download Qwen/Qwen2.5-3B-Instruct --local_dir Qwen2.5-3B-Instruct
Or use the general format:
modelscope download [model ID] --local_dir [local directory name]
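For example, the base model used in the merge step above can be fetched the same way:
modelscope download Qwen/Qwen2.5-1.5B-Instruct --local_dir Qwen2.5-1.5B-Instruct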
Full-parameter SFT on a single GPU:
CUDA_VISIBLE_DEVICES=0 python src/train.py \
--model_name_or_path /home/ctq/Huggingface/DeepSeek-R1-Distill-Qwen-1.5B \
--stage sft \
--do_train \
--finetuning_type full \
--dataset ctqtest \
--template deepseek3 \
--cutoff_len 4096 \
--overwrite_cache \
--preprocessing_num_workers 16 \
--output_dir ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B \
--logging_steps 1 \
--save_steps 10 \
--plot_loss true \
--overwrite_output_dir true \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-5 \
--warmup_ratio 0.05 \
--num_train_epochs 3.0 \
--lr_scheduler_type cosine \
--fp16 true \
--eval_steps 10 \
--val_size 0.01 \
--per_device_eval_batch_size 2 \
--evaluation_strategy steps \
--load_best_model_at_end true
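With per_device_train_batch_size 8 and gradient_accumulation_steps 4, the effective batch size is 8 × 4 = 32 per optimizer step. Checkpoints and evals both fire every 10 steps, so progress shows up quickly in the output directory (the loss-curve image written by --plot_loss is assumed to be training_loss.png):
ls ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B
# expect checkpoint-10, checkpoint-20, ... plus trainer logs and training_loss.png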
LoRA SFT on a single GPU (identical to the full-parameter run above except for --finetuning_type; if you already ran the full fine-tune, change --output_dir to avoid overwriting its checkpoints):
CUDA_VISIBLE_DEVICES=0 python src/train.py \
--model_name_or_path /home/ctq/Huggingface/DeepSeek-R1-Distill-Qwen-1.5B \
--stage sft \
--do_train \
--finetuning_type lora \
--dataset ctqtest \
--template deepseek3 \
--cutoff_len 4096 \
--overwrite_cache \
--preprocessing_num_workers 16 \
--output_dir ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B \
--logging_steps 1 \
--save_steps 10 \
--plot_loss true \
--overwrite_output_dir true \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-5 \
--warmup_ratio 0.05 \
--num_train_epochs 3.0 \
--lr_scheduler_type cosine \
--fp16 true \
--eval_steps 10 \
--val_size 0.01 \
--per_device_eval_batch_size 2 \
--evaluation_strategy steps \
--load_best_model_at_end true
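After the LoRA run finishes, merge the adapter back into the base model with the export script from the top of these notes (the -merged directory name is just an example; deepseek3 matches the training template):
python src/export_model_lora.py \
  --model_name_or_path /home/ctq/Huggingface/DeepSeek-R1-Distill-Qwen-1.5B \
  --adapter_name_or_path ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B \
  --template deepseek3 \
  --export_dir ./outmoxi/DeepSeek-R1-Distill-Qwen-1.5B-merged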