Rev for Pre
% 摘要-{中文}{英文}
\Abstract{%
论文摘要是对论文研究内容的高度概括,应体现论文工作的核心思想。博士学位论文的中文摘要一般约800$\sim$1200字;硕士学位论文的中文摘要一般约500字。摘要内容应涉及本项科研工作的目的和意义、研究思想和方法、研究成果和结论,博士学位论文应突出论文的创造性成果,硕士学位论文应突出论文的新见解。应具有独立性和自含性,即应是一篇简短但意义完整的文章。论文摘要中不要出现图片、图表、表格或其他插图材料。

论文的关键词,是为了文献标引工作从论文中选取出来用以表示全文主题内容信息的单词或术语,关键词一般为3$\sim$5个,按词条的外延层次排列(外延大的排在前面)。每个关键词之间用逗号间隔,最后一个关键词后不缀标点符号。

论文摘要的中文版与英文版文字内容要对应。从中文摘要开始编写页码并采用双面印刷。“Keywords”与中文摘要部分的关键词对应,每个关键词之间用逗号间隔。

随着大语言模型在推理、问答、生成与多任务协同场景中的广泛应用,如何在有限参数预算下高效完成下游适配,已成为基础模型落地中的关键问题。现有参数高效微调方法大多对不同层、不同模块或不同参数子空间施加形式一致的更新,默认模型内部结构是均匀的,因而难以充分利用大语言模型在模块功能、表示维度和参数组织等方面普遍存在的结构异质性。围绕这一问题,本文以结构异质性分析为切入点,研究面向大语言模型的结构感知表征适配方法,构建了由统一分析框架到多层级方法设计的系统技术路线。

首先,本文从表示流与参数更新两个视角出发,将模型适配统一刻画为在结构角色约束下的非均匀调制过程,构建了结构感知调制的统一分析框架,并将结构异质性归纳为模块级功能异质性、维度级位置结构异质性、频谱级多尺度异质性和参数级容量分配异质性四类,进一步对应乘性调制、组合式调制和结构分解调制三种基本形式。

其次,围绕表示空间中的结构异质性,本文提出三类递进式方法:针对多任务场景中自注意力与前馈网络的功能分工差异,提出基于模块功能角色感知的混合上下文注意力调制方法 HyCAM,通过共享调制、专用调制与动态路由协同实现知识共享与任务特化;针对 RoPE 诱导的维度级位置结构异质性,提出静态选择性适配方法 RoSA,通过低频维度增强与动态层选择提升对关键位置结构的利用效率;在此基础上,进一步提出动态位置注意力调制方法 DyPAM,通过输入条件化的维度对调制以及头级、层级结构偏置,实现由静态选择向动态细粒度调制的扩展。

再次,围绕参数空间中的结构异质性,本文提出两类结构化适配方法:针对权重更新中的多尺度频谱结构,提出基于频谱级联的多尺度参数适配方法 CASCADE,通过 DCT 低频专家、小波高频专家与空域残差专家的协同建模,实现从全局平滑调整到局部精细修正的级联适配;针对多任务适配中的容量分配问题,提出基于共享-特有稀疏分解的多任务适配方法 MESSA,通过预算感知的软到硬结构学习,在统一参数预算下实现共享能力与任务特化能力之间的高效分配。

最后,本文在常识推理、数学推理和多任务联合适配等多类基准任务上进行了系统实验,覆盖 LLaMA、Qwen、Gemma、Mistral 等多个主流模型家族以及 0.5B 至 14B 的不同参数规模。实验结果表明,本文提出的五种方法在各自适用场景下均能稳定优于 LoRA、DoRA、AdaLoRA、FourierFT、MTLoRA 等代表性基线,验证了显式建模并利用大语言模型内部结构异质性对于提升参数高效适配性能与资源利用效率的有效性。本文的研究为大语言模型结构感知适配提供了统一的分析视角和系统的方法支撑。
}{
The abstract is a concise summary of the research content of the thesis, reflecting the core ideas of the work. For a doctoral dissertation, the Chinese abstract is typically around 800–1,200 Chinese characters; for a master's thesis, it is generally about 500 Chinese characters. The abstract should address the purpose and significance of the research, the methodology and approach, as well as the key findings and conclusions. Doctoral dissertations should emphasize original contributions, while master's theses should highlight novel insights. The abstract must be self-contained and independent, functioning as a complete yet concise standalone text. Figures, charts, tables, or other illustrative materials should not appear in the abstract.

Keywords are terms or phrases selected from the thesis to represent the main thematic content for indexing purposes. Typically, 3–5 keywords are required, arranged in hierarchical order of scope (with broader terms listed first). Keywords are separated by commas, with no punctuation following the last keyword.

The Chinese and English versions of the abstract must align in content. Page numbering begins with the Chinese abstract, and the document should be printed double-sided. The "Keywords" section in the English abstract corresponds to the Chinese version, with terms likewise separated by commas.

As large language models are increasingly deployed in reasoning, question answering, generation, and multi-task scenarios, parameter-efficient adaptation under a limited parameter budget has become a central problem in deploying foundation models. Most existing parameter-efficient fine-tuning methods apply updates of largely uniform form across layers, modules, and parameter subspaces, implicitly assuming that the model's internal structure is homogeneous. Such designs cannot fully exploit the structural heterogeneity that pervades the module functions, representation dimensions, and parameter organization of large language models. To address this problem, this dissertation takes the analysis of structural heterogeneity as its point of departure, studies structure-aware representation adaptation for large language models, and develops a systematic technical roadmap that proceeds from a unified analytical framework to multi-level method design.

First, from the dual perspectives of representation flow and parameter updates, this dissertation formulates model adaptation as a non-uniform modulation process constrained by structural roles, and establishes a unified framework for structure-aware modulation. Structural heterogeneity is summarized into four types: module-level functional heterogeneity, dimension-level positional heterogeneity, spectrum-level multi-scale heterogeneity, and parameter-level capacity-allocation heterogeneity. These types are further associated with three basic modulation forms, namely multiplicative modulation, compositional modulation, and structural-decomposition modulation.
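
As a purely illustrative sketch, with notation assumed here for exposition rather than drawn from the dissertation, the three basic forms can be written for a hidden state $h$, an input $x$, and a frozen weight $W_{0}$ as
\begin{align*}
  h' &= h \odot \bigl(1 + s_{\theta}(x)\bigr)                   && \text{(multiplicative modulation)}\\
  h' &= h + \textstyle\sum_{k} g_{k}(x)\, f_{\theta_{k}}(h)     && \text{(compositional modulation)}\\
  W  &= W_{0} + \textstyle\sum_{k} M_{k} \odot B_{k}A_{k}       && \text{(structural-decomposition modulation)}
\end{align*}
where $s_{\theta}$ denotes a learned scaling function, $g_{k}$ routing weights, $f_{\theta_{k}}$ lightweight adapter functions, $B_{k}A_{k}$ low-rank factors, and $M_{k}$ structural masks; all of these symbols are placeholders rather than the dissertation's actual notation.
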

Second, for heterogeneity in the representation space, three progressively refined methods are proposed. HyCAM addresses the functional-role differences between self-attention and feed-forward networks in multi-task adaptation by coordinating shared modulation, task-specific modulation, and dynamic routing to balance knowledge sharing and task specialization. RoSA targets the dimension-level positional heterogeneity induced by RoPE, and improves the utilization of critical positional structure through selective low-frequency enhancement and dynamic layer selection. Building on this, DyPAM further extends static selection to fine-grained dynamic modulation by introducing input-conditioned modulation over RoPE-aligned dimension pairs together with head-level and layer-level structural biases.
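
For context, standard RoPE rotates each two-dimensional coordinate pair of the query and key vectors by a position-dependent angle at a pair-specific frequency; this frequency spectrum is the source of the dimension-level heterogeneity exploited above. A minimal sketch of the standard mechanism (not of RoSA or DyPAM themselves) is
\begin{equation*}
  \begin{pmatrix} h'_{2i} \\ h'_{2i+1} \end{pmatrix}
  =
  \begin{pmatrix} \cos p\theta_{i} & -\sin p\theta_{i} \\ \sin p\theta_{i} & \cos p\theta_{i} \end{pmatrix}
  \begin{pmatrix} h_{2i} \\ h_{2i+1} \end{pmatrix},
  \qquad
  \theta_{i} = b^{-2i/d},
\end{equation*}
where $p$ is the token position, $d$ the head dimension, and $b$ the rotary base (commonly $10^{4}$). Pairs with larger index $i$ rotate more slowly and thus encode low-frequency, long-range positional structure, the part of the spectrum that low-frequency enhancement targets.
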

Third, for heterogeneity in the parameter space, two structured adaptation methods are proposed. CASCADE addresses the multi-scale spectral structure of weight updates by combining a DCT-based low-frequency expert, a wavelet-based high-frequency expert, and a spatial residual expert, thereby realizing cascading adaptation from global smooth adjustment to local fine-grained correction. MESSA addresses capacity allocation in multi-task adaptation through a shared-specific sparse decomposition and a budget-aware soft-to-hard structure learning strategy, enabling efficient allocation between shared capability and task-specific capability under a unified parameter budget.
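
As one hypothetical way to write such a decomposition, with all symbols assumed here for illustration only, the update for task $t$ could take the form
\begin{equation*}
  \Delta W_{t} = B_{s}A_{s} + \bigl(M_{t} \odot B_{t}\bigr)A_{t},
  \qquad
  \textstyle\sum_{t} \lVert M_{t} \rVert_{0} \le \kappa,
\end{equation*}
where $B_{s}A_{s}$ is a low-rank component shared across all tasks, $B_{t}A_{t}$ a task-specific component, $M_{t}$ a sparsity mask relaxed from soft to hard during training, and $\kappa$ the unified parameter budget.
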

Finally, comprehensive experiments are conducted on commonsense reasoning, mathematical reasoning, and joint multi-task adaptation benchmarks, covering multiple mainstream model families including LLaMA, Qwen, Gemma, and Mistral, with model sizes ranging from 0.5B to 14B parameters. Experimental results show that the five proposed methods consistently outperform representative baselines such as LoRA, DoRA, AdaLoRA, FourierFT, and MTLoRA in their respective settings, validating the effectiveness of explicitly modeling and utilizing the structural heterogeneity inside large language models for improving adaptation performance and parameter efficiency. This dissertation provides both a unified analytical perspective and a systematic methodological foundation for structure-aware adaptation of large language models.
}
% 关键字-{中文}{英文}
\Keyword{大语言模型,参数高效微调,结构异质性,结构感知适配,表征适配}{Large Language Models, Parameter-Efficient Fine-Tuning, Structural Heterogeneity, Structure-Aware Adaptation, Representation Adaptation}