<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.d-ai.co/index.php?action=history&amp;feed=atom&amp;title=Instruction_Tuning</id>
	<title>Instruction Tuning - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.d-ai.co/index.php?action=history&amp;feed=atom&amp;title=Instruction_Tuning"/>
	<link rel="alternate" type="text/html" href="https://wiki.d-ai.co/index.php?title=Instruction_Tuning&amp;action=history"/>
	<updated>2026-06-18T08:58:05Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://wiki.d-ai.co/index.php?title=Instruction_Tuning&amp;diff=17&amp;oldid=prev</id>
		<title>Whale at 07:10, 15 December 2025</title>
		<link rel="alternate" type="text/html" href="https://wiki.d-ai.co/index.php?title=Instruction_Tuning&amp;diff=17&amp;oldid=prev"/>
		<updated>2025-12-15T07:10:58Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:10, 15 December 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Instruction tuning&#039;&#039;&#039; is a technique used in the training of Large Language Models &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;([[LLM&lt;/del&gt;]]&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;s&lt;/del&gt;) to improve their ability to follow natural language instructions. While [[Pre-training|pre-training]] enables a model to predict the next token in a sequence based on vast amounts of text data, it does not inherently teach the model to act as a helpful assistant or adhere to specific user commands. Instruction tuning bridges this gap by [[Fine-tuning|fine-tuning]] the pre-trained model on a dataset of instruction-output pairs.&amp;lt;ref&amp;gt;IBM, &quot;What Is Instruction Tuning?&quot;, accessed 2025-12-15, https://www.ibm.com/think/topics/instruction-tuning&amp;lt;/ref&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&#039;&#039;&#039;Instruction tuning&#039;&#039;&#039; is a technique used in the training of &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[&lt;/ins&gt;Large Language Models]] &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(LLMs&lt;/ins&gt;) to improve their ability to follow natural language instructions. 
While [[Pre-training|pre-training]] enables a model to predict the next token in a sequence based on vast amounts of text data, it does not inherently teach the model to act as a helpful assistant or adhere to specific user commands. Instruction tuning bridges this gap by [[Fine-tuning|fine-tuning]] the pre-trained model on a dataset of instruction-output pairs.&amp;lt;ref&amp;gt;IBM, &quot;What Is Instruction Tuning?&quot;, accessed 2025-12-15, https://www.ibm.com/think/topics/instruction-tuning&amp;lt;/ref&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Overview ==  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Overview ==  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Whale</name></author>
	</entry>
	<entry>
		<id>https://wiki.d-ai.co/index.php?title=Instruction_Tuning&amp;diff=12&amp;oldid=prev</id>
		<title>Whale: Created page with &quot;&#039;&#039;&#039;Instruction tuning&#039;&#039;&#039; is a technique used in the training of Large Language Models (LLMs) to improve their ability to follow natural language instructions. While pre-training enables a model to predict the next token in a sequence based on vast amounts of text data, it does not inherently teach the model to act as a helpful assistant or adhere to specific user commands. Instruction tuning bridges this gap by fine-tuning the pre-tra...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.d-ai.co/index.php?title=Instruction_Tuning&amp;diff=12&amp;oldid=prev"/>
		<updated>2025-12-15T06:53:54Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;Instruction tuning&amp;#039;&amp;#039;&amp;#039; is a technique used in the training of Large Language Models (&lt;a href=&quot;/index.php?title=LLM&amp;amp;action=edit&amp;amp;redlink=1&quot; class=&quot;new&quot; title=&quot;LLM (page does not exist)&quot;&gt;LLMs&lt;/a&gt;) to improve their ability to follow natural language instructions. While &lt;a href=&quot;/wiki/Pre-training&quot; title=&quot;Pre-training&quot;&gt;pre-training&lt;/a&gt; enables a model to predict the next token in a sequence based on vast amounts of text data, it does not inherently teach the model to act as a helpful assistant or adhere to specific user commands. Instruction tuning bridges this gap by &lt;a href=&quot;/index.php?title=Fine-tuning&amp;amp;action=edit&amp;amp;redlink=1&quot; class=&quot;new&quot; title=&quot;Fine-tuning (page does not exist)&quot;&gt;fine-tuning&lt;/a&gt; the pre-tra...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;Instruction tuning&amp;#039;&amp;#039;&amp;#039; is a technique used in the training of Large Language Models ([[LLM]]s) to improve their ability to follow natural language instructions. While [[Pre-training|pre-training]] enables a model to predict the next token in a sequence based on vast amounts of text data, it does not inherently teach the model to act as a helpful assistant or adhere to specific user commands. Instruction tuning bridges this gap by [[Fine-tuning|fine-tuning]] the pre-trained model on a dataset of instruction-output pairs.&amp;lt;ref&amp;gt;IBM, &amp;quot;What Is Instruction Tuning?&amp;quot;, accessed 2025-12-15, https://www.ibm.com/think/topics/instruction-tuning&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Overview == &lt;br /&gt;
The primary goal of instruction tuning is to align the model&amp;#039;s behavior with human intent. A standard pre-trained LLM might respond to the prompt &amp;quot;Explain the theory of relativity&amp;quot; by generating a continuation like &amp;quot;was proposed by Albert Einstein in 1905,&amp;quot; rather than providing the explanation requested. By training the model on examples where the input is an instruction (e.g., &amp;quot;Summarize this text&amp;quot;) and the output is the desired response, the model learns to interpret and execute the user&amp;#039;s intent.&amp;lt;ref&amp;gt;GeeksforGeeks, &amp;quot;Instruction Tuning for Large Language Models&amp;quot;, accessed 2025-12-15, https://www.geeksforgeeks.org/artificial-intelligence/instruction-tuning-for-large-language-models/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This process is often considered a critical step in &amp;quot;alignment,&amp;quot; that is, ensuring that AI systems behave in accordance with human values and expectations. It usually precedes further alignment techniques such as [[Reinforcement Learning from Human Feedback]] (RLHF).&amp;lt;ref&amp;gt;Ouyang, L., et al., &amp;quot;Training language models to follow instructions with human feedback&amp;quot;, accessed 2025-12-15, https://arxiv.org/abs/2203.02155&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Methodology == &lt;br /&gt;
Instruction tuning is typically framed as a [[Supervised Learning|supervised learning]] problem. The process involves compiling a dataset in which each example consists of:&lt;br /&gt;
&lt;br /&gt;
* An instruction: a natural language command describing the task (e.g., &amp;quot;Translate the following sentence into French&amp;quot;).&lt;br /&gt;
* An input (optional): the context or data to operate on (e.g., &amp;quot;The cat sat on the mat&amp;quot;).&lt;br /&gt;
* An output: the target response (e.g., &amp;quot;Le chat s&amp;#039;est assis sur le tapis&amp;quot;).&lt;br /&gt;
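&lt;br /&gt;
The three fields above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name, the function names, and the prompt template with its section markers are all hypothetical, and real instruction-tuning pipelines each define their own template.&lt;br /&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionExample:
    """One training example: an instruction, an optional input, and a target output."""
    instruction: str
    output: str
    input: Optional[str] = None


def build_prompt(example: InstructionExample) -> str:
    """Render the prompt part of an example using a hypothetical template."""
    parts = [f"### Instruction:\n{example.instruction}"]
    if example.input:
        # The input section is omitted when the example has no input.
        parts.append(f"### Input:\n{example.input}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


def build_training_text(example: InstructionExample) -> str:
    """Concatenate the prompt and the target output into one training string."""
    return build_prompt(example) + example.output


example = InstructionExample(
    instruction="Translate the following sentence into French",
    input="The cat sat on the mat",
    output="Le chat s'est assis sur le tapis",
)
print(build_training_text(example))
```

During fine-tuning, the model would be trained to produce the output text given the rendered prompt. How the loss is applied across prompt and output tokens varies between implementations and is not shown here.&lt;br /&gt;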
&lt;br /&gt;
=== Datasets ===&lt;br /&gt;
Early instruction tuning relied on datasets like FLAN (Finetuned Language Net), which aggregated existing [[Natural Language Processing]] (NLP) tasks—such as translation, summarization, and reading comprehension—and converted them into instruction formats.&amp;lt;ref&amp;gt;Wei, J., et al., &amp;quot;Finetuned Language Models Are Zero-Shot Learners&amp;quot;, accessed 2025-12-15, https://research.google/pubs/finetuned-language-models-are-zero-shot-learners/&amp;lt;/ref&amp;gt;&lt;br /&gt;
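&lt;br /&gt;
As a rough sketch of what converting an existing NLP task into an instruction format can look like, the Python snippet below rewrites a labeled translation pair as an instruction-output pair. The templates and the function name are hypothetical and chosen for illustration only; they are not taken from FLAN itself.&lt;br /&gt;

```python
import random

# Hypothetical instruction templates for a translation task.
TEMPLATES = [
    "Translate the following sentence into {target_lang}: {source}",
    "How would you say this in {target_lang}? {source}",
    "{source}\n\nWrite the sentence above in {target_lang}.",
]


def to_instruction_example(source, target, target_lang, rng):
    """Turn one (source, target) translation pair into an instruction-output pair."""
    template = rng.choice(TEMPLATES)
    return {
        "instruction": template.format(target_lang=target_lang, source=source),
        "output": target,
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pair = to_instruction_example(
    "The cat sat on the mat", "Le chat s'est assis sur le tapis", "French", rng
)
print(pair["instruction"])
print(pair["output"])
```

Using several phrasings per task is one way to expose the model to varied wording of the same request, so that it does not depend on a single fixed prompt.&lt;br /&gt;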
&lt;br /&gt;
Later approaches, such as Stanford Alpaca, demonstrated that high-quality instruction data could be synthesized by prompting a stronger teacher model (in Alpaca&amp;#039;s case OpenAI&amp;#039;s text-davinci-003, a successor to [[GPT-3]]) to generate diverse instruction-response pairs, significantly reducing the cost of data collection.&amp;lt;ref&amp;gt;Taori, R., et al., &amp;quot;Stanford Alpaca: An Instruction-following LLaMA Model&amp;quot;, accessed 2025-12-15, https://github.com/tatsu-lab/stanford_alpaca&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== The &amp;quot;Less is More&amp;quot; Hypothesis ===&lt;br /&gt;
Research has suggested that the quantity of instruction data may be less important than its quality. The LIMA (Less Is More for Alignment) study proposed the &amp;quot;Superficial Alignment Hypothesis,&amp;quot; suggesting that an LLM acquires most of its knowledge during pre-training. Consequently, instruction tuning serves mainly to teach the model the specific format or style of interaction, achievable with as few as 1,000 carefully curated examples.&amp;lt;ref&amp;gt;Zhou, C., et al., &amp;quot;LIMA: Less Is More for Alignment&amp;quot;, accessed 2025-12-15, https://arxiv.org/abs/2305.11206&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Benefits ==&lt;br /&gt;
&lt;br /&gt;
* Zero-shot generalization: Instruction-tuned models show improved performance on tasks they were not explicitly trained on, as they learn the general concept of following instructions.&amp;lt;ref&amp;gt;Wei, J., et al., &amp;quot;Finetuned Language Models Are Zero-Shot Learners&amp;quot;, accessed 2025-12-15, https://research.google/pubs/finetuned-language-models-are-zero-shot-learners/&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Steerability: Users can direct the model&amp;#039;s output style, tone, and format more effectively.&lt;br /&gt;
* Efficiency: Compared to pre-training a model from scratch, instruction tuning is computationally cheaper, and it can allow smaller models to reach performance comparable to larger models that have not been instruction-tuned.&lt;br /&gt;
&lt;br /&gt;
== References == &lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Whale</name></author>
	</entry>
</feed>