Taxonomy repository FAQs #538

stevsmit · 2024-03-15T19:42:41Z

Taxonomy repository FAQs

This page is an FAQ for the taxonomy repository and covers the more niche questions about it. For more general questions related to InstructLab and contributing to the taxonomy repository, see .

About the taxonomy repository

InstructLab uses a novel synthetic data-based alignment tuning method for Large Language Models (LLMs). The "lab" in InstructLab stands for Large-scale Alignment for Chat Bots. The LAB method is driven by taxonomies, which are largely created manually and with care.

The taxonomy repository contains a taxonomy tree that will allow you to create models tuned with your data (enhanced via synthetic data generation) using the LAB method.

Taxonomy repository FAQs

The following FAQs are common questions related to the LLM's taxonomy and the taxonomy repository.

Q: What languages are contributions being accepted in?

A: Contributions are currently accepted in English.

Q: How do I sign the DCO if my PR was blocked?

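A: If your PR is already blocked, add the missing sign-off to the affected commits (for example, with git commit --amend -s for the most recent commit) and force-push the branch. To ensure your PR isn't blocked in the future, always include "Signed-off-by: Author Name <author@example.com>" in every commit message, using your own name and email address. You can also do this automatically by using the -s flag (i.e., git commit -s).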
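For a PR that is already blocked, one way to add the missing sign-off is to amend the commit. The following is a minimal sketch in a throwaway repository, not the project's official procedure; the name, email address, file name, and commit message are placeholders chosen for illustration.

```shell
# Hypothetical demo in a temporary repository; all names are placeholders.
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Author Name"
git config user.email "author@example.com"

# A commit made without -s has no Signed-off-by trailer,
# which is what causes the DCO check to block a PR.
echo "demo" > qna.yaml
git add qna.yaml
git commit -q -m "Add demo skill"

# Re-create the most recent commit with the sign-off, keeping its message.
git commit -q --amend -s --no-edit

# Print the commit message, which now ends with the Signed-off-by trailer.
git log -1 --format=%B
```

In a real PR branch, the amended commit replaces the one already pushed, so the branch then has to be force-pushed (for example, with git push --force-with-lease). If several commits are missing the sign-off, git rebase --signoff against the base branch adds the trailer to each rebased commit.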

Q: Are there any tools within the project to ensure that our YAML files are properly formatted before submitting them?

A: Currently, we're in the process of implementing tools for this purpose. Some PRs are already up, and we're also considering adding a linter on the taxonomy repository as a PR check.

Q: Do we know who is approving PRs to add skills? Is it one person, multiple people, etc.? It seems that this model of training is highly susceptible to implicit bias.

A: We have a dedicated team managing this task. They are meticulous and are developing governance and processes to ensure fair and unbiased approval of PRs.

Q: LLMs always get confused by dates. Can we teach the model to understand calendars?

A: While teaching the LLM to understand calendars may not be feasible, we can add a skill to acknowledge its limitations in answering certain questions.

Q: Can someone "inside" of the team discuss how this knowledge base will be curated? With thousands of contributions expected, how will variations in skill, depth, topic, relevance, and quality be managed?

A: We have a dedicated taxonomy triage and review workstream developing processes and documentation to address this challenge and ensure the quality and relevance of contributions.
