Semi-automatic Q&A dataset generation

Context

Fine-tuning is about adjusting the model weights to maximize performance on a narrowly defined task, for example, providing the best possible financial advice.

In a real-world project, we would hire a team of financial experts to bootstrap an initial dataset of (question, answer) pairs. In this tutorial, we will follow a semi-automatic approach and use a general LLM, like ChatGPT, to bootstrap a reasonable training set.

This dataset should resemble as closely as possible the actual questions and answers we expect from this model once deployed. This is the dataset we will use to fine-tune our LLM.
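
To make the semi-automatic approach concrete, here is a minimal sketch of how (question, answer) pairs could be bootstrapped from a general LLM. It is not the code in this repository: the prompt wording, the `build_prompt` and `generate_pairs` names, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the wording actually used by the project may differ.
PROMPT_TEMPLATE = """You are an expert in financial markets.
Given the news below, write a question a retail investor might ask
and a helpful answer. Reply as JSON with keys "question" and "answer".

News: {news}
"""


def build_prompt(news: str) -> str:
    """Fill the prompt template with one piece of news."""
    return PROMPT_TEMPLATE.format(news=news.strip())


def generate_pairs(
    news_items: List[str], llm: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Ask the LLM for one (question, answer) pair per news item.

    `llm` is an assumed interface: prompt string in, reply string out.
    Replies that are not valid JSON, or that lack a non-empty question
    or answer, are skipped so they can be reviewed or regenerated later.
    """
    pairs = []
    for news in news_items:
        raw = llm(build_prompt(news))
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip replies that are not valid JSON
        if not isinstance(record, dict):
            continue
        question = record.get("question")
        answer = record.get("answer")
        if (
            isinstance(question, str)
            and isinstance(answer, str)
            and question.strip()
            and answer.strip()
        ):
            pairs.append(
                {
                    "question": question.strip(),
                    "answer": answer.strip(),
                    "context": news.strip(),
                }
            )
    return pairs
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client, and it makes the "semi" part of semi-automatic easy: the returned list can be inspected and corrected by hand before it is used for fine-tuning.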

Quick set up

Set up virtual env using Poetry
```
$ make init
```
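Run the init script for environment variables. The repository ships set_env_variables_template.sh, so you presumably need to copy it to set_env_variables.sh and fill in your own values first.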
```
$ . ./set_env_variables.sh
```
Generate a sample of training data
```
$ make training-data
```

Not used here but might be useful later on

These pieces of code are not used in this tutorial but can be useful later on, for example, to backfill the feature store or the vector DB.

Download around 18k news articles from January 2023 from Alpaca into a JSON file:
```
$ make download
```
Embed the contents of this JSON file and push the embeddings to Qdrant:
```
$ make embed
```
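
As a rough illustration of the embedding step, the sketch below turns a list of news records into records with the id / vector / payload shape that Qdrant points use. It does not reproduce what `make embed` actually runs: the `headline` and `summary` field names, the `news_to_points` helper, and the `toy_embed` function are assumptions, and `toy_embed` is only a deterministic placeholder that you would replace with a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Placeholder embedding: deterministic but NOT semantically meaningful.

    Swap this for a real sentence-embedding model in practice.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def news_to_points(
    news: List[Dict[str, str]], embed: Callable[[str], List[float]]
) -> List[Dict]:
    """Convert news records into id / vector / payload dictionaries.

    The 'headline' and 'summary' keys are assumed field names; adjust
    them to match the JSON file produced by the download step.
    """
    points = []
    for idx, item in enumerate(news):
        text = f"{item.get('headline', '')} {item.get('summary', '')}".strip()
        if not text:
            continue  # nothing to embed for this record
        points.append(
            {
                "id": idx,
                "vector": embed(text),
                "payload": {"text": text},
            }
        )
    return points
```

The resulting dictionaries can then be handed to whichever Qdrant client call the project uses to upload points; that upload step is deliberately left out here because the repository's own script is not shown.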

References

A bit more about prompt engineering: here is a recent prompt engineering idea we can use with ChatGPT: https://twitter.com/jeremyphoward/status/1689464587077509120
