# testbed
testbed is a framework for evaluating the quality of the completions generated by llm-ls and the underlying model.
It works by first making holes in files, then generating completions for a given list of repositories, and finally running the associated unit tests.
The result is a table with one line per repository, plus a totals line, showing the average percentage of passing unit tests.
Here is a simplified pseudo-code outline of the algorithm:
```
read the repositories file
read the holes file(s)
for each repository:
    spawn a thread:
        setup the repository
        for each hole:
            make the hole as specified by the file
            generate completions
            build the code
            run the tests
print results
```
## Running testbed
Before running testbed, you will need to create a repositories file: a YAML file containing the list of repositories to test, along with the parameters for the `llm-ls/getCompletions` request.
Repositories can be sourced either from your local storage or from GitHub.
You can check the repositories files at the root of the crate to see the full structure.
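As a rough sketch, an entry might look like the following. Only `setup_commands`, `build_command`, `build_args`, and `runner` are documented in this README; the other field names here are illustrative assumptions, so refer to the actual repositories files for the exact schema:
```yaml
# Minimal sketch of one repository entry; field names other than
# setup_commands, build_command, build_args and runner are assumptions.
repositories:
  - name: huggingface_hub        # hypothetical field
    source:                      # hypothetical field: local path or GitHub reference
      type: github
      owner: huggingface
      name: huggingface_hub
    holes_file: holes/huggingface_hub.json   # hypothetical field
    setup_commands:
      - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
    build_command: huggingface_hub-venv/bin/python3
    build_args: ["-m", "compileall", "-q", "."]
    runner: pytest
```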
### Generating holes
Before running testbed, you will also need to generate a holes file for each repository. To do so, run testbed with the `-g` option; you can specify the number of holes to make with `-n <number>`. testbed will take the list of repositories in your YAML file and create the associated holes files at the paths defined there.
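For instance, assuming you run the crate through Cargo (the `-g` and `-n` flags are documented above; the rest of the invocation is an assumption):
```
# hypothetical invocation: -g generates holes files, -n sets how many
# holes to make per repository
cargo run --bin testbed -- -g -n 10
```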
### Setup
testbed runs completions for each repository in parallel. It first creates a temporary directory, then copies or downloads the repository's source files to that location, and finally runs the setup commands.
Setup commands are useful for installing dependencies:
```yaml
setup_commands:
  - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
  - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
```
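Each entry is a program name followed by its list of arguments.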
### Build
Before running the tests, testbed runs a build command to check that the code is valid.
To configure the build step, set the following options:
```yaml
build_command: huggingface_hub-venv/bin/python3
build_args: ["-m", "compileall", "-q", "."]
```
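For a repository using the cargo runner described below, the build step might look like the following; this is an illustrative sketch, not a configuration taken from the crate's files:
```yaml
# illustrative sketch for a Rust repository; not taken from the
# crate's actual repositories files
build_command: cargo
build_args: ["build"]
```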
### Runners
testbed supports two test runners at the moment:
- cargo
- pytest
To configure your runner, you have the following options:
```yaml
runner: pytest
runner_command: huggingface_hub-venv/bin/python3
runner_extra_args:
  - "-k"
  - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"
```
You can override the runner's command with `runner_command`, which is useful when dependencies are set up in a venv.
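As a sketch, a cargo runner configuration could be as simple as the following; `runner` and `runner_extra_args` are documented above, while the extra arguments themselves are a hypothetical example:
```yaml
# hypothetical example; the extra args are assumed to be passed
# through to the test runner invocation
runner: cargo
runner_extra_args: ["--", "--test-threads", "1"]
```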
## References
testbed was inspired by [human-eval](https://github.com/openai/human-eval) and [RepoEval](https://arxiv.org/abs/2303.12570).