# testbed
testbed is a framework to evaluate the efficiency of the completions generated by llm-ls and the underlying model.

It works by first making holes in files, then generating completions for a given list of repositories, and finally running the associated unit tests.

The result is a table with one line per repository, plus a totals line giving the average percentage of successful unit tests.

Here is a simplified pseudo-code algorithm for testbed:

```
read the repositories file
read the holes file(s)
for each repository
    for each hole
        spawn a thread
            setup the repository -- only once for each repository
            copy files from the setup cache to a new temp dir
            make the hole as specified by the file
            generate completions
            build the code
            run the tests
print results
```
## Running testbed

Before running testbed, you will need to create a repositories file: a YAML file containing a list of repositories to test.
It also contains the parameters to the `llm-ls/getCompletions` request.

Repositories can be sourced either from your local storage or from GitHub.
You can check the repositories files at the root of the crate to see the full structure.
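
As a rough illustration only, a repositories file might look like the sketch below. The field names here are assumptions inferred from the description above, not the exact schema; treat the repositories files at the root of the crate as the source of truth.

```yaml
# Hypothetical sketch -- field names are illustrative assumptions,
# not the exact schema used by testbed.
# The parameters forwarded to the `llm-ls/getCompletions` request
# would also live at the top level of this file.
repositories:
  # a repository sourced from local storage
  - source:
      type: local
      path: my-project
    holes_file: my-project-holes.json
    runner: cargo
  # a repository sourced from GitHub
  - source:
      type: github
      owner: huggingface
      name: huggingface_hub
    holes_file: huggingface_hub-holes.json
    runner: pytest
```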
### Generating holes

Before running testbed, you will need to generate a holes file for each repository. To generate holes files, run testbed with the `-g` option; you can specify the number of holes to make with `-n <number>`. testbed will take the list of repositories in your YAML file and create the associated holes files at the defined paths.
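
For instance, assuming testbed is run through cargo from inside the crate (the exact invocation and flag order may differ), generating 10 holes per repository could look like:

```
cargo run --bin testbed -- -g -n 10
```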
### Setup

testbed runs hole completions in parallel. First, and only once per repository, it creates a temporary directory, copies or downloads the repository's source files to that location, and runs the setup commands. For each subsequent completion, it copies the content of this "setup directory" to a new temporary directory so that work can be parallelised.

Setup commands are useful for installing dependencies, for example:

```yaml
setup_commands:
  - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
  - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
```
### Build
Before running the tests, testbed will run a build command to check if the code is valid.

To configure the build command, you can do the following:

```yaml
build_command: huggingface_hub-venv/bin/python3
build_args: ["-m", "compileall", "-q", "."]
```
### Runners

testbed supports four test runners:

- cargo
- jest
- pytest
- vitest
To configure your runner, you have the following options:

```yaml
runner: pytest
runner_command: huggingface_hub-venv/bin/python3
runner_extra_args:
  - "-k"
  - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"
```

You can override the runner's command with `runner_command`, which is useful when dependencies are set up in a venv.
## References
testbed was inspired by [human-eval](https://github.com/openai/human-eval) and [RepoEval](https://arxiv.org/abs/2303.12570).