# testbed
testbed is a framework to evaluate the efficiency of the completions generated by llm-ls and the underlying model.

It works by first making holes in files, then generating completions for a given list of repositories, and finally running the associated unit tests. The result is a table with one line per repository, plus a total, showing the average percentage of successful unit tests.
Here is a simplified pseudo code algorithm for testbed:
```
read the repositories file
read the holes file(s)
for each repository
    for each hole
        spawn a thread
            setup the repository -- only once for each repository
            copy files from the setup cache to a new temp dir
            make the hole as specified by the file
            generate completions
            build the code
            run the tests
print results
```
## Running testbed
Before running testbed you will need to create a repositories file. It is a YAML file containing a list of repositories to test, along with the parameters for the `llm-ls/getCompletions` request.

Repositories can be sourced either from your local storage or from GitHub.

You can check the repositories files at the root of the crate to see the full structure.
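As a rough sketch, a repository entry could look like the following. Only the `setup_commands`, `build_command`/`build_args`, and `runner*` keys are described in this document; the other field names are assumptions, so refer to `repositories.yaml` and `repositories-ci.yaml` at the root of the crate for the real schema.

```yaml
# Hypothetical sketch of a repositories file -- the real schema lives in
# repositories.yaml at the crate root; fields marked "assumed" are illustrative.
repositories:
  - name: simple                # assumed: repository identifier
    source:                     # assumed: local path or GitHub coordinates
      type: local
      path: repositories/simple
    setup_commands: []          # see "Setup" below
    build_command: cargo        # see "Build" below
    build_args: ["build"]
    runner: cargo               # see "Runners" below
```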
## Generating holes
Before running testbed, you will need to generate a holes file for each repository. To generate a holes file, run testbed with the `-g` option. You can specify the number of holes to make with `-n <number>`. It will take the list of repositories in your YAML file and create the associated holes files at the defined path.
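As a sketch, assuming the binary is invoked as `testbed` (it may instead be run through `cargo run`), generating ten holes per repository could look like this; only the `-g` and `-n` flags are taken from the text above:

```sh
# Generate 10 holes for each repository listed in the repositories file.
# `-g` and `-n` are documented above; the binary name is an assumption.
testbed -g -n 10
```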
## Setup
testbed runs hole completions in parallel. For each repository it will first, and only once, create a temporary directory, then copy or download the repository's source files to that location and run the setup commands. For each subsequent completion it copies the content of this "setup directory" to a new temporary directory so that work can be parallelised.
Setup commands are useful to install dependencies, for example:

```yaml
setup_commands:
  - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
  - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
```
## Build
Before running the tests, testbed runs a build command to check that the code is valid. You can configure it as follows:

```yaml
build_command: huggingface_hub-venv/bin/python3
build_args: ["-m", "compileall", "-q", "."]
```
## Runners
testbed supports four test runners:
- cargo
- jest
- pytest
- vitest
To configure your runner, you have the following options:

```yaml
runner: pytest
runner_command: huggingface_hub-venv/bin/python3
runner_extra_args:
  - "-k"
  - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"
```

You can override the runner's command with `runner_command`, which is useful when dependencies are set up in a venv.
## References
testbed was inspired by human-eval and RepoEval.