testbed

testbed is a framework to evaluate the quality of the completions generated by llm-ls and the underlying model.

It works by first making holes in files, then generating completions for a given list of repositories, and finally running the associated unit tests.

The result is a table with one line per repository plus a total line, each showing the average percentage of passing unit tests.

Here is a simplified pseudocode algorithm for testbed:

read the repositories file
read the holes file(s)
for each repository
  spawn a thread
  set up the repository
  for each hole
    make the hole as specified by the file
    generate completions
    build the code
    run the tests
print results

Running testbed

Before running testbed, you will need to create a repositories file. It is a YAML file containing a list of repositories to test.

It also contains the parameters to the llm-ls/getCompletions request.

Repositories can be sourced either from your local storage or from GitHub.

You can check the repositories files at the root of the crate to see the full structure.
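
For orientation, here is a rough sketch of the overall shape such a file can take. Field names that do not appear elsewhere in this README (repositories, source, holes_file) are assumptions, and the llm-ls/getCompletions parameters mentioned above are omitted, so defer to the repositories files at the root of the crate for the authoritative structure:

repositories:
  - source:                         # assumed key: where the repository comes from
      type: github                  # assumed: a GitHub repository, or a local path on disk
      owner: huggingface
      name: huggingface_hub
    holes_file: path/to/holes-file  # assumed key: the holes file generated with -g
    setup_commands:
      - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
      - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
    build_command: huggingface_hub-venv/bin/python3
    build_args: ["-m", "compileall", "-q", "."]
    runner: pytest
    runner_command: huggingface_hub-venv/bin/python3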

Generating holes

Before running testbed, you will need to generate a holes file for each repository. To generate a holes file, run testbed with the -g option. You can specify the number of holes to make with -n <number>. It will take the list of repositories in your YAML file and create the associated holes files at the defined paths.
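
You normally never write a holes file by hand, since testbed generates it for you. Purely as a hypothetical illustration of what each generated hole has to capture (the field names and on-disk format below are assumptions, not the actual schema), an entry points at a file and the cursor position where content is removed and a completion is requested:

# hypothetical illustration only; the real schema is defined in the crate's source
- file: src/some_module.py   # assumed: file in which the hole is made
  cursor:
    line: 42                 # assumed: position of the hole / completion request
    character: 8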

Setup

testbed runs completions for each repository in parallel. It will first create a temporary directory, then copy or download the repository's source files to that location, and finally run the setup commands.

Setup commands are useful to install dependencies.

setup_commands:
  # each entry is a program followed by its list of arguments
  - ["python3", ["-m", "venv", "huggingface_hub-venv"]]                       # create a virtual environment
  - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]  # install the project and its dev dependencies

Build

Before running the tests, testbed will run a build command to check if the code is valid.

To configure the build command, you can do the following:

build_command: huggingface_hub-venv/bin/python3
build_args: ["-m", "compileall", "-q", "."]   # byte-compile every Python source file, quietly
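
The build step is not specific to Python. As an assumed illustration (not taken from this crate's configuration files), a Rust repository could be validated with cargo check instead:

build_command: cargo
build_args: ["check"]   # assumed example: fails the build step if the code does not compile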

Runners

testbed supports two test runners at the moment:

  • cargo
  • pytest

To configure your runner, you have the following options:

runner: pytest
runner_command: huggingface_hub-venv/bin/python3   # run the tests through the venv's interpreter
runner_extra_args:
  - "-k"                                           # pytest's expression-based test selection
  - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"

You can override the runner's command with runner_command, which is useful when dependencies were set up in a venv.
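
For Rust repositories, the cargo runner is used instead. A minimal sketch, assuming cargo is already on the PATH so no override is needed (see the repositories files at the root of the crate for real configurations):

runner: cargo
# runner_command and runner_extra_args can be left out when the defaults are enough (assumption)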

References

testbed was inspired by human-eval and RepoEval.