# testbed

testbed is a framework to evaluate the efficiency of the completions generated by llm-ls and the underlying model.

It works by first making holes in files, then generating completions for a given list of repositories and finally running the associated unit tests. The result is a table with one line per repository, plus a total line giving the average percentage of successful unit tests.

Here is a simplified pseudo code algorithm for testbed:
```
read the repositories file
read the holes file(s)
for each repository
    for each hole
        spawn a thread
            setup the repository -- only once for each repository
            copy files from the setup cache to a new temp dir
            make the hole as specified by the holes file
            generate completions
            build the code
            run the tests
print results
```

## Running testbed

Before running testbed you will need to create a repositories file. It is a YAML file containing the list of repositories to test, as well as the parameters for the `llm-ls/getCompletions` request. Repositories can be sourced either from your local storage or from GitHub. You can check the repositories files at the root of the crate to see the full structure.

### Generating holes

Before running testbed, you will need to generate a holes file for each repository. To generate a holes file, run testbed with the `-g` option. You can specify the number of holes to make with `-n`. testbed will take the list of repositories in your YAML file and create the associated holes files at the defined path.

### Setup

testbed runs hole completions in parallel. It will first, and only once per repository, create a temporary directory, then copy or download the repository's source files to that location and finally run the setup commands. For each subsequent completion it will copy the content of this "setup directory" to a new temporary directory so that work can be parallelised.

Setup commands are useful to install dependencies:
```yaml
setup_commands:
  - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
  - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
```

### Build

Before running the tests, testbed will run a build command to check that the code is valid. To configure the command, you can do the following:
```yaml
build_command: huggingface_hub-venv/bin/python3
build_args: ["-m", "compileall", "-q", "."]
```

### Runners

testbed supports four test runners:
- cargo
- jest
- pytest
- vitest

To configure your runner, you have the following options:
```yaml
runner: pytest
runner_command: huggingface_hub-venv/bin/python3
runner_extra_args:
  - "-k"
  - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"
```

You can override the runner's command with `runner_command`, which is useful when dependencies are set up in a venv.

## References

testbed was inspired by [human-eval](https://github.com/openai/human-eval) and [RepoEval](https://arxiv.org/abs/2303.12570).
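
## Example repository entry

Putting the pieces together, a single entry in the repositories file might look roughly like the sketch below. Only `setup_commands`, `build_command`/`build_args` and the `runner*` keys are taken verbatim from the snippets above; the other field names (`source`, `language`, `holes_file`) are illustrative assumptions, so check the repositories files at the root of the crate for the authoritative schema.

```yaml
# Sketch of one entry in the repositories file (not the authoritative schema).
repositories:
  - source:                 # assumed shape for a GitHub-sourced repository
      type: github
      owner: huggingface
      name: huggingface_hub
    language: python                     # assumed field
    holes_file: huggingface_hub.json     # assumed field; file produced by `-g`
    # Install dependencies once per repository in the setup directory.
    setup_commands:
      - ["python3", ["-m", "venv", "huggingface_hub-venv"]]
      - ["huggingface_hub-venv/bin/python3", ["-m", "pip", "install", ".[dev]"]]
    # Check that the code compiles before running the tests.
    build_command: huggingface_hub-venv/bin/python3
    build_args: ["-m", "compileall", "-q", "."]
    # Run the test suite with the pytest runner from the venv.
    runner: pytest
    runner_command: huggingface_hub-venv/bin/python3
    runner_extra_args:
      - "-k"
      - "_utils_ and not _utils_cache and not _utils_http and not paginate and not git"
```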