For coding-interview & assessment platforms

Build coding interviews that grade themselves

Submit a multi-file ZIP, set the expected output, get a structured verdict back: completed, wrong_answer, timeout, or compilation_error. No string-comparison logic in your code. No flaky timing. Just a clean, deterministic grading surface for any of 12 languages.

The grading primitive

SandboxAPI's response includes a status field that tells you exactly what happened. You don't have to diff strings yourself or interpret exit codes — the API does it for you.

completed           Code ran cleanly. exit_code = 0 and, when expected_output is set, stdout matched it.
wrong_answer        Code ran cleanly, but stdout didn't match expected_output.
timeout             Code exceeded its CPU or wall-time budget.
compilation_error   Compiled languages only: the build failed before execution.
runtime_error       Process exited with a non-zero exit code or was killed by a signal.
memory_limit        Process exceeded its memory budget and was killed by the cgroup.
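
All six statuses describe something the candidate's submission did; anything else in the response (or a failed request) points at your own pipeline rather than the candidate. A minimal triage sketch under that assumption (the function name and bucket labels are illustrative, not part of the API):

# Every name here is ours; only the status strings come from the API.
CANDIDATE_VERDICTS = {
    "completed", "wrong_answer", "timeout",
    "compilation_error", "runtime_error", "memory_limit",
}

def triage(result: dict) -> str:
    """Decide whether a response is a final grade or needs a re-run."""
    if result.get("status") in CANDIDATE_VERDICTS:
        return "grade"   # safe to show the verdict to the candidate
    return "retry"       # unexpected status: re-run the submission or flag it for review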

Multi-file submission, single grading call

Real interview problems aren't single files. SandboxAPI accepts a base64-encoded ZIP plus optional compile_script and run_script, so you can build problems with helper modules, headers, build configs — anything.

# Pack the candidate's submission + grader's harness into a ZIP.
$ ls submission/
  solution.py
  test_runner.py    # the harness invokes solution and prints results
  inputs/case1.txt
  inputs/case2.txt
$ (cd submission && zip -r ../submission.zip .)        # zip from inside the directory so files sit at the archive root
$ BASE64_ZIP=$(base64 < submission.zip | tr -d '\n')   # strip newlines so the JSON body stays valid

# One POST grades all of it.
curl -X POST https://sandboxapi.p.rapidapi.com/v1/execute \
  -H "X-RapidAPI-Key: $KEY" \
  -d '{
    "language": "python3",
    "additional_files": "'"$BASE64_ZIP"'",
    "compile_script": "true",
    "run_script": "python test_runner.py",
    "timeout": 30,
    "expected_output": "PASS\nPASS\n"
  }'

Grading verdict in one round trip

The response combines code output and the grading verdict. Your platform reads status and decides what to show the candidate.

{
  "id": "exec_xyz",
  "status": "wrong_answer",
  "stdout": "PASS\nFAIL\n",
  "stderr": "",
  "compile_output": "",
  "exit_code": 0,
  "execution_time_ms": 187,
  "wall_time_ms": 211,
  "memory_used_kb": 12544
}

Map status straight into your candidate-facing UI.
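
A minimal sketch of one such mapping, choosing which response field to surface for each verdict (the helper name and message strings are illustrative, not part of the API):

def candidate_feedback(result: dict) -> str:
    """Illustrative mapping from a grading response to a candidate-facing message."""
    status = result["status"]
    if status == "completed":
        return "All test cases passed."
    if status == "wrong_answer":
        # Surface the harness output so the candidate can see which case failed.
        return "Output didn't match the expected result:\n" + result["stdout"]
    if status == "compilation_error":
        return "Your code didn't compile:\n" + result["compile_output"]
    if status == "runtime_error":
        return "Your code crashed:\n" + result["stderr"]
    if status == "timeout":
        return f"Time limit exceeded after {result['wall_time_ms']} ms."
    if status == "memory_limit":
        return f"Memory limit exceeded ({result['memory_used_kb']} kB used)."
    return "Something went wrong on our side. Please resubmit."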

Multi-language support

Every problem can support every language without separate code paths. The same endpoint runs Python, Java, C++, Rust, or any of the 12 supported languages — pick by passing a different language field. Custom compile_script / run_script means you can support languages with non-trivial build steps without extra infrastructure.
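
For example, grading a C++ version of the same problem only changes the request payload. A rough sketch, reusing the build_zip helper defined in the end-to-end example below; the cpp language identifier, the g++ invocation, and the file names are assumptions rather than values taken from the API reference:

# Sketch only: "cpp", the g++ flags, and the file names are our assumptions.
cpp_payload = {
    "language": "cpp",
    "additional_files": build_zip({
        "main.cpp": candidate_cpp_code,    # candidate's submission
        "tests.cpp": CPP_HARNESS_CODE,     # grader-owned test harness
    }),
    "compile_script": "g++ -O2 -std=c++17 main.cpp tests.cpp -o grader",
    "run_script": "./grader",
    "timeout": 30,
    "expected_output": "PASS\nPASS\n",
}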

End-to-end Python example

import base64, zipfile, io, requests

API_KEY = "YOUR_RAPIDAPI_KEY"   # RapidAPI key used in the request headers below

def build_zip(files: dict) -> str:
    """Zip {filename: content} in memory and return it base64-encoded."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, content in files.items():
            z.writestr(name, content)
    return base64.b64encode(buf.getvalue()).decode()

def grade(language, files, run_script, expected, time_limit=10):
    """Submit the files for execution and return the parsed grading response."""
    payload = {
        "language": language,
        "additional_files": build_zip(files),
        "run_script": run_script,
        "expected_output": expected,
        "timeout": time_limit,
    }
    r = requests.post(
        "https://sandboxapi.p.rapidapi.com/v1/execute",
        headers={"X-RapidAPI-Key": API_KEY, "Content-Type": "application/json"},
        json=payload,
    )
    r.raise_for_status()
    return r.json()

# Grade a candidate's solution
result = grade(
    language="python3",
    files={
        "solution.py": candidate_code,
        "harness.py": HARNESS_CODE,
    },
    run_script="python harness.py",
    expected="all_passed\n",
)

# Map verdict to candidate-facing message
verdict = {
    "completed":         "All test cases passed.",
    "wrong_answer":      "Output didn't match. Review the failing case.",
    "timeout":           "Time limit exceeded.",
    "compilation_error": "Code didn't compile.",
    "runtime_error":     "Code crashed during execution.",
    "memory_limit":      "Out of memory.",
}.get(result["status"], "Internal error.")
print(verdict)
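
Grading a whole cohort is just a loop over the same helper. A minimal sketch (the submissions structure and the tallying are illustrative):

from collections import Counter

# Illustrative only: submissions maps candidate IDs to their solution source.
submissions = {
    "cand_001": "def solve(x):\n    return x * 2\n",
    "cand_002": "def solve(x):\n    return x + x\n",
}

tally = Counter()
for candidate_id, code in submissions.items():
    result = grade(
        language="python3",
        files={"solution.py": code, "harness.py": HARNESS_CODE},
        run_script="python harness.py",
        expected="all_passed\n",
    )
    tally[result["status"]] += 1
    print(candidate_id, result["status"], result["execution_time_ms"], "ms")

print(dict(tally))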

Why teams pick SandboxAPI for assessments

Recommendation: start on Pro for the 60s timeout and 50-snippet batches. Move to Ultra at scale for 300s timeouts (long-running test suites) and 100-snippet batches. The Mega tier is the right fit for institutional assessment platforms running 100k+ submissions a month.

Start with Pro

Pro is $19/month for 10,000 executions, 60s timeouts, and 50-snippet batches — enough headroom for most coding-assessment workloads from day one.

Subscribe to Pro · Multi-file docs