The grading primitive
SandboxAPI's response includes a status field that tells you exactly what happened. You don't have to diff strings yourself or interpret exit codes — the API does it for you.
A run counts as completed only when the program exits with exit_code = 0 and its stdout matches expected_output; every other outcome maps to one of the failure statuses below.
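Conceptually, the verdict is a deterministic function of the run's outcome. Here is a minimal sketch of that decision logic; this is illustrative only, not SandboxAPI's actual internals, and the parameter names simply mirror the response fields shown later:

# Illustrative only: how a status like SandboxAPI's could be derived.
# Not the API's real implementation; names mirror the response fields.
def derive_status(exit_code, stdout, expected_output,
                  timed_out=False, out_of_memory=False,
                  compile_failed=False):
    if compile_failed:
        return "compilation_error"
    if timed_out:
        return "timeout"
    if out_of_memory:
        return "memory_limit"
    if exit_code != 0:
        return "runtime_error"
    if expected_output is not None and stdout != expected_output:
        return "wrong_answer"
    return "completed"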
Multi-file submission, single grading call
Real interview problems aren't single files. SandboxAPI accepts a base64-encoded ZIP plus optional compile_script and run_script, so you can build problems with helper modules, headers, build configs — anything.
# Pack the candidate's submission + grader's harness into a ZIP.
$ ls submission/
solution.py
test_runner.py # the harness invokes solution and prints results
inputs/case1.txt
inputs/case2.txt
# Zip from inside the directory so the files sit at the archive root,
# matching the run_script paths below.
$ (cd submission && zip -r ../submission.zip .)
$ BASE64_ZIP=$(base64 -i submission.zip)   # macOS; on Linux use: base64 -w0 submission.zip
# One POST grades all of it.
curl -X POST https://sandboxapi.p.rapidapi.com/v1/execute \
  -H "X-RapidAPI-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python3",
    "additional_files": "'"$BASE64_ZIP"'",
    "compile_script": "true",
    "run_script": "python test_runner.py",
    "timeout": 30,
    "expected_output": "PASS\nPASS\n"
  }'
Grading verdict in one round trip
The response combines code output and the grading verdict. Your platform reads status and decides what to show the candidate.
{
  "id": "exec_xyz",
  "status": "wrong_answer",
  "stdout": "PASS\nFAIL\n",
  "stderr": "",
  "compile_output": "",
  "exit_code": 0,
  "execution_time_ms": 187,
  "wall_time_ms": 211,
  "memory_used_kb": 12544
}
Map status straight into your candidate-facing UI:
- completed + wrong_answer on different test cases → "Passed 7 of 10 cases."
- compilation_error → show compile_output verbatim.
- timeout → "Your solution exceeded the time budget. Optimize and retry."
- memory_limit → "Out of memory — review your data structures."
Multi-language support
Every problem can support every language without separate code paths. The same endpoint runs Python, Java, C++, Rust, or any of the 12 supported languages; you pick one by passing a different value in the language field. Custom compile_script / run_script support means languages with non-trivial build steps need no extra infrastructure.
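For a compiled language, the same call simply carries a build step. As a sketch, a C++ payload might look like the following; the "cpp" language identifier, the compiler flags, and the file names here are illustrative assumptions, not documented values, so check the API's language list for the exact identifiers:

# Illustrative C++ payload. "cpp", the g++ flags, and the file names are
# assumptions -- consult the API's language list for exact identifiers.
payload = {
    "language": "cpp",
    "additional_files": "<base64-encoded ZIP containing main.cpp and the harness>",
    "compile_script": "g++ -std=c++17 -O2 main.cpp -o main",
    "run_script": "./main",
    "timeout": 30,
    "expected_output": "PASS\nPASS\n",
}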
End-to-end Python example
import base64, io, os, zipfile
import requests

API_KEY = os.environ["RAPIDAPI_KEY"]  # assumes your key lives in this env var

def build_zip(files: dict) -> str:
    # Pack {filename: content} into an in-memory ZIP and base64-encode it.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, content in files.items():
            z.writestr(name, content)
    return base64.b64encode(buf.getvalue()).decode()

def grade(language, files, run_script, expected, time_limit=10):
    payload = {
        "language": language,
        "additional_files": build_zip(files),
        "run_script": run_script,
        "expected_output": expected,
        "timeout": time_limit,
    }
    r = requests.post(
        "https://sandboxapi.p.rapidapi.com/v1/execute",
        headers={"X-RapidAPI-Key": API_KEY, "Content-Type": "application/json"},
        json=payload,
    )
    r.raise_for_status()  # surface transport-level errors early
    return r.json()
# Grade a candidate's solution
result = grade(
    language="python3",
    files={
        "solution.py": candidate_code,
        "harness.py": HARNESS_CODE,
    },
    run_script="python harness.py",
    expected="all_passed\n",
)
# Map verdict to candidate-facing message
verdict = {
    "completed": "All test cases passed.",
    "wrong_answer": "Output didn't match. Review the failing case.",
    "timeout": "Time limit exceeded.",
    "compilation_error": "Code didn't compile.",
    "runtime_error": "Code crashed during execution.",
    "memory_limit": "Out of memory.",
}.get(result["status"], "Internal error.")
print(verdict)
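The "Passed 7 of 10 cases" message from the UI mapping above implies one execution per test case. Here is a sketch of that aggregation loop reusing the grade() helper; the per-case file layout (an input.txt read by the harness) and the case list are assumptions for illustration:

# One sandbox run per test case, aggregated into a single message.
# TEST_CASES and the input.txt convention are illustrative assumptions.
TEST_CASES = [
    ("1 2\n", "3\n"),
    ("5 7\n", "12\n"),
]

passed = 0
for case_input, expected in TEST_CASES:
    case_result = grade(
        language="python3",
        files={
            "solution.py": candidate_code,
            "harness.py": HARNESS_CODE,  # assumed to read input.txt and print the answer
            "input.txt": case_input,
        },
        run_script="python harness.py",
        expected=expected,
    )
    if case_result["status"] == "completed":
        passed += 1

print(f"Passed {passed} of {len(TEST_CASES)} cases.")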
Why teams pick SandboxAPI for assessments
- Predictable latency — pre-warmed sandboxes mean p50 cold-start <300ms across all 12 languages.
- Modern runtimes — your candidates can solve in Python 3.12 or Node 22, not 2018-era stacks.
- Multi-file submissions — accept ZIPs, run custom build/run scripts. Grading harnesses, fixture files, helper modules — all in one call.
- Output verification — expected_output built in. No grading-script bugs in your platform code.
- Status taxonomy — six clean statuses cover every real grading outcome.
- gVisor isolation — candidate code runs inside a user-space kernel sandbox. Even if a candidate tries to bypass your time limit, syscall filtering stops it.
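That isolation claim is easy to smoke-test during evaluation: submit deliberately hostile code and confirm it is contained. A sketch using the grade() helper above; which failure status a blocked operation produces depends on the sandbox's policy, so the only firm assertion here is that the run must not complete successfully:

# Hostile-code smoke test. The exact status for a blocked operation is
# policy-dependent; the invariant is that it never comes back "completed".
HOSTILE = (
    'import socket\n'
    'socket.create_connection(("example.com", 80))\n'
    'print("escaped")\n'
)

probe = grade(
    language="python3",
    files={"solution.py": HOSTILE},
    run_script="python solution.py",
    expected="escaped\n",
)
assert probe["status"] != "completed", "sandbox allowed network egress"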