# Troubleshooting

Common issues and how to fix them. This page is updated as we solve real customer problems.

## Build problems

### Entry file not found

**Symptom:** Deploy returns a PID, but `agent_count` drops to 0 within seconds.

**Cause:** The `entry` path in `[agent]` is resolved relative to `[build] working_dir` (default: `/agent/code`), which is also the directory the agent process starts in. Make sure your build steps create the file at the resolved path:

- `entry = "index.js"` → `/agent/code/index.js`
- `entry = "dist/index.js"` → `/agent/code/dist/index.js`
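To catch a missing entry file before deploying, the resolution rule above can be sketched in a few lines of Node. The helper names `resolveEntry` and `checkEntry` are hypothetical, and the sketch assumes `entry` is a relative POSIX path; it mirrors the documented rule, not ORB's actual implementation.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Default from this page; pass your own value if [build] working_dir differs.
const DEFAULT_WORKING_DIR = '/agent/code';

// Resolve an `entry` value relative to working_dir, as described above.
function resolveEntry(entry, workingDir = DEFAULT_WORKING_DIR) {
  return path.posix.join(workingDir, entry);
}

// Report the resolved path and whether a file exists there.
function checkEntry(entry, workingDir = DEFAULT_WORKING_DIR) {
  const resolved = resolveEntry(entry, workingDir);
  return { path: resolved, exists: fs.existsSync(resolved) };
}

console.log(resolveEntry('index.js'));
console.log(resolveEntry('dist/index.js'));
```

Running `checkEntry` as the last build step (for example via a small script) is one way to make a missing `dist/index.js` fail the build instead of the deploy.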

### Build didn't run

**Symptom:** Deploy succeeds but the agent crashes because dependencies aren't installed.

**Cause:** You skipped the build step. Run it first:

```bash
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/build \
  -H "Authorization: Bearer orb_YOUR_KEY"
```

### Agent running old code after a build

**Symptom:** You pushed new code, triggered a build, but the agent is still running the previous version.

**Fix:** This was a bug and has been fixed. Every `POST /build` call now runs `git fetch --depth=1 && git reset --hard FETCH_HEAD`, so new commits always land. If you hit this on an older build, trigger a new build, restart the computer, and redeploy:

```bash
# Pull latest code
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/build \
  -H "Authorization: Bearer orb_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'

# Restart the computer (kills old process, preserves disk)
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/restart \
  -H "Authorization: Bearer orb_YOUR_KEY"

# Redeploy
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/agents \
  -H "Authorization: Bearer orb_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
```
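If you script this recovery often, the same three calls can be wrapped in Node. This is a minimal sketch: `recoverySteps` and `rebuildAndRedeploy` are hypothetical names, the global `fetch` requires Node 18 or later, and the code assumes each endpoint responds only once its step has finished, which this page does not state.

```javascript
const API_BASE = 'https://api.orbcloud.dev/v1';

// Describe the three requests from the curl commands above, in order.
function recoverySteps(computerId, apiKey) {
  const auth = { Authorization: `Bearer ${apiKey}` };
  const json = { ...auth, 'Content-Type': 'application/json' };
  const base = `${API_BASE}/computers/${computerId}`;
  return [
    { name: 'build', url: `${base}/build`, method: 'POST', headers: json, body: '{}' },
    { name: 'restart', url: `${base}/restart`, method: 'POST', headers: auth },
    { name: 'deploy', url: `${base}/agents`, method: 'POST', headers: json, body: '{}' },
  ];
}

// Run the steps one at a time and stop at the first non-2xx response.
// `fetchImpl` is injectable so the sequence can be exercised without a network.
async function rebuildAndRedeploy(computerId, apiKey, fetchImpl = fetch) {
  for (const { name, url, ...init } of recoverySteps(computerId, apiKey)) {
    const res = await fetchImpl(url, init);
    if (!res.ok) {
      throw new Error(`${name} failed with HTTP ${res.status}: ${await res.text()}`);
    }
  }
}

// Usage (values are placeholders):
// rebuildAndRedeploy('YOUR_COMPUTER_ID', 'orb_YOUR_KEY').catch(console.error);
```

Stopping at the first failure keeps a broken build from being followed by a restart and redeploy of stale code.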

### Agent in Failed state, computer stuck

**Symptom:** Agent shows `"state": "failed"` and redeploying immediately fails again. Computer is unusable.

**Cause:** The previous agent process left orphaned child processes holding the port. Fresh deploy fails because the port is already bound.

**Fix:** Use the restart endpoint. It kills every process in the computer's cgroup (including orphans), resets state to Ready, and preserves all data on disk:

```bash
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/restart \
  -H "Authorization: Bearer orb_YOUR_KEY"

# Then redeploy immediately
curl -X POST https://api.orbcloud.dev/v1/computers/{id}/agents \
  -H "Authorization: Bearer orb_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'
```

You do **not** need to delete and recreate the computer. All installed packages, cloned source, and persistent data survive the restart.

### npm ci / npm install fails

**Symptom:** Build step fails with `getaddrinfo EAI_AGAIN`.

**Cause:** DNS failure. The computer's network namespace can't resolve external hostnames.

**Fix:** Destroy the computer and create a new one.

## Network problems

### Computer networking broken

**Symptom:** Agent can't reach external APIs. `curl` or DNS fails from inside the computer.

**Fix:** Destroy the computer and create a new one. This gives it a fresh network namespace with a clean subnet allocation.

```bash
curl -X DELETE https://api.orbcloud.dev/v1/computers/{id} \
  -H "Authorization: Bearer orb_YOUR_KEY"
```

Then re-create, re-upload config, re-build, re-deploy.
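Before destroying a computer, it can help to confirm from inside the agent that name resolution is what is failing. The sketch below uses Node's built-in `dns` module; the helper names and the choice of hostname are assumptions, and `EAI_AGAIN` and `ENOTFOUND` are the codes Node commonly reports for failed lookups.

```javascript
const dns = require('node:dns');

// Error codes Node commonly reports when a hostname cannot be resolved.
const DNS_ERROR_CODES = new Set(['EAI_AGAIN', 'ENOTFOUND']);

// True when an error object looks like a name-resolution failure.
function isDnsFailure(err) {
  return Boolean(err && DNS_ERROR_CODES.has(err.code));
}

// Try to resolve one hostname. `lookup` is injectable so the check can be
// tested offline; by default it is Node's promise-based dns lookup.
async function checkDns(hostname, lookup = dns.promises.lookup) {
  try {
    const { address } = await lookup(hostname);
    return { ok: true, address };
  } catch (err) {
    return { ok: false, dnsFailure: isDnsFailure(err), code: err.code };
  }
}

// Usage (hostname is only an example):
// checkDns('registry.npmjs.org').then(console.log);
```

Exposing the result through your agent's own HTTP endpoint lets you tell a DNS problem apart from an upstream API outage.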

## Runtime problems

### Missing environment variable at startup

**Symptom:** Agent process exits immediately because a required env var isn't set.

**Cause:** Your code reads an env var (e.g. `ANTHROPIC_API_KEY`) at startup that isn't passed in. See [Environment Variables](config-reference.md#agent-env-environment-variables) in the config reference for how to set them, either as literals in `orb.toml` or via `org_secrets` at deploy time.

**Common mistake:** Using `${VAR}` syntax without passing `VAR` in `org_secrets`. The variable resolves to an empty string and your agent crashes. For API keys you control, use the literal value directly.
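A startup check makes this failure obvious instead of silent. The following is a sketch only; `missingEnv` and `assertEnv` are hypothetical helpers, and empty strings are treated as missing because an unresolved `${VAR}` arrives as an empty string.

```javascript
// Return the names that are unset, empty, or whitespace-only in `env`.
function missingEnv(required, env = process.env) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Throw one error naming every missing variable. Call this once at startup,
// before constructing any client that needs the values.
function assertEnv(required, env = process.env) {
  const missing = missingEnv(required, env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Usage:
// assertEnv(['ANTHROPIC_API_KEY']);
```

Reporting all missing names at once saves a redeploy cycle per variable.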

### Agent sleeps and never wakes

ORB automatically checkpoints idle agents to save memory. They wake on:

1. **LLM response arriving** on the proxied connection the agent opened. Automatic, no setup required.
2. **Inbound HTTP request to the agent's exposed port** via `https://{short-id}.orbcloud.dev/...`. This only fires if your `orb.toml` has `[ports] expose = [...]`. Expect ~1s latency on the first request.
3. **Manual promote** via `POST /v1/computers/{id}/agents/promote` or the dashboard.

If your agent doesn't expose ports (no `[ports] expose` entry), wake-on-request won't trigger. For proactive agents that don't receive HTTP, the LLM-call checkpoint handles most cases: the agent sleeps between LLM calls and wakes when the response arrives.

**ORB's management API endpoints do not wake the agent.** Calls to `/v1/computers/{id}/files`, `/v1/computers/{id}/agents`, `/v1/computers/{id}/metrics`, and `/v1/computers/{id}/terminal` read from the host filesystem and the runtime's in-memory registry directly. They never touch the agent process. You can list files, poll state, or load a dashboard against a sleeping agent without interrupting its sleep. For interactive debugging, use the WebSocket terminal or hit your agent's own HTTP endpoints.

## How to debug

1. **Check agent state:**
```bash
curl https://api.orbcloud.dev/v1/computers/{id} \
  -H "Authorization: Bearer orb_YOUR_KEY"
```

2. **Add a health endpoint to your agent** that returns status, call count, and error state. Hit it via `https://{short-id}.orbcloud.dev/health`.

3. **Log at startup:** print env vars, working directory, and file paths. Check the agent's own logs via your health endpoint.

4. **Test locally first:** run your agent on your machine, verify it works, then deploy to ORB. The sandbox is the same execution environment.

5. **Test via your agent's own HTTP endpoints**, not via shell. Your agent should expose a health check endpoint (e.g. `GET /health`). Test Chrome, Playwright, or any binary by launching it from your agent code, because that's the real execution context.

6. **Use build steps for setup commands** (npm install, pip install, apt-get install). Build steps run with full internet access and a 10-minute timeout each:

```toml
[build]
steps = [
  "npm ci",
  "npx tsc",
]
working_dir = "/agent/code"
```
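Steps 2 and 3 above can be sketched with Node's built-in `http` module. Everything here is illustrative: the `stats` fields, the response shape, and port 8080 are assumptions, and the port must match whatever you list under `[ports] expose`. The startup log prints only whether each variable is set, so secret values stay out of the logs.

```javascript
const http = require('node:http');

// In-memory counters the agent updates as it works (field names are illustrative).
const stats = { startedAt: Date.now(), calls: 0, lastError: null };

// Build the JSON body for GET /health from the current counters.
function healthSnapshot(s, now = Date.now()) {
  return {
    status: s.lastError ? 'degraded' : 'ok',
    uptimeSeconds: Math.floor((now - s.startedAt) / 1000),
    calls: s.calls,
    lastError: s.lastError,
    cwd: process.cwd(),
  };
}

// Serve the snapshot on GET /health and return 404 for anything else.
function createHealthServer(s = stats) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(healthSnapshot(s)));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Log the working directory and which of the named env vars are set.
function logStartup(envNames, env = process.env) {
  console.log('cwd:', process.cwd());
  for (const name of envNames) {
    console.log(`${name}:`, env[name] ? 'set' : 'MISSING');
  }
}

// Usage (port is an assumption):
// logStartup(['ANTHROPIC_API_KEY']);
// createHealthServer().listen(8080);
```

Increment `stats.calls` and set `stats.lastError` from your agent's main loop so the endpoint reflects real activity.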

## Chrome / Playwright

Use Playwright with Chromium. Install via build steps:

```toml
[build]
steps = [
  "apt-get update",
  "apt-get install -y libnss3 libatk-bridge2.0-0t64 libcups2t64 libdrm2 libgbm1 libpango-1.0-0 libcairo2 libasound2t64 libxshmfence1 libxcomposite1 libxrandr2 libxdamage1 libxfixes3 libxext6 libx11-xcb1 libxcb1 libxkbcommon0 libdbus-1-3",
  "npm install playwright",
  "npx playwright install chromium",
]
```

Launch Chrome from your agent code:

```javascript
const { chromium } = require('playwright');
const browser = await chromium.launch({
  headless: true,
  args: ['--no-sandbox', '--disable-gpu', '--disable-dev-shm-usage']
});
```

Chrome sessions (cookies, localStorage, active DOM) survive checkpoint and restore.

## Still stuck?

Email [support@orbcloud.dev](mailto:support@orbcloud.dev) from your registered email address (human or agent; we match you by sender) with your computer ID, the command you ran, and the response you got. Prefer a call? Book one: [cal.com/nidhish](https://cal.com/nidhish).
