Operations Runbook
Overview
This runbook describes the procedures for operating and maintaining the CyberAi control plane: daily checks, common tasks, incident response, maintenance, and recovery.
Daily Operations
Health Check
# Run audit to verify system health
./tools/audit/audit.sh
# Check workflow status
gh workflow list
# Verify site deployment
curl -I https://cyberai.network
Monitoring
- Check GitHub Actions for failed workflows
- Review contract validation results
- Monitor site uptime and performance
- Check for security alerts in the Security tab
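The uptime item above could be scripted along these lines. This is a minimal sketch, not the project's own tooling: the function names (`classify_status`, `http_status`, `check_site`) are hypothetical, and treating any 2xx or 3xx response as "up" is an assumption.

```shell
# Map an HTTP status code to a coarse health verdict.
# Assumption: 2xx and 3xx count as "up"; anything else (including 000) is "down".
classify_status() {
  case "$1" in
    2??|3??) echo "up" ;;
    *)       echo "down" ;;
  esac
}

# Print only the HTTP status code for a URL.
# curl reports 000 when no response was received (DNS failure, timeout, ...).
http_status() {
  curl -s -o /dev/null -I -w '%{http_code}' --max-time 10 "$1"
}

# Print "<url> <code> <verdict>" for the given URL (defaults to the site from this runbook).
check_site() {
  url="${1:-https://cyberai.network}"
  code="$(http_status "$url")"
  echo "$url $code $(classify_status "$code")"
}
```

Running `check_site` with no arguments probes https://cyberai.network and prints one line with the URL, the status code, and the verdict.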
Common Tasks
Adding a New Contract
- Create contract JSON in the appropriate directory (contracts/agents/ or contracts/repos/)
- Validate against schema:
ajv validate -s contracts/contract.schema.json -d "contracts/agents/your-contract.json"
- Create PR with the new contract
- Wait for automated validation to pass
- Merge after review
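Before opening the PR, the validation step above can be repeated over every contract rather than a single file. The sketch below wraps the same `ajv validate` command in a loop. The function names and the `VALIDATE_CMD` override are hypothetical and exist only for illustration.

```shell
# Schema path as given in this runbook; override via the environment if needed.
SCHEMA="${SCHEMA:-contracts/contract.schema.json}"

# Validate one contract file with the ajv command from the steps above.
validate_one() {
  ajv validate -s "$SCHEMA" -d "$1"
}

# Validate every *.json file in the given directories.
# VALIDATE_CMD is a hypothetical hook naming a single command to run instead
# of validate_one (useful for a dry run on a machine without ajv).
# Returns non-zero if any file fails.
validate_all() {
  failed=0
  for dir in "$@"; do
    for file in "$dir"/*.json; do
      [ -e "$file" ] || continue   # directory is empty or missing
      if "${VALIDATE_CMD:-validate_one}" "$file"; then
        echo "ok   $file"
      else
        echo "FAIL $file"
        failed=1
      fi
    done
  done
  return "$failed"
}
```

A typical call from the repository root would be `validate_all contracts/agents contracts/repos`.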
Updating Documentation
- Edit files in site/src/pages/docs/
- Test locally:
cd site && npm run dev
- Build:
npm run build
- Commit and push to trigger deployment
Deploying Site Updates
# Build the site
cd site
npm run build
# Commit and push
git add .
git commit -m "Update site"
git push origin main
# Deployment happens automatically via GitHub Actions
Incident Response
Site Down
- Check GitHub Pages status: githubstatus.com
- Verify DNS records for cyberai.network
- Check latest workflow runs for deployment failures
- Review recent commits for breaking changes
- Rollback if necessary:
git revert <commit-sha>
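The rollback step can be wrapped so that it never stops at an editor prompt during an incident. In this sketch `rollback_commit` is a hypothetical helper name; `--no-edit` is a standard `git revert` option that keeps the default revert message. Pushing is deliberately left out so the revert can be inspected first.

```shell
# Revert a single commit on the current branch without opening an editor.
# The caller still has to push (git push origin main) to trigger redeployment.
rollback_commit() {
  sha="$1"
  if [ -z "$sha" ]; then
    echo "usage: rollback_commit <commit-sha>" >&2
    return 2
  fi
  git revert --no-edit "$sha"
}
```

After `rollback_commit <commit-sha>`, review the result with `git log -1` and then push to main.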
Contract Validation Failures
- Review workflow logs in GitHub Actions
- Identify failing contracts
- Validate manually:
./tools/audit/audit.sh
- Fix schema violations or contract issues
- Re-run validation
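The first two steps (reviewing logs and finding what failed) can be done from a terminal with the GitHub CLI. This sketch assumes `gh` is installed and authenticated for the repository. The helper names are hypothetical; `gh run list` and `gh run view --log-failed` are the GitHub CLI commands being wrapped.

```shell
# Print the ID of the most recent failed workflow run ("null" or empty if none).
latest_failed_run() {
  gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId'
}

# Show only the log output of the failed steps from the most recent failed run.
show_failed_logs() {
  run_id="$(latest_failed_run)"
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no failed runs found"
    return 0
  fi
  gh run view "$run_id" --log-failed
}
```

The failing contract files usually appear by name in that log output, which narrows down what to re-validate locally.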
Security Alert
- Review alert details in GitHub Security tab
- Assess severity and impact
- Create incident ticket
- Apply fix or mitigation
- Verify fix with audit tool
- Document incident and resolution
Maintenance Windows
Scheduled Maintenance
For planned maintenance:
- Announce maintenance window in advance
- Create maintenance branch
- Perform updates and testing
- Merge to main during maintenance window
- Verify deployment
- Announce completion
Emergency Maintenance
For critical issues requiring immediate attention:
- Assess impact and urgency
- Create hotfix branch from main
- Apply minimal fix
- Fast-track review and merge
- Monitor post-deployment
- Schedule follow-up for complete fix
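Both maintenance procedures start by creating a branch. The runbook does not prescribe a naming scheme, so the `maintenance/<slug>` and `hotfix/<slug>` convention in this sketch is an assumption, as is the helper name `branch_name`.

```shell
# Build a branch name of the form <kind>/<slug> from a free-text description.
# The slug is lowercased, with every run of non-alphanumeric characters
# collapsed into a single hyphen and leading/trailing hyphens removed.
branch_name() {
  kind="$1"
  slug="$(printf '%s' "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//')"
  printf '%s/%s\n' "$kind" "$slug"
}
```

For an emergency fix this could be used as `git checkout -b "$(branch_name hotfix "Fix broken DNS")" main`, which branches from main as the procedure requires.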
Backup and Recovery
Data Backup
All data is version-controlled in Git:
- Contracts are backed up in the repository
- Site content is version-controlled
- Configuration files are tracked in Git
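Git only protects what has been committed, so it can be worth confirming that nothing under the backed-up paths is still untracked or modified. A minimal sketch, with `uncommitted_changes` as a hypothetical helper name:

```shell
# Print any untracked or modified files under the given paths.
# Empty output means everything under those paths is committed.
uncommitted_changes() {
  git status --porcelain -- "$@"
}
```

For example, `uncommitted_changes contracts/ site/` run from the repository root lists anything that would be lost if the working copy disappeared.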
Recovery Procedures
# Restore from a specific commit
git checkout <commit-sha> -- contracts/
# Rebuild site
cd site && npm run build
# Commit the restored files, then redeploy
git commit -m "Restore contracts from <commit-sha>"
git push origin main
Escalation
Escalate issues when:
- Security vulnerability affects production
- Site is down for more than 15 minutes
- Data integrity is compromised
- Unable to resolve within SLA
Contact: See SECURITY.md for escalation contacts.
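The 15-minute downtime criterion is easy to check mechanically once the outage start time is recorded. The sketch below is illustrative only; the function and variable names are hypothetical, and timestamps are assumed to be Unix epoch seconds.

```shell
# Threshold from this runbook: escalate when the site is down for more than 15 minutes.
ESCALATE_AFTER_SECONDS=$((15 * 60))

# Succeeds (exit status 0) when the outage has lasted longer than the threshold.
# Arguments: outage start time, and optionally the current time (defaults to now).
should_escalate() {
  down_since="$1"
  now="${2:-$(date +%s)}"
  [ $((now - down_since)) -gt "$ESCALATE_AFTER_SECONDS" ]
}
```

Usage: `should_escalate "$outage_start" && echo "Escalate: see SECURITY.md"`, where `outage_start` holds the epoch time at which the outage was first observed.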