docs: add deployment checklist for database consolidation
IMPORTANT: This migration is BLOCKED until Impress 2020 is retired.

Created comprehensive deployment guide documenting:
- Why this migration is blocked (Impress 2020 uses openneo_id directly)
- Two paths forward: retire Impress 2020 (recommended) or coordinated update
- Complete step-by-step deployment checklist for when ready
- Rollback procedures
- Risk assessment and mitigations
- Success criteria and timeline estimates

This ensures we don't accidentally deploy this change before addressing the Impress 2020 dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

parent 9ba94f9f4b
commit f311c92dbb
1 changed file with 287 additions and 0 deletions
docs/database-consolidation-deployment.md
@@ -0,0 +1,287 @@
# Database Consolidation Deployment Guide

This document outlines the plan and checklist for consolidating the `openneo_id` database into the main `openneo_impress` database.

## Current Status: BLOCKED

**This migration cannot be deployed until Impress 2020 is retired.**

## The Problem

While the main DTI Rails app is ready to move to a single-database architecture, **Impress 2020 still directly accesses both databases**:

- `openneo_impress` - For reading item, pet, and outfit data
- `openneo_id` - For user authentication via GraphQL

If we consolidate the databases now, Impress 2020's authentication will break immediately, causing login failures for users accessing DTI through the Impress 2020 GraphQL API.

## Path Forward

There are two options to unblock this migration:

### Option A: Retire Impress 2020 First (Recommended)

1. Complete the migration of remaining Impress 2020 dependencies back to the main Rails app
   - See `docs/impress-2020-dependencies.md` for current status
   - Primary remaining dependencies: GraphQL API for outfit data, image generation service
2. Spin down the Impress 2020 service entirely
3. Execute the database consolidation (steps below)

### Option B: Coordinated Update (Complex)

1. Update Impress 2020 to point to `openneo_impress.auth_users` instead of `openneo_id.users`
2. Deploy both applications simultaneously during a maintenance window
3. Execute the database consolidation

**Recommendation:** Option A is simpler and aligns with our long-term goal of fully consolidating back into the Rails monolith.

---

## Deployment Checklist (When Ready)

⚠️ **DO NOT EXECUTE UNTIL IMPRESS 2020 IS RETIRED**

### Prerequisites

- [ ] Impress 2020 service is spun down and no longer accessing databases
- [ ] All Impress 2020 dependencies have been migrated to main Rails app
- [ ] Database backups are current and tested
- [ ] Maintenance window scheduled (estimate: 30-60 minutes)

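One partial check for the "current and tested" backup prerequisite could be scripted along these lines. This is a minimal sketch, not a substitute for a real test restore: it assumes the backup was written by `mysqldump` with its default comment output, which ends a successful dump with a `-- Dump completed` line. The function name `backup_looks_complete` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the dump file is non-empty and its last
# few lines contain mysqldump's "-- Dump completed" trailer, 1 otherwise.
backup_looks_complete() {
  local file="$1"
  # -s: the file exists and has a size greater than zero
  [ -s "$file" ] || return 1
  # An interrupted dump will be missing the trailer comment.
  tail -n 5 "$file" | grep -q '^-- Dump completed'
}
```

For example, `backup_looks_complete "$BACKUP_FILE" || echo "backup looks incomplete"`, where `$BACKUP_FILE` (an assumed variable) holds the path to the dump.
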
### Phase 1: Deploy Write Lock

**Branch:** `feature/consolidate-auth-database` @ commit `604a8667`

**Purpose:** Prevent writes to AuthUser table while keeping login/logout functional.

**Steps:**

1. Deploy Phase 1 to production
2. Verify:
   - [ ] Existing users can log in
   - [ ] Existing users can log out
   - [ ] Registration shows maintenance message
   - [ ] Settings updates show maintenance message
   - [ ] NeoPass connection shows maintenance message

**Expected Downtime:** None (read-only mode for account changes only)

### Phase 2: Copy Data

**Purpose:** Copy auth data from `openneo_id` to `openneo_impress` while the table is stable.

**Steps:**

1. **Backup openneo_id database:**
   ```bash
   mysqldump -h [host] -u [user] -p openneo_id > openneo_id_backup_$(date +%Y%m%d_%H%M%S).sql
   ```

2. **Verify backup:**
   ```bash
   # Check file size is reasonable
   ls -lh openneo_id_backup_*.sql

   # Spot-check contents
   head -n 50 openneo_id_backup_*.sql
   ```

3. **Run the migration:**
   ```bash
   cd /var/www/impress
   bundle exec rails db:migrate
   ```

4. **Verify data copy:**
   ```sql
   -- Run inside a MySQL session, opened from a shell with:
   --   mysql -h [host] -u [user] -p

   -- Check row counts match
   SELECT COUNT(*) AS openneo_id_count FROM openneo_id.users;
   SELECT COUNT(*) AS auth_users_count FROM openneo_impress.auth_users;

   -- Spot-check the same five records in both tables
   SELECT id, name, email FROM openneo_id.users ORDER BY id LIMIT 5;
   SELECT id, name, email FROM openneo_impress.auth_users ORDER BY id LIMIT 5;

   -- Verify indexes were created
   SHOW INDEX FROM openneo_impress.auth_users;
   ```

5. **Verify results:**
   - [ ] Row counts match exactly
   - [ ] Sample records match (IDs, names, emails)
   - [ ] All 4 indexes created (email, provider+uid, reset_password_token, unlock_token)

**Expected Downtime:** None (still in write-lock mode)

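The row-count check in step 4 can also be scripted so that a mismatch is hard to overlook. The following is a sketch under assumptions: `counts_match` is a hypothetical helper, and the commented-out wiring relies on the `mysql` client's `-N` (skip column names) and `-B` (batch output) options with the same `[host]` and `[user]` placeholders as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compares two row counts and reports the result.
# Returns 0 when they match, 1 otherwise.
counts_match() {
  local source_count="$1" target_count="$2"
  if [ "$source_count" -eq "$target_count" ]; then
    echo "OK: both tables have $source_count rows"
  else
    echo "MISMATCH: source=$source_count target=$target_count" >&2
    return 1
  fi
}

# Example wiring (commented out; needs database access and real credentials):
#   src=$(mysql -h [host] -u [user] -p -N -B -e "SELECT COUNT(*) FROM openneo_id.users")
#   dst=$(mysql -h [host] -u [user] -p -N -B -e "SELECT COUNT(*) FROM openneo_impress.auth_users")
#   counts_match "$src" "$dst" || echo "Do not proceed to Phase 3"
```

Because the write lock from Phase 1 is still in place, the two counts should not drift between the two queries.
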
### Phase 3: Switch to New Table

**Branch:** `feature/consolidate-auth-database` @ commit `2c21269a`

**Purpose:** Point AuthUser at the consolidated table and restore full functionality.

**Steps:**

1. Deploy commit `2c21269a` to production
2. **Immediately test critical paths:**
   - [ ] Login with existing account
   - [ ] Logout
   - [ ] Register new account
   - [ ] Update account settings (email, password)
   - [ ] Connect NeoPass (if available)
   - [ ] Disconnect NeoPass (if available)

3. **Monitor error logs:**
   ```bash
   tail -f /var/www/impress/log/production.log | grep -i error
   ```

4. **Verify database queries are using auth_users:**
   ```bash
   # Check recent queries in logs
   grep "auth_users" /var/www/impress/log/production.log | tail -n 20

   # Should see SELECT/INSERT/UPDATE on auth_users, NOT openneo_id.users
   ```

**Expected Downtime:** Brief (< 1 minute for deployment)

**Rollback Plan:** If critical issues are found, revert to the Phase 1 commit (`604a8667`) and restore `openneo_id` from backup.

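The `grep` in step 4 confirms that `auth_users` appears in the log, but not that `openneo_id` has stopped appearing. A complementary check could look like the sketch below. It assumes SQL statements are actually written to the log (which depends on the configured log level) and that the file passed in contains only lines written after the Phase 3 deploy; `no_old_db_references` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fails if any line in the given log mentions the old
# database. Returns 0 if clean, 1 if references are found, 2 if unreadable.
no_old_db_references() {
  local logfile="$1"
  local hits
  [ -r "$logfile" ] || { echo "Cannot read $logfile" >&2; return 2; }
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps that case from looking like a failure.
  hits=$(grep -c 'openneo_id' "$logfile" || true)
  if [ "$hits" -gt 0 ]; then
    echo "Found $hits line(s) referencing openneo_id in $logfile" >&2
    return 1
  fi
  echo "No openneo_id references in $logfile"
}
```

One way to use it is to copy the recent tail of the production log to a temporary file after exercising the critical paths, then run the helper on that file, so that pre-deploy entries do not cause false alarms.
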
### Phase 4: Documentation Update

**Branch:** `feature/consolidate-auth-database` @ commit `9ba94f9f`

**Purpose:** Update documentation to reflect single-database architecture.

**Steps:**

1. Deploy commit `9ba94f9f` to production
2. Verify no errors

**Expected Downtime:** None

### Phase 5: Database Teardown

**Purpose:** Remove the now-unused `openneo_id` database.

**Steps:**

1. **Wait 7 days** to confirm that no issues surface in production

2. **Final backup:**
   ```bash
   mysqldump -h [host] -u [user] -p openneo_id > openneo_id_final_backup_$(date +%Y%m%d_%H%M%S).sql
   ```

3. **Store backup offsite:**
   - Upload to secure backup storage
   - Keep for at least 90 days

4. **Drop the database:**
   ```sql
   DROP DATABASE openneo_id;
   ```

5. **Remove environment variable:**
   - Delete `DATABASE_URL_OPENNEO_ID` from production environment config
   - Restart app to ensure it doesn't try to connect

6. **Update MySQL users:**
   ```sql
   -- Remove openneo_id privileges from users
   -- (Already done in deploy/setup.yml for new deployments)
   ```

**Expected Downtime:** None

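After step 5, it may be worth confirming that the variable is really gone from the environment the app is launched from. The sketch below is one way to do that; `env_var_is_unset` is a hypothetical helper, and it only inspects the environment of the shell it runs in, which is assumed to match the app's.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the named variable is absent from the
# environment (a variable exported with an empty value still counts as set).
env_var_is_unset() {
  local name="$1"
  # printenv exits non-zero when the variable is not in the environment.
  ! printenv "$name" >/dev/null
}

# Example:
#   env_var_is_unset DATABASE_URL_OPENNEO_ID || echo "still set; check the environment config"
```
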
---

## Rollback Procedures

### If Issues Found After Phase 3

1. **Immediate rollback:**
   ```bash
   # Revert to Phase 1 commit
   git checkout 604a8667
   bundle exec rails db:migrate:down VERSION=20251102064247
   # Deploy
   ```

2. **Restore openneo_id (if needed):**
   ```bash
   mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
   ```

3. **Investigate issues before reattempting**

### If Data Corruption Detected

1. **Immediately restore from backup:**
   ```bash
   # Drop corrupted auth_users table
   mysql -h [host] -u [user] -p -e "DROP TABLE openneo_impress.auth_users;"

   # Restore openneo_id if needed
   mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
   ```

2. **Revert to pre-migration code**
3. **Review migration SQL before reattempting**

---

## Key Risks & Mitigations

| Risk | Impact | Mitigation | Status |
|------|--------|------------|--------|
| Impress 2020 auth breaks | HIGH - Users can't log in via I2020 | Block deployment until I2020 retired | ⚠️ BLOCKING |
| Data copy fails mid-migration | HIGH - Incomplete auth data | Wrapped in transaction, can roll back | ✅ Mitigated |
| Production traffic during copy | MEDIUM - Stale data | Write lock prevents changes | ✅ Mitigated |
| Schema mismatch between DBs | MEDIUM - Migration fails | Migration matches exact schema | ✅ Mitigated |
| Indexes not created | MEDIUM - Slow queries | Verification step checks indexes | ✅ Mitigated |
| Login tracking data loss | LOW - Missing login stats | Acceptable trade-off | ✅ Accepted |

---

## Success Criteria

- [ ] All existing users can log in
- [ ] New user registration works
- [ ] Settings updates work
- [ ] NeoPass connection/disconnection works
- [ ] No errors in production logs
- [ ] Query performance unchanged
- [ ] Database row counts match
- [ ] All auth_users indexes present

---

## Timeline Estimate

**Total time:** 30-60 minutes (after Impress 2020 retired)

- Phase 1 deployment: 5 min
- Phase 2 data copy: 5-10 min (depending on user count)
- Phase 3 deployment + testing: 15-30 min
- Phase 4 deployment: 5 min
- Phase 5 teardown: 7+ days later, 10 min

---

## Questions Before Proceeding

1. **Is Impress 2020 fully retired?** If not, STOP.
2. Do we have recent database backups? (< 24 hours old)
3. Do we have a maintenance window scheduled?
4. Have we announced the maintenance to users?
5. Do we have rollback access ready?

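Question 2 can be answered mechanically for a backup stored as a local file. The sketch below assumes a `find` that supports `-mmin` (GNU and BSD `find` both do, though it is not part of POSIX); `backup_is_fresh` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file exists and was modified within
# the last 24 hours (1440 minutes).
backup_is_fresh() {
  local file="$1"
  [ -f "$file" ] || return 1
  # find prints the path only when its mtime is less than 1440 minutes ago.
  [ -n "$(find "$file" -mmin -1440)" ]
}

# Example:
#   backup_is_fresh "$BACKUP_FILE" || echo "backup is older than 24 hours; take a new one"
```

Here `$BACKUP_FILE` is an assumed variable holding the dump path. The check looks only at the file's modification time, so it says nothing about whether the dump itself is complete.
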
---

**Last Updated:** November 2025
**Status:** Blocked on Impress 2020 retirement
**Branch:** `feature/consolidate-auth-database`