diff --git a/docs/database-consolidation-deployment.md b/docs/database-consolidation-deployment.md
new file mode 100644
index 00000000..71b7bf66
--- /dev/null
+++ b/docs/database-consolidation-deployment.md
@@ -0,0 +1,287 @@
+# Database Consolidation Deployment Guide
+
+This document outlines the plan and checklist for consolidating the `openneo_id` database into the main `openneo_impress` database.
+
+## Current Status: BLOCKED
+
+**This migration cannot be deployed until Impress 2020 is retired.**
+
+## The Problem
+
+While the main DTI Rails app is ready to move to a single-database architecture, **Impress 2020 still directly accesses both databases**:
+
+- `openneo_impress` - For reading item, pet, and outfit data
+- `openneo_id` - For user authentication via GraphQL
+
+If we consolidate the databases now, Impress 2020's authentication will break immediately, causing login failures for users accessing DTI through the Impress 2020 GraphQL API.
+
+## Path Forward
+
+There are two options to unblock this migration:
+
+### Option A: Retire Impress 2020 First (Recommended)
+
+1. Complete the migration of remaining Impress 2020 dependencies back to the main Rails app
+   - See `docs/impress-2020-dependencies.md` for current status
+   - Primary remaining dependencies: GraphQL API for outfit data, image generation service
+2. Spin down the Impress 2020 service entirely
+3. Execute the database consolidation (steps below)
+
+### Option B: Coordinated Update (Complex)
+
+1. Update Impress 2020 to point to `openneo_impress.auth_users` instead of `openneo_id.users`
+2. Deploy both applications simultaneously during a maintenance window
+3. Execute the database consolidation
+
+**Recommendation:** Option A is simpler and aligns with our long-term goal of fully consolidating back into the Rails monolith.
+
+---
+
+## Deployment Checklist (When Ready)
+
+⚠️ **DO NOT EXECUTE UNTIL IMPRESS 2020 IS RETIRED**
+
+### Prerequisites
+
+- [ ] Impress 2020 service is spun down and no longer accessing databases
+- [ ] All Impress 2020 dependencies have been migrated to the main Rails app
+- [ ] Database backups are current and tested
+- [ ] Maintenance window scheduled (estimate: 30-60 minutes)
+
+### Phase 1: Deploy Write Lock
+
+**Branch:** `feature/consolidate-auth-database` @ commit `604a8667`
+
+**Purpose:** Prevent writes to the AuthUser table while keeping login/logout functional.
+
+**Steps:**
+
+1. Deploy Phase 1 to production
+2. Verify:
+   - [ ] Existing users can log in
+   - [ ] Existing users can log out
+   - [ ] Registration shows maintenance message
+   - [ ] Settings updates show maintenance message
+   - [ ] NeoPass connection shows maintenance message
+
+**Expected Downtime:** None (read-only mode for account changes only)
+
+### Phase 2: Copy Data
+
+**Purpose:** Copy auth data from `openneo_id` to `openneo_impress` while the table is stable.
+
+**Steps:**
+
+1. **Back up openneo_id database:**
+   ```bash
+   mysqldump -h [host] -u [user] -p openneo_id > openneo_id_backup_$(date +%Y%m%d_%H%M%S).sql
+   ```
+
+2. **Verify backup:**
+   ```bash
+   # Check file size is reasonable
+   ls -lh openneo_id_backup_*.sql
+
+   # Spot-check contents
+   head -n 50 openneo_id_backup_*.sql
+   ```
+
+3. **Run the migration:**
+   ```bash
+   cd /var/www/impress
+   bundle exec rails db:migrate
+   ```
+
+4. **Verify data copy:**
+   ```sql
+   -- Connect to MySQL first (run this from a shell, not inside the SQL session):
+   --   mysql -h [host] -u [user] -p
+
+   -- Check row counts match
+   SELECT COUNT(*) AS openneo_id_count FROM openneo_id.users;
+   SELECT COUNT(*) AS auth_users_count FROM openneo_impress.auth_users;
+
+   -- Spot-check the same five records in both tables (ordered so the rows are comparable)
+   SELECT id, name, email FROM openneo_id.users ORDER BY id LIMIT 5;
+   SELECT id, name, email FROM openneo_impress.auth_users ORDER BY id LIMIT 5;
+
+   -- Verify indexes were created
+   SHOW INDEX FROM openneo_impress.auth_users;
+   ```
+
+5.
**Verify results:**
+   - [ ] Row counts match exactly
+   - [ ] Sample records match (IDs, names, emails)
+   - [ ] All 4 indexes created (email, provider+uid, reset_password_token, unlock_token)
+
+**Expected Downtime:** None (still in write-lock mode)
+
+### Phase 3: Switch to New Table
+
+**Branch:** `feature/consolidate-auth-database` @ commit `2c21269a`
+
+**Purpose:** Point AuthUser at the consolidated table and restore full functionality.
+
+**Steps:**
+
+1. Deploy this phase's commit (`2c21269a`) to production
+2. **Immediately test critical paths:**
+   - [ ] Log in with existing account
+   - [ ] Log out
+   - [ ] Register new account
+   - [ ] Update account settings (email, password)
+   - [ ] Connect NeoPass (if available)
+   - [ ] Disconnect NeoPass (if available)
+
+3. **Monitor error logs:**
+   ```bash
+   tail -f /var/www/impress/log/production.log | grep -i error
+   ```
+
+4. **Verify database queries are using auth_users:**
+   ```bash
+   # Check recent queries in logs
+   grep "auth_users" /var/www/impress/log/production.log | tail -n 20
+
+   # Should see SELECT/INSERT/UPDATE on auth_users, NOT openneo_id.users
+   ```
+
+**Expected Downtime:** Brief (< 1 minute for deployment)
+
+**Rollback Plan:** If critical issues are found, revert to the Phase 1 commit (`604a8667`) and restore openneo_id from backup.
+
+### Phase 4: Documentation Update
+
+**Branch:** `feature/consolidate-auth-database` @ commit `9ba94f9f`
+
+**Purpose:** Update documentation to reflect single-database architecture.
+
+**Steps:**
+
+1. Deploy this phase's commit (`9ba94f9f`) to production
+2. Verify no errors
+
+**Expected Downtime:** None
+
+### Phase 5: Database Teardown
+
+**Purpose:** Remove the now-unused `openneo_id` database.
+
+**Steps:**
+
+1. **Wait 7 days** to confirm no issues surface in production
+
+2. **Final backup:**
+   ```bash
+   mysqldump -h [host] -u [user] -p openneo_id > openneo_id_final_backup_$(date +%Y%m%d_%H%M%S).sql
+   ```
+
+3. **Store backup offsite:**
+   - Upload to secure backup storage
+   - Keep for at least 90 days
+
+4.
**Drop the database:**
+   ```sql
+   DROP DATABASE openneo_id;
+   ```
+
+5. **Remove environment variable:**
+   - Delete `DATABASE_URL_OPENNEO_ID` from production environment config
+   - Restart app to ensure it doesn't try to connect
+
+6. **Update MySQL users:**
+   ```sql
+   -- Remove openneo_id privileges from users
+   -- (Already done in deploy/setup.yml for new deployments)
+   ```
+
+**Expected Downtime:** None
+
+---
+
+## Rollback Procedures
+
+### If Issues Found After Phase 3
+
+1. **Immediate rollback:**
+   ```bash
+   # Revert to Phase 1 commit
+   git checkout 604a8667
+   bundle exec rails db:migrate:down VERSION=20251102064247
+   # Deploy
+   ```
+
+2. **Restore openneo_id (if needed):**
+   ```bash
+   mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
+   ```
+
+3. **Investigate issues before reattempting**
+
+### If Data Corruption Detected
+
+1. **Immediately restore from backup:**
+   ```bash
+   # Drop corrupted auth_users table
+   mysql -h [host] -u [user] -p -e "DROP TABLE openneo_impress.auth_users;"
+
+   # Restore openneo_id if needed
+   mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
+   ```
+
+2. **Revert to pre-migration code**
+3.
**Review migration SQL before reattempting**
+
+---
+
+## Key Risks & Mitigations
+
+| Risk | Impact | Mitigation | Status |
+|------|--------|------------|--------|
+| Impress 2020 auth breaks | HIGH - Users can't log in via I2020 | Block deployment until I2020 retired | ⚠️ BLOCKING |
+| Data copy fails mid-migration | HIGH - Incomplete auth data | Wrapped in transaction, can rollback | ✅ Mitigated |
+| Production traffic during copy | MEDIUM - Stale data | Write lock prevents changes | ✅ Mitigated |
+| Schema mismatch between DBs | MEDIUM - Migration fails | Migration matches exact schema | ✅ Mitigated |
+| Indexes not created | MEDIUM - Slow queries | Verification step checks indexes | ✅ Mitigated |
+| Login tracking data loss | LOW - Missing login stats | Acceptable trade-off | ✅ Accepted |
+
+---
+
+## Success Criteria
+
+- [ ] All existing users can log in
+- [ ] New user registration works
+- [ ] Settings updates work
+- [ ] NeoPass connection/disconnection works
+- [ ] No errors in production logs
+- [ ] Query performance unchanged
+- [ ] Database row counts match
+- [ ] All auth_users indexes present
+
+---
+
+## Timeline Estimate
+
+**Total time:** 30-60 minutes (after Impress 2020 retired)
+
+- Phase 1 deployment: 5 min
+- Phase 2 data copy: 5-10 min (depending on user count)
+- Phase 3 deployment + testing: 15-30 min
+- Phase 4 deployment: 5 min
+- Phase 5 teardown: 7+ days later, 10 min
+
+---
+
+## Questions Before Proceeding
+
+1. **Is Impress 2020 fully retired?** If not, STOP.
+2. Do we have recent database backups? (< 24 hours old)
+3. Do we have a maintenance window scheduled?
+4. Have we announced the maintenance to users?
+5. Do we have rollback access ready?
+
+---
+
+**Last Updated:** November 2025
+**Status:** Blocked on Impress 2020 retirement
+**Branch:** `feature/consolidate-auth-database`