docs: add deployment checklist for database consolidation

IMPORTANT: This migration is BLOCKED until Impress 2020 is retired.

Created comprehensive deployment guide documenting:
- Why this migration is blocked (Impress 2020 uses openneo_id directly)
- Two paths forward: retire Impress 2020 (recommended) or coordinated update
- Complete step-by-step deployment checklist for when ready
- Rollback procedures
- Risk assessment and mitigations
- Success criteria and timeline estimates

This ensures we don't accidentally deploy this change before addressing
the Impress 2020 dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Emi Matchu 2025-11-02 07:07:57 +00:00
parent 9ba94f9f4b
commit f311c92dbb

@ -0,0 +1,287 @@
# Database Consolidation Deployment Guide
This document outlines the plan and checklist for consolidating the `openneo_id` database into the main `openneo_impress` database.
## Current Status: BLOCKED
**This migration cannot be deployed until Impress 2020 is retired.**
## The Problem
While the main DTI Rails app is ready to move to a single-database architecture, **Impress 2020 still directly accesses both databases**:
- `openneo_impress` - For reading item, pet, and outfit data
- `openneo_id` - For user authentication via GraphQL
If we consolidate the databases now, Impress 2020's authentication will break immediately, causing login failures for users accessing DTI through the Impress 2020 GraphQL API.
## Path Forward
There are two options to unblock this migration:
### Option A: Retire Impress 2020 First (Recommended)
1. Complete the migration of remaining Impress 2020 dependencies back to the main Rails app
   - See `docs/impress-2020-dependencies.md` for current status
   - Primary remaining dependencies: GraphQL API for outfit data, image generation service
2. Spin down the Impress 2020 service entirely
3. Execute the database consolidation (steps below)
### Option B: Coordinated Update (Complex)
1. Update Impress 2020 to point to `openneo_impress.auth_users` instead of `openneo_id.users`
2. Deploy both applications simultaneously during a maintenance window
3. Execute the database consolidation
**Recommendation:** Option A is simpler and aligns with our long-term goal of fully consolidating back into the Rails monolith.
---
## Deployment Checklist (When Ready)
⚠️ **DO NOT EXECUTE UNTIL IMPRESS 2020 IS RETIRED**
### Prerequisites
- [ ] Impress 2020 service is spun down and no longer accessing databases
- [ ] All Impress 2020 dependencies have been migrated to main Rails app
- [ ] Database backups are current and tested
- [ ] Maintenance window scheduled (estimate: 30-60 minutes)
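The "current and tested" prerequisite can be partly automated. The sketch below checks that a dump file exists, is non-empty, was written within the last 24 hours, and ends with the `-- Dump completed` comment that `mysqldump` appends by default. The helper name `backup_is_usable` is hypothetical and not part of the repo, and a passing check is only a sanity test, not a substitute for restoring the dump somewhere.

```bash
# Sketch only: `backup_is_usable` is a hypothetical helper, not part of the repo.
# Succeeds if the dump exists, is non-empty, is newer than N hours (default 24),
# and ends with mysqldump's default completion marker.
backup_is_usable() {
  local file="$1" max_age_hours="${2:-24}"

  # Must exist and be non-empty
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }

  # find prints the path only if it was modified less than N minutes ago
  if [ -z "$(find "$file" -mmin "-$((max_age_hours * 60))" -print)" ]; then
    echo "older than ${max_age_hours}h: $file" >&2
    return 1
  fi

  # A dump that stopped early will not end with this comment
  # (assumes the dump was not made with --skip-comments or --compact)
  tail -n 5 "$file" | grep '^-- Dump completed' > /dev/null \
    || { echo "no completion marker, dump may be truncated: $file" >&2; return 1; }
}
```

For example, `backup_is_usable openneo_id_backup_[timestamp].sql 24 && echo "backup looks usable"`.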
### Phase 1: Deploy Write Lock
**Branch:** `feature/consolidate-auth-database` @ commit `604a8667`
**Purpose:** Prevent writes to AuthUser table while keeping login/logout functional.
**Steps:**
1. Deploy Phase 1 to production
2. Verify:
   - [ ] Existing users can log in
   - [ ] Existing users can log out
   - [ ] Registration shows maintenance message
   - [ ] Settings updates show maintenance message
   - [ ] NeoPass connection shows maintenance message
**Expected Downtime:** None (read-only mode for account changes only)
### Phase 2: Copy Data
**Purpose:** Copy auth data from `openneo_id` to `openneo_impress` while the table is stable.
**Steps:**
1. **Backup openneo_id database:**
```bash
mysqldump -h [host] -u [user] -p openneo_id > openneo_id_backup_$(date +%Y%m%d_%H%M%S).sql
```
2. **Verify backup:**
```bash
# Check file size is reasonable
ls -lh openneo_id_backup_*.sql
# Spot-check contents
head -n 50 openneo_id_backup_*.sql
```
3. **Run the migration:**
```bash
cd /var/www/impress
bundle exec rails db:migrate
```
4. **Verify data copy:**
```sql
-- Run these statements inside a MySQL session opened from a shell with:
--   mysql -h [host] -u [user] -p
-- Check row counts match
SELECT COUNT(*) AS openneo_id_count FROM openneo_id.users;
SELECT COUNT(*) AS auth_users_count FROM openneo_impress.auth_users;
-- Spot-check the same records in both tables (ORDER BY makes the two result sets comparable)
SELECT id, name, email FROM openneo_id.users ORDER BY id LIMIT 5;
SELECT id, name, email FROM openneo_impress.auth_users ORDER BY id LIMIT 5;
-- Verify indexes were created
SHOW INDEX FROM openneo_impress.auth_users;
```
5. **Verify results:**
   - [ ] Row counts match exactly
   - [ ] Sample records match (IDs, names, emails)
   - [ ] All 4 indexes created (email, provider+uid, reset_password_token, unlock_token)
**Expected Downtime:** None (still in write-lock mode)
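The row-count comparison in step 4 can also be scripted so the result is a pass/fail exit code rather than two numbers to compare by eye. This is a minimal sketch: the function names are hypothetical, and it assumes the `mysql` client can authenticate non-interactively (for example through `~/.my.cnf`), so host and user options are left out.

```bash
# Sketch only: function names are hypothetical; connection options are omitted
# on the assumption that the mysql client is already configured (e.g. ~/.my.cnf).
row_count() {
  # -N drops the column header, -B switches to plain batch output
  mysql -N -B -e "SELECT COUNT(*) FROM $1"
}

counts_match() {
  local src dst
  # Return 2 if either query fails, so a connection error is not mistaken for a mismatch
  src="$(row_count openneo_id.users)" || return 2
  dst="$(row_count openneo_impress.auth_users)" || return 2
  echo "openneo_id.users=$src openneo_impress.auth_users=$dst"
  # Succeed only when both counts are present and identical
  [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

Running `counts_match && echo "row counts match"` prints both counts either way. The sample-record and index checks still need to be done by hand.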
### Phase 3: Switch to New Table
**Branch:** `feature/consolidate-auth-database` @ commit `2c21269a`
**Purpose:** Point AuthUser at consolidated table, restore full functionality.
**Steps:**
1. Deploy the Phase 3 commit (`2c21269a`) to production
2. **Immediately test critical paths:**
   - [ ] Login with existing account
   - [ ] Logout
   - [ ] Register new account
   - [ ] Update account settings (email, password)
   - [ ] Connect NeoPass (if available)
   - [ ] Disconnect NeoPass (if available)
3. **Monitor error logs:**
```bash
tail -f /var/www/impress/log/production.log | grep -i error
```
4. **Verify database queries are using auth_users:**
```bash
# Check recent queries in logs
grep "auth_users" /var/www/impress/log/production.log | tail -n 20
# Should see SELECT/INSERT/UPDATE on auth_users, NOT openneo_id.users
```
**Expected Downtime:** Brief (< 1 minute for deployment)
**Rollback Plan:** If critical issues are found, revert to the Phase 1 commit (`604a8667`) and restore `openneo_id` from the backup.
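The log check in step 4 looks for the new table, but the more useful signal is the absence of the old one. The sketch below fails if recent log lines still mention `openneo_id`. The function name is hypothetical, the pattern simply follows the comment in step 4, and it assumes SQL statements are visible in the log at the current log level, which may not hold for a production Rails configuration. Pick a line count that covers only lines written after the deploy.

```bash
# Sketch only: `check_cutover` is a hypothetical helper. Assumes SQL statements
# appear in the log and that old-database queries contain the text "openneo_id".
check_cutover() {
  local log="$1" lines="${2:-2000}"

  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }

  # Only inspect the most recent lines; older ones predate the deploy
  if tail -n "$lines" "$log" | grep 'openneo_id' > /dev/null; then
    echo "recent log lines still reference openneo_id" >&2
    return 1
  fi
  echo "no openneo_id references in the last $lines log lines"
}
```

For example, `check_cutover /var/www/impress/log/production.log 500`.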
### Phase 4: Documentation Update
**Branch:** `feature/consolidate-auth-database` @ commit `9ba94f9f`
**Purpose:** Update documentation to reflect single-database architecture.
**Steps:**
1. Deploy the Phase 4 commit (`9ba94f9f`) to production
2. Verify no errors
**Expected Downtime:** None
### Phase 5: Database Teardown
**Purpose:** Remove the now-unused `openneo_id` database.
**Steps:**
1. **Wait 7 days** to confirm that no issues surface in production
2. **Final backup:**
```bash
mysqldump -h [host] -u [user] -p openneo_id > openneo_id_final_backup_$(date +%Y%m%d_%H%M%S).sql
```
3. **Store backup offsite:**
   - Upload to secure backup storage
   - Keep for at least 90 days
4. **Drop the database:**
```sql
DROP DATABASE openneo_id;
```
5. **Remove environment variable:**
   - Delete `DATABASE_URL_OPENNEO_ID` from production environment config
   - Restart app to ensure it doesn't try to connect
6. **Update MySQL users:**
```sql
-- Remove openneo_id privileges from users
-- (Already done in deploy/setup.yml for new deployments)
```
**Expected Downtime:** None
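Because the final backup becomes the only copy of `openneo_id` once the database is dropped, it is worth recording a checksum before the upload in step 3 and verifying a downloaded copy before running step 4. This is a minimal sketch, assuming GNU coreutils `sha256sum` is available; the function names are hypothetical.

```bash
# Sketch only: function names are hypothetical; assumes GNU coreutils sha256sum.
write_checksum() {
  # Run from the dump's directory so the .sha256 file stores a relative name
  # and still verifies after both files are copied elsewhere
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_checksum() {
  # --quiet suppresses the per-file OK line; mismatches are still reported
  # and produce a non-zero exit status
  ( cd "$(dirname "$1")" && sha256sum -c --quiet "$(basename "$1").sha256" )
}
```

Upload the `.sha256` file alongside the dump, then run `verify_checksum` against a copy fetched back from backup storage before dropping the database.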
---
## Rollback Procedures
### If Issues Found After Phase 3
1. **Immediate rollback:**
```bash
# Revert to Phase 1 commit
git checkout 604a8667
bundle exec rails db:migrate:down VERSION=20251102064247
# Deploy
```
2. **Restore openneo_id (if needed):**
```bash
mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
```
3. **Investigate issues before reattempting**
### If Data Corruption Detected
1. **Immediately restore from backup:**
```bash
# Drop corrupted auth_users table
mysql -h [host] -u [user] -p -e "DROP TABLE openneo_impress.auth_users;"
# Restore openneo_id if needed
mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
```
2. **Revert to pre-migration code**
3. **Review migration SQL before reattempting**
---
## Key Risks & Mitigations
| Risk | Impact | Mitigation | Status |
|------|--------|------------|--------|
| Impress 2020 auth breaks | HIGH - Users can't log in via I2020 | Block deployment until I2020 retired | ⚠️ BLOCKING |
| Data copy fails mid-migration | HIGH - Incomplete auth data | Wrapped in transaction, can roll back | ✅ Mitigated |
| Production traffic during copy | MEDIUM - Stale data | Write lock prevents changes | ✅ Mitigated |
| Schema mismatch between DBs | MEDIUM - Migration fails | Migration matches exact schema | ✅ Mitigated |
| Indexes not created | MEDIUM - Slow queries | Verification step checks indexes | ✅ Mitigated |
| Login tracking data loss | LOW - Missing login stats | Acceptable trade-off | ✅ Accepted |
---
## Success Criteria
- [ ] All existing users can log in
- [ ] New user registration works
- [ ] Settings updates work
- [ ] NeoPass connection/disconnection works
- [ ] No errors in production logs
- [ ] Query performance unchanged
- [ ] Database row counts match
- [ ] All auth_users indexes present
---
## Timeline Estimate
**Total time:** 30-60 minutes (after Impress 2020 is retired)
- Phase 1 deployment: 5 min
- Phase 2 data copy: 5-10 min (depending on user count)
- Phase 3 deployment + testing: 15-30 min
- Phase 4 deployment: 5 min
- Phase 5 teardown: 7+ days later, 10 min
---
## Questions Before Proceeding
1. **Is Impress 2020 fully retired?** If not, STOP.
2. Do we have recent database backups? (< 24 hours old)
3. Do we have a maintenance window scheduled?
4. Have we announced the maintenance to users?
5. Do we have rollback access ready?
---
**Last Updated:** November 2025
**Status:** Blocked on Impress 2020 retirement
**Branch:** `feature/consolidate-auth-database`