impress/docs/database-consolidation-deployment.md
Emi Matchu f311c92dbb docs: add deployment checklist for database consolidation
IMPORTANT: This migration is BLOCKED until Impress 2020 is retired.

Created comprehensive deployment guide documenting:
- Why this migration is blocked (Impress 2020 uses openneo_id directly)
- Two paths forward: retire Impress 2020 (recommended) or coordinated update
- Complete step-by-step deployment checklist for when ready
- Rollback procedures
- Risk assessment and mitigations
- Success criteria and timeline estimates

This ensures we don't accidentally deploy this change before addressing
the Impress 2020 dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-02 07:07:57 +00:00

# Database Consolidation Deployment Guide
This document outlines the plan and checklist for consolidating the `openneo_id` database into the main `openneo_impress` database.
## Current Status: BLOCKED
**This migration cannot be deployed until Impress 2020 is retired.**
## The Problem
While the main DTI Rails app is ready to move to a single-database architecture, **Impress 2020 still directly accesses both databases**:
- `openneo_impress` - For reading item, pet, and outfit data
- `openneo_id` - For user authentication via GraphQL

If we consolidate the databases now, Impress 2020's authentication will break immediately, causing login failures for users accessing DTI through the Impress 2020 GraphQL API.
## Path Forward
There are two options to unblock this migration:
### Option A: Retire Impress 2020 First (Recommended)
1. Complete the migration of remaining Impress 2020 dependencies back to the main Rails app
   - See `docs/impress-2020-dependencies.md` for current status
   - Primary remaining dependencies: GraphQL API for outfit data, image generation service
2. Spin down the Impress 2020 service entirely
3. Execute the database consolidation (steps below)
### Option B: Coordinated Update (Complex)
1. Update Impress 2020 to point to `openneo_impress.auth_users` instead of `openneo_id.users`
2. Deploy both applications simultaneously during a maintenance window
3. Execute the database consolidation

**Recommendation:** Option A is simpler and aligns with our long-term goal of fully consolidating back into the Rails monolith.

---
## Deployment Checklist (When Ready)
⚠️ **DO NOT EXECUTE UNTIL IMPRESS 2020 IS RETIRED**
### Prerequisites
- [ ] Impress 2020 service is spun down and no longer accessing databases
- [ ] All Impress 2020 dependencies have been migrated to main Rails app
- [ ] Database backups are current and tested
- [ ] Maintenance window scheduled (estimate: 30-60 minutes)
### Phase 1: Deploy Write Lock
**Branch:** `feature/consolidate-auth-database` @ commit `604a8667`

**Purpose:** Prevent writes to the AuthUser table while keeping login/logout functional.

**Steps:**

1. Deploy commit `604a8667` to production
2. Verify:
   - [ ] Existing users can log in
   - [ ] Existing users can log out
   - [ ] Registration shows maintenance message
   - [ ] Settings updates show maintenance message
   - [ ] NeoPass connection shows maintenance message

**Expected Downtime:** None (read-only mode for account changes only)
### Phase 2: Copy Data
**Purpose:** Copy auth data from `openneo_id` to `openneo_impress` while the write lock keeps the table stable.

**Steps:**
1. **Back up the openneo_id database:**
```bash
mysqldump -h [host] -u [user] -p openneo_id > openneo_id_backup_$(date +%Y%m%d_%H%M%S).sql
```
2. **Verify backup:**
```bash
# Check file size is reasonable
ls -lh openneo_id_backup_*.sql
# Spot-check contents
head -n 50 openneo_id_backup_*.sql
```
3. **Run the migration:**
```bash
cd /var/www/impress
bundle exec rails db:migrate
```
4. **Verify data copy:**
```sql
-- Connect first from a shell: mysql -h [host] -u [user] -p
-- Check row counts match
SELECT COUNT(*) AS openneo_id_count FROM openneo_id.users;
SELECT COUNT(*) AS auth_users_count FROM openneo_impress.auth_users;
-- Spot-check the first few records in each table (the same IDs should appear in both)
SELECT id, name, email FROM openneo_id.users ORDER BY id LIMIT 5;
SELECT id, name, email FROM openneo_impress.auth_users ORDER BY id LIMIT 5;
-- Verify indexes were created
SHOW INDEX FROM openneo_impress.auth_users;
```
5. **Verify results:**
   - [ ] Row counts match exactly
   - [ ] Sample records match (IDs, names, emails)
   - [ ] All 4 indexes created (email, provider+uid, reset_password_token, unlock_token)

**Expected Downtime:** None (still in write-lock mode)
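The manual checks in steps 2, 4, and 5 could also be scripted so that each one passes or fails explicitly. Below is a minimal sketch, assuming bash and the standard `mysql`/`mysqldump` clients; the function names (`dump_complete`, `counts_match`, `has_expected_indexes`) are hypothetical, and the expected index columns are taken from the checklist in step 5.

```bash
# Sketch of Phase 2 verification helpers (hypothetical function names).

# A finished mysqldump file ends with a "-- Dump completed" comment
# (written by default unless --skip-comments was used).
dump_complete() {
  local file="$1"
  [ -s "$file" ] || return 1
  tail -n 3 "$file" | grep -q '^-- Dump completed'
}

# Print both row counts and fail if they differ.
counts_match() {
  local old_count="$1" new_count="$2"
  echo "openneo_id.users=$old_count openneo_impress.auth_users=$new_count"
  [ "$old_count" -eq "$new_count" ]
}

# Read batch-mode `SHOW INDEX` output (tab-separated, no header) on stdin and
# check that every expected column is covered by some index.
# Column_name is the 5th field of SHOW INDEX output.
has_expected_indexes() {
  local columns col missing=0
  columns=$(cut -f5 | sort -u)
  for col in email provider uid reset_password_token unlock_token; do
    if ! grep -qx "$col" <<<"$columns"; then
      echo "missing index on column: $col"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (placeholders as elsewhere in this guide; -N skips column
# names, -B prints tab-separated batch output):
#   dump_complete openneo_id_backup_[timestamp].sql
#   old=$(mysql -h [host] -u [user] -p -N -B -e "SELECT COUNT(*) FROM openneo_id.users")
#   new=$(mysql -h [host] -u [user] -p -N -B -e "SELECT COUNT(*) FROM openneo_impress.auth_users")
#   counts_match "$old" "$new"
#   mysql -h [host] -u [user] -p -N -B -e "SHOW INDEX FROM openneo_impress.auth_users" | has_expected_indexes
```

Checking column names rather than index names means the index check does not depend on what the migration called each index.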
### Phase 3: Switch to New Table
**Branch:** `feature/consolidate-auth-database` @ commit `2c21269a`

**Purpose:** Point AuthUser at the consolidated table and restore full functionality.

**Steps:**

1. Deploy commit `2c21269a` to production
2. **Immediately test critical paths:**
   - [ ] Login with existing account
   - [ ] Logout
   - [ ] Register new account
   - [ ] Update account settings (email, password)
   - [ ] Connect NeoPass (if available)
   - [ ] Disconnect NeoPass (if available)
3. **Monitor error logs:**
```bash
tail -f /var/www/impress/log/production.log | grep -i error
```
4. **Verify database queries are using auth_users:**
```bash
# Check recent queries in logs
# (ActiveRecord logs SQL at the debug level, so this only works if production logs at debug level)
grep "auth_users" /var/www/impress/log/production.log | tail -n 20
# Should see SELECT/INSERT/UPDATE on auth_users, NOT openneo_id.users
```
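**Expected Downtime:** Brief (< 1 minute for deployment)

**Rollback Plan:** If critical issues are found, revert to the Phase 1 commit and, if needed, restore `openneo_id` from backup (see Rollback Procedures below). Account changes made after this phase are written only to `auth_users`, so they will not be present in `openneo_id.users` after a rollback.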
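The log checks in steps 3 and 4 could be folded into one pass/fail check after the smoke tests. A sketch with a hypothetical `check_recent_log` function; like the `grep` in step 4, it only sees SQL if the production log level is debug, and its `error` match is the same loose case-insensitive grep used in step 3.

```bash
# Hypothetical helper: inspect the last N lines (default 5000) of a Rails log.
# Passes only if auth_users shows up in queries and no error lines appear.
check_recent_log() {
  local log_file="$1" lines="${2:-5000}"
  local recent auth_hits error_hits
  recent=$(tail -n "$lines" "$log_file") || return 1
  # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps going
  auth_hits=$(grep -c 'auth_users' <<<"$recent" || true)
  error_hits=$(grep -ci 'error' <<<"$recent" || true)
  echo "lines mentioning auth_users: $auth_hits, error lines: $error_hits"
  [ "$auth_hits" -gt 0 ] && [ "$error_hits" -eq 0 ]
}

# Usage on the production host:
#   check_recent_log /var/www/impress/log/production.log
```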
### Phase 4: Documentation Update
**Branch:** `feature/consolidate-auth-database` @ commit `9ba94f9f`

**Purpose:** Update documentation to reflect the single-database architecture.

**Steps:**

1. Deploy commit `9ba94f9f` to production
2. Verify no errors in the production log

**Expected Downtime:** None
### Phase 5: Database Teardown
**Purpose:** Remove the now-unused `openneo_id` database.

**Steps:**

1. **Wait 7 days** to confirm no issues surface in production
2. **Final backup:**
```bash
mysqldump -h [host] -u [user] -p openneo_id > openneo_id_final_backup_$(date +%Y%m%d_%H%M%S).sql
```
3. **Store backup offsite:**
   - Upload to secure backup storage
   - Keep for at least 90 days
4. **Drop the database:**
```sql
DROP DATABASE openneo_id;
```
5. **Remove environment variable:**
   - Delete `DATABASE_URL_OPENNEO_ID` from production environment config
   - Restart the app and confirm it starts without trying to connect to `openneo_id`
6. **Update MySQL users:**
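```sql
-- Remove openneo_id privileges from users; MySQL does not drop
-- database-level grants automatically when the database is dropped.
-- (Already done in deploy/setup.yml for new deployments)
-- [host] here is the MySQL account's host, not the server host
SHOW GRANTS FOR '[user]'@'[host]';
REVOKE ALL PRIVILEGES ON openneo_id.* FROM '[user]'@'[host]';
```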
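Before step 4, it may be worth confirming that no client still has `openneo_id` selected. A sketch, assuming bash; `DB_HOST` and `DB_USER` are placeholder variables for this example. `information_schema.PROCESSLIST` only lists other accounts' threads if you have the `PROCESS` privilege, and it only catches connections whose default database is `openneo_id`, not cross-database queries from sessions connected elsewhere.

```bash
# Hypothetical pre-drop check: count connections whose default database is
# openneo_id (DB_HOST and DB_USER are placeholders for this sketch).
count_openneo_id_connections() {
  mysql -h "$DB_HOST" -u "$DB_USER" -p -N -B -e \
    "SELECT COUNT(*) FROM information_schema.PROCESSLIST WHERE DB = 'openneo_id'"
}

# Succeed only when the count is zero.
safe_to_drop() {
  local count="$1"
  if [ "$count" -ne 0 ]; then
    echo "openneo_id still has $count open connection(s); not dropping" >&2
    return 1
  fi
  echo "no connections to openneo_id; OK to drop"
}

# Usage:
#   safe_to_drop "$(count_openneo_id_connections)"
```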
**Expected Downtime:** None

---
## Rollback Procedures
### If Issues Found After Phase 3
1. **Immediate rollback:**
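```bash
# Revert to the Phase 1 commit and deploy it first, so AuthUser points back
# at openneo_id.users before auth_users is removed
git checkout 604a8667
# Deploy
# Then roll back the data-copy migration against the production database
bundle exec rails db:migrate:down VERSION=20251102064247
```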
2. **Restore openneo_id (if needed):**
```bash
mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
```
3. **Investigate issues before reattempting**
### If Data Corruption Detected
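1. **Immediately revert to pre-migration code**, so the app stops reading `auth_users`
2. **Restore from backup:**
```bash
# Drop corrupted auth_users table
mysql -h [host] -u [user] -p -e "DROP TABLE openneo_impress.auth_users;"
# A manual DROP leaves the migration marked as run; clear it so a later
# `rails db:migrate` copies the data again
mysql -h [host] -u [user] -p -e "DELETE FROM openneo_impress.schema_migrations WHERE version = '20251102064247';"
# Restore openneo_id if needed
mysql -h [host] -u [user] -p openneo_id < openneo_id_backup_[timestamp].sql
```
3. **Review migration SQL before reattempting**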
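Both procedures restore from `openneo_id_backup_[timestamp].sql`. Because the backup names embed `$(date +%Y%m%d_%H%M%S)`, sorting them lexically also sorts them chronologically, so the newest one can be picked out with a small helper. A sketch; `latest_backup` is a hypothetical name, and it assumes the backups sit together in one directory.

```bash
# Hypothetical helper: print the newest openneo_id_backup_*.sql in a directory.
# Fixed-width %Y%m%d_%H%M%S timestamps make byte-order sorting chronological.
latest_backup() {
  local dir="${1:-.}"
  local files=("$dir"/openneo_id_backup_*.sql)
  # With no matches, bash leaves the glob pattern unexpanded
  [ -e "${files[0]}" ] || return 1
  printf '%s\n' "${files[@]}" | LC_ALL=C sort | tail -n 1
}

# Usage (from the directory holding the backups):
#   mysql -h [host] -u [user] -p openneo_id < "$(latest_backup)"
```

The glob deliberately does not match the `openneo_id_final_backup_*` file from Phase 5.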
---
## Key Risks & Mitigations
| Risk | Impact | Mitigation | Status |
|------|--------|------------|--------|
| Impress 2020 auth breaks | HIGH - Users can't log in via I2020 | Block deployment until I2020 retired | BLOCKING |
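| Data copy fails mid-migration | HIGH - Incomplete auth data | Copy wrapped in transaction, can roll back (MySQL DDL such as `CREATE TABLE` commits implicitly, so a partly created table may need manual cleanup) | Mitigated |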
| Production traffic during copy | MEDIUM - Stale data | Write lock prevents changes | Mitigated |
| Schema mismatch between DBs | MEDIUM - Migration fails | Migration matches exact schema | Mitigated |
| Indexes not created | MEDIUM - Slow queries | Verification step checks indexes | Mitigated |
| Login tracking data loss | LOW - Missing login stats | Acceptable trade-off | Accepted |
---
## Success Criteria
- [ ] All existing users can log in
- [ ] New user registration works
- [ ] Settings updates work
- [ ] NeoPass connection/disconnection works
- [ ] No errors in production logs
- [ ] Query performance unchanged
- [ ] Database row counts match
- [ ] All auth_users indexes present
---
## Timeline Estimate
**Total time:** 30-60 minutes (after Impress 2020 retired)
- Phase 1 deployment: 5 min
- Phase 2 data copy: 5-10 min (depending on user count)
- Phase 3 deployment + testing: 15-30 min
- Phase 4 deployment: 5 min
- Phase 5 teardown: 7+ days later, 10 min
---
## Questions Before Proceeding
1. **Is Impress 2020 fully retired?** If not, STOP.
2. Do we have recent database backups? (< 24 hours old)
3. Do we have a maintenance window scheduled?
4. Have we announced the maintenance to users?
5. Do we have rollback access ready?
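For question 2, if the backups are plain dump files on local disk (an assumption; they may live in managed backup storage instead), the age check can be made explicit. A sketch with a hypothetical `backup_is_fresh` function, using `find -mmin -1440` (modified within the last 1440 minutes, i.e. 24 hours).

```bash
# Hypothetical check: succeed only if the file exists and was modified
# within the last 24 hours (1440 minutes).
backup_is_fresh() {
  local file="$1"
  [ -f "$file" ] || return 1
  # find prints the path only when the -mmin test matches
  [ -n "$(find "$file" -mmin -1440)" ]
}

# Usage:
#   backup_is_fresh openneo_id_backup_[timestamp].sql && echo "backup is recent"
```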
---
**Last Updated:** November 2025

**Status:** Blocked on Impress 2020 retirement

**Branch:** `feature/consolidate-auth-database`