Reference Answer
I led the infrastructure side of a migration that moved an e-commerce platform from an on-premises data center to AWS. The platform consisted of several monolithic applications, a large PostgreSQL database, and various microservices. The complexity came from two things: the platform ran 24/7, so the migration had to happen with zero downtime, and its components were tightly interdependent.

We started with a thorough discovery phase, mapping all application dependencies, network flows, and resource utilization, using tools such as CloudEndure for initial data collection and AWS Migration Hub. We then categorized each application for a re-host, re-platform, or re-factor strategy. The core e-commerce application, a critical monolithic Java application, was selected for re-platforming: we decided to containerize it with Docker and deploy it on Amazon ECS behind an Application Load Balancer.

The PostgreSQL database, which was over 10 TB, was migrated to Amazon RDS for PostgreSQL in several phases. First, we set up AWS Direct Connect for a stable, high-bandwidth link between our data center and AWS. Then we used AWS Database Migration Service (DMS) for continuous replication from the on-premises database to RDS, which kept the target in sync with the source right up to the cutover. I configured the DMS tasks, monitored replication lag, and made sure data integrity checks were in place.

For the application migration, we used CloudEndure Migration for an initial lift-and-shift of the on-premises servers, creating exact replicas as EC2 instances in AWS. Once those were stable, we began re-platforming: building new AMIs with Packer, building new container images with Docker, and writing ECS task definitions from scratch. We set up new CI/CD pipelines in GitLab CI to build and deploy the containerized applications to ECS.

The cutover itself was meticulously planned. We performed several dry runs in staging environments, simulating the exact cutover steps.
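
To make the go/no-go decision behind such a cutover concrete, here is a minimal Python sketch of a readiness gate that combines replication lag, service health, and a data integrity check. Everything in it is an assumption for illustration: the `CutoverSnapshot` fields, the `ready_for_cutover` helper, and the 5-second lag threshold are hypothetical and are not the project's actual tooling. In practice the values would come from DMS metrics and load balancer health checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutoverSnapshot:
    """One observation taken shortly before the planned DNS switch (hypothetical shape)."""
    replication_lag_seconds: float  # lag reported for the replication task
    healthy_targets: int            # load balancer targets passing health checks
    total_targets: int              # load balancer targets registered
    row_counts_match: bool          # result of a source-vs-target integrity check


def ready_for_cutover(snapshots, max_lag_seconds=5.0):
    """Return (ok, reasons). Every snapshot must pass for a 'go' decision."""
    if not snapshots:
        return False, ["no snapshots collected"]

    reasons = []
    for i, snap in enumerate(snapshots):
        if snap.replication_lag_seconds > max_lag_seconds:
            reasons.append(
                f"snapshot {i}: lag {snap.replication_lag_seconds}s "
                f"exceeds {max_lag_seconds}s"
            )
        if snap.total_targets == 0 or snap.healthy_targets < snap.total_targets:
            reasons.append(
                f"snapshot {i}: only {snap.healthy_targets}/"
                f"{snap.total_targets} targets healthy"
            )
        if not snap.row_counts_match:
            reasons.append(f"snapshot {i}: integrity check failed")
    return (not reasons), reasons
```

Running the same gate in every dry run and again on the real night keeps the decision to switch DNS, or to roll back, based on the same explicit criteria each time.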
On migration night, after verifying that all application services were healthy and performing as expected, we switched the DNS records to point to the new AWS environment. We had rollback plans in place but did not need them. After the migration, I focused on optimizing cost and performance: implementing auto scaling for the ECS services and rightsizing the RDS instances based on actual usage. The project took about eight months, and it significantly improved the platform's scalability, reliability, and agility.
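
As an illustration of the ECS auto scaling step, the following Python sketch builds the parameters for a target-tracking policy using boto3's Application Auto Scaling client (`register_scalable_target` and `put_scaling_policy`). The cluster and service names, the task-count bounds, the 60% CPU target, and the cooldown values are placeholder assumptions, not the settings used in the project.

```python
def scalable_target_params(cluster, service, min_tasks, max_tasks):
    """Parameters for registering an ECS service's desired count as scalable."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def cpu_tracking_policy_params(cluster, service, target_cpu_percent=60.0):
    """Parameters for a target-tracking policy on average service CPU."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds; assumed value
            "ScaleInCooldown": 300,   # seconds; assumed value
        },
    }


def apply_autoscaling(cluster, service, min_tasks, max_tasks):
    """Register the scalable target and attach the policy (needs AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **scalable_target_params(cluster, service, min_tasks, max_tasks)
    )
    client.put_scaling_policy(**cpu_tracking_policy_params(cluster, service))
```

A call such as `apply_autoscaling("shop-cluster", "web", 2, 10)` would keep the hypothetical `web` service between 2 and 10 tasks while steering average CPU toward the target. Scaling in more slowly than scaling out is a common choice to avoid flapping under bursty traffic.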