1. Overview
This project involves migrating a quantitative financial trading system from GCP to AWS.
Green Sea Digital Trading Company is dedicated to providing safe and efficient digital asset trading services for users worldwide. It manages and operates blockchain infrastructure, implements effective risk management strategies to safeguard user assets, and continuously innovates and optimizes its trading systems and services to improve the user experience. It is committed to exploring applications of blockchain technology and strictly abides by international financial regulations to ensure the legitimacy and transparency of its trading platform.
1.1. Project Objectives & Pain Points
GCP managed service migration: The customer's technical team makes deep use of GCP managed services, so migrating to AWS requires a professional technical service provider to map each GCP service to its AWS counterpart, handle the GCP-to-AWS service adaptation, study migration feasibility, and validate migration methods. Because the two vendors' products differ, the system architecture and deployment methods also differ, which makes the migration and re-architecture workload substantial (for example, ES migration and the EKS rebuild). The customer also hopes for professional guidance to get started with AWS quickly, reduce unnecessary trial-and-error risk, and rapidly complete the architecture design, migration, and deployment of its quantitative financial system on AWS.
Continuous synchronization from the customer's self-built data warehouse to Aurora PostgreSQL: The customer's self-built Greenplum 7 data warehouse is PostgreSQL-compatible and holds a large number of data tables used by business applications, so these tables must be synchronized into the PostgreSQL database for use. Synchronizing that many tables means both cumbersome DMS tasks and PostgreSQL performance issues to manage.
The tables needed by the PostgreSQL database are pulled from the data warehouse and kept consistently in sync. The customer's data inventory includes one large table with tens of millions of rows and dozens of smaller tables that must be synchronized for other operations. We have tested and verified DMS CDC for many tables; in practice it places high demands on the DMS instance and on Aurora I/O and related performance. We have walked the customer through DMS best practices and will split the workload across several DMS instances, each synchronizing a different set of tables, to relieve pressure and improve fault tolerance, as sketched below.
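As an illustration of that split, here is a minimal sketch, assuming hypothetical endpoint and replication-instance ARNs and a hypothetical table name (all placeholders, not the customer's actual resources). It uses boto3 to create one full-load-plus-CDC task scoped to a single large table; the smaller tables would be grouped into similar tasks on separate replication instances.

```python
import json

import boto3

dms = boto3.client("dms", region_name="ap-northeast-1")  # region is an assumption

# Selection rules limiting this task to the one large table; the dozens of
# smaller tables would be grouped into separate tasks on other instances.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-large-table",
            "object-locator": {"schema-name": "public", "table-name": "big_trades"},  # hypothetical table
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="gp-to-aurora-large-table",           # hypothetical task name
    SourceEndpointArn="arn:aws:dms:...:endpoint:greenplum-src",     # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:aurora-pg-tgt",     # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:large-table-inst",  # placeholder ARN
    MigrationType="full-load-and-cdc",  # initial full load plus ongoing change data capture
    TableMappings=json.dumps(table_mappings),
)
```

Splitting tasks this way keeps a long-running full load of the large table from slowing CDC on the smaller tables, and a failure on one instance does not stall the others.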
We previously recommended Redshift as an alternative to the client's self-built Greenplum warehouse and PostgreSQL database, but testing showed that Redshift does not support the large numeric values the business needs, so the client continues to use the more familiar self-built Greenplum warehouse.
1.2. Expected project results
The client will consider the project successful if the following key criteria are met:
| No. | Description | Measurement |
|---|---|---|
| 1 | Cloud Storage files and the ES and PG databases are migrated correctly, with data integrity verified | Passes testing |
| 2 | The application's service functions work correctly | Passes testing |
| 3 | EKS cluster elastic scale-out completes in ≤ 10 min | Passes testing |
| 4 | DMS synchronizes in real time to the core Aurora PostgreSQL database | Passes testing |
| 5 | Email alerts are delivered for AWS cloud resources and application-related monitoring metrics | Passes testing |
| 6 | Costs are controlled and billing is clear | Passes testing |
1.3. Proposed architecture
To meet the customer's needs, the overall business system deployment architecture is shown in the figure; the implementation will focus on the following aspects:
Architecture Description
Based on AWS architecture design considerations: operations, security, high availability, performance, cost, and sustainability.
1. We plan a Landing Zone to give the customer a sound network hierarchy. The project is mainly for internal use (data processing, analysis, and model/strategy training), so only the NAT gateways and the bastion host sit on the external network; the main business runs on the internal network, with careful route planning, security groups, and related policies.
2. Resources that run the business are placed across multiple Availability Zones, which improves business security, raises the scaling ceiling, and reduces the risk of a single point of failure.
3. AWS IAM, CloudTrail, and related services will be used to help manage the customer's permissions, operation records, abnormal use of account resources, and similar issues, as sketched below.
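As a small illustration of point 3, the following sketch uses boto3 to pull recent CloudTrail events for one sensitive API call so that unusual account activity can be reviewed; the event name, region, and lookback window are assumptions for the example, not a fixed audit policy.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="ap-northeast-1")  # region is an assumption

# Look up activity for a sensitive action over the last 24 hours.
resp = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "AuthorizeSecurityGroupIngress"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    MaxResults=50,
)

for event in resp["Events"]:
    # Each record shows who performed the action and when, for audit review.
    print(event["EventTime"], event.get("Username"), event["EventName"])
```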
Monitoring and alarm notification design:
1. Because the customer needs visibility into business load, we will help configure CloudWatch monitoring alerts for items such as EC2 state changes, Health events, memory, and CPU utilization, and provide customized alerts for core PostgreSQL database load and DMS failures (a minimal CloudWatch alarm sketch follows this list); a later optimization may route notifications to WeCom (Enterprise WeChat) or Feishu (Lark).
2. We recommend combining managed Prometheus (Amazon Managed Service for Prometheus) with CloudWatch to collect monitoring metrics for the EKS cluster, compute resources, and other components. The customer has its own monitoring and alerting solution that can be implemented on demand (the architecture keeps the recommended design; for project-schedule reasons the customer has decided to start with self-built, container-based monitoring and will evaluate AWS managed monitoring later based on results).
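As a minimal sketch of the alerting in point 1, the example below assumes an existing SNS topic subscribed to the team's mailbox and a hypothetical instance ID; it creates a CloudWatch alarm on EC2 CPU utilization that notifies that topic. (Memory metrics would additionally require the CloudWatch agent.)

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="ap-northeast-1")  # region is an assumption

cloudwatch.put_metric_alarm(
    AlarmName="ec2-high-cpu-i-0123456789abcdef0",  # hypothetical alarm name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical instance
    Statistic="Average",
    Period=300,               # 5-minute datapoints
    EvaluationPeriods=2,      # two consecutive breaches before alarming
    Threshold=80.0,           # percent CPU
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:ap-northeast-1:111122223333:ops-alerts"],  # placeholder SNS topic ARN
    AlarmDescription="Notify the ops mailbox when CPU stays above 80% for 10 minutes.",
)
```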
Core business EKS cluster design:
1. Worker nodes are deployed across multiple Availability Zones to keep the business available. Model testing will, from time to time, spin up large batches of resources for quick experiments, so Karpenter is the chosen Kubernetes autoscaling tool, giving faster and more flexible scale-out and scale-in to meet business needs.
2. Core services are deployed in the EKS cluster;
3. The cluster exposes Services through Ingress resources managed by the AWS Load Balancer (ALB) Controller;
4. A single EKS cluster with node capacity set to 2-20 to sustain baseline business while controlling cost; based on the current GCP load, roughly 16 servers are estimated (a minimal sketch of this capacity setting follows the list);
5. Each EC2 instance starts with a 50 GB gp3 volume that can be expanded as needed, and EFS is being considered for centrally storing scripts, other startup items, and logs;
6. Nodes are deployed in private subnets and are never exposed to the public network, keeping them secure and under control;
7. The node group's security group allows access only from the EKS cluster and the ELB.
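As a minimal sketch of the node-capacity setting in point 4, assuming a hypothetical cluster name and managed node group name, the call below uses boto3 to pin the managed node group to the 2-20 range with a desired size near the estimated 16 servers; Karpenter would then handle burstier scaling on top of this baseline.

```python
import boto3

eks = boto3.client("eks", region_name="ap-northeast-1")  # region is an assumption

# Keep the managed node group within the agreed 2-20 range; the desired size
# reflects the roughly 16 servers estimated from the current GCP load.
eks.update_nodegroup_config(
    clusterName="quant-trading",     # hypothetical cluster name
    nodegroupName="core-services",   # hypothetical node group name
    scalingConfig={
        "minSize": 2,
        "maxSize": 20,
        "desiredSize": 16,
    },
)
```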
Difficult issues in database migration and synchronization:
1. ES snapshots from GCP are imported into Amazon OpenSearch Service.
The plan is to complete the migration by taking snapshots, exporting them, and uploading them to S3; one or two days of ES downtime can be arranged in advance.
If the migration cannot finish within that window, the remaining delta would require real-time incremental migration, which adds complexity. We therefore need to evaluate and test in advance, several times, how long a 1 TB snapshot takes under high load on the current GCP ES cluster and how long the import takes on the AWS side, so the migration stays as close to a static cutover as possible (a minimal snapshot-restore sketch appears after this list).
2. Multiple tables from the customer's self-built data warehouse are synchronized to the core PostgreSQL database (for details, see the pain points section above).
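As a minimal sketch of the snapshot path in point 1, the example below assumes the exported snapshot already sits in an S3 bucket, an IAM role exists that OpenSearch Service can assume for snapshot access, and requests are SigV4-signed via the requests-aws4auth package; the endpoint, repository, bucket, role, and snapshot names are all placeholders.

```python
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "ap-northeast-1"                               # assumption
host = "https://search-quant-es.example.amazonaws.com"  # placeholder OpenSearch endpoint

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, "es", session_token=credentials.token)

# 1) Register the S3 bucket holding the exported ES snapshot as a repository.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "quant-es-snapshots",  # placeholder bucket
        "region": region,
        "role_arn": "arn:aws:iam::111122223333:role/OpenSearchSnapshotRole",  # placeholder role
    },
}
requests.put(f"{host}/_snapshot/gcp-migration", auth=awsauth, json=repo_body).raise_for_status()

# 2) Restore the uploaded snapshot into the OpenSearch domain.
requests.post(
    f"{host}/_snapshot/gcp-migration/snapshot-1/_restore",  # placeholder snapshot name
    auth=awsauth,
    json={"include_global_state": False},
).raise_for_status()
```

Timing these two steps against a realistic data volume, as described above, determines whether the pre-arranged downtime window is sufficient.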
1.4. Migration success delivery criteria
- The solution meets the requirements;
- Migration documentation is complete and compliant;
- Application and data migration is fully completed;
- Functions work as required after migration;
- The migration design and implementation within the agreed scope is complete, followed by 15 working days of stable operation after business switchover.
1.5. Deliverables
- Architecture Optimization Report
- Project Incremental Protection
- AWS Cloud Architecture Description
- Account & IAM Permissions Best Practices
- Service Operation Validation File
2. Why Amazon Web Services
- Mature cross-border private line offerings and solution capability
- Scalable EC2 compute for high-performance computing requirements
- WorkSpaces meets enterprise-level security requirements
4. Why Choose Dayun
As an AWS APN Select Consulting Partner, Dayun has come to understand the customer's current problems and needs over the course of continuous service, and has begun working with the customer to build a global R&D network and effective management tools on AWS. During this period, Dayun provided technical consulting, helped the customer fill gaps in public cloud networking knowledge, and ensured the customer's technical requirements were met through sound project planning and management, enabling fast, high-quality delivery and ultimately driving the successful implementation of the project.
As an MSP service provider, Dayun not only helps customers solve the last-mile problems of cloud adoption, but also continues to analyze and optimize the customer's IT system architecture from a professional perspective, helping customers improve their architecture, strengthen system security, and reduce cloud costs through faster service and a complete, well-designed service process, safeguarding the customer's journey on the cloud.