Project Scenario & Architecture Diagram
In a connected world, organizations must be interlinked with their customers and vendors. Traditionally this integration has been slow, manual, batch-based, and prone to failure, which has impaired decision-making and delayed the detection of fraudulent activity. The objective is to build an automated, event-driven, real-time process without these limitations: data should flow quickly from source to destination, while a data lake of structured and unstructured data is maintained alongside it.
Procedure Followed:
Architecture Implementation:
- The customer uploads invoice data to the source S3 bucket as a text file, formatted according to the customer's own guidelines and policies. The bucket has a lifecycle policy that automatically deletes any object more than 1 day (24 hours) old.
- Each upload triggers an S3 event notification that publishes a message to an SNS topic.
- A custom program running on an EC2 instance subscribes to the SNS topic and receives the message published by the S3 event.
- The program uses the S3 API to read the object from the source bucket, parses its contents into a CSV record, and saves the original record in DynamoDB.
- The program uses the S3 API to write the CSV record to the destination S3 bucket as a new object.
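The hand-off in the second and third steps, where S3 publishes to SNS and the EC2 program receives the message, can be sketched as follows. This assumes the program receives SNS deliveries over an HTTP(S) subscription, in which SNS wraps the S3 event as a JSON string inside the `Message` field of its own JSON envelope. The function name `objects_from_sns` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sns(sns_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in an SNS notification.

    Assumes the raw SNS JSON envelope (the default for HTTP/HTTPS
    subscriptions), whose "Message" field holds the S3 event as a string.
    """
    envelope = json.loads(sns_body)
    s3_event = json.loads(envelope["Message"])
    pairs = []
    # S3 sends an "s3:TestEvent" with no "Records" when notifications are
    # first configured, so default to an empty list.
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

Each returned pair identifies one uploaded invoice that the program can then fetch with the S3 API.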
Creation of source and target S3 buckets:
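A minimal boto3 sketch of this step, assuming placeholder bucket names and region. It builds the 1-day expiration rule for the source bucket separately so the configuration can be inspected without AWS credentials. Note that S3 evaluates lifecycle rules asynchronously, so an object is removed some time after it becomes one day old rather than at exactly the 24-hour mark.

```python
def lifecycle_expire_after_one_day(prefix: str = "") -> dict:
    """Lifecycle configuration that expires objects 1 day after creation."""
    return {
        "Rules": [
            {
                "ID": "expire-after-1-day",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            }
        ]
    }


def create_buckets(source: str, target: str, region: str = "us-east-1") -> None:
    """Create both buckets and attach the expiration rule to the source only."""
    import boto3  # imported lazily so the helper above needs no AWS SDK

    s3 = boto3.client("s3", region_name=region)
    for name in (source, target):
        if region == "us-east-1":
            # us-east-1 rejects an explicit LocationConstraint.
            s3.create_bucket(Bucket=name)
        else:
            s3.create_bucket(
                Bucket=name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
    s3.put_bucket_lifecycle_configuration(
        Bucket=source,
        LifecycleConfiguration=lifecycle_expire_after_one_day(),
    )
```

Only the source bucket expires its objects; the destination bucket keeps the generated CSV files.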
Creation of SNS topic:
SNS access policy configuration:
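The topic's access policy must allow the S3 service to publish to it. The sketch below builds such a policy, assuming placeholder ARNs and account ID; scoping it with `aws:SourceArn` and `aws:SourceAccount` limits publishing to the source bucket in your own account.

```python
import json


def sns_policy_for_s3(topic_arn: str, bucket_name: str, account_id: str) -> str:
    """Return a topic policy (JSON string) letting one S3 bucket publish."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3Publish",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SNS:Publish",
                "Resource": topic_arn,
                "Condition": {
                    # Only events from this bucket, owned by this account.
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }
    return json.dumps(policy)
```

The resulting string can be applied with boto3's `sns.set_topic_attributes(TopicArn=..., AttributeName="Policy", AttributeValue=...)`, or pasted into the topic's access policy editor in the console.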
SNS notification setup for the source bucket:
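A sketch of the notification configuration for the source bucket, assuming a placeholder topic ARN and a hypothetical `.txt` suffix filter so that only invoice text files trigger the pipeline. S3 sends a test message to the topic when this configuration is saved and rejects it if the topic policy does not allow S3 to publish, which is why the SNS access policy has to be in place first.

```python
def topic_notification_config(topic_arn: str, suffix: str = ".txt") -> dict:
    """NotificationConfiguration that sends every object-created event to SNS."""
    return {
        "TopicConfigurations": [
            {
                "Id": "invoice-upload-to-sns",
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Assumed filter: only keys ending in the given suffix.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }
```

It would be applied with `s3.put_bucket_notification_configuration(Bucket="src-bucket", NotificationConfiguration=topic_notification_config(topic_arn))`, where the bucket name is a placeholder.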
EC2 instance creation for the custom program:
Custom program configuration and upload:
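The parsing step of the custom program can be sketched as below. The invoice layout is not specified in this document, so the sketch assumes a hypothetical format of one `field: value` pair per line and a hypothetical column list `FIELDS`; both would need to match the customer's real format.

```python
import csv
import io

# Hypothetical column order for the CSV output; adjust to the real invoice fields.
FIELDS = ["invoice_id", "customer", "date", "amount"]


def parse_invoice(text: str) -> dict[str, str]:
    """Parse lines such as 'invoice_id: INV-001' into a dict (assumed format)."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)  # split on the first colon only
        record[name.strip()] = value.strip()
    return record


def to_csv(record: dict[str, str]) -> str:
    """Render one record as a CSV document with a header row."""
    buf = io.StringIO()
    # Unknown fields are ignored; missing fields are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerow(record)
    return buf.getvalue()
```

In the program, the input text would come from `s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")`, assuming the uploads are UTF-8 encoded.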
Creation and verification of SNS subscription and generation of CSV file:
Creation of SNS subscription:
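If the EC2 program subscribes through an HTTP(S) endpoint (an assumption; created, for example, with `sns.subscribe(TopicArn=..., Protocol="http", Endpoint=...)`), SNS first POSTs a `SubscriptionConfirmation` message whose `SubscribeURL` must be visited before any notifications are delivered. The dispatcher below is a sketch with hypothetical names; a production endpoint should also verify the SNS message signature, which is omitted here.

```python
import json
from typing import Callable
from urllib.request import urlopen


def _confirm(url: str) -> None:
    # Visiting SubscribeURL confirms the subscription with SNS.
    with urlopen(url) as resp:
        resp.read()


def handle_sns_post(
    body: str,
    on_message: Callable[[str], None],
    fetch: Callable[[str], object] = _confirm,
) -> str:
    """Dispatch one SNS HTTP(S) POST body and return its message type."""
    envelope = json.loads(body)
    msg_type = envelope.get("Type")
    if msg_type == "SubscriptionConfirmation":
        fetch(envelope["SubscribeURL"])
    elif msg_type == "Notification":
        # "Message" carries the S3 event as a JSON string.
        on_message(envelope["Message"])
    return msg_type or "Unknown"
```

Once the confirmation URL has been visited, the subscription status in the SNS console changes from pending to confirmed.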
Generation of CSV file:
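Writing the CSV record to the destination bucket can be sketched as follows. The naming scheme (same key with a `.csv` extension) is an assumption, and the S3 client is passed in so the function can be exercised with a stand-in object.

```python
import posixpath


def csv_key_for(source_key: str) -> str:
    """Map 'invoices/inv-001.txt' to 'invoices/inv-001.csv' (assumed naming)."""
    root, _ext = posixpath.splitext(source_key)
    return root + ".csv"


def write_csv(s3_client, target_bucket: str, source_key: str, csv_text: str) -> str:
    """Upload csv_text as a new object in the target bucket; return its key."""
    key = csv_key_for(source_key)
    s3_client.put_object(
        Bucket=target_bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return key
```

With a real client (`boto3.client("s3")`), the new object should appear in the destination bucket shortly after the source file is uploaded.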
Table creation in DynamoDB:
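A sketch of the DynamoDB table and the item holding the original record. The table name, the `invoice_id` partition key, and the attribute names are hypothetical, and on-demand billing is assumed. DynamoDB items are limited to 400 KB, so very large invoice files would need a different storage approach.

```python
def invoice_table_spec(table_name: str) -> dict:
    """Keyword arguments for boto3's dynamodb client create_table."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": "invoice_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "invoice_id", "AttributeType": "S"}
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity (assumed)
    }


def invoice_item(invoice_id: str, source_key: str, raw_text: str) -> dict:
    """Item in the low-level attribute-value format used by client.put_item."""
    return {
        "invoice_id": {"S": invoice_id},
        "source_key": {"S": source_key},
        "raw_record": {"S": raw_text},  # original, unparsed invoice text
    }
```

Usage with a client: `ddb.create_table(**invoice_table_spec("Invoices"))`, then `ddb.get_waiter("table_exists").wait(TableName="Invoices")` before calling `ddb.put_item(TableName="Invoices", Item=invoice_item(...))`.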