
Building a Serverless Invoice Generation System with AWS

Discover how to build a robust, scalable, and secure serverless invoice generation system using AWS services like API Gateway, Lambda, DynamoDB, SNS, SQS, and S3. Learn about secure HMAC authentication, efficient data storage, reliable message processing, and secure document retrieval. Perfect for developers looking to enhance their AWS skills!

May 22, 2024
Eat, Sleep, Code, Repeat

Hey there! Today, I want to take you on a journey through the process of building a serverless invoice generation system using SST Ion and AWS cloud services. This project was a fun mix of challenges and learning experiences, and I'm excited to share the architecture, the hurdles I faced, and how I overcame them. If you're into AWS, serverless architectures, or just curious about how to process invoices in the cloud, keep reading!

The Goal

The main goal was to create a robust, scalable, and secure system to generate invoices. The system needed to:

  1. Receive and authenticate requests

  2. Store the payload securely

  3. Generate a document from the payload

  4. Store the generated document

  5. Send notifications upon document generation

  6. Provide a way to fetch the document securely

The Architecture

To achieve this, I combined several AWS services: API Gateway, Lambda, Secrets Manager, DynamoDB, SNS, SQS, and S3. Here’s a high-level overview of the architecture:

  1. API Gateway + Lambda: Handle incoming requests and authentication, and serve the endpoint for fetching the generated document.

  2. AWS Secrets Manager: Manage and rotate API keys for authentication.

  3. DynamoDB: Store payloads and metadata.

  4. SNS: Publish and subscribe to events.

  5. SQS: Queue messages for asynchronous processing.

  6. S3: Store generated documents.

Now, let’s dive deeper into each component and the steps involved.

1. Receiving and Authenticating Requests

First off, I set up API Gateway to act as the front door for all incoming requests. The Lambda function behind it handled the request processing. Security was a top priority, so I used AWS Secrets Manager to store and rotate the API keys.

Challenge: Ensuring secure HMAC generation and verification.

To authenticate the requests, each incoming request needed to include an HMAC (Hash-based Message Authentication Code) of the payload, computed with the API key as the secret. But what is HMAC, and why is it necessary?

HMAC Explained: HMAC is a specific type of message authentication code that involves a cryptographic hash function and a secret key. It provides both data integrity and authenticity. Even though the service is intended to be accessed only by internal services, HMAC adds a critical layer of security. It ensures that the data hasn't been tampered with and verifies that the sender is indeed who they claim to be. This is particularly important in preventing unauthorized access and ensuring that internal communications remain secure.

Here’s how it worked:

  • The client generated the HMAC using the API key and the payload.

  • This HMAC was sent along with the request to the API Gateway.

  • The Lambda function then generated its own HMAC using the same method and compared it with the one sent by the client.

Solution: Implementing a robust HMAC process in Lambda using AWS Secrets Manager to securely handle API keys. This ensured that only authenticated requests were processed, adding an extra layer of security to our system.
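To make the steps above concrete, here is a minimal sketch of the signing and verification logic using Node's built-in crypto module. It is not the exact code from the project: SHA-256, hex encoding, and the function names are assumptions, since the details depend on what the client and server agree on. In the real handler, the API key would come from Secrets Manager.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute a hex-encoded HMAC of the raw request body, keyed with the API key.
// Assumption: SHA-256 and hex output; the article does not name either.
export function signPayload(apiKey: string, rawBody: string): string {
  return createHmac("sha256", apiKey).update(rawBody, "utf8").digest("hex");
}

// Recompute the HMAC on the server side and compare it with the one received.
export function isValidSignature(
  apiKey: string,
  rawBody: string,
  receivedSignature: string
): boolean {
  const expected = Buffer.from(signPayload(apiKey, rawBody), "utf8");
  const received = Buffer.from(receivedSignature, "utf8");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (expected.length !== received.length) return false;
  // Constant-time comparison avoids leaking how many characters matched.
  return timingSafeEqual(expected, received);
}
```

One detail worth keeping in mind: both sides have to sign the exact same bytes, so the server should verify against the raw request body rather than a re-serialized version of the parsed JSON.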

2. Storing the Payload

Once authenticated, the Lambda function took the payload and stored it in a DynamoDB table. DynamoDB was chosen for its scalability and low-latency data access.

Why DynamoDB? DynamoDB is a serverless database service, which means it automatically handles the underlying infrastructure, scaling, and maintenance. This is a huge benefit because it allows me to focus on the application logic rather than database administration. Additionally, I'm super familiar with DynamoDB, making it the first choice on my list of considerations. Its seamless integration with other AWS services and pay-per-use pricing model made it an ideal choice for this project.

Challenge: Efficiently storing and retrieving data in DynamoDB.

To tackle this, I designed the DynamoDB schema to optimize both read and write performance. This involved setting up appropriate primary keys and secondary indexes to ensure quick access to the data.

Solution: A well-thought-out DynamoDB schema that balanced performance with cost-efficiency, ensuring that the data was always available when needed.
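To give an idea of what such a schema can look like, here is a purely hypothetical record shape. The attribute names, the key prefixes, and the helper function are all illustrative assumptions; the actual table design is not shown in this post.

```typescript
// Hypothetical record shape, for illustration only.
type DocumentStatus = "REQUESTED" | "GENERATED";

interface InvoiceRecord {
  pk: string; // partition key: groups documents by the calling service
  sk: string; // sort key: identifies one document within that service
  documentStatus: DocumentStatus;
  createdAt: string; // ISO 8601 timestamp, sorts lexicographically by time
  payload: Record<string, unknown>;
  s3ObjectKey?: string; // filled in once the PDF has been generated
}

// Build the item that would be written when a request is first accepted.
export function buildInvoiceRecord(
  serviceId: string,
  documentId: string,
  payload: Record<string, unknown>,
  now: Date = new Date()
): InvoiceRecord {
  return {
    pk: `SERVICE#${serviceId}`,
    sk: `DOC#${documentId}`,
    documentStatus: "REQUESTED",
    createdAt: now.toISOString(),
    payload,
  };
}
```

With keys like these, fetching a single document is a direct lookup by partition and sort key. A secondary index on something like documentStatus and createdAt would be one option for listing documents by state, again as an assumption rather than a description of the real table.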

3. Publishing to SNS

After storing the payload, the Lambda function published a message to an SNS (Simple Notification Service) topic. This message indicated that a new document had been requested. I set up an SQS (Simple Queue Service) queue subscribed to this SNS topic with a filter policy that matched only REQUESTED messages.

Challenge: Managing message flow and ensuring reliable delivery.

Given that the system was designed to handle approximately 25,000 requests per month, an SQS queue might seem like overkill. However, SNS and SQS cost very little at this volume, so the extra layer didn't hurt. The SNS and SQS combination provided a reliable messaging system: SNS handled the publishing of messages, while SQS buffered them so they were still processed even if there was a spike in activity.

Solution: Using SNS for real-time message publishing and SQS for reliable message processing, ensuring that no message was lost during high-load periods.
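To illustrate how the filtering works, here is a small sketch. SNS filter policies match against message attributes by default, so the publisher attaches an attribute and each subscription lists the values it accepts. The attribute name "status" is an assumption (only the values REQUESTED and GENERATED appear in this post), and matchesPolicy is a simplified model written for illustration, not an AWS API.

```typescript
// Shape of a basic SNS filter policy: attribute name -> accepted string values.
type FilterPolicy = Record<string, string[]>;

// Shape of string message attributes as passed to the SNS Publish call.
type MessageAttributes = Record<string, { DataType: "String"; StringValue: string }>;

// Hypothetical policy for the queue that feeds document generation.
export const requestedQueuePolicy: FilterPolicy = { status: ["REQUESTED"] };

// Simplified model of SNS attribute filtering: every key in the policy must be
// present on the message with a value equal to one of the listed strings.
// Real filter policies also support prefix, numeric, and other operators.
export function matchesPolicy(policy: FilterPolicy, attributes: MessageAttributes): boolean {
  return Object.entries(policy).every(([name, allowed]) => {
    const attribute = attributes[name];
    return attribute !== undefined && allowed.includes(attribute.StringValue);
  });
}
```

In this model, a message published with the attribute value REQUESTED reaches the generation queue, while one marked GENERATED is skipped by that subscription and left for other subscribers.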

4. Generating the Document

The SQS queue directed the messages to another Lambda function responsible for document generation. Initially, I faced a significant challenge here. Most libraries that sold themselves as PDF generators or HTML-to-PDF converters were deprecated or archived by their maintainers. Using unmaintained libraries is risky: there is a good chance you will stumble upon a problem and find neither a solution nor anyone to help you.

Challenge: Finding a reliable, maintained library for document generation.

After extensive research, I realized that Puppeteer, a library widely adopted for headless browser automation, could not only load web pages but also render HTML and convert it to PDF. Puppeteer is actively maintained and supported, making it a much safer choice. Here's why Puppeteer stood out:

  • Widely adopted and actively maintained: Ensuring a community and support.

  • Versatility: Capable of rendering web pages and converting them to PDF.

  • Simplicity: Achieves the desired result with just a few lines of code.

Solution: Implementing Puppeteer for document generation. This allowed me to render HTML templates and convert them into PDFs efficiently. Once the document was generated, it was stored in S3 (Simple Storage Service).
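Here is a rough sketch of the HTML-to-PDF step. It uses the Puppeteer calls launch, newPage, setContent, pdf, and close, but takes the launcher as a parameter and declares minimal local types so the snippet compiles on its own. Those interfaces, the function name, and the A4 and printBackground options are assumptions for illustration, not the project's actual code.

```typescript
// Minimal structural types covering only the Puppeteer calls used below.
// In a real project these would come from the `puppeteer` package instead.
interface PdfPage {
  setContent(html: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<void>;
  pdf(options?: { format?: "A4" | "Letter"; printBackground?: boolean }): Promise<Uint8Array>;
}
interface PdfBrowser {
  newPage(): Promise<PdfPage>;
  close(): Promise<void>;
}
interface BrowserLauncher {
  launch(): Promise<PdfBrowser>;
}

// Render an HTML string to PDF bytes with a headless browser.
export async function renderPdf(launcher: BrowserLauncher, html: string): Promise<Uint8Array> {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so fonts and images are loaded.
    await page.setContent(html, { waitUntil: "networkidle0" });
    return await page.pdf({ format: "A4", printBackground: true });
  } finally {
    // Always release the browser, even if rendering fails.
    await browser.close();
  }
}
```

In the Lambda, the launcher would be the object imported from the puppeteer package, and the returned bytes are what gets uploaded to S3. Passing the launcher in also makes the function easy to exercise with a fake browser in tests.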

5. Sending Notifications through Webhooks

After storing the document in S3, the Lambda function updated the corresponding record in DynamoDB with the S3 object key. Then, it published another message to the SNS topic, this time marked as GENERATED.

Challenge: Ensuring timely updates and notifications.

Another SQS queue, subscribed with a filter policy matching GENERATED, delivered the message to a final Lambda function. This Lambda function was responsible for sending a webhook notification to the subscription endpoints configured for the given service key (API key). If the subscriber failed to receive the message, the Lambda would throw an error, causing the message to go back to the SQS queue. With exponential backoff between attempts, SQS would redeliver the message up to five times. If delivery still didn't succeed, the message would go to a Dead Letter Queue (DLQ).

Solution: The combination of SNS, SQS, and Lambda ensured a seamless flow of updates and notifications. This approach allowed real-time processing and response, while the DLQ mechanism provided a fallback for failed notifications, ensuring that no message was lost without further action.
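For the retry timing, one common approach (an assumption here, since the exact configuration isn't shown in this post) is to derive the delay from the message's ApproximateReceiveCount attribute and apply it as the message's visibility timeout before failing, while a redrive policy with a maxReceiveCount of 5 moves the message to the DLQ. The function name and the 30-second base below are hypothetical.

```typescript
// SQS does not allow a visibility timeout above 12 hours.
const MAX_VISIBILITY_TIMEOUT_SECONDS = 43200;

// Delay before the next delivery attempt, doubling with every receive.
// baseSeconds is a hypothetical tuning value, not taken from the project.
export function nextVisibilityTimeout(receiveCount: number, baseSeconds = 30): number {
  // Treat anything below the first receive as attempt 1.
  const attempt = Math.max(1, Math.floor(receiveCount));
  const delay = baseSeconds * 2 ** (attempt - 1);
  return Math.min(delay, MAX_VISIBILITY_TIMEOUT_SECONDS);
}
```

With a 30-second base, the five attempts would be spaced 30, 60, 120, 240, and 480 seconds apart. In a Lambda SQS event the receive count arrives as a string, so it needs to be converted with Number() before being passed in.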

[Diagram: Diagrama sem nome.drawio.png]

6. Fetching the Document Securely

To provide a way for users to fetch the generated document, I created an optional endpoint using API Gateway and Lambda. This endpoint required the API Key and the document ID returned from the initial request. Upon receiving a valid request, the Lambda function generated a signed URL from S3.

Challenge: Ensuring secure and timely access to the document.

The signed URL had a very short lifetime to ensure security. It was designed to be requested right when the user clicks "download" and then used immediately to download the document. This minimized the risk of unauthorized access.

Solution: Implementing this endpoint allowed secure, temporary access to the document, ensuring that users could fetch their documents quickly and safely.

[Diagram: Diagrama sem nome.drawio (2).png]

Wrapping Up

Building this system was an incredible journey, filled with both challenges and learning opportunities. By leveraging AWS's powerful services, I created a scalable, secure, and efficient invoice generation system. The key takeaways from this project include the importance of robust authentication, efficient data handling, reliable message processing, and secure document retrieval.

If you're considering building a serverless solution on AWS, I hope this walkthrough inspires and guides you. Feel free to reach out with any questions or comments—I’d love to hear about your own experiences and projects!

Happy coding!


If you enjoyed this article, please share it with your friends and colleagues. Stay tuned for more posts on serverless architectures and cloud computing!

© 2024 by Geraldo Silva