Amazon DynamoDB, a serverless NoSQL database, has been a go-to solution for over one million customers building low-latency, high-scale applications. As data grows, organizations are constantly looking for ways to extract valuable insights from operational data, which is often stored in DynamoDB. However, to use this data in Amazon DynamoDB for analytics and machine learning (ML) use cases, customers often build custom data pipelines. This is a time-consuming infrastructure task that adds little unique value to their core business.
Starting today, you can use Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse to run analytics and ML workloads in just a few clicks without consuming your DynamoDB table capacity. Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data.
Zero-ETL is a set of integrations that eliminates or minimizes the need to build ETL data pipelines. This zero-ETL integration reduces the engineering effort required to build and maintain data pipelines, benefiting users who run analytics and ML workloads on operational data in Amazon DynamoDB without impacting production workflows.
Let’s get started
For the following demo, I need to set up a zero-ETL integration between my data in Amazon DynamoDB and an Amazon Simple Storage Service (Amazon S3) data lake managed by Amazon SageMaker Lakehouse. Before setting up the zero-ETL integration, there are prerequisites to complete. To learn more about how to set them up, refer to this Amazon DynamoDB documentation page.
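The prerequisites can also be scripted. The following is a minimal sketch, assuming the documented requirements are that point-in-time recovery (PITR) is enabled on the source table and that a resource-based policy on the table lets AWS Glue export it. The exact action list, the `/export/*` resource, and the condition keys below are assumptions to compare against the DynamoDB documentation page. `update_continuous_backups` and `put_resource_policy` are existing boto3 DynamoDB client calls. The table name, account ID, and Region are hypothetical.

```python
import json


def build_table_resource_policy(table_arn: str, account_id: str, region: str) -> dict:
    """Resource-based policy that lets AWS Glue export the table (assumed action list)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueZeroETLExport",
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                # Assumed minimal set of actions; confirm against the zero-ETL docs.
                "Action": [
                    "dynamodb:ExportTableToPointInTime",
                    "dynamodb:DescribeTable",
                    "dynamodb:DescribeExport",
                ],
                "Resource": [table_arn, f"{table_arn}/export/*"],
                # Restrict the grant to Glue integrations in this account.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": f"arn:aws:glue:{region}:{account_id}:integration:*"
                    },
                },
            }
        ],
    }


def apply_prerequisites(table_name: str, table_arn: str, account_id: str, region: str) -> None:
    """Enable PITR and attach the policy (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the policy builder works without the SDK

    ddb = boto3.client("dynamodb", region_name=region)
    # Zero-ETL relies on table exports, which require point-in-time recovery.
    ddb.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    ddb.put_resource_policy(
        ResourceArn=table_arn,
        Policy=json.dumps(build_table_resource_policy(table_arn, account_id, region)),
    )
```

Keeping the policy builder separate from the API calls makes it easy to review the JSON before attaching it.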
With all the prerequisites completed, I can get started with this integration. I navigate to the AWS Glue console and select Zero-ETL integrations under Data Integration and ETL. Then, I choose Create zero-ETL integration.
Here, I have options to select my data source. I choose Amazon DynamoDB and choose Next.
Next, I need to configure the source and target details. In the Source details section, I select my Amazon DynamoDB table. In the Target details section, I specify the S3 bucket that I’ve set up in the AWS Glue Data Catalog.
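The source and target chosen in the console amount to a pair of ARNs. The following is a minimal sketch, assuming the AWS Glue `CreateIntegration` API (boto3 `glue.create_integration`) accepts `IntegrationName`, `SourceArn`, and `TargetArn`, and assuming the target is expressed as the ARN of the Glue database that points at the S3 bucket. The integration, table, and database names are hypothetical. The sketch leaves out the role, partitioning, and encryption settings from the later steps, so check the AWS Glue API reference before using it.

```python
def build_integration_request(
    name: str, region: str, account_id: str, table_name: str, database_name: str
) -> dict:
    """Assemble a CreateIntegration request body (assumed parameter names)."""
    source_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    # Assumption: the Lakehouse target is identified by a Glue database ARN.
    target_arn = f"arn:aws:glue:{region}:{account_id}:database/{database_name}"
    return {"IntegrationName": name, "SourceArn": source_arn, "TargetArn": target_arn}


def create_integration(request: dict, region: str) -> dict:
    """Submit the request (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK

    glue = boto3.client("glue", region_name=region)
    return glue.create_integration(**request)


# Usage (hypothetical names):
# req = build_integration_request("ddb-orders", "us-east-1", "123456789012", "orders", "orders_db")
# create_integration(req, "us-east-1")
```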
To set up this integration, I need an IAM role that grants AWS Glue the necessary permissions. For guidance on configuring IAM permissions, visit the Amazon DynamoDB documentation page. Also, if I haven’t configured a resource policy for my AWS Glue Data Catalog, I can select Fix it for me to automatically add the required resource policies.
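The IAM role has two parts: a trust policy that lets AWS Glue assume it, and a permissions policy for the target. The following is a minimal sketch under stated assumptions. The trust policy uses the standard `glue.amazonaws.com` service principal, but the S3 and Glue actions are an illustrative, non-exhaustive guess at what writing to the target needs, not the documented list. The role name, policy name, bucket, and database are hypothetical. `create_role` and `put_role_policy` are existing boto3 IAM client calls.

```python
import json


def build_glue_trust_policy() -> dict:
    """Trust policy that lets the AWS Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_target_permissions(bucket: str, region: str, account_id: str, database: str) -> dict:
    """Illustrative (assumed, not exhaustive) permissions for the S3/Glue target."""
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {  # List the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {  # Read, write, and clean up objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
            {  # Read and maintain table definitions in the target database.
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables",
                           "glue:CreateTable", "glue:UpdateTable"],
                "Resource": [f"{glue_prefix}:catalog",
                             f"{glue_prefix}:database/{database}",
                             f"{glue_prefix}:table/{database}/*"],
            },
        ],
    }


def create_role(role_name: str, permissions: dict) -> None:
    """Create the role and attach an inline policy (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builders work without the SDK

    iam = boto3.client("iam")
    iam.create_role(RoleName=role_name,
                    AssumeRolePolicyDocument=json.dumps(build_glue_trust_policy()))
    iam.put_role_policy(RoleName=role_name, PolicyName="zero-etl-target-access",
                        PolicyDocument=json.dumps(permissions))
```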
Here, I have options to configure the output. Under Data partitioning, I can either use DynamoDB table keys for partitioning or specify custom partition keys. After completing the configuration, I choose Next.
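The choice between table keys and custom keys decides how rows are grouped on disk. As a toy illustration only (the integration chooses its own physical layout), the following sketch computes Hive-style `key=value` prefixes for DynamoDB-like items. The attribute names `customer_id` and `order_date` are hypothetical.

```python
from urllib.parse import quote


def partition_path(item: dict, partition_keys: list) -> str:
    """Build a Hive-style partition prefix such as 'order_date=2024-12-01'."""
    parts = []
    for key in partition_keys:
        if key not in item:
            raise KeyError(f"item is missing partition key {key!r}")
        # Percent-encode values so '/' or '=' inside a value cannot break the path.
        parts.append(f"{key}={quote(str(item[key]), safe='')}")
    return "/".join(parts)


def group_by_partition(items: list, partition_keys: list) -> dict:
    """Group items by the partition prefix they would be written under."""
    groups = {}
    for item in items:
        groups.setdefault(partition_path(item, partition_keys), []).append(item)
    return groups


# Partitioning on a table key (one prefix per customer) versus a custom key
# (one prefix per day):
# group_by_partition(orders, ["customer_id"])
# group_by_partition(orders, ["order_date"])
```

Partitioning on a high-cardinality table key can create many small partitions, while a custom key such as a date often matches how analytical queries filter data.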
Because I selected the Fix it for me checkbox, I need to review the required changes and choose Continue before I can proceed to the next step.
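If you prefer to apply the Data Catalog resource policy yourself instead of using Fix it for me, the following sketch shows roughly what such a policy might contain. It assumes two statements: one letting principals in the account call `glue:CreateInboundIntegration` for this table, and one letting the AWS Glue service call `glue:AuthorizeInboundIntegration`. Both the action names and the condition keys are assumptions to compare against the changes the console shows for review. `put_resource_policy` with `PolicyInJson` is an existing boto3 Glue call. It sets the whole catalog policy, so merge any existing statements first.

```python
import json


def build_catalog_resource_policy(region: str, account_id: str, database: str, table_name: str) -> dict:
    """Approximate Data Catalog policy for an inbound DynamoDB integration (assumed shape)."""
    catalog_arn = f"arn:aws:glue:{region}:{account_id}:catalog"
    database_arn = f"arn:aws:glue:{region}:{account_id}:database/{database}"
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {  # Principals in this account may create an integration from this table.
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "glue:CreateInboundIntegration",
                "Resource": [catalog_arn, database_arn],
                "Condition": {"StringLike": {"aws:SourceArn": table_arn}},
            },
            {  # The AWS Glue service may authorize that integration.
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "glue:AuthorizeInboundIntegration",
                "Resource": [catalog_arn, database_arn],
                "Condition": {"StringEquals": {"aws:SourceArn": table_arn}},
            },
        ],
    }


def apply_catalog_policy(policy: dict, region: str) -> None:
    """Set the catalog policy (requires boto3 and credentials; overwrites the policy)."""
    import boto3  # imported lazily so the builder works without the SDK

    glue = boto3.client("glue", region_name=region)
    glue.put_resource_policy(PolicyInJson=json.dumps(policy))
```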
On the next page, I have the flexibility to configure data encryption. I can use AWS Key Management Service (AWS KMS) or a custom encryption key. Then, I assign a name to the integration and choose Next.
On the last step, I need to review the configuration. When I’m satisfied, I choose Next to create the zero-ETL integration.
After the initial data ingestion completes, my zero-ETL integration is ready for use. The completion time varies depending on the size of my source DynamoDB table.
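Because the initial load can take a while for large tables, a script that creates the integration usually needs to wait for it. The following is a minimal polling sketch. The status getter is a hypothetical callable (for example, a wrapper you write around the AWS Glue `DescribeIntegrations` API). The status strings `ACTIVE` and `FAILED` are assumptions to check against the API reference. Sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

READY_STATUSES = {"ACTIVE"}   # assumed "ready" status
FAILED_STATUSES = {"FAILED"}  # assumed terminal error status


def wait_until_active(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until the integration is ready, fails, or times out."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in READY_STATUSES:
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"integration entered status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"integration still {status} after {timeout_seconds} s")
        sleep(poll_seconds)
```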
If I navigate to Tables under Data Catalog in the left navigation panel, I can see more details, including the Schema. Under the hood, this zero-ETL integration uses Apache Iceberg as the table format, transforming the format and structure of my DynamoDB data as it is written to Amazon S3.
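Part of that transformation is turning DynamoDB's typed attribute values into ordinary column values. The service's actual type mapping is its own. This sketch only shows how DynamoDB JSON (for example `{"S": "c1"}` or `{"N": "12.50"}`) decodes into plain Python values, similar in spirit to boto3's `TypeDeserializer`. It assumes binary values arrive base64-encoded, as they do in DynamoDB JSON. The example attribute names are hypothetical.

```python
import base64
from decimal import Decimal


def deserialize(av: dict):
    """Decode one DynamoDB attribute value such as {"S": "x"} or {"N": "1"}."""
    if len(av) != 1:
        raise ValueError(f"expected exactly one type tag, got {list(av)}")
    tag, value = next(iter(av.items()))
    if tag == "S":
        return value
    if tag == "N":
        return Decimal(value)  # numbers are strings on the wire; keep full precision
    if tag == "B":
        return base64.b64decode(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "SS":
        return set(value)
    if tag == "NS":
        return {Decimal(v) for v in value}
    if tag == "BS":
        return {base64.b64decode(v) for v in value}
    if tag == "L":
        return [deserialize(v) for v in value]
    if tag == "M":
        return {k: deserialize(v) for k, v in value.items()}
    raise ValueError(f"unknown DynamoDB type tag {tag!r}")


def deserialize_item(item: dict) -> dict:
    """Decode a whole item (attribute name -> typed value) into a flat dict."""
    return {name: deserialize(av) for name, av in item.items()}
```

Nested `M` and `L` attributes are the part that needs a schema decision in a columnar table, which is one reason to inspect the Schema tab after the first load.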
Finally, I can confirm that all my data is available in my S3 bucket.
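The same check can be done from a script. This sketch lists the objects under the table's S3 prefix with boto3's `list_objects_v2` paginator and counts them by role. It assumes the conventional Iceberg layout, with Parquet data files under `data/` and table metadata under `metadata/`. The bucket and prefix are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def summarize_iceberg_keys(keys: Iterable[str]) -> Counter:
    """Count object keys as Iceberg 'metadata', 'data', or 'other' (assumed layout)."""
    counts = Counter()
    for key in keys:
        folders = key.split("/")[:-1]  # directory components only, not the file name
        if "metadata" in folders:
            counts["metadata"] += 1
        elif "data" in folders:
            counts["data"] += 1
        else:
            counts["other"] += 1
    return counts


def list_keys(bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under a prefix (requires boto3 and credentials)."""
    import boto3  # imported lazily so the summarizer works without the SDK

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (hypothetical bucket and prefix):
# print(summarize_iceberg_keys(list_keys("my-lake", "orders_db/orders/")))
```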
This zero-ETL integration significantly reduces the complexity and operational burden of data movement, so I can focus on extracting insights rather than managing pipelines.
Available now
This new zero-ETL capability is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, Stockholm).
Explore how to streamline your data analytics workflows using Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse. Learn more about how to get started on the Amazon DynamoDB documentation page.
Happy building!
— Donnie