Amazon Web Services wants you to create data silos to ensure you get the best performance when processing data. AWS also wants to help unify your data to ensure that insights don't fall between the cracks. If you think these two belief systems are mutually exclusive, then perhaps you should learn more about AWS's data lakehouse strategy.

It's hard to pinpoint exactly when AWS began adopting the data lakehouse design paradigm, in which characteristics of a data warehouse are implemented atop a data lake (hence the merger of a "lake" with a "house"). Google Cloud and Databricks have been early practitioners of data lakehouses, which are designed to provide a force for centralization and to reduce the data integration challenges that crop up when data silos are allowed to proliferate widely. The company started talking about its lakehouse architecture about a year ago.

One of the key elements of that lakehouse strategy is Amazon Athena, AWS's version of the Presto SQL query engine. Athena Federated Query enables users to execute queries that touch a wide range of data sources, including data sitting in S3 as well as relational and non-relational databases in AWS.

In late 2019, at its annual re:Invent conference, AWS launched Redshift Spectrum, a new service that enabled users to run Redshift queries directly on data residing in S3, eliminating the need to move the data into the Redshift database and store it in the optimized Redshift format. With Spectrum, users can leave the data in Parquet, the efficient column-oriented data format popularized in Hadoop (Avro, ORC, and JSON can also be used). It also announced federated queries in Redshift, enabling users to point their Redshift SQL queries at data residing in other repositories, including PostgreSQL. Finally, it announced the capability to output the results of Redshift queries directly into S3 using Parquet, which enabled other AWS services, such as Amazon Athena, to access the data.

The lakehouse could help boost AWS's fortunes as companies ramp up the migration of big data systems to the cloud. The $46-billion company is already a giant in the burgeoning market for cloud databases, and it scored very highly in Gartner's first-ever Magic Quadrant for Cloud Database Management Systems, which we told you about last week. AWS is number one in the customer-count and revenue departments, and it has a better service record than other hyperscalers, even if Oracle, Google, and IBM outscored AWS on the completeness-of-vision axis.

However, if there was one ding against AWS in Gartner's eyes, it was that the company offers a multitude of databases as part of its focus on "best-fit engineering." While this ensures the highest level of performance, it puts an integration burden on customers, Gartner says.

Amazon advocates a best-of-breed approach to databases (Joe Techapanupreeda/Shutterstock)

Rahul Pathak, vice president of AWS Analytics, makes no bones about AWS's best-of-breed approach. "Anyone that says one tool is the answer to all of your problems is probably incorrect," Pathak tells Datanami. "Things that are best-of-breed for a particular purpose allow customers to get away from having to compromise on performance or functionality or scale for that use case."

S3 is the common object store that backstops data in AWS, but the company offers more than 15 databases for specific applications.
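For readers who want a concrete feel for the Redshift capabilities described above, here is a minimal sketch of what the three features look like in SQL, wrapped in Python strings. Every identifier used here (the schema names, table names, S3 bucket, database host, and IAM role/secret ARNs) is a hypothetical placeholder, not a real AWS resource, and the statements would of course only run against an actual Redshift cluster.

```python
# Illustrative SQL for the three Redshift capabilities discussed in the
# article. All identifiers below are hypothetical placeholders.

# 1. Redshift Spectrum: register an external schema over Parquet files in
#    S3, so queries run against the lake without loading data into Redshift.
spectrum_sql = """
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG DATABASE 'lake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';

SELECT event_date, COUNT(*)
FROM spectrum_schema.clickstream   -- Parquet files sitting in S3
GROUP BY event_date;
"""

# 2. Federated query: point Redshift SQL at a live PostgreSQL database.
federated_sql = """
CREATE EXTERNAL SCHEMA pg_schema
FROM POSTGRES DATABASE 'ordersdb' SCHEMA 'public'
URI 'orders.example.internal' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pgcreds';

SELECT order_id, placed_at
FROM pg_schema.orders
WHERE placed_at > '2020-01-01';
"""

# 3. UNLOAD to Parquet: write query results back to S3 so other services
#    (such as Amazon Athena) can read them.
unload_sql = """
UNLOAD ('SELECT * FROM daily_sales')
TO 's3://example-bucket/exports/daily_sales_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET;
"""

for name, sql in [("Spectrum", spectrum_sql),
                  ("Federated query", federated_sql),
                  ("UNLOAD", unload_sql)]:
    print(f"-- {name} --{sql}")
```

Together, these three pieces are what let Redshift act as the warehouse half of the lakehouse: Spectrum reads from the lake, federated queries reach into operational databases, and UNLOAD writes results back to S3 in an open format.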