
Google previews the BigLake Data Lakehouse service • The Register

Google has announced a Google Cloud preview of BigLake, a data lake storage service that it claims can remove data limits by combining data lakes and data warehouses.

BigLake is designed to address the problems created by the growing volumes and varied types of data now stored and maintained by organizations of all sizes. The motivation for storing all this data often comes down to “because it may prove useful”, the idea being that, if analyzed with the right tools, it will yield valuable insights that benefit the business.

Unveiled to coincide with Google’s Data Cloud Summit, BigLake enables organizations to unify their data warehouses and data lakes to analyze data without worrying about the underlying storage layer. This eliminates the need to duplicate or move data from its source to another location for processing and reduces costs and inefficiencies, Google said.

According to Google, traditional data architectures are unable to unlock the full potential of all stored data, while managing them across disparate data lakes and data warehouses creates silos and increases risk and cost for organizations. A data lake is basically a large collection of data that has been stored and can be a mixture of structured and unstructured formats, while a data warehouse is generally considered a repository of structured and filtered data.

Google said BigLake builds on years of experience developing BigQuery, its tool used to access data lakes on Google Cloud Storage, to enable what it calls an “open lakehouse” architecture.

This concept of a data “lakehouse” was pioneered in recent years by Snowflake or Databricks, depending on who you believe, and refers to a single platform capable of supporting all of an organization’s data workloads.

BigLake provides users with granular access controls, support for open file formats such as Parquet (an open-source column-oriented storage format designed for analytical queries), and support for open-source processing engines such as Apache Spark.
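
To illustrate why a column-oriented layout suits analytical queries, here is a minimal Python sketch. It is not Parquet itself and uses no Parquet library; the sample records and the `to_columns` helper are hypothetical, and the sketch assumes every row has the same fields. It shows only the general idea: once values are grouped by column, an aggregate over one field touches that field's values and nothing else.

```python
from typing import Any, Dict, List

# Hypothetical sample records, stored row by row.
rows: List[Dict[str, Any]] = [
    {"user": "ada", "country": "UK", "spend": 120.0},
    {"user": "bob", "country": "US", "spend": 80.0},
    {"user": "cyd", "country": "DE", "spend": 45.5},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list of values per column.

    Assumes all records share the same keys (an illustrative simplification).
    """
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An analytical query such as "total spend" reads a single column.
total_spend = sum(columns["spend"])
```

Real columnar formats add encoding, compression, and metadata on top of this layout, none of which is modeled here.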

Another new data-related feature announced by Google is Spanner change streams, which it says let users track changes in their Spanner database in real time. Spanner is Google’s distributed SQL database management and storage service, and the new feature tracks inserts, updates, and deletes in real time across a customer’s Spanner database.

This allows users to ensure that the most recent data updates are available for replication from Spanner to BigQuery for real-time analytics, or for other purposes such as triggering downstream application behavior using Pub/Sub.
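
The replication idea behind this kind of change tracking can be sketched in a few lines of Python. This is a simplified illustration and not Spanner's API or record format: the `ChangeRecord` class, its fields, and the `apply_changes` function are all hypothetical. The sketch shows how an ordered sequence of inserts, updates, and deletes can be replayed to keep a downstream copy in step with the source.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ChangeRecord:
    """Hypothetical change record; not the format Spanner emits."""
    op: str                                # "INSERT", "UPDATE", or "DELETE"
    key: str                               # primary key of the affected row
    values: Optional[Dict[str, Any]] = None


def apply_changes(
    replica: Dict[str, Dict[str, Any]],
    changes: Iterable[ChangeRecord],
) -> Dict[str, Dict[str, Any]]:
    """Replay changes, in order, against an in-memory replica."""
    for change in changes:
        if change.op == "INSERT":
            replica[change.key] = dict(change.values or {})
        elif change.op == "UPDATE":
            replica.setdefault(change.key, {}).update(change.values or {})
        elif change.op == "DELETE":
            replica.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return replica


changes = [
    ChangeRecord("INSERT", "u1", {"name": "Ada", "plan": "free"}),
    ChangeRecord("UPDATE", "u1", {"plan": "pro"}),
    ChangeRecord("INSERT", "u2", {"name": "Bob"}),
    ChangeRecord("DELETE", "u2"),
]

replica = apply_changes({}, changes)
```

In a real pipeline the replica would be an analytics table or a message topic rather than a dictionary, and ordering and delivery guarantees would need to be handled by the underlying services.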

Google also announced that Vertex AI Workbench is now generally available for its Vertex AI machine learning platform. This brings data and machine learning tools together in a single environment so users have access to a common set of tools across data analytics, data science, and machine learning.

According to Google, Vertex AI Workbench allows teams to build, train, and deploy machine learning models five times faster than with traditional notebooks. ®