
Ethereum Blockchain dataset now available in BigQuery for smart contract analytics

03 Sep 2018


In February this year, Google made the Bitcoin dataset publicly available for analysis in Google BigQuery. Along the same lines, on August 29th it announced that an Ethereum dataset is now available in BigQuery for smart contract analytics.

Like its predecessor Bitcoin, the Ethereum blockchain is an immutable distributed ledger. However, Vitalik Buterin, Ethereum's creator, extended its capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

Why make Ethereum blockchain data available on Google Cloud


Ethereum's peer-to-peer client software exposes an API for a subset of commonly used random-access functions, such as checking transaction status, looking up wallet-transaction associations, and checking wallet balances.
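To make these random-access functions concrete, here is a minimal Python sketch of two of them expressed as Ethereum JSON-RPC requests: eth_getBalance for a wallet balance and eth_getTransactionReceipt for transaction status. It only builds request bodies and parses responses; actually sending them would require an Ethereum node endpoint (for example, a Parity node), which is assumed and not shown. The helper names are hypothetical.

```python
import json
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10^18 wei


def rpc_payload(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body for an Ethereum node."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": request_id})


def balance_request(address: str) -> str:
    # eth_getBalance takes an address and a block parameter; "latest" means the newest block.
    return rpc_payload("eth_getBalance", [address, "latest"])


def receipt_request(tx_hash: str) -> str:
    # eth_getTransactionReceipt returns the receipt of a mined transaction, or null.
    return rpc_payload("eth_getTransactionReceipt", [tx_hash])


def parse_balance(response_body: str) -> Decimal:
    """Convert an eth_getBalance response (hex-encoded wei) into Ether."""
    return Decimal(int(json.loads(response_body)["result"], 16)) / WEI_PER_ETHER


def parse_status(response_body: str) -> bool:
    """True only if the receipt exists and its status field is 0x1 (success)."""
    receipt = json.loads(response_body)["result"]
    return receipt is not None and int(receipt["status"], 16) == 1
```

Each call answers one question about one address or transaction, which is why a per-item API like this is a poor fit for questions about the chain as a whole.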

There are, however, no API endpoints for easy access to the data stored on-chain, or for viewing blockchain data in aggregate. Below is an example chart showing the total Ether transferred and the average transaction cost, aggregated by day:

[Chart: total Ether transferred and average transaction cost, aggregated by day]

Source: Google


A visualization like this, backed by a database query, can inform business decisions ranging from prioritizing improvements to the Ethereum architecture itself to making balance sheet adjustments.

BigQuery's strong OLAP (online analytical processing) capabilities support this kind of analysis, both ad hoc and more general, without requiring any additional API implementation.
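As an illustration of the aggregation behind a chart like the one above, the following Python sketch computes, per day, the total Ether transferred and the average transaction cost from transactions held in memory. The Tx record and its field names are hypothetical simplifications; in BigQuery the same logic would be a GROUP BY on the transaction date with SUM and AVG. Values are in wei (10^18 wei = 1 Ether), and a transaction's cost is taken as gas price times gas used.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18


@dataclass
class Tx:
    """Hypothetical, simplified transaction record (field names are assumptions)."""
    block_timestamp: datetime
    value_wei: int      # Ether transferred, in wei
    gas_price_wei: int  # price paid per unit of gas, in wei
    gas_used: int       # gas actually consumed, from the transaction receipt


def daily_summary(txs):
    """Map each day to (total Ether transferred, average transaction cost in Ether)."""
    totals = defaultdict(Decimal)  # day -> summed value in Ether
    costs = defaultdict(list)      # day -> per-transaction costs in Ether
    for tx in txs:
        day = tx.block_timestamp.date()
        totals[day] += Decimal(tx.value_wei) / WEI_PER_ETHER
        # Cost (fee) of a transaction = gas price x gas used.
        costs[day].append(Decimal(tx.gas_price_wei * tx.gas_used) / WEI_PER_ETHER)
    # Days come back in chronological order, like ORDER BY on the date.
    return {day: (totals[day], sum(costs[day]) / len(costs[day])) for day in sorted(totals)}
```

Decimal is used instead of float so that wei-to-Ether conversions stay exact.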

Accordingly, Google built a software system on Google Cloud that:

  • Synchronizes the Ethereum blockchain to computers running Parity in Google Cloud.
  • Performs a daily extraction of data from the Ethereum blockchain ledger, including the results of smart contract transactions, such as token transfers.
  • Denormalizes the data and stores it in BigQuery, partitioned by date, for easy and cost-effective exploration.
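The denormalize-and-partition step can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Google's pipeline: denormalize copies each block's timestamp onto its transactions so later queries need no join, partition_by_date groups rows by day, and scan reads only the partitions inside a date range, mimicking how a date filter lets BigQuery skip partitions and bill for fewer bytes scanned. All function names and row layouts are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def denormalize(blocks, transactions):
    """Copy each block's timestamp onto its transactions (a join done once, at load time).

    blocks: {block_number: datetime}; transactions: dicts with a "block_number" key.
    """
    return [{**tx, "block_timestamp": blocks[tx["block_number"]]} for tx in transactions]


def partition_by_date(rows):
    """Group denormalized rows into per-day partitions keyed by the block's date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["block_timestamp"].date()].append(row)
    return dict(partitions)


def scan(partitions, start: date, end: date):
    """Read only partitions whose day falls in [start, end], mimicking partition pruning."""
    return [
        row
        for day, rows in sorted(partitions.items())
        if start <= day <= end
        for row in rows
    ]
```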


Google has also demonstrated a number of interesting queries and visualizations based on the Ethereum dataset. The analyses focus on three topics:

  1. Smart contract function calls
  2. On-chain transaction time-series and transaction networks
  3. Smart contract function analytics
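Two common building blocks for these topics can be sketched in Python. A contract call's input data starts with a 4-byte function selector (the first 4 bytes of the Keccak-256 hash of the function signature), so counting selectors per contract shows which functions are called most; 0xa9059cbb, for instance, is the selector of the ERC-20 transfer(address,uint256) function. Counting (sender, receiver) pairs gives a weighted edge list for a transaction network. The row keys from_address, to_address, and input are assumed column names, so check them against the actual table schema; the helper names are hypothetical.

```python
from collections import Counter

# Selector of the ERC-20 transfer(address,uint256) function.
ERC20_TRANSFER = "0xa9059cbb"


def function_selector(input_data: str):
    """Return the 4-byte selector ("0x" + 8 hex digits), or None for plain Ether transfers."""
    if len(input_data) < 10:  # "0x" plus at least 8 hex digits is needed
        return None
    return input_data[:10].lower()


def count_calls(transactions):
    """Count calls per (contract address, function selector)."""
    counts = Counter()
    for tx in transactions:
        selector = function_selector(tx["input"])
        if selector is not None:
            counts[(tx["to_address"], selector)] += 1
    return counts


def transaction_edges(transactions):
    """Weighted directed edge list: (sender, receiver) -> number of transactions."""
    return Counter((tx["from_address"], tx["to_address"]) for tx in transactions)
```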


The Ethereum blockchain dataset is also available on Kaggle. You can query the live data in Kernels, Kaggle's free in-browser coding environment, using the BigQuery Python client library. The Ethereum ETL project on GitHub contains all the source code used to extract data from the Ethereum blockchain and load it into BigQuery.
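A minimal sketch of querying the dataset with the BigQuery Python client, for example from a Kaggle Kernel. The table path bigquery-public-data.ethereum_blockchain.transactions and the column names (block_timestamp, value, gas_price, receipt_gas_used, receipt_status) are assumptions; check them against the dataset's current schema. run_daily_query accepts any object with a query(sql).result() method, such as google.cloud.bigquery.Client, which keeps the sketch testable without network access. Assuming the table is partitioned on block_timestamp, the date filter also limits the scan to the relevant partitions.

```python
from datetime import date

# Assumed table path; the public dataset's layout may have changed since 2018.
TABLE = "bigquery-public-data.ethereum_blockchain.transactions"


def daily_sql(start: date, end: date) -> str:
    """Build the daily aggregation query; isoformat() yields safe 'YYYY-MM-DD' literals."""
    return f"""
SELECT
  DATE(block_timestamp) AS tx_date,
  SUM(value / POWER(10, 18)) AS total_ether,
  AVG(gas_price * receipt_gas_used / POWER(10, 18)) AS avg_tx_cost_ether
FROM `{TABLE}`
WHERE DATE(block_timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'
  AND receipt_status = 1  -- successful transactions only
GROUP BY tx_date
ORDER BY tx_date
"""


def run_daily_query(client, start: date, end: date):
    """Run the query with any client exposing query(sql).result(), e.g. bigquery.Client."""
    rows = client.query(daily_sql(start, end)).result()
    return [(row["tx_date"], row["total_ether"], row["avg_tx_cost_ether"]) for row in rows]


# Example usage (requires the google-cloud-bigquery package and credentials):
# from google.cloud import bigquery
# client = bigquery.Client()
# for tx_date, total, avg_cost in run_daily_query(client, date(2018, 8, 1), date(2018, 8, 31)):
#     print(tx_date, total, avg_cost)
```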

Read more about this news on the Google Cloud blog.
