World first: A Netflix series successfully stored in DNA

DNA data storage
Image licensed to author

The convergence of technology with biological sciences is not a new phenomenon; there have been numerous examples in recent history where technology has played a pivotal role in driving fundamental breakthroughs in biological science.

Three notable advancements come to mind:

The brainchild of Tesla CEO Elon Musk, Neuralink is a biomedical device that is implanted in the human skull and can communicate with the brain.

The initial goal of our technology will be to help people with paralysis to regain independence through the control of computers and mobile devices.
Neuralink.com

If you’ve not seen it, the live demonstration…


From BigQuery truncate table to dynamic SQL support, we cover 12 user-friendly BigQuery features released by Google Cloud.

Image licensed to author

We were excited to see the power of BigQuery receive a further boost this week with the release of 12 new BigQuery SQL features. Google Cloud describes these as “user-friendly SQL capabilities”. So, let’s take a look at what’s now possible.

1. Add table columns via DDL

New to BigQuery is the ability to add new columns via the ALTER TABLE DDL statement. This is something data professionals with a background in traditional on-prem database platforms would expect as standard, so it is nice to see that Google Cloud has acknowledged this.

alter table mydataset.mytable
add column a string,
add column if not exists b geography,
add column c array<numeric>,
add column…


From SQL Server to Google BigQuery: Learn why I love common table expressions.

CTE SQL
Image licensed to author

This is my 20th year working with SQL (eek!). To mark this milestone, I recently wrote an article on the 10 SQL naming standards I live by, which, I feel, produce SQL that is easy to read, debug, and maintain.

Following this theme, I thought I’d explore some of the other elements of the SQL language that, I feel, help differentiate great SQL from merely functional SQL.

One element at the top of my list is the common table expression, or CTE for short. …
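As a rough illustration of what a CTE looks like in practice, here is a minimal sketch. It uses Python’s built-in sqlite3 module rather than SQL Server or BigQuery, and the orders table, its columns, and the top_customers helper are hypothetical names chosen for this example only.

```python
import sqlite3


def top_customers(min_total):
    """Return (customer, total) pairs whose total spend is at least min_total."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, created only for this sketch.
    conn.execute("create table orders (customer text, amount real)")
    conn.executemany(
        "insert into orders values (?, ?)",
        [("alice", 40.0), ("bob", 15.0), ("alice", 25.0), ("carol", 60.0)],
    )
    # The CTE (customer_totals) names an intermediate result, so the final
    # select reads top to bottom instead of nesting a subquery.
    query = """
        with customer_totals as (
            select customer, sum(amount) as total_amount
            from orders
            group by customer
        )
        select customer, total_amount
        from customer_totals
        where total_amount >= ?
        order by total_amount desc, customer
    """
    rows = conn.execute(query, (min_total,)).fetchall()
    conn.close()
    return rows
```

Calling top_customers(50) filters on the aggregated totals; the same filter written without a CTE would need a nested subquery or a HAVING clause.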


I explore BigQuery’s new UI, just released in preview

BigQuery new UI 2021
Image licensed to author

This was an exciting week for users of Google BigQuery: a much-anticipated UI update was made available in public preview.

Having used the new UI in anger for the past week, I thought I’d share my top 10 likes about the new look.

Readers should note that the new UI is in preview, and therefore likely to further improve before being officially released to (most likely) beta.

So, in no particular order, let’s begin.

#1 new layout, new panels

So as a reminder, here’s what the previous UI looked like; here I have run a simple query against a copy I made of the…


SQL tuning tips and advice to help reduce BigQuery execution time and costs. Start 2021 off on the right foot!

BigQuery performance tips
Image licensed to author

Google BigQuery, like other modern hyper-scale data platforms, has a different architecture to what many data professionals and data scientists are used to; it stores its data in columns instead of rows (referred to as a column-store), and processes SQL queries in a fully distributed architecture.

It is these unique properties that enable the platform to achieve such high levels of query performance.
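To make the row-versus-column distinction concrete, here is a small sketch in plain Python. It is only an analogy for the storage layout, not how BigQuery is implemented, and the sample data and function names are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "country": "UK", "revenue": 10.0},
    {"user_id": 2, "country": "US", "revenue": 25.5},
    {"user_id": 3, "country": "UK", "revenue": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records, name):
    # Every whole record is visited, even though only one field is needed.
    return sum(record[name] for record in records)


def total_from_columns(columns, name):
    # Only the single requested column is read.
    return sum(columns[name])


columns = to_columns(rows)
```

Both totals agree, but the column-oriented version reads only the revenue values. This is the intuition behind the common advice to select only the columns a query needs.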

If, like me, you came to BigQuery from an MS SQL background, you may find that a lot of what you knew about writing efficient SQL queries no longer applies. …


Make a New Year’s resolution for 2021: easy-to-read, maintainable SQL

SQL naming standards
Image licensed to author

This is my 20th year working with SQL (eek!), and I’ve shared my 10 key learnings to help make sure your SQL code is easy to read, debug, and maintain.

The key to success is, of course, to enforce these (or your own standards) across your enterprise. Tip 10 discusses how you can do this.

So, in no particular order, let’s begin.

#1 Choose a CASE and Stick to_it

If I was given £1 every time I saw something like this, I think I’d be sitting on a tidy sum:

select first employee_first_name,
surname employee_last_name…


We use a common example: un-nesting Firebase event data to facilitate data science analysis

BigQuery UNNEST
Image licensed to author

We frequently apply machine learning techniques to event-based data in order to generate high-value insights, from predicting customer churn based on people’s recent activity on, say, a subscription-based product, to scoring how engaged a user is with a piece of content in order to feed a recommendation engine.

Event-based data is a good example of semi-structured data.

Structured data is data that adheres to a rigid tabular format. This makes it ideal for storing in, say, a database table or a spreadsheet.

Semi-structured data, however, has variations in its structure; attributes (equivalent to columns in a database table) are not fixed, and, to further…
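As a rough sketch of what un-nesting means, the plain-Python example below flattens nested event parameters into one row per parameter. It is an analogy for the idea rather than BigQuery’s UNNEST itself; the sample events are hypothetical, with a key/value layout modelled loosely on Firebase-style event parameters.

```python
# Hypothetical events: each one carries a different set of nested parameters,
# which is what makes the data semi-structured.
sample_events = [
    {
        "event_name": "screen_view",
        "event_params": [
            {"key": "screen", "value": {"string_value": "home", "int_value": None}},
        ],
    },
    {
        "event_name": "purchase",
        "event_params": [
            {"key": "item", "value": {"string_value": "book", "int_value": None}},
            {"key": "quantity", "value": {"string_value": None, "int_value": 2}},
        ],
    },
]


def unnest_event_params(events):
    """Return one flat row per (event, parameter) pair."""
    flat_rows = []
    for event in events:
        for param in event.get("event_params", []):
            # Each parameter stores its value under exactly one typed field;
            # take the first field that is populated.
            value = next(
                (v for v in param["value"].values() if v is not None), None
            )
            flat_rows.append(
                {"event_name": event["event_name"], "key": param["key"], "value": value}
            )
    return flat_rows


flat_rows = unnest_event_params(sample_events)
```

The two events become three flat rows with a fixed set of attributes, a shape that fits a rigid tabular format.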


We train an AutoML image classification model for Kaggle’s latest competition. See how it ranks against human Data Scientists.

Image licensed to author

Kaggle recently (end Nov 2020) released a new data science competition, centered around identifying diseases on the Cassava plant, a root vegetable widely farmed in Africa.

“As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.”
From Kaggle.com Cassava Leaf Disease Classification

The…


We show you how in easy-to-follow steps

Importing Kaggle data into Google Cloud Storage
Image licensed to author

So, you have picked your Kaggle competition, and you want to start training your model and make yourself known on the Kaggle leaderboard.

If, like us, you use Google Cloud AI Platform for your data science workloads, one of the first steps in a Kaggle competition is to upload the Kaggle training data into Google Cloud Storage.

We will be using this recently announced (late Nov 2020) Kaggle competition as an example.

This particular competition is an image classification problem with circa 20k training images (jpeg files) in Kaggle. …


In the third article of the series, we explore Google’s (human) Data Labelling Service for Advanced Video Labelling

Google Cloud AI Platform Data labeling Service
(Photo by Griffin Wooldridge on Unsplash)

In Part 1, we explored how Google’s (human) Data Labelling Service could assist in image labelling.

In this article, we will be exploring how this service can assist with video labelling. We include a hands-on example for you to follow along.

Google Cloud AI Platform

Google AI Platform is a suite of services on Google Cloud specifically targeted at the building, deploying, and managing of machine learning models in the cloud.

If you are not familiar with Google AI Platform, you may want to read our first article in the series, where we present an overview of what’s available on the platform.

What do we mean by labelling?

Labelling is…

James Green

Head of Data, Analytics & AI @ Ancoris. Part of a highly committed team of data scientists, mathematicians & engineers delivering Google Cloud client solutions
