The convergence of technology with biological sciences is not a new phenomenon; there have been numerous examples in recent history where technology has played a pivotal role in driving fundamental breakthroughs in biological science.
Three notable advancements come to mind:
The brainchild of Tesla CEO Elon Musk, the Neuralink is a biomedical device that is inserted into the human skull and can communicate with the brain.
The initial goal of our technology will be to help people with paralysis to regain independence through the control of computers and mobile devices.
If you’ve not seen it, the live demonstration with a Neuralink device inserted into a pig (the company assures viewers that it takes the welfare of these animals seriously) is a fascinating, if slightly unnerving, watch. …
We were excited to see the power of BigQuery receive a further boost this week with the release of 12 new BigQuery SQL features. Google Cloud describes these as “user-friendly SQL capabilities”. So, let’s take a look at what’s now possible.
New to BigQuery is the ability to add new columns via the ALTER TABLE DDL statement. This is something data professionals with a background in traditional on-prem database platforms would expect as standard, so it is nice to see that Google Cloud has acknowledged this.
alter table mydataset.mytable
add column a string,
add column if not exists b geography,
add column c array<numeric>,
add column d date options(description="my…
This is my 20th year working with SQL (eek!). To mark this milestone, I recently wrote an article on the 10 SQL naming standards I live by, which, I feel, produce SQL that is easy to read, debug, and maintain.
Following this theme, I thought I’d explore some of the other elements of the SQL language that, I feel, help differentiate great SQL from merely functional SQL.
One element at the top of my list is the common table expression, or CTE for short. …
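As a rough illustration of what a CTE looks like in practice, the sketch below runs one against an in-memory SQLite database from Python. The orders table, its columns, and the top_spenders helper are all hypothetical names made up for this example, and SQLite stands in for whichever database you actually use. The named subquery customer_totals is defined once in the WITH clause and then queried like an ordinary table.

```python
import sqlite3


def top_spenders(min_total):
    """Return (customer, total_amount) rows whose order total is at least min_total."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, created only for this illustration.
    conn.executescript(
        """
        CREATE TABLE orders (customer TEXT, amount INTEGER);
        INSERT INTO orders VALUES
            ('alice', 30), ('alice', 50), ('bob', 20), ('carol', 90);
        """
    )
    query = """
        WITH customer_totals AS (
            -- The CTE: aggregate once, give the result a readable name.
            SELECT customer, SUM(amount) AS total_amount
            FROM orders
            GROUP BY customer
        )
        SELECT customer, total_amount
        FROM customer_totals
        WHERE total_amount >= ?
        ORDER BY total_amount DESC, customer
    """
    rows = conn.execute(query, (min_total,)).fetchall()
    conn.close()
    return rows


print(top_spenders(80))
```

Naming the aggregation step keeps the final SELECT short, and the same CTE could be referenced more than once without repeating the subquery.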
This was an exciting week for users of Google BigQuery: a much-anticipated UI update was made available in public preview.
Having used the new UI in anger for the past week, I thought I’d share my top 10 likes on the new look.
Readers should note that the new UI is in preview, and therefore likely to further improve before being officially released to (most likely) beta.
So, in no particular order, let’s begin.
So as a reminder, here’s what the previous UI looked like; here I have run a simple query against a copy I made of the London Cycle Hire Scheme (a public dataset available from Google). …
Google BigQuery, like other modern hyper-scale data platforms, has a different architecture to what many data professionals and data scientists are used to; it stores its data in columns instead of rows (referred to as a column-store), and processes SQL queries in a fully distributed architecture.
It is these unique properties that enable the platform to achieve such high levels of query performance.
If, like me, you came to BigQuery from an MS SQL background, you may find that a lot of what you knew about writing efficient SQL queries no longer applies. …
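To make the row-versus-column distinction concrete, here is a toy sketch in Python. It is not how BigQuery is implemented; the records, field names, and helper functions are invented for illustration. It simply shows that when data is laid out by column, an aggregate over one field only needs to read that field's values.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"ride_id": 1, "station": "A", "duration_s": 600},
    {"ride_id": 2, "station": "B", "duration_s": 1200},
    {"ride_id": 3, "station": "A", "duration_s": 300},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_duration_rows(records):
    """Row layout: every whole record is visited to read one field."""
    return sum(record["duration_s"] for record in records)


def total_duration_columns(cols):
    """Column layout: only the one column that is needed is read."""
    return sum(cols["duration_s"])


columns = to_columns(rows)
print(total_duration_rows(rows), total_duration_columns(columns))
```

Both functions return the same total; the difference is how much data each has to touch, which is one reason selecting only the columns you need tends to matter more on a column-store.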
This is my 20th year working with SQL (eek!), and I’ve shared my 10 key learnings to help make sure your SQL code is easy to read, debug, and maintain.
The key to success is, of course, to enforce these standards (or your own) across your enterprise. Tip 10 discusses how you can do this.
So, in no particular order, let’s begin.
If I was given £1 every time I saw something like this, I think I’d be sitting on a tidy sum:
select first employee_first_name,
CASE WHEN employment = 1 THEN 'FT' WHEN employment = 2 THEN 'PT' ELSE 'T' END
'Y' AS isValid
We frequently apply machine learning techniques to event-based data in order to generate high-value insights: from predicting customer churn using people’s recent activity on, say, a subscription-based product, to scoring how engaged a user is with a piece of content for feeding into a recommendation engine.
Event-based data is a good example of semi-structured data.
Structured data is data that adheres to a rigid tabular format. This makes it ideal for storing in, say, a database table or a spreadsheet.
Semi-structured data, however, has variations in its structure; attributes (the equivalent of columns in a database table) are not fixed, and, to further complicate things, the data itself can be nested. …
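A small Python sketch may help picture this. The two events below are made-up examples: they share some attributes, differ in others, and each carries a nested object. The hypothetical flatten helper turns nested keys into dotted column names, which is one simple way (among several) to move such data towards a tabular shape.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level using dotted key names."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical events: attributes vary from one event to the next.
events = [
    {"user_id": 1, "event": "page_view",
     "page": {"title": "Home", "section": "news"}},
    {"user_id": 2, "event": "purchase",
     "order": {"value": 25, "currency": "GBP"}},
]

flat_events = [flatten(event) for event in events]

# The union of keys gives the columns a table would need to hold every event.
columns = sorted({key for event in flat_events for key in event})
print(columns)
```

Any event that lacks one of these columns would simply have no value for it, which is exactly the variation a rigid table struggles with.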
Kaggle recently (end Nov 2020) released a new data science competition, centered around identifying diseases on the Cassava plant, a root vegetable widely farmed in Africa.
“As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.”
From Kaggle.com Cassava Leaf Disease Classification
The challenge: train a multi-class image classification model to classify images of the Cassava plant into one of five…
So, you have picked your Kaggle competition, and you want to start training your model and make yourself known on the Kaggle leaderboard.
If, like us, you use Google Cloud AI Platform for your data science workloads, one of the first steps in a Kaggle competition is to upload the Kaggle training data into Google Cloud Storage.
We will be using this recently announced (late Nov 2020) Kaggle competition as an example.
This particular competition is an image classification problem with circa 20k training images (JPEG files) on Kaggle. …
In Part 1, we explored how Google’s (human) Data Labelling Service could assist in image labelling.
In this article, we will be exploring how this service can assist with video labelling. We include a hands-on example for you to follow along.
Google AI Platform is a suite of services on Google Cloud specifically targeted at building, deploying, and managing machine learning models in the cloud.
If you are not familiar with Google AI Platform, you may want to read our first article in the series, where we present an overview of what’s available on the platform.
Labelling is a data science activity to support the training of supervised machine learning models. The term supervised is a direct reference to how these models rely on accurately labelled training data in order for them to learn. …