If you work in the Geospatial Machine Learning (or AI) domain, you’ll know that embeddings are all the rage at the moment. I used the meme below in an internal presentation a few months ago, but I think it is relevant here:
But what exactly are embeddings?
What are embeddings?
There is a little bit of heterogeneity in embeddings, but effectively they are compressed representations of data that have been “embedded” into some feature space, usually by a learned model. This is not unique to geospatial data, and can be done for text, images, videos, graphs, or whatever data structure you can train a model for. You might have heard of RAG (retrieval augmented generation) in the language domain, where documents are embedded by a language model, and the embeddings are stored in some way (often a vector database) from which they can be retrieved via a semantic search. A search engine’s image search functionality often works in a similar fashion, but using image models.
For the geospatial case, embeddings are created by training a foundation model on large amounts of (usually) satellite data, and then embeddings are created for a large corpus of satellite data. Some examples are the embeddings created using the Clay Foundation model, the EarthIndex embeddings created by EarthGenome, the Tessera Embeddings, the Major TOM embeddings by ESA, and the ones I used today, the AlphaEarth Foundations dataset by Google DeepMind.
There are some differences between all these embeddings: the sources of data are different (most use at least Sentinel-2, but often combine it with other data), and some do inference on image patches (e.g. 224x224 pixel patches of the Earth), while some do it per pixel, i.e. release an embedding for every 10x10 meters on Earth. The AlphaEarth dataset does the latter.
From here on, when I talk about embeddings I will generally mean “embeddings created using a geospatial foundation model on global satellite data”.
Why embeddings?
Once you have embeddings, there is actually a lot of fun stuff you can do with them. This tutorial on the Google Earth Engine page goes through a couple of examples, ranging from similarity search, to supervised and unsupervised classification.
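To make the similarity-search idea concrete, here is a minimal sketch with numpy. The embeddings here are random toy vectors, not real AlphaEarth data; the point is just that once everything is unit-normed, ranking by cosine similarity to a query embedding is a single matrix-vector product.

```python
import numpy as np

# Toy setup: five embeddings in a 4-dimensional feature space
# (real geospatial embeddings are typically 64+ dimensions).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 4))
# Unit-norm each embedding so cosine similarity reduces to a dot product.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Use the first embedding as the query and rank all embeddings by similarity.
query = embeddings[0]
scores = embeddings @ query          # cosine similarity to the query
ranking = np.argsort(-scores)        # most similar first

print(ranking)  # index 0 (the query itself, similarity 1.0) ranks first
```

The same pattern scales to millions of pixels, which is why vector databases and approximate nearest-neighbor indexes are so common in this space.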
Essentially, a lot of work goes into training these large foundation models, and doing inference with these models is not cheap either. See this GitHub issue I stumbled upon where the Clay team discuss what it would cost to do an inference run for the entire world:
| chip size (at 10 m/px) | throughput (km²/$/h/worker) | cost to run the world |
|---|---|---|
| 50x50 px | 1000 | $510K |
| 100x100 px | 4000 | $127K |
But once you have created the embeddings dataset, the cost is amortized, and everything you do downstream can benefit from the work that went into creating the embeddings. This lets you do large-scale calculations for a fraction of what it would cost to run the encoder every time.
30DayMapChallenge: Day 23 - Process
Being a geospatial company, we at LiveEO are doing the 30DayMapChallenge. I’ve wanted an excuse to play around with embeddings a little bit more, so I thought I’d pick day 23. The theme of today is to document the process of making the map, hence this blog post.
What I wanted to do is to see how I could use embeddings to extract some information that can be mapped. What I’m doing today is creating a change map of Berlin for the earliest and latest dates available in the AlphaEarth embeddings dataset, i.e. 2017 and 2024.
I did this by calculating (one minus) the cosine similarity between all pixels that overlap with Berlin for 2017 and 2024, and using that to see what has changed.
\[ \text{cosine similarity}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \sqrt{\sum_{i=1}^{n} B_i^2}} \]
Conceptually, embeddings that are similar will have a high cosine similarity, and embeddings that are very different will have a low cosine similarity, so by doing this we can find the places where the embeddings between 2017 and 2024 have changed the most. As the AlphaEarth embeddings have been unit-normed, the denominator above is 1, and the formula simplifies to just the dot product, which is pretty cheap to calculate.
Plotted, this looks like:
As you can see, the areas with high change stand out as bright spots, while the areas with low change stay dark.
A few highlights
Adlershof
I studied in Adlershof (it hosts the science campus of HU-Berlin), and during the time I was there a lot of construction happened. Looking at the map, you can clearly see a cluster of large changes in that area: