In an earlier post I have talked about the usefulness of NumPy’s masked arrays. Now here’s a simple implementation, which I put together after getting a bit unstuck while using them myself.

**Level 3 Global mapped SMOS Ocean Salinity** dataset accessed from the Centre Aval de Traitement des Donnees SMOS (CATDS).

*10 day composite SMOS ocean surface salinity (11/11/2011 - 20/11/2011)*

## The task

**To convert this data from NETCDF with integer values into a scaled GeoTIFF with real salinity values for easy use and visualisation in a desktop GIS.** Data centres will store grids of integer values to cut down on data storage volume. So a simple linear scaling can be applied to convert them into their true float values.

However, the areas of land and sea ice are classed as no data with a specific value of -32767 which needs to be ignored in the calculation with the use of a masked array. Sounds simple, but this is where I encountered…

## … The problem

**The masked values turn into zeros after performing the linear scaling.** So my resultant GeoTIFF had values of zero rather than -32767 in areas of land and sea ice.

## The solution

**Convert the input integer array to a float array before creating the masked array.** Here’s the function I wrote. Firstly I converted my data into a NumPy array in my usual way, taking acount of projection and transform information. The key is then to change the input array to data type float before creating a masked array.

```
import numpy as np
import osgeo.gdal as gdal
import osgeo.osr as osr
def smos_os_convert(infile,outfile):
"""
Converts SMOS ocean salinity L3 mapped products to scaled tifs
linear scaling equation: salinity = data * 0.001 + 30
"""
gdal.AllRegister()
nodata = -32767
slope = 0.001
inter = 30
proj = "+proj=cea +lon_0=0 +lat_ts=30 +x_0=0 +y_0=0 \
+a=6371228 +b=6371228 +units=m +no_defs"
outproj = osr.SpatialReference()
outproj.ImportFromProj4(proj)
outgeo = [-17334193.9437, 50171.32835, 0, 7338939.4596, 0.0, -50171.32835]
os = gdal.Open(infile)
osarr = os.ReadAsArray()
[cols, rows] = osarr.shape
# Here is where I define the arrays data type as float.
osarr = np.array(osarr, dtype=np.float32)
# Then you can create the masked array...
m = np.ma.masked_values(osarr, nodata)
# ... and very quickly apply the linear scaling, ignoring the no data pixels.
scaled = m * slope + inter
# And finally save out the GeoTIF
outdata = gdal.GetDriverByName("GTiff")
dst_ds = outdata.Create(outfile, rows, cols, 1, gdal.GDT_Float64)
band = dst_ds.GetRasterBand(1)
band.SetNoDataValue(nodata)
dst_ds.SetGeoTransform(outgeo)
dst_ds.SetProjection(outproj.ExportToWkt())
band.WriteArray(scaled)
```

## To illustrate this further

First we convert the GDAL dataset to a NumPy array. `os`

is the GDAL dataset.

```
>>> osarr = os.ReadAsArray()
>>> osarr
array([[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
...,
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767]], dtype=int16)
```

Then we create the masked array.

```
>>> m = np.ma.masked_values(osarr, nodata)
>>> m
masked_array(data =
[[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]
...,
[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]
[-- -- -- ..., -- -- --]],
mask =
[[ True True True ..., True True True]
[ True True True ..., True True True]
[ True True True ..., True True True]
...,
[ True True True ..., True True True]
[ True True True ..., True True True]
[ True True True ..., True True True]],
fill_value = -32767)
>>> m.data
array([[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
...,
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767],
[-32767, -32767, -32767, ..., -32767, -32767, -32767]], dtype=int16)
```

And perform the linear scaling.

`>>> scaled = m * slope + inter`

Now we look at the output data, and see that all the no data values have turned into zeros, even though they should have been ignored in the calculation…

```
>>> scaled.data
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
```

… even though areas of real data do scale (picking a pixel in the sea).

```
>>> scaled.data[100,100]
34.664999999999999
```

So instead we add this line to define the array as datatype float.

```
>>> osarr = np.array(osarr, dtype=np.float32)
>>> m = np.ma.masked_values(osarr,nodata)
>>> m.data
array([[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
...,
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.]], dtype=float32)
>>> scaled = m * slope + inter
```

And now the scaled output no data values remain as they should

```
>>> scaled.data
array([[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
...,
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.],
[-32767., -32767., -32767., ..., -32767., -32767., -32767.]], dtype=float32)
>>> scaled.mask
array([[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True]], dtype=bool)
```

So when we write this to file and we get a dataset with all areas of land remaining with the no data value of -32767.