Using rasterio and /vsis3 to read from S3

rasterio s3
4 June 2019 10:37
Dear All,

I am trying to read from S3 using rasterio and the /vsis3 driver (part of GDAL, on which rasterio is based), using the following script:

import boto3
import rasterio
from rasterio.session import AWSSession
import numpy as np

BUCKET = 'DIAS'

KEY='Sentinel-2/MSI/L2A/2018/08/09/S2A_MSIL2A_20180809T105031_N0208_R051_T31TCG_20180809T141746.SAFE/GRANULE/L2A_T31TCG_A016350_20180809T105627/IMG_DATA/R10m/T31TCG_20180809T105031_B03_10m.jp2'

access_key='anystring'
secret_key='anystring'

session = boto3.Session(aws_access_key_id=access_key, aws_secret_access_key=secret_key)

with rasterio.Env(AWSSession(session), AWS_S3_ENDPOINT='data.cloudferro.com', AWS_HTTPS='NO') as env:
    print(env.options)
    with rasterio.open('/vsis3//{}/{}'.format(BUCKET, KEY), 'r') as ds:
        array = ds.read(1)

print(array.shape)

This fails with the following error:
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 198, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer

rasterio._err.CPLE_FileIOError: Cannot open file '/vsis3//DIAS/Sentinel-2/MSI/L2A/2018/08/09/S2A_MSIL2A_20180809T105031_N0208_R051_T31TCG_20180809T141746.SAFE/GRANULE/L2A_T31TCG_A016350_20180809T105627/IMG_DATA/R10m/T31TCG_20180809T105031_B03_10m.jp2/product.xml'

i.e. for some unknown reason, rasterio seems to look for a product.xml file, which does not exist.

If I change to the use of /vsis3_streaming I get the following error:
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 198, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 64, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsis3_streaming//DIAS/Sentinel-2/MSI/L2A/2018/08/09/S2A_MSIL2A_20180809T105031_N0208_R051_T31TCG_20180809T141746.SAFE/GRANULE/L2A_T31TCG_A016350_20180809T105627/IMG_DATA/R10m/T31TCG_20180809T105031_B03_10m.jp2' not recognized as a supported file format.

i.e. this time, it tries to access the right file, but can't recognise the format (same happens if I provide a known GeoTIFF file).

Is anyone working on this and encountering similar problems? rasterio and S3 access is extremely poorly documented, so any clues would help.
My mid-term goal is to work with VRT files that can handle /vsis3.

Guido

RE: Using rasterio and /vsis3 to read from S3
4 June 2019 11:45 as a reply to Guido Lemoine.
Dear Guido,

We have redirected your issue to our developers and data specialists.
You should receive feedback once they have analysed it.

Best regards,
Mateusz Makowski

RE: Using rasterio and /vsis3 to read from S3
4 June 2019 12:19 as a reply to Guido Lemoine.
Dear Sir,

Could you take a look at the code below? It should work. The problem is caused by:
  • the double // after /vsis3 in the rasterio.open call: with rasterio.open('/vsis3//{}/{}'.format(BUCKET, KEY), 'r') as ds:
  • the missing AWS_VIRTUAL_HOSTING option

Regards
Paweł Markowski

import boto3
import rasterio
from rasterio.session import AWSSession
import numpy as np

BUCKET = 'DIAS'

KEY = 'Sentinel-2/MSI/L2A/2018/08/09/S2A_MSIL2A_20180809T105031_N0208_R051_T31TCG_20180809T141746.SAFE/GRANULE/L2A_T31TCG_A016350_20180809T105627/IMG_DATA/R10m/T31TCG_20180809T105031_B03_10m.jp2'

access_key = 'anystring'
secret_key = 'anystring'

session = boto3.Session(aws_access_key_id=access_key,
                        aws_secret_access_key=secret_key)

with rasterio.Env(AWSSession(session), AWS_S3_ENDPOINT='data.cloudferro.com', AWS_HTTPS='NO', AWS_VIRTUAL_HOSTING='FALSE') as env:
    print(env.options)
    with rasterio.open('/vsis3/{}/{}'.format(BUCKET, KEY), 'r') as ds:
        array = ds.read(1)

print(array.shape)
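
For context on why the extra option matters: as far as I understand GDAL's /vsis3 documentation, AWS_VIRTUAL_HOSTING decides whether the bucket name goes into the host name (virtual-hosted style, the default) or into the path (path style). The sketch below only illustrates the two URL shapes. The helper name s3_request_url and the key 'some/key.jp2' are made up for this illustration and are not part of GDAL or rasterio; the real request construction inside GDAL also involves signing and other details.

```python
def s3_request_url(endpoint, bucket, key, virtual_hosting, https=False):
    """Illustrative only: approximate the URL shape of an S3 object request."""
    scheme = "https" if https else "http"
    if virtual_hosting:
        # Virtual-hosted style: the bucket becomes part of the host name.
        return "{}://{}.{}/{}".format(scheme, bucket, endpoint, key)
    # Path style: the bucket is the first component of the path.
    return "{}://{}/{}/{}".format(scheme, endpoint, bucket, key)


# 'some/key.jp2' is a placeholder key, not a real object.
print(s3_request_url("data.cloudferro.com", "DIAS", "some/key.jp2", virtual_hosting=True))
print(s3_request_url("data.cloudferro.com", "DIAS", "some/key.jp2", virtual_hosting=False))
```

With AWS_VIRTUAL_HOSTING='FALSE' the bucket stays in the path, which appears to be the form this endpoint expects.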

RE: Using rasterio and /vsis3 to read from S3
4 June 2019 12:40 as a reply to Paweł Markowski.
Thanks. This works well. The combination of the missing parameter and the single / after /vsis3 is essential.
AWS_VIRTUAL_HOSTING=False also works, btw.
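
Since the stray second slash was half of the problem, one way to avoid it is to build the path in a small helper instead of formatting it by hand. This is a minimal sketch, not something rasterio provides: vsis3_path is a hypothetical name, and the key in the example call is a placeholder.

```python
def vsis3_path(bucket, key, streaming=False):
    """Build a GDAL virtual path of the form /vsis3/<bucket>/<key>."""
    prefix = "/vsis3_streaming" if streaming else "/vsis3"
    # Drop surrounding slashes so exactly one '/' separates each part.
    bucket = bucket.strip("/")
    key = key.lstrip("/")
    if not bucket or not key:
        raise ValueError("bucket and key must both be non-empty")
    return "{}/{}/{}".format(prefix, bucket, key)


# 'Sentinel-2/example.jp2' is a placeholder key, not a real object.
print(vsis3_path("DIAS", "Sentinel-2/example.jp2"))
```

The returned string can be passed to rasterio.open inside the rasterio.Env block from the reply above.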