File storage

File storage in InvenioRDM involves two separate concepts. The first is the backend: the technology that actually stores the files, for example the local file system or S3. You can find more information about storage backends in the customize section.

The second is the origin of the files, or the method used to transport them. InvenioRDM defines three types:

  • Local: files that are managed by the InvenioRDM instance, independently of the backend.
  • Fetch: files that are not yet managed by the instance but will be transported to it. They eventually become local files.
  • Remote: files represented by a reference to an external storage system. Because the instance does not manage these files, it cannot guarantee their availability or integrity. This type of file is currently not supported by InvenioRDM.

These file types are stored in the storage_class attribute of the file model and are represented by a one-character code:

Type    Representation
Local   L
Fetch   F
Remote  R

Local files (L)

Local files are managed as defined in the records and drafts reference section.

Files fetching (F)

Introduced in InvenioRDM v11

Experimental feature

The file fetching mechanism in InvenioRDM v11 has a few limitations. Be aware that future releases of InvenioRDM might introduce breaking changes. We will document them as extensively as possible.

Use it at your own risk!

Compared to local files, fetched files accept two additional arguments on initialization: storage_class and uri.

Parameters

Name           Type    Location  Description
storage_class  string  body      Storage class; "F" for a file to be fetched
uri            string  body      URL to fetch the file from

The uri must be a URL that is accessible from the server's network and resolves to a file that can be fetched. No authentication mechanism (e.g. an Authorization header) is supported for the fetch request, so any authentication has to be part of the URL itself (e.g. a token passed in the query string).

Request

POST /api/records/{id}/draft/files HTTP/1.1
Content-Type: application/json

[
    {
        "key": "dataset.zip",
        "uri": "https://example.org/files/dataset.zip?token=<auth token>",
        "storage_class": "F"
    },
    ...
]
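
As a rough illustration, the request above could be assembled in Python with the standard library. This is a minimal sketch, not an official client: the function name build_fetch_request, the instance URL, the draft id and the TOKEN placeholder are all assumptions made for the example, and any credentials your instance requires for API calls are left out.

```python
import json
import urllib.request


def build_fetch_request(base_url, record_id, files):
    """Build (but do not send) the POST request that registers files to fetch.

    `files` is a list of (key, uri) pairs. `base_url` and `record_id` are
    placeholders for your own instance and draft.
    """
    payload = [
        {"key": key, "uri": uri, "storage_class": "F"}
        for key, uri in files
    ]
    return urllib.request.Request(
        url=f"{base_url}/api/records/{record_id}/draft/files",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_fetch_request(
    "https://inveniordm.example.org",  # hypothetical instance URL
    "abcde-12345",  # hypothetical draft id
    [("dataset.zip", "https://example.org/files/dataset.zip?token=TOKEN")],
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request),
# after adding whatever credentials the instance requires.
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the server.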

Response

HTTP/1.1 201 CREATED
Content-Type: application/json

{
  "enabled": true,
  "default_preview": null,
  "order": [],
  "entries": [
    {
      "key": "dataset.zip",
      "updated": "2020-11-27 11:17:11.002624",
      "created": "2020-11-27 11:17:10.998919",
      "metadata": null,
      "status": "pending",
      "storage_class": "F",
      "uri": "https://example.org/files/dataset.zip?token=<auth token>",
      "links": {
        "content": "/api/records/{id}/draft/files/dataset.zip/content",
        "self": "/api/records/{id}/draft/files/dataset.zip",
        "commit": "/api/records/{id}/draft/files/dataset.zip/commit"
      }
    }
  ],
  "links": {
    "self": "/api/records/{id}/draft/files"
  }
}

At this point an asynchronous task is launched and the file is transported into the InvenioRDM instance. Once the transfer completes, the status field changes to completed and the storage_class of the file changes to L. The status can be checked using the files URL (/api/records/{id}/draft/files). Note that the record cannot be published until all files have been transferred (i.e. their status is completed).

Moreover, while files are being transferred, requests to the content and commit endpoints are not allowed (disabled).
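
Because a draft cannot be published until every file is completed, a client typically polls the files URL before publishing. The following is a minimal sketch of such a loop. The helper names pending_files and wait_for_transfers are hypothetical, and get_listing stands for a caller-supplied function that performs GET /api/records/{id}/draft/files and returns the decoded JSON. The sketch treats every status other than completed as still pending and does not handle failed transfers.

```python
import time


def pending_files(listing):
    """Return the keys of entries whose transfer has not completed yet."""
    return [
        entry["key"]
        for entry in listing.get("entries", [])
        if entry.get("status") != "completed"
    ]


def wait_for_transfers(get_listing, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until every file is completed; return True on success.

    Returns False if files are still pending after `max_attempts` polls.
    """
    for attempt in range(max_attempts):
        if not pending_files(get_listing()):
            return True
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    return False


# Simulated responses: the first poll is still pending, the second is done.
responses = iter([
    {"entries": [{"key": "dataset.zip", "status": "pending"}]},
    {"entries": [{"key": "dataset.zip", "status": "completed"}]},
])
done = wait_for_transfers(lambda: next(responses), sleep=lambda _: None)
```

Passing get_listing and sleep as arguments keeps the loop independent of any particular HTTP client and lets it be exercised without a running instance.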

Security

By default, file fetching is refused. Files can only be fetched from a list of trusted domains, which is configured in the invenio.cfg file.

RECORDS_RESOURCES_FILES_ALLOWED_DOMAINS = [
    "example.org",
    "mystoragehosting.com",
]
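
A client can run a similar check before submitting a uri, to avoid requests that the server would refuse. The sketch below is illustrative only and is not the server's implementation: the function is_allowed is hypothetical, and exact host matching (no subdomains, port ignored) is an assumption, since this page does not describe the matching rules.

```python
from urllib.parse import urlparse

# Mirrors the example configuration above.
ALLOWED_DOMAINS = ["example.org", "mystoragehosting.com"]


def is_allowed(uri, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the URI's host is exactly one of the allowed domains.

    Assumption: exact host match only; subdomains are not accepted.
    """
    host = urlparse(uri).hostname  # lower-cased host without port, or None
    return host is not None and host in allowed_domains


ok = is_allowed("https://example.org/files/dataset.zip?token=TOKEN")
rejected = is_allowed("https://untrusted.example.com/dataset.zip")
```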

Remote files (R)

Not supported

Remote files are currently not supported.