Download artifact metadata

Calling the track_files() method creates an Artifact field that contains metadata about the tracked files. The field can reference either a single file or a collection of files.

This guide shows how you can fetch metadata from the artifact field.

Assumptions

In this guide, we assume the following file structure:

.
|-- datasets/
    |-- train/
        |-- sample.csv
        |-- ...

We log the datasets/ folder under the field "data_versions".

>>> import neptune
>>> run = neptune.init_run()  # creates a run with the example identifier "CLS-45"
>>> run["data_versions"].track_files("datasets/")
>>> run.stop()

Later, we can connect to the run by passing its Neptune ID at initialization:

>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
https://app.neptune.ai/ml-team/classification/e/CLS-45

How do I find the ID?

The Neptune ID is a unique identifier for the run. In the table view, it's displayed in the leftmost column.

The ID is stored in the system namespace (sys/id).

If the run is active, you can obtain its ID with run["sys/id"].fetch(). For example:

>>> run = neptune.init_run(project="ml-team/classification")
>>> run["sys/id"].fetch()
'CLS-26'

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"

export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)

Passing api_token and project as arguments also works for init_model(), init_model_version(), init_project(), and integrations that create Neptune runs under the hood, such as NeptuneLogger or NeptuneCallback.

- API token: In the bottom-left corner, expand the user menu and select Get my API token.
- Project name: You can copy the path from the project details (select Edit project details).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!
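Before initializing a run, it can help to confirm that both environment variables are actually set. The following is a minimal sketch that uses only the standard library; missing_neptune_vars() is a hypothetical helper, not part of the Neptune API, and the variable names are the ones from the export commands above.

```python
import os

# Variable names taken from the export commands shown earlier.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of the Neptune API.
    `env` defaults to the process environment and can be any mapping.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_neptune_vars() with no arguments checks the current process environment. An empty list means both values are available to the client.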

Fetching the artifact hash

To obtain the hash of the artifact, use the fetch_hash() method on the artifact field:

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
[neptune] [info   ] Neptune initialized...
>>> run["data_versions"].fetch_hash() 
'4e2f79947dfc5ca977c507f905792fae98c49a4b1df795d81e80279e3ce7be8c'
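One possible use of the fetched hash is to check whether the tracked files still match a known reference version. The sketch below assumes only that run[field].fetch_hash() behaves as shown above; artifact_changed() is a hypothetical helper, not part of the Neptune API.

```python
def artifact_changed(run, field, expected_hash):
    """Return True if the artifact hash stored at `field` differs from `expected_hash`.

    `run` is any object that supports run[field].fetch_hash(), such as a
    Neptune run opened in read-only mode. Hypothetical helper for
    illustration; not part of the Neptune API.
    """
    return run[field].fetch_hash() != expected_hash
```

With the run from the example above, artifact_changed(run, "data_versions", reference_hash) would return False as long as reference_hash equals the value returned by fetch_hash().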

Fetching metadata of contained files

You can fetch the metadata of files inside an artifact with the fetch_files_list() method. This returns a list of ArtifactFileData objects, each with the following properties:

file_hash: Hash of the file.
file_path: Path of the file, relative to the root of the virtual artifact directory.
size: Size of the file, in kilobytes.
metadata: Dictionary with the keys:
- file_path: URL of the file (absolute path in local or S3-compatible storage).
- last_modified: When the file was last modified.

The following example shows how to work with the returned file metadata.

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
[neptune] [info   ] Neptune initialized...
>>> artifact_list = run["data_versions"].fetch_files_list()

You can now access metadata through artifact_list:

>>> artifact_list[0].file_hash
'e54fdfced68d7e057eda168a05910fe609fc27f5'
>>> artifact_list[0].file_path
'train/sample.csv'
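Because every entry exposes file_path and file_hash, two file lists (for example, from two different runs) can be compared file by file. This is a minimal sketch; hashes_by_path() and changed_paths() are hypothetical helpers, and they rely only on the two attributes shown above.

```python
def hashes_by_path(files):
    """Map each entry's file_path to its file_hash."""
    return {f.file_path: f.file_hash for f in files}


def changed_paths(old_files, new_files):
    """Return the sorted paths that were added, removed, or have a different hash.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    old = hashes_by_path(old_files)
    new = hashes_by_path(new_files)
    # A path missing on one side yields None there, so it counts as changed.
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Passing the results of two fetch_files_list() calls to changed_paths() gives the paths whose content differs between the two artifacts.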

The metadata field of an individual file is a dictionary with the following keys: "file_path" (path of the file, either on local storage or S3-compatible storage) and "last_modified".

>>> artifact_list[0].metadata["last_modified"]
'2022-09-30 10:50:40'
>>> artifact_list[0].metadata["file_path"]
'file:///home/jackie/projects/text-classification/datasets/train/sample.csv'
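The last_modified value is returned as a string, so it needs parsing before entries can be compared by time. The sketch below assumes the format seen in the example output above ('2022-09-30 10:50:40'); newest_file() is a hypothetical helper, not part of the Neptune API.

```python
from datetime import datetime

# Assumed format, inferred from the example value '2022-09-30 10:50:40'.
LAST_MODIFIED_FORMAT = "%Y-%m-%d %H:%M:%S"


def newest_file(files):
    """Return the entry with the most recent last_modified, or None if `files` is empty.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    return max(
        files,
        key=lambda f: datetime.strptime(f.metadata["last_modified"], LAST_MODIFIED_FORMAT),
        default=None,
    )
```

If the timestamps in your project use a different format, adjust LAST_MODIFIED_FORMAT accordingly.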

Downloading contained files

You can also download all the files that are referenced in the artifact field with the download() method.

Neptune looks for each file at the path that was originally logged.

Note for Windows

This method creates symbolic links to the referenced files.
You may need to run your terminal as administrator to grant the client the permissions needed to copy the file references to your local system.

>>> import neptune
>>> run = neptune.init_run(with_id="CLS-45", mode="read-only")
[neptune] [info   ] Neptune initialized...
>>> run["data_versions"].download(destination="downloaded_artifact")

If the artifact points to an object stored in S3 or GCS, Neptune downloads the object to the local system directly from the remote storage.
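After a download, you may want to confirm that every referenced file arrived. The sketch below assumes that each file ends up under the destination directory at its relative file_path; that layout is an assumption to verify against your own download. missing_after_download() is a hypothetical helper, not part of the Neptune API.

```python
from pathlib import Path


def missing_after_download(files, destination):
    """Return the file_path of every entry not found under `destination`.

    Assumes each file is placed at <destination>/<file_path>.
    Hypothetical helper for illustration; not part of the Neptune API.
    """
    root = Path(destination)
    # is_file() follows symbolic links, so linked files count as present.
    return [f.file_path for f in files if not (root / f.file_path).is_file()]
```

For the example above, you could pass the result of fetch_files_list() together with "downloaded_artifact". An empty list means all referenced files were found.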