Get files after contract termination

After the archive contract is terminated, the customer may want to move the archived invoices to another repository. OpusCapita provides temporary direct access to the archive storage so that the customer can download the invoices. Download access is provided only on customer request.

What the direct access will provide:

  • metadata of archived invoices: a JSON file containing the invoice number, invoice date, transactionId, and a reference path to the original invoice

  • a SAS token, i.e. a Shared Access Signature that grants list and read access to the Azure Blob storage where the original invoices are archived

Azure Blob storage and AzCopy tool

Archived invoices are stored in a highly secure cloud storage service called Microsoft Azure Blob (Binary Large Object) Storage. To download the invoices from the customer-specific storage containers, the customer must use Microsoft's AzCopy tool together with the SAS tokens provided by an OpusCapita admin. The instructions below describe the basic steps.

SAS token

A SAS (Shared Access Signature) token will be generated by an OC admin for the specific business partner container under our archive (Blob) storage. SAS tokens provide secure, delegated access to the invoice files.

Currently we are using two blob storages:

  • the old one, referred to as type “data”

  • the new one, referred to as type “archive”

A single business partner can have archived invoices in both storages, in which case two SAS tokens need to be provided.

A SAS token has a limited validity period, currently set to 24 hours. An OC admin can change this before generating the token if a longer validity period is required.

An example of SAS token:

{
  "url": "https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data",
  "sasQueryString": "?st=2024-07-11T10%3A33%3A56Z&se=2024-07-12T10%3A33%3A56Z&sp=rl&spr=https&sv=2018-03-28&sr=c&sig=y3vbPLTgt4aCg7%2Bnjnud0h9nRdaSWxtHXOLhy8mtIiA%3D",
  "exipry": "2024-07-12T10:33:56.398Z"
}
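For scripting, note that the "url" and "sasQueryString" fields of the token object simply concatenate into the single URL that AzCopy expects. A minimal shell sketch using the example values above:

```shell
# Token fields copied from the example SAS object above
URL='https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data'
SAS='?st=2024-07-11T10%3A33%3A56Z&se=2024-07-12T10%3A33%3A56Z&sp=rl&spr=https&sv=2018-03-28&sr=c&sig=y3vbPLTgt4aCg7%2Bnjnud0h9nRdaSWxtHXOLhy8mtIiA%3D'

# The concatenation is what gets passed to `azcopy list` / `azcopy copy`
echo "${URL}${SAS}"
```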

 

Copy data with AzCopy

A SAS token provides access to the OC Azure Blob storage; with the help of Microsoft's AzCopy command-line utility one can list or download the blob content.

To get started with AzCopy, see Microsoft's online documentation at https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10

To download blobs from Azure with AzCopy, see https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-download

 

With the provided SAS token and the AzCopy utility you can, for example, list all files in the container:

azcopy list 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data?st=2024-07-11T10%3A33%3A56Z&se=2024-07-12T10%3A33%3A56Z&sp=rl&spr=https&sv=2018-03-28&sr=c&sig=y3vbPLTgt4aCg7%2Bnjnud0h9nRdaSWxtHXOLhy8mtIiA%3D' | more

The example response lists the blobs in the container (screenshot: 2024-07-11_12-55.png).

 

The important files for the business partner are the *.pdf files, so to list only the PDF files one can modify the command and limit the response accordingly:

azcopy list 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data?st=2024-07-11T10%3A33%3A56Z&se=2024-07-12T10%3A33%3A56Z&sp=rl&spr=https&sv=2018-03-28&sr=c&sig=y3vbPLTgt4aCg7%2Bnjnud0h9nRdaSWxtHXOLhy8mtIiA%3D' | grep '\.pdf' | more

The corresponding response lists only the .pdf blobs (screenshot: 2024-07-11_13-02.png).

 

Having the file list, one can estimate the capacity required on the target resource the files are going to be copied to.

To download a single file, AzCopy's copy command is used with the blob URL (including the SAS query string) and a local target path; for our example container (the blob path is a placeholder):

azcopy copy 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data/<blob-path>?<SAS-query-string>' '<local-file-path>'

AzCopy prints a job summary, and the file appears at the given local path.

 

To download a whole directory, the same copy command is used with the --recursive option:

azcopy copy 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data/<directory-path>?<SAS-query-string>' '<local-directory-path>' --recursive

In our specific case one is usually interested in downloading all .pdf files from the whole container, so the copy command can be the following:

azcopy copy 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data?<SAS-query-string>' '.' --recursive --include-pattern '*.pdf'

This copies all .pdf files from the blob storage to the current directory on the local machine, preserving the Azure folder structure.

 

Instead of copying files from the business partner's container in OC Azure storage to the local machine, one can copy files directly to another Azure Blob storage account.

The corresponding copy command in this case would be:

azcopy copy 'https://blobx4aq56py4k2zy.blob.core.windows.net/3a7bcb5d4c457ec5bc1509fe1fbfd3ee-data?<source-SAS-query-string>' 'https://<destination-storage-account-name>.blob.core.windows.net/<container-name>?<destination-SAS-query-string>' --recursive

So instead of a '<local-directory-path>' target, the destination Azure Blob URL 'https://<destination-storage-account-name>.<blob or dfs>.core.windows.net/<container-name>' is provided, including authorization, e.g. another SAS token with write permission generated for the target storage the business partner owns.

 

Directory structure of business partner container in OC blob storage

The usual directory structure in an OC blob container is the following:

  • the old one, present in “data” type containers:

    private/
      sirius/
        incoming/
          <transactionId>/
            <transaction files>

    where private/sirius/incoming is constant

  • the new one, present in “archive” type containers (although the old structure is also allowed here):

    private/
      <YEAR>/
        <MONTH>/
          <transactionId>/
            <transaction files>

    where only private is constant
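In both layouts the transactionId is the fourth segment of the blob path, which makes it easy to recover when post-processing downloaded files. A small sketch (the example paths are hypothetical):

```shell
# Hypothetical blob paths in the two layouts
OLD_PATH='private/sirius/incoming/abc123/invoice.pdf'   # "data" layout
NEW_PATH='private/2024/07/def456/invoice.pdf'           # "archive" layout

# The transactionId is always the 4th "/"-separated segment
echo "$OLD_PATH" | cut -d/ -f4
echo "$NEW_PATH" | cut -d/ -f4
```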

 

Metadata of archived invoices

To make the downloaded files identifiable, the business partner should be provided with the metadata of the invoices that we store in Elasticsearch.

For this, OC needs to generate a corresponding .json file and store it in the business partner's container in Azure Blob storage under the /archive/private/esdata/ path, so the business partner can fetch it with AzCopy and a SAS token like the other files. The file can be generated as one large data set or split, e.g. yearly, depending on the number of documents it is going to contain.

Currently there is no process in our system to generate such a metadata file and store it in Azure Blob storage; however, the result can easily be produced manually by any developer using the elasticdump tool with a suitable query against the Elasticsearch storage, storing the result under the folder mentioned above in the business partner's blob container (e.g. for the invoice-sending archive of the bnpautomationsupplier business partner).
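A hypothetical sketch of such an elasticdump invocation; the host, index name, and query field are placeholders, not the real values of our Elasticsearch cluster:

```shell
# Hypothetical example only: <es-host> and <index-name> must be replaced with
# the real Elasticsearch endpoint and index; the query field name may differ.
elasticdump \
  --input='https://<es-host>:9200/<index-name>' \
  --output=bnpautomationsupplier-metadata.json \
  --type=data \
  --searchBody='{"query": {"match": {"supplierId": "bnpautomationsupplier"}}}'
```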

 

The .json file with metadata will contain one object per transaction/invoice, holding the fields listed at the top of this page (invoice number, invoice date, transactionId, reference path to the original invoice).
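The exact schema is not fixed here; a hypothetical example of such an object (the field names are illustrative):

```json
{
  "invoiceNumber": "INV-2024-0001",
  "invoiceDate": "2024-05-31",
  "transactionId": "abc123",
  "referencePath": "private/sirius/incoming/abc123/invoice.pdf"
}
```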

With the help of this data, the business partner should be able to connect a specific invoice with its original document after copying the files from OC storage to the target storage.