# Preparing data for large processing jobs

## Why do I need Chiller?

EODC stores data that has not been used recently on tape infrastructure (cold storage), which has several advantages for long-term storage. This data is visible on the file system; however, for large processing jobs, recalling it to disk storage (hot storage) beforehand avoids long loading times. To process the desired data, first create a file list and pass it as input to Chiller, EODC's own program, which moves the data from tape storage to disk storage.

## Writing a filelist in MobaXTerm

### 1. Preparing the setup

EODC customers can access the EODC network with an SSH key to search for the desired data. Windows users can use [MobaXTerm](https://mobaxterm.mobatek.net/download-home-edition.html) for this. Linux users usually have an SSH client available on the command line by default.

### 2. Use case

In the example use case, we search for a time series for a specific UTM tile. To create the file list with all the desired data in MobaXTerm, we need to execute a Python script. The script should be saved in a .py file and can be dragged and dropped into the home folder on the left-hand side.

### 3. Python example

The following Python script searches the Sentinel-2 MSI L1C data for all files whose names contain the given UTM tile and writes the matching file paths to a text file. The name of the output file is set at the bottom of the script. Note that Python's `open()` does not expand the tilde symbol ~ to the home folder, so the file is created in the folder from which the script is run, and the ~ becomes part of the file name.

```
import fnmatch
import os
import sys

# Show the arguments the script was called with
print(sys.argv)

# The UTM tile to search for is passed as the first command-line argument
if len(sys.argv) != 2:
    sys.exit("Usage: python " + sys.argv[0] + " <UTM tile, e.g. T60WWV>")
tile = sys.argv[1]

# Define the root directory to start the search
root_directory = "/eodc/products/copernicus.eu/s2a_prd_msil1c/"

# Define the search pattern (file names that contain the tile)
search_pattern = "*" + tile + "*"

# Create an empty list to store matching file paths
matching_files = []

# Walk through the directory tree and find matching files
counter = 0
for dirpath, dirnames, filenames in os.walk(root_directory):
    # Print a progress message for every 100th directory visited
    if counter % 100 == 0:
        print("Searching.." + dirpath)
    counter += 1
    for filename in fnmatch.filter(filenames, search_pattern):
        matching_files.append(os.path.join(dirpath, filename))

# Write one file path per line. open() does not expand "~", so the file
# "~<tile>.txt" is created in the current working directory.
with open("~" + tile + ".txt", "w") as fp:
    for item in matching_files:
        fp.write("%s\n" % item)

print("Done")
```

### 4. Running the code and writing the filelist

To run the Python script in MobaXTerm, first navigate to the folder in which it was saved. Use `cd ..` to move up one level in the directory tree and `cd ~` to go to the home directory. Once you are in the desired directory, use `ls` to check that the Python script is located there.

This example uses the UTM tile T60WWV. To start the script, call `python`, followed by the name of the script and the selected tile.

```
python create_filelist_s2a_prd_msil1c.py T60WWV
```

The script now searches for every file with the selected UTM tile. When the search is finished, `Done` appears. After refreshing the folder from which the script was run, a .txt file with the file list (here `~T60WWV.txt`) should have been created. This file can then be dragged and dropped into the local explorer.

![image](./lb_imgs/MXT_Search_files.png)

## Staging the data with Chiller

To move the data from tape storage to disk storage, start a job by uploading the generated file list. Run the command from the folder that contains the file list.

```
curl https://chiller.eodc.eu/upload --data-binary "@~T60WWV.txt"
```

The command outputs the ID of the job. This ID can be used to query the status of the job.

![image](./lb_imgs/MXT_curl_upload.png)

The status of the job can be shown with another curl command.

```
curl https://chiller.eodc.eu/jobs/1718707176972687523
```

Once the output contains `"Finished":"true"`, the job is done and staging is complete. Now, for example, the NDVI can be calculated faster for all files of the time series, as all data is available on disk. To do this, run a Python script that reads the file list.
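
As a starting point for such a script, the following is a minimal sketch and not part of Chiller itself. The helper names `job_is_finished` and `read_filelist` are hypothetical. It assumes that the status output is a JSON object with a `Finished` field, as suggested by the `"Finished":"true"` text above; the exact response format may differ.

```python
import json


def job_is_finished(status_text):
    """Return True if a status response reports a finished job.

    Assumption: the response is a JSON object with a "Finished" field,
    given either as the string "true" or as a JSON boolean.
    """
    status = json.loads(status_text)
    return str(status.get("Finished", "")).lower() == "true"


def read_filelist(path):
    """Return the non-empty lines of a file list as a list of file paths."""
    with open(path) as fp:
        return [line.strip() for line in fp if line.strip()]
```

For example, pass the text returned by the status curl command to `job_is_finished`, and once it returns `True`, call `read_filelist("~T60WWV.txt")` from the folder that contains the file list to obtain the paths for further processing.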