fileflow.task_runners package

fileflow.task_runners.task_runner module

class fileflow.task_runners.task_runner.TaskRunner(context)[source]

Bases: object

get_input_filename(data_dependency, dag_id=None)[source]

Generate the default input filename for a class.

Parameters:
  • data_dependency (str) – Key for the target data_dependency in self.data_dependencies that you want to construct a filename for.
  • dag_id (str) – Defaults to the current DAG id
Returns:

File system path or S3 URL to the input file.

Return type:

str

get_output_filename()[source]

Generate the default output filename or S3 URL for this task instance.

Returns:File system path to output filename
Return type:str
get_upstream_stream(data_dependency_key, dag_id=None)[source]

Returns a stream to the file that was output by a seperate task in the same dag.

Parameters:
  • data_dependency_key (str) – The key (business logic name) for the upstream dependency. This will get the value from the self.data_dependencies dictionary to determine the file to read from.
  • dag_id (str) – Defaults to the current DAG id.
  • encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns:

stream to the file

Return type:

stream

read_upstream_file(data_dependency_key, dag_id=None, encoding='utf-8')[source]

Reads the file that was output by a seperate task in the same dag.

Parameters:
  • data_dependency_key (str) – The key (business logic name) for the upstream dependency. This will get the value from the self.data_dependencies dictionary to determine the file to read from.
  • dag_id (str) – Defaults to the current DAG id.
  • encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns:

Result of reading the file

Return type:

str

read_upstream_json(data_dependency_key, dag_id=None, encoding='utf-8')[source]

Reads a json file from upstream into a python object.

Parameters:
  • data_dependency_key (str) – The key for the upstream data dependency. This will get the value from the self.data_dependencies dict to determine the file to read.
  • dag_id (str) – Defaults to the current DAG id.
  • encoding (str) – The file encoding. Defaults to ‘utf-8’.
Returns:

A python object.

read_upstream_pandas_csv(data_dependency_key, dag_id=None, encoding='utf-8')[source]

Reads a csv file from upstream into a pandas DataFrame. Specifically reads a csv into memory as a pandas dataframe in a standard manner. Reads the data in from a file output by a previous task.

Parameters:
  • data_dependency_key (str) – The key (business logic name) for the upstream dependency. This will get the value from the self.data_dependencies dictionary to determine the file to read from.
  • dag_id (str) – Defaults to the current DAG id.
  • encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns:

The pandas dataframe.

Return type:

pd.DataFrame

run(*args, **kwargs)[source]
write_file(data, content_type='text/plain')[source]

Writes the data out to the correct file.

Parameters:
  • data (str) – The data to output.
  • content_type (str) – The Content-Type to use. Currently only used by S3.
write_from_stream(stream, content_type='text/plain')[source]
write_json(data)[source]

Write a python object to a JSON output file.

Parameters:data (object) – The python object to save.
write_pandas_csv(data)[source]

Specifically writes a csv from a pandas dataframe to the default output file in a standard manner.

Parameters:data – the dataframe to write.
write_timestamp_file()[source]

Writes an output file with the current timestamp.