fileflow.task_runners package
fileflow.task_runners.task_runner module
-
class fileflow.task_runners.task_runner.TaskRunner(context)
Bases: object
-
get_input_filename(data_dependency, dag_id=None) Generate the default input filename for a class.
Parameters:
Returns: File system path or S3 URL to the input file.
Return type:
-
get_output_filename() Generate the default output filename or S3 URL for this task instance.
Returns: File system path or S3 URL of the output file.
Return type: str
-
get_upstream_stream(data_dependency_key, dag_id=None) Returns a stream to the file that was output by a separate task in the same DAG.
Parameters: - data_dependency_key (str) – The key (business logic name) for the upstream dependency. The value stored under this key in the self.data_dependencies dictionary determines the file to read from.
- dag_id (str) – Defaults to the current DAG id.
- encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns: stream to the file
Return type: stream
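The key lookup described above can be pictured with a small stand-in. This is a sketch only, not fileflow code: the `SketchRunner` class, its constructor arguments, and the `<root>/<dag_id>/<task_id>` storage layout are all hypothetical and chosen just to show how a business-logic key could resolve, through a `data_dependencies` dictionary, to a file-like stream.

```python
import os
import tempfile


class SketchRunner:
    """Hypothetical stand-in for a task runner; not the fileflow implementation."""

    def __init__(self, root, dag_id, data_dependencies):
        self.root = root
        self.dag_id = dag_id
        # Maps a business-logic key to the id of the upstream task.
        self.data_dependencies = data_dependencies

    def _path_for(self, task_id, dag_id=None):
        # Assumed layout: <root>/<dag_id>/<task_id>. Falls back to the current DAG id.
        return os.path.join(self.root, dag_id or self.dag_id, task_id)

    def get_upstream_stream(self, data_dependency_key, dag_id=None):
        # Resolve key -> upstream task id -> file, then hand back a binary stream.
        task_id = self.data_dependencies[data_dependency_key]
        return open(self._path_for(task_id, dag_id), "rb")


# Simulate the output of an upstream task named "extract_orders".
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "example_dag"))
with open(os.path.join(root, "example_dag", "extract_orders"), "wb") as f:
    f.write(b"id,total\n1,9.50\n")

runner = SketchRunner(root, "example_dag", {"orders": "extract_orders"})
with runner.get_upstream_stream("orders") as stream:
    first_line = stream.readline()
```

The caller only ever names the dependency by its key (`"orders"`), so the upstream task can be renamed by editing the dictionary alone.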
-
read_upstream_file(data_dependency_key, dag_id=None, encoding='utf-8') Reads the file that was output by a separate task in the same DAG.
Parameters: - data_dependency_key (str) – The key (business logic name) for the upstream dependency. The value stored under this key in the self.data_dependencies dictionary determines the file to read from.
- dag_id (str) – Defaults to the current DAG id.
- encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns: Result of reading the file
Return type:
-
read_upstream_json(data_dependency_key, dag_id=None, encoding='utf-8') Reads a JSON file from upstream into a Python object.
Parameters:
Returns: A Python object.
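As a rough illustration of why the `encoding` argument matters when reading upstream JSON, the sketch below decodes a file's bytes before parsing them. The helper `read_json_file` and the file name are hypothetical and use only the standard library; they are not part of fileflow.

```python
import json
import os
import tempfile


def read_json_file(path, encoding="utf-8"):
    """Hypothetical helper: decode the file with `encoding`, then parse the JSON text."""
    with open(path, "rb") as f:
        return json.loads(f.read().decode(encoding))


# Simulate an upstream task that wrote its JSON output as Latin-1 rather than UTF-8.
path = os.path.join(tempfile.mkdtemp(), "upstream_output.json")
with open(path, "wb") as f:
    f.write('{"city": "Zürich", "count": 3}'.encode("latin-1"))

# Passing the matching encoding recovers the original text.
record = read_json_file(path, encoding="latin-1")
```

Reading the same file with the default `'utf-8'` would raise a `UnicodeDecodeError`, because the Latin-1 byte for "ü" is not valid UTF-8 on its own.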
-
read_upstream_pandas_csv(data_dependency_key, dag_id=None, encoding='utf-8') Reads a CSV file from upstream into a pandas DataFrame. The file output by a previous task is read into memory as a DataFrame in a standard manner.
Parameters: - data_dependency_key (str) – The key (business logic name) for the upstream dependency. The value stored under this key in the self.data_dependencies dictionary determines the file to read from.
- dag_id (str) – Defaults to the current DAG id.
- encoding (str) – The file encoding to use. Defaults to ‘utf-8’.
Returns: The pandas dataframe.
Return type: pd.DataFrame
-
write_file(data, content_type='text/plain') Writes the data out to the correct file.
Parameters:
-
write_json(data) Write a Python object to a JSON output file.
Parameters: data (object) – The Python object to save.
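One thing worth keeping in mind when saving a Python object as JSON is that the round trip is not always exact. The following sketch uses the standard `json` module with a hypothetical `write_json_file` helper and an arbitrary output path; it does not reproduce how fileflow chooses the output location.

```python
import json
import os
import tempfile


def write_json_file(path, data):
    """Hypothetical helper standing in for a JSON writer."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


path = os.path.join(tempfile.mkdtemp(), "output.json")
summary = {"rows": 2, "columns": ("id", "total")}
write_json_file(path, summary)

# A downstream reader gets plain JSON types back: the tuple comes back as a list.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)
```

Objects that JSON cannot represent (for example `datetime` values or sets) make `json.dump` raise a `TypeError`, so they need converting before they are passed to a JSON writer.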
-