Managed file transfer in AWS

SFTP as a protocol for exchanging data across enterprises refuses to die.

No matter how much you may want this ancient protocol to just fade away, it’s very unlikely to — probably because it’s the lowest common denominator among enterprises.

But SFTP is just a protocol; it simply moves data between a client and a server. There’s always a business process or workflow that is associated with the sending or receiving of data that a protocol is unequipped to handle.

To address workflow requirements, managed file transfer, or MFT, builds upon the “raw” protocol to support the workflow steps that happen before and after data is sent or received. Here’s my definition of MFT:

Managed file transfer is a set of workflow definition and execution tools that are integrated with file transfer protocols to accomplish end-to-end business processes reliably and securely.

Common products in this area include Globalscape and MOVEit. The limitations of these enterprise-class MFT systems have become painfully evident in the last couple of years. They’re big, expensive and dangerously insecure.

Given the cost and risk of old-school MFT, the question is, “How does an enterprise create a secure, native MFT environment using SFTP in the cloud?”

Below, I outline a cloud-native, event-driven design using orchestration of AWS services. Using a design like this, you can ingest data from multiple sources using SFTP, transform data using extract, transform and load (ETL) services and then store it in a data lake.

In this blog post, I just outline which AWS services do what in a typical MFT workflow. In future posts, I’ll get into the nitty-gritty of the design, like provisioning and security. If you have specific areas you’d like detailed, please let me know in the comments. Comments are encouragement. 🙂

All of the usual AWS and cloud goodness applies here, including scalability, AWS management of public-facing MFT endpoints and pay-as-you-go. In addition, by orchestrating several AWS services together, MFT workflows can be easily created, modified and managed, all using a true event-driven architecture.

From left to right in the drawing above:

  1. An AWS Transfer Family SFTP server accepts inbound SFTP transfers, preferably by SSH key only, and writes them to an S3 bucket. AWS Transfer Family allows IAM “scope-down” policies (AWS now calls them session policies) that limit SFTP users to specific buckets and to specific keys in those buckets. In practice, I used a bucket-per-orchestration design.
  2. Because S3 is “hard-wired” to Amazon EventBridge, the arrival of a new object can directly trigger a PutObject rule. The rule’s target is an AWS Step Functions state machine. Optionally, the S3 bucket is versioned with a lifecycle policy to preserve and, eventually, archive inbound data.
  3. The state machine triggers AWS Glue jobs, Glue triggers and Lambda functions as necessary to accomplish an MFT workflow. In this example, Amazon Redshift is the target output for the workflow. State machines can optionally wait for a process to complete, making it possible to validate the output of one orchestration step before starting another.
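
To make step 2 a little more concrete, here is a minimal sketch of the kind of EventBridge event pattern such a rule could use, plus a helper that pulls the bucket and key out of a matching event. It assumes EventBridge notifications are enabled on the bucket and that the rule matches the “Object Created” events S3 publishes. The bucket name and the `incoming/` prefix are hypothetical, and this is not the exact rule from my design.

```python
import json

# Hypothetical bucket name, used only for illustration.
INBOUND_BUCKET = "example-mft-inbound"


def build_object_created_pattern(bucket, prefix=None):
    """Return an EventBridge event pattern matching new objects in one bucket."""
    detail = {"bucket": {"name": [bucket]}}
    if prefix:
        # Prefix matching narrows the rule to one "folder" in the bucket.
        detail["object"] = {"key": [{"prefix": prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


def s3_location(event):
    """Extract (bucket, key) from an S3 "Object Created" event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


pattern = build_object_created_pattern(INBOUND_BUCKET, prefix="incoming/")

# This JSON is what you would supply as the rule's event pattern.
print(json.dumps(pattern, indent=2))
```

The state machine receives the whole event as its input, so a helper like `s3_location` (or the equivalent JSONPath, `$.detail.bucket.name` and `$.detail.object.key`) is how later steps find the uploaded file.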

There are many moving parts here — configuring the AWS SFTP server, setting up IAM policies and configuring the state machine to run Glue routines, among others.
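
As an illustration of one of those moving parts, here is a rough sketch of a session policy that confines an SFTP user to their own home folder. It is modeled on the pattern in the Transfer Family documentation, which substitutes the `${transfer:...}` variables for each session. The exact set of S3 actions is my assumption and should be trimmed to what a given partner actually needs.

```python
import json


def build_session_policy():
    """Return a session policy limiting an SFTP user to their home folder.

    The ${transfer:...} placeholders are left as literal text on purpose:
    Transfer Family resolves them when the user's session starts.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListingOfUserFolder",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::${transfer:HomeBucket}"],
                "Condition": {
                    "StringLike": {
                        "s3:prefix": [
                            "${transfer:HomeFolder}/*",
                            "${transfer:HomeFolder}",
                        ]
                    }
                },
            },
            {
                "Sid": "HomeDirObjectAccess",
                "Effect": "Allow",
                # Assumed action list: enough to upload and read back files.
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": "arn:aws:s3:::${transfer:HomeDirectory}/*",
            },
        ],
    }


# The JSON string is what you would pass as the user's Policy value
# when creating a service-managed Transfer Family user.
session_policy_json = json.dumps(build_session_policy(), indent=2)
print(session_policy_json)
```

With a bucket-per-orchestration design, each user's home directory points at the bucket for their workflow, so a single policy like this can be reused across users.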

But the central idea is an old one: service orchestration. And the key here is AWS Step Functions, which has a graphical builder for assembling complex workflows visually.
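
Whatever you draw in the builder ends up as an Amazon States Language definition. Here is a minimal sketch of what one could look like for the workflow described above: run a Glue job and wait for it, call a Lambda function to validate the output, then succeed or fail. The job name, function name and the `valid` field in the Lambda response are all hypothetical. The small checker at the end is just a sanity test I added for the sketch, not an AWS feature.

```python
import json

# Hypothetical resource names, used only for illustration.
GLUE_JOB_NAME = "example-mft-transform"
VALIDATOR_FUNCTION = "example-mft-validate-output"


def build_definition():
    """Return an Amazon States Language definition as a Python dict."""
    return {
        "Comment": "Sketch of an MFT workflow: transform, validate, finish",
        "StartAt": "TransformWithGlue",
        "States": {
            "TransformWithGlue": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the job to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": GLUE_JOB_NAME,
                    "Arguments": {
                        # Taken from the S3 event that started the execution.
                        "--source_bucket.$": "$.detail.bucket.name",
                        "--source_key.$": "$.detail.object.key",
                    },
                },
                "ResultPath": "$.glue",
                "Next": "ValidateOutput",
            },
            "ValidateOutput": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": VALIDATOR_FUNCTION,
                    "Payload.$": "$",
                },
                # Assumes the function returns {"valid": true or false}.
                "ResultSelector": {"valid.$": "$.Payload.valid"},
                "ResultPath": "$.validation",
                "Next": "IsOutputValid",
            },
            "IsOutputValid": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.validation.valid",
                        "BooleanEquals": True,
                        "Next": "Done",
                    }
                ],
                "Default": "Rejected",
            },
            "Done": {"Type": "Succeed"},
            "Rejected": {
                "Type": "Fail",
                "Error": "ValidationFailed",
                "Cause": "Glue output did not pass validation",
            },
        },
    }


def undefined_targets(definition):
    """Return state names that are referenced but never defined."""
    states = definition["States"]
    referenced = {definition["StartAt"]}
    for state in states.values():
        for field in ("Next", "Default"):
            if field in state:
                referenced.add(state[field])
        for choice in state.get("Choices", []):
            referenced.add(choice["Next"])
    return sorted(referenced - set(states))


definition = build_definition()
assert undefined_targets(definition) == []
print(json.dumps(definition, indent=2))
```

The wait-then-validate pattern is the point here: because the Glue task does not return until the job completes, the validation step only ever sees finished output.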

The bottom line is that by letting AWS Transfer Family handle the internet-facing side of SFTP transfers and coupling that to the internal needs of an MFT workflow using orchestration, enterprises can achieve vastly better security, scalability and customization compared to old-school, monolithic managed file transfer products.

Once again, if you have specific feedback about which parts of this design you’d like to see expanded on in future blog posts, let me know in the comments.



Comments

2 responses to “Managed file transfer in AWS”

  1. Dan Kearney

    A couple days before your post, AWS announced Transfer Family will publish events to EventBridge. Would you consider driving your automation off the Transfer Family events rather than the S3 events? Why?

    https://aws.amazon.com/about-aws/whats-new/2024/02/aws-transfer-family-publishes-events-amazon-eventbridge-sftp-connectors/

    1. Alex Neihaus

      Dan,

      Thank you very much for this!

      No matter how many RSS feeds one subscribes to, it’s nearly impossible to keep up with the pace of improvements in AWS.

      Of course this will be useful. A quick read of the Transfer Family doc seems to indicate that the events available are more “transfer oriented” than “file oriented.” For example, the “SFTP Server File Upload Completed” event indicates completion of an SFTP upload but does not include the S3 object and key that was uploaded.

      A different type of event (“Connector event”) does appear to include the S3 object in the local-file-location JSON object. A possible issue here: these are limited to AWS Transfer Family Connectors, which one may or may not wish to use in an MFT orchestration.

      While I was working on the design in this blog post, I spoke with the AWS product manager for Transfer Family, a call that was arranged by AWS Support in response to a question I had about the inability of an orchestration to know much about the state of the transfer once invoked. It’s a complicated topic — but the PM indicated the team was working on something in this area and I suspect this is the first instantiation of their work.

      An example I gave the PM was that if you use the Transfer Family API for an outbound SFTP transfer and incorrectly specify the S3 object and key, Transfer Family fails the request as syntactically invalid. But if you supply a syntactically valid bucket/key path but the object does not exist, the Transfer Family SFTP connector will accept the request — and the caller receives no notification that it wasn’t completed until one inspects the CloudWatch logs, if logging is implemented. All you get back is a TransferId which is simply a string you could use ex post facto to interrogate the log.

      I asserted that in this case, Transfer Family can and should be able to do something in “near real time” instead of making a Lambda function or Glue job have to wait to inspect the log. I suspect use cases like these are at the heart of these new notifications.

      In any case, more is more. By offering more than just the results of an S3 PutObject, these new dynamic events will only make AWS MFT more powerful.

      Thanks again for reading and for your helpful comments.
