How to find AWS S3 bucket size and number of objects via PowerShell

How to find AWS S3 bucket size

For reasons I’ve never understood, AWS’s S3 object file store does not offer metadata about the size and number of objects in a bucket. This meant that answering the simple question “How can I get the total size of an S3 bucket?” required a scan of the bucket to count the objects and total the size. This is slow, especially when you have millions of objects in a bucket.

In July 2015, AWS started collecting S3 metrics in CloudWatch. The metrics include bucket size by storage type and the number of objects, and they are available in the CloudWatch console. I wanted a way to access this data programmatically and to produce output that could feed additional scripts for further analysis. Retrieving CloudWatch data is, of course, orders of magnitude faster than counting the objects in the bucket, which was a major impetus for writing something as well.

What I came up with is below. It’s written as a PowerShell advanced function so that you can dot-source it and use it as you would any other cmdlet. I’ve written extensive help into the function, which can be accessed via Get-Help Get-S3BucketSize -Full after you’ve loaded the function. Three items of note: the -BucketName parameter can be passed via the pipeline; the output is a standard PSObject, meaning you can pipe the results to Out-GridView, Measure-Object or even Export-Csv; and size output is reported in gibibytes (2^30 bytes), not gigabytes (10^9 bytes).
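To make the gibibyte/gigabyte distinction concrete, here is a small sketch. ConvertTo-GiB is a hypothetical helper name used only for illustration; it is not part of the function below. It relies on the fact that PowerShell's built-in 1GB constant is binary (2^30 bytes), so dividing a byte count by it yields gibibytes.

```powershell
# Hypothetical helper for illustration only; not part of Get-S3BucketSize.
function ConvertTo-GiB
{
    param ([Parameter(Mandatory = $true)][double]$Bytes)
    # PowerShell's 1GB literal equals 1073741824 (2^30), so this yields gibibytes.
    [math]::Round($Bytes / 1GB, 2)
}

$bytes = 5368709120                      # exactly 5 * 2^30 bytes
$gib = ConvertTo-GiB -Bytes $bytes       # binary units (GiB)
$gb = [math]::Round($bytes / 1e9, 2)     # decimal units (GB), for comparison

'{0} bytes = {1} GiB = {2} GB' -f $bytes, $gib, $gb
```

The same byte count is 5 GiB but roughly 5.37 GB, which is why totals reported in GiB look smaller than figures quoted in decimal gigabytes.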

There are many options and nuances regarding the CloudWatch S3 metrics, which I have tried to handle in the optional parameters and in the help information. But if you have a default AWS credential profile and region stored, you can just issue Get-S3BucketSize and you should get something useful. If you have specific questions about the results or parameters, please post in the comments below and I’ll try to answer.
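Because the function emits one row per storage class plus a separate AllStorageTypes row that carries the object count, it can be handy to fold the rows into one line per bucket. The following is a minimal sketch under stated assumptions: the $rows array is mock data shaped like the function's output (the bucket names and values are invented), and the TotalSizeGiB property name is hypothetical.

```powershell
# Mock rows shaped like the function's output (values are invented for illustration).
$rows = @(
    [pscustomobject]@{ Bucket = 'bucket-a'; SizeGiB = [decimal]1.50; NumObjects = '';  StorageClass = 'StandardStorage' }
    [pscustomobject]@{ Bucket = 'bucket-a'; SizeGiB = [decimal]0.25; NumObjects = '';  StorageClass = 'StandardIAStorage' }
    [pscustomobject]@{ Bucket = 'bucket-a'; SizeGiB = '';            NumObjects = 120; StorageClass = 'AllStorageTypes' }
    [pscustomobject]@{ Bucket = 'bucket-b'; SizeGiB = [decimal]4.00; NumObjects = '';  StorageClass = 'StandardStorage' }
    [pscustomobject]@{ Bucket = 'bucket-b'; SizeGiB = '';            NumObjects = 7;   StorageClass = 'AllStorageTypes' }
)

$summary = $rows | Group-Object -Property Bucket | ForEach-Object {
    # Size rows carry SizeGiB; the AllStorageTypes row carries only NumObjects.
    $sizeRows = @($_.Group | Where-Object { $_.StorageClass -ne 'AllStorageTypes' })
    $countRow = $_.Group | Where-Object { $_.StorageClass -eq 'AllStorageTypes' } | Select-Object -First 1
    [pscustomobject]@{
        Bucket       = $_.Name
        TotalSizeGiB = ($sizeRows | Measure-Object -Property SizeGiB -Sum).Sum
        NumObjects   = $countRow.NumObjects
    }
}

$summary | Format-Table -AutoSize
```

With real data, the mock $rows would be replaced by the output of Get-S3BucketSize. Filtering out the AllStorageTypes row before summing avoids feeding its empty SizeGiB value to Measure-Object.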

Now, for the best news of all. While I was writing this script, Microsoft open-sourced PowerShell (!!) and AWS announced support for the alpha (!!!). I have no words for how monumental this is…but maybe a screenshot will suffice. AFAIK, this screenshot may show one of the very first PowerShell functions to do anything useful in AWS via PowerShell on macOS (this was actually on the macOS Sierra beta). Anyway, I am really excited by all this interoperability, and by AWS’s quick and substantial support for it.

PowerShell running on macOS beta
Import-Module AWSPowerShell
<#
    .SYNOPSIS
        Provides maximum or highest average bucket size in gibibytes and number of objects via AWS CloudWatch measurements for a specific S3 bucket or all buckets over a specified period of days.

    .DESCRIPTION
        Accepts a single bucket name or an array of bucket names via the pipeline to pass to AWS CloudWatch to retrieve metrics for all (default) or selected storage classes.

    .PARAMETER BucketName
        Lower-case name of S3 bucket.

    .PARAMETER StorageClass
        One of StandardStorage | StandardIAStorage | ReducedRedundancyStorage. Defaults to all storage classes. Results for storage class AllStorageTypes are always returned in order to provide the number of objects.

    .PARAMETER AWSProfile
        String containing the name of credentials created via New-AWSCredential. Defaults to credentials stored in the default AWS profile, that is, whatever is authorized when no credentials are supplied.

    .PARAMETER Days
        The number of days for which to collect average or maximum CloudWatch metrics for S3 buckets. Defaults to 5.

    .PARAMETER Statistic
        The case-sensitive CloudWatch statistic to retrieve. Must be one of 'Maximum' or 'Average'. Defaults to 'Average'. 'Average' returns the highest daily average over the number of days selected.

    .INPUTS
        System.String

    .OUTPUTS
        System.Management.Automation.PSObject:
                Bucket = Name of S3 Bucket
                SizeGiB = Size in gibibytes of contents of bucket by storage class
                NumObjects = Number of S3 objects in bucket across ALL storage classes
                StorageClass = bucket storage class (excluding the GLACIER class)
    .EXAMPLE
        PS C:\> Get-S3BucketSize

        Outputs to the pipeline a collection of type PSObject that lists the average bucket size and number of objects in all buckets over the previous five days. Uses the default AWS credential profile.

    .EXAMPLE
        PS C:\> Get-S3BucketSize -BucketName 'BucketName' -Statistic 'Maximum' -AWSProfile 'myprofile'

        Outputs to the pipeline a (single member) collection of type PSObject that lists the maximum bucket size and number of objects over the previous five days. Selects buckets based on 'myprofile'.

    .EXAMPLE
        PS C:\> Get-S3BucketSize -BucketName 'BucketName' -Days 14

        Outputs to the pipeline a (single member) collection of type PSObject that lists the maximum average bucket size and number of objects over the previous 14 days. Uses the default AWS credential profile.

    .EXAMPLE
        PS C:\> Get-S3BucketSize | Measure-Object -Property SizeGiB -Sum

        Sums the maximum average size over the last five days of all S3 buckets.
    
    .EXAMPLE
        PS C:\> Get-S3BucketSize | Measure-Object -Property NumObjects -Sum

        Sums the maximum average number of objects over the last five days of all S3 buckets.

    .EXAMPLE
        PS C:\> Get-S3BucketSize -StorageClass StandardStorage | Measure-Object -Property SizeGiB -Sum

        Pipes the maximum average size of StandardStorage over the last five days of all S3 buckets available to the current profile to Measure-Object which sums the total size of all S3 objects in those buckets.

    .EXAMPLE
        PS C:\> Import-Csv .\listofbuckets.csv | Get-S3BucketSize

        Accepts from pipeline a list of buckets to be retrieved for measurement. The .csv file can be easily created with Get-S3Bucket | Export-Csv .\listofbuckets.csv and edited as required.

    .EXAMPLE
        PS C:\> Get-S3Bucket | Get-S3BucketSize

        Outputs to the pipeline a collection of all S3 buckets' size and number of objects. This is equivalent to Get-S3BucketSize since it will also invoke Get-S3Bucket when -BucketName is omitted.

    .NOTES
        For more information on S3 metrics in CloudWatch, see http://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html

        (c) 2016 Air11 Technology LLC -- licensed under the Apache OpenSource 2.0 license, https://opensource.org/licenses/Apache-2.0
        Author's blog: https://yobyot.com
#>
function Get-S3BucketSize
{
    [CmdletBinding()]
    [OutputType([System.Management.Automation.PSObject])]
    param
    (
        [Parameter(ValueFromPipelineByPropertyName = $true,
                   Position = 0,
                   HelpMessage = 'Lower-case name of S3 bucket')]
        [System.String[]]$BucketName = 'All',
        [Parameter(HelpMessage = 'Specify storage class ')]
        [System.String]$StorageClass,
        [Parameter(HelpMessage = 'Enter the name of the AWS credential profile to be used')]
        [System.String]$AWSProfile,
        [Parameter(HelpMessage = 'Enter an integer for the number of days to collect metrics')]
        [ValidateRange(1, 14)]
        [System.Int16]$Days = 5,
        [System.String]$Statistic = 'Average'
    )
    
    begin
    {
        try
        {
            $obj = [ordered]@{
                'Bucket' = ''
                'SizeGiB' = ''
                'NumObjects' = ''
                'StorageClass' = ''
            }
            $results = @()
            $daysAgo = (Get-Date ([datetime](Get-Date).AddDays(- $Days)) -Format s) # Date formats for Get-CWMetricStatistics MUST be in ISO format
            $today = Get-Date -Format s # Date formats for Get-CWMetricStatistics MUST be in ISO format
            if ($AWSProfile) { Set-AWSCredentials -ProfileName $AWSProfile }
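            # Anchor the pattern so only the exact, case-sensitive values are accepted; anything else falls back to 'Average'
            if ($Statistic -cnotmatch '^(Maximum|Average)$') { $Statistic = "Average" }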
            Write-Verbose "Today=$today, DaysAgo=$daysAgo, AWSProfile=$AWSProfile, Statistic=$Statistic"
        }
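        catch
        {
            # $_ is the error that was just caught; $Error would expand the entire session error history
            Write-Error "An error occurred: $_"
        }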
    }
    process
    {
        try
        {
            
            switch ($BucketName)
            {
                'All' {
                    $BucketNameStrings = Get-S3Bucket | Select-Object -ExpandProperty BucketName
                    
                    foreach ($b in $BucketNameStrings)
                    {
                        
                        switch ("$StorageClass")
                        {
                            "StandardStorage" {
                                $results += (getBucketSize "$b" 'StandardStorage')
                                
                            }
                            "StandardIAStorage"  {
                                $results += (getBucketSize "$b" 'StandardIAStorage')
                            }
                            "ReducedRedundancyStorage" {
                                $results += (getBucketSize "$b" 'ReducedRedundancyStorage')
                            }
                            default
                            {
                                #Get all classes
                                $results += getBucketSize $b 'StandardStorage'
                                $results += getBucketSize $b 'StandardIAStorage'
                                $results += getBucketSize $b 'ReducedRedundancyStorage'
                            }
                        }
                        $results += (getBucketNumObjects $b)
                    }
                    
                }
                
                { $_ -ne 'All' } {
                    # switch enumerates $BucketName one element at a time, so $_ is a single bucket name.
                    # Capture it now because the nested switch below rebinds $_ to the storage class.
                    $b = $_
                    
                    switch ("$StorageClass")
                    {
                        "StandardStorage" {
                            $results += (getBucketSize $b 'StandardStorage')
                        }
                        "StandardIAStorage"  {
                            $results += (getBucketSize $b 'StandardIAStorage')
                        }
                        "ReducedRedundancyStorage" {
                            $results += (getBucketSize $b 'ReducedRedundancyStorage')
                        }
                        default
                        {
                            #Get all classes
                            $results += (getBucketSize $b 'StandardStorage')
                            $results += (getBucketSize $b 'StandardIAStorage')
                            $results += (getBucketSize $b 'ReducedRedundancyStorage')
                        }
                    }
                    $results += (getBucketNumObjects $b)
                }
                default { Write-Verbose "Neither 'All' nor individual bucket selected; big problem since default bucket name is 'All' " }
            }
        }
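        catch
        {
            # $_ is the error that was just caught; $Error would expand the entire session error history
            Write-Error "An error occurred: $_"
        }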
    }
    end
    {
        try
        {
            Write-Output $results
            Write-Verbose "Done"
        }
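        catch
        {
            # $_ is the error that was just caught; $Error would expand the entire session error history
            Write-Error "An error occurred: $_"
        }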
    }
}
function getBucketSize ($bname, $stgclass)
{
    Write-Verbose "getBucketSize entered with $bname and storage class $stgclass"
    
    $metricSize = Get-CWMetricStatistics -Namespace 'AWS/S3' -MetricName 'BucketSizeBytes' `
                                         -Dimension @(@{ Name = 'BucketName'; Value = "$bname" }; @{ Name = 'StorageType'; Value = "$stgclass" }) `
                                         -Statistic $Statistic -Period 86400 -StartTime $daysAgo -EndTime $today
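    # Round numerically instead of formatting to a string, so the later [decimal] cast does not depend on the culture's separators
    $maxSize = [math]::Round([double](($metricSize.Datapoints | Measure-Object -Property $Statistic -Maximum).Maximum / 1GB), 2)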
    
    $functionObj = New-Object -TypeName System.Management.Automation.PSObject -Property $obj
    $functionObj.Bucket = [string]$bname
    $functionObj.SizeGiB = [decimal]$maxSize
    $functionObj.NumObjects = ''
    $functionObj.StorageClass = $stgclass
    $functionObj
}
function getBucketNumObjects ($bname)
{
    Write-Verbose "getBucketNumObjects entered with $bname"
    
    $metricNumObjects = Get-CWMetricStatistics -Namespace 'AWS/S3' -MetricName 'NumberOfObjects' `
                                               -Dimension @(@{ Name = 'BucketName'; Value = "$bname" }; @{ Name = 'StorageType'; Value = 'AllStorageTypes' }) `
                                               -Statistic $Statistic -Period 86400 -StartTime $daysAgo -EndTime $today
    $numObjects = (($metricNumObjects.Datapoints | Measure-Object -Property $Statistic -Maximum).Maximum)
    if (!$numObjects) { $numObjects = 0 }
    
    $functionObj = New-Object -TypeName System.Management.Automation.PSObject -Property $obj
    $functionObj.Bucket = [string]$bname
    $functionObj.SizeGiB = ''
    $functionObj.NumObjects = $numObjects
    $functionObj.StorageClass = 'AllStorageTypes'
    
    $functionObj
}
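
The "highest average" wording in the help can be illustrated without calling AWS. This is a minimal sketch, assuming mock datapoints that stand in for the Datapoints collection CloudWatch returns (one per day; the timestamps and byte values are invented). It mirrors the pattern used in the helper functions above: take the largest daily value of the chosen statistic, then convert bytes to GiB.

```powershell
# Mock datapoints standing in for CloudWatch's Datapoints collection
# (one per day; timestamps and byte values are invented for illustration).
$statistic = 'Average'
$datapoints = @(
    [pscustomobject]@{ Timestamp = [datetime]'2016-08-01'; Average = 2147483648 }  # 2 GiB
    [pscustomobject]@{ Timestamp = [datetime]'2016-08-02'; Average = 3221225472 }  # 3 GiB
    [pscustomobject]@{ Timestamp = [datetime]'2016-08-03'; Average = 1073741824 }  # 1 GiB
)

# Take the largest daily value of the chosen statistic, then convert bytes to GiB.
$highest = ($datapoints | Measure-Object -Property $statistic -Maximum).Maximum
$sizeGiB = [math]::Round($highest / 1GB, 2)

'Highest daily {0}: {1} GiB' -f $statistic, $sizeGiB
```

Here the reported size is the 3 GiB day, not the mean of the three days, which is what "maximum average" means in the examples.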

 



Comments

6 responses to “How to find AWS S3 bucket size and number of objects via PowerShell”

  1. Efrain Jimenez

    Very handy function, I look forward to more of your work! Thank you!

    1. Alex Neihaus

      Thank you!

  2. NathanT

    This seems to be a great function but I could not get it to work.  I tried shortening it down to a couple lines:

     

    $S3Dimensions = @(@{Name = ‘BucketName’; Value = $S3Bucket.BucketName}; @{Name = ‘StorageType’; Value = “StandardStorage”})

    $S3BucketData = Get-CWMetricStatistics -Namespace ‘AWS/S3’ -MetricName ‘BucketSizeBytes’ -Dimension $S3Dimensions -Period 86400 -StartTime $AWSWeek -Endtime $AWSNow -Statistic ‘Average’

    $S3BucketMaxSize = ‘{0:N2}’ -f (($S3BucketData.Datapoints | Measure-Object -Property $Statistic -Maximum).Maximum / 1GB)

    Most of my S3 usage is standard.  The error I get out of my Powershell IDE is this:

     

    Measure-Object : Cannot validate argument on parameter ‘Property’. The argument is

    null or empty. Provide an argument that is not null or empty, and then try the

    command again.

    When I look at the $S3BucketData.datapoints, it’s empty

     

    1. Alex Neihaus

      So, wait. You couldn’t get my function to work, so you rewrote it and are asking me to debug your code? Really? 🙂 I see a couple of things you might want to consider:

      • I don’t see a call to Get-S3Bucket
      • Your assignment to $S3Dimensions has several issues I can see. First, you have to pass two hash tables to Get-CWMetricStatistics as an array of hash tables. The first key=value pair is "BucketName"=S3Bucketname; the second is "StorageType"=StorageClass. Your statement produces a single hash table with a jumble of key=value pairs. The first entry in your hash table would contain key=BucketName and for that first value you would have an array containing all the bucket names (assuming you have a call to Get-S3Bucket somewhere)
      • Check IAM to make sure you have access to the CloudWatch metrics and access to the S3 buckets you are trying to measure.

      Good luck.

      1. NathanT

        Thanks.  I did use Get-S3Bucket but did not include it in my segment.  In any case, I did figure out (facepalm) how to integrate your functions.  Very, very helpful and thanks for posting them.  A lot of googling and you seem to be the only person out there that has a good handle on really leveraging Powershell against AWS.

        1. Alex Neihaus

          I’m glad my S3 bucket size function is working for you. And I appreciate the props for the work I am posting about PowerShell and AWS. I really enjoy combining the two. I’m waiting for the day that PowerShell for macOS is finished. Then I think we’ll see more work in this area.
