In this 3-part blog series, we’ll walk through the process of extracting data from the Amplitude API using Python, uploading the output files to an Amazon S3 bucket, and finally loading that data into a Snowflake table.
Part two: Testing your Python connection to your S3 bucket
Before building your full data pipeline, it’s a good idea to test your connection to AWS S3. This helps ensure your credentials are working, your bucket permissions are correct, and your file upload logic is sound.
Here’s a step-by-step outline to get you started:
Step 1: Create and Activate a Virtual Environment
Create an isolated Python environment for dependency management:
python -m venv venv
venv\Scripts\activate
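Note that the activation command above is for Windows. On macOS or Linux, activate the environment with:
source venv/bin/activate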
Step 2: Set Up Environment Variables
Create a .env file to store your AWS credentials securely:
AWS_ACCESS_KEY_ID=your_key_id
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=your_region
Step 3: Update .gitignore
To avoid pushing sensitive files to GitHub, update your .gitignore:
.env
venv/
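If a .env file was already committed before you updated .gitignore, you can untrack it while keeping your local copy:
git rm --cached .env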
Step 4: Create a Git Branch for Testing
Create and commit your setup in a dedicated Git branch:
git checkout -b test-s3-connection
git add .
git commit -m "Initial setup for S3 connection test"
Python Script to Test S3 Upload
Now, let’s write a Python script to test your S3 connection and upload functionality.
Step 1: Import Required Packages
import boto3
import os
import requests  # needed for the optional API check in Step 3
from dotenv import load_dotenv
Step 2: Load AWS Credentials and Create the S3 Client
load_dotenv()

s3 = boto3.client(
    's3',
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
    region_name=os.getenv('AWS_REGION')
)
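Before writing any objects, it's worth confirming the client can actually reach your bucket. A lightweight way to do this is a head_bucket call, sketched below (replace "my-bucket-name" with your actual bucket):
from botocore.exceptions import ClientError

try:
    # head_bucket succeeds only if the bucket exists and your credentials can access it
    s3.head_bucket(Bucket="my-bucket-name")
    print("S3 connection verified.")
except ClientError as e:
    print(f"Could not reach bucket: {e}")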
Step 3: (Optional) Test External API Before Upload
If you're retrieving data from an API (such as the Amplitude API from part one) before uploading to S3, validate the response first:
response = requests.get("your_api_url")
if response.status_code != 200:
    raise Exception(f"API request failed with status code: {response.status_code}")
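Alternatively, requests provides a one-line helper that raises an HTTPError for any 4xx or 5xx response:
response = requests.get("your_api_url")
response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx responses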
Step 4: Upload a Single File to S3 (Without Saving Locally)
def upload_to_s3(bucket, key, content):
    """Upload in-memory content directly to S3 without writing a local file."""
    s3.put_object(Bucket=bucket, Key=key, Body=content)
Test the function with a sample file:
upload_to_s3("my-bucket-name", "test/test_file.json", '{"test": "data"}')
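To confirm the object actually landed, you can read it straight back (a quick check using the same placeholder bucket and key):
obj = s3.get_object(Bucket="my-bucket-name", Key="test/test_file.json")
print(obj["Body"].read().decode("utf-8"))  # should print {"test": "data"}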
Step 5: Upload Multiple Files (Scale Up)
def upload_multiple_data_to_s3(bucket, data_list):
    """Upload a list of (key, content) pairs to the given S3 bucket."""
    for key, content in data_list:
        print(f"Uploading to s3://{bucket}/{key}")
        s3.put_object(Bucket=bucket, Key=key, Body=content)
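And a quick usage example with a couple of sample payloads (the bucket name and keys here are placeholders):
sample_data = [
    ("test/file_1.json", '{"id": 1}'),
    ("test/file_2.json", '{"id": 2}'),
]
upload_multiple_data_to_s3("my-bucket-name", sample_data)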