Tools - Python/TypeScript - Add CDK and Lambda function for stats collection #116

Open
wants to merge 8 commits into
base: main
8 changes: 8 additions & 0 deletions aws_doc_sdk_examples_tools/stats/.gitignore
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
cdk.out
6 changes: 6 additions & 0 deletions aws_doc_sdk_examples_tools/stats/.npmignore
@@ -0,0 +1,6 @@
*.ts
!*.d.ts

# CDK asset staging directory
.cdk.staging
cdk.out
43 changes: 43 additions & 0 deletions aws_doc_sdk_examples_tools/stats/README.md
@@ -0,0 +1,43 @@

# CodeCommitCloneStack

Deploys an AWS Lambda function that runs whenever a CodeCommit repository is updated, reads the repository's `.stats` file, and appends its contents as a new timestamped row to a `stats.csv` file in an S3 bucket.

## Architecture

- **Lambda Function**: Pulls `.stats` file from CodeCommit, parses it, updates `stats.csv` in S3.
- **EventBridge Rule**: Triggers Lambda on CodeCommit repository changes.
- **IAM Role**: Grants Lambda necessary permissions for CodeCommit and S3 access.
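
The trigger described above can be approximated locally. The following is a minimal sketch, not part of the stack, of how the EventBridge rule's pattern selects CodeCommit events. The `matches` helper is hypothetical and implements only a simplified subset of EventBridge matching; the account ID `123456789012` is a placeholder, and the sample event shows only the fields the pattern inspects.

```python
# Pattern equivalent to the rule in app.ts (CDK's `detailType` is emitted as "detail-type").
# The account ID is a placeholder, not a real account.
REPO_ARN = "arn:aws:codecommit:us-west-2:123456789012:AWSDocsSdkExamplesPublic"

RULE_PATTERN = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "resources": [REPO_ARN],
    "detail": {"event": ["referenceCreated", "referenceUpdated", "referenceDeleted"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every key in the pattern must match the event."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # List-valued event field: any element may match.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed-down sample of a branch update event.
sample_event = {
    "source": "aws.codecommit",
    "detail-type": "CodeCommit Repository State Change",
    "resources": [REPO_ARN],
    "detail": {"event": "referenceUpdated", "referenceName": "mainline"},
}

other_event = dict(sample_event, **{"detail-type": "CodeCommit Comment on Commit"})

print(matches(RULE_PATTERN, sample_event))
print(matches(RULE_PATTERN, other_event))
```

Note that the rule fires for any branch or tag reference in the repository, while the Lambda always reads `.stats` from the branch named by `BRANCH_NAME` (default `mainline`).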

## Prerequisites

- **Node.js**
- **AWS CDK**: `npm install -g aws-cdk`
- **AWS CLI**: Configured for your account
- **Python 3.9** (Lambda runtime)

## Deployment

1. **Install Dependencies**:
   ```bash
   npm install
   ```

2. **Deploy Stack**:
   ```bash
   cdk deploy
   ```

## Testing

Push a commit to the monitored CodeCommit repository, then verify that `stats.csv` in the `codeexamplestats` S3 bucket gains a new row. Check the Lambda function's logs in CloudWatch for details.
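
The append step can also be exercised locally before deploying. The sketch below assumes a hypothetical helper, `append_stats_row`, that mirrors the CSV logic in `lambda/index.py` without touching S3 or CodeCommit; the helper name and the sample values are illustrative only.

```python
import csv
from datetime import datetime
from io import StringIO

# Column order matches the row built in lambda/index.py.
FIELDNAMES = [
    "sdks", "services", "examples", "versions", "snippets",
    "genai_none", "genai_some", "genai_most", "timestamp",
]


def append_stats_row(csv_text: str, stats: dict, now: datetime) -> str:
    """Return csv_text with one extra row built from a parsed .stats payload."""
    rows = list(csv.DictReader(StringIO(csv_text))) if csv_text else []
    rows.append({
        "sdks": stats["sdks"],
        "services": stats["services"],
        "examples": stats["examples"],
        "versions": stats["versions"],
        "snippets": stats["snippets"],
        "genai_none": stats["genai"]["none"],
        "genai_some": stats["genai"]["some"],
        "genai_most": stats["genai"]["most"],
        "timestamp": now.strftime("%d/%m/%Y %H:%M:%S"),
    })
    output = StringIO()
    writer = csv.DictWriter(output, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()


# Made-up numbers, used only to drive the example.
sample_stats = {
    "sdks": 12, "services": 200, "examples": 3000, "versions": 20,
    "snippets": 9000, "genai": {"none": 1, "some": 2, "most": 3},
}

first = append_stats_row("", sample_stats, datetime(2024, 1, 2, 3, 4, 5))
second = append_stats_row(first, sample_stats, datetime(2024, 1, 3, 3, 4, 5))
print(second)
```

Passing the timestamp in as an argument keeps the helper deterministic, which makes it straightforward to assert on in a unit test.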

## Cleanup

```bash
cdk destroy
```

---

**License**: Apache-2.0
103 changes: 103 additions & 0 deletions aws_doc_sdk_examples_tools/stats/app.ts
@@ -0,0 +1,103 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as iam from "aws-cdk-lib/aws-iam";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import { Construct } from "constructs";
import * as path from "path";

const repoName = "AWSDocsSdkExamplesPublic";
const awsRegion = "us-west-2";

class CodeCommitCloneStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create Lambda function
    const cloneLambda = this.initCloneLambda();

    // Create EventBridge rule to trigger Lambda on CodeCommit repository changes
    this.initCodeCommitTrigger(cloneLambda);
  }

  private initCloneLambda(): lambda.Function {
    // IAM Role and Policy for Lambda to access CodeCommit
    const lambdaExecutionRole = new iam.Role(this, "CloneLambdaExecutionRole", {
      assumedBy: new iam.ServicePrincipal("lambda.amazonaws.com"),
      description: "Execution role for Lambda function to clone CodeCommit repo",
      managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName("service-role/AWSLambdaBasicExecutionRole"),
      ],
    });

    // Grant necessary permissions to CodeCommit
    lambdaExecutionRole.addToPolicy(
      new iam.PolicyStatement({
        actions: [
          "codecommit:GetRepository",
          "codecommit:GitPull",
          "codecommit:GetBranch",
          "codecommit:GetDifferences",
          "codecommit:GetFile",
        ],
        resources: [`arn:aws:codecommit:${awsRegion}:${this.account}:${repoName}`],
      })
    );

    // Grant necessary permissions to S3 bucket "codeexamplestats" for Get/Put
    lambdaExecutionRole.addToPolicy(
      new iam.PolicyStatement({
        actions: ["s3:GetObject", "s3:PutObject"],
        resources: [`arn:aws:s3:::codeexamplestats/*`], // Allow all objects in the bucket
      })
    );

    // Define the Lambda function, pointing directly to the source code dir
    const cloneLambda = new lambda.Function(this, "CodeCommitCloneLambda", {
      runtime: lambda.Runtime.PYTHON_3_9,
      handler: "index.lambda_handler",
      code: lambda.Code.fromAsset(path.join(__dirname, "lambda")),
      environment: {
        REPO_NAME: repoName,
      },
      timeout: cdk.Duration.minutes(5),
      role: lambdaExecutionRole,
    });

    return cloneLambda;
  }

  private initCodeCommitTrigger(cloneLambda: lambda.Function): void {
    // EventBridge rule for CodeCommit repo updates
    const codeCommitRule = new events.Rule(this, "CodeCommitUpdateRule", {
      eventPattern: {
        source: ["aws.codecommit"],
        detailType: ["CodeCommit Repository State Change"],
        resources: [`arn:aws:codecommit:${awsRegion}:${this.account}:${repoName}`],
        detail: {
          event: [
            "referenceCreated",
            "referenceUpdated",
            "referenceDeleted",
          ],
        },
      },
    });

    // Add Lambda function as target of the EventBridge rule
    codeCommitRule.addTarget(new targets.LambdaFunction(cloneLambda));
  }
}

const app = new cdk.App();
new CodeCommitCloneStack(app, "CodeCommitCloneStack", {
  env: {
    account: process.env.CDK_DEFAULT_ACCOUNT,
    region: "us-west-2", // Where codecommit is stored (internal requirement)
  },
});
app.synth();
export { CodeCommitCloneStack };
3 changes: 3 additions & 0 deletions aws_doc_sdk_examples_tools/stats/cdk.json
@@ -0,0 +1,3 @@
{
  "app": "npx ts-node app.ts"
}
16 changes: 16 additions & 0 deletions aws_doc_sdk_examples_tools/stats/dataset/manifest.json
@@ -0,0 +1,16 @@
{
  "fileLocations": [
    {
      "URIs": [
        "s3://codeexamplestats/stats.csv"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "textqualifier": "\"",
    "containsHeader": "true"
  }
}

8 changes: 8 additions & 0 deletions aws_doc_sdk_examples_tools/stats/jest.config.js
@@ -0,0 +1,8 @@
module.exports = {
  testEnvironment: 'node',
  roots: ['<rootDir>/test'],
  testMatch: ['**/*.test.ts'],
  transform: {
    '^.+\\.tsx?$': 'ts-jest'
  }
};
81 changes: 81 additions & 0 deletions aws_doc_sdk_examples_tools/stats/lambda/index.py
@@ -0,0 +1,81 @@
import csv
import json
import os
from datetime import datetime
from io import StringIO

import boto3


def lambda_handler(event, context):
    # Initialize boto3 clients
    codecommit = boto3.client("codecommit", region_name=os.environ["AWS_REGION"])
    s3 = boto3.client("s3")

    # Environment variables
    repo_name = os.environ.get("REPO_NAME")
    branch_name = os.environ.get("BRANCH_NAME", "mainline")  # Default to "mainline"
    bucket_name = "codeexamplestats"
    csv_file_key = "stats.csv"

    try:
        # Step 1: Retrieve the .stats file content directly from CodeCommit
        file_response = codecommit.get_file(
            repositoryName=repo_name,
            filePath=".stats",  # Path to the .stats file in the repository
            commitSpecifier=branch_name,
        )

        # Parse the .stats content as JSON, falling back to a quote swap
        # for content written with single quotes
        file_content = file_response["fileContent"].decode("utf-8")
        try:
            stats_data = json.loads(file_content)
        except json.JSONDecodeError:
            file_content = file_content.replace("'", '"')  # Replace single quotes
            stats_data = json.loads(file_content)

        # Step 2: Fetch the current stats.csv file from S3.
        # Note: S3 reports NoSuchKey only when the caller also has
        # s3:ListBucket on the bucket; without it, a missing key surfaces
        # as AccessDenied and is handled by the outer except block.
        existing_rows = []
        try:
            csv_obj = s3.get_object(Bucket=bucket_name, Key=csv_file_key)
            csv_data = csv_obj["Body"].read().decode("utf-8")
            csv_reader = csv.DictReader(StringIO(csv_data))
            existing_rows = list(csv_reader)
        except s3.exceptions.NoSuchKey:
            existing_rows = []  # First run: start a new CSV

        # Step 3: Append the new data from .stats file with a formatted timestamp
        new_row = {
            "sdks": stats_data["sdks"],
            "services": stats_data["services"],
            "examples": stats_data["examples"],
            "versions": stats_data["versions"],
            "snippets": stats_data["snippets"],
            "genai_none": stats_data["genai"]["none"],
            "genai_some": stats_data["genai"]["some"],
            "genai_most": stats_data["genai"]["most"],
            "timestamp": datetime.now().strftime("%d/%m/%Y %H:%M:%S"),
        }
        existing_rows.append(new_row)

        # Step 4: Save the updated data back to CSV and upload it to S3
        with StringIO() as output:
            # The new row's keys (including 'timestamp') define the CSV columns
            writer = csv.DictWriter(output, fieldnames=list(new_row.keys()))
            writer.writeheader()  # Ensure we have headers
            writer.writerows(existing_rows)
            s3.put_object(
                Bucket=bucket_name,
                Key=csv_file_key,
                Body=output.getvalue().encode("utf-8"),
            )

        print("Successfully updated stats.csv in S3 with new .stats data.")

    except Exception as e:
        print(f"Error occurred: {e}")
        return {"statusCode": 500, "body": "Failed to update stats.csv."}

    return {"statusCode": 200, "body": "stats.csv updated successfully in S3."}
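
A note on the `.stats` parsing fallback in `lambda/index.py`: replacing every single quote with a double quote would also rewrite apostrophes inside string values, which can produce invalid JSON. One possible alternative, sketched below under the assumption that a non-JSON `.stats` file is a Python-style dict literal, is to fall back to `ast.literal_eval`. The `parse_stats` helper is hypothetical and not part of this PR.

```python
import ast
import json


def parse_stats(text: str) -> dict:
    """Parse .stats content that is either JSON or a Python-style dict literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval evaluates literals only (dicts, lists, strings, numbers),
        # so it does not execute arbitrary code from the file.
        parsed = ast.literal_eval(text)
        if not isinstance(parsed, dict):
            raise ValueError("expected a mapping in .stats")
        return parsed


# Illustrative inputs only; the real .stats schema has more keys.
print(parse_stats('{"sdks": 12}'))
print(parse_stats("{'sdks': 12, 'note': \"it's fine\"}"))
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on malformed input, so the handler's existing broad `except` would still catch parse failures.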