Automating Customer Feedback with Machine Learning

Written by: Evan Glazer

When it comes to sifting through customer feedback about your company and products, you’d be spending a lot of time and money without some type of automation in place. After all, there are a lot of steps to manually analyzing feedback: you probably have to log in somewhere, navigate to feedback, click into each feedback item, write down some notes, try to figure out a solution, and repeat. Pretty much a full-time job if your company has more than five products and a lot of feedback.

Of course, when we think about automating feedback analysis, we might worry that the process will take too long to develop or won't be efficient in the first place. But automation will not only save money, it will also free you to focus on more valuable tasks.

In this post, I'll explain how you can spin up a simple system that offers insight from customer feedback through machine learning and natural language processing. We can use Amazon services to help automate and comprehend the feedback, and for less than $15 a month, we can have a system that gives us quick analysis on our customer feedback.

Automation Components

We will be using Amazon Comprehend to understand the language and context of the feedback, and its topic modeling to do a multiclass classification of our product feedback. We'll use serverless techniques: an AWS Lambda function reads the feedback from an S3 bucket, sends it to Amazon Comprehend, and then parses the results so that we can predict the classification using a machine-learning concept known as topic modeling.

Natural language processing

Natural language processing uses machine learning to find relationships in text. Amazon Comprehend identifies the language, key phrases, and sentiment of each piece of feedback, and can then organize feedback into topics.

Machine learning + topic modeling

Machine learning will process those identifiers and predict what the feedback topics might be. For example, if a user leaves us a negative review, we can identify the negativity through sentiment, then categorize it under topics like broken item, warranty, etc. We can then count how often each topic occurs for each product and figure out what the root causes of product success or failure might be.
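As a rough illustration of that last step, the sketch below tallies the key phrases found in negative reviews, grouped by product. It assumes each review has already been analyzed and reduced to a small dictionary; the `product`, `sentiment`, and `phrases` field names and the sample records are hypothetical, not a shape that Comprehend returns directly.

```python
from collections import Counter, defaultdict


def tally_negative_phrases(reviews):
    """Count key phrases from NEGATIVE reviews, grouped by product."""
    counts = defaultdict(Counter)
    for review in reviews:
        if review["sentiment"] == "NEGATIVE":
            # Normalize case so "Broken item" and "broken item" count together
            counts[review["product"]].update(
                phrase.lower() for phrase in review["phrases"])
    return counts


# Hypothetical, hand-written records standing in for analyzed feedback
reviews = [
    {"product": "grill", "sentiment": "NEGATIVE", "phrases": ["broken item", "warranty"]},
    {"product": "grill", "sentiment": "NEGATIVE", "phrases": ["Broken item"]},
    {"product": "grill", "sentiment": "POSITIVE", "phrases": ["packaging"]},
    {"product": "lamp", "sentiment": "NEGATIVE", "phrases": ["warranty"]},
]

negative = tally_negative_phrases(reviews)
print(negative["grill"].most_common(1))  # [('broken item', 2)]
```

A tally like this is only a starting point: it surfaces which complaints recur for a product, and a person still has to decide what the root cause is.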

Building Our Automated System for Feedback Analysis

Okay, let's begin!

If you haven't already, this is a good time to head over to Amazon and sign up for AWS's free tier.

Set up an S3 bucket

Let’s start with setting up an S3 bucket. In the top menu, click Services and look for S3. We'll create a bucket named feedback-codeship.

Creating a Lambda Package

In this step we'll create our Lambda package. This requires some knowledge of Python and botocore. We'll be using Python 3.6 to build our Lambda function.

    import csv
    import datetime
    import json
    import os
    import sys
    import urllib.parse

    import boto3

    # Make any packages bundled in the deployment zip importable
    root = os.environ["LAMBDA_TASK_ROOT"]
    sys.path.insert(0, root)

    # S3 resource used to download the review and upload the analysis
    s3client = boto3.resource('s3')
    # Comprehend client
    deepinsight = boto3.client(service_name='comprehend',
                               region_name='us-west-2', use_ssl=True)

    # Comprehend topic modeling attributes (replace the ARN with your own)
    data_access_role_arn = "arn:aws:iam::008875219265:policy/....."
    input_doc_format = "ONE_DOC_PER_FILE"
    number_of_topics = 3
    # The input data config is a content folder named after the date,
    # so we can track the S3 uploads
    now = datetime.datetime.now()  # assumed; the original assignment was cut off
    input_data_config = "s3://feedback-codeship/" + now.strftime("%Y-%m-%d")
    output_data_config = "s3://feedback-codeship/outputs"
    # Only key phrases scored at or above this confidence are kept
    threshold = 0.80


    # --------------- Main Lambda Handler ------------------
    def handler(event, context):
        # Get the bucket name and object key from the S3 event
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(
            event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        try:
            # Download the review to a temp file
            s3client.Object(bucket, key).download_file('/tmp/analysis.txt')

            # Start the asynchronous topic modeling job once per upload
            topic_response = deepinsight.start_topics_detection_job(
                InputDataConfig={'S3Uri': input_data_config,
                                 'InputFormat': input_doc_format},
                OutputDataConfig={'S3Uri': output_data_config},
                DataAccessRoleArn=data_access_role_arn,
                NumberOfTopics=number_of_topics)
            job_id = topic_response["JobId"]
            describe_topics_detection_job_result = \
                deepinsight.describe_topics_detection_job(JobId=job_id)
            list_topics_detection_jobs_result = \
                deepinsight.list_topics_detection_jobs()
            # default=str serializes the datetime values in the response
            topics = json.dumps(list_topics_detection_jobs_result, default=str)

            with open('/tmp/analysis.txt') as f, \
                    open('/tmp/analysis-out.csv', 'w', newline='') as out:
                writer = csv.writer(out)
                for line in f:
                    text = line.strip()
                    if not text:
                        continue  # Comprehend rejects empty text
                    st = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
                    # Sentiment response
                    sentiment_response = deepinsight.detect_sentiment(
                        Text=text, LanguageCode='en')
                    # Key phrases response
                    phrase_response = deepinsight.detect_key_phrases(
                        Text=text, LanguageCode='en')
                    phrases = [p['Text'] for p in phrase_response["KeyPhrases"]
                               if p['Score'] >= threshold]
                    # Write the Comprehend data as one CSV row
                    writer.writerow(
                        [bucket + '/' + key, st, "Sentiment",
                         sentiment_response['Sentiment'], "Key Phrases"]
                        + phrases + ["Topics", topics])

            s3client.meta.client.upload_file(
                '/tmp/analysis-out.csv', Bucket=bucket,
                Key='analysis/' + key + '.csv')
            return 'Analysis Successfully Uploaded'
        except Exception as e:
            # Log the failure (visible in CloudWatch) and let Lambda report it
            print('Error analyzing {} from bucket {}: {}'.format(key, bucket, e))
            raise

We start with the imports we'll need and then set up the boto3 S3 resource and Comprehend client. Then we set the variables we'll need for the topic modeling attributes; you can fill in your own values, using mine as an example.

The method name should be handler so that AWS Lambda can call auto_feedback.handler when an event is triggered in our S3 bucket. Then we get the bucket and object key from the event and try to build our results from that object.
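To make the event handling concrete, here is a minimal sketch of pulling the bucket and key out of an S3 event. The `parse_s3_event` helper and the sample event are illustrative only; a real S3 event carries many more fields than shown here.

```python
import urllib.parse


def parse_s3_event(event):
    """Return (bucket, key) for the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 URL-encodes object keys in event payloads (spaces arrive as "+")
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key


# Trimmed-down sample event for illustration
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "feedback-codeship"},
                "object": {"key": "2018-05-10/grill+review.txt"}}}
    ]
}

bucket, key = parse_s3_event(sample_event)
print(bucket, key)  # feedback-codeship 2018-05-10/grill review.txt
```

Decoding the key matters because a file uploaded with a space in its name would otherwise not be found when the function tries to download it.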

We have also set up our output to be uploaded as a CSV file with rows like this:

bucket_name, timestamp, Sentiment, sentiment_response, KeyPhrases, item1, item2, item3, Topics, topics(json)
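One way to produce a row in that layout is Python's csv module, which quotes any field that itself contains commas (the topics JSON usually does). The sketch below is an assumption-laden illustration: `build_row` is a hypothetical helper, and the two response dictionaries are hand-written stand-ins that only mimic the `Sentiment` and `KeyPhrases` fields of Comprehend's responses.

```python
import csv
import io


def build_row(source, timestamp, sentiment_response, phrase_response,
              topics_json, threshold=0.80):
    """Flatten Comprehend-style responses into one list of CSV fields."""
    # Keep only the key phrases Comprehend is reasonably confident about
    phrases = [p["Text"] for p in phrase_response["KeyPhrases"]
               if p["Score"] >= threshold]
    return ([source, timestamp, "Sentiment", sentiment_response["Sentiment"],
             "KeyPhrases"] + phrases + ["Topics", topics_json])


# Hand-written stand-ins shaped like detect_sentiment / detect_key_phrases output
sentiment = {"Sentiment": "POSITIVE"}
phrases = {"KeyPhrases": [{"Text": "appearance", "Score": 0.99},
                          {"Text": "a little rough", "Score": 0.42}]}

row = build_row("feedback-codeship/review.txt", "2018-05-10 14:10:10",
                sentiment, phrases, '{"JobStatus": "SUBMITTED"}')

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
print(buffer.getvalue())
```

Because the number of key phrases varies per review, the fixed "KeyPhrases" and "Topics" labels act as markers that let a reader of the file tell where each group starts.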

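Then we'll take the Lambda package and zip it up. Run this from the package directory; the archive name, auto_feedback.zip here, is up to you:

zip -FSr auto_feedback.zip . -x "*.git*" "*bin/*" "*.zip"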

We will then save the zip to our S3 bucket.
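If you would rather script the packaging step, the following sketch does roughly the same job with Python's standard library. `build_package` is a hypothetical helper, and the exclusion patterns are assumed to mirror the ones passed to zip above. The resulting archive can then be uploaded through the S3 console or with boto3's `upload_file`.

```python
import fnmatch
import os
import zipfile

# Skip git metadata, bin folders, and previously built archives
EXCLUDE = ("*.git*", "*bin/*", "*.zip")


def build_package(source_dir, archive_path, exclude=EXCLUDE):
    """Zip source_dir recursively, skipping paths that match exclude."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # Paths inside the archive are relative to source_dir
                relative = os.path.relpath(full_path, source_dir)
                relative = relative.replace(os.sep, "/")
                if any(fnmatch.fnmatch(relative, p) for p in exclude):
                    continue
                archive.write(full_path, relative)
                added.append(relative)
    return sorted(added)


# Example call (paths are placeholders):
#   build_package(".", "/tmp/auto_feedback.zip")
```

The function returns the list of files it packed, which makes it easy to confirm that the handler module made it into the archive before uploading.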

Setting Up Permissions and Lambda Functions with CloudFormation

This is the fun part: our yml file will do all of the heavy lifting and link the Lambda with the correct permissions.

We'll now use the S3 path of the Lambda package we zipped and uploaded in our setup.yml. CodeUri is the place to put the S3 path of our package, so that the Lambda we’re trying to create knows where to find its code.


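    AWSTemplateFormatVersion: '2010-09-09'
    Description: Automated way of understanding feedback.
    Transform: 'AWS::Serverless-2016-10-31'
    # Several parent keys were missing from this listing. Resources,
    # Properties, Policies, Statement, Action, and Outputs follow the usual
    # SAM layout; the AutoFeedback name, Environment/Variables, and the
    # output names are assumptions.
    Resources:
      S3:
        Type: AWS::S3::Bucket
      AutoFeedback:
        Properties:
          CodeUri: [S3 URL]
          Description: "Get Feedback Analysis"
          Handler: auto_feedback.handler
          MemorySize: 128
          Policies:
            - Statement:
                - Sid: "comprehend"
                  Effect: Allow
                  Action:
                    - comprehend:*
                  Resource: "*"
                - Sid: "s3"
                  Effect: Allow
                  Action:
                    - s3:*
                  Resource: !Sub "arn:aws:s3:::${S3}/*"
          Environment:
            Variables:
              bucket: !Ref S3
          Runtime: python3.6
          Timeout: 20
        Type: AWS::Serverless::Function
    Outputs:
      Bucket:
        Value:
          Ref: "S3"
      Region:
        Value:
          Ref: AWS::Region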

The file we upload to AWS CloudFormation will set up our permissions and Lambda functions. The AWS CloudFormation documentation on template anatomy is a really good reference for the structure of our setup.yml.

Then we'll need to check the following options, including the transforms, and execute CloudFormation so that we're ready to add our event to the S3 bucket!
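The same deployment can be scripted with boto3. This is only a sketch, and it assumes the console options amount to the usual steps for a template with a transform: acknowledging IAM resource creation, creating a change set, and executing it. The `deploy_stack` helper, the change set name, and the stack name in the usage comment are all hypothetical.

```python
def deploy_stack(cloudformation, stack_name, template_url,
                 change_set_name="initial-deploy"):
    """Create a change set for the template, wait for it, then execute it."""
    cloudformation.create_change_set(
        StackName=stack_name,
        TemplateURL=template_url,
        # Acknowledge that the template creates IAM resources for the Lambda
        Capabilities=["CAPABILITY_IAM"],
        ChangeSetName=change_set_name,
        ChangeSetType="CREATE",
    )
    # The Serverless transform is expanded while the change set is created
    waiter = cloudformation.get_waiter("change_set_create_complete")
    waiter.wait(StackName=stack_name, ChangeSetName=change_set_name)
    cloudformation.execute_change_set(
        StackName=stack_name, ChangeSetName=change_set_name)


# Usage (needs AWS credentials; the names are placeholders):
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-west-2")
#   deploy_stack(client, "feedback-analysis", "<S3 URL of setup.yml>")
```

Passing the client in as an argument keeps the function easy to try out with a stub before pointing it at a real account.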

Back to the bucket

Now we need to navigate back to our S3 bucket:

Click the Properties tab, then click Events, and finally add the following properties:
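For reference, an event like this can also be expressed as a notification configuration in code. The sketch below assumes the event should fire on every newly created object whose name ends in .txt; the `lambda_notification` helper and the function ARN are placeholders, not values from this setup.

```python
def lambda_notification(function_arn, suffix=".txt"):
    """Build an S3 notification config that invokes a Lambda on new uploads."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Only react to feedback files, not to the CSVs the Lambda writes
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "suffix", "Value": suffix},
                ]}},
            }
        ]
    }


# Placeholder ARN for illustration only
config = lambda_notification(
    "arn:aws:lambda:us-west-2:123456789012:function:auto-feedback")
print(config["LambdaFunctionConfigurations"][0]["Events"])  # ['s3:ObjectCreated:*']

# Applying it needs AWS credentials:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="feedback-codeship", NotificationConfiguration=config)
```

The suffix filter is worth keeping even in the console: since the function writes its CSV back into the same bucket, an unfiltered event would trigger the function again on its own output. When the event is configured outside the console, S3 also needs permission to invoke the function.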

Time to Test

Upload your feedback as a .txt file into the directory of the S3 bucket or feel free to use this sample feedback text:

November 18, 2017 | Color: Light Silver | Verified Purchase

Ultimately this will be a two-part review. The first part deals with appearance, quality, packaging, and shipping. The product is substantial, heavy, well boxed, and protected. Should there be a need, the packaging is suitable for return purposes. There was no need for that in this case, as everything arrived in pristine condition. Set up was simple and easy, directions/user manual do show evidence of translation issues and are a little rough but still effective.

The second part of this review will be posted after using. Will assess cleanup, function, ease of use, etc. More to follow.

Then look in the output folder we created for the output file, which holds some of the analysis we were after:

auto_feedback, 2018-05-10T14:10:10+00:00, Sentiment, POSITIVE, KeyPhrases, appearance, quality, packing, shipping, Topics, topics(json)

And there we are. A simple system that offers insight from customer feedback through machine learning and natural language processing, with AWS Lambda, Comprehend, and CloudFormation.

