Improving Conversations using the Perspective API

I recently came across an article by Rory Cellan-Jones about a new technology from Jigsaw, a development group at Google focused on making people safer online through technology. At the time they’d just released the first alpha version of what they call The Perspective API. It’s a machine learning tool that is designed to rate a string of text (i.e. a comment) and provide you with a Toxicity Score, a number representing how toxic the text is.

The system learns by seeing how thousands of online conversations have been moderated and then scores new comments by assessing how “toxic” they are and whether similar language has led other people to leave conversations. What it’s doing is trying to improve the quality of debate and make sure people aren’t put off from joining in.

As the project is still in its infancy it doesn’t do much more than that. Still, we can use it!

Starting with the API

To get started with using the API, you’ll need to request API access from their website. I managed to get access within a few days. If you’re interested in playing with this yourself, know that you might need to wait it out until they email you back. Once you get the email saying you have access, you’ll need to log in to the Google Developer Console and get your API key. Create your credentials with the amount of security you’d like and then you’re ready to get going!

Now you’ll need to head over to the documentation on GitHub to learn a bit more about the project and find out how it actually works. The documentation includes lots of information about what features are currently available and what they’re ultimately designed to achieve. Remember: the main point of the API is to provide a score of how toxic a comment is, so to do anything extra with that information will require some work.

Getting a Score with cURL

Let’s use PHP’s cURL functions to make the request and get the score. If you’re not used to cURL, don’t panic; it’s relatively simple to get the hang of. If you want to try it within WordPress, it’s even easier because there are native WordPress helper functions you can use. Let’s start with the standard PHP method.

Whilst we walk through this, it’s a good idea to have the PHP documentation open to refer to. To understand the fundamentals of cURL, we’ll go through a couple of the core options we may need to use.

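// Build the request body: the comment text, its language and the attribute to score.
$params = array(
  'comment' => array(
    'text' => 'what a stupid question...'
  ),
  'languages' => array(
    'en'
  ),
  'requestedAttributes' => array(
    // An empty object (not an empty string) requests the attribute with default settings.
    'TOXICITY' => new stdClass()
  )
);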

$params = json_encode($params);

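// Initialize the request; the API key is passed in the query string.
$req = curl_init();
curl_setopt($req, CURLOPT_URL, 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=YOUR-API-KEY');
// Setting POSTFIELDS makes cURL send a POST request with this body.
curl_setopt($req, CURLOPT_POSTFIELDS, $params);
curl_setopt($req, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
// Return the response as a string instead of printing it.
curl_setopt($req, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($req);
curl_close($req);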

The lines above cover each stage of a cURL request: initialize the request, set its options, execute it, then close the connection. You’ll get your comment data back from the server as JSON, which is handy for a number of reasons.

Send An Ajax Request

Because the API responds with JSON, you can also call it directly from the browser with an Ajax request. This is handy if you don’t want to dive too much into PHP and cURL. An example of an Ajax request (using jQuery) would look something like the following:

$.ajax({
        // Send the body as JSON; by default jQuery would form-encode the object.
        contentType: 'application/json',
        data: JSON.stringify({
                comment: {
                        text: "this is such a stupid idea!!"
                },
                languages: ["en"],
                requestedAttributes: {
                        TOXICITY: {}
                }
        }),
        type: 'post',
        url: 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=YOUR-API-KEY',
        success: function(response) {
                console.log(response);
        }
});

The data we get back is now logged to the console, ready for us to debug. From here we can read the score out of the JSON data and do something with it. Make sure you include your API key at the end of the URL in the Ajax request too, otherwise it won’t work! Without it, you’ll get an error about your authentication being invalid. You don’t have to stop here, either. You could take the example above a step further and log the score in a database as soon as you’ve got the data back, or provide feedback to the user on the front-end in the form of an alert.
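If you would rather not depend on jQuery, the same call can be sketched with the `fetch` function. This is a minimal sketch rather than an official client: `buildAnalyzeRequest` and `analyzeComment` are hypothetical helper names, and the request shape simply mirrors the jQuery example above.

```javascript
// Hypothetical helper: builds the JSON body for a comments:analyze request.
// The shape mirrors the jQuery example; the function name is made up for this sketch.
function buildAnalyzeRequest(text, languages = ['en']) {
  return {
    comment: { text: text },
    languages: languages,
    requestedAttributes: { TOXICITY: {} }
  };
}

// Hypothetical helper: sends the comment for scoring and resolves with the parsed JSON.
// Assumes an environment with a global fetch (modern browsers, Node 18+).
async function analyzeComment(text, apiKey) {
  const url = 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=' +
    encodeURIComponent(apiKey);
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildAnalyzeRequest(text))
  });
  if (!response.ok) {
    // Surface HTTP errors (for example a missing or invalid key) to the caller.
    throw new Error('Request failed with status ' + response.status);
  }
  return response.json();
}
```

You would call it with something like `analyzeComment('this is such a stupid idea!!', 'YOUR-API-KEY').then(console.log)`. Keeping the body builder separate from the network call makes it easy to check the request shape without hitting the API.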

The WordPress Way

If you’re using WordPress (which is relevant here since WordPress has comment threads you might want to moderate) and you want to make a request to the Perspective API, then it’s even simpler. Using the Toxic Comments plugin as an example, you can do the following instead thanks to WordPress’ extensive built-in functions. You won’t need to do any of the following if you use the plugin, but it’s worth explaining what the plugin does behind the scenes to achieve what we want to do here.

$request = wp_remote_post($url, $arguments);

This will make a POST request to the external resource for us without much legwork. There are other functions that you can use too, such as wp_remote_get for GET requests, but we don’t need to think about that right now. You then need to use another function to get the requested data back from the server. Yes, you’re completely right. WordPress has a function for that:

$data = wp_remote_retrieve_body($request);

So that’s great, but how do we actually use the API to get the data we want? Well, to start with, if you just want to get the overall toxicity score, you’ll need to use the following URL, which will ask the API to read the comment and score it. It also has your API key at the end, which you need to authenticate your request. Make sure you change it to yours!

https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=YOUR-API-KEY

It looks quite plain and if you visit it, it’ll take you to a 404 page. But if you make a cURL request to it, either through your favorite CMS or via a simple PHP script, you’ll end up getting data that might look similar to this:

{
  "attributeScores": {
    "TOXICITY": {
      "summaryScore": {
        "value": 0.567890,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ]
}

The score you’ll get back from the API is a decimal between 0 and 1. So if a comment gets a score of 50% toxicity, the value you’ll actually get back from the API will be 0.5. You can then use this score to change how the comment is stored and shown to the end user, for example by marking it as spam or by creating a filter that lets users show less or more toxic comments, much like Google has done in their example.
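Reading the score out of the response and acting on it might look something like this. It is a sketch under assumptions: `getToxicityScore` and `moderate` are hypothetical helpers, the sample object copies the response shape shown above, and the 0.8 threshold is an arbitrary value picked for illustration.

```javascript
// Hypothetical helper: pulls the summary TOXICITY score out of a parsed response.
// Returns null when the expected fields are missing.
function getToxicityScore(response) {
  const scores = response && response.attributeScores;
  const toxicity = scores && scores.TOXICITY;
  const summary = toxicity && toxicity.summaryScore;
  return summary && typeof summary.value === 'number' ? summary.value : null;
}

// Hypothetical helper: maps a score to a moderation decision.
// The 0.8 default threshold is an arbitrary value for illustration only.
function moderate(score, threshold = 0.8) {
  if (score === null) return 'unscored';
  return score >= threshold ? 'hold' : 'approve';
}

// Sample object shaped like the response shown above, with a made-up score.
const sample = {
  attributeScores: {
    TOXICITY: { summaryScore: { value: 0.5, type: 'PROBABILITY' } }
  },
  languages: ['en']
};

const score = getToxicityScore(sample);
console.log(Math.round(score * 100) + '% toxic, decision: ' + moderate(score));
// prints "50% toxic, decision: approve"
```

Returning `null` for a malformed response keeps a failed or unexpected API reply from being treated as a score of zero.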

There are other bits of useful data you may want to look into as well, such as the context of the comment, which can help you understand the intent of the comment without reading it firsthand.

Ultimately, this kind of data makes it possible to filter out comments with a particular intent and provide a nicer comment area in places where trolls can often take over. Over time, as the API becomes more developed, we should expect the scoring to become more robust and more accurate in its analysis of the comments we send it.
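A reader-controlled filter of that kind can be sketched in a few lines. This is only an illustration: the `comments` array, its `toxicity` field and the `filterByToxicity` name are all hypothetical, and the scores are invented values rather than real API output.

```javascript
// Hypothetical data: each comment carries the toxicity score stored when it was submitted.
// The scores below are invented for illustration, not real API output.
const comments = [
  { author: 'Ana', text: 'Great write-up, thanks!', toxicity: 0.04 },
  { author: 'Ben', text: 'I disagree with the second point.', toxicity: 0.22 },
  { author: 'Cy', text: 'what a stupid question...', toxicity: 0.87 }
];

// Keeps only the comments whose score is at or below the reader's chosen limit.
function filterByToxicity(list, maxToxicity) {
  return list.filter(function (comment) {
    return comment.toxicity <= maxToxicity;
  });
}

// A reader who sets the limit to 0.5 would see the first two comments only.
const visible = filterByToxicity(comments, 0.5);
console.log(visible.map(function (comment) { return comment.author; }).join(', '));
// prints "Ana, Ben"
```

Because the filter works on stored scores, the limit can change on the page without making another request to the API.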

Privacy and Censorship

This is a pretty hot topic these days. I can imagine some pushback on this, particularly because it involves sending your data to Google to have it analyzed and judged by Google’s computers, which ultimately does have an effect on your voice and your ability to use it. Personally, I think the idea behind this is great and it works very well in practice. But when you think about its implementation on popular news websites and social media platforms, you can see how privacy and censorship could be a concern.

The Perspective API makes a great effort to score comments based on a highly complex algorithm, but it seems that there is still a long way to go yet in the fight to maintain more civil social spaces online.

Until then, play around with the API and let me know what you think! If you’re not up for writing something from scratch, there are some public client libraries available now in both Node and Python so go for it! Also, remember to err on the side of caution as the API is still in an alpha phase for now so things may break. If you’re feeling lazy, check out the quick start guide.

Jack Turner

# August 11, 2017

Perspective is a complete nightmare.

https://twitter.com/sashageffen/status/895650856011317251

This is a machine for reinforcing biases.

Google’s new AI-powered comment filter has a pro-oligarchy bias. pic.twitter.com/yKZrfZoGFw

— Yasha Levine (@yashalevine) August 10, 2017

Creepy biases, that we don’t fully understand.

Please don’t use this on a website that will be used by humans.

Daniel James

# August 11, 2017

It’s definitely not 100% accurate, but it’s an interesting basis for other people to build upon though. I’ve tried it out with a few variations of phrases and it tends to give a score you’d expect, but from those examples it’s clear it does have a particular bias, whether intentional or not.

Jack Turner

# August 13, 2017

But what does it even mean for a completely opaque tool like this to be “100% accurate”? How do you define accuracy? Accurate for whom? People leaving comments (like this one) or the powerful ad company that made the system?

Lots of other questions: What training data was used? What biases are in those data sets? How was that accounted for? What are the social repercussions of releasing a system to the public that may further entrench biases?

While we try to ask and answer these questions it’s a pretty good idea not to use this stuff in the wild. I’m a straight white cis man. Jigsaw seems to strongly favor the status quo. Aka me. I don’t think we should endorse a tool that’s gonna reinforce the advantage of those with the most power.

Mike Hairston

# August 22, 2017

This is really disturbing. It reinforces the line of thinking that technology can solve any problem, even squishy complex things like language and social interactions. Similar to the BS “fake news detection” that Facebook and others are hawking currently.

Software can only be as impartial as its makers.