Tutorial: Visual Content Moderation for Trust and Safety

As a core product offering, coactive supports real-time monitoring at scale across millions of unstructured visual assets. In this notebook, we demonstrate how to use coactive's Python SDK to perform real-time classification and analytics for content moderation.

1. Import dependencies

First, we import the necessary dependencies. Note that you first need to install the coactive SDK package in your local Python environment.

# From python core and other common packages
import os
from PIL import Image
from IPython.display import HTML, display
import pandas as pd

# From coactive SDK package
import coactive
from coactive.apis import ClassificationApi, QueryApi
from coactive.model.classification_request import ClassificationRequest

from coactive.model.query_response import QueryResponse
from coactive.model.query_request import QueryRequest

2. Authenticate

Next, we load the authentication and environment variables needed to call coactive's APIs. Note that these variables are environment-specific and will be provided by Coactive.

from auth_credentials import COACTIVE_HOST, CLIENT_ID, CLIENT_SECRET
from env_variables import CLIENT_EMBEDDING_ID

access_token = f"{CLIENT_ID}:{CLIENT_SECRET}"
configuration = coactive.Configuration(
    host = COACTIVE_HOST,
    access_token = access_token,
)
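
The `auth_credentials` module imported above is not shown in this tutorial. As a minimal sketch, assuming you prefer to keep secrets out of source files, such a module could read its values from environment variables. The function `load_credentials` and the variable names `COACTIVE_HOST`, `COACTIVE_CLIENT_ID` and `COACTIVE_CLIENT_SECRET` are hypothetical choices for this example, not part of the coactive SDK.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names chosen for this sketch.
REQUIRED_VARS = ("COACTIVE_HOST", "COACTIVE_CLIENT_ID", "COACTIVE_CLIENT_SECRET")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required credentials, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with a placeholder mapping instead of the real environment.
example_env = {
    "COACTIVE_HOST": "https://example.invalid",
    "COACTIVE_CLIENT_ID": "my-client-id",
    "COACTIVE_CLIENT_SECRET": "my-client-secret",
}
credentials = load_credentials(example_env)
print(sorted(credentials))
```

Failing early with the list of missing names makes a misconfigured environment easier to diagnose than an authentication error from a later API call.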

3. Real-time analytics

coactive's real-time analytics engine lets you classify visual assets through the Classification API and query your unstructured visual data with SQL through the Query API. More specifically, coactive:

  • provides a structured view of your visual data (i.e. rows = visual asset, columns = metadata) giving you the ability to run analytical SQL queries
  • can generate metadata on demand for your visual assets using a library of visual concepts
  • gives you full control over these visual concepts used in classification and queries, allowing you to seamlessly create and update these concepts as your tasks or visual data change over time
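
To make the "rows = visual asset, columns = metadata" idea concrete, here is a local sketch that uses Python's built-in `sqlite3` module as a stand-in for the structured view. It does not call coactive. The table and column names follow the queries used in this tutorial, while the rows, IDs and paths are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the structured view of visual assets.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE coactive_table (
        coactive_image_id TEXT PRIMARY KEY,
        path TEXT,
        fight INTEGER,    -- 1 if the 'fight' concept is flagged, else 0
        handgun INTEGER   -- 1 if the 'handgun' concept is flagged, else 0
    )
    """
)

# Hypothetical rows: one row per visual asset, one column per concept.
rows = [
    ("img-001", "example/a.jpg", 0, 0),
    ("img-002", "example/b.jpg", 0, 1),
    ("img-003", "example/c.jpg", 1, 0),
    ("img-004", "example/d.jpg", 1, 1),
]
conn.executemany("INSERT INTO coactive_table VALUES (?, ?, ?, ?)", rows)

# An ordinary analytical query over concept columns.
flagged_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT coactive_image_id
        FROM coactive_table
        WHERE fight = 1 OR handgun = 1
        ORDER BY coactive_image_id
        """
    )
]
print(flagged_ids)
```

Because each asset is a single row, an image that matches several concepts (such as `img-004` here) appears once in the result, which is also why an `OR` filter can be used for counting without double counting.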

Below, we highlight the core functions of our real-time analytics engine to monitor a dataset of user-generated visual assets for problematic content.

We define two custom functions that make it simple to run multiple API requests using the SDK and to visually validate the results in this notebook. These functions are shown below.

import time
from typing import List

from helpers import get_images_df_from_api_response, image_formatter

def image_classification(image_queries: List[str]) -> None:
    '''
    Example function that displays the results of calling the Classification API.
    '''

    # Call ClassificationAPI
    with coactive.ApiClient(configuration) as api_client:

        try:
            classification_request = ClassificationRequest(
                embedding_id=CLIENT_EMBEDDING_ID,
                paths=image_queries,
            )
            api_response = ClassificationApi(api_client).classify_assets(
                classification_request)
        except coactive.ApiException as e:
            # Without a response there is nothing to display, so stop here
            print("Exception when calling ClassificationApi: %s\n" % e)
            return

    # Display results
    df = get_images_df_from_api_response(image_queries, api_response)
    display(HTML(df.to_html(formatters={'image': image_formatter},
                            escape=False,
                            index=False)))


def run_visual_sql_query(query: str) -> pd.DataFrame:
    '''
    Example function that runs a SQL query using the Query API.
    '''

    # Call QueryAPI to start query
    with coactive.ApiClient(configuration) as api_client:
        try:
            api_response = QueryApi(api_client).execute_query(
                QueryRequest(query=query, embedding_id=CLIENT_EMBEDDING_ID))
            api_response = api_response.to_dict()
        except coactive.ApiException as e:
            # Without a query_id there is nothing to poll, so re-raise
            print("Exception when calling QueryApi: %s\n" % e)
            raise
    query_id = api_response['query_id']

    # Poll the query by ID until its status is 'Complete'
    while True:
        with coactive.ApiClient(configuration) as api_client:
            try:
                query_response = QueryApi(api_client).get_query_by_id(query_id)
                query_response = query_response.to_dict()
            except coactive.ApiException as e:
                print("Exception when calling QueryApi: %s\n" % e)
                raise
        if query_response['status'] == 'Complete':
            break
        # Wait before checking again
        time.sleep(5)

    # Report the run time and return the results as a DataFrame
    run_time = query_response['end_dt'] - query_response['created_dt']
    print(f'Query run time (hrs:min:sec) = {run_time}')
    return pd.DataFrame([row['data'] for row in query_response['results']['data']])
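
The polling loop in `run_visual_sql_query` waits for as long as the query takes. If you want an upper bound on the wait, the sketch below shows one possible pattern: a generic helper that polls a status function and gives up after a timeout. The helper `poll_until_complete`, its parameters, and the `'Running'` status used in the demonstration are assumptions made for this example. Only the `'Complete'` status comes from the code above.

```python
import time
from typing import Any, Callable, Dict


def poll_until_complete(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until its 'status' is 'Complete' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch()
        if response["status"] == "Complete":
            return response
        # Stop if another wait would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"Query not complete after {timeout_s} seconds")
        sleep(interval_s)


# Demonstration with a fake status source instead of a real API call.
fake_responses = iter([
    {"status": "Running"},   # hypothetical intermediate status
    {"status": "Running"},
    {"status": "Complete", "results": {"data": []}},
])
final = poll_until_complete(lambda: next(fake_responses), sleep=lambda s: None)
print(final["status"])
```

In the notebook, `fetch` could wrap the `get_query_by_id(query_id).to_dict()` call shown above. Passing `sleep` as a parameter keeps the helper easy to test without real delays.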

3.1 Classify visual assets

In this example, we perform real-time classification of 5 images using an existing library of visual concepts for content moderation.

Below, we show the classification probability for the visual concepts that were flagged in at least one image (i.e. 'fight', 'handgun' and 'syringe') along with the classification tags.

image_dir = 'query_images'
image_queries = [
    os.path.join(image_dir, file) 
    for file in os.listdir(image_dir)
]

image_classification(image_queries)
Output:

image    fight     syringe   handgun   classification
(image)  0.000372  0.000253  0.000031  []
(image)  0.000033  0.002870  0.997537  [handgun]
(image)  0.000581  0.000090  0.001383  []
(image)  0.855864  0.000447  0.000023  [fight]
(image)  0.000043  0.999982  0.169217  [syringe]
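
The tutorial does not say how the classification tags are derived from the probabilities. One plausible reading, shown here only as a sketch, is a per-concept threshold. The threshold of 0.5 and the helper `tags_for` are assumptions. They happen to reproduce the tags in the output above, but the actual rule used by the API may differ.

```python
from typing import Dict, List

# Probabilities copied from the output table above, one dict per image.
scores: List[Dict[str, float]] = [
    {"fight": 0.000372, "syringe": 0.000253, "handgun": 0.000031},
    {"fight": 0.000033, "syringe": 0.002870, "handgun": 0.997537},
    {"fight": 0.000581, "syringe": 0.000090, "handgun": 0.001383},
    {"fight": 0.855864, "syringe": 0.000447, "handgun": 0.000023},
    {"fight": 0.000043, "syringe": 0.999982, "handgun": 0.169217},
]


def tags_for(probabilities: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the concepts whose probability meets the (assumed) threshold."""
    return [concept for concept, p in probabilities.items() if p >= threshold]


tags = [tags_for(row) for row in scores]
print(tags)
# [[], ['handgun'], [], ['fight'], ['syringe']]
```

Note that the last image has a handgun probability of 0.169217, which stays below the assumed threshold, so only `syringe` is tagged.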

3.2 Run SQL queries over your visual data

In this section, we run a few example queries to demonstrate how you can perform real-time analytical queries over your visual data.

3.2.1 Find additional examples

In this first example, we find the IDs and paths of 10 images that contain people fighting, using the fight concept shown in the previous example.

sql_query = '''
    SELECT coactive_image_id, path
    FROM coactive_table
    WHERE fight = 1
    LIMIT 10
'''

run_visual_sql_query(sql_query)
Output:

Query run time (hrs:min:sec) = 0:00:04.226605
coactive_image_id path
0 a44e7d09-459f-46f7-8034-027342cecaaa s3://coactive-demo-datasets/trust_and_safety_1...
1 a01ec94e-1e93-416b-94d7-ed6bd0a041e2 s3://coactive-demo-datasets/trust_and_safety_1...
2 f51cf45c-d0d0-4c36-bdfc-24a07b30f8f6 s3://coactive-demo-datasets/trust_and_safety_1...
3 a1923c4f-2015-49c9-ba7d-2abbfdbbda94 s3://coactive-demo-datasets/trust_and_safety_1...
4 860bd2ce-7254-4a75-b1be-57d45dfca16b s3://coactive-demo-datasets/trust_and_safety_1...
5 3ae5251e-21a7-4d57-8d89-ed617d387603 s3://coactive-demo-datasets/trust_and_safety_1...
6 a8dc01d6-aa5a-493f-97e1-bcd10a3a95fb s3://coactive-demo-datasets/trust_and_safety_1...
7 37066042-e862-46af-b492-830e86a5c40d s3://coactive-demo-datasets/trust_and_safety_1...
8 e89facaf-d026-4e7f-ab38-04bc50b6cea9 s3://coactive-demo-datasets/trust_and_safety_1...
9 c12424aa-5dc6-436f-a55d-793234188cf8 s3://coactive-demo-datasets/trust_and_safety_1...

3.2.2 Count the number of occurrences

In this second example, we perform an aggregate count to find the number of images that contain either a handgun, rifle or shotgun. We define this custom count as weapon_count.

sql_query = '''
    SELECT count(coactive_image_id) as weapon_count
    FROM coactive_table
    WHERE
        handgun = 1
        OR rifle = 1
        OR shotgun = 1
'''

run_visual_sql_query(sql_query)
Output:

Query run time (hrs:min:sec) = 0:00:06.685277
weapon_count
0 181.0

3.2.3 Define custom classification metrics

In this third example, we show how you can find the top 20 images that maximize a custom classification metric (i.e. problematic_score) based on the classification probability of a subset of concepts that are particularly violent (i.e. blood), sexual (i.e. brassiere) and drug-related (i.e. syringe).

sql_query = '''
    SELECT
        coactive_image_id,
        path,
        GREATEST(
            blood_latest_prob,
            brassiere_latest_prob,
            syringe_latest_prob) as problematic_score
    FROM coactive_table_adv
    ORDER BY problematic_score DESC
    LIMIT 20
'''

run_visual_sql_query(sql_query)
Output:

Query run time (hrs:min:sec) = 0:00:08.380099
coactive_image_id path problematic_score
0 ecc5a391-85ae-47d4-a0c5-282c5bf80f1d s3://coactive-demo-datasets/trust_and_safety_1... 1.000000
1 4a683536-fca3-498b-b802-10a2a734951c s3://coactive-demo-datasets/trust_and_safety_1... 1.000000
2 00cb0324-bf9c-44d1-9bc1-51c7172e47b8 s3://coactive-demo-datasets/trust_and_safety_1... 1.000000
3 ff6e65b4-5229-45b7-95b1-56fff246d471 s3://coactive-demo-datasets/trust_and_safety_1... 0.999998
4 b0022a11-16f5-4c08-ba3e-9ad20e057fab s3://coactive-demo-datasets/trust_and_safety_1... 0.999998
5 0dd0143e-9eb8-44f3-936e-f1c3fcb8e1f3 s3://coactive-demo-datasets/trust_and_safety_1... 0.999997
6 96420eb5-34dd-4ede-b76c-b8eec6e810b7 s3://coactive-demo-datasets/trust_and_safety_1... 0.999994
7 e22a2d6c-f5a2-4a83-9ac8-88368595bbba s3://coactive-demo-datasets/trust_and_safety_1... 0.999990
8 d51ac9d7-8852-431a-92d0-83bccb5012ff s3://coactive-demo-datasets/trust_and_safety_1... 0.999981
9 18ceef73-11c8-4358-a1d5-1e0259391e95 s3://coactive-demo-datasets/trust_and_safety_1... 0.999979
10 d89a94f1-84d1-43c1-b097-d04050f12173 s3://coactive-demo-datasets/trust_and_safety_1... 0.999979
11 7440c3ca-74d8-40c6-a014-b410fa043b7d s3://coactive-demo-datasets/trust_and_safety_1... 0.999978
12 c98035cb-7d66-484a-98a6-79e7efaaef1e s3://coactive-demo-datasets/trust_and_safety_1... 0.999972
13 d37d1d65-5da0-4490-a450-ab59c2c201fa s3://coactive-demo-datasets/trust_and_safety_1... 0.999969
14 2eb49b9a-9cb6-478f-9103-b885863e6c2b s3://coactive-demo-datasets/trust_and_safety_1... 0.999968
15 8703245e-13ed-4fdf-89a5-3c9511549d71 s3://coactive-demo-datasets/trust_and_safety_1... 0.999964
16 234330f4-5f27-4c87-bba0-234dcf8ae64c s3://coactive-demo-datasets/trust_and_safety_1... 0.999956
17 34a38325-b6d6-4e72-ab60-e9bea628eba5 s3://coactive-demo-datasets/trust_and_safety_1... 0.999953
18 14807279-afd1-4d62-8e49-232b6663f229 s3://coactive-demo-datasets/trust_and_safety_1... 0.999946
19 617be723-3ece-41ef-9281-6ed043e3da3f s3://coactive-demo-datasets/trust_and_safety_1... 0.999940
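
To illustrate what the `GREATEST(...)` expression in the query computes, here is a small local sketch in plain Python: the score of each image is the largest of its three concept probabilities, and the images are then ranked by that score. The rows and IDs below are made up for illustration, and the helper `problematic_score` is a hypothetical name that mirrors the column alias in the query.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

CONCEPT_COLUMNS = ("blood_latest_prob", "brassiere_latest_prob", "syringe_latest_prob")

# Hypothetical rows standing in for coactive_table_adv.
rows: List[Row] = [
    {"coactive_image_id": "img-001", "blood_latest_prob": 0.10,
     "brassiere_latest_prob": 0.02, "syringe_latest_prob": 0.90},
    {"coactive_image_id": "img-002", "blood_latest_prob": 0.05,
     "brassiere_latest_prob": 0.03, "syringe_latest_prob": 0.01},
    {"coactive_image_id": "img-003", "blood_latest_prob": 0.99,
     "brassiere_latest_prob": 0.20, "syringe_latest_prob": 0.00},
    {"coactive_image_id": "img-004", "blood_latest_prob": 0.30,
     "brassiere_latest_prob": 0.70, "syringe_latest_prob": 0.10},
]


def problematic_score(row: Row) -> float:
    """Largest probability across the selected concepts, like GREATEST(...)."""
    return max(float(row[column]) for column in CONCEPT_COLUMNS)


# Equivalent of ORDER BY problematic_score DESC LIMIT 2.
top = sorted(rows, key=problematic_score, reverse=True)[:2]
top_ids = [row["coactive_image_id"] for row in top]
print(top_ids)
# ['img-003', 'img-001']
```

Taking the maximum means an image ranks highly if any single concept is very likely. A sum or weighted average would instead favor images that score moderately on several concepts, so the choice depends on the moderation policy.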