
Content Hash List

The Content Hash Fingerprinting API provides access to a database of content fingerprints (hashes) for use in content moderation systems. Hashes are provided for several algorithms: MD5, SHA256, SHA512 and PDQ. The cryptographic hashes (MD5, SHA256, SHA512) identify exact copies of a file, while PDQ, a perceptual image hash, can also identify visually similar images.
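To match your own files against the list, you compute the same digests locally. The following is a minimal sketch using Python's standard hashlib module for the three cryptographic algorithms. It assumes the digests on the list are hexadecimal strings comparable to hashlib's hexdigest() output. compute_digests is a hypothetical helper. PDQ requires a separate perceptual-hashing library, so it is not covered here.

```python
import hashlib
from pathlib import Path

# Map the API's algorithm names to hashlib constructors.
# PDQ is a perceptual hash and is not available in hashlib.
ALGORITHMS = {
    "MD5": hashlib.md5,
    "SHA256": hashlib.sha256,
    "SHA512": hashlib.sha512,
}


def compute_digests(path, chunk_size=1 << 20):
    """Return {algorithm: hex digest} for a file, reading it in chunks.

    Hypothetical helper; assumes the hash list uses hex-encoded digests.
    """
    hashers = {name: factory() for name, factory in ALGORITHMS.items()}
    with Path(path).open("rb") as f:
        # Stream the file so large videos are not loaded into memory at once.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Because the file is read once and fed to all three hashers, each algorithm's digest is available for lookup without re-reading large media files.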

Key features and use cases

The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.

The TCAP Archive’s hashes are distinct from existing TVEC hash lists because they build on Tech Against Terrorism’s proactive monitoring of terrorist use of the internet, carried out by its team of open-source intelligence specialists. Drawing on this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list covers content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.

Open API References:

Hash List by Ideology

Once you're authenticated, you can use your JWT to access the Content Hash List:

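import requests

# JWT obtained from the authentication step
token = "<YOUR_JWT>"

url = "https://app.terrorismanalytics.org/hash-list/v2/all"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Query parameters (limit, offset, order, after, ...) are described in the table below
params = {"limit": 1000}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()  # fail loudly on 4xx/5xx, e.g. an expired token
print(response.json())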

This Hash List API endpoint retrieves a hash list file filtered by a specified ideology.

Query Parameters

Parameter | Type | Description
ideology | 'islamist' / 'far-right' / 'all' | Ideology to filter the results by
limit | Number | Number of results per page (default: 1000)
offset | Number | Starting position for pagination
order | 'asc' / 'desc' | Sort order for results
after | '{UTC},{id}' | Checkpoint to resume from, i.e. return content after the last piece you ingested. Each response includes a checkpoint string to use here.
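It can help to validate parameters against the allowed values listed above before sending a request. The following is a minimal sketch in which build_params is a hypothetical helper. It sends ideology as a query parameter, as the table describes. However, the example URLs also carry an ideology-like segment (/all) in the path, so you should confirm which form the endpoint expects.

```python
ALLOWED_IDEOLOGIES = {"islamist", "far-right", "all"}
ALLOWED_ORDERS = {"asc", "desc"}


def build_params(ideology=None, limit=1000, offset=None, order=None, after=None):
    """Build a query-parameter dict for the Hash List endpoint (hypothetical helper).

    Arguments left as None are omitted, so the server defaults apply to them.
    """
    params = {}
    if ideology is not None:
        if ideology not in ALLOWED_IDEOLOGIES:
            raise ValueError(f"unknown ideology: {ideology!r}")
        params["ideology"] = ideology
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        params["limit"] = limit
    if offset is not None:
        if offset < 0:
            raise ValueError("offset must be non-negative")
        params["offset"] = offset
    if order is not None:
        if order not in ALLOWED_ORDERS:
            raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
        params["order"] = order
    if after is not None:
        # Checkpoint string "{timestamp},{id}" copied from a previous response.
        params["after"] = after
    return params
```

For example: requests.get(url, headers=headers, params=build_params(ideology="far-right", order="asc")).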

Checkpointing

The API implements checkpointing using timestamp and ID pairs. To pick up where you left off, pass the checkpoint string from your most recent response as the after={checkpoint} query parameter.
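The following is a minimal sketch of an incremental sync that resumes from a stored checkpoint. To keep it testable without network access, it takes a hypothetical fetch_page callable that performs the GET request with the given query parameters and returns the decoded JSON. In practice, fetch_page would wrap requests.get as in the example above. The sketch makes two assumptions. First, it assumes each response's checkpoint refers to the last record on that page. Second, it assumes that a request made from the latest checkpoint returns an empty results array, which it treats as "caught up".

```python
from typing import Callable, Iterator, Optional, Tuple

Page = dict  # decoded JSON response: count, next, previous, checkpoint, results


def sync_from_checkpoint(
    fetch_page: Callable[[dict], Page],
    checkpoint: Optional[str] = None,
    limit: int = 1000,
) -> Iterator[Tuple[list, str]]:
    """Yield (results, checkpoint) batches, starting after `checkpoint`.

    Persist each yielded checkpoint only after its batch is stored, so an
    interrupted run resumes from the last fully processed batch.
    """
    while True:
        params = {"limit": limit}
        if checkpoint:
            params["after"] = checkpoint
        page = fetch_page(params)
        results = page.get("results") or []
        if not results:
            return  # assumption: an empty page means we have caught up
        new_checkpoint = page["checkpoint"]
        if new_checkpoint == checkpoint:
            return  # guard against looping if the checkpoint does not advance
        checkpoint = new_checkpoint
        yield results, checkpoint
```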

Response

The Hash List endpoint returns the following data on each request:

Field | Type | Description
count | Integer | Total number of hash records available
next | String | URL of the next page of results
previous | String | URL of the previous page of results (null on the first page)
checkpoint | String | Timestamp-based checkpoint identifier for synchronization
results | Array | Array of Hash objects

Hash Object Fields

Field | Type | Description
hash_digest | String | The computed hash value
algorithm | 'MD5' / 'SHA256' / 'SHA512' / 'PDQ' | The algorithm used to generate the hash
ideology | 'islamist' / 'far-right' / 'all' | Content classification category
file_type | String | Source file format
deleted | Boolean | Whether the file has been removed from the system
updated_on | Float | Unix timestamp of the last update
id | Integer | Record identifier
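When processing responses, it can help to parse each Hash object into a typed record. The following is a minimal sketch that assumes the field names shown in the example response, including id. HashRecord and from_dict are hypothetical names. An empty ideology is kept as an empty string, because the example response contains one.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HashRecord:
    hash_digest: str
    algorithm: str      # 'MD5' | 'SHA256' | 'SHA512' | 'PDQ'
    ideology: str       # 'islamist' | 'far-right' | 'all' (may be empty)
    file_type: str
    deleted: bool
    updated_on: float   # Unix timestamp in seconds
    id: Optional[int] = None

    @classmethod
    def from_dict(cls, data: dict) -> "HashRecord":
        """Build a record from one element of the response's `results` array."""
        return cls(
            hash_digest=data["hash_digest"],
            algorithm=data["algorithm"],
            ideology=data.get("ideology") or "",
            file_type=data.get("file_type") or "",
            deleted=bool(data.get("deleted", False)),
            updated_on=float(data["updated_on"]),
            id=data.get("id"),
        )

    @property
    def updated_at(self) -> datetime:
        """updated_on as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.updated_on, tz=timezone.utc)
```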

Pagination

The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.
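The following is a minimal sketch of walking every page by following the next URL until it is null. get_json is a hypothetical callable that performs an authenticated GET on a URL and returns the decoded JSON, for example lambda u: requests.get(u, headers=headers).json(). It is passed in so that the loop can be exercised without network access.

```python
from typing import Callable, Iterator, Optional


def iter_hashes(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every Hash object, following `next` links until there are none."""
    url: Optional[str] = first_url
    seen = set()
    while url:
        if url in seen:
            # Defensive: stop rather than loop forever on a repeated link.
            raise RuntimeError(f"pagination loop detected at {url}")
        seen.add(url)
        page = get_json(url)
        yield from page.get("results") or []
        url = page.get("next")  # assumed to be null (None) on the last page
```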

Example Response

{
  "count": 19676,
  "next": "http://app.terrorismanalytics.org/hash-list/v2/all?{params}",
  "previous": null,
  "checkpoint": "1730213563.621023,123",
  "results": [
    {
      "hash_digest": "000aaabbbccc111dddeeefff333",
      "algorithm": "MD5",
      "ideology": "",
      "file_type": "mp4",
      "deleted": false,
      "updated_on": 1730204429.302388,
      "id": 1
    },
    ...
  ]
}

Implementation Notes

  • Multiple hashing algorithms: cryptographic hashes (MD5, SHA256, SHA512) identify exact file copies, while PDQ can also match visually similar images
  • The checkpoint field enables efficient delta updates for client-side caching
  • Each hash entry includes metadata for content categorization and tracking
  • Changes to a record are reflected in its updated_on timestamp
  • The deleted flag allows soft deletion while maintaining hash history
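Together, the checkpoint and the deleted flag let a client keep a local view of the list current by applying deltas. The following is a minimal sketch of an in-memory index in which HashIndex is hypothetical. It assumes that records are keyed by the id field shown in the example response and that a later record with the same id supersedes the earlier one. Lookups are exact matches on (algorithm, digest) with hex case ignored, which is an assumption that suits MD5 and SHA digests. PDQ hashes are normally compared by Hamming distance against a threshold, and this sketch does not do that.

```python
from collections import defaultdict


class HashIndex:
    """In-memory view of the hash list, updated incrementally (hypothetical)."""

    def __init__(self):
        self._entry_by_id = {}                 # id -> (algorithm, digest)
        self._ids_by_entry = defaultdict(set)  # (algorithm, digest) -> ids

    def _remove(self, record_id):
        entry = self._entry_by_id.pop(record_id, None)
        if entry is not None:
            ids = self._ids_by_entry[entry]
            ids.discard(record_id)
            if not ids:  # no other record still carries this digest
                del self._ids_by_entry[entry]

    def apply(self, records):
        """Apply a batch of Hash objects, honouring the deleted flag."""
        for rec in records:
            self._remove(rec["id"])  # drop any older version of this record
            if rec.get("deleted", False):
                continue
            # Lower-case hex so comparisons are case-insensitive (assumption).
            entry = (rec["algorithm"], rec["hash_digest"].lower())
            self._entry_by_id[rec["id"]] = entry
            self._ids_by_entry[entry].add(rec["id"])

    def contains(self, algorithm, digest):
        return (algorithm, digest.lower()) in self._ids_by_entry

    def __len__(self):
        return len(self._entry_by_id)
```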

Best Practices

  1. Implement local caching using the checkpoint mechanism
  2. Process updates incrementally using the pagination system
  3. Consider implementing parallel processing for multiple hash algorithms
  4. Store hash values in their original format to maintain precision
  5. Monitor the deleted flag for deprecated hash values
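Best practices 1, 4 and 5 can be combined in a small local store. The following is a minimal sketch using Python's built-in sqlite3 module, in which the table layout and function names are hypothetical. Digests are stored as TEXT exactly as returned, and records flagged as deleted are removed from the active table. Each checkpoint is saved in the same transaction as the batch it belongs to, so an interrupted sync resumes from a consistent point.

```python
import sqlite3
from typing import Iterable, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS hashes (
    id INTEGER PRIMARY KEY,
    algorithm TEXT NOT NULL,
    hash_digest TEXT NOT NULL,      -- stored exactly as returned by the API
    ideology TEXT,
    file_type TEXT,
    updated_on REAL
);
CREATE INDEX IF NOT EXISTS idx_hashes_lookup ON hashes (algorithm, hash_digest);
CREATE TABLE IF NOT EXISTS sync_state (key TEXT PRIMARY KEY, value TEXT);
"""


def open_store(path: str = "hash_cache.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_checkpoint(conn: sqlite3.Connection) -> Optional[str]:
    row = conn.execute("SELECT value FROM sync_state WHERE key = 'checkpoint'").fetchone()
    return row[0] if row else None


def store_batch(conn: sqlite3.Connection, records: Iterable[dict], checkpoint: str) -> None:
    """Upsert a batch of Hash objects and its checkpoint atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        for rec in records:
            if rec.get("deleted"):
                conn.execute("DELETE FROM hashes WHERE id = ?", (rec["id"],))
            else:
                conn.execute(
                    "INSERT OR REPLACE INTO hashes "
                    "(id, algorithm, hash_digest, ideology, file_type, updated_on) "
                    "VALUES (?, ?, ?, ?, ?, ?)",
                    (rec["id"], rec["algorithm"], rec["hash_digest"],
                     rec.get("ideology"), rec.get("file_type"), rec.get("updated_on")),
                )
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (key, value) VALUES ('checkpoint', ?)",
            (checkpoint,),
        )
```

On startup, call load_checkpoint and pass the result as the after parameter of the first request. Then call store_batch for every page received.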

Usage With Threat Exchange

If you want to use the Hash List through Threat Exchange, you can create a collaboration configuration for our API, then fetch and match against PDQ image hashes and MD5 video hashes.

Step 1 - Install Threat Exchange

$ pip install threatexchange

Step 2 - Configure the default credentials

$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'

Step 3 - Set up config

$ threatexchange config collab edit tat --create 'TAT'

Step 4 - Fetch hashes with verbose logging

$ threatexchange -v fetch

Step 5 - View dataset

$ threatexchange dataset

Step 6 - Match a piece of content

$ threatexchange match ~/path/to/image.jpg

For more information on Threat Exchange integrations, see the docs.
