Content Hash List

The Content Hash Fingerprinting API provides access to a database of content fingerprints (hashes) for use in content moderation systems. Hashes are available for several algorithms (MD5, SHA256, SHA512, and PDQ), so platforms can detect both exact copies of a file (cryptographic hashes) and visually similar images (PDQ).

Key features and use cases

The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.

The TCAP Archive’s hashes are distinct from existing TVEC hash lists: they complement Tech Against Terrorism’s proactive monitoring of terrorist use of the internet, carried out by its team of open-source intelligence specialists. Drawing on this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.

Open API References:

Supported Algorithms

Algorithm	Description
`MD5`	A widely used cryptographic hash function producing a 128-bit hash value.
`SHA256`	A cryptographic hash function generating a 256-bit hash value, part of SHA-2.
`SHA512`	A cryptographic hash function generating a 512-bit hash value, part of SHA-2.
`PDQ`	A perceptual hash algorithm optimized for image similarity detection.
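To check a local file against the list, you compute its digests with the same algorithms and compare them with the `hash_digest` values you have ingested. The sketch below uses only Python's standard `hashlib` and covers the three cryptographic algorithms; it assumes the list stores digests as lowercase hexadecimal strings, which this page does not state explicitly. PDQ is not part of the standard library and needs a separate implementation, so it is left out here. The helper name `file_digests` is hypothetical.

```python
import hashlib
from pathlib import Path

# Cryptographic algorithms from the table above, mapped to hashlib constructors.
# Assumption: hash_digest values in the list are lowercase hex strings.
ALGORITHMS = {"MD5": hashlib.md5, "SHA256": hashlib.sha256, "SHA512": hashlib.sha512}


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return {algorithm: hex digest} for a file, reading it in chunks."""
    hashers = {name: ctor() for name, ctor in ALGORITHMS.items()}
    with Path(path).open("rb") as f:
        # Feed each chunk to every hasher so the file is read only once.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, `file_digests("clip.mp4")["MD5"]` gives the value to look up among records whose `algorithm` is `MD5`.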

Hash List by Ideology

Once you're authenticated, send your JWT as a Bearer token in the `Authorization` header to access the Content Hash List.

Examples in Python, TypeScript, and cURL:

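# Python
import requests

token = "YOUR_TOKEN"  # JWT obtained when you authenticated

url = "https://app.terrorismanalytics.org/hash-list/v2/all"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Query parameters are optional; see Query Parameters below
params = {"limit": 1000}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
print(response.json())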

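// TypeScript (top-level await requires an ES module context)
const token = "YOUR_TOKEN"; // JWT obtained when you authenticated
const url = "https://app.terrorismanalytics.org/hash-list/v2/all?limit=1000";

const makeRequest = async (url: string) => {
  const response = await fetch(url, {
    method: 'GET',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
};

const result = await makeRequest(url);
console.log(result);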

# cURL (quote the URL so the shell does not interpret ? or &)
curl -X GET \
  "https://app.terrorismanalytics.org/hash-list/v2/all?limit=1000" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json"

This Hash List API endpoint retrieves the hash list, optionally filtered by ideology, file type, or algorithm.

Query Parameters

Parameter	Type	Description
`ideology`	`'islamist' | 'far-right' | 'all'`	Specifies the ideology to filter the results.
`limit`	`Number`	Number of results per page (default: 1000)
`offset`	`Number`	Starting position for pagination
`order`	`'asc' | 'desc'`	Sort order for results
`after`	`'{UTC},{id}'`	Returns only records after the given checkpoint, so you can resume from the last record you ingested. Use the `checkpoint` string included in each response.
`file_type`	`String`	Filters results to one supported file type, e.g. `'jpg'` (see Supported File Types)
`algorithm`	`String`	Filters results to one supported algorithm, e.g. `'MD5'` (see Supported Algorithms)
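If you build request URLs programmatically, a small helper can reject values outside the documented options before any request is sent. The sketch below is hypothetical (`build_hash_list_url` is not part of an official client); it uses only the parameter names and allowed values from the table above and the base URL from the earlier examples.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.terrorismanalytics.org/hash-list/v2/all"

# Allowed values copied from the Query Parameters and Supported Algorithms tables.
IDEOLOGIES = {"islamist", "far-right", "all"}
ORDERS = {"asc", "desc"}
ALLOWED_ALGORITHMS = {"MD5", "SHA256", "SHA512", "PDQ"}


def build_hash_list_url(**params) -> str:
    """Hypothetical helper: validate documented query parameters and build the URL."""
    # Drop unset parameters so the API applies its own defaults (e.g. limit=1000).
    query = {k: v for k, v in params.items() if v is not None}
    for name, allowed in (
        ("ideology", IDEOLOGIES),
        ("order", ORDERS),
        ("algorithm", ALLOWED_ALGORITHMS),
    ):
        if name in query and query[name] not in allowed:
            raise ValueError(f"unsupported {name}: {query[name]!r}")
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL
```

For example, `build_hash_list_url(ideology="islamist", algorithm="PDQ", limit=500)` yields a URL you can pass to any of the request examples. Note that `urlencode` percent-encodes the comma in an `after` checkpoint as `%2C`.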

Checkpointing

The API implements checkpointing using timestamp and ID pairs. To pick up where you left off, pass the `checkpoint` value from your most recent response as the `after` query parameter (`after={checkpoint}`).
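An incremental sync can be sketched as a loop that stores the last `checkpoint` locally and sends it back as `after` on the next request. This is a minimal sketch under assumptions: that a request with `after` returns records newer than the checkpoint, and that the response's `checkpoint` marks the last record returned. The file name `tcap_checkpoint.json` and the helper functions are hypothetical, and the HTTP call is passed in as `fetch` so the logic can be exercised without network access.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical local file that persists the last checkpoint between runs.
CHECKPOINT_FILE = Path("tcap_checkpoint.json")

# fetch(params) -> decoded JSON response of the Hash List endpoint
FetchFn = Callable[[dict], dict]


def load_checkpoint() -> Optional[str]:
    """Return the stored checkpoint, or None on the first run."""
    if CHECKPOINT_FILE.exists():
        return json.loads(CHECKPOINT_FILE.read_text())["checkpoint"]
    return None


def save_checkpoint(checkpoint: str) -> None:
    CHECKPOINT_FILE.write_text(json.dumps({"checkpoint": checkpoint}))


def sync_since_checkpoint(fetch: FetchFn, handle: Callable[[dict], None]) -> Optional[str]:
    """Request records after the stored checkpoint until a response has no results."""
    checkpoint = load_checkpoint()
    while True:
        params = {"after": checkpoint} if checkpoint else {}
        page = fetch(params)
        if not page["results"]:
            break
        for record in page["results"]:
            handle(record)
        if page["checkpoint"] == checkpoint:
            break  # checkpoint did not advance; stop rather than loop forever
        checkpoint = page["checkpoint"]
        save_checkpoint(checkpoint)  # persist only after the batch is handled
    return checkpoint
```

In practice `fetch` would wrap the authenticated `requests.get` call from the Python example, for instance `lambda params: requests.get(url, headers=headers, params=params).json()`. Saving the checkpoint after each batch means an interrupted run resumes from the last fully processed batch.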

Response

The Hash List endpoint returns the following data on each request:

Parameter	Type	Description
`count`	`Integer`	Total number of hash records available
`next`	`String`	URL of the next page of results
`previous`	`String`	URL of the previous page of results (null on the first page)
`checkpoint`	`String`	Checkpoint in `{UTC},{id}` form for synchronization; pass it as `after` to resume
`results`	`Array`	Array of Hash objects

Hash Object Fields

Parameter	Type	Description
`hash_digest`	`String`	The computed hash value
`algorithm`	`'MD5' | 'SHA256' | 'SHA512' | 'PDQ'`	The algorithm used to generate the hash
`ideology`	`'islamist' | 'far-right' | 'all'`	Content classification category
`file_type`	`String`	Source file format
`deleted`	`Boolean`	`true` if the hash has been removed (soft-deleted) from the list
`updated_on`	`Float`	Unix timestamp of the last update
`id`	`Integer`	Record identifier (the `{id}` part of a checkpoint)
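To work with typed records instead of raw dictionaries, each entry in `results` can be parsed into a small dataclass. This is a sketch: the class name `HashRecord` is hypothetical, the field names follow the table above, and `id` is treated as optional because it appears in the example response but is not guaranteed elsewhere on this page. `updated_on` is converted from a Unix timestamp in seconds to a timezone-aware UTC datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HashRecord:
    """One entry from the `results` array (hypothetical wrapper type)."""

    hash_digest: str
    algorithm: str
    ideology: str
    file_type: str
    deleted: bool
    updated_on: datetime
    id: Optional[int] = None  # present in the example response; optional here

    @classmethod
    def from_dict(cls, raw: dict) -> "HashRecord":
        return cls(
            hash_digest=raw["hash_digest"],
            algorithm=raw["algorithm"],
            ideology=raw["ideology"],
            file_type=raw["file_type"],
            deleted=bool(raw["deleted"]),
            # updated_on is a Unix timestamp (float seconds); interpret it as UTC
            updated_on=datetime.fromtimestamp(raw["updated_on"], tz=timezone.utc),
            id=raw.get("id"),
        )
```

Usage: `records = [HashRecord.from_dict(r) for r in response.json()["results"]]`.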

Pagination

The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.
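Walking every page can be sketched as a generator that yields hash objects and follows `next` until there are no more pages. This assumes `next` is null on the last page, mirroring how `previous` is null on the first page; the page does not state this explicitly. The helper name is hypothetical, and the HTTP call is injected as `fetch` (in practice an authenticated `requests.get(url, headers=headers).json()`).

```python
from typing import Callable, Iterator, Optional

# fetch(url) -> decoded JSON response of the Hash List endpoint
FetchUrlFn = Callable[[str], dict]


def iter_hash_records(first_url: str, fetch: FetchUrlFn) -> Iterator[dict]:
    """Yield every hash object, following `next` links until it is null or missing."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")
```

Because it is a generator, records can be processed one page at a time without holding the whole list in memory.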

Example Response

{
    "count": 19676,
    "next": "http://app.terrorismanalytics.org/hash-list/v2/all?{params}",
    "previous": null,
    "checkpoint": "1730213563.621023,123",
    "results": [
        {
            "hash_digest": "000aaabbbccc111dddeeefff333",
            "algorithm": "MD5",
            "ideology": "",
            "file_type": "mp4",
            "deleted": false,
            "updated_on": 1730204429.302388,
            "id": 1
        },
        ...
    ]
}

Implementation Notes

Multiple hashing algorithms provide redundancy and enhanced detection capabilities
Checkpoint field enables efficient delta updates for client-side caching
Each hash entry includes metadata for content categorization and tracking
Changes to a record are reflected in its `updated_on` timestamp
Deleted flag allows for soft deletion while maintaining hash history

Best Practices

Implement local caching using the checkpoint mechanism
Process updates incrementally using the pagination system
Consider implementing parallel processing for multiple hash algorithms
Store hash values in their original format to maintain precision
Monitor the `deleted` flag and remove deleted hashes from your local copy
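The caching and `deleted`-flag practices above can be combined in a local store that upserts live hashes and drops soft-deleted ones. This is a minimal in-memory sketch, assuming records are applied in update order (for example with `order=asc`) and that `(algorithm, hash_digest)` identifies a hash; the function names are hypothetical. Exact lookup suits MD5, SHA256 and SHA512; PDQ matching normally compares hashes by Hamming distance against a threshold, which this sketch does not implement.

```python
from typing import Iterable

# Hypothetical in-memory store keyed by (algorithm, hash_digest).
HashKey = tuple[str, str]


def apply_updates(store: dict[HashKey, dict], records: Iterable[dict]) -> dict[HashKey, dict]:
    """Apply records in order: drop soft-deleted hashes, upsert the rest."""
    for record in records:
        key = (record["algorithm"], record["hash_digest"])
        if record["deleted"]:
            store.pop(key, None)  # removed upstream: stop matching against it
        else:
            store[key] = record  # new or updated hash
    return store


def is_known(store: dict[HashKey, dict], algorithm: str, digest: str) -> bool:
    """Exact-match lookup; suitable for cryptographic hashes, not PDQ similarity."""
    return (algorithm, digest) in store
```

Persisting this store together with the latest checkpoint lets each sync apply only the delta since the previous run.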

Usage With Threat Exchange

To use the Hash List through Threat Exchange, create a collaboration configuration for our API. You can then fetch the list and match content against its PDQ image hashes and MD5 video hashes.

Step 1 - Install Threat Exchange

$ pip install threatexchange

Step 2 - Configure the default credentials

$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'

Step 3 - Set up config

$ threatexchange config collab edit tat --create 'TAT'

Step 4 - Fetch hashes with verbose logging

$ threatexchange -v fetch

Step 5 - View dataset

$ threatexchange dataset

Step 6 - Match a piece of content

$ threatexchange match ~/path/to/image.jpg

For more information on Threat Exchange integrations, see the Threat Exchange documentation.


Supported File Types

File Type	MIME Type
`doc`	`application/msword`
`docx`	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`
`gif`	`image/gif`
`html`	`text/html`
`jpeg`	`image/jpeg`
`jpg`	`image/jpeg`
`m4a`	`audio/mp4`
`m4b`	`audio/mp4`
`m4v`	`video/x-m4v`
`mov`	`video/quicktime`
`mp3`	`audio/mpeg`
`mp4`	`video/mp4`
`oga`	`audio/ogg`
`pdf`	`application/pdf`
`png`	`image/png`
`txt`	`text/plain`
`webm`	`video/webm`
`webp`	`image/webp`
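To decide which `file_type` filter to request, or whether a local file is covered by the list at all, a lookup keyed on the file extension is enough. The sketch assumes `file_type` values match the File Type column exactly (the example response uses `mp4`); the mapping is copied from the table above and the helper name is hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Extensions and MIME types copied from the Supported File Types table.
SUPPORTED_FILE_TYPES = {
    "doc": "application/msword",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "gif": "image/gif",
    "html": "text/html",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "m4a": "audio/mp4",
    "m4b": "audio/mp4",
    "m4v": "video/x-m4v",
    "mov": "video/quicktime",
    "mp3": "audio/mpeg",
    "mp4": "video/mp4",
    "oga": "audio/ogg",
    "pdf": "application/pdf",
    "png": "image/png",
    "txt": "text/plain",
    "webm": "video/webm",
    "webp": "image/webp",
}


def file_type_for(path: str) -> Optional[str]:
    """Return the file_type value for a path's extension, or None if unsupported."""
    ext = PurePath(path).suffix.lower().lstrip(".")
    return ext if ext in SUPPORTED_FILE_TYPES else None
```

For example, `file_type_for("clip.MP4")` returns `"mp4"`, which can be passed as the `file_type` query parameter.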
