Content Hash List
The Content Hash Fingerprinting API provides access to a database of content fingerprints (hashes) for use in content moderation systems. Hashes are available in multiple algorithms, including MD5, SHA256, SHA512, and PDQ, to broaden coverage and improve accuracy in content identification.
Key features and use cases
The TCAP Archive is a repository of known terrorist or violent extremist content (TVEC) media files, including images, videos and documents. The TCAP Archive Hash List API allows platforms to ingest the hashes produced from these media files in bulk, so they can use them in their content moderation processes.
The TCAP Archive’s hashes are distinct from any existing TVEC hash lists, as they complement Tech Against Terrorism’s proactive monitoring of terrorist internet usage by its team of open-source intelligence specialists. By leveraging this expertise, alongside a suite of automated monitoring capabilities, the TCAP Archive hash list reflects content created and uploaded over a number of years by a range of violent Islamist and violent far-right terrorist entities.
Open API References:
Hash List by Ideology
Once you're authenticated, you can use your JWT to access the Content Hash List.
- Python
- TypeScript
- cURL
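# Python
import requests
token = "YOUR_TOKEN"  # JWT obtained during authentication
url = "https://app.terrorismanalytics.org/hash-list/v2/all?<params>"
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail early on 4xx/5xx responses
print(response.json())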
// TypeScript
const token = "YOUR_TOKEN"; // JWT obtained during authentication
const url = "https://app.terrorismanalytics.org/hash-list/v2/all?<params>";
const makeRequest = async (url: string) => {
  const response = await fetch(url, {
    method: 'GET',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
  });
  const data = await response.json();
  console.log(data);
  return data; // return the parsed body so the caller can use it
};
const result = await makeRequest(url);
# cURL (the URL is quoted so the shell does not interpret ? or &)
curl -X GET \
  "https://app.terrorismanalytics.org/hash-list/v2/all?<params>" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json"
This Hash List API endpoint retrieves a hash list file filtered by a specified ideology.
Query Parameters
Parameter | Type | Description |
---|---|---|
ideology | 'islamist', 'far-right' or 'all' | Specifies the ideology to filter the results. |
limit | Number | Number of results per page (default: 1000) |
offset | Number | Starting position for pagination |
order | 'asc' or 'desc' | Sort order for results |
after | '{UTC},{id}' | Checkpoint to resume from, so the query continues after the last piece of content you ingested. Each response includes the checkpoint string. |
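As a rough illustration of how these parameters combine, the sketch below assembles a request URL with Python's standard library. `build_hash_list_url` is a hypothetical helper, not part of the API, and the base URL is simply the one used in the request examples above. It only sets `limit`, `order`, `offset` and `after`.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL copied from the request examples in this document.
BASE_URL = "https://app.terrorismanalytics.org/hash-list/v2/all"


def build_hash_list_url(
    limit: int = 1000,
    order: str = "asc",
    offset: Optional[int] = None,
    after: Optional[str] = None,
) -> str:
    """Return the endpoint URL with the documented query parameters encoded.

    Hypothetical helper for illustration; not part of the API itself.
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    params = {"limit": limit, "order": order}
    # Only send optional parameters that the caller actually set.
    if offset is not None:
        params["offset"] = offset
    if after is not None:
        # urlencode percent-encodes the comma in '{UTC},{id}' as %2C.
        params["after"] = after
    return f"{BASE_URL}?{urlencode(params)}"
```

The returned string can be passed as `url` to any of the request examples above.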
Checkpointing
The API implements checkpointing using timestamp and ID pairs. To pick up where you left off, pass the checkpoint from your last response in the after={checkpoint} query parameter.
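One way to carry a checkpoint between runs is to write it to a small file and read it back before the next request. The sketch below assumes a local JSON file is an acceptable store; `save_checkpoint`, `load_checkpoint` and `resume_params` are hypothetical names, not part of the API.

```python
import json
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, checkpoint: str) -> None:
    """Persist the checkpoint string from the latest response."""
    path.write_text(json.dumps({"checkpoint": checkpoint}), encoding="utf-8")


def load_checkpoint(path: Path) -> Optional[str]:
    """Return the stored checkpoint, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))["checkpoint"]


def resume_params(path: Path) -> dict:
    """Build query parameters that continue from the stored checkpoint."""
    checkpoint = load_checkpoint(path)
    # The checkpoint is treated as an opaque string and never parsed into
    # a float, so the timestamp keeps the exact digits the API returned.
    return {"after": checkpoint} if checkpoint else {}
```

On a first run `resume_params` returns an empty dict, so the request starts from the beginning of the list.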
Response
The Hash List endpoint returns the following data on each request:
Field | Type | Description |
---|---|---|
count | Integer | Total number of hash records available |
next | String | URL of the next page of results |
previous | String | URL of the previous page of results (null if first page) |
checkpoint | String | Timestamp-based checkpoint identifier for synchronization |
results | Array | Array of Hash objects |
Hash Object Fields
Field | Type | Description |
---|---|---|
id | Integer | Identifier of the hash record (shown in the example response) |
hash_digest | String | The computed hash value |
algorithm | 'MD5', 'SHA256', 'SHA512' or 'PDQ' | The algorithm used to generate the hash |
ideology | 'islamist', 'far-right' or 'all' | Content classification category |
file_type | String | Source file format |
deleted | Boolean | Whether the file has been removed from the system |
updated_on | Float | Unix timestamp of the last update |
Pagination
The API implements cursor-based pagination using timestamp and ID pairs. Results can be traversed using the next and previous URLs provided in the response.
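Traversal with `next` can be sketched as a generator. This is a minimal illustration, assuming `next` is null or absent on the last page (the tables above only state this for `previous`). `iter_hashes` and the `fetch` callable are hypothetical names; `fetch` takes a URL and returns the decoded JSON body.

```python
from typing import Callable, Dict, Iterator, Optional

# A fetcher takes a URL and returns the decoded JSON body as a dict.
Fetcher = Callable[[str], Dict]


def iter_hashes(
    first_url: str,
    fetch: Fetcher,
    max_pages: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield every Hash object, following `next` until it is missing or null.

    max_pages optionally caps how many pages are requested.
    """
    url: Optional[str] = first_url
    pages = 0
    while url and (max_pages is None or pages < max_pages):
        page = fetch(url)
        yield from page.get("results", [])
        # Assumption: the last page carries no usable `next` URL.
        url = page.get("next")
        pages += 1
```

With the `requests` example above, `fetch` could be `lambda u: requests.get(u, headers=headers).json()`.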
Example Response
{
  "count": 19676,
  "next": "http://app.terrorismanalytics.org/hash-list/v2/all?{params}",
  "previous": null,
  "checkpoint": "1730213563.621023,123",
  "results": [
    {
      "hash_digest": "000aaabbbccc111dddeeefff333",
      "algorithm": "MD5",
      "ideology": "",
      "file_type": "mp4",
      "deleted": false,
      "updated_on": 1730204429.302388,
      "id": 1
    }
    ...
  ]
}
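A response of this shape could be mapped onto typed records as follows. This is a sketch only: `HashRecord` and `parse_results` are hypothetical names, and the field set is taken from the example response above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class HashRecord:
    id: int
    hash_digest: str
    algorithm: str
    ideology: str
    file_type: str
    deleted: bool
    updated_on: float  # Unix timestamp, kept exactly as received

    @property
    def updated_at(self) -> datetime:
        """The updated_on timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.updated_on, tz=timezone.utc)


def parse_results(payload: Dict[str, Any]) -> List[HashRecord]:
    """Convert the `results` array of a response into HashRecord objects."""
    return [
        HashRecord(
            id=item["id"],
            hash_digest=item["hash_digest"],
            algorithm=item["algorithm"],
            ideology=item["ideology"],
            file_type=item["file_type"],
            deleted=item["deleted"],
            updated_on=item["updated_on"],
        )
        for item in payload["results"]
    ]
```

Fields are picked explicitly rather than unpacked with `**item`, so an extra field in a future response would not raise an error.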
Implementation Notes
- Multiple hashing algorithms provide redundancy and enhanced detection capabilities
- Checkpoint field enables efficient delta updates for client-side caching
- Each hash entry includes metadata for content categorization and tracking
- Real-time updates reflected through updated_on timestamps
- Deleted flag allows for soft deletion while maintaining hash history
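The interplay of `deleted` and `updated_on` can be sketched as a merge into a local cache. This is one possible approach, not a prescribed one: `apply_updates` is a hypothetical helper, and keying the cache by `(algorithm, hash_digest)` is an assumption about how a client might store entries.

```python
from typing import Dict, Iterable, Tuple

# Cache key: (algorithm, hash_digest). The algorithm is part of the key so
# that digests from different algorithms never collide.
CacheKey = Tuple[str, str]


def apply_updates(cache: Dict[CacheKey, dict], results: Iterable[dict]) -> None:
    """Merge one page of results into a local cache, in place."""
    for record in results:
        key = (record["algorithm"], record["hash_digest"])
        current = cache.get(key)
        # Ignore records older than the version already cached.
        if current is not None and record["updated_on"] < current["updated_on"]:
            continue
        if record["deleted"]:
            # Soft-deleted upstream: stop matching against this hash.
            cache.pop(key, None)
        else:
            cache[key] = record
```

A client that needs the hash history locally could keep the record and store the `deleted` flag instead of removing the entry.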
Best Practices
- Implement local caching using the checkpoint mechanism
- Process updates incrementally using the pagination system
- Consider implementing parallel processing for multiple hash algorithms
- Store hash values in their original format to maintain precision
- Monitor the deleted flag for deprecated hash values
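The suggestion to process algorithms in parallel could look roughly like this. The sketch partitions results by `algorithm` and hands each group to its own worker thread; `group_by_algorithm`, `process_in_parallel` and the `handler` callback are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def group_by_algorithm(results: Iterable[dict]) -> Dict[str, List[str]]:
    """Partition hash digests by the algorithm that produced them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in results:
        # Digests are stored exactly as received, with no re-encoding.
        groups[record["algorithm"]].append(record["hash_digest"])
    return dict(groups)


def process_in_parallel(
    groups: Dict[str, List[str]],
    handler: Callable[[str, List[str]], int],
) -> Dict[str, int]:
    """Run handler(algorithm, digests) for each group on its own thread."""
    if not groups:
        return {}
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = {
            algorithm: pool.submit(handler, algorithm, digests)
            for algorithm, digests in groups.items()
        }
        # result() re-raises any exception the handler raised.
        return {algorithm: future.result() for algorithm, future in futures.items()}
```

Threads suit handlers that are I/O-bound, such as writing each group to a separate index; CPU-bound handlers would need a process pool instead.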
Usage With Threat Exchange
If you want to use the Hash List through Threat Exchange, you can create a collaboration configuration for our API, then fetch and match PDQ image hashes and MD5 video hashes.
Step 1 - Install threat exchange
$ pip install threatexchange
Step 2 - Configure the default credentials
$ threatexchange config api tat --credentials '<TCAP_USERNAME>' '<TCAP_PASSWORD>'
Step 3 - Set up config
$ threatexchange config collab edit tat --create 'TAT'
Step 4 - Fetch hashes with verbose logging
$ threatexchange -v fetch
Step 5 - View dataset
$ threatexchange dataset
Step 6 - Match a piece of content
$ threatexchange match ~/path/to/image.jpg
For more information on Threat Exchange integrations, see the docs.
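For platforms that ingest the list directly rather than through the CLI, the exact-match part of step 6 can be approximated with the standard library. The sketch below covers MD5 only (PDQ is a perceptual hash and needs distance-based comparison, which is not shown) and assumes MD5 digests in the list are lowercase hex strings. `md5_of_file` and `is_known` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Set


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known(path: Path, known_md5: Set[str]) -> bool:
    """True if the file's MD5 digest is in the set of ingested MD5 digests.

    Assumption: digests in known_md5 are lowercase hex, as hexdigest() returns.
    """
    return md5_of_file(path) in known_md5
```

Here `known_md5` would be built from the `hash_digest` values of records whose `algorithm` is `MD5` and whose `deleted` flag is false.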