Warspy

Methodology

Data Sources

Warspy ingests from two free, public APIs: GDELT and ReliefWeb.

We store only titles, URLs, metadata, and brief snippets. No full article content is reproduced. All outbound links open the original publisher's page.

Deduplication & Clustering

Multiple sources often report the same event. We cluster overlapping reports deterministically using a weighted similarity score:

sim = 0.35 × textSim(title, clusterHeadline)
    + 0.25 × jaccardSim(keywords, clusterKeywords)
    + 0.25 × timeProximity(reportedAt, clusterUpdated)
    + 0.15 × geoProximity(distanceKm)

If similarity ≥ 0.70, the report is attached to the existing cluster. Otherwise a new cluster is created. No machine learning is used.
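The weighted sum and threshold above can be sketched in Python as follows. This is a minimal illustration, not Warspy's actual code: the function names are hypothetical, and it assumes each component (text, keyword, time, geo) has already been normalized to the range 0 to 1. Only the Jaccard component is spelled out, since the other three are not defined here.

```python
# Weights and threshold are taken from the formula above.
WEIGHTS = {"text": 0.35, "keywords": 0.25, "time": 0.25, "geo": 0.15}
ATTACH_THRESHOLD = 0.70


def jaccard_sim(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_sim(text_sim: float, keyword_sim: float,
                 time_prox: float, geo_prox: float) -> float:
    """Weighted sum of the four components (each assumed to lie in [0, 1])."""
    return (WEIGHTS["text"] * text_sim
            + WEIGHTS["keywords"] * keyword_sim
            + WEIGHTS["time"] * time_prox
            + WEIGHTS["geo"] * geo_prox)


def should_attach(sim: float) -> bool:
    """True if the report joins the existing cluster, False if it starts a new one."""
    return sim >= ATTACH_THRESHOLD
```

For example, component scores of 0.8 (text), 0.5 (keywords), 1.0 (time), and 0.6 (geo) give 0.28 + 0.125 + 0.25 + 0.09 = 0.745, which clears the 0.70 threshold, so the report would be attached.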

Scoring Formula

Each cluster is assigned a score (0–100):

score = 0.45 × severity + 0.35 × credibility + 0.20 × recency
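As a worked illustration, the formula can be written as a small Python function. This sketch assumes each of the three inputs is already expressed on a 0–100 scale (how severity, credibility, and recency are derived is not covered here), and the function name is hypothetical.

```python
def cluster_score(severity: float, credibility: float, recency: float) -> float:
    """Weighted cluster score; inputs are assumed to be on a 0-100 scale."""
    raw = 0.45 * severity + 0.35 * credibility + 0.20 * recency
    # Clamp defensively so out-of-range inputs cannot push the score outside 0-100.
    return max(0.0, min(100.0, raw))
```

With severity 80, credibility 60, and recency 50, the terms are 36 + 21 + 10, for a score of 67.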

Confidence Labels

High

≥3 distinct source domains, or ReliefWeb (UN-verified) is among the sources. This label does not mean the event is verified; it means that multiple independent outlets, or ReliefWeb, have reported it.

Med

2 distinct source domains. Corroborated but limited.

Low

Single source. Treat with caution; may be preliminary or unverified.
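The three labels above amount to a short decision rule. The Python sketch below is one way to express it, under two assumptions that are ours rather than part of the methodology: a source's domain is taken to be the URL hostname with any leading "www." removed, and ReliefWeb is recognized by the hostname reliefweb.int. The function name is hypothetical.

```python
from urllib.parse import urlparse

RELIEFWEB_DOMAIN = "reliefweb.int"  # assumption: ReliefWeb is detected by hostname


def confidence_label(source_urls: list[str]) -> str:
    """Map a cluster's source URLs to a High / Med / Low label."""
    domains = set()
    for url in source_urls:
        host = urlparse(url).hostname  # None for malformed or scheme-less URLs
        if host:
            domains.add(host.lower().removeprefix("www."))
    if len(domains) >= 3 or RELIEFWEB_DOMAIN in domains:
        return "High"
    if len(domains) == 2:
        return "Med"
    return "Low"
```

Note that two articles from the same domain still count as a single source, so a cluster with several reports from one outlet remains Low.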

Summaries

All summaries are extractive: sentences are drawn directly from article titles and the first 500 characters of ReliefWeb body text. No language model or paraphrasing is applied. An "According to [source]" prefix identifies which outlet provided each sentence.
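A minimal Python sketch of the attribution step is shown below. The 500-character limit comes from the description above; the function name and the exact punctuation after the prefix are assumptions made for illustration.

```python
SNIPPET_LIMIT = 500  # characters of body text considered, per the description above


def attributed_sentence(source: str, text: str) -> str:
    """Prefix verbatim source text with its outlet; the text itself is never rewritten."""
    snippet = text.strip()[:SNIPPET_LIMIT]
    return f"According to {source}: {snippet}"
```

Because the text is only truncated and prefixed, every word after the prefix appears exactly as the source published it.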

Limitations

Legal Note

Warspy is an aggregator. We store only titles, URLs, metadata, and brief snippets. No full article content is cached or reproduced. All article links open the original publisher's page. GDELT data is published under an open license. ReliefWeb content is provided under Creative Commons.