Data Connections

The real story isn't in any single dataset. It's in the links between them. These queries cross-reference public records that are rarely seen together.

How this works
1. We collect public data from 15+ federal sources, including Congress.gov, Regulations.gov, FEC, Senate LDA, GovInfo, and USAspending.
2. We normalize names, IDs, and organizations across datasets. A senator's bioguide ID links their votes, trades, speeches, and campaign donors.
3. We run cross-reference queries that surface patterns no single dataset reveals on its own. Every query links to the live data so you can verify.
Influence & Testimony

Who testifies AND lobbies?

Organizations testify before Congress as expert witnesses while simultaneously paying lobbyists to influence the same committees. Both activities are legal and publicly disclosed — but the connection between them is almost never surfaced.

Hearings × Lobbying
Hearing Witnesses Who Also Lobby Congress
Organizations that testified as expert witnesses at congressional hearings while also paying lobbyists. They appear in the public record as neutral experts, but the lobbying record tells a different story.
hearing_witnesses lobbying_activities
4,035 organizations overlap
Explore this data
Data Sources
  • Hearing witnesses: GovInfo MODS XML metadata for congressional hearings. Each hearing lists witnesses with name and organization.
  • Lobbying clients: Senate Lobbying Disclosure Act (LDA) filings via lda.senate.gov API. Each quarterly LD-2 filing lists the client organization.
Matching Method
Exact case-insensitive string match: UPPER(TRIM(witness_organization)) = UPPER(TRIM(client_name)). This is a pre-computed materialized table (witness_lobby_overlap) rebuilt during each database update.
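A minimal sketch of the exact-match join described above, using Python's built-in sqlite3 module. The table names and the witness_organization and client_name columns come from the text; the other columns and all sample rows are made up for illustration, and the production query that builds witness_lobby_overlap may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas; only witness_organization and client_name are named in the text.
CREATE TABLE hearing_witnesses (witness_name TEXT, witness_organization TEXT);
CREATE TABLE lobbying_activities (filing_id INTEGER, client_name TEXT);
INSERT INTO hearing_witnesses VALUES
    ('A. Example', ' American Petroleum Institute '),
    ('B. Example', 'Example University');
INSERT INTO lobbying_activities VALUES
    (1, 'AMERICAN PETROLEUM INSTITUTE'),
    (2, 'API');
""")

# Normalize both sides with UPPER(TRIM(...)) and keep only exact matches.
overlap = conn.execute("""
    SELECT DISTINCT UPPER(TRIM(w.witness_organization)) AS org
    FROM hearing_witnesses AS w
    JOIN lobbying_activities AS l
      ON UPPER(TRIM(w.witness_organization)) = UPPER(TRIM(l.client_name))
    ORDER BY org
""").fetchall()

overlap_orgs = [row[0] for row in overlap]
print(overlap_orgs)
```

The padded, mixed-case witness entry matches the upper-case filing, while the "API" filing matches nothing. That is the precision-over-recall trade-off in miniature.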
Limitations
Exact name matching means we miss cases where the same organization uses slightly different names across filings (e.g., "American Petroleum Institute" vs "API"). Fuzzy matching would increase coverage but also increase false positives. We chose precision over recall.
Comments × Lobbying
Organizations That Comment on Rules AND Lobby
When a company submits a public comment on an EPA rule while also paying a lobbyist on the same issue, that's dual-channel influence on the same regulatory process. Both are public record.
comment_details lobbying_activities
788 organizations overlap (growing)
Explore this data
Data Sources
  • Comment organizations: Regulations.gov API comment detail records. The organization field is self-reported by the commenter.
  • Lobbying clients: Senate LDA quarterly activity reports (LD-2 filings) with client_name.
Matching Method
Exact case-insensitive match: UPPER(TRIM(organization)) = UPPER(TRIM(client_name)). Materialized as commenter_lobby_overlap.
Limitations
Currently based on ~429K comment details (about 4.4% of 9.76M total comments, up from 2% in March 2026). As full comment text collection continues against the Regulations.gov 1,000-req/hr rate limit, this overlap count grows daily. Organization names are self-reported and unstandardized, so matches depend on commenters using their organization's exact legal name.
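As a rough back-of-the-envelope check on those figures, the sketch below recomputes the coverage percentage and estimates how long the remaining collection would take. It assumes one API request per comment detail, a single API key, and uninterrupted collection at exactly the rate limit; none of these assumptions is stated on this page, so treat the result as an order-of-magnitude estimate only.

```python
TOTAL_COMMENTS = 9_760_000   # total comments, from the text
COLLECTED = 429_000          # comment details collected so far (approximate, from the text)
REQUESTS_PER_HOUR = 1_000    # Regulations.gov rate limit cited in the text

coverage_pct = COLLECTED / TOTAL_COMMENTS * 100
remaining = TOTAL_COMMENTS - COLLECTED

# Assumption: one request fetches exactly one comment detail, around the clock.
hours_left = remaining / REQUESTS_PER_HOUR
days_left = hours_left / 24

print(f"coverage: {coverage_pct:.1f}%")
print(f"remaining: {remaining:,} comments, about {days_left:.0f} days at the rate limit")
```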
Money & Committees

Follow the money to the committee room

Committee assignments determine which industries a member of Congress oversees. Financial disclosures reveal what they trade and who funds them. We connect the two.

Committees × Stock Trades
Stock Trading by Committee Members
Members of Congress actively trading stocks while serving on committees that oversee related industries. Not necessarily illegal, but a pattern worth watching.
committee_memberships congress_members stock_trades
61,895 trades by sitting members
Explore this data
Data Sources
  • Committee assignments: congress-legislators GitHub repository (unitedstates project). Maps bioguide_id to committee_id.
  • Stock trades: House Financial Disclosure PTR PDFs parsed via pdftotext, plus Senate eFD periodic transaction reports scraped from efdsearch.senate.gov. Both are government sources.
Matching Method
Members are linked via bioguide_id, the universal congressional identifier. This is deterministic — no fuzzy matching. The query joins committee_memberships to stock_trades through congress_members.
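The join path above can be sketched with Python's sqlite3 module. The three table names and bioguide_id come from the text; every other column, the committee codes, and the sample rows are placeholders invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas; only the table names and bioguide_id are from the text.
CREATE TABLE congress_members (bioguide_id TEXT PRIMARY KEY, full_name TEXT);
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);
INSERT INTO congress_members VALUES ('X000001', 'Member One'), ('X000002', 'Member Two');
INSERT INTO committee_memberships VALUES ('X000001', 'COMM_A'), ('X000002', 'COMM_B');
INSERT INTO stock_trades VALUES
    ('X000001', 'AAA', '2025-01-10'),
    ('X000001', 'BBB', '2025-02-03'),
    ('X000002', 'CCC', '2025-03-15');
""")

# Deterministic key join: committee_memberships -> congress_members -> stock_trades.
rows = conn.execute("""
    SELECT cm.committee_id, m.full_name, COUNT(*) AS trade_count
    FROM committee_memberships AS cm
    JOIN congress_members AS m ON m.bioguide_id = cm.bioguide_id
    JOIN stock_trades AS t ON t.bioguide_id = m.bioguide_id
    GROUP BY cm.committee_id, m.bioguide_id
    ORDER BY trade_count DESC, cm.committee_id
""").fetchall()

print(rows)
```

Because every table carries the same identifier, no name normalization is involved; a trade either has the member's bioguide_id or it does not join.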
Limitations
100% of stock trades are now matched to a bioguide_id; an April 2026 rework of the name-and-chamber matcher closed the previous 14.5% gap. As of April 2026, trades also carry an SEC CIK identifier for the traded issuer on 67.7% of records, populated via a ticker-SIC crosswalk. Committee assignments reflect current membership only; historical rotations are not tracked.
Committees × Campaign Finance
Top Donors to Committee Members
Which PACs and organizations fund the members of specific committees? This query cross-references FEC contribution records with current committee assignments and bioguide IDs.
committee_memberships fec_crosswalk fec_contributions
4.4M contribution records linked
Explore this data
Data Sources
  • Contributions: FEC bulk data from fec.gov S3 bucket. Committee-to-candidate contributions across all election cycles.
  • Member linkage: fec_candidate_crosswalk maps FEC cand_id to congressional bioguide_id (1,712 matched members).
  • Committees: congress-legislators current membership data.
Matching Method
Four-table join: committee_memberships → fec_candidate_crosswalk (via bioguide_id) → fec_contributions (via cand_id) → fec_committees (via cmte_id). Pre-computed as committee_donor_summary (threshold: $10,000+ total).
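A sketch of that join chain in Python's sqlite3 module. The table names and join keys come from the text; the amount and cmte_name columns, the sample rows, and the choice to apply the $10,000 threshold per congressional committee and donor pair are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Note: committee_id is a congressional committee; cmte_id is an FEC committee (e.g. a PAC).
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE fec_candidate_crosswalk (bioguide_id TEXT, cand_id TEXT);
CREATE TABLE fec_contributions (cand_id TEXT, cmte_id TEXT, amount INTEGER);
CREATE TABLE fec_committees (cmte_id TEXT, cmte_name TEXT);
INSERT INTO committee_memberships VALUES ('X000001', 'COMM_A');
INSERT INTO fec_candidate_crosswalk VALUES ('X000001', 'CAND_1');
INSERT INTO fec_contributions VALUES
    ('CAND_1', 'PAC_1', 6000),
    ('CAND_1', 'PAC_1', 5000),
    ('CAND_1', 'PAC_2', 2500);
INSERT INTO fec_committees VALUES
    ('PAC_1', 'Example PAC One'),
    ('PAC_2', 'Example PAC Two');
""")

# Sum contributions per (congressional committee, donor) and keep totals of $10,000 or more.
rows = conn.execute("""
    SELECT cm.committee_id, fc.cmte_name, SUM(c.amount) AS total
    FROM committee_memberships AS cm
    JOIN fec_candidate_crosswalk AS x ON x.bioguide_id = cm.bioguide_id
    JOIN fec_contributions AS c ON c.cand_id = x.cand_id
    JOIN fec_committees AS fc ON fc.cmte_id = c.cmte_id
    GROUP BY cm.committee_id, fc.cmte_id
    HAVING SUM(c.amount) >= 10000
    ORDER BY total DESC
""").fetchall()

print(rows)
```

Example PAC One clears the threshold with two contributions totaling $11,000, while Example PAC Two ($2,500) is filtered out.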
Limitations
The FEC-to-bioguide crosswalk covers 1,712 members. Contributions are PAC-to-candidate only (not individual donors — see the PII section of the methodology page for why individual-donor data is held locally but not deployed). Committee assignments are current session only — we don't track historical rotations, so a member's past committee work won't show here.
Legislative Activity & Trading

Trading around legislation

Members of Congress trade stocks and also give floor speeches, sponsor bills, and vote on legislation that can move markets. We connect the timing.

Floor Speeches × Stock Trades
Floor Speeches Within 7 Days of Trades
Members who gave floor speeches within a week of making stock trades. This doesn't prove anything — but the temporal proximity is a pattern that researchers and journalists need to be able to see.
stock_trades crec_speakers congressional_record
7-day trade-to-speech window
Explore this data
Data Sources
  • Stock trades: House PTR PDFs + Senate eFD reports. Transaction dates, tickers, amounts.
  • Floor speeches: GovInfo Congressional Record (CREC) daily packages, 1994–present. MODS XML provides speaker bioguide IDs for 99.6% of entries.
Matching Method
Join stock_trades to crec_speakers via bioguide_id, then filter where ABS(julianday(transaction_date) - julianday(speech_date)) <= 7. Pre-computed as speeches_near_trades.
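The window filter can be tried out with Python's sqlite3 module. The filter expression is the one quoted above; placing speech_date on crec_speakers, the ISO date format, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas with ISO-8601 date strings, which julianday() can parse.
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT, transaction_date TEXT);
CREATE TABLE crec_speakers (bioguide_id TEXT, speech_date TEXT);
INSERT INTO stock_trades VALUES ('X000001', 'AAA', '2025-03-10');
INSERT INTO crec_speakers VALUES
    ('X000001', '2025-03-03'),  -- 7 days before the trade: inside the window
    ('X000001', '2025-03-18'),  -- 8 days after the trade: outside the window
    ('X000002', '2025-03-10');  -- same day, but a different member
""")

rows = conn.execute("""
    SELECT t.bioguide_id, t.ticker, t.transaction_date, s.speech_date,
           CAST(julianday(s.speech_date) - julianday(t.transaction_date) AS INTEGER)
               AS days_offset
    FROM stock_trades AS t
    JOIN crec_speakers AS s ON s.bioguide_id = t.bioguide_id
    WHERE ABS(julianday(t.transaction_date) - julianday(s.speech_date)) <= 7
""").fetchall()

print(rows)
```

The window is symmetric and inclusive: a speech exactly seven days before or after the trade counts, and a negative days_offset means the speech came first.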
Limitations
Temporal proximity does not imply causation. Members give many speeches and make many trades — some overlap is expected by chance alone. We chose 7 days as a meaningful window, but this is an editorial choice. The speech content is not automatically matched to the traded stock's sector; users must evaluate relevance manually.
Committee Jurisdiction × Trades
Trading in the Sectors They Regulate
Members of Congress sit on committees with jurisdiction over specific industries. Using SEC EDGAR's SIC classifications for 2,027 traded tickers, we check whether members are trading stocks in sectors their committees regulate.
committee_memberships committee_jurisdiction (SIC ranges) ticker_sic (SEC EDGAR) stock_trades
2,027 tickers classified by SIC code via SEC EDGAR
Explore this data
Data Sources
  • Committee jurisdiction mapping: A curated reference table that maps 27 congressional committees to the SIC code ranges under their primary jurisdiction.
  • Ticker SIC codes: Downloaded from SEC EDGAR — each company's CIK looked up via data.sec.gov/submissions/ to get its SIC industry classification. 2,027 traded tickers classified across 333 unique SIC codes.
  • Stock trades: Same House PTR + Senate eFD sources as above.
  • Committee assignments: congress-legislators GitHub data.
Matching Method
For each committee member, we look up their stock trades, classify each ticker by its SEC-assigned SIC code, then check if that SIC code falls within any of the committee's jurisdiction SIC ranges. The join is: CAST(ticker_sic.sic_code AS INTEGER) BETWEEN committee_sic_ranges.sic_start AND sic_end.
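A small sketch of the range join, using Python's sqlite3 module. The BETWEEN condition and the ticker_sic and committee_sic_ranges names come from the text; the remaining columns, the committee code, the tickers, and the SIC range assigned to the example committee are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schemas; sic_code is stored as text, hence the CAST in the join.
CREATE TABLE committee_memberships (bioguide_id TEXT, committee_id TEXT);
CREATE TABLE stock_trades (bioguide_id TEXT, ticker TEXT);
CREATE TABLE ticker_sic (ticker TEXT, sic_code TEXT);
CREATE TABLE committee_sic_ranges (committee_id TEXT, sic_start INTEGER, sic_end INTEGER);
INSERT INTO committee_memberships VALUES ('X000001', 'COMM_ENERGY');
INSERT INTO stock_trades VALUES ('X000001', 'OILX'), ('X000001', 'SOFT');
INSERT INTO ticker_sic VALUES ('OILX', '1311'), ('SOFT', '7372');
INSERT INTO committee_sic_ranges VALUES ('COMM_ENERGY', 1300, 1399);
""")

# A trade is flagged when its ticker's SIC code falls inside one of the
# SIC ranges mapped to a committee the trading member sits on.
rows = conn.execute("""
    SELECT cm.bioguide_id, cm.committee_id, t.ticker,
           CAST(ts.sic_code AS INTEGER) AS sic
    FROM committee_memberships AS cm
    JOIN stock_trades AS t ON t.bioguide_id = cm.bioguide_id
    JOIN ticker_sic AS ts ON ts.ticker = t.ticker
    JOIN committee_sic_ranges AS r
      ON r.committee_id = cm.committee_id
     AND CAST(ts.sic_code AS INTEGER) BETWEEN r.sic_start AND r.sic_end
""").fetchall()

print(rows)
```

Only the ticker with SIC code 1311 lands inside the example committee's 1300 to 1399 range; the 7372 ticker is traded by the same member but is not flagged.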
Limitations
SIC codes cover publicly traded companies only — mutual funds, ETFs, and ADRs don't have SIC classifications and are excluded. The committee-to-SIC mapping currently covers primary jurisdiction only; some committees (Appropriations, Budget, Foreign Affairs) are omitted because their jurisdiction is cross-cutting rather than sector-specific. The mapping reflects editorial judgment about which SIC ranges fall under each committee.
Lobbying ↔ Legislation
Which Bills Get Lobbied the Most?
We parse bill numbers (H.R., S., etc.) from the "specific issues" text in 2.7M lobbying activity reports, then match them to legislation records. See which bills attract the most lobbying and from how many different clients.
lobbying_activities lobbying_bills legislation
2.7M lobbying reports text-mined
Explore this data
Data Sources
  • Lobbying activities: Senate LDA quarterly LD-2 filings. Each filing has a specific_issues free-text field describing lobbying activities.
  • Legislation: Congress.gov BILLSTATUS bulk XML. 376K+ bills across Congresses 93–119.
Matching Method
Regex extraction of bill references from specific_issues text. Patterns: H.R., S., H.J.Res., S.J.Res., H.Con.Res., S.Con.Res., H.Res., S.Res. followed by a number. Congress number inferred from filing year using (year - 1789) // 2 + 1. Matched to legislation table via constructed bill_id.
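The extraction and Congress-number inference can be sketched in Python. The prefix list and the formula come from the text; the exact regular expression, the lookbehind that skips strings such as "U.S. 2030", and the "118-hr-1234" style of bill_id are assumptions made for this example, since the real bill_id format is not shown here.

```python
import re

# Dotted prefixes listed in the text. The lookbehind rejects a prefix that is
# glued to a preceding letter or period, so "U.S. 2030" is not read as "S. 2030".
BILL_RE = re.compile(
    r"(?<![A-Za-z.])"
    r"(H\.J\.Res\.|S\.J\.Res\.|H\.Con\.Res\.|S\.Con\.Res\.|H\.Res\.|S\.Res\.|H\.R\.|S\.)"
    r"\s*(\d+)"
)


def congress_for_year(year: int) -> int:
    """Infer the Congress number from a filing year, as described in the text."""
    return (year - 1789) // 2 + 1


def extract_bill_ids(specific_issues: str, filing_year: int) -> list[str]:
    """Return hypothetical bill_id strings for each bill reference found."""
    congress = congress_for_year(filing_year)
    bill_ids = []
    for prefix, number in BILL_RE.findall(specific_issues):
        bill_type = prefix.replace(".", "").lower()  # "H.R." -> "hr"
        bill_ids.append(f"{congress}-{bill_type}-{number}")
    return bill_ids


text = "Issues related to H.R. 1234 and S. 56; also H.J.Res. 7 and the U.S. 2030 targets."
print(extract_bill_ids(text, 2024))
```

This sketch only recognizes the dotted forms listed above. Undotted variants such as "HR 1234" would need additional patterns.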
Limitations
Not all lobbying filings reference specific bills — many describe issues generally. Congress number inference from filing year may be off for filings near session boundaries. The regex doesn't catch informal references like "the infrastructure bill" or bill names without numbers.
Lobbying ↔ Congress
The Revolving Door: Former Members Who Lobby
Lobbying filings include "covered positions" — when a lobbyist previously held a government role. Cross-referencing with congress_members reveals which former legislators now lobby their former colleagues.
lobbying_lobbyists covered_position filter congress_members
180 former members matched to lobbying filings
Explore this data
Data Sources
  • Lobbying disclosures: Senate LDA filings include a covered_position field where lobbyists disclose former government roles. 1.85M records have this field populated.
  • Congress members: 12,765 historical and current members with bioguide IDs.
Matching Method
We filter lobbying records where the covered_position text indicates the lobbyist was a former member of Congress (matching patterns like "U.S. Senator", "U.S. Representative", "Member of Congress", "Former Member"). Then we join on UPPER(full_name) = lobbyist_name. For ambiguous names (e.g., multiple "Thomas Davis" in history), we pick the most recent member.
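The filter, name join, and most-recent tie-break can be sketched with Python's sqlite3 module. The patterns and the UPPER(full_name) = lobbyist_name condition come from the text; the last_year_served column used for the tie-break, the assumption that lobbyist_name is stored in upper case, and all sample names and IDs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- last_year_served is a hypothetical column used to pick the most recent member.
CREATE TABLE congress_members (bioguide_id TEXT, full_name TEXT, last_year_served INTEGER);
CREATE TABLE lobbying_lobbyists (lobbyist_name TEXT, covered_position TEXT);
INSERT INTO congress_members VALUES
    ('Z000001', 'Alex Sample', 1951),
    ('Z000002', 'Alex Sample', 2009),
    ('Z000003', 'William Sample', 2005);
INSERT INTO lobbying_lobbyists VALUES
    ('ALEX SAMPLE', 'U.S. Representative'),
    ('BILLY SAMPLE', 'Former Member of Congress'),
    ('JANE STAFFER', 'Legislative Assistant');
""")

# Keep filings whose covered_position looks like a former member, then join on name.
rows = conn.execute("""
    SELECT l.lobbyist_name, m.bioguide_id, m.last_year_served
    FROM lobbying_lobbyists AS l
    JOIN congress_members AS m ON UPPER(m.full_name) = l.lobbyist_name
    WHERE l.covered_position LIKE '%U.S. Senator%'
       OR l.covered_position LIKE '%U.S. Representative%'
       OR l.covered_position LIKE '%Member of Congress%'
       OR l.covered_position LIKE '%Former Member%'
    ORDER BY l.lobbyist_name, m.last_year_served DESC
""").fetchall()

# Rows are sorted most recent first, so the first hit per name wins.
matches = {}
for lobbyist_name, bioguide_id, _ in rows:
    matches.setdefault(lobbyist_name, bioguide_id)

print(matches)
```

The duplicated name resolves to the member who served most recently, and "BILLY SAMPLE" passes the position filter but finds no member under that exact name, so it is dropped.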
Limitations
Name matching misses ~50–80 members who go by nicknames in lobbying filings (e.g., "Billy Tauzin" vs. "William Tauzin", "Dick Gephardt" vs. "Richard Gephardt"). We plan to add a manual nickname mapping table to capture these. The position text is free-form and inconsistent, so some staffers who list their boss's title may be incorrectly included or excluded.
Oversight & Foreign Influence

Accountability and foreign interests in government

We cross-reference GAO oversight reports with legislation, and foreign agent registrations with congressional hearing testimony.

GAO ↔ Legislation
GAO Oversight of Specific Laws
GAO reports reference specific public laws. Matching these to legislation reveals which laws have been subject to the most government accountability scrutiny.
gao_reports legislation
2,479 GAO report-to-bill matches via public law numbers
Data Sources
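  • GAO reports: 16,500+ reports from GovInfo (1994–2008), plus reports scraped directly from gao.gov for 2008–present (73,725 in the combined gao_reports table). Each record has a public_laws field listing any referenced Public Laws.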
  • Legislation: Congress.gov BILLSTATUS. Bills that became public laws have action text "Became Public Law No: X-Y."
Matching Method
Parse the comma-separated public_laws field using json_each(), strip the "Public Law " prefix, and join on legislation_actions.action_text to find the corresponding bill.
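A sketch of this step with Python's sqlite3 module, assuming the SQLite build includes the JSON functions. Because json_each() expects JSON input, the example assumes public_laws is stored as a JSON array of strings; the report_id and bill_id columns, the law numbers, and the sample rows are placeholders, and the real join condition may be looser than the exact string comparison used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumption: public_laws holds a JSON array such as ["Public Law 999-1", ...].
CREATE TABLE gao_reports (report_id TEXT, public_laws TEXT);
CREATE TABLE legislation_actions (bill_id TEXT, action_text TEXT);
INSERT INTO gao_reports VALUES
    ('GAO-EX-1', '["Public Law 999-1", "Public Law 999-2"]');
INSERT INTO legislation_actions VALUES
    ('999-hr-1', 'Became Public Law No: 999-1.'),
    ('999-hr-1', 'Introduced in House');
""")

# Expand each report's law list, strip the prefix, and match the action text.
rows = conn.execute("""
    SELECT g.report_id,
           REPLACE(j.value, 'Public Law ', '') AS law_number,
           a.bill_id
    FROM gao_reports AS g, json_each(g.public_laws) AS j
    JOIN legislation_actions AS a
      ON a.action_text = 'Became Public Law No: '
                         || REPLACE(j.value, 'Public Law ', '') || '.'
""").fetchall()

print(rows)
```

The second law in the list has no matching action row, so it yields no result, which mirrors how references outside the legislation table go unmatched.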
Limitations
GAO coverage is now dual-sourced: GovInfo for the historical archive through mid-2008, plus direct scraping from gao.gov for 2008–present. The combined gao_reports table holds 73,725 reports. Many GAO reports reference laws by USC section rather than Public Law number, which this matching doesn't capture. Approximately 4,990 reports have public law references, and 2,479 match to bills in our legislation table (which covers Congresses 93–119).
FARA ↔ Hearings
Foreign Agents Testifying at Hearings
FARA registrants are organizations registered to act on behalf of foreign principals, such as foreign governments and political parties. Matching them to hearing witness lists reveals which foreign-agent organizations testified before Congress.
fara_registrants hearing_witnesses
83 FARA registrants matched to hearing witnesses
Explore this data
Data Sources
  • FARA registrants: 7,043 organizations registered under the Foreign Agents Registration Act, from DOJ bulk data.
  • Hearing witnesses: 109,000+ witness entries from GovInfo CHRG MODS XML, with organization names.
Matching Method
Exact name match on UPPER(TRIM(fara_registrants.name)) = UPPER(TRIM(hearing_witnesses.organization)). Foreign principals and countries are pulled in via the FARA registration number.
Limitations
Exact name matching is conservative — organizations that testify under slightly different names (e.g., "Hogan Lovells" vs "Hogan Lovells US LLP") won't match. The 83 matches represent a lower bound. Fuzzy matching could increase coverage but risks false positives.