# ESMA Fund Data: Registers, APIs, and Reference Data Fields

## 1. Overview: ESMA's Fund Data Ecosystem

ESMA (European Securities and Markets Authority) maintains **six distinct data systems** relevant to fund data. Unlike the US SEC, which centralizes prospectus filing and structured extraction, ESMA's fund data is **fragmented across regulatory registers** that focus on authorization, cross-border notification, and instrument identification rather than on prospectus content.

| System | Content | Access | Fund Relevance |
|--------|---------|--------|----------------|
| **Fund Register** (AIF/EuSEF/EuVECA) | Authorized AIF funds & managers | Solr API (JSON/XML) | Direct - fund-level identity |
| **Cross-border Marketing Register** (UCITS + AIF) | Fund passporting notifications | Solr API (JSON/XML) | Direct - UCITS & AIF cross-border data |
| **Entity Register** (MiFID/UCITS/AIFMD) | Management companies, AIFMs | Solr API (JSON/XML) | Manager-level data |
| **FIRDS** (Financial Instruments Reference Data) | MiFIR instrument reference data | XML bulk files + API | Instrument-level (ISIN, CFI) |
| **Prospectus Register** (Prospectus Regulation III) | Approved EU prospectuses + supplements | Solr API (JSON/XML) | Securities prospectuses (not fund KIIDs) |
| **Money Market Fund Register** | MMF authorizations | Solr API (JSON/XML) | MMF-specific |

---

## 2. Fund Register: AIF/EuSEF/EuVECA Funds

### API Endpoint
```
https://registers.esma.europa.eu/solr/esma_registers_funds/select?q=...&wt=json
```

### Available Fields

| Field Name | Type | Description | openfunds Equivalent |
|-----------|------|-------------|---------------------|
| `funds_national_name` | text | Fund name (national identifier name) | OFST010110 Legal Fund Name |
| `funds_lei` | text | **LEI** of the fund | OFST010030 LEI Of Fund |
| `funds_legal_framework_name` | text | Legal framework: AIF, EuSEF, EuVECA | OFST160100 Legal Form |
| `funds_other_legal_framework_name` | text | Additional legal framework info | — |
| `funds_status_code_name` | text | Fund authorization status | — |
| `funds_mgmnt_structure_code_name` | text | Management structure type | OFST010420 Open-ended/Closed-ended |
| `funds_domicile_cou_code_name` | text | **Fund domicile country** | OFST010010 Fund Domicile |
| `funds_mgmnt_status_code_name` | text | Management status | — |
| `funds_manager_nat_name` | text | **Management company name** | OFST001020 ManCo |
| `funds_manager_lei` | text | **Manager LEI** | — |
| `funds_manager_cou_code_name` | text | Manager country | — |
| `funds_manager_legal_framework_name` | text | Manager legal framework | — |
| `funds_host_country_code_name` | text | **Host member states** (marketing countries) | OFST6000XX Country registrations |
| `funds_fund_mrkt_status_code_name` | text | Marketing status per country | — |
| `funds_notification_event1_date` | date | First notification date | — |
| `funds_notification_event2_date` | date | Second notification date | — |
| `funds_notif_legal_framework_name` | text | Notification legal framework | — |
| `funds_ca_cou_code_name` | text | Competent authority country | OFST010060 Supervisory Authority |

### Example API Call (first 100 AIF funds, JSON format)
```bash
# rows=100 returns only the first page; raise rows or add start=<offset> to read further results
curl "https://registers.esma.europa.eu/solr/esma_registers_funds/select?q=type_s:*&fq=funds_legal_framework_name:%22AIF%22&fl=funds_national_name,funds_lei,funds_domicile_cou_code_name,funds_manager_nat_name,funds_manager_lei&rows=100&wt=json&indent=true"
```
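
The same request can also be assembled programmatically. The following is a minimal Python sketch, not an official client: `build_select_url` is a hypothetical helper, the host, core, and field names are the ones used in the example above, and `q`, `fq`, `fl`, `rows`, `start`, and `wt` are standard Solr query parameters. It only builds the URL, so it runs without network access.

```python
from urllib.parse import urlencode

# Host as used in the examples in this document.
BASE = "https://registers.esma.europa.eu/solr"


def build_select_url(core, q="*:*", fq=None, fl=None, rows=100, start=0):
    """Build a Solr /select URL (hypothetical helper, standard Solr parameters)."""
    params = [("q", q)]
    for clause in fq or []:
        params.append(("fq", clause))  # Solr accepts repeated fq parameters
    if fl:
        params.append(("fl", ",".join(fl)))
    params += [("rows", rows), ("start", start), ("wt", "json")]
    # urlencode percent-encodes quotes, colons, and commas for us
    return f"{BASE}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "esma_registers_funds",
    q="type_s:*",
    fq=['funds_legal_framework_name:"AIF"'],
    fl=["funds_national_name", "funds_lei", "funds_manager_lei"],
)
print(url)
```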
---

## 3. Cross-border Marketing Register (UCITS + AIF)

### API Endpoint
```
https://registers.esma.europa.eu/solr/esma_registers_funds_cbdif/select?q=...&wt=json
```

This register uses the same field structure as the Fund Register above but also covers **UCITS funds**. Apart from the Money Market Fund Register, which lists UCITS-based MMFs, it is the only ESMA register with UCITS fund-level data.

### Filter by fund type
```bash
# Append one of these filters to the endpoint URL; percent-encode the quotes as %22 when calling curl
# UCITS funds
...&fq=funds_legal_framework_name:"UCITS"

# AIF funds
...&fq=funds_legal_framework_name:"AIF"

# ELTIF funds
...&fq=funds_legal_framework_name:"ELTIF"
```
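
Reading a whole register through Solr means paging with `start` and `rows` until the `numFound` count reported in the response is exhausted. The sketch below shows that arithmetic for each framework filter. `page_starts` and `framework_filter` are hypothetical helpers, and the counts are invented for illustration; they are not real register totals.

```python
def page_starts(num_found, rows=100):
    """Return the Solr `start` offsets needed to read `num_found` documents."""
    return list(range(0, num_found, rows))


def framework_filter(framework):
    """Return the fq clause for one legal framework, e.g. UCITS, AIF, ELTIF."""
    return f'funds_legal_framework_name:"{framework}"'


# Hypothetical numFound values, for illustration only (not real register counts).
example_counts = {"UCITS": 250, "AIF": 90, "ELTIF": 0}
for framework, num_found in example_counts.items():
    print(framework_filter(framework), page_starts(num_found))
```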
---

## 4. Entity Register (Management Companies)

### API Endpoint
```
https://registers.esma.europa.eu/solr/esma_registers_upreg/select?q=...&wt=json
```

### Fields for UCITS Management Companies (`ae_entityTypeCode:UCI`)

| Field Name | Type | Description | openfunds Equivalent |
|-----------|------|-------------|---------------------|
| `ae_entityName` | text | Entity (ManCo) name | OFST001020 ManCo |
| `ae_lei` | text | Entity LEI | — |
| `ae_entityTypeCode` | text | Entity type (UCI, AIF, MIF, etc.) | — |
| `ae_competentAuthority` | text | NCA name | OFST010060 Supervisory Authority |
| `ae_homeMemberState` | text | Home member state | — |
| `ae_hostMemberState` | text | Host member state(s) | — |
| `ae_status` | text | Authorization status | — |
| `ae_authorisationNotificationDate` | date | Authorization date | — |
| `ae_website` | text | Entity website | — |
| `ae_legalform` | text | Legal form | OFST160100 Legal Form |
| `ae_commercialName` | text | Commercial/brand name | OFST001000 Fund Group Name |
| `ac_serviceName` | text | Licensed services | — |
| `no_of_funds` | string | Number of managed funds | — |

### Example: List all UCITS Management Companies
```bash
# -g stops curl from globbing the {!join ...} braces; single quotes stop the shell from expanding "!"
curl -g 'https://registers.esma.europa.eu/solr/esma_registers_upreg/select?q={!join+from=id+to=_root_}ae_entityTypeCode:UCI&fq=(type_s:parent)&rows=1000&wt=json&indent=true'
```
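
With `wt=json`, Solr wraps results in a `response` object holding `numFound` and a `docs` list. The sketch below pulls a few of the fields from the table above out of such a payload. The sample payload is made up (placeholder names and LEIs), and `extract_entities` is a hypothetical helper; the value formats returned by the live register are not verified here.

```python
import json

# Hypothetical response body in the standard Solr JSON layout; the values are made up.
sample = json.loads("""
{"response": {"numFound": 2, "start": 0, "docs": [
  {"ae_entityName": "Example ManCo A", "ae_lei": "LEI-PLACEHOLDER-A", "ae_homeMemberState": "LU"},
  {"ae_entityName": "Example ManCo B", "ae_homeMemberState": "IE"}
]}}
""")


def extract_entities(payload, fields=("ae_entityName", "ae_lei", "ae_homeMemberState")):
    """Return one dict per Solr document, with None for fields a document lacks."""
    docs = payload.get("response", {}).get("docs", [])
    return [{name: doc.get(name) for name in fields} for doc in docs]


entities = extract_entities(sample)
print(len(entities), entities[1]["ae_lei"])
```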
---

## 5. FIRDS (Financial Instruments Reference Data System)

FIRDS contains MiFIR reference data for **all financial instruments** traded on EU venues, including fund shares/units. Funds are classified with CFI codes starting with "C" (Collective Investment Schemes).

### Access Methods
1. **Full/Delta XML files**: Downloaded from ESMA registers portal
2. **Python package**: `esma_data_py` on GitHub
3. **API**: Via the ESMA API store

### Key Fields for Fund Instruments

| Field | Description | openfunds Equivalent |
|-------|-------------|---------------------|
| ISIN | International Securities Identification Number | OFST020000 ISIN |
| CFI Code | Classification of Financial Instruments (ISO 10962) | OFST350015 CFI Code |
| Instrument Full Name | Name of the instrument | OFST020060 Full Share Class Name |
| Issuer LEI | LEI of the issuer/ManCo | OFST010030 LEI Of Fund |
| Notional Currency | Currency of the instrument | OFST020540 Share Class Currency |
| Trading Venue MIC | Where the instrument is traded | OFST060000-range Listing data |
| Maturity Date | For dated instruments | — |
| Nominal Value | Face value per unit | — |

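### Python Access
```python
from esma_data_py import EsmaDataLoader

edl = EsmaDataLoader()

# Load the latest FIRDS full files (FULINS) for Collective Investment Schemes.
# NOTE: the keyword arguments below are unverified; check them against the
# load_latest_files signature of your installed esma_data_py version.
df = edl.load_latest_files(instrument_type="FULINS", cfi_codes=["C*"])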
```
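
If the loader cannot filter by CFI itself, the same selection can be done after loading. This is a minimal sketch of the "CFI starts with C" rule stated above, using plain dictionaries rather than any particular library. The row layout, key names, and ISINs are placeholders chosen for illustration.

```python
def is_collective_investment(cfi_code):
    """True if a six-letter ISO 10962 CFI code falls in category C."""
    return (
        isinstance(cfi_code, str)
        and len(cfi_code) == 6
        and cfi_code.upper().startswith("C")
    )


# Hypothetical instrument rows; the ISINs are placeholders, not real identifiers.
instruments = [
    {"isin": "XX0000000001", "cfi": "CIOGES"},
    {"isin": "XX0000000002", "cfi": "ESVUFR"},
    {"isin": "XX0000000003", "cfi": None},
]
funds = [row for row in instruments if is_collective_investment(row["cfi"])]
print([row["isin"] for row in funds])
```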
---

## 6. Money Market Fund Register

### API Endpoint
```
https://registers.esma.europa.eu/solr/esma_registers_mmf04/select?q=type_s:parent&wt=json
```

### Fields

| Field Name | Description | openfunds Equivalent |
|-----------|-------------|---------------------|
| `mmf04_lei` | Fund LEI | OFST010030 LEI Of Fund |
| `mmf04_national_name` | Fund national name | OFST010110 Legal Fund Name |
| `mmf04_domicile_name` | Fund domicile | OFST010010 Fund Domicile |
| `mmf04_type_name` | MMF type (CNAV/LVNAV/VNAV) | OFST351300 Money Market Type |
| `mmf04_lgl_framework_name` | Legal framework (UCITS/AIF) | OFST160100 Legal Form |
| `mmf04_is_passported_name` | Cross-border passported? | — |
| `mmf04_auth_status_name` | Authorization status | — |
| `mmf04_manager_lei` | Manager LEI | — |
| `mmf04_manager_nat_name` | Manager name | OFST001020 ManCo |
| `mmf04_manager_domicile_name` | Manager domicile | — |
| `mmf04_auth_start_date` | Authorization start date | OFST010240 Fund Launch Date |
| `mmf04_auth_end_date` | Authorization end date | — |
| `mmf04_ca_cou_code_name` | Competent authority country | OFST010060 Supervisory Authority |
| `mmf04_auth_ca_code_name` | Authorizing CA | — |
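
The openfunds column of the table above can be applied mechanically to a register document. The sketch below re-keys one MMF document by OF-ID using only the mappings listed in the table. `to_openfunds` is a hypothetical helper and the sample document contains placeholder values.

```python
# Mapping taken from the table above: MMF register field -> openfunds OF-ID.
MMF_TO_OPENFUNDS = {
    "mmf04_lei": "OFST010030",
    "mmf04_national_name": "OFST010110",
    "mmf04_domicile_name": "OFST010010",
    "mmf04_type_name": "OFST351300",
    "mmf04_lgl_framework_name": "OFST160100",
    "mmf04_manager_nat_name": "OFST001020",
    "mmf04_auth_start_date": "OFST010240",
    "mmf04_ca_cou_code_name": "OFST010060",
}


def to_openfunds(mmf_doc):
    """Re-key one MMF register document by OF-ID, dropping unmapped or missing fields."""
    return {
        of_id: mmf_doc[field]
        for field, of_id in MMF_TO_OPENFUNDS.items()
        if mmf_doc.get(field) is not None
    }


# Hypothetical document; all values are placeholders.
doc = {
    "mmf04_national_name": "Example Liquidity Fund",
    "mmf04_type_name": "LVNAV",
    "mmf04_manager_lei": "LEI-PLACEHOLDER",  # no openfunds equivalent in the table
}
print(to_openfunds(doc))
```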

---

## 7. Prospectus Register (Prospectus Regulation III)

This covers **securities prospectuses** under the EU Prospectus Regulation (not UCITS KIIDs/KIDs). However, some fund-related securities (listed fund shares, ETFs) may appear here.

### API Endpoint
```
https://registers.esma.europa.eu/solr/esma_registers_priii_documents/select?q=...&wt=json
```

### Document Types
| Code | Type |
|------|------|
| URGN | Universal Registration Document |
| REGN | Registration Document |
| SECN | Securities Note |
| SMRY | Summary |
| BPFT | Base Prospectus Final Terms |
| BPWO | Base Prospectus without Final Terms |
| STDA | Standalone Prospectus |

### Searchable Fields
- `issuer_lei` — Issuer LEI
- `issuer_name` — Issuer name
- `issuer_residency` — Issuer country
- `offeror_lei` / `offeror_name` / `offeror_residency`
- `guarantor_lei` / `guarantor_name` / `guarantor_residency`
- `approval_filing_date` — Document approval date
- `document_type` — Type code (see above)
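
The searchable fields combine into a Solr `q` expression in the usual `field:value` form. The sketch below builds such an expression and rejects type codes that are not in the table above. `prospectus_query` is a hypothetical helper, the LEI is a placeholder, and whether the live index stores the four-letter codes or longer labels in `document_type` is an assumption that should be checked against a real response.

```python
# Document type codes from the table above.
DOCUMENT_TYPES = {"URGN", "REGN", "SECN", "SMRY", "BPFT", "BPWO", "STDA"}


def prospectus_query(issuer_lei=None, document_type=None):
    """Build a Solr q string from the searchable fields (hypothetical helper)."""
    clauses = []
    if issuer_lei:
        clauses.append(f'issuer_lei:"{issuer_lei}"')
    if document_type:
        if document_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {document_type}")
        clauses.append(f"document_type:{document_type}")
    # Fall back to the match-all query when no filter is given
    return " AND ".join(clauses) or "*:*"


print(prospectus_query(issuer_lei="LEI-PLACEHOLDER", document_type="STDA"))
```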

---

## 8. AIFMD Reporting (Not Publicly Available)

ESMA collects detailed fund data through **AIFMD Article 24 reporting**, but this data is **not publicly accessible**. It is submitted by AIFMs to NCAs and forwarded to ESMA for supervisory purposes only.

### Fields collected (not public):

| Category | Fields |
|----------|--------|
| **Fund Identity** | AIF name, national code, LEI, domicile, inception date |
| **Fund Type** | Predominant AIF type, investment strategy, sub-strategy |
| **Assets** | Gross Asset Value (GAV), Net Asset Value (NAV), base currency |
| **Leverage** | Gross method leverage, commitment method leverage |
| **Investor Types** | Breakdown by investor category (professional, retail, etc.) |
| **Geographic Focus** | Geographic breakdown of investments |
| **Asset Allocation** | Breakdown by asset type (equities, bonds, derivatives, etc.) |
| **Liquidity** | Portfolio liquidity profile, investor redemption frequency |
| **Counterparty Risk** | Top 5 counterparty exposures |
| **Risk Measures** | VaR, stress test results |

This is the richest structured dataset but is **confidential** and only available to regulators.

---

## 9. Comparison: ESMA vs SEC Fund Data

| Aspect | SEC (US) | ESMA (EU) |
|--------|----------|-----------|
| **Prospectus text** | Full prospectus filed as HTML/XML on EDGAR | Not centralized; filed with national NCAs |
| **Structured prospectus data** | XBRL Risk/Return Summary (fees, performance, objective) | **Not available**: no EU-wide structured extraction |
| **Fund identity register** | Series/Class CSV (CIK, Series ID, Class ID, ticker) | Fund Register (LEI, name, domicile, manager) |
| **Portfolio holdings** | N-PORT (quarterly, position-level) | **Not public**: AIFMD reporting is confidential |
| **Instrument reference data** | Limited (ticker in Series/Class CSV) | FIRDS (ISIN, CFI, LEI, currency, trading venue) |
| **Fee data (structured)** | XBRL: management fee, TER, loads, 12b-1 | **Not available** in ESMA registers |
| **Performance data (structured)** | XBRL: 1yr/5yr/10yr returns, bar charts | **Not available** in ESMA registers |
| **Risk data (structured)** | N-PORT: DV01, credit spread, VaR | AIFMD reporting (confidential) |
| **Cross-border/passporting** | N/A (single market) | Full cross-border notification register |
| **API quality** | Excellent (REST JSON, free, no auth) | Good (Solr JSON/XML, free, no auth) |
| **Bulk download** | ZIP files (submissions, XBRL, N-PORT) | FIRDS XML bulk files; fund register via Solr pagination |

### Key Difference
The SEC provides **structured data extracted from prospectuses** (fees, performance, objectives via XBRL), making it directly useful for LLM training with ground-truth labels. ESMA provides **authorization/registration data** (who is authorized, where, by whom) but does **not** centralize or structure the content of fund prospectuses/KIIDs/KIDs.

For EU fund prospectus content (KIID/KID), you would need to go to:
- Individual NCAs (AMF in France, BaFin in Germany, CSSF in Luxembourg, etc.)
- Commercial data providers (Morningstar, Refinitiv, FE fundinfo)
- Fund company websites directly

---

## 10. What openfunds Fields Can Be Found in ESMA Data?

### Directly Available (from ESMA registers)

| openfunds OF-ID | Field Name | ESMA Source |
|-----------------|-----------|-------------|
| OFST001000 | Fund Group Name | Entity Register: `ae_commercialName` |
| OFST001020 | ManCo | Fund Register: `funds_manager_nat_name` |
| OFST010010 | Fund Domicile | Fund Register: `funds_domicile_cou_code_name` |
| OFST010030 | LEI Of Fund | Fund Register: `funds_lei` |
| OFST010060 | Supervisory Authority | Fund Register: `funds_ca_cou_code_name` |
| OFST010110 | Legal Fund Name | Fund Register: `funds_national_name` |
| OFST020000 | ISIN | FIRDS: ISIN field |
| OFST020540 | Share Class Currency | FIRDS: Notional Currency |
| OFST160100 | Legal Form | Fund Register: `funds_legal_framework_name` |
| OFST350015 | CFI Code | FIRDS: CFI Code |
| OFST351295 | Is Money Market Fund | MMF Register: presence in register |
| OFST351300 | Money Market Type | MMF Register: `mmf04_type_name` |
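
When planning an extraction target, it helps to check a wish list of openfunds fields against this table. The sketch below does that lookup; the dictionary is copied from the table above, `coverage` is a hypothetical helper, and `OFST999999` is a made-up ID standing in for a field with no ESMA source.

```python
# OF-IDs listed as directly available in the table above, with their ESMA source.
ESMA_SOURCES = {
    "OFST001000": "Entity Register: ae_commercialName",
    "OFST001020": "Fund Register: funds_manager_nat_name",
    "OFST010010": "Fund Register: funds_domicile_cou_code_name",
    "OFST010030": "Fund Register: funds_lei",
    "OFST010060": "Fund Register: funds_ca_cou_code_name",
    "OFST010110": "Fund Register: funds_national_name",
    "OFST020000": "FIRDS: ISIN field",
    "OFST020540": "FIRDS: Notional Currency",
    "OFST160100": "Fund Register: funds_legal_framework_name",
    "OFST350015": "FIRDS: CFI Code",
    "OFST351295": "MMF Register: presence in register",
    "OFST351300": "MMF Register: mmf04_type_name",
}


def coverage(wanted_ids):
    """Split a list of OF-IDs into those covered by ESMA data and those that are not."""
    covered = [of_id for of_id in wanted_ids if of_id in ESMA_SOURCES]
    missing = [of_id for of_id in wanted_ids if of_id not in ESMA_SOURCES]
    return covered, missing


# OFST999999 is a made-up ID used only to exercise the "missing" branch.
print(coverage(["OFST010030", "OFST020000", "OFST999999"]))
```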

### NOT Available in ESMA Public Data

| Category | openfunds Fields | Notes |
|----------|-----------------|-------|
| **Fees** | Management fee, TER, loads, subscription/redemption fees | Not in any ESMA register |
| **Performance** | Returns, Sharpe ratio, volatility | Not in any ESMA register |
| **Investment Objective** | Strategy text, objective text | Not in any ESMA register |
| **Risk Data** | SRRI, VaR, risk narrative | AIFMD reporting (confidential) |
| **Asset Class** | Detailed asset allocation | AIFMD reporting (confidential) |
| **Distribution Policy** | Distributing/accumulating | Not in any ESMA register |
| **Minimum Investment** | Min subscription amount | Not in any ESMA register |
| **Benchmark** | Benchmark index name | Not in any ESMA register |
| **Portfolio Holdings** | Position-level data | AIFMD reporting (confidential) |

---

## 11. Summary for Your LLM Use Case

**ESMA data is useful for fund identity and cross-referencing** (LEI, domicile, manager, legal framework, cross-border marketing status), but it does **not** provide the structured prospectus-derived data (fees, performance, objectives, risk) that the SEC's XBRL Risk/Return Summary provides.

**Practical implications:**
- For **US funds**: SEC EDGAR provides both the prospectus text AND structured ground-truth data, which makes it well suited to supervised LLM training
- For **EU funds**: ESMA provides identity/authorization data only. To get prospectus text plus structured reference data for EU funds, you would need to combine ESMA register data with prospectus documents sourced from national regulators or commercial providers

**Recommended approach for EU data:**
1. Use ESMA Fund Register + FIRDS for fund identity (LEI, ISIN, domicile, ManCo, CFI)
2. Source KIID/KID documents from national NCAs or fund company websites
3. Use openfunds-format data from commercial providers as ground truth
4. Or focus on the SEC dataset first (much richer, more accessible) and extend to EU later
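
Step 1 of this approach amounts to joining Fund Register records with FIRDS instruments on the LEI. The sketch below shows one way to do that with plain dictionaries. It assumes the FIRDS issuer LEI equals the fund LEI, which will not hold when the issuer is an umbrella or the ManCo. The FIRDS key names (`issuer_lei`, `isin`, `cfi`) and all values are placeholders, and `join_on_lei` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records; every value is a placeholder.
fund_register = [
    {"funds_lei": "LEI-A", "funds_national_name": "Example Fund A", "funds_domicile_cou_code_name": "LU"},
    {"funds_lei": "LEI-B", "funds_national_name": "Example Fund B", "funds_domicile_cou_code_name": "IE"},
]
firds = [
    {"issuer_lei": "LEI-A", "isin": "XX0000000001", "cfi": "CIOGES"},
    {"issuer_lei": "LEI-A", "isin": "XX0000000002", "cfi": "CIOGES"},
]


def join_on_lei(funds, instruments):
    """Attach the FIRDS instruments whose issuer LEI equals the fund LEI."""
    by_lei = defaultdict(list)
    for inst in instruments:
        by_lei[inst["issuer_lei"]].append(inst)
    # .get avoids creating empty entries for funds without instruments
    return [{**fund, "instruments": by_lei.get(fund["funds_lei"], [])} for fund in funds]


joined = join_on_lei(fund_register, firds)
for row in joined:
    print(row["funds_national_name"], [inst["isin"] for inst in row["instruments"]])
```

Funds with no matching instrument keep an empty `instruments` list, which makes gaps in FIRDS coverage easy to spot before sourcing KIID/KID documents.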