# Openfunds Fields: Public Structured Data Availability

## Executive Summary

This document maps each openfunds field category to publicly available **structured data sources**: data that is machine-readable, downloadable, and free (or freely accessible via API). The focus is on fields describing the fund itself (asset class, settlement, risk, currencies, hedging, ESG, fees, etc.) rather than EU-specific regulatory fields.

### Key Public Structured Data Sources

| Source | Format | Access | Coverage | Cost |
|--------|--------|--------|----------|------|
| **SEC Series/Class CSV** | CSV | Direct download | ~100K+ US share classes | Free |
| **SEC XBRL Risk/Return** | XBRL → flat files | Quarterly download | All US mutual fund prospectuses | Free |
| **SEC N-PORT Data Sets** | XML → flat TSV | Quarterly download | Monthly holdings for all US funds | Free |
| **SEC N-CEN Data Sets** | XML → flat TSV | Annual filing, quarterly sets | Service providers, classification | Free |
| **SEC Submissions API** | JSON | REST API | All SEC filers | Free |
| **SEC XBRL Company Facts** | JSON | REST API | XBRL-tagged financial data | Free |
| **GLEIF LEI Database** | JSON/CSV | API + bulk download | 3.19M+ global entities | Free (CC0) |
| **OpenFIGI** | JSON | REST API | Hundreds of millions of instruments | Free |

---

## 1. Key Fact: Company (OFST001000–004999) — 40 fields

These fields identify the management company, custodian, transfer agent, auditor, and other service providers.

### Fields with Structured Public Data

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST001000 | Fund Group Name | SEC Submissions API | `subs.name` (entity name) | **Yes** — JSON |
| OFST001020 | ManCo | SEC N-CEN | ADVISOR table (investment adviser) | **Yes** — TSV |
| OFST001030 | LEI Of ManCo | GLEIF LEI Database | LEI lookup by entity name | **Yes** — JSON/CSV |
| OFST001035 | Domicile Of ManCo | GLEIF LEI Database | `entity.legalAddress.country` | **Yes** — JSON |
| OFST001050 | Fund Guarantor | — | Not in public structured data | No |
| OFST001055 | Address of ManCo | GLEIF LEI Database | `entity.legalAddress` | **Yes** — JSON |
| OFST001060 | City of ManCo | GLEIF LEI Database | `entity.legalAddress.city` | **Yes** — JSON |
| OFST001065 | Fund Website of ManCo | SEC Submissions API | `subs.website` | **Yes** — JSON |
| OFST001100 | Fund Promoter Name | — | Not publicly structured | No |
| OFST001105 | LEI of Fund Promoter | GLEIF LEI Database | If name known → LEI lookup | **Partial** |
| OFST001300 | Fund Administrator Name | SEC N-CEN | SERVICE_PROVIDER table | **Yes** — TSV |
| OFST001400 | Custodian Bank Name | SEC N-CEN | CUSTODIAN table | **Yes** — TSV |
| OFST001410 | LEI Of Custodian Bank | SEC N-CEN + GLEIF | N-CEN has LEI fields (since 2025) | **Yes** — TSV |
| OFST001415 | Domicile Of Custodian Bank | GLEIF LEI Database | Via custodian LEI | **Yes** — JSON |
| OFST001430 | Trustee Name | SEC EDGAR HTML filings | Unstructured (prospectus text) | No |
| OFST001450 | Portfolio Managing Company Name | SEC N-CEN | ADVISOR table + sub-advisors | **Yes** — TSV |
| OFST001500 | Fund Advisor Name | SEC N-CEN | ADVISOR table | **Yes** — TSV |
| OFST001510 | Sub-Investment Advisor Name | SEC N-CEN | Sub-advisor entries | **Yes** — TSV |
| OFST001600 | Auditor Name | SEC N-CEN | AUDITOR table | **Yes** — TSV |
| OFST002000 | Market Maker Name | — | Not publicly structured for funds | No |
| OFST002700 | Transfer Agent Name | SEC N-CEN | TRANSFER_AGENT table | **Yes** — TSV |
| OFST002900 | GIIN of Fund | — | IRS FATCA list (not easily matched) | No |

**Summary**: ~15 of 40 company fields are available as structured public data, primarily from SEC N-CEN (service providers) and GLEIF (entity LEI/address data).

---

## 2. Key Fact: Umbrella (OFST005000–009999) — 10 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST005000 | Has Umbrella | SEC Series/Class CSV | Inferred: multiple Series under same CIK | **Derivable** |
| OFST005010 | Umbrella | SEC Series/Class CSV | `Entity Name` (trust name) | **Yes** — CSV |
| OFST005015 | Domicile Of Umbrella | SEC Submissions API | `subs.stateOfIncorporation` | **Yes** — JSON |
| OFST005025 | CBI Code of Umbrella | — | Ireland-specific, not in SEC | No |
| OFST005030 | CSSF Code of Umbrella | — | Luxembourg-specific, not in SEC | No |
| OFST005040 | GIIN of Umbrella | — | Not publicly structured | No |
| OFST010035 | LEI Of Umbrella | GLEIF LEI Database | LEI lookup by trust name | **Yes** — JSON |

**Summary**: 4 of 10 fields available. Umbrella concept maps to SEC "Trust/Registrant" level.

---

## 3. Key Fact: Fund (OFST010000–019999) — 73 fields

This is the richest category, covering fund identity, investment strategy, structure, currencies, hedging, and product type flags.

### 3A. Fund Identity & Dates

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST010010 | Fund Domicile Alpha-2 | SEC Submissions API | `subs.stateOfIncorporation` → derive | **Partial** (US state, not ISO) |
| OFST010020 | Legal Fund Name Including Umbrella | SEC Series/Class CSV | Concatenate Entity Name + Series Name | **Derivable** |
| OFST010030 | LEI Of Fund | GLEIF LEI Database | LEI search by fund name | **Yes** — JSON |
| OFST010110 | Legal Fund Name Only | SEC Series/Class CSV | `Series Name` | **Yes** — CSV |
| OFST010240 | Fund Launch Date | SEC XBRL Risk/Return | `InceptionDate` element | **Yes** — XBRL |
| OFST010250 | Fund Valuation Point | — | Prospectus text only | No |
| OFST010300 | Investment Objective | SEC XBRL Risk/Return | `ObjectivePrimaryTextBlock` | **Yes** — XBRL (text) |
| OFST010410 | Fund Currency | SEC N-PORT | `FUND_REPORTED_INFO.total_assets` currency context | **Partial** (all USD for US funds) |
| OFST010440 | Fiscal Year End | SEC Submissions API | `subs.fiscalYearEnd` (MMDD format) | **Yes** — JSON |
| OFST013000 | Prospectus Date | SEC Submissions API | Filing date of latest 485BPOS/N-1A | **Yes** — JSON |

### 3B. Fund Structure & Product Type Flags

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST010420 | Open-ended Or Closed-ended | SEC N-CEN | Fund type reported | **Yes** — TSV |
| OFST010500 | Is Fund Of Funds | SEC N-CEN | Fund-of-funds flag | **Yes** — TSV |
| OFST010580 | Is ETF | SEC N-CEN | ETF table presence | **Yes** — TSV |
| OFST010620 | Is Tokenized Fund | — | Not in SEC data | No |
| OFST010630 | Is Leveraged | SEC N-PORT | Borrowing data (Item B.2) | **Derivable** |
| OFST010635 | Maximum Leverage In Fund | — | Prospectus text only | No |
| OFST010640 | Has 130/30 Strategy | — | Prospectus text only | No |
| OFST010650 | Is REIT | SEC N-CEN + XBRL | Classification data | **Partial** |
| OFST010660 | Is ETC | — | US concept is different | No |
| OFST010665 | Is ETN | SEC N-CEN | Product type | **Partial** |
| OFST010670 | Is Short | — | Derivable from fund name/strategy | **Derivable** (heuristic) |
| OFST010690 | Is Life Fund | — | Not a US concept | No |
| OFST010695 | Is Pension Fund | — | Not in SEC fund data | No |
| OFST010720 | Is Passive Fund | SEC N-CEN | INDEX table (tracked index) | **Derivable** |
| OFST010730 | Management Approach Type | — | Prospectus text only | No |

### 3C. Currencies & Hedging

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST010205 | Has Duration Hedge | — | Prospectus text only | No |
| OFST010211 | Currency Hedge Portfolio | — | Prospectus text only | No |
| OFST010220 | Has Embedded Derivatives | SEC N-PORT | Derivatives tables (non-empty) | **Derivable** |
| OFST020261 | Currency Hedge Share Class | — | Prospectus text only | No |
| OFST020530 | Is Multicurrency Share Class | — | Prospectus text only | No |
| OFST020540 | Share Class Currency | SEC XBRL Risk/Return | Currency context in fee/performance tables | **Partial** (USD implied) |

**Currency/hedging fields are almost entirely prospectus-derived and NOT available as structured public data.** This is a key gap: US funds are almost all USD-denominated, and hedging is described in prospectus narrative text. For LLM training, these fields represent extraction targets.

### 3D. Replication & Securities Lending

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST010900 | Replication Methodology First Level | — | Prospectus text only (ETFs) | No |
| OFST010901 | Replication Methodology Second Level | — | Prospectus text only (ETFs) | No |
| OFST011000 | Has Securities Lending | SEC N-PORT | SECURITIES_LENDING + BORROWER tables | **Yes** — TSV |
| OFST011100 | Has Swap | SEC N-PORT | Swap derivative tables | **Derivable** |
| OFST011110 | Swap Counterparty Name | SEC N-PORT | Counterparty fields in swap tables | **Yes** — TSV |

**Summary for Fund section**: ~25 of 73 fields available as structured data. The major gaps are: currency hedging, replication methodology, valuation timing, management approach, and leverage limits, all of which are prospectus-narrative fields.

---

## 4. Key Fact: Share Class (OFST020000–049999) — 75 fields

### 4A. Identifiers

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST020000 | ISIN | OpenFIGI | FIGI → ISIN mapping | **Yes** — JSON |
| OFST020005 | CUSIP | SEC Series/Class CSV | Not directly, but derivable from ISIN | **Partial** |
| OFST020020 | Bloomberg Code | — | Proprietary (not free) | No |
| OFST020025 | FIGI Code | OpenFIGI | Direct lookup by ticker/ISIN | **Yes** — JSON |
| OFST020040 | SEDOL | — | Proprietary (London Stock Exchange) | No |
| OFST020045 | NFN Identifier | — | Nasdaq proprietary | No |
| OFST020050 | Share Class Extension | SEC Series/Class CSV | `Class Name` (parse letter/suffix) | **Derivable** |
| OFST020060 | Full Share Class Name | SEC Series/Class CSV | `Series Name` + `Class Name` | **Yes** — CSV |

### 4B. Share Class Characteristics

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST020300 | Valuation Frequency | — | Prospectus text only | No |
| OFST020400 | Share Class Distribution Policy | SEC XBRL Risk/Return | Derivable from dividend narrative | **Partial** |
| OFST020540 | Share Class Currency | — | Implied USD for US funds | **Partial** |
| OFST020545 | Share Class Lifecycle | SEC Submissions API | Filing history + Series/Class CSV status | **Derivable** |
| OFST020560 | Share Class Launch Date | SEC XBRL Risk/Return | `InceptionDate` per share class | **Yes** — XBRL |
| OFST020566 | Termination Date | SEC Series/Class CSV | Class status (active/inactive) | **Partial** |
| OFST020580 | Is Share Class Eligible For UCITS | — | Not applicable to US funds | No |
| OFST023100 | Investment Status | — | Prospectus text only | No |
| OFST023200 | Benchmark | SEC XBRL Risk/Return | `IndexNoDeductionForFeesExpensesTaxes` | **Yes** — XBRL |
| OFST023800 | Index Name (ETF) | SEC N-CEN | INDEX table | **Yes** — TSV |
| OFST024000 | SRRI | — | EU-specific risk indicator | No |

**Summary**: ~10 of 75 fields available.
Share class operational details (valuation frequency, dealing days, settlement cycles) are entirely prospectus-derived.

---

## 5. Key Fact: Listing (OFST060000–064999) — 14 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST060000 | Bloomberg Code Of Listing | — | Proprietary | No |
| OFST062000 | Listing Date | — | Exchange data (not SEC) | No |
| OFST062010 | Listing Currency | — | Implied USD for US-listed | **Partial** |
| OFST062025 | Launch Price | SEC XBRL Risk/Return | Inception price context | **Partial** |
| OFST062030 | Market Identifier Code | — | Not in SEC data directly | No |
| OFST062040 | Exchange Place | SEC N-CEN (ETFs) | Exchange information for ETFs | **Partial** |

**Summary**: 0-2 fields fully structured. Listing data is primarily from exchanges, not SEC filings.

---

## 6. Legal Structure (OFST160000–164999) — 7 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST160039 | Is EU Directive Relevant | — | EU-specific | No |
| OFST160040 | Type Of EU Directive | — | EU-specific (UCITS/AIF) | No |
| OFST160100 | Legal Form | SEC Series/Class CSV | `Entity Org Type` | **Yes** — CSV |
| OFST160150 | Home Country Legal Type Of Fund | SEC N-CEN | Fund type classification | **Yes** — TSV |

**Summary**: 2 of 7 fields available. Most are EU-specific.

---

## 7. Classification (OFST350000–399999) — 12 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST350009 | Is Sharia Compliant | — | Not in SEC data | No |
| OFST350015 | CFI Code | OpenFIGI | FIGI metadata includes CFI | **Partial** |
| OFST350050 | Clearstream Asset Category | — | Proprietary classification | No |
| OFST350100 | EFAMA Main EFC Category | — | EU classification system | No |
| OFST351295 | Is Money Market Fund | SEC N-CEN + N-MFP | Money market fund flag | **Yes** — TSV |
| OFST351300 | Money Market Type Of Fund | SEC N-MFP | Fund type in N-MFP data | **Yes** — TSV |

**Major gap**: There is **no free, structured, universal fund asset class classification** in SEC data. The SEC does not tag funds as "equity", "fixed income", "mixed", etc. in a single structured field. Asset class must be derived from:

- Fund name heuristics ("Growth Fund" → equity, "Bond Fund" → fixed income)
- N-PORT holdings data (aggregate asset types held)
- XBRL strategy narrative text

This is a critical finding for LLM training: **asset class classification is an extraction target, not ground truth.**

---

## 8. Purchase Information / Settlement (OFST400000–449999) — 95 fields

This is the **largest gap** between openfunds and public data. Settlement and dealing information is almost entirely found only in prospectus text.

| OF-ID | Field Name | Public Source | Structured? |
|-------|-----------|--------------|-------------|
| OFST400200 | Minimal Initial Subscription Category | — | No |
| OFST400230 | Minimal Initial Subscription In Amount | SEC XBRL Risk/Return | **Partial** — `MinimumInvestment` element exists but inconsistently tagged |
| OFST401002 | Pricing Methodology | — | No |
| OFST402500 | Maximal Number Of Possible Decimals Shares | — | No |
| OFST405521-405532 | Subscription Trade Cycle / Dealing Days | — | No |
| OFST410060 | Cut-off Date Offset for Subscription | — | No |
| OFST410100 | Cut-off Time For Subscription | — | No |
| OFST410700 | Settlement Period For Subscription | — | No |
| OFST410950 | Has Lock-up For Redemption | — | No |
| OFST420200-420265 | Redemption Minimums/Maximums | — | No |
| OFST420630 | Bank Details (SSI for Payments) | — | No |
| OFST425561-425572 | Redemption Trade Cycle / Dealing Days | — | No |
| OFST430100 | Cut-off Time For Redemption | — | No |
| OFST430150 | Settlement Period For Redemption | — | No |

**Summary**: **0-1 of 95 fields** available as structured data. Settlement cycles, cut-off times, dealing days, and payment details are found exclusively in prospectus text, and minimum investments almost exclusively (the XBRL `MinimumInvestment` tag is inconsistently used). This is arguably the highest-value category for LLM extraction: these fields are critical for fund operations but exist only in legal documents.

---

## 9. Fees, Costs and Expenses (OFST450100–499999) — 62 fields

This is the **strongest area for SEC structured data**, thanks to the XBRL Risk/Return fee tables.

| OF-ID | Field Name | Public Source | Source Field | Structured? |
|-------|-----------|--------------|-------------|-------------|
| OFST451027 | Has Performance Fee | SEC XBRL Risk/Return | Fee narrative/table | **Partial** |
| OFST451030 | Performance Fee in Prospectus | SEC XBRL Risk/Return | Not separately tagged | **Partial** |
| OFST451305 | Applied Subscription Fee | SEC XBRL Risk/Return | `MaximumSalesChargeImposedOnPurchasesOverOfferingPrice` | **Yes** — XBRL |
| OFST451320 | Maximum Subscription Fee | SEC XBRL Risk/Return | `MaximumSalesChargeImposedOnPurchasesOverOfferingPrice` | **Yes** — XBRL |
| OFST451385 | Has Early Redemption Fee | SEC XBRL Risk/Return | `RedemptionFeeOverRedemption` | **Yes** — XBRL |
| OFST451390 | Has CDSC Fee | SEC XBRL Risk/Return | `MaximumDeferredSalesChargeOverOther` | **Yes** — XBRL |
| OFST451405 | Redemption Fee | SEC XBRL Risk/Return | `RedemptionFeeOverRedemption` | **Yes** — XBRL |
| OFST452000 | Management Fee Applied | SEC XBRL Risk/Return | `ManagementFeesOverAssets` | **Yes** — XBRL |
| OFST452100 | TER Excluding Performance Fee | SEC XBRL Risk/Return | `NetExpensesOverAssets` (OER equivalent) | **Yes** — XBRL |
| OFST452200 | Ongoing Charges | SEC XBRL Risk/Return | `TotalAnnualFundOperatingExpensesOverAssets` | **Yes** — XBRL |
| OFST453151 | Is Trailer Fee Clean | — | Not in SEC data | No |
| OFST454150 | Has Separate Distribution Fee | SEC XBRL Risk/Return | `Distribution12b1FeesOverAssets` | **Yes** — XBRL |
| OFST454160 | Distribution Fee | SEC XBRL Risk/Return | `Distribution12b1FeesOverAssets` | **Yes** — XBRL |
| — | Fee Waiver / Reimbursement | SEC XBRL Risk/Return | `FeeWaiverOrReimbursementOverAssets` | **Yes** — XBRL |
| — | Other Expenses | SEC XBRL Risk/Return | `OtherExpensesOverAssets` | **Yes** — XBRL |
| — | Expense Example (1yr/3yr/5yr/10yr) | SEC XBRL Risk/Return | `ExpenseExampleYear01` through `Year10` | **Yes** — XBRL |

### Key XBRL Fee Elements (complete shareholder fee table)

```
MaximumSalesChargeImposedOnPurchasesOverOfferingPrice
MaximumDeferredSalesChargeOverOther
MaximumSalesChargeOnReinvestedDividendsAndDistributionsOverOther
RedemptionFeeOverRedemption
MaximumAccountFee
ManagementFeesOverAssets
Distribution12b1FeesOverAssets
OtherExpensesOverAssets
AcquiredFundFeesAndExpensesOverAssets
TotalAnnualFundOperatingExpensesOverAssets
FeeWaiverOrReimbursementOverAssets
TotalAnnualFundOperatingExpensesAfterFeeWaiverOverAssets
ExpenseExampleYear01 / Year03 / Year05 / Year10
ExpenseExampleNoRedemptionYear01 / Year03 / Year05 / Year10
```

**Summary**: ~15 of 62 fee fields are available as structured XBRL data. The SEC fee taxonomy is detailed for US-style fees (sales charges, 12b-1, management fee, expense ratio) but does not cover European concepts like custodian fee breakdown, trailer fee clean status, or performance fee details (hurdle rate, high water mark).

---

## 10. Solvency II (OFST500000–519999) — 13 fields

| OF-ID | Field Name | Public Source | Structured? |
|-------|-----------|--------------|-------------|
| All 13 fields | SCR Market Risk, Tripartite Reports | — | **No** — entirely EU insurance regulation |

**Summary**: 0 of 13 fields available. Solvency II is a European directive not applicable to US SEC data.

---

## 11. Taxes (OFST800000–819999) — 27 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFST809200 | Is US Tax Forms W8/W9 Needed | — | Prospectus text only | No |
| OFST809210 | Is US K1 Reporting Required | SEC N-CEN | Partnership/LP fund flags | **Partial** |
| OFST809250 | Is Flow-Through Entity By US Tax Law | — | Prospectus text only | No |
| OFST809511 | FATCA Status | — | IRS data, not SEC structured | No |
| OFST809520 | Subject To FATCA Withholding | — | Prospectus text only | No |
| OFST801011 | Is Austrian Tax Reporting Fund | — | Austria-specific | No |
| OFST802001–802045 | German Tax fields (8 fields) | — | Germany-specific | No |
| OFST802500 | Luxembourg Taxe d'Abonnement | — | Luxembourg-specific | No |
| OFST808008–808100 | Swiss Tax fields (3 fields) | — | Switzerland-specific | No |
| OFST809015 | Has UK Reporting Status | — | UK-specific | No |

**Summary**: 0-1 of 27 fields available. Tax fields are overwhelmingly jurisdiction-specific (DE, AT, CH, LU, UK, FR, ES) and not in SEC data. The few US-relevant fields (FATCA, K-1) are in prospectus text.

---

## 12. ESG Data (OFST820000–849999) — 65 fields

| OF-ID | Field Name | Public Source | Structured? |
|-------|-----------|--------------|-------------|
| OFST820110-820280 | Carbon Intensity / Footprint / Absolute GHG (18 fields) | — | **No** — not yet required in SEC filings for funds |
| OFST820290-820360 | Fossil Fuel Exposure (8 fields) | — | **No** |
| OFST820370-820380 | Net Zero Commitments (2 fields) | — | **No** |
| OFST820390 | Implied Temperature Rise | — | **No** |
| OFST820440-820460 | GHG Reduction Goals (3 fields) | — | **No** |
| OFST820470-820540 | Climate Stewardship (8 fields) | — | **No** |
| OFST820600-820675 | AMAS / ACT Signatory fields (8 fields) | — | **No** — Swiss specific |
| OFST830000-830210 | UK SDR fields (12 fields) | — | **No** — UK specific |
| OFST001025 | Is UN PRI Signatory | UN PRI website | **Partial** — searchable but not API |

**Current state of SEC ESG data**: The SEC adopted climate disclosure rules in March 2024 (effective May 2024), but these apply to operating companies, **not investment funds**. The Investment Company Names Rule (addressing ESG fund naming) has compliance dates of June 2026 / December 2026. As of February 2026, there is no SEC-mandated structured ESG data for funds comparable to EU SFDR.

**Summary**: **0 of 65 ESG fields** are available as structured public data from SEC. ESG fund data is available from commercial providers (Morningstar, MSCI, Sustainalytics) but not from any free structured public source.

---

## 13. Dynamic Data: Prices & AuM (OFDY000001–000999) — 20 fields

| OF-ID | Field Name | Public Source | Source Field | Structured? |
|-------|-----------|--------------|-------------|-------------|
| OFDY000010 | Price Currency | — | Implied USD for US funds | **Partial** |
| OFDY000035 | Valuation NAV | SEC N-PORT | Not directly; XBRL Company Facts for some | **Partial** |
| OFDY000060 | AuM Fund | SEC N-PORT | `FUND_REPORTED_INFO.total_assets` | **Yes** — TSV |
| OFDY000070 | AuM Share Class | SEC N-PORT | Per-class AuM when reported | **Partial** |
| OFDY000075 | NoS Share Class | — | Not in SEC structured data | No |

**Summary**: 1-2 of 20 fields available. Daily NAV prices are not in SEC structured data (available from commercial sources). Fund-level AuM is in N-PORT.

---

## 14. Dynamic Data: Performance & Risk (OFDY025000–049999) — 4 fields

These 4 fields are Germany-specific (equity participation ratio, total fund asset share, etc.) and not available from SEC.

**Additional performance data in SEC**: While openfunds has few performance OFDY fields, SEC XBRL Risk/Return provides:

| SEC Element | Description | Structured? |
|-------------|-------------|-------------|
| `AnnualReturn20XX` | Calendar year annual returns (1yr–10yr) | **Yes** — XBRL |
| `HighestQuarterlyReturnLabel/Value` | Best quarter return | **Yes** — XBRL |
| `LowestQuarterlyReturnLabel/Value` | Worst quarter return | **Yes** — XBRL |
| `AverageAnnualReturnYear01/05/10/SinceInception` | Average annual returns | **Yes** — XBRL |
| `BarChartClosingTextBlock` | Performance chart narrative | **Yes** — XBRL (text) |

And N-PORT provides:

| N-PORT Field | Description | Structured? |
|-------------|-------------|-------------|
| `MONTHLY_TOTAL_RETURN` | Monthly returns by class | **Yes** — TSV |
| `MONTHLY_RETURN_CAT_INSTRUMENT` | Returns by asset category | **Yes** — TSV |
| `FUND_VAR_INFO` | Value-at-Risk | **Yes** — TSV |
| `INTEREST_RATE_RISK` | DV01/DV100 by maturity bucket | **Yes** — TSV |

---

## 15. Portfolio Holdings (OFPH000001–999999) — 92 fields

N-PORT is the primary source. SEC requires monthly portfolio disclosure.

| OF-ID | Field Name | N-PORT Source | Structured? |
|-------|-----------|--------------|-------------|
| OFPH000010 | Holding as at Date | Reporting date | **Yes** |
| OFPH000020 | Portfolio Currency | Fund currency context | **Yes** (USD) |
| OFPH000100 | Holding ISIN | IDENTIFIERS table | **Yes** |
| OFPH000130 | Holding Ticker | IDENTIFIERS table | **Yes** |
| OFPH000145 | Holding CUSIP | IDENTIFIERS table | **Yes** |
| OFPH000170 | Holding FIGI | — | No (use OpenFIGI to map) |
| OFPH000200 | Holding Name | `FUND_REPORTED_HOLDING.name` | **Yes** |
| OFPH000210 | Holding Instrument Type | `FUND_REPORTED_HOLDING.asset_cat` | **Yes** |
| OFPH000250 | Holding Market Value | `FUND_REPORTED_HOLDING.balance` + `val_usd` | **Yes** |
| OFPH000300 | Holding Net Weight as % | `FUND_REPORTED_HOLDING.pctVal` | **Yes** |
| OFPH000400 | Holding Currency | `FUND_REPORTED_HOLDING.curCd` | **Yes** |
| OFPH000420 | Holding Risk Country | `FUND_REPORTED_HOLDING.invCountry` | **Yes** |
| OFPH000430 | Holding Asset Class | `FUND_REPORTED_HOLDING.asset_cat` | **Yes** |
| OFPH000440 | Holding Credit Rating | DEBT_SECURITY fields | **Yes** |
| OFPH000450 | Holding Number of Shares | `FUND_REPORTED_HOLDING.balance` | **Yes** |
| OFPH000460 | Holding Coupon Rate | DEBT_SECURITY fields | **Yes** |
| OFPH000465 | Holding Modified Duration | — | No |
| OFPH000480 | Holding Maturity Date | DEBT_SECURITY fields | **Yes** |
| OFPH000600-650 | Interest Rate Type / Index / Margin | DEBT_SECURITY fields | **Yes** |
| OFPH000700 | Holding Issuer Name | `FUND_REPORTED_HOLDING.issuerConditionalName` | **Yes** |
| OFPH000710 | Holding Issuer LEI | `FUND_REPORTED_HOLDING.lei` | **Yes** |
| OFPH000712 | Holding Issuer Domicile | `FUND_REPORTED_HOLDING.invCountry` | **Yes** |
| OFPH000730 | Holding Strike Price | Derivative tables | **Yes** |
| OFPH000800-870 | Underlying Asset fields | Derivative tables | **Yes** |

**Summary**: ~35-40 of 92 fields available from N-PORT.
The main gaps are: modified/effective duration (calculated, not reported), GICS sector codes (not in N-PORT directly), and European-specific fields (CIC, NACE, EUSIPA, WKN, Valor).

---

## 16. Fund Ratios and Exposures (OFRE000001–999999) — 42 fields

| OF-ID | Field Name | Public Source | Method / Note | Structured? |
|-------|-----------|--------------|---------------|-------------|
| OFRE000010 | Number Of Positions | SEC N-PORT | Count holdings | **Derivable** |
| OFRE000200 | Exposure To Cash | SEC N-PORT | Sum cash-type holdings | **Derivable** |
| OFRE000300-320 | Credit Quality fields | SEC N-PORT | Aggregate from holdings | **Derivable** |
| OFRE000330 | Average Effective Maturity | — | Not directly in N-PORT | No |
| OFRE000335 | Average Effective Duration | — | Not directly in N-PORT | No |
| OFRE000350 | Yield To Maturity | — | Not in N-PORT | No |
| OFRE000500 | Top Ten Positions | SEC N-PORT | Sort by weight | **Derivable** |
| OFRE000520 | Country Breakdown | SEC N-PORT | Aggregate by country | **Derivable** |
| OFRE000540 | Currency Breakdown | SEC N-PORT | Aggregate by currency | **Derivable** |
| OFRE000560 | GICS Equity Sector Breakdown | — | GICS not in N-PORT | No |
| OFRE000570 | Market Cap Breakdown | — | Not in N-PORT | No |
| OFRE000580 | Credit Rating Breakdown | SEC N-PORT | Aggregate by rating | **Derivable** |
| OFRE000590 | Maturity Breakdown | SEC N-PORT | Aggregate by maturity | **Derivable** |
| OFRE000600 | Asset Class Breakdown | SEC N-PORT | Aggregate by asset_cat | **Derivable** |

**Summary**: ~10-15 of 42 fields are derivable from N-PORT holdings data. Pre-computed ratios (YTM, duration, OAS) are not available.

---

## 17. Portfolio Manager Data (OFPM000001–999999) — 8 fields

| OF-ID | Field Name | Public Source | Source Field / Method | Structured? |
|-------|-----------|--------------|----------------------|-------------|
| OFPM000010 | Portfolio Manager Name | SEC XBRL Risk/Return | `PortfolioManager` text block | **Partial** (text, not structured) |
| OFPM000060 | Portfolio Manager Brief Biography | SEC XBRL Risk/Return | SAI supplement text | **Partial** (text) |
| Others | Year of birth, experience, role | — | Not structured | No |

**Summary**: 0-1 of 8 fields. Portfolio manager data is in prospectus SAI text, not structured.

---

## Grand Summary: Structured Data Availability by Category

| Category | Total Fields | Structured Public | Derivable | Not Available |
|----------|-------------|------------------|-----------|---------------|
| **Company** (service providers) | 40 | **15** | 0 | 25 |
| **Umbrella** | 10 | **4** | 1 | 5 |
| **Fund** (identity, structure) | 73 | **15** | 10 | 48 |
| **Share Class** | 75 | **8** | 2 | 65 |
| **Listing** | 14 | **0** | 2 | 12 |
| **Legal Structure** | 7 | **2** | 0 | 5 |
| **Classification** | 12 | **2** | 0 | 10 |
| **Purchase / Settlement** | 95 | **0** | 1 | 94 |
| **Fees** | 62 | **15** | 0 | 47 |
| **Solvency II** | 13 | **0** | 0 | 13 |
| **Taxes** | 27 | **0** | 1 | 26 |
| **ESG** | 65 | **0** | 0 | 65 |
| **Prices / AuM** | 20 | **2** | 0 | 18 |
| **Performance / Risk** | 4 | **0** | 0 | 4 |
| **Portfolio Holdings** | 92 | **38** | 0 | 54 |
| **Ratios / Exposures** | 42 | **0** | 14 | 28 |
| **Portfolio Manager** | 8 | **0** | 1 | 7 |
| **TOTAL** | **659** | **~101 (15%)** | **~32 (5%)** | **~526 (80%)** |

---

## Implications for LLM Training Dataset

### What this means:

1. **~15% of openfunds fields** have directly available structured public data (primarily from SEC EDGAR: XBRL fees, N-PORT holdings, N-CEN service providers, Series/Class CSV identifiers).
2. **~5% are derivable** from structured data (e.g., aggregating N-PORT holdings into country/currency/rating breakdowns, counting positions, inferring ETF status from N-CEN index tracking).
3. **~80% are NOT available** as structured public data; where they are disclosed at all, they exist only in prospectus narrative text.

### The 80% gap = the LLM opportunity

The fields that are **not** available as structured data but **are** specified in prospectus text represent the core value proposition for LLM extraction:

| Category | Key Extraction Targets |
|----------|----------------------|
| **Settlement / Dealing** | Cut-off times, settlement periods, dealing days, minimum subscriptions, pricing methodology |
| **Currencies / Hedging** | Share class currency hedging, portfolio hedging, multicurrency options |
| **Risk Limits** | Maximum leverage, redemption gates, lock-up periods, side pockets |
| **Asset Class** | Fund classification (equity/bond/mixed/alternative), investment strategy |
| **Fee Details** | Performance fee mechanics (hurdle rate, high water mark, crystallization), custodian fees |
| **ESG** | Sustainability approach, climate targets, exclusion criteria |
| **Tax** | FATCA status, K-1 requirements, flow-through entity status |

### Recommended approach for training data:

- **Ground truth (structured data)**: Use SEC XBRL fees, N-PORT holdings, N-CEN service providers, and Series/Class CSV as verifiable reference data.
- **Extraction targets (unstructured → structured)**: Use the 80% of openfunds fields that exist only in prospectus text as the fields the LLM should learn to extract.
- **Validation**: For the ~15% structured fields, compare LLM extraction from prospectus text against SEC structured data to measure extraction accuracy.
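
The validation step can be sketched as a field-by-field comparison between reference values and extracted values. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `values_match` and `extraction_accuracy`, the numeric tolerance, and all sample values are hypothetical, and a real pipeline would likely need field-specific normalization (dates, percentages, entity names).

```python
import math
from typing import Any, Dict


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def values_match(expected: Any, extracted: Any, rel_tol: float = 1e-6) -> bool:
    """Compare one reference value with one extracted value."""
    if expected is None or extracted is None:
        return False
    if _is_number(expected) and _is_number(extracted):
        return math.isclose(expected, extracted, rel_tol=rel_tol)
    # Fall back to a case- and whitespace-insensitive string comparison
    return str(expected).strip().casefold() == str(extracted).strip().casefold()


def extraction_accuracy(reference: Dict[str, Any], extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Score extracted fields against reference fields, keyed by OF-ID."""
    per_field = {
        of_id: values_match(ref_value, extracted.get(of_id))
        for of_id, ref_value in reference.items()
    }
    accuracy = sum(per_field.values()) / len(per_field) if per_field else 0.0
    return {"accuracy": accuracy, "per_field": per_field}


# Hypothetical sample values; only the OF-IDs come from the tables above.
reference = {
    "OFST452000": 0.0050,               # Management Fee Applied
    "OFST454160": 0.0025,               # Distribution Fee
    "OFST010110": "Example Growth Fund",  # Legal Fund Name Only
}
extracted = {
    "OFST452000": 0.005,
    "OFST454160": 0.0030,               # deliberately wrong
    "OFST010110": "example growth fund ",
}

result = extraction_accuracy(reference, extracted)
print(result["per_field"])
print(round(result["accuracy"], 3))
```

Scoring per OF-ID rather than only in aggregate makes it easier to see which field categories the extractor handles poorly.
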

---

## Appendix: Data Source URLs

| Source | URL |
|--------|-----|
| SEC Series/Class CSV | https://www.sec.gov/data-research/sec-markets-data/investment-company-series-class-information |
| SEC XBRL Risk/Return Data Sets | https://www.sec.gov/data-research/sec-markets-data/mutual-fund-prospectus-riskreturn-summary-data-sets |
| SEC N-PORT Data Sets | https://www.sec.gov/data-research/sec-markets-data/form-n-port-data-sets |
| SEC N-CEN Data Sets | https://www.sec.gov/data-research/sec-markets-data/form-n-cen-data-sets |
| SEC Submissions API | https://data.sec.gov/submissions/CIK{cik}.json |
| SEC XBRL Company Facts API | https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json |
| GLEIF LEI Database | https://search.gleif.org/ / https://www.gleif.org/en/lei-data/gleif-api |
| OpenFIGI API | https://www.openfigi.com/api |
| SEC N-MFP (Money Market) | https://www.sec.gov/data-research/sec-markets-data |
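
The two `data.sec.gov` URL templates in the table take a CIK in the `{cik}` slot. The sketch below builds requests for them, assuming the CIK must be zero-padded to 10 digits and that a descriptive `User-Agent` header should accompany automated requests. The helper names, the sample CIK, and the contact string are illustrative, and no request is actually sent.

```python
import urllib.request

# URL templates copied from the appendix table
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik}.json"
COMPANY_FACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"


def pad_cik(cik) -> str:
    """Return the CIK as a 10-digit, zero-padded string (assumed URL format)."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def build_request(template: str, cik, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one of the templates."""
    url = template.format(cik=pad_cik(cik))
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical CIK and contact string, for illustration only
req = build_request(SUBMISSIONS_URL, 1234567, "Example Research admin@example.com")
print(req.full_url)

# To fetch and read the fields referenced in the tables (name, website,
# stateOfIncorporation, fiscalYearEnd), one could then do:
#   import json
#   with urllib.request.urlopen(req) as resp:
#       subs = json.load(resp)
```

Keeping request construction separate from the network call lets the URL logic be checked offline before any traffic is sent to the SEC servers.
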