From 63e650fa148d6471d6e85bc6e70458cbdf3bc6c5 Mon Sep 17 00:00:00 2001
From: Florian Herzog
Date: Wed, 3 Jun 2026 11:21:23 +0200
Subject: [PATCH] Update dataset description with full 2025Q3 build statistics

Full build: 2,326 prospectus filings across 393 trusts -> 852 samples
(659 segmented per-fund + 193 fallback), trust-level split 655/122/75,
no-model baseline F1=0.79.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 dataset_description.pdf | Bin 343381 -> 343980 bytes
 dataset_description.tex |  75 +++++++++++++++++++++-------------------
 2 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/dataset_description.pdf b/dataset_description.pdf
index 0a30536a5d3ac705ae36f0acae62fe9e79aede5e..f6dd5e0b0270aa581604bca34d8feff12414c7ca 100644
Binary files a/dataset_description.pdf and b/dataset_description.pdf differ
diff --git a/dataset_description.tex b/dataset_description.tex
index df38a9a..10f764c 100644
--- a/dataset_description.tex
+++ b/dataset_description.tex
@@ -453,18 +453,18 @@ reports this on the proof-of-concept slice
 after primary-custodian scoping, multi-book fetching and per-fund segmentation.
 Because the baseline requires an \emph{exact substring} match within the fund's
 \emph{own} section, its recall is a strict lower bound: a fund's adviser, for instance, must be named in that fund's
-segment under a literal spelling. With the right prospectus books present, the
-adviser is recovered with recall $1.00$ and the micro-averaged $F_1$ reaches
-$0.65$; the recovered custodian recall is $0.63$ (up from $0.07$ unscoped and
-$0.37$ after scoping alone). The residual gap from $1.0$ is attributable to
-surface-form variation (``State Street Bank and Trust Company'' vs.\ ``State
-Street'') that a trained model handles but exact matching does not.
+segment under a literal spelling. On the full quarter the adviser is recovered
+with recall $0.93$ and the micro-averaged $F_1$ reaches $0.79$; the recovered
+custodian recall is $0.63$ (up from $0.07$ unscoped and $0.37$ after scoping
+alone). The residual gap from $1.0$ is attributable to surface-form variation
+(``State Street Bank and Trust Company'' vs.\ ``State Street'') that a trained
+model handles but exact matching does not.
 
 \begin{table}[h]
 \centering
-\caption{No-model string-match baseline on the proof-of-concept slice, after
+\caption{No-model string-match baseline on the full 2025\,Q3 build, after
 primary-custodian scoping, multi-book fetching and per-fund segmentation
-($141$ samples). Precision is $1.00$ by construction; recall is a strict
+($852$ samples). Precision is $1.00$ by construction; recall is a strict
 exact-match lower bound.}
 \label{tab:baseline}
 \small
@@ -472,15 +472,15 @@ exact-match lower bound.}
 \toprule
 Relation & Recall & Gold edges \\
 \midrule
-\code{advisedBy} & 1.00 & 513 \\
-\code{subAdvisedBy} & 0.94 & 232 \\
-\code{seriesOf} & 0.86 & 496 \\
-\code{administrator} & 0.84 & 730 \\
-\code{transferAgent} & 0.79 & 537 \\
-\code{custodian} & 0.63 & 601 \\
-\code{underwrittenBy} & 0.40 & 144 \\
+\code{advisedBy} & 0.93 & 1{,}673 \\
+\code{seriesOf} & 0.84 & 1{,}555 \\
+\code{subAdvisedBy} & 0.84 & 946 \\
+\code{administrator} & 0.80 & 2{,}066 \\
+\code{transferAgent} & 0.72 & 1{,}721 \\
+\code{custodian} & 0.63 & 1{,}761 \\
+\code{underwrittenBy} & 0.62 & 863 \\
 \midrule
-micro-average & 0.48 & 1{,}194 \\
+micro-average & 0.65 & 6{,}479 \\
 \bottomrule
 \end{tabular}
 \end{table}
@@ -497,37 +497,42 @@ quantified rather than assumed.
 \section{Corpus statistics}
 % ====================================================================
 
-Table~\ref{tab:stats} summarises one quarter (2025\,Q3) of N-CEN gold (after
-primary-custodian scoping) and the proof-of-concept \emph{per-fund} samples. The
-gold graph holds $15{,}739$ entity-to-entity edges across $435$ trusts and
-$2{,}421$ funds. Fetching all prospectus books and applying the robust segmenter
-yields $141$ samples whose per-fund median input is $\sim\!3.7\times10^{4}$
-characters against a $\sim\!6.5\times10^{2}$-character target: a median ratio near
-$55\!:\!1$, the inverse of the symmetric-size benchmarks.
+Table~\ref{tab:stats} summarises one full quarter (2025\,Q3) of the dataset. The
+N-CEN gold graph (after primary-custodian scoping) holds $15{,}739$
+entity-to-entity edges across $435$ trusts and $2{,}421$ funds. Fetching all full
+prospectus books for every trust ($2{,}326$ filings across $393$ trusts; $42$
+closed-end or interval funds file no standard prospectus) and applying the robust
+per-fund segmenter yields $852$ samples ($659$ cleanly segmented per-fund plus
+$193$ whole-trust fallbacks). The segmented samples have a per-fund median ratio
+near $117\!:\!1$ (input prose to target serialization), and across all samples the
+median exceeds $400\!:\!1$ --- the inverse of the symmetric-size benchmarks.
 
 \begin{table}[h]
 \centering
-\caption{Corpus statistics. Left: N-CEN gold graph for 2025\,Q3 (primary-custodian
-scope). Right: proof-of-concept samples (multi-book fetch, per-fund segmentation).}
+\caption{Corpus statistics for the full 2025\,Q3 build. Left: N-CEN gold graph
+(primary-custodian scope). Right: text-to-triple samples (all prospectus books per
+trust, per-fund segmentation).}
 \label{tab:stats}
 \small
 \begin{tabular}{@{}lr@{\hskip 3em}lr@{}}
 \toprule
-\multicolumn{2}{c}{Gold graph (2025\,Q3)} & \multicolumn{2}{c}{PoC samples} \\
+\multicolumn{2}{c}{Gold graph (2025\,Q3)} & \multicolumn{2}{c}{Samples (2025\,Q3)} \\
 \cmidrule(r){1-2}\cmidrule(l){3-4}
-Trust graphs & 435 & Samples (total) & 141 \\
-Funds (series) & 2{,}421 & \;segmented per-fund & 135 \\
-Entity-entity edges & 15{,}739 & \;whole-trust fallback & 6 \\
-\;custodian (primary) & 3{,}045 & Per-fund input (median) & 36{,}856 \\
-\;advisedBy & 2{,}588 & Per-fund target (median) & 654 \\
-Distributors & 458 & Ratio (median) & $54.8\!:\!1$ \\
+Trust graphs & 435 & Samples (total) & 852 \\
+Funds (series) & 2{,}421 & \;segmented per-fund & 659 \\
+Entity-entity edges & 15{,}739 & \;whole-trust fallback & 193 \\
+\;custodian (primary) & 3{,}045 & Trusts fetched & 393 \\
+\;advisedBy & 2{,}588 & Prospectus filings & 2{,}326 \\
+Distributors & 458 & Ratio (median, per-fund) & $117\!:\!1$ \\
 \bottomrule
 \end{tabular}
 \end{table}
 
-The full quarter therefore yields on the order of $2{,}400$ fund-level graphs
-from $435$ prospectus filings; multiple quarters and the dropping of ontology
-subsets (per the thesis's augmentation strategy) expand this further.
+\paragraph{Train/validation/test split.} Partitioned at the trust level by a
+deterministic hash of the CIK: $655$ train, $122$ validation, $75$ test samples
+(from $268$, $37$ and $36$ trusts respectively), with \emph{no} trust appearing in
+more than one split. Multiple quarters and the dropping of ontology subsets (per
+the thesis's augmentation strategy) expand the corpus further.
 
 % ====================================================================
 \section{Use in the thesis experiments}
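
Review note (not part of the commit): the headline figures in this patch can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard definition of $F_1$ as the harmonic mean of precision and recall, which reduces to $2R/(1+R)$ when precision is $1.00$, and it uses only numbers quoted in the commit message and in the updated text. The names `f1_score` and `checks` are illustrative and do not come from the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the commit message and the updated .tex source.
total_samples = 852
segmented, fallback = 659, 193
train, validation, test = 655, 122, 75
gold_trusts, fetched_trusts, no_prospectus_trusts = 435, 393, 42

checks = {
    # Sample counts must partition the total.
    "segmented + fallback == total": segmented + fallback == total_samples,
    "train + validation + test == total": train + validation + test == total_samples,
    # Trusts with fetched prospectuses plus those without one give the gold count.
    "fetched + no-prospectus == gold trusts": fetched_trusts + no_prospectus_trusts == gold_trusts,
    # Precision is 1.00 by construction, so F1 follows from micro-averaged recall.
    "new build: R=0.65 gives F1 of about 0.79": round(f1_score(1.00, 0.65), 2) == 0.79,
    "old slice: R=0.48 gives F1 of about 0.65": round(f1_score(1.00, 0.48), 2) == 0.65,
}

for name, passed in checks.items():
    print(f"{'ok' if passed else 'FAIL':4} {name}")
```

Because the published recalls are rounded to two decimals, the $F_1$ comparison is only meaningful to two decimals as well.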