Compare commits

...

2 Commits

Author SHA1 Message Date
Giò e05a88c578 Restructuring 2024-07-06 19:50:03 +02:00
Giò 55cae6d83a Repository restructuring 2024-07-06 19:49:36 +02:00
104 changed files with 102 additions and 478 deletions

134
.gitignore vendored

@ -1,115 +1,19 @@
# ---> SublimeText
# Cache files for Sublime Text
*.tmlanguage.cache
*.tmPreferences.cache
*.stTheme.cache
# Workspace files are user-specific
*.sublime-workspace
*.sqlite
# Project files should be checked into the repository, unless a significant
# proportion of contributors will probably not be using Sublime Text
# *.sublime-project
# SFTP configuration file
sftp-config.json
sftp-config-alt*.json
# Package control specific files
Package Control.last-run
Package Control.ca-list
Package Control.ca-bundle
Package Control.system-ca-bundle
Package Control.cache/
Package Control.ca-certs/
Package Control.merged-ca-bundle
Package Control.user-ca-bundle
oscrypto-ca-bundle.crt
bh_unicode_properties.cache
# Sublime-github package stores a github token in this file
# https://packagecontrol.io/packages/sublime-github
GitHub.sublime-settings
# ---> JetBrains
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf
# AWS User-specific
.idea/**/aws.xml
# Generated files
.idea/**/contentModel.xml
# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml
# Gradle
.idea/**/gradle.xml
.idea/**/libraries
# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr
scraper-python/
# CMake
cmake-build-*/
# Mongo Explorer plugin
.idea/**/mongoSettings.xml
# File-based project format
*.iws
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# SonarLint plugin
.idea/sonarlint/
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
# Editor-based Rest Client
.idea/httpRequests
# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
/.phpunit.cache
/node_modules
/public/build
/public/hot
/public/storage
/storage/*.key
/vendor
.env
.env.backup
.env.production
.phpunit.result.cache
Homestead.json
Homestead.yaml
auth.json
npm-debug.log
yarn-error.log
/.fleet
/.idea
/.vscode


@ -1,2 +1,84 @@
# ConsultancyProject1_Auslastungsmodellierung
# Web Scraper e-domizil.ch
This repository contains a web scraper for the platform e-domizil.ch, built on [Laravel (version 10.x)](https://laravel.com).
## Installation
Before installing, make sure the Laravel [server requirements](https://laravel.com/docs/10.x/deployment#server-requirements) are met.
1. Clone the repository:
```bash
git clone https://gitea.fhgr.ch/dianigionath/ConsultancyProject1_Auslastungsmodellierung.git
```
2. Install the application's dependencies with Composer:
```bash
composer install
```
3. Copy .env.example to .env and adjust the database connection settings in the copy:
```bash
cp .env.example .env
```
Example of a connection to an SQLite database:
```ini
DB_CONNECTION=sqlite
DB_DATABASE=/absolute/path/to/database.sqlite
```
4. Initialize the database with the Artisan console:
```bash
php artisan migrate
```
Expected output:
```text
WARN The SQLite database does not exist: /home/gio/database_test.sqlite.
┌ Would you like to create it? ────────────────────────────────┐
│ Yes │
└──────────────────────────────────────────────────────────────┘
INFO Preparing database.
Creating migration table ...................................................................................... 31ms DONE
INFO Running migrations.
0001_01_01_000000_create_users_table .......................................................................... 57ms DONE
0001_01_01_000001_create_cache_table .......................................................................... 18ms DONE
2019_12_14_000001_create_personal_access_tokens_table ......................................................... 36ms DONE
2024_03_15_142227_create_regions_table ........................................................................ 10ms DONE
2024_03_15_142228_create_seeds_table .......................................................................... 18ms DONE
2024_03_15_142257_create_properties_table ..................................................................... 17ms DONE
2024_03_15_142550_create_extractions_table .................................................................... 10ms DONE
2024_03_15_142625_create_exceptions_table ..................................................................... 10ms DONE
2024_03_15_162023_create_jobs_table ........................................................................... 18ms DONE
2024_04_08_115153_create_failed_jobs_table .................................................................... 32ms DONE
```
5. Add the desired region(s) with the Artisan console:
```bash
php artisan scraper:add-region
```
Possible output:
```text
Type in desired region:
> Davos
Choose desired region:
[5460aea91d044] Davos
[5390628eeaa24] Davos Davos Platz
[5460adf3d7913] Davos Clavadel
[565847a969c59] Prättigau/Davos
[5460adf87857d] Davos Wolfgang
[5460adf8f3e46] Davos Monstein
> Davos
New Region created {"name":"Davos","updated_at":"2024-07-06T17:24:09.000000Z","created_at":"2024-07-06T17:24:09.000000Z","id":1}
New Seed added {"uri":"https:\/\/www.e-domizil.ch\/search\/5460aea91d044?_format=json","region_id":1,"updated_at":"2024-07-06T17:24:09.000000Z","created_at":"2024-07-06T17:24:09.000000Z","id":1}
```
6. Finally, set up cron jobs that run the web scraper regularly.
Create the scraping jobs at 02:00 on every third day of the month (1st, 4th, 7th, ...):
```bash
0 2 */3 * * /usr/local/bin/php ConsultancyProject1_Auslastungsmodellierung/artisan scrape:jobs
```
Every day, run the queue worker at 04:00, 07:00, and then every two hours from 09:00 to 23:00, each time after a random delay of up to 59 minutes. Each run processes at most 250 jobs and stops after two hours (`--max-time=7200`) or as soon as the queue is empty.
```bash
0 4,7,9,11,13,15,17,19,21,23 * * * sleep $((RANDOM \% 60))m ; /absolute/path/to/bin/php /absolute/path/to/artisan queue:work --max-jobs=250 --stop-when-empty --max-time=7200
```
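The random delay in the queue-worker cron entry can be tried out directly in a shell. The following is a minimal sketch, assuming bash (`$RANDOM` is a bash feature that a plain `sh` may not provide); the helper name `random_delay_minutes` is hypothetical and not part of the project.

```shell
#!/usr/bin/env bash
# Sketch: compute the random delay used by the queue-worker cron entry.
# random_delay_minutes is a hypothetical helper name, not part of the project.
random_delay_minutes() {
    # In bash, RANDOM yields an integer from 0 to 32767;
    # modulo 60 maps it to a delay of 0 to 59 minutes.
    echo $((RANDOM % 60))
}

delay="$(random_delay_minutes)"
echo "would sleep ${delay}m before starting the queue worker"
```

Inside a crontab line the percent sign has to be written as `\%`, because cron treats an unescaped `%` as a newline. In a regular script, as here, no escaping is needed.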


@ -1,44 +0,0 @@
// Use DBML to define your database structure
// Docs: https://dbml.dbdiagram.io/docs
Table seeds [note: 'Table contains the URIs which are used for the initial scraping.'] {
seed_id integer [primary key]
uri text [not null, unique]
region_id integer [not null, ref: > regions.region_id]
}
Table regions {
region_id integer [primary key]
name varchar(255) [not null]
}
Table properties {
property_id integer [primary key]
property_platform_id varchar(255) [unique, not null, note: 'uuid from the platform being used']
seed_id integer [not null, ref: > seeds.seed_id]
check_data json [note: 'data used for consistency checks, e.g. geo_dates or title']
last_found timestamp
created_at timestamp
}
Table extractions {
extraction_id integer [primary key]
property_id integer [unique, ref: > properties.property_id]
body text [not null]
header text [not null]
type types [not null]
created_at timestamp [not null]
}
enum types {
property
calendar
offer
}
Table exceptions {
exception_id integer [primary key]
exception json [not null, note: "exception while scraping (e. g. HTTP error message) and called url."]
type types [not null]
property_id integer [not null, ref: > properties.property_id, note: "either a property_id"]
}


@ -1,204 +0,0 @@
<svg width="100%" height="100%" viewBox="0.00 0.00 2466.42 1017.92" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" style="width: 100%; height: 100%; max-height: 1018pt; max-width: 2466pt;">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 1013.92)">
<title>dbml</title>
<!-- enteties -->
<g id="enteties" class="node">
<title>enteties</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="1004.74" cy="-130.11" rx="185.02" ry="130.22"></ellipse>
<polygon fill="#29235c" stroke="transparent" points="875.74,-160.11 875.74,-220.11 1133.74,-220.11 1133.74,-160.11 875.74,-160.11"></polygon>
<polygon fill="none" stroke="#29235c" points="875.74,-160.11 875.74,-220.11 1133.74,-220.11 1133.74,-160.11 875.74,-160.11"></polygon>
<text text-anchor="start" x="886.47" y="-181.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;enteties &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="875.74,-100.11 875.74,-160.11 1133.74,-160.11 1133.74,-100.11 875.74,-100.11"></polygon>
<polygon fill="none" stroke="#29235c" points="875.74,-100.11 875.74,-160.11 1133.74,-160.11 1133.74,-100.11 875.74,-100.11"></polygon>
<text text-anchor="start" x="910.49" y="-121.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#1d71b8"> &nbsp;&nbsp;&nbsp;property &nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="875.74,-40.11 875.74,-100.11 1133.74,-100.11 1133.74,-40.11 875.74,-40.11"></polygon>
<polygon fill="none" stroke="#29235c" points="875.74,-40.11 875.74,-100.11 1133.74,-100.11 1133.74,-40.11 875.74,-40.11"></polygon>
<text text-anchor="start" x="892.69" y="-61.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#1d71b8"> &nbsp;&nbsp;&nbsp;occupancy &nbsp;&nbsp;&nbsp;</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="874.74,-39.11 874.74,-221.11 1134.74,-221.11 1134.74,-39.11 874.74,-39.11"></polygon>
</g>
<!-- seeds -->
<g id="seeds" class="node">
<title>seeds</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="1679.25" cy="-800.11" rx="232.78" ry="172.57"></ellipse>
<polygon fill="#1d71b8" stroke="transparent" points="1517.25,-860.11 1517.25,-920.11 1842.25,-920.11 1842.25,-860.11 1517.25,-860.11"></polygon>
<polygon fill="none" stroke="#29235c" points="1517.25,-860.11 1517.25,-920.11 1842.25,-920.11 1842.25,-860.11 1517.25,-860.11"></polygon>
<text text-anchor="start" x="1574.82" y="-881.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seeds &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="1517.25,-800.11 1517.25,-860.11 1842.25,-860.11 1842.25,-800.11 1517.25,-800.11"></polygon>
<polygon fill="none" stroke="#29235c" points="1517.25,-800.11 1517.25,-860.11 1842.25,-860.11 1842.25,-800.11 1517.25,-800.11"></polygon>
<text text-anchor="start" x="1528.25" y="-821.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">seed_id</text>
<text text-anchor="start" x="1640.3" y="-821.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1733.45" y="-821.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="1517.25,-740.11 1517.25,-800.11 1842.25,-800.11 1842.25,-740.11 1517.25,-740.11"></polygon>
<polygon fill="none" stroke="#29235c" points="1517.25,-740.11 1517.25,-800.11 1842.25,-800.11 1842.25,-740.11 1517.25,-740.11"></polygon>
<text text-anchor="start" x="1528.25" y="-760.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">uri &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1740.59" y="-761.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">text</text>
<text text-anchor="start" x="1792.16" y="-761.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="1801.06" y="-761.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="1517.25,-680.11 1517.25,-740.11 1842.25,-740.11 1842.25,-680.11 1517.25,-680.11"></polygon>
<polygon fill="none" stroke="#29235c" points="1517.25,-680.11 1517.25,-740.11 1842.25,-740.11 1842.25,-680.11 1517.25,-680.11"></polygon>
<text text-anchor="start" x="1528.17" y="-700.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">region_id &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1694.81" y="-701.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<text text-anchor="start" x="1792.61" y="-701.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="1801.5" y="-701.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="1515.75,-679.11 1515.75,-921.11 1842.75,-921.11 1842.75,-679.11 1515.75,-679.11"></polygon>
</g>
<!-- regions -->
<g id="regions" class="node">
<title>regions</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="2203.16" cy="-830.11" rx="255.03" ry="130.22"></ellipse>
<polygon fill="#1d71b8" stroke="transparent" points="2025.16,-860.11 2025.16,-920.11 2382.16,-920.11 2382.16,-860.11 2025.16,-860.11"></polygon>
<polygon fill="none" stroke="#29235c" points="2025.16,-860.11 2025.16,-920.11 2382.16,-920.11 2382.16,-860.11 2025.16,-860.11"></polygon>
<text text-anchor="start" x="2088.95" y="-881.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;regions &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="2025.16,-800.11 2025.16,-860.11 2382.16,-860.11 2382.16,-800.11 2025.16,-800.11"></polygon>
<polygon fill="none" stroke="#29235c" points="2025.16,-800.11 2025.16,-860.11 2382.16,-860.11 2382.16,-800.11 2025.16,-800.11"></polygon>
<text text-anchor="start" x="2036.16" y="-821.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">region_id</text>
<text text-anchor="start" x="2167.74" y="-821.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="2273.36" y="-821.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="2025.16,-740.11 2025.16,-800.11 2382.16,-800.11 2382.16,-740.11 2025.16,-740.11"></polygon>
<polygon fill="none" stroke="#29235c" points="2025.16,-740.11 2025.16,-800.11 2382.16,-800.11 2382.16,-740.11 2025.16,-740.11"></polygon>
<text text-anchor="start" x="2035.86" y="-760.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">name &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="2150.94" y="-761.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">varchar(255)</text>
<text text-anchor="start" x="2332.28" y="-761.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="2341.18" y="-761.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="2023.66,-739.11 2023.66,-921.11 2382.66,-921.11 2382.66,-739.11 2023.66,-739.11"></polygon>
</g>
<!-- seeds&#45;&gt;regions -->
<!-- seeds&#45;&gt;regions -->
<g id="edge2" class="edge">
<title>seeds:e-&gt;regions:w</title>
<path fill="none" stroke="#29235c" stroke-width="3" d="M1843.25,-710.11C1936.25,-710.11 1928.4,-821.59 2014.11,-829.65"></path>
<polygon fill="#29235c" stroke="#29235c" stroke-width="3" points="2014.01,-833.15 2024.16,-830.11 2014.33,-826.15 2014.01,-833.15"></polygon>
<text text-anchor="middle" x="2015.26" y="-839.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">1</text>
<text text-anchor="middle" x="1837.03" y="-719.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">*</text>
</g>
<!-- properties -->
<g id="properties" class="node">
<title>properties</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="1004.74" cy="-710.11" rx="405.76" ry="299.63"></ellipse>
<polygon fill="#1d71b8" stroke="transparent" points="719.74,-860.11 719.74,-920.11 1289.74,-920.11 1289.74,-860.11 719.74,-860.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-860.11 719.74,-920.11 1289.74,-920.11 1289.74,-860.11 719.74,-860.11"></polygon>
<text text-anchor="start" x="871.37" y="-881.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;properties &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-800.11 719.74,-860.11 1289.74,-860.11 1289.74,-800.11 719.74,-800.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-800.11 719.74,-860.11 1289.74,-860.11 1289.74,-800.11 719.74,-800.11"></polygon>
<text text-anchor="start" x="730.74" y="-821.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">property_id</text>
<text text-anchor="start" x="890.77" y="-821.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1180.93" y="-821.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-740.11 719.74,-800.11 1289.74,-800.11 1289.74,-740.11 719.74,-740.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-740.11 719.74,-800.11 1289.74,-800.11 1289.74,-740.11 719.74,-740.11"></polygon>
<text text-anchor="start" x="730.26" y="-760.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">property_platform_id &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1058.52" y="-761.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">varchar(255)</text>
<text text-anchor="start" x="1239.86" y="-761.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="1248.76" y="-761.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-680.11 719.74,-740.11 1289.74,-740.11 1289.74,-680.11 719.74,-680.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-680.11 719.74,-740.11 1289.74,-740.11 1289.74,-680.11 719.74,-680.11"></polygon>
<text text-anchor="start" x="730.74" y="-700.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">seed_id &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1141.84" y="-701.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<text text-anchor="start" x="1239.65" y="-701.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="1248.54" y="-701.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-620.11 719.74,-680.11 1289.74,-680.11 1289.74,-620.11 719.74,-620.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-620.11 719.74,-680.11 1289.74,-680.11 1289.74,-620.11 719.74,-620.11"></polygon>
<text text-anchor="start" x="730.74" y="-640.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">check_data &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1220.06" y="-641.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">json</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-560.11 719.74,-620.11 1289.74,-620.11 1289.74,-560.11 719.74,-560.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-560.11 719.74,-620.11 1289.74,-620.11 1289.74,-560.11 719.74,-560.11"></polygon>
<text text-anchor="start" x="730.74" y="-580.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">last_found &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1131.18" y="-581.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">timestamp</text>
<polygon fill="#e7e2dd" stroke="transparent" points="719.74,-500.11 719.74,-560.11 1289.74,-560.11 1289.74,-500.11 719.74,-500.11"></polygon>
<polygon fill="none" stroke="#29235c" points="719.74,-500.11 719.74,-560.11 1289.74,-560.11 1289.74,-500.11 719.74,-500.11"></polygon>
<text text-anchor="start" x="730.74" y="-520.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">created_at &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="1131.18" y="-521.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">timestamp</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="718.74,-499.11 718.74,-921.11 1290.74,-921.11 1290.74,-499.11 718.74,-499.11"></polygon>
</g>
<!-- properties&#45;&gt;seeds -->
<!-- properties&#45;&gt;seeds -->
<g id="edge4" class="edge">
<title>properties:e-&gt;seeds:w</title>
<path fill="none" stroke="#29235c" stroke-width="3" d="M1290.74,-710.11C1400.84,-710.11 1402.71,-822.95 1506.25,-829.78"></path>
<polygon fill="#29235c" stroke="#29235c" stroke-width="3" points="1506.15,-833.28 1516.25,-830.11 1506.37,-826.29 1506.15,-833.28"></polygon>
<text text-anchor="middle" x="1507.36" y="-839.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">1</text>
<text text-anchor="middle" x="1296.96" y="-719.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">*</text>
</g>
<!-- occupancies -->
<g id="occupancies" class="node">
<title>occupancies</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="281.43" cy="-740.11" rx="281.36" ry="257.27"></ellipse>
<polygon fill="#1d71b8" stroke="transparent" points="84.43,-860.11 84.43,-920.11 478.43,-920.11 478.43,-860.11 84.43,-860.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-860.11 84.43,-920.11 478.43,-920.11 478.43,-860.11 84.43,-860.11"></polygon>
<text text-anchor="start" x="130.26" y="-881.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;occupancies &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="84.43,-800.11 84.43,-860.11 478.43,-860.11 478.43,-800.11 84.43,-800.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-800.11 84.43,-860.11 478.43,-860.11 478.43,-800.11 84.43,-800.11"></polygon>
<text text-anchor="start" x="95.43" y="-821.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">occupancy_id</text>
<text text-anchor="start" x="291.05" y="-821.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="369.63" y="-821.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="84.43,-740.11 84.43,-800.11 478.43,-800.11 478.43,-740.11 84.43,-740.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-740.11 84.43,-800.11 478.43,-800.11 478.43,-740.11 84.43,-740.11"></polygon>
<text text-anchor="start" x="95.43" y="-760.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">property_id &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="369.63" y="-761.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="84.43,-680.11 84.43,-740.11 478.43,-740.11 478.43,-680.11 84.43,-680.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-680.11 84.43,-740.11 478.43,-740.11 478.43,-680.11 84.43,-680.11"></polygon>
<text text-anchor="start" x="95.43" y="-700.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">occupancy &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="369.66" y="-701.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">json</text>
<text text-anchor="start" x="428.34" y="-701.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="437.23" y="-701.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="84.43,-620.11 84.43,-680.11 478.43,-680.11 478.43,-620.11 84.43,-620.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-620.11 84.43,-680.11 478.43,-680.11 478.43,-620.11 84.43,-620.11"></polygon>
<text text-anchor="start" x="95.43" y="-640.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">header &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="376.76" y="-641.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">text</text>
<text text-anchor="start" x="428.34" y="-641.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="437.23" y="-641.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="84.43,-560.11 84.43,-620.11 478.43,-620.11 478.43,-560.11 84.43,-560.11"></polygon>
<polygon fill="none" stroke="#29235c" points="84.43,-560.11 84.43,-620.11 478.43,-620.11 478.43,-560.11 84.43,-560.11"></polygon>
<text text-anchor="start" x="95.06" y="-580.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">created_at &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="281.1" y="-581.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">timestamp</text>
<text text-anchor="start" x="428.66" y="-581.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="437.55" y="-581.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="83.43,-559.11 83.43,-921.11 479.43,-921.11 479.43,-559.11 83.43,-559.11"></polygon>
</g>
<!-- occupancies&#45;&gt;properties -->
<!-- occupancies&#45;&gt;properties -->
<g id="edge6" class="edge">
<title>occupancies:e-&gt;properties:w</title>
<path fill="none" stroke="#29235c" stroke-width="3" d="M479.43,-770.11C585.65,-770.11 607.75,-826.42 708.71,-829.94"></path>
<polygon fill="#29235c" stroke="#29235c" stroke-width="3" points="708.68,-833.44 718.74,-830.11 708.8,-826.44 708.68,-833.44"></polygon>
<text text-anchor="middle" x="709.84" y="-839.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">1</text>
<text text-anchor="middle" x="485.65" y="-779.71" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">*</text>
</g>
<!-- exceptions -->
<g id="exceptions" class="node">
<title>exceptions</title>
<ellipse fill="none" stroke="black" stroke-width="0" cx="281.43" cy="-250.11" rx="239.92" ry="214.92"></ellipse>
<polygon fill="#1d71b8" stroke="transparent" points="114.43,-340.11 114.43,-400.11 449.43,-400.11 449.43,-340.11 114.43,-340.11"></polygon>
<polygon fill="none" stroke="#29235c" points="114.43,-340.11 114.43,-400.11 449.43,-400.11 449.43,-340.11 114.43,-340.11"></polygon>
<text text-anchor="start" x="143.21" y="-361.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#ffffff"> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;exceptions &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</text>
<polygon fill="#e7e2dd" stroke="transparent" points="114.43,-280.11 114.43,-340.11 449.43,-340.11 449.43,-280.11 114.43,-280.11"></polygon>
<polygon fill="none" stroke="#29235c" points="114.43,-280.11 114.43,-340.11 449.43,-340.11 449.43,-280.11 114.43,-280.11"></polygon>
<text text-anchor="start" x="124.96" y="-301.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">extraction_id</text>
<text text-anchor="start" x="306.33" y="-301.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="341.03" y="-301.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<polygon fill="#e7e2dd" stroke="transparent" points="114.43,-220.11 114.43,-280.11 449.43,-280.11 449.43,-220.11 114.43,-220.11"></polygon>
<polygon fill="none" stroke="#29235c" points="114.43,-220.11 114.43,-280.11 449.43,-280.11 449.43,-220.11 114.43,-220.11"></polygon>
<text text-anchor="start" x="125.43" y="-240.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">exception &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="340.66" y="-241.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">json</text>
<text text-anchor="start" x="399.34" y="-241.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="408.23" y="-241.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="114.43,-160.11 114.43,-220.11 449.43,-220.11 449.43,-160.11 114.43,-160.11"></polygon>
<polygon fill="none" stroke="#29235c" points="114.43,-160.11 114.43,-220.11 449.43,-220.11 449.43,-160.11 114.43,-160.11"></polygon>
<text text-anchor="start" x="125.43" y="-180.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">entity &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="287.3" y="-181.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">enteties</text>
<text text-anchor="start" x="399.34" y="-181.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="408.23" y="-181.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="#e7e2dd" stroke="transparent" points="114.43,-100.11 114.43,-160.11 449.43,-160.11 449.43,-100.11 114.43,-100.11"></polygon>
<polygon fill="none" stroke="#29235c" points="114.43,-100.11 114.43,-160.11 449.43,-160.11 449.43,-100.11 114.43,-100.11"></polygon>
<text text-anchor="start" x="125.43" y="-120.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c">entity_id &nbsp;&nbsp;&nbsp;</text>
<text text-anchor="start" x="301.54" y="-121.31" font-family="Helvetica,sans-Serif" font-style="italic" font-size="32.00" fill="#29235c">integer</text>
<text text-anchor="start" x="399.34" y="-121.31" font-family="Helvetica,sans-Serif" font-size="32.00" fill="#29235c"> </text>
<text text-anchor="start" x="408.23" y="-121.31" font-family="Helvetica,sans-Serif" font-weight="bold" font-size="32.00" fill="#29235c">(!)</text>
<polygon fill="none" stroke="#29235c" stroke-width="2" points="112.93,-99.11 112.93,-401.11 449.93,-401.11 449.93,-99.11 112.93,-99.11"></polygon>
</g>
<!-- exceptions&#45;&gt;enteties -->
<g id="edge7" class="edge">
<title>exceptions:e-&gt;enteties:w</title>
<path fill="none" stroke="#29235c" stroke-width="3" d="M450.43,-190.11C639.01,-190.11 686.16,-190.11 874.74,-190.11"></path>
</g>
</g>
</svg>


View File

@ -1,7 +0,0 @@
URI,description
https://www.e-domizil.ch/search/632d3fb65adbe?_format=json&adults=1&duration=7,Heidiland
https://www.e-domizil.ch/search/5460aea91d044?_format=json&adults=1&duration=7,Davos (GR)
https://www.e-domizil.ch/search/5555b19e174fc?_format=json&adults=1&duration=7,Engadin
https://www.e-domizil.ch/search/5460aea9284b5?_format=json&adults=1&duration=7,St. Moritz

View File

@ -1,4 +0,0 @@
SELECT CONCAT('https://www.e-domizil.ch/rental/offer/', properties.property_platform_id) AS url, extractions.body, extractions.header, extractions.type, extractions.created_at, regions.name FROM `extractions`
LEFT JOIN properties ON properties.id = extractions.property_id
LEFT JOIN seeds ON seeds.id = properties.seed_id
LEFT JOIN regions ON regions.id = seeds.region_id;
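
The query above joins each extraction to its property, seed and region, and builds the public offer URL from `property_platform_id`. The following is a minimal sketch of the same join, assuming Python's built-in `sqlite3` module and a toy in-memory schema that contains only the columns the query names (the real migrations may define more). SQLite's `||` operator stands in for `CONCAT`, and all sample rows are invented for illustration.

```python
import sqlite3

OFFER_URL_PREFIX = "https://www.e-domizil.ch/rental/offer/"

# Toy schema (assumption): only the columns referenced by the query.
SCHEMA = """
CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seeds (id INTEGER PRIMARY KEY, region_id INTEGER);
CREATE TABLE properties (id INTEGER PRIMARY KEY, property_platform_id TEXT, seed_id INTEGER);
CREATE TABLE extractions (id INTEGER PRIMARY KEY, property_id INTEGER, body TEXT,
                          header TEXT, type TEXT, created_at TEXT);
"""

# Same join chain as the original query; '||' replaces CONCAT for SQLite.
QUERY = """
SELECT ? || properties.property_platform_id AS url,
       extractions.body, extractions.header, extractions.type,
       extractions.created_at, regions.name
FROM extractions
LEFT JOIN properties ON properties.id = extractions.property_id
LEFT JOIN seeds ON seeds.id = properties.seed_id
LEFT JOIN regions ON regions.id = seeds.region_id
ORDER BY extractions.id
"""


def export_extractions(conn):
    """Return one row per extraction, enriched with offer URL and region name."""
    return conn.execute(QUERY, (OFFER_URL_PREFIX,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented sample data, for illustration only.
conn.execute("INSERT INTO regions VALUES (1, 'Davos')")
conn.execute("INSERT INTO seeds VALUES (1, 1)")
conn.execute("INSERT INTO properties VALUES (1, 'abc123', 1)")
conn.execute("INSERT INTO extractions VALUES (1, 1, '{}', '{}', 'calendar', '2024-07-06')")
# An extraction whose property is missing: LEFT JOIN keeps the row, with NULLs.
conn.execute("INSERT INTO extractions VALUES (2, 99, '{}', '{}', 'calendar', '2024-07-06')")

rows = export_extractions(conn)
```

Because every join is a `LEFT JOIN`, an extraction without a matching property still appears in the result; its `url` and region `name` are simply NULL.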

19
scraper/.gitignore vendored
View File

@ -1,19 +0,0 @@
/.phpunit.cache
/node_modules
/public/build
/public/hot
/public/storage
/storage/*.key
/vendor
.env
.env.backup
.env.production
.phpunit.result.cache
Homestead.json
Homestead.yaml
auth.json
npm-debug.log
yarn-error.log
/.fleet
/.idea
/.vscode

View File

@ -1,84 +0,0 @@
# Web Scraper for e-domizil.ch
This repository contains a web scraper for the e-domizil.ch platform, built on [Laravel (version 10.x)](https://laravel.com).
## Installation
Before installing, make sure the server meets Laravel's [server requirements](https://laravel.com/docs/10.x/deployment#server-requirements).
1. Clone the repository
```bash
git clone https://gitea.fhgr.ch/dianigionath/ConsultancyProject1_Auslastungsmodellierung.git
```
2. Install the application's dependencies with Composer
```bash
composer install
```
3. Copy `.env.example` to `.env` and adjust the database connection settings in the new file.
```bash
cp .env.example .env
```
Example configuration for a SQLite database:
```ini
DB_CONNECTION=sqlite
DB_DATABASE=/absolute/path/to/database.sqlite
```
4. Initialize the database with the Artisan console
```bash
php artisan migrate
```
Expected output:
```bash
WARN The SQLite database does not exist: /home/gio/database_test.sqlite.
┌ Would you like to create it? ────────────────────────────────┐
│ Yes │
└──────────────────────────────────────────────────────────────┘
INFO Preparing database.
Creating migration table .......................................................................... 31ms DONE
INFO Running migrations.
0001_01_01_000000_create_users_table .......................................................................... 57ms DONE
0001_01_01_000001_create_cache_table .......................................................................... 18ms DONE
2019_12_14_000001_create_personal_access_tokens_table .......................................................................... 36ms DONE
2024_03_15_142227_create_regions_table .......................................................................... 10ms DONE
2024_03_15_142228_create_seeds_table .......................................................................... 18ms DONE
2024_03_15_142257_create_properties_table .......................................................................... 17ms DONE
2024_03_15_142550_create_extractions_table .......................................................................... 10ms DONE
2024_03_15_142625_create_exceptions_table .......................................................................... 10ms DONE
2024_03_15_162023_create_jobs_table .......................................................................... 18ms DONE
2024_04_08_115153_create_failed_jobs_table .......................................................................... 32ms DONE
```
5. Add the desired region(s) with the Artisan console:
```bash
php artisan scraper:add-region
```
Possible output:
```bash
Type in desired region:
> Davos
Choose desired region:
[5460aea91d044] Davos
[5390628eeaa24] Davos Davos Platz
[5460adf3d7913] Davos Clavadel
[565847a969c59] Prättigau/Davos
[5460adf87857d] Davos Wolfgang
[5460adf8f3e46] Davos Monstein
> Davos
New Region created {"name":"Davos","updated_at":"2024-07-06T17:24:09.000000Z","created_at":"2024-07-06T17:24:09.000000Z","id":1}
New Seed added {"uri":"https:\/\/www.e-domizil.ch\/search\/5460aea91d044?_format=json","region_id":1,"updated_at":"2024-07-06T17:24:09.000000Z","created_at":"2024-07-06T17:24:09.000000Z","id":1}
```
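The sample output suggests that a seed URI is the search endpoint followed by the chosen region's key and `?_format=json`. The following is a minimal sketch of that composition, inferred only from the output above; `build_seed_uri` and `region_key_from_uri` are hypothetical helpers and not part of the scraper.

```python
from urllib.parse import urlencode, urlsplit

SEARCH_BASE = "https://www.e-domizil.ch/search/"


def build_seed_uri(region_key, **params):
    """Compose a search URI for a region key; extra query parameters are optional."""
    query = {"_format": "json", **params}
    return f"{SEARCH_BASE}{region_key}?{urlencode(query)}"


def region_key_from_uri(uri):
    """Recover the region key, i.e. the last path segment of a seed URI."""
    return urlsplit(uri).path.rsplit("/", 1)[-1]


# Region key taken from the sample output for "Davos".
seed = build_seed_uri("5460aea91d044")
seed_with_filters = build_seed_uri("5460aea91d044", adults=1, duration=7)
```

Keeping the key extraction next to the builder makes it easy to map a stored seed back to the region it was created for.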
6. Finally, set up cron jobs that run the web scraper at regular intervals.
Create the scraping jobs every third day at 02:00 (the `*/3` day-of-month step restarts on the 1st of each month):
```bash
0 2 */3 * * /usr/local/bin/php ConsultancyProject1_Auslastungsmodellierung/artisan scrape:jobs
```
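Every day, at the hours listed in the entry below (04:00, 07:00, then every two hours until 23:00), start the queue worker after a random delay of up to one hour. Each run processes at most 250 jobs, exits once the queue is empty, and stops after at most two hours (`--max-time=7200`).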
```bash
0 4,7,9,11,13,15,17,19,21,23 * * * sleep $((RANDOM \% 60))m ; /absolute/path/to/bin/php /absolute/path/to/artisan queue:work --max-jobs=250 --stop-when-empty --max-time=7200
```
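To see what the two cron schedules amount to, the sketch below expands them in plain Python. It uses no cron library and models only the fields that appear in the entries; the helper names are hypothetical, and the delay range assumes the cron shell provides Bash's `$RANDOM`.

```python
# Hour field of the queue-worker entry, copied from the crontab line.
WORKER_HOURS = [4, 7, 9, 11, 13, 15, 17, 19, 21, 23]
MAX_DELAY_MINUTES = 59    # RANDOM % 60 yields 0..59 (assumes Bash's $RANDOM)
MAX_TIME_SECONDS = 7200   # value passed to --max-time


def days_of_month_for_step(step, days_in_month):
    """Days matched by a '*/step' day-of-month field; the range starts at day 1."""
    return list(range(1, days_in_month + 1, step))


def gaps_between_runs(hours):
    """Hours between consecutive scheduled starts on the same day."""
    return [later - earlier for earlier, later in zip(hours, hours[1:])]


def latest_possible_end_minutes(start_hour):
    """Latest minute of the day a run may still be active:
    scheduled start + maximum random delay + maximum run time."""
    return start_hour * 60 + MAX_DELAY_MINUTES + MAX_TIME_SECONDS // 60


job_days_july = days_of_month_for_step(3, 31)  # days on which scrape jobs are created
gaps = gaps_between_runs(WORKER_HOURS)
```

In a 31-day month the job-creation entry fires on the 31st and again on the 1st of the following month. A worker run that is delayed and uses its full `--max-time` can also still be active when the next run starts, which may be worth keeping in mind when tuning the schedule.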

Some files were not shown because too many files have changed in this diff