Exploring Factors Influencing NFT Prices Based on Social Media Content
Enhancing Multimodal Graph Neural Networks with Music Understanding Models for Music Recommendation
LLM-Based Construction of Attack Life Cycles from CTI Reports
Disease Risk Prediction Based on Artificial Intelligence and Genomic Information - A Case Study of Systemic Lupus Erythematosus
A Case Study of Digital Transformation Strategies in Small and Medium-Sized Manufacturing Enterprises: The Case of Company Y
Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping
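With the development of large language models (LLMs), demand for web data keeps growing, and the rate at which data is produced rises along with web traffic. Yet despite this explosive growth in both demand and supply, methods for scraping web data have not advanced significantly. Because of the rise of separated front-end/back-end architectures and increasingly complex and variable front-end code, experts must write a custom scraper for each website. To address this, this study proposes Webscraper, which combines the web-browsing and tool-use abilities of multimodal large language models (MLLMs) with the code generation and execution abilities of LLMs, so that an MLLM can interact with web pages, decide on its own when to scrape, and call the relevant tools to scrape page source code, producing structured data and reusable code. We find that many websites adopt the index-content pattern, including news sites, shopping sites, social media, and video platforms; beyond the high added value of their data, this shows that the index-content pattern is widely used in web design. Webscraper uses Computer use, Anthropic's desktop automation agent framework, as its web-browsing module, and develops web scraping tools for Computer use to call, scraping index-content pages through a five-stage process prompt. Experimental results show that in the news domain, Webscraper with the structured process prompt alone already gives Computer use better scraping ability. In addition, using the tools developed in this study further improves scraping accuracy. Finally, we apply Webscraper to scraping tasks on shopping sites; the results show that the architecture is effective beyond the news domain, which supports its generalization ability.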
Rental Car Relocation by Considering Carbon Emissions
Perceived believability of fake news on social media
A Deep Learning Model for TTP-based Threat Hunting on Windows Audit Logs
An integrated production-maintenance problem in a flow shop with endogenous yield rates
The Management Application of Healthcare Quality Education and Training for the Transparency of Healthcare Organization Accreditation
Critical Success Factors in Business Process Optimization: A Case Study of Company I
Financial Numerical Entity Understanding: Novel Tasks, Datasets, and Approaches
Exploring the Influence of Virtual Pets on Focus and Performance in Online Learning
Blockchain Business Model in Container Shipping: A Case Study of TradeLens
Improving Tacit Knowledge Transfer with Virtual Reality: An Experiment in Car Driving Training
An Adaptive Retrieval-Augmented Generation Method Based on Stacking Ensemble Learning
StreaMeme: Meme Category Recommendation of Livestreaming Using LLMs
Exploring Exercise Applications Using Dilatant Fluid with Daily Objects
An Adaptive RAG Framework using Candidate Pruning and Noise Injection