- Crawl news from multiple suppliers:
  - SkyNews (technology news): http://feeds.skynews.com/feeds/rss/technology.xml
  - IT News (IT news): https://www.itnews.com.au/RSS/rss.ashx
- Store the news on disk as an RSS file named article.rss
- Also store the news in a MySQL database. For the table structure, refer to the shadow-news-entity project.
- The supplier crawlers can be run from an IDE or through shell scripts.
- This project is built on a DDD architecture and developed with TDD. It is organized into layers: domain, infrastructure, interface, application, and repository.
- The infrastructure layer uses JPA and EclipseLink to access the MySQL database.
- ROME is used as the library for working with RSS files.
- Java 8 is the main language level.
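The crawl step in the list above (read a supplier's RSS feed, keep the items published within a date range) can be sketched roughly as follows. This is not the project's code: the project uses ROME, while this sketch swaps in the JDK's built-in XML parser so that it runs without extra dependencies. The class name `RssDateFilter`, the method name, the sample feed, and the inclusive treatment of both dates are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of the project: filters RSS items by publication date. */
final class RssDateFilter {

    /** Returns titles of items whose pubDate lies in [from, to] (inclusive by assumption). */
    static List<String> titlesPublishedBetween(String rssXml, LocalDate from, LocalDate to) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Feeds are untrusted input, so reject DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // RSS 2.0 dates use the RFC 822/1123 style, e.g. "Wed, 24 May 2017 10:00:00 GMT".
                // The date is taken in the feed's own offset, without time zone conversion.
                LocalDate published = OffsetDateTime
                        .parse(text(item, "pubDate"), DateTimeFormatter.RFC_1123_DATE_TIME)
                        .toLocalDate();
                if (!published.isBefore(from) && !published.isAfter(to)) {
                    titles.add(text(item, "title"));
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read RSS document", e);
        }
    }

    /** Text of the first child element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Made-up sample feed; a real run would download the supplier's feed instead.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss version=\"2.0\"><channel><title>Sample</title>"
                + "<item><title>In range</title>"
                + "<pubDate>Wed, 24 May 2017 10:00:00 GMT</pubDate></item>"
                + "<item><title>Out of range</title>"
                + "<pubDate>Thu, 01 Jun 2017 09:30:00 GMT</pubDate></item>"
                + "</channel></rss>";
        System.out.println(titlesPublishedBetween(
                sample, LocalDate.of(2017, 5, 24), LocalDate.of(2017, 5, 28)));
        // prints [In range]
    }
}
```

With ROME, the parsing part would typically be replaced by its feed reader classes, leaving only the date comparison.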
If you run into any problems, please feel free to contact me or open a pull request.
- Email: [email protected]
- git clone [email protected]:chariot9/shadow-news-crawler.git
- Execute: mvn clean install
- Run the main method in NewsBootstrap, passing arguments to the main class, for example: /data/news/skynews 1 20170524 20170528
- Meaning of each parameter:
  - args[0]: folder on disk in which to store the news file
  - args[1]: supplier ID (1 for SkyNews, 2 for ITNews)
  - args[2]: start date; news published from this date is fetched
  - args[3]: end date; news published up to this date is fetched
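As an illustration of the four arguments described above, here is a minimal sketch of how they could be validated and converted before crawling. It is not taken from NewsBootstrap: the class `CrawlerArgs` and its fields are hypothetical, and the yyyyMMdd date format is inferred from the example values 20170524 and 20170528.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical holder for the four command-line arguments; not part of the project. */
final class CrawlerArgs {
    // BASIC_ISO_DATE parses yyyyMMdd (assumed format, based on the example arguments).
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    final Path outputFolder;  // args[0]
    final int supplierId;     // args[1]: 1 = SkyNews, 2 = ITNews
    final LocalDate from;     // args[2]
    final LocalDate to;       // args[3]

    private CrawlerArgs(Path outputFolder, int supplierId, LocalDate from, LocalDate to) {
        this.outputFolder = outputFolder;
        this.supplierId = supplierId;
        this.from = from;
        this.to = to;
    }

    static CrawlerArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "Expected: <folder> <supplierId> <fromDate> <toDate>");
        }
        int supplierId = Integer.parseInt(args[1]);
        if (supplierId != 1 && supplierId != 2) {
            throw new IllegalArgumentException("Unknown supplier ID: " + supplierId);
        }
        LocalDate from = LocalDate.parse(args[2], DATE_FORMAT);
        LocalDate to = LocalDate.parse(args[3], DATE_FORMAT);
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("End date is before start date");
        }
        return new CrawlerArgs(Paths.get(args[0]), supplierId, from, to);
    }

    public static void main(String[] args) {
        CrawlerArgs parsed = parse(
                new String[] {"/data/news/skynews", "1", "20170524", "20170528"});
        System.out.println(parsed.supplierId + " " + parsed.from + " " + parsed.to);
        // prints 1 2017-05-24 2017-05-28
    }
}
```

Failing fast on a wrong argument count, an unknown supplier ID, or a reversed date range keeps bad input from reaching the crawler or the database.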
To set up the necessary environment and build the JAR file, run the following shell script from the project folder:
./shell/release
To run the program, execute the following command:
cd /data/shell/shadow/execute && ./diamond_exe.sh
Created by Trung, Yokohama Japan 2017