Reap -
Pulls many online sources,
and pays some bills.
Add Dependencies.
You need some online scraping engines: geckodriver and yt-dlp.
Grab a copy of each one, and place it inside ~/bin.
Run: echo 'export PATH=$PATH:~/bin' >> ~/.bashrc; . ~/.bashrc
so your shell realizes you added the programs.
In geckodriver's case, you should extract the zipped file you pulled,
and place the unzipped binary inside ~/bin.
Also make sure your machine has a copy of Firefox.
On Mac Homebrew:
brew install chromedriver geckodriver yt-dlp
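A quick way to confirm the shell can see the programs is to search PATH yourself. This is a minimal sketch, not code from this repo; find_executable is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of reap): look for an executable on PATH.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where each scraping engine named above was found, if anywhere.
%w[geckodriver yt-dlp].each do |tool|
  location = find_executable(tool)
  puts(location ? "#{tool}: #{location}" : "#{tool}: not found on PATH")
end
```

If a program shows as not found, re-check that ~/bin is on PATH and that the binary is marked executable.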
You also need our code and some Ruby programs. Begin by grabbing Ruby 3.1.2.
git clone git@base.assembled.app:code/reap
cd reap
gem install bundler
bundle install --with=pull
Run programs.
All sourced records are placed under ./cache/*/
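To see what has been pulled so far, you can count files per source. This is a sketch only, assuming one subdirectory per source under ./cache (for example ./cache/senate); cache_summary is a hypothetical name, not a script in this repo.

```ruby
# Sketch only: count files per source directory under a cache root.
# Assumes one subdirectory per source, e.g. ./cache/senate.
def cache_summary(root = "./cache")
  Dir.glob(File.join(root, "*/")).sort.to_h do |dir|
    # "**/*" walks nested directories; dotfiles are skipped by default.
    files = Dir.glob(File.join(dir, "**", "*")).count { |path| File.file?(path) }
    [File.basename(dir), files]
  end
end

cache_summary.each { |source, count| puts "#{source}: #{count} files" }
```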
MyPayLinks
Open a MyPayLinks calendar:
ruby paylinks.absence.rb
ruby paylinks.hours.rb
Record "leave" hours in MyPayLinks:
cat <<END >> .call
domain.user=MyNameHere
domain.passcode=P4ssC0de
END
ruby paylinks.absence.rb 1 V 1 4-8 11-13
... this records one hour of annual 'V'acation
on this month's days #1, 4, 5, 6, 7, 8, 11, 12, 13.
See the REASONS hash in paylinks.absence.rb.
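The day arguments mix single days and ranges. A minimal sketch of how such arguments expand, assuming the rule is simply "N" or "N-M" with both ends included; expand_days is a hypothetical helper, not the parser the script itself uses.

```ruby
# Hypothetical helper: expand arguments such as "1" or "4-8" into day numbers.
def expand_days(args)
  args.flat_map do |arg|
    first, last = arg.split("-", 2).map { |part| Integer(part, 10) }
    (first..(last || first)).to_a
  end
end

p expand_days(%w[1 4-8 11-13])
# => [1, 4, 5, 6, 7, 8, 11, 12, 13]
```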
You can also run:
ruby paylinks.hours.rb 1 1-5 8-12 15 17-19
- 1 means one hour applied per day
- 1-5 8-12 15 17-19 are day ranges inside the month.
Should you need changes in a prior month, re-enable paylinks.hours.rb#48.
Senate
Scrape Senate floor proceedings:
ruby senate.recordings.rb # or...
ruby senate.recordings.rb 2022
... check inside ./cache/senate.
IRS
Pay your bills; especially prior years.
cat <<END >> .call
irs.address=your@email.here
irs.passcode=your_id.me_passcode
irs.routing=0011223344
irs.account=4433221100
irs.pay_by=2022-10-01
END
ruby irs.process.rb
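The .call file is plain key=value lines. A minimal sketch of reading it, assuming blank lines are skipped and only the first "=" splits key from value; parse_call is a hypothetical helper, and treating "#" lines as comments is an assumption too. The scripts in this repo may read the file differently.

```ruby
# Sketch: parse key=value lines (the format appended to .call above).
# Assumption: lines starting with "#" are comments; the repo may not do this.
def parse_call(text)
  text.each_line.with_object({}) do |line, settings|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    settings[key.strip] = value.to_s.strip
  end
end

settings = File.exist?(".call") ? parse_call(File.read(".call")) : {}
# Print only the key names, so passcodes never reach the terminal.
puts settings.keys.sort
```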
Crunchyroll
Pull some anime!
ruby crunchyroll.process.rb
ruby crunchyroll.process.rb https://www.crunchyroll.com/bleach
ruby crunchyroll.process.rb bleach
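The last two commands appear to name the same series, once by URL and once by short name. A sketch of normalizing either form, assuming the short name is the final path segment of the URL; series_slug is a hypothetical helper, not the script's own logic.

```ruby
require "uri"

# Hypothetical helper: accept a series URL or a bare name and return the name.
def series_slug(arg)
  return arg unless arg.match?(%r{\Ahttps?://})

  URI.parse(arg).path.split("/").reject(&:empty?).last
end

puts series_slug("https://www.crunchyroll.com/bleach")
puts series_slug("bleach")
```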
Once you have indexed a series, pull a season:
cd cache/crunchyroll/bleach/00-Bleach\ Season\ 1/
./_source.rb
DuckDuckGo
Run a search.
ruby duckduckgo.process.rb 8 Videos gecko reflow
Etymonline
Learn some old language. Requires around a day.
ruby etymonline.process.rb
Municode
Scrape all indexed US municipal codes. Requires many days.
ruby municode.process.rb
Reddit
Mainly pulls images and camera recordings.
ruby reddit.process.rb ProgrammerHumor
SEC
Pull business records.
ruby sec.process.rb AAPL AMZN FB GOOGL MSFT TSLA TWTR
You can run more analysis by requiring the Rails dependencies:
bundle install --with=rails
ruby sec.record.rb
Twitter
Sources money analysis remarks and economic models.
echo "TWITTER_TOKEN=abc123" >> .call
ruby twitter.process.rb
Pluto.TV
Check in on cinema:
ruby pluto.process.rb
Rails dependencies: display your reaped records.
Run postgres locally on your machine:
- Mac: brew install postgresql; brew services start postgresql
- Fedora: sudo dnf install postgresql-server; sudo postgresql-setup --initdb; sudo systemctl start postgresql
- ...and so on,
and use:
bundle install --with=rails
rails db:create db:migrate db:seed
rails s
Roadmap:
Leadership and Spending.
- SEC
- https://www.fpds.gov
- https://www.usaspending.gov
- https://tpis1.trade.gov/cgi-bin/wtpis/prod/tpis.cgi
- https://comtrade.un.org
- https://www.census.gov/foreign-trade/reference/codes/index.html
- https://github.com/LibraryOfCongress/api.congress.gov
Geography and Mapping
- https://geocode.earth/data/whosonfirst/combined/
- https://www.census.gov/geographies/mapping-files.html
Online Meshes
- traceroute && https://ipgeolocation.io
Online Shops.
- Concording policies
- https://www.adafruit.com/terms_of_service
- https://www.amazon.com/gp/help/customer/display.html?nodeId=508088&ref_=footer_cou
- https://www.westerndigital.com/legal/terms-of-use
- https://www.canakit.com/
- https://www.pishop.us/
- https://vilros.com/
- https://www.microcenter.com/site/customer-support/terms-conditions-site.aspx
- https://www.sparkfun.com/terms
- Nonconcording policies (scraping is illegal)
Legal and Prison Records.
- https://case.law
- International Criminal Court
- https://codelibrary.amlegal.com
- govinfo.gov
- https://fastcase.com/solutions/legal-data-api
- https://github.com/usgpo/api
- https://laws.africa/indigo
- https://senatecommitteehearings.com and https://github.com/leschonander/senatevideoscraper
- https://archive.org/download/state.regulations.bulk
- https://regulations.justia.com
- https://dreamproit/billtitles-py
region | address | hack | number_records_guess | guarded |
---|---|---|---|---|
Alabama | http://www.doc.state.al.us/InmateSearch | sql | 25060 | |
Arizona | https://corrections.az.gov/public-resources/inmate-datasearch | skim-6 | NULL | |
Arkansas | https://apps.ark.org/inmate_info/search.php | sql | 16341 | |
California | https://inmatelocator.cdcr.ca.gov/ | sql | 2000 | |
Colorado | http://www.doc.state.co.us/oss/index.php?ref=home | skim-abc | NULL | |
Connecticut | http://www.ctinmateinfo.state.ct.us/searchop.asp | sql | 14640 | |
Business records.
region | address | hack | number_records_guess | guarded |
---|---|---|---|---|
Delaware | https://icis.corp.delaware.gov/eCorp/EntitySearch/NameSearch.aspx | skim-abc | NULL | recaptcha |
Language phrase books.
- https://etymonline.com
- https://en.wikivoyage.org/wiki/Chinese_phrasebook
- https://refugeephrasebook.de/refugee_phrasebook/
- https://www.paracrawl.eu/
- OPUS open language corpus.
...and similar sources in many languages.