Verified:

EDge Game profile

Member
81

Jun 20th 2021, 4:34:33

Good Luck EDge

EDge Game profile

Member
81

Apr 26th 2021, 18:25:09

Good luck EDge

EDge Game profile

Member
81

Apr 19th 2021, 16:48:10

This is still around?

EDge Game profile

Member
81

Aug 3rd 2020, 21:51:59

EDge 4 Prez

EDge Game profile

Member
81

Nov 1st 2019, 20:41:13

Any old school players still around?

EDge Game profile

Member
81

May 1st 2014, 21:15:26

Thanks for the ideas. I'm looking to build a script that can be run via cron on a regular basis to grab new data. I tried using Fiddler to inspect the requests while manually navigating the site, without making much headway. Although Tampermonkey won't work for this specific task, it looks pretty useful for other things, thanks!
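
For the "grabbing new data" part of a cron-driven script, here is a rough sketch of how each run could skip files it has already fetched. It assumes the list of file URLs has already been obtained somehow (that is the part I'm still stuck on), and the names `load_seen`, `select_new`, `save_seen` and the JSON state file are all made up for illustration.

```python
import json
from pathlib import Path


def load_seen(path):
    """Load the set of URLs recorded by earlier runs (empty on first run)."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()))


def select_new(urls, seen):
    """Return URLs not yet seen, keeping their order and dropping repeats."""
    fresh = []
    for url in urls:
        if url not in seen and url not in fresh:
            fresh.append(url)
    return fresh


def save_seen(path, seen):
    """Persist the seen set so the next cron run can pick up where this one stopped."""
    Path(path).write_text(json.dumps(sorted(seen)))
```

Each cron run would call `load_seen`, download whatever `select_new` returns, add those URLs to the set, and finish with `save_seen`.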

EDge Game profile

Member
81

May 1st 2014, 18:15:06

I'm trying to build something that will crawl and scrape the files from https://marketplace.spp.org/...guest/binding-constraints

Haven't had much luck with the traditional methods, so I was hoping someone could help me out. Preferably in Python or C#, but a simple way to grab the URLs for each file after you go through the folders would help too. It looks like it's using a file browser widget, which makes the easy way of grabbing links a little tough.

Thanks!

EDge Game profile

Member
81

Aug 10th 2013, 15:12:43

Good luck EDge

EDge Game profile

Member
81

Jul 1st 2013, 19:11:25

No overland flood insurance available for residential households

EDge Game profile

Member
81

Jun 20th 2013, 18:18:08

Good Luck EDge!

EDge Game profile

Member
81

Jun 16th 2013, 20:05:57

Good Luck EDge!

EDge Game profile

Member
81

Jun 14th 2013, 20:51:38

Good Luck EDge!

EDge Game profile

Member
81

Jun 11th 2013, 0:26:53

I was planning on using the shared unlimited hosts, or the DreamHost VPS, which is unlimited disk/bandwidth, to begin with, just so I can gauge what kind of numbers I'll be looking at. After everything is up and running I should have enough usage data to look for other options that will provide better performance but still allow for enough bandwidth/disk space.

In terms of making the system distributed, the current setup has a different console app/windows process for each logically different scrape.

I have worked with PHP, Java, C/C++, C#, and VB in the past, but was planning to go with Python due to some libraries available for it which I plan to use for some of the modelling. However, if a specific language will be more efficient than another, I'm okay having the scrapes in one language and the models in another.

There will be maybe a dozen pages that are of the high-frequency type, but if it does become a significant issue, I can just run something on one of my boxes at home to handle those. I was just hoping to have everything on a hosted server for security and accessibility.

Here's an example of something that I scrape:
www dot spp dot org/XML/LIP-Pricing dot xml

This is not one of the frequently updated sources though; it is only revised every five minutes. And yes, I did replace . with dot lol. Just don't need this thread showing up in a Google search result if a competitor searches the same URL.

EDge Game profile

Member
81

Jun 10th 2013, 20:53:49

Thanks, think I'll just sign up for a DreamHost account.

Anyone have an idea for how I can get a scrape working that checks a site every few seconds for changes? Right now I have Windows system processes take care of that, but I need an alternative that will work on these Linux servers. Having a cronjob that executes every few seconds doesn't seem optimal to me.
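
One alternative to a cronjob firing every few seconds would be a single long-running script that sleeps between checks. Below is a minimal sketch of that idea, not a finished scraper: it hashes each response and only calls a handler when the hash changes. The function names, the 5 second default and the `fetcher`/`max_checks` parameters are my own placeholders (the last two are only there so it can be tested without hitting a real site).

```python
import hashlib
import time
import urllib.request


def fetch(url, timeout=10):
    """Return the raw bytes served at url."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def poll(url, on_change, interval=5.0, fetcher=fetch, max_checks=None):
    """Fetch url repeatedly and call on_change(body) when the content changes.

    The first successful fetch always counts as a change. With
    max_checks=None the loop runs until the process is stopped.
    """
    last_digest = None
    checks = 0
    while max_checks is None or checks < max_checks:
        started = time.monotonic()
        try:
            body = fetcher(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"fetch failed: {exc}")
        else:
            digest = hashlib.sha256(body).hexdigest()
            if digest != last_digest:
                last_digest = digest
                on_change(body)
        checks += 1
        # Sleep only for what is left of the interval so slow fetches add no drift.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0 and (max_checks is None or checks < max_checks):
            time.sleep(remaining)
```

Something would still need to start the script and restart it if it dies, so cron or a process supervisor could be kept for that job only.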

EDge Game profile

Member
81

Jun 10th 2013, 20:42:46

Originally posted by NukEvil:
Originally posted by Jayr:
Im lost...this means nothing to me



Good Luck EDge!


fixed

EDge Game profile

Member
81

Jun 10th 2013, 16:58:13

So looking over one of the tables for 2012, there are about 270M rows of data using 17 GB of space in MS SQL Server. Based on that, I am starting to think I might be able to get away without an "unlimited" plan.

Xin, why is it that you guys are using Rackspace for production instead of DreamHost? Better performance? Reliability? Speed? Or do they support products that DreamHost doesn't?
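
Back-of-the-envelope version of that estimate, as a rough sketch only. It assumes the per-row size from that one 2012 table applies everywhere, takes 1 GB as 10^9 bytes, and uses the "half a million rows a day" figure from my first post. Real numbers will differ with indexes and a different database engine.

```python
ROWS_2012 = 270_000_000      # rows in the sample 2012 table
BYTES_2012 = 17 * 10**9      # 17 GB, taking 1 GB as 10^9 bytes (assumption)
ROWS_PER_DAY = 500_000       # daily inserts, from the first post in this thread

bytes_per_row = BYTES_2012 / ROWS_2012
daily_mb = bytes_per_row * ROWS_PER_DAY / 10**6
yearly_gb = bytes_per_row * ROWS_PER_DAY * 365 / 10**9

print(f"{bytes_per_row:.0f} bytes per row")
print(f"{daily_mb:.1f} MB per day, {yearly_gb:.1f} GB per year")
```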

EDge Game profile

Member
81

Jun 9th 2013, 22:46:08

Speed's pretty good with DreamHost, azz kikr? How is their support team? Do you use them for apps, or plain HTML sites?

EDge Game profile

Member
81

Jun 9th 2013, 15:28:23

I'm looking at DreamHost and HostGator right now: either the shared hosting on either provider, since both offer unlimited disk space and bandwidth, or the VPS on DreamHost, since it allows unlimited bandwidth and disk space.

Anyone have any experience with either provider, or see any issues with their plans vs what I'm planning to do?

The current servers both have 24-core AMD Opteron processors with 16 GB of RAM and RAID hard drives. I know performance-wise it will be a significant step down with these hosting plans, but I won't be doing nearly as much computing or querying off the web host.

EDge Game profile

Member
81

Jun 9th 2013, 14:44:55

The primary key will be the datetime/int combo.

EDge Game profile

Member
81

Jun 9th 2013, 14:44:26

Data will only be inserted. The majority of the tables will have 3 columns: datetime, int, decimal.
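
To make the layout concrete, here is a small sketch of one such table with the datetime/int pair as a composite primary key. It uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; the table and column names (`readings`, `observed_at`, `series_id`, `value`) are invented for the example, and the MySQL version would use its own column types and duplicate-handling syntax.

```python
import sqlite3
from datetime import datetime


def create_table(conn):
    """Create an insert-only table keyed on (datetime, int)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            observed_at TEXT    NOT NULL,
            series_id   INTEGER NOT NULL,
            value       NUMERIC NOT NULL,
            PRIMARY KEY (observed_at, series_id)
        )
        """
    )


def insert_rows(conn, rows):
    """Insert (datetime, int, decimal string) rows; return how many were new.

    Rows whose (datetime, int) key already exists are silently skipped,
    so re-running a scrape does not create duplicates.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO readings (observed_at, series_id, value) "
        "VALUES (?, ?, ?)",
        [(ts.isoformat(sep=" "), sid, str(val)) for ts, sid, val in rows],
    )
    conn.commit()
    return conn.total_changes - before
```

Since the data is insert-only, ignoring key collisions keeps repeated scrapes of the same interval harmless.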

EDge Game profile

Member
81

Jun 9th 2013, 9:11:55

Also, not sure how the 5 second scrapes will work with crontab, or if there's a better approach to take for those.

EDge Game profile

Member
81

Jun 9th 2013, 9:10:29

The current working setup has a separate SQL Server box and an app server. The high-frequency data is scraped using C# Windows processes, and the less frequent data is scraped using C# console apps launched with Windows Task Scheduler.

I would like to migrate this all to a remote host though, and was thinking I could use MySQL as the database, with Python scripts for the scrapes executed by crontab. I am not all that familiar with any of this technology, but I figure basic data scrapes shouldn't require any really advanced programming in Python. The only thing I will need to become familiar with is tuning the MySQL database, since it will have so much data.

EDge Game profile

Member
81

Jun 9th 2013, 9:03:02

Good Luck EDge!

EDge Game profile

Member
81

Jun 8th 2013, 23:05:47

So I'm looking at starting a little project. It is very data intensive, as it involves scraping a lot of websites, downloading a bunch of CSVs off SSL sites requiring client-side certs, scraping and parsing various KML files, and storing all of this data in a database.

There will be some data that is inserted just once, but the majority of the data will be scraped frequently. Some sources change as often as every 5 seconds; the majority change every 5 minutes.

I then have a few reports and models that will run based on this data.

My questions are:

1) Any recommendations for web hosts that I can use for this?
2) Any recommendations for which framework/language/database would be ideal for this?
3) Anything you think I should know heading into this?

Keep in mind the volume of data isn't small. The initial load will include in excess of half a billion rows of data, with over half a million rows added daily.

Thanks!

EDge Game profile

Member
81

Jun 7th 2013, 3:56:06

Good Luck EDge!

EDge Game profile

Member
81

May 23rd 2013, 5:07:04

Sodawater's a chill dude. Sucks this happened to him.

Good Luck EDge

EDge Game profile

Member
81

May 13th 2013, 14:06:24

Good Luck EDge!

EDge Game profile

Member
81

May 6th 2013, 15:29:41

Good Luck EDge!

EDge Game profile

Member
81

Apr 29th 2013, 3:54:32

Good Luck EDge!

EDge Game profile

Member
81

Apr 25th 2013, 18:30:44

Good Luck EDge!

EDge Game profile

Member
81

Apr 19th 2013, 0:36:01

I remember you!

EDge Game profile

Member
81

Apr 16th 2013, 4:37:03

pank... hahaha nice. I bet the majority of the people here don't know the origin of the word.

EDge Game profile

Member
81

Apr 14th 2013, 22:49:01

Good Luck EDge

EDge Game profile

Member
81

Apr 13th 2013, 2:38:13

Good Luck EDge

EDge Game profile

Member
81

Jan 27th 2013, 16:40:35

Suits
White Collar
Californication
Sons of Anarchy
Entourage
House of Lies
Game of Thrones
Walking Dead
The Newsroom
Falling Skies

EDge Game profile

Member
81

Dec 17th 2012, 18:45:26

Hate Skylar so much!

EDge Game profile

Member
81

Dec 8th 2012, 8:31:57

.

EDge Game profile

Member
81

Nov 19th 2012, 17:10:38

Good Luck EDge!

EDge Game profile

Member
81

Nov 9th 2012, 15:11:31

Good Luck EDge

EDge Game profile

Member
81

Oct 21st 2012, 15:01:48

Good Luck EDge

EDge Game profile

Member
81

Oct 13th 2012, 15:04:44

Good Luck EDge

EDge Game profile

Member
81

Oct 6th 2012, 7:53:03

Good Luck EDge

EDge Game profile

Member
81

Sep 25th 2012, 5:48:48

Good Luck EDge

EDge Game profile

Member
81

Sep 17th 2012, 22:41:12

Good Luck EDge

EDge Game profile

Member
81

Sep 11th 2012, 20:51:36

Good Luck EDge

EDge Game profile

Member
81

Sep 4th 2012, 19:04:42

Good Luck EDge

EDge Game profile

Member
81

Sep 1st 2012, 14:45:35

Good Luck EDge

EDge Game profile

Member
81

Aug 23rd 2012, 3:37:38

Good Luck EDge

EDge Game profile

Member
81

Aug 19th 2012, 3:45:12

Good Luck EDge

EDge Game profile

Member
81

Jul 13th 2012, 14:36:17

Is this a product you're developing? Are there any assets that you can secure the loan against? Which part of Canada are you from?