avatar
orcishgamer: Sure, you can create a hash of each page, or if you want to only check for changes to content use a screen scraper and hash the result (which in practice very well may amount to the same thing, depending on how good the screen scraper is).

BTW, doing all of that in C++ is going to be a real pain as a beginner; even opening a socket is hard in C++. You might consider a language that's a little easier for that kind of thing if it's an option (Java is a good choice, and C# should be a good choice as well).
avatar
jamyskis: I don't see why it should be that much of a pain in C++. You can use libcurl to fetch the page, create a base hash as you suggest, download the new page using libcurl, hash that and compare the two hashes without ever having to parse the markup.

Simples.
The hashes will tell you that there are differences, but you need to parse the markup and compare like for like to tell what the differences are.

Using even a well-designed and well-documented library like libcurl can be intimidating for a beginner, though it's definitely the best way to get the job done.
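
A minimal sketch of the hash-and-compare idea discussed above, assuming the page has already been downloaded into a `std::string` (for example with libcurl). The thread does not name a hash function, so FNV-1a is used here purely as a simple, deterministic stand-in; `fnv1a64` and `page_changed` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash of a byte string. Unlike std::hash, the result is the
// same on every run and platform, so it can be saved between checks.
// It is not cryptographic: fine for change detection, not for security.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Returns true when the freshly downloaded page differs from the page whose
// hash was stored earlier, and replaces the stored hash with the new one.
bool page_changed(std::uint64_t& stored_hash, const std::string& new_page) {
    const std::uint64_t new_hash = fnv1a64(new_page);
    const bool changed = (new_hash != stored_hash);
    stored_hash = new_hash;
    return changed;
}
```

Typical use would be to store `fnv1a64(first_download)` once and then call `page_changed(stored, latest_download)` on each later check. As noted above, this only says that something changed, not what changed.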
avatar
cjrgreen: The hashes will tell you that there are differences, but you need to parse the markup and compare like for like to tell what the differences are.

Using even a well-designed and well-documented library like libcurl can be intimidating for a beginner, though it's definitely the best way to get the job done.
He didn't say telling what the differences are was part of the requirements, just spotting that there were differences. Now you could be right, that's what he/she intends, but that's not in the "spec" currently so to speak:)

jamyskis is right that there are plenty of C++ libraries to help out, but it's still hideously complex compared to some other languages' libraries, as you say.
avatar
Nroug7: Hiya, I'm self-learning C++ at the moment and I was wondering, is it possible to create a program that can scan over a website's details (text, paragraphs, sentences, etc.) and look for updates to that website?
Almost forgot...

Once you feel you've got a decent mastery of the basics (read an introductory book, done some exercises, etc.), I highly recommend that you read these books by Scott Meyers:

Effective C++
More Effective C++
Effective STL

They'll help you improve your programming form greatly.

Also, if you get stuck on something, this web site is a great source of information (it has so many user questions that it is almost an encyclopedia assuming that you formulate your question correctly):

www.stackoverflow.com

And finally, the library reference on this website is great:

http://www.cplusplus.com/

Good luck.

avatar
cjrgreen: If you must do something like that in C++, and you are only concerned with text that is directly visible in the page, you can use libcurl and htmlcxx to do the heavy lifting. You call functions in libcurl to retrieve the contents of the page, and functions in htmlcxx to parse the page into its elements. Both libraries are well documented, though all in all I would suggest that this is not exactly a beginner's project.

CURL and libcurl: http://www.haxx.se
htmlcxx: http://htmlcxx.sourceforge.net

But if you are trying to learn Visual C++, you have to do things entirely differently. I will leave them for another post, but I will gently suggest that Visual C++ is not actually C++ and will not teach you how to write good, standard, or portable C++.
Not sure I'll ever end up using them for C++, but thanks for the reference.
Post edited February 09, 2012 by Magnitus
avatar
orcishgamer: He didn't say telling what the differences are was part of the requirements, just spotting that there were differences. Now you could be right, that's what he/she intends, but that's not in the "spec" currently so to speak:)
At the risk of splitting hairs, the OP may not be aware of the importance of stating requirements exactly, and he wrote "scan over a website's details (text, paragraphs, sentences, etc.) and look for updates to that website".

It suggests that he intends to look for differences by examining the textual content of the website. This means you also have to parse the content into elements and compare like for like. It also means that he will not be fooled by non-text changes like HTML whitespace or comments.
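
A rough sketch of what comparing only the textual content could look like, so that whitespace or comment changes do not count as updates. This is a deliberately naive tag stripper written for illustration, not htmlcxx or any real HTML parser; `visible_text` is a hypothetical name, and the function ignores things a proper parser would handle (script and style bodies, character entities, `>` inside attribute values).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Naive extraction of visible text: drops <!-- comments --> and <tags>,
// treats both as word separators, and collapses every run of whitespace
// into a single space. Leading and trailing whitespace is dropped.
std::string visible_text(const std::string& html) {
    std::string out;
    std::size_t i = 0;
    bool pending_space = false;  // a separator was seen since the last character kept
    while (i < html.size()) {
        if (html.compare(i, 4, "<!--") == 0) {
            // Skip the whole comment, or the rest of the input if it is unterminated.
            const std::size_t end = html.find("-->", i + 4);
            i = (end == std::string::npos) ? html.size() : end + 3;
            pending_space = true;
        } else if (html[i] == '<') {
            // Skip the tag up to and including its closing '>'.
            const std::size_t end = html.find('>', i + 1);
            i = (end == std::string::npos) ? html.size() : end + 1;
            pending_space = true;
        } else if (std::isspace(static_cast<unsigned char>(html[i]))) {
            pending_space = true;
            ++i;
        } else {
            if (pending_space && !out.empty()) out += ' ';
            pending_space = false;
            out += html[i];
            ++i;
        }
    }
    return out;
}
```

Two downloads can then be compared with `visible_text(old_page) == visible_text(new_page)`, or the extracted text can be hashed instead of the raw markup.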
avatar
cjrgreen: At the risk of splitting hairs, the OP may not be aware of the importance of stating requirements exactly, and he wrote "scan over a website's details (text, paragraphs, sentences, etc.) and look for updates to that website".

It suggests that he intends to look for differences by examining the textual content of the website. This means you also have to parse the content into elements and compare like for like. It also means that he will not be fooled by non-text changes like HTML whitespace or comments.
Maybe. I assumed different motives, like monitoring for updates to certain sections of a website (e.g. a GOG download page when new goodies are added) or something like that. Don't get me wrong, you could totally be right too; I just had to go off what the OP said, and I find it better not to assume he needs functionality when he actually doesn't (because we all know how that turns out in development, don't we :) ).
Basically, what I'm trying to do is make a program that could, say, scan a website like www.skyrimnexus.com and then check for updates using the numerical value of an update. After that, I want it to go on to open a download prompt.

Anyway, the website I used as an example is not the website it will be used for.

The purpose of the program is to make it easier for people to manage their downloads and updates.

(And yes, I'm well aware of the SDLC.)
bump.
Why are you bumping this? You've had some quite comprehensive answers.

If you're doing this for a particular purpose, C++ is not the correct choice of language. It can be done far more easily in something like C#.

If you're doing this to learn coding in C++ it is a bad choice of exercise, and you should pick another.

If it's a mixture of the two, then you should do each part separately (i.e. learn C++ with a more appropriate exercise, and write this app in C#, Python, Ruby, etc.)
avatar
Nroug7: Basically, what I'm trying to do is make a program that could, say, scan a website like www.skyrimnexus.com and then check for updates using the numerical value of an update. After that, I want it to go on to open a download prompt.
Sure, you can do that 'easily' with any language.

As for the application design, you should let users add references to particular files (links; for example, "http://website.com/mod/871/" would mean that the user is checking for updates to the mod with id 871).

Then create an event that checks for 'updates' by fetching the actual HTML page for that mod (the same page you would get by going to that address in a browser).

Then just 'open' the HTML file and scan its contents for a particular tag id (you'd want the one that shows the version), and compare that value to the version you keep next to the reference.

Finally, just create an event that downloads the file if the version is newer, saves it in a particular folder, and updates the reference's version.

There's your application design, do the code yourself. :p
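
For anyone who wants a starting point, the version-check step in the design above could be sketched roughly like this. Everything here is an assumption for illustration: the element id `file-version`, the dotted numeric version format, and the function names are all hypothetical, and the string search stands in for a real HTML parser. Fetching the page and downloading the file (e.g. with libcurl) are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the text directly inside the first element that carries
// id="<id>", or an empty string when no such element is found.
std::string text_of_element_with_id(const std::string& html, const std::string& id) {
    const std::string marker = "id=\"" + id + "\"";
    const std::size_t pos = html.find(marker);
    if (pos == std::string::npos) return "";
    std::size_t start = html.find('>', pos + marker.size());
    if (start == std::string::npos) return "";
    ++start;  // first character after the opening tag
    std::size_t end = html.find('<', start);
    if (end == std::string::npos) end = html.size();
    return html.substr(start, end - start);
}

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Splits a version such as "v1.10.2" into {1, 10, 2}. Leading non-digits are
// skipped, and parsing stops at the first character that is neither a digit
// nor a '.' separator following a number.
std::vector<unsigned long> parse_version(const std::string& text) {
    std::vector<unsigned long> parts;
    std::size_t i = 0;
    while (i < text.size() && !is_digit(text[i])) ++i;
    while (i < text.size() && is_digit(text[i])) {
        unsigned long value = 0;
        while (i < text.size() && is_digit(text[i])) {
            value = value * 10 + static_cast<unsigned long>(text[i] - '0');
            ++i;
        }
        parts.push_back(value);
        if (i < text.size() && text[i] == '.') ++i;  // step over the separator
    }
    return parts;
}

// True when the remote version is strictly newer than the stored one.
// Components are compared numerically, so "1.10" is newer than "1.9";
// missing components count as zero, so "1.2" equals "1.2.0".
bool is_newer(const std::string& remote, const std::string& local) {
    std::vector<unsigned long> r = parse_version(remote);
    std::vector<unsigned long> l = parse_version(local);
    const std::size_t n = std::max(r.size(), l.size());
    r.resize(n, 0);
    l.resize(n, 0);
    return r > l;  // lexicographic comparison of the numeric components
}
```

The update event would then call something like `is_newer(text_of_element_with_id(page, "file-version"), stored_version)` for each reference and only start the download when it returns true.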