Archive for the ‘rss’ Category

rss aggregator

January 24, 2009

From: jinhjok.blogspot.com/2007/06

  • aggrss – an rss aggregator which uses the lastRSS library to aggregate RSS feeds. Unix
  • AgileRss – a desktop aggregator that can display any RSS, Atom, or XML news feed, so you can keep up to date with all your favourite sources that support RSS. It is Internet syndication and aggregation software that runs on any Java-enabled operating system. Minimum hardware: Intel Pentium II with 128 MB of RAM and 100 MB of free disk space. Requires Sun Microsystems Java VM 5.0 or later
  • aKregator – a news aggregator and feed reader for Linux/UNIX and the KDE desktop environment. It is meant to be fast, powerful, and easy to use
  • CaféRSS – a simple RSS aggregator that’s quite easy to integrate into your pages. It supports RSS 0.9, 0.91, 0.92, and 2.0 feeds
  • Feed on Feeds – server side personal RSS aggregator. Requires a server running modern versions of PHP and MySQL
  • FeedOnFeeds-Redux – a continuation of the popular, but stagnated FeedOnFeeds project. FoFRedux provides a simple, yet effective browser-based news aggregator
  • Feed Reader Professional – reads news, weather, blog, content, industry, and affiliate program feeds, as well as XML and RSS feeds. Simple to install and simple to use. Does not require MySQL. Includes professional output. Requires PHP on Linux. $99 US
  • FeedReactor – experimental feed aggregator in Python
  • FIRST SAY – a syndicated content aggregator / portal that uses XML/RSS syndication feeds from Weblogs and mainstream content providers
  • getrss – RSS news aggregator in Ruby
  • Infinite Penguins RSS viewer – The code consists of a single PHP source file for the frontend, and the Magpie RSS Library (versions 0.1, 0.3, 0.5 have been tested) for the backend
  • Jrssfeedreader – a light and fast Rss & Rdf news reader written in Java
  • K.R.S.S. – a Linux-based application that downloads Rich Site Summary feeds and displays them on your desktop, in HTML
  • Liferea – a simple FeedReader clone for Unix distributions with GNOME2. It is a news aggregator for RSS/RDF feeds which also supports CDF channels, Atom/Echo/PIE feeds and OCS or OPML directories
  • lilina – a simple but powerful news aggregator written in PHP. No database is needed, RSS/ATOM parsing is done by the excellent MagpieRSS library (it is included, no additional installation needed). It features feed auto-discovery and an easy-to-use interface
  • lylina – an RSS/Atom feed aggregator loosely based on its predecessor, lilina. It keeps a similar interface, which makes it an easy upgrade and quite familiar to lilina users, but adds a much faster MySQL backend and a more streamlined cron-based update system. It also adds optional multi-user support, so each user can log in and see a customized feed page filled with the content of their choice
  • Metaplanet – a feed aggregator that shows the news from multiple sources in a unified web page. The main objective is to serve web pages as fast as possible with a minimum load on the server. Requirements: PHP 4.3 with XSLT support enabled, and the PHP command-line interpreter
  • My Newspaper – a personal RSS Aggregator and Reader written in Python with a bit of javascript and uses sqlite as permanent storage for the articles
  • MyHeadlines – personalized syndicated content. As a user you may subscribe to many news/content sources from the MyHeadlines database. For each of your subscriptions this site will gather the latest headlines/stories/content from the source and present a consolidated view of all of your interests in one location. Requires MySQL v3.23.23 or better. PHP 4.0.4 or better
  • NewsGoblin PHP RSS News Aggregator – This script will bring you a complete news site that is constantly updated via rss feeds. This script is completely customizable, has an adsense setting to allow you to make money showing news on your site. Hot news items based on high dollar Google Adwords. No database required
  • PenguinTV – a Python-based RSS reader specifically designed for downloading and viewing podcasts and video blog entries
  • Plagger – A pluggable RSS/Atom feed aggregator written in Perl. Everything is implemented as a small plugin so you can mash them up to build a new application to handle RSS/Atom feeds
  • Raggle – a console RSS aggregator, written in Ruby. Features include customizable keybindings, basic HTML rendering, HTTP proxy support, OPML import/export, themes, support for various versions of RSS, Screen support, browser auto-detection, and more. Raggle has been tested under Linux and OpenBSD, and should work properly under other Unix variants as well
  • renko – a simple enclosure-aware RSS aggregator/downloader
  • Rippy the Aggregator – Written in PHP (version requirement unknown, probably needs PHP4). Doesn’t require any compiled-in optional libraries that don’t ship with PHP. Stores its cached data in flat files, no database needed. Freely licensed and customizable under the GNU GPL version
  • Rnews – a server-side rss aggregator written in php with mysql
  • Rol – a simple application for reading RSS or RDF feeds such as those produced by many news sites or weblogs. It is not intended to do anything more than display the headlines and allow you to choose which to read in your web browser. Rol is written on Debian GNU/Linux, but it should work on any POSIX system
  • RSS-Planet – a custom marker file generator for xplanet which uses RSS feeds from news websites to plot the current headlines on a world map. By default, Yahoo! News and CNN.com are supported, but other feeds that point to articles with easily-discoverable place names (such as the Washington Post) should work as well. Python 2.3 or above, xplanet 1.0 or above required
  • RSS2Exchange – allows you to publish your very own industry specific news pages directly into MS Exchange Public Folders. For Windows and Linux
  • RSSFeedMagic – an rss/rdf web reader
  • RSStatic – takes information from the feeds you choose and generates static html pages for each item in the feed. This quickly turns a 10 page website into a much larger, more robust site complete with relevant content that continually grows. Requires PHP 4.3.10, and Apache 1.3.33
  • Snownews – a text mode RSS/RDF newsreader. It supports RSS 1.0 feeds that comply with the W3C RDF specification and also supports Radio Userland’s RSS 0.91 and 2.0 versions. The program depends on ncurses for the user interface and uses libxml2 for XML parsing. ncurses must be at least version 5.0. It should work with any version of libxml2. Runs on Linux, *BSD, OS X (Darwin), Solaris and probably many more Unices
  • sux0r – a Bayesian filtering RSS aggregator. Users classify news under different categories, and after gathering enough data, the computer will be able to automatically pick out interesting news
  • Syndigator – an RSS reader/aggregator for Linux
  • TALAggregator – a free multi-user, web-based RSS Aggregator. It is written in Python, uses MySQL for storage, ModPython for dynamic web pages, and SimpleTAL for HTML templates
  • Temboz – an RSS aggregator. It is inspired by FeedOnFeeds (web-based personal aggregator), Google News (two column layout) and TiVo (thumbs up and down). Temboz is written in Python, and leverages Mark Pilgrim’s Ultra-liberal feed parser, SQLite 2.x, Cheetah
  • TheYoke – an ultra-simple, polite RSS aggregator designed for use on the UNIX command line
  • Tiny Tiny RSS – Server-side RSS feed aggregator written in PHP and heavily based on XmlHttpRequest and related technologies for user interface and operation
  • TV RSS – Gtk2-Perl Torrent RSS feed reader for linux
  • ZebraFeeds – a web-based news (RSS/ATOM) aggregator, based on zFeeder 1.6. Uses only flat-text-files, and works without SQL database. Requires PHP >= 4.2.0
  • zFeeder – a PHP script used to display RSS content on your webpages. It can be used to display other’s content on your site. (also known as aggregator). It parses RSS (or RDF or backend) files (xml files) and shows content formatted. It supports all versions of RSS (0.9, 0.9x, 1.0 and 2.0) and is template driven
  • Zort – a web-based RSS aggregator, based on MagpieRSS. Requires: A webserver (currently only tested on Apache 1), PHP (currently only tested on PHP 4.3.10), a web browser (currently only tested against Mozilla)

RSS(XML) reading process

October 31, 2007

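// Create the parser and register this object's methods as its callbacks.
// Assumes PHP 4/5, where xml_parser_create() returns a resource
// (PHP 8 returns an XMLParser object, so is_resource() would be false there).
$this->parser = xml_parser_create();
if (is_resource($this->parser)) {
    // Pass $this directly: call-time pass-by-reference (&$this) is an error in PHP 5.4+.
    xml_set_object($this->parser, $this);
    xml_set_element_handler($this->parser, 'feed_start_element', 'feed_end_element');
    xml_set_character_data_handler($this->parser, 'feed_cdata');
    return true;
}
return false;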

xml_parser_create
(PHP 3>= 3.0.6, PHP 4 , PHP 5)
xml_parser_create — Create an XML parser

Description

resource xml_parser_create ( [string encoding])

xml_parser_create()
creates a new XML parser and returns a resource handle referencing it to be used by the other XML functions.

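Note: xml_parser_create() creates the parser, and the related XML functions then work through the resource handle it returns.
The optional argument tells the parser the character encoding of the XML input.
ISO-8859-1, UTF-8, and US-ASCII can be used.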

The optional encoding specifies the character encoding of the XML input to be parsed.
Supported encodings are "ISO-8859-1", which is also the default if no encoding is specified,
"UTF-8" and "US-ASCII".

is_resource
(PHP 4, PHP 5)
is_resource — Finds whether a variable is a resource

Description

bool is_resource ( mixed $var )
Finds whether the given variable is a resource.

In short, is_resource() returns TRUE if the variable passed as var is a resource,
and FALSE otherwise.

Parameters
var
The variable being evaluated.

Return Values
Returns TRUE if var is a resource, FALSE otherwise.

xml_set_object
(PHP 4 , PHP 5)
xml_set_object — Use XML Parser within an object

Description

void xml_set_object ( resource parser, object object)

This function allows parser to be used inside object.
All callback functions set with xml_set_element_handler() etc. are then assumed to be methods of object.

xml_set_element_handler
(PHP 3>= 3.0.6, PHP 4 , PHP 5)
xml_set_element_handler — Set up start and end element handlers

Description

bool xml_set_element_handler
( resource parser, callback start_element_handler, callback end_element_handler)

Sets the element handler functions for the XML parser parser.
start_element_handler and end_element_handler are strings containing the names of functions that must exist when xml_parse() is called for parser.

The function named by start_element_handler must accept three parameters:
start_element_handler ( resource parser, string name, array attribs)

parser
The first parameter, parser, is a reference to the XML parser calling the handler.

name
The second parameter, name, contains the name of the element for which this handler is called.
If case-folding is in effect for this parser, the element name will be in uppercase letters.

attribs
The third parameter, attribs, contains an associative array with the element’s attributes (if any).
The keys of this array are the attribute names, the values are the attribute values.
Attribute names are case-folded on the same criteria as element names.
Attribute values are not case-folded.

The original order of the attributes can be retrieved by walking through attribs the normal way,
using each().
The first key in the array was the first attribute, and so on.

The function named by end_element_handler must accept two parameters:
end_element_handler ( resource parser, string name)

parser
The first parameter, parser, is a reference to the XML parser calling the handler.

name
The second parameter, name, contains the name of the element for which this handler is called.
If case-folding is in effect for this parser, the element name will be in uppercase letters.

If a handler function is set to an empty string, or FALSE, the handler in question is disabled.
TRUE is returned if the handlers are set up, FALSE if parser is not a parser.

xml_set_character_data_handler
(PHP 3>= 3.0.6, PHP 4 , PHP 5)
xml_set_character_data_handler — Set up character data handler

Description

bool xml_set_character_data_handler
( resource parser, callback handler)

Sets the character data handler function for the XML parser parser.
handler is a string containing the name of a function that must exist when xml_parse() is called for parser.

The function named by handler must accept two parameters:
handler ( resource parser, string data)

parser
The first parameter, parser, is a reference to the XML parser calling the handler.

data
The second parameter, data, contains the character data as a string.

If a handler function is set to an empty string, or FALSE, the handler in question is disabled.
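
The three callbacks described above are easier to follow in a small working reader. The post's own code is PHP; the sketch below uses Python instead, because its xml.parsers.expat module follows the same start/end/character-data handler model. The class name FeedReader, the sample feed, and the choice to collect only title and link are assumptions made for illustration. Note that Python's handlers do not receive the parser as a first argument, and that bound methods stand in for what xml_set_object() does in PHP.

```python
import xml.parsers.expat


class FeedReader:
    """Collects the title and link of every <item> in an RSS document."""

    def __init__(self):
        self.items = []      # finished items, one dict per <item>
        self.current = None  # item being built, or None outside <item>
        self.tag = None      # name of the element currently open
        self.parser = xml.parsers.expat.ParserCreate()
        # Counterparts of xml_set_element_handler() and
        # xml_set_character_data_handler() in the PHP snippet.
        self.parser.StartElementHandler = self.feed_start_element
        self.parser.EndElementHandler = self.feed_end_element
        self.parser.CharacterDataHandler = self.feed_cdata

    def feed_start_element(self, name, attribs):
        self.tag = name
        if name == "item":
            self.current = {"title": "", "link": ""}

    def feed_end_element(self, name):
        if name == "item":
            self.items.append(self.current)
            self.current = None
        self.tag = None

    def feed_cdata(self, data):
        # Character data can arrive in several chunks, so append to it.
        if self.current is not None and self.tag in self.current:
            self.current[self.tag] += data

    def parse(self, xml_text):
        self.parser.Parse(xml_text, True)  # True marks the final chunk
        return self.items


# Hypothetical sample feed, not taken from the post.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

items = FeedReader().parse(SAMPLE)
for item in items:
    print(item["title"], item["link"])
```

The channel's own title is ignored because no item is open when it is seen, which is the kind of state tracking the start and end handlers exist for.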

Atom Syndication Format

September 15, 2007

*Licence(from) : http://www.atomenabled.org/developers/syndication/

Elements of <feed>

Required feed elements

id
Identifies the feed using a universally unique and permanent URI. If you have a long-term, renewable lease on your Internet domain name, then you can feel free to use your website’s address.
<id>http://example.com/</id>

title
Contains a human readable title for the feed. Often the same as the title of the associated website. This value should not be blank.
<title>Example, Inc.</title>

updated
Indicates the last time the feed was modified in a significant way.
<updated>2003-12-13T18:30:02Z</updated>

Recommended feed elements

author
Names one author of the feed. A feed may have multiple author elements. A feed must contain at least one author element unless all of the entry elements contain at least one author element.
<author>
<name>John Doe</name>
<email>JohnDoe@example.com</email>
<uri>http://example.com/~johndoe</uri>
</author>

link
Identifies a related Web page. The type of relation is defined by the rel attribute. A feed is limited to one alternate per type and hreflang. A feed should contain a link back to the feed itself.
<link rel="self" href="/feed" />

Optional feed elements

category
Specifies a category that the feed belongs to. A feed may have multiple category elements.
<category term="sports"/>

contributor
Names one contributor to the feed. A feed may have multiple contributor elements.
<contributor>
<name>Jane Doe</name>
</contributor>

generator
Identifies the software used to generate the feed, for debugging and other purposes. Both the uri and version attributes are optional.
<generator uri="/myblog.php" version="1.0">
Example Toolkit
</generator>

icon
Identifies a small image which provides iconic visual identification for the feed. Icons should be square.
<icon>/icon.jpg</icon>

logo
Identifies a larger image which provides visual identification for the feed. Images should be twice as wide as they are tall.
<logo>/logo.jpg</logo>

rights
Conveys information about rights, e.g. copyrights, held in and over the feed.
<rights> © 2005 John Doe </rights>

subtitle
Contains a human-readable description or subtitle for the feed.
<subtitle>all your examples are belong to us</subtitle>
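
As a rough illustration of how the required and recommended feed elements fit together, here is a minimal sketch in Python that builds a feed header with the standard library's ElementTree. The helper names atom() and build_feed() are made up for this example, and the Atom namespace URI comes from the Atom specification (RFC 4287) rather than from the notes above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by RFC 4287; it is not shown in the notes above.
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, updated, author_name, self_href):
    """Build a <feed> with the required elements plus author and self link."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = author_name
    ET.SubElement(feed, atom("link"), {"rel": "self", "href": self_href})
    return feed


# Serialize with Atom as the default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)
feed = build_feed(
    "http://example.com/",
    "Example, Inc.",
    "2003-12-13T18:30:02Z",
    "John Doe",
    "/feed",
)
xml_text = ET.tostring(feed, encoding="unicode")
print(xml_text)
```

Entries would be appended to the same element in the same way, using the entry elements of the format.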


Elements of <entry>

Required Elements of <entry>

id
Identifies the entry using a universally unique and permanent URI. Suggestions on how to make a good id can be found here. Two entries in a feed can have the same value for id if they represent the same entry at different points in time.
<id>http://example.com/blog/1234</id>

title
Contains a human readable title for the entry. This value should not be blank.
<title>Atom-Powered Robots Run Amok</title>

updated
Indicates the last time the entry was modified in a significant way. This value need not change after a typo is fixed, only after a substantial modification. Generally, different entries in a feed will have different updated timestamps.
<updated>2003-12-13T18:30:02-05:00</updated>
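
The updated examples use two timestamp shapes, one ending in Z and one with a numeric offset. The Python sketch below reads both so that they can be compared. The function name parse_atom_time() is an assumption for illustration, and the code only aims to cover the forms shown in these notes, not every timestamp a feed might contain.

```python
from datetime import datetime, timezone


def parse_atom_time(value):
    """Parse a timestamp such as 2003-12-13T18:30:02Z into an aware datetime."""
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so spell UTC out as a numeric offset first.
    if value[-1] in "Zz":
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


feed_time = parse_atom_time("2003-12-13T18:30:02Z")
entry_time = parse_atom_time("2003-12-13T18:30:02-05:00")

# Aware datetimes compare by instant, so the -05:00 value is the later one.
newer = max(feed_time, entry_time)
print(newer.astimezone(timezone.utc).isoformat())
```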

Recommended elements of <entry>

author
Names one author of the entry. An entry may have multiple authors. An entry must contain at least one author element unless there is an author element in the enclosing feed, or there is an author element in the enclosed source element.
<author>
<name>John Doe</name>
</author>

content
Contains or links to the complete content of the entry. Content must be provided if there is no alternate link, and should be provided if there is no summary.
<content>complete story here</content>

link
Identifies a related Web page. The type of relation is defined by the rel attribute. An entry is limited to one alternate per type and hreflang. An entry must contain an alternate link if there is no content element.
<link rel="alternate" href="/blog/1234"/>

summary
Conveys a short summary, abstract, or excerpt of the entry. Summary should be provided if there either is no content provided for the entry, or that content is not inline (i.e., contains a src attribute), or if the content is encoded in base64.
<summary>Some text.</summary>

Optional elements of <entry>

category
Specifies a category that the entry belongs to. An entry may have multiple category elements.
<category term="technology"/>

contributor
Names one contributor to the entry. An entry may have multiple contributor elements.
<contributor>
<name>Jane Doe</name>
</contributor>

published
Contains the time of the initial creation or first availability of the entry.
<published>2003-12-13T09:17:51-08:00</published>

source
If an entry is copied from one feed into another feed, then the source feed’s metadata (all child elements of feed other than the entry elements) should be preserved if the source feed contains any of the child elements author, contributor, rights, or category and those child elements are not present in the source entry.
<source>
<id>http://example.org/</id>
<title>Forty-Two</title>
<updated>2003-12-13T18:30:02Z</updated>
<rights>© 2005 Example, Inc.</rights>
</source>

rights
Conveys information about rights, e.g. copyrights, held in and over the entry.
<rights type="html">
© 2005 John Doe
</rights>
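
The rules for entries (three required elements, an alternate link when there is no content, and an author somewhere in the entry, its source, or the enclosing feed) can be checked mechanically. The following is a minimal Python sketch of such a check. The function name entry_problems(), its feed_has_author flag, and the wording of the messages are assumptions for illustration, and it covers only the rules listed in these notes.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace from RFC 4287


def entry_problems(entry, feed_has_author=False):
    """Return a list of rule violations for one <entry> element."""
    problems = []
    for tag in ("id", "title", "updated"):
        if entry.find(ATOM + tag) is None:
            problems.append(f"missing required <{tag}>")

    # A link without a rel attribute counts as rel="alternate".
    has_alternate = any(
        link.get("rel", "alternate") == "alternate"
        for link in entry.findall(ATOM + "link")
    )
    if entry.find(ATOM + "content") is None and not has_alternate:
        problems.append("needs <content> or an alternate <link>")

    has_author = (
        entry.find(ATOM + "author") is not None
        or entry.find(ATOM + "source/" + ATOM + "author") is not None
        or feed_has_author
    )
    if not has_author:
        problems.append("no <author> in entry, source, or feed")
    return problems


good = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<id>http://example.com/blog/1234</id>"
    "<title>Atom-Powered Robots Run Amok</title>"
    "<updated>2003-12-13T18:30:02-05:00</updated>"
    "<author><name>John Doe</name></author>"
    '<link rel="alternate" href="/blog/1234"/>'
    "</entry>"
)
bad = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<title>No id here</title>"
    "<updated>2003-12-13T18:30:02Z</updated>"
    "</entry>"
)

print(entry_problems(good))
print(entry_problems(bad))
```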


Common Constructs

Category

<category> has one required attribute, term, and two optional attributes, scheme and label.
term identifies the category
scheme identifies the categorization scheme via a URI.
label provides a human-readable label for display

Content

<content> either contains, or links to, the complete content of the entry.

Link

<link> is patterned after html’s link element. It has one required attribute, href, and five optional attributes: rel, type, hreflang, title, and length.

href is the URI of the referenced resource (typically a Web page)

rel contains a single link relationship type. It can be a full URI, or one of the following predefined values (default=alternate):

  • alternate: an alternate representation of the entry or feed, for example a permalink to the html version of the entry, or the front page of the weblog.
  • enclosure: a related resource which is potentially large in size and might require special handling, for example an audio or video recording.
  • related: a document related to the entry or feed.
  • self: the feed itself.
  • via: the source of the information provided in the entry.

type indicates the media type of the resource.

hreflang indicates the language of the referenced resource.
title conveys human-readable information about the link, typically for display purposes.
length gives the length of the resource, in bytes.
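
Because rel defaults to alternate, code that picks links out of an entry has to treat a missing rel attribute as alternate. The Python sketch below does that. The function name hrefs() and the sample entry, including its file names and length value, are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace from RFC 4287


def hrefs(parent, rel="alternate"):
    """Return the href of every <link> child with the given relation."""
    return [
        link.get("href")
        for link in parent.findall(ATOM + "link")
        # A link with no rel attribute is an alternate link.
        if link.get("rel", "alternate") == rel
    ]


# Hypothetical entry with one link of each kind used below.
entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<link href="/blog/1234"/>'
    '<link rel="enclosure" type="audio/mpeg" length="1337" href="/audio/1234.mp3"/>'
    '<link rel="via" href="http://example.org/"/>'
    "</entry>"
)

print(hrefs(entry))               # alternate links
print(hrefs(entry, "enclosure"))  # potentially large attachments
```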

Person

<author> and <contributor> describe a person, corporation, or similar entity.
Each has one required element, name, and two optional elements: uri and email.
<name> conveys a human-readable name for the person.
<uri> contains a home page for the person.
<email> contains an email address for the person.

Text

<title>, <summary>, <content>, and <rights> contain human-readable text, usually in small quantities.
The type attribute determines how this information is encoded (default="text").

If type="text", then this element contains plain text with no entity escaped html.
<title type="text">AT&amp;T bought by SBC!</title>

If type="html", then this element contains entity escaped html.
<title type="html">
AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
</title>

If type="xhtml", then this element contains inline xhtml, wrapped in a div element.
<title type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
AT&amp;T bought <b>by SBC</b>!
</div>
</title>
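
A reader has to treat the three type values differently to get the same visible text out of each. The sketch below shows one way to do this in Python using only the standard library. The function name plain_text() and the helper class are assumptions for illustration, and reducing every construct to plain text is a simplification, since a real reader may prefer to keep the markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


class _TextOnly(HTMLParser):
    """Keeps the text of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(elem):
    """Return the display text of a text construct such as <title>."""
    kind = elem.get("type", "text")  # "text" is the default
    if kind == "text":
        return (elem.text or "").strip()
    if kind == "html":
        # The XML parser has undone one level of escaping already;
        # what remains is HTML markup held in a string.
        stripper = _TextOnly()
        stripper.feed(elem.text or "")
        stripper.close()
        return "".join(stripper.chunks).strip()
    if kind == "xhtml":
        div = elem.find(XHTML_DIV)
        return "".join(div.itertext()).strip() if div is not None else ""
    raise ValueError(f"unsupported type: {kind}")


as_text = ET.fromstring('<title type="text">AT&amp;T bought by SBC!</title>')
as_html = ET.fromstring(
    '<title type="html">AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!</title>'
)
as_xhtml = ET.fromstring(
    '<title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">'
    "AT&amp;T bought <b>by SBC</b>!</div></title>"
)

for title in (as_text, as_html, as_xhtml):
    print(plain_text(title))
```

All three samples carry the same headline, so the escaping rules are the only thing that differs between them.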
