Yahoo Pipes

From: Mouse 19 Sep 2009 11:17
To: ALL 1 of 17
People of Teh.

I have been having a play with Yahoo Pipes. It's actually quite good, I think. It basically takes RSS feeds, website feeds and the like, aggregates them, lets you mould them, and spits the result out as one big feed.

Now, those who I have pestered on MSN et al in the past will know I am to coding what NWA were to Police Christmas Parties.

So I have the following conundrum. I'm taking feeds from local venues' Myspace events listings and getting them all to combine into one listing. So far so good, but I need to order it by event date, not publishing date, for it to make sense.

So I have in the Description field (field? I need to learn the proper terms too...) this:

code:
<div class="vevent"><p class="description">Wednesday, Sep 16 2009<br /> <span class="location">Hot Springs, Arkansas</span><br /> 9:00 PM </p> </div>

 
If I order it by Description class (class?) then it of course groups all events on a Wednesday together and all events on a Saturday together etc. So I probably need to get it to delete the day names so that it will then order by description and put things in actual date order, but I don't know how :(
 
Someone fancy being patient and helping me?
 
 
EDITED: 19 Sep 2009 11:20 by MOUSE
From: koswix 19 Sep 2009 11:33
To: Mouse 2 of 17
quote:
I am to coding what NWA were to Police Christmas Parties.


(giggle)
From: Peter (BOUGHTONP) 19 Sep 2009 12:05
To: Mouse 3 of 17
Are all the RSS feeds in the same format? (i.e. do they all have that format of description?)

Can you provide sample feeds?
From: Peter (BOUGHTONP) 19 Sep 2009 12:49
To: Mouse 5 of 17
Unless I'm missing something, those all have a pubDate, which holds the correct date?

In theory, you just get Yahoo Pipes to format that as yyyy-mm-dd HH:mm:ss and you're fine.

Except this interface really is incredibly shit, and I can't figure out how to tell it to do that. :/

Option two doesn't need date formatting, just extract the title attribute from the .dtstart tag in the description and you've got a suitably formatted timestamp.
Again, shit interface; can't figure out how to get that as a distinct field.

So, third option, apply a regex transform to the description and put the timestamp at the front.

To do that, add an Operators>Regex and replace item.description as follows:
replace =
code:
(.*?)(<abbr.*?class="dtstart".*?title="([^"]+)"[^>]*>)

with =
code:
<!-- $3 --> $1$2


That's not the most efficient way, and it's a little fragile if the original HTML changes too much, but it works.
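For anyone wanting to try that replace outside Pipes, here is a rough Python sketch of the same idea. The pattern and replacement are the ones above (Python writes `$3` as `\3`); the sample description and the timestamp format in its title attribute are made up for illustration, and the function name is hypothetical.

```python
import re

# Pattern and replacement from the post above; only the $n -> \n syntax differs.
PATTERN = re.compile(r'(.*?)(<abbr.*?class="dtstart".*?title="([^"]+)"[^>]*>)')
REPLACEMENT = r'<!-- \3 --> \1\2'


def prefix_timestamp(description: str) -> str:
    """Copy the dtstart title value to the front of the description as a comment."""
    # count=1 so only the first dtstart tag is used; descriptions without
    # a matching tag come back unchanged.
    return PATTERN.sub(REPLACEMENT, description, count=1)


# Hypothetical sample: the real feed's markup and timestamp format may differ.
sample = '<p>Gig</p><abbr class="dtstart" title="2009-09-16T21:00:00">9:00 PM</abbr>'
prefixed = prefix_timestamp(sample)
print(prefixed)
```

If the title values are in a year-first format like the made-up one here, sorting on the modified description should then put items in event order.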

Gah! No it doesn't - looking at the RSS feeds, they all seem to have the .dtstart information, but at some point it seems to be disappearing. :/
EDITED: 19 Sep 2009 12:50 by BOUGHTONP
From: Mouse 19 Sep 2009 12:59
To: Peter (BOUGHTONP) 6 of 17

Right, the pubDate is useless. That's the date/time that the event was added. I need to order it by the date which is in the Description bit. Which is the date of the event. But oh yeah, it's in that dtstart thing..

 

I'll have a proper check when I'm not on my phone, I think the second option might be a goer if I can furrow my brow deep enough to understand it. CHEERS PETER.

From: Peter (BOUGHTONP) 19 Sep 2009 13:06
To: Mouse 7 of 17
Ok, think I got it.

You need to create a Loop item, and then drag a new Date Formatter into that, and use %F %T to get an appropriate format.

Like this...
Attachments:
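Outside Pipes, the same Loop plus Date Formatter step might look roughly like this in Python. %F %T is shorthand for %Y-%m-%d %H:%M:%S, spelled out here because not every platform's strftime accepts the short forms. The item titles, dates and the "sortKey" field name are invented for the example.

```python
from email.utils import parsedate_to_datetime


def sortable_pubdate(pub_date: str) -> str:
    """Turn an RSS-style pubDate into a 'yyyy-mm-dd HH:MM:SS' string."""
    parsed = parsedate_to_datetime(pub_date)
    # Equivalent of the %F %T format, written out in full.
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical feed items, deliberately out of order.
items = [
    {"title": "Late gig", "pubDate": "Sat, 19 Sep 2009 21:00:00 GMT"},
    {"title": "Early gig", "pubDate": "Wed, 16 Sep 2009 21:00:00 GMT"},
]

# The "loop": add a formatted key to every item, then sort on it.
for item in items:
    item["sortKey"] = sortable_pubdate(item["pubDate"])
items.sort(key=lambda item: item["sortKey"])

print([item["title"] for item in items])
```

Because the formatted string runs from year down to seconds, a plain text sort on it is also a chronological sort.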
From: Peter (BOUGHTONP) 19 Sep 2009 13:13
To: Mouse 8 of 17
I'm confused. :S

Excluding broken timezone stuff, the pubDate seems to be matching the values in the description for all the ones I'm looking at?
Attachments:
From: Mouse 19 Sep 2009 13:14
To: Peter (BOUGHTONP) 9 of 17
Ooo, sort of.. but the pubDate is of no use. I need it sorting by the info in the <p class="description"> tag or the dtstart one. This will put the events in chronological order of when things are happening, not the order they were published in, which I'm not interested in.

From: Mouse 19 Sep 2009 13:24
To: Peter (BOUGHTONP) 10 of 17
Sorry, let me look at this when I'm not on my phone. Tomorrow aft maybe. Pub crawl time now.
From: Peter (BOUGHTONP) 19 Sep 2009 13:30
To: Mouse 11 of 17
If you're certain you need the Description bit, you can extract and re-format it like this: (attached).

With the regex replace used being this:
code:
^<div .*?class="vevent"[^>]*>\s*<p .*?class="description"[^>]*>\s*\S+ (\S+) (\S+) (\S+)<br />.*$


Again, fragile if things change, but otherwise it works.
Attachments:
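As a rough check of that regex outside Pipes, here is a Python sketch run against the sample description from the first post. The replacement string used in the pipe isn't shown in the thread, so the re-formatting step below (month name to a yyyy-mm-dd string) is an assumption about what it was for, and the function name is made up.

```python
import re
from datetime import datetime
from typing import Optional

# The regex from the post above, split over two lines for readability.
PATTERN = re.compile(
    r'^<div .*?class="vevent"[^>]*>\s*<p .*?class="description"[^>]*>'
    r'\s*\S+ (\S+) (\S+) (\S+)<br />.*$'
)


def event_date(description: str) -> Optional[str]:
    """Return the event date as yyyy-mm-dd, or None if the markup doesn't match."""
    match = PATTERN.match(description)
    if match is None:
        return None
    month, day, year = match.groups()
    # Assumes English month abbreviations (e.g. "Sep"), as in the sample.
    parsed = datetime.strptime(f"{month} {day} {year}", "%b %d %Y")
    return parsed.strftime("%Y-%m-%d")


# Sample description from the first post in this thread.
sample = (
    '<div class="vevent"><p class="description">Wednesday, Sep 16 2009<br /> '
    '<span class="location">Hot Springs, Arkansas</span><br /> 9:00 PM </p> </div>'
)
print(event_date(sample))
```

The leading \S+ swallows the day name ("Wednesday,"), which is what stops the sort from grouping by weekday.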
From: Peter (BOUGHTONP) 19 Sep 2009 13:32
To: Mouse 12 of 17
Well Yahoo is crap and is killing the .dtstart part, which would be the best one to go for, but see above for extracting the p.description part.
From: Mouse 22 Sep 2009 10:52
To: Peter (BOUGHTONP) 13 of 17

Ooo yay! Got it working now. I apologise, the pubdate it seems was the same as the event date. Which doesn't make any sense but does mean it will work.

 

Thanks Pete, you're a star.

EDITED: 22 Sep 2009 10:52 by MOUSE
From: Peter (BOUGHTONP) 22 Sep 2009 11:14
To: Mouse 14 of 17
Hooray! *twinkles*
From: Mouse 22 Sep 2009 11:26
To: Peter (BOUGHTONP) 15 of 17
Just got to figure out how to get the actual venue name in the listing, which is probably going to be hard as the actual venue name isn't in the feed's information...
From: Peter (BOUGHTONP) 22 Sep 2009 12:12
To: Mouse 16 of 17
If you can link the venue to the feed, you can use one of the string bits to specify it manually. Then, instead of one feed block with multiple URLs, create one feed block for each URL, add the venue to each, and use the union one to combine the results.

(or something like that - on my mobile at the moment, so can't check specifics)
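In plain Python terms, the one-feed-per-venue idea might be sketched like this. The venue names, event titles and helper functions are all hypothetical; in Pipes each list would be its own feed block with the venue added to it, joined by the union block.

```python
from typing import Dict, List

Item = Dict[str, str]


def tag_with_venue(items: List[Item], venue: str) -> List[Item]:
    """Return copies of the items with a manually specified venue name added."""
    return [{**item, "venue": venue} for item in items]


def union(*feeds: List[Item]) -> List[Item]:
    """Combine several per-venue feeds into one listing."""
    combined: List[Item] = []
    for feed in feeds:
        combined.extend(feed)
    return combined


# Hypothetical items, standing in for what each venue's feed returns.
venue_a_items = [{"title": "Band night"}, {"title": "Quiz"}]
venue_b_items = [{"title": "Open mic"}]

listing = union(
    tag_with_venue(venue_a_items, "Venue A"),
    tag_with_venue(venue_b_items, "Venue B"),
)
for entry in listing:
    print(entry["venue"], "-", entry["title"])
```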
From: Mouse 22 Sep 2009 12:14
To: Peter (BOUGHTONP) 17 of 17
Ahhh, yeah. I think I know what you mean. I'll have a fiddle.