
Increasingly, the Web has become a vehicle for extending our human interactions to larger, more diverse and more global communities. Each of us seeks to expand our circle, to include more people and more points of view, to learn more, impart more and share more.

The obstacle, however, is clear to anyone who has ever participated in a conference call: interactions lose their humanity as distance and scale increase. We tend to fall back onto the factual, and we lose the emotional; the pursuit of clarity trumps nuance and humor; intentions become unclear and misunderstood.

The sheer scale of the Web makes this a bigger challenge every day. More user-generated content, more noise and, indeed, more dishonesty show their face as we expand our circles. How can we make the web a better vehicle for sharing, building and making our lives better?

At OpenAmplify, we believe a big part of the answer lies in understanding what's out there. It's not good enough to pull keywords and tags out of the content; classifying it into topical categories is little better. We need a more comprehensive web: one that operates on emotions, intentions, attitudes; one that comprehends more of what makes us tick.

OpenAmplify's vision is to be the platform for this deeper understanding of content. We will work to surface every last shred of meaning from content, and to return it to you in a clear, useful and scalable format, so that you can expand your circle in a human way. We'll make it easy to use and openly accessible. We'll never stop making it faster and better.

This is a central element in our world view. I'd like to invite the community to discuss it here.

Hi Mike,

You bring up some valid and interesting points here.  Let me throw out some thoughts to see where they go.

The precedence given to clarity over humor, nuance, etc. depends on the nature and importance of the interaction. 

The degree to which we need to think about this precedence depends, in turn, on the perceived degree of shared grammar, vocabulary (language) and cultural background within the interaction. This suggests that the "*** saw Jane go up the hill" level of interaction will be higher at the start, when people are searching for levels of comprehension in the group, and should become more sophisticated as the group members learn.

I am a native English speaker, but have spent a lot of my life living and working in non-English-speaking areas. I have noticed the way I simplify my expression to make sure that listeners understand my intended meaning.

I have also noticed a dismaying tendency of people (sometimes including me) to express what they can and not what they intend when they are speaking a foreign language, or when they don't share the vocabulary, grammar, and culture and are simplifying.

As I write, I wonder how this relates to the problems that arise with email correspondence, whose intention and meaning are said to be misinterpreted by well over half of recipients.

Gren.

 

Wow, it filtered out D I C K, as in Di c k and Jane. Terrible!!!!!!

Gren.

Hi Gren,

I agree, and thanks for finding this feature of our forum software. We're going to see what we can do to turn that fairly annoying function off. Perhaps at some point we'll set a challenge to integrate OpenAmplify with our platform and see whether it can do a better job of distinguishing the different senses of the word.

Daniel

Hi Gren -

Totally agree that context (or objectives) will bias the importance of clarity over other aspects of the conversation. I also, by the way, spent a lot of my life in non-English speaking places. The problem of incomplete comprehension is not going to go away soon.

We must recognize the limits of machine intelligence in such settings. If a human being can't properly glean the real meaning of an exchange, a computer can't either. Recognizing that limitation, however, we *can* get a lot further than we have gotten, and perhaps reduce that high percentage of misinterpretation you refer to to a more manageable level.

Our approach, at least at this stage, will be to surface as many of the building blocks of the conversation as we can, at an acceptable level of precision. Then we can improve that precision over time and, as we do so, look for combinations and syntheses of the building blocks into bigger ones that deepen understanding. That's roughly how humans improve their communication skills, too. Over time, our approximation of human comprehension will improve.

 

Is anyone going to Participation Camp NY on June 27-28 in New York?

This is another unconference in the open government/transparency area. I went to Gov2.0Camp in DC a few months ago, and it was fantastic: hundreds of keenly interested government employees at a very senior level interacting with, teaching, and learning from the tech world.

What other events are there in this space coming up?

The Twitter Developers Nest (@devnest) posed an interesting idea for new meetups and conferences.

So far I've seen a bunch of cool Twitter apps built on the OpenAmplify API.  I'd love to see further integration and wicked cool ideas come out of events like these.

Thoughts...?

Please read and follow the OpenAmplify Community DISCUSSION RULES before posting on our forums.

Thank you.

 

The OpenAmplify Community Team.

The Semantic Web is coming on strong and holds great promise for linking data, improving search, and answering problems from all sorts of domains. If that's the case, why hasn't it seen universal adoption at all levels? Richard Zlade argues on Semantic Universe that usability is key.

What do you think?

 

I tend to agree. Having designed online tools for mass use, I've found that adoption frequently follows ease-of-use. The interesting question for me here is, who does the Semantic Web need to be easier for before we see it take off?  Is it Joe Browser or somewhere a little further up the publishing line? 

I guess if it were easy for bloggers and larger publishers, that would create more grist for the creators of tools, portals, and search engines. They're probably the ones who would pass along any benefits to end users.

Anyone see it differently? Am I missing any important spots in the ecosystem?

I think part of the problem is that proper semantic markup is almost too difficult to be simple, and striking a good balance for users between ease-of-use and usefulness is tricky.

Say you have a blogging system where you want your users to tag each post with something simple, like a category. Should you give people a choice of 10 tags or categories to pick from, or let them enter any word or phrase they want? Choosing from a predefined set of options is often quicker for most users, and probably gives you a more concise, easier-to-use dataset in the end. But how can you be sure you're picking the right set of categories? What if someone has a post that doesn't fall into any of them?

This is probably why tagging became so popular in the first place: it provides simple, flat categorization, and you can enter anything you want. This probably gives you the "full picture," since people use different tags and thereby provide all possible ways of describing something, but it also causes problems when you try to use the data. You end up with a whole bunch of similar tags describing the same thing ("World war 2", "World War Two", "WW2", "WWII", etc.), and normalizing them can be tricky (what do you normalize to, and what is the label?). This problem probably gets smaller the bigger your dataset is, especially if your system can start suggesting tags the user might want to use (best of both worlds!), but it shouldn't be underestimated.
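One way to picture the normalization and suggestion steps is a small Python sketch that maps known variants onto one canonical label and offers existing labels as the user types. The synonym table, the choice of "World War II" as the label, and all function names are illustrative assumptions, not part of any existing system.

```python
import re

# Illustrative synonym table: normalized variant -> canonical label.
CANONICAL = {
    "world war 2": "World War II",
    "world war two": "World War II",
    "ww2": "World War II",
    "wwii": "World War II",
}


def normalize_key(tag: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", tag.strip().lower())


def canonical_tag(tag: str) -> str:
    """Return the canonical label, or the cleaned-up tag if it is unknown."""
    key = normalize_key(tag)
    return CANONICAL.get(key, key)


def suggest_tags(prefix: str, known_tags: list[str], limit: int = 5) -> list[str]:
    """Suggest canonical labels of existing tags that start with the typed prefix."""
    p = normalize_key(prefix)
    labels = sorted({canonical_tag(t) for t in known_tags})
    return [label for label in labels if label.lower().startswith(p)][:limit]


print(canonical_tag("WWII"))  # World War II
print(suggest_tags("wor", ["WW2", "World War Two", "world cup"]))  # ['World War II', 'world cup']
```

The hard part the post describes (deciding what the label should be) is hidden in the hand-written table; the suggestion step is what nudges users toward labels that already exist.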

Agree it's a complex problem. Further complicated, I believe, by the fact that everyone has a different view of the universe. To me, a pitch is a sales argument; to you, it might be the throw of a baseball, or the angle of the blades on your ski boat's prop. We're both right. Dealing with multiple senses of the same word makes tagging difficult to consume on a machine basis. Representation is hard enough; interpretation adds a whole 'nuther dimension.

Not to disrespect usability, but I'd say a large part of it is the classic chicken/egg scenario; without existing tags on pages out there, there's no percentage in having tools that use them; without anyone using them, there's no reason to add them to pages... round and round.

Which is actually where OpenAmplify could be really useful: if we can lower the cost of adding semantic content enough, we kickstart that process.

Analyzing social media means dealing with user-generated content (UGC), which can be unstructured and can vary in style, formality, and degree of correct spelling. The textual content of social media is more interactive and opinionated than text in general. Therefore, new tools and resources may be required for automated or semi-automated analysis of such content. Does this mean that we have to develop new resources and technology from scratch, or can we reuse or adapt existing ones?

Is traditional content analysis a useful way of analyzing social media? OpenAmplify can easily solve the quantitative tasks of content analysis, and categorize huge amounts of text into different categories based on heaviest topics, expressed attitudes, intentions, etc.

In what other ways or aspects can the content of social media be analyzed?

Please use this thread to discuss the above, and also to post specific questions, problem areas and challenges posed by social media research.

I think there are some really interesting analyses to be done here.

Social media often use novel abbreviations and alternate spellings, so lexicons need to allow for that. That's pretty easy, though.

The trickier part is determining how social media text differs syntactically. I'd bet that some, but not all, function words are often left out, though perhaps only in certain contexts. "..." is very common, but just what can it substitute for? The whole sense of a text can be drastically changed by one emoticon added to the end. And at least in some social media, a text may be divided across successive messages, with no machine-readable indication that the parts go together.
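To make the lexicon and emoticon points concrete, here is a minimal Python sketch. The abbreviation list, the emoticon polarities, and the rule of capping repeated letters at two are illustrative assumptions, not a tested lexicon.

```python
import re
from typing import Optional

# Illustrative lexicon of social-media abbreviations.
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great", "thx": "thanks"}

# A few emoticons and the polarity each is assumed to signal.
EMOTICONS = {":)": 1, ":-)": 1, ":D": 1, ":(": -1, ":-(": -1}


def normalize(text: str) -> list[str]:
    """Split on whitespace, expand abbreviations, and shorten letter runs."""
    tokens = []
    for tok in text.split():
        if tok in EMOTICONS:  # keep emoticons untouched
            tokens.append(tok)
            continue
        low = tok.lower()
        # "sooooo" -> "soo": cap any run of one repeated character at two.
        low = re.sub(r"(.)\1{2,}", r"\1\1", low)
        tokens.append(ABBREVIATIONS.get(low, low))
    return tokens


def trailing_emoticon(tokens: list[str]) -> Optional[int]:
    """Polarity of an emoticon at the very end of the text, or None."""
    if tokens and tokens[-1] in EMOTICONS:
        return EMOTICONS[tokens[-1]]
    return None


tokens = normalize("thx u r gr8 :)")
print(tokens)                     # ['thanks', 'you', 'are', 'great', ':)']
print(trailing_emoticon(tokens))  # 1
```

A trailing ":(" on an otherwise positive sentence is exactly the case where a purely word-based score would get the polarity wrong, which is why the emoticon is kept as its own token.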

I don't think it's well understood yet, what the similarities and differences are between various kinds of social media. There's room for a lot of very interesting study here.

Steve

Another challenge lies in the simple fact that short posts, notably tweets, often have sentiment, etc., but no topical content. Example: "That's great! I love it." Given this, we are looking into producing document-level polarity scores, so that at least the overall polarity of the text can be discerned. Look for this in a future release.
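As a rough illustration of what a document-level score could look like, this Python sketch simply averages per-sentence polarity scores. The [-1, 1] scale, the 0.2 neutral band, and the averaging itself are assumptions for illustration; this is not how OpenAmplify computes polarity.

```python
from statistics import mean


def document_polarity(sentence_scores: list[float]) -> float:
    """Average per-sentence polarity scores (each assumed in [-1, 1]).

    An empty document is treated as neutral (0.0).
    """
    return mean(sentence_scores) if sentence_scores else 0.0


def polarity_label(score: float, neutral_band: float = 0.2) -> str:
    """Map a document score to a coarse label."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical sentence scores for "That's great! I love it."
score = document_polarity([0.8, 0.9])
print(polarity_label(score))  # positive
```

The point is that no topic is needed: the overall polarity survives even when there is nothing to attach it to.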

I would be willing to bet that social media, at least as much if not often more than other traditional media, is laced with humor and sarcasm.  Also, conversations are not as often threaded which makes for a huge problem with co-reference.

Is it possible for OpenAmplify to include location information along with TopicIntention? If so, we could easily tell what topics are being discussed in a specific place or city ("Talk of the Town"), and of course, using polarity, we could tell the mood. Extending it further, it could play a big role in location-based search, which is the next big thing.

Any thoughts?

 

Is this a job for Named Entity Recognition? That is, are you suggesting that we associate locations with topics/actions, or that we identify certain topics as being locations or other entity types within the Topic structures?

Yes, I am suggesting linking locations with topics/actions. And of course, NER can do the trick.

Another idea that came to mind: why not use an article's own location information, i.e., capture the date and location that appear in the first line of an article? Since OpenAmplify can identify the location in an article, we can use that information to tell accurately "what exactly is happening in that location."
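A minimal Python sketch of the "Talk of the Town" idea, assuming the analysis has already produced (location, topic, polarity) tuples for each mention. That tuple shape is hypothetical; OpenAmplify's actual output would need to be mapped onto it.

```python
from collections import Counter, defaultdict


def talk_of_the_town(records, top_n=3):
    """Summarize (location, topic, polarity) tuples per location.

    Returns {location: {"top_topics": [...], "mood": average polarity}}.
    """
    topic_counts = defaultdict(Counter)
    polarities = defaultdict(list)
    for location, topic, polarity in records:
        topic_counts[location][topic] += 1
        polarities[location].append(polarity)
    return {
        loc: {
            "top_topics": [t for t, _ in topic_counts[loc].most_common(top_n)],
            "mood": sum(polarities[loc]) / len(polarities[loc]),
        }
        for loc in topic_counts
    }


# Made-up records for illustration.
records = [
    ("Boston", "Red Sox", 0.5),
    ("Boston", "Red Sox", 0.7),
    ("Boston", "traffic", -0.6),
    ("Austin", "music", 0.9),
]
print(talk_of_the_town(records)["Boston"]["top_topics"])  # ['Red Sox', 'traffic']
```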

 

 

I spoke to a WHO analyst on a jet recently. He felt that if doctors' reports from the field could be analyzed, and commonalities and trends identified, a lot of lives could be saved: early warning of epidemics, emerging humanitarian and medical disasters, etc. I found this intriguing. Would anyone with a background in this care to comment?

I'm not a doctor, nor do I play one on TV, but I do have some history writing medical record-keeping software. Some of the hurdles we encountered (not insurmountable, but definitely there):

Almost every medical institution (hospital, practice, clinic, etc) uses different standards for their records; what needs to be recorded, what things are taken for granted, details of terminology, etc. In my prior job, we dealt with this issue by making our system extremely customizable. In terms of OpenAmplify, I think that would mean being able to vary some of our underlying ontology etc. according to what data stream we were parsing at the moment. The ongoing effort by the HL7 folks to standardize electronic patient data transmission might help us some too.

A lot of doctors are very resistant to doing anything with or on a computer; they tend to view it as clerical work that is beneath them. This has been changing slowly as more post-Internet-era medical students become doctors, but it's going to be a while before the problem goes away completely. For the sort of thing you're talking about, this may not be a big issue, since we'd be analyzing existing reports, not asking people to create new ones.

 

During the course of the year there are several "pitch style" events where startups can get real feedback and possibly capital. The events differ in the stage a company needs to have reached: some accept companies with just an idea, while others expect a functioning product.

Yesterday, one of these events, TechCrunch 50, was the focus of much media buzz. Each startup was given time to pitch and demo its product. Then each was evaluated by a distinguished panel of successful entrepreneurs and venture capitalists.

The winning startup, RedBeacon, "allows consumers who need a service performed to find and interact with local businesses and professionals."

Go to TechCrunch; via this link you will see all the TC50-related posts and companies.

Which one interests you the most?

My favorite so far is Stribe, which "provides a free and easy way to place a social networking layer over any site."

 

 

 

Many ideas are often the result of improving upon others' good notions and thoughts.

New technology, the speed of development, and the drop in price of these technologies (such as hosting) are allowing more and more "new" companies to come about.

Ideas that just weren't ready for the spotlight 3-10 years ago might be more successful if launched today.

DaveW:

The winning startup, RedBeacon, "allows consumers who need a service performed to find and interact with local businesses and professionals."

 

I love this. As I am looking for a real estate broker who can help me find an apartment, I was worried that I would need to browse hundreds of Craigslist pages to find a deal, but now I am saved. :P

DaveW:

 

My favorite so far is Stribe, which "provides a free and easy way to place a social networking layer over any site."

I really don't know whether it will be a hit. The reason is the same question the judges asked: we already have so many social networking sites (FB, Twitter), so why would anyone be interested in creating one more community? And the CEO's response was not satisfactory.

My favourite is Clicker, a TV guide for the Internet age. I can't wait to get my invite.

 

I seem to be seeing a lot of companies here that are recycling pitches. Does the apparent lack of novelty indicate an actual stagnation of ideas among startups, or is it a fault of perception on my part? If it is the former, there would seem to be a lot of opportunity for companies like these to raise the bar through effective and innovative use of services like the OpenAmplify API.

I came across this article

http://www.techcrunch.com/2009/06/30/digg-tries-to-bury-dupes-again/

and the easiest way to solve it is to use OpenAmplify (as the saying goes, "Keep It Simple, Stupid").

Since OpenAmplify can identify the topics, actions, guidance and polarity associated with a text, it should be easy to tell that "yes, both URLs cover the same article." In fact, we can establish some metrics on the Amplify results, measure how far apart the two articles are, and draw a conclusion from that.

I also understand that there will be some difficulties in predicting an exact match. If the tone/view/guidance/opinion of the two authors differs, you will get different OpenAmplify results, which is understandable. Setting that aside, I believe OpenAmplify would be one of the best possible options.

Your opinions?

This approach could be used on news sites, most of which have links back to prior articles on the same subject (usually created by hand, I'd bet), but none that go forward. Or it might be useful for filtering out the 200 slightly rewritten versions of the original NYT story that inevitably show up when you search.

Interesting! I have also been thinking a bit about this lately, and I think that especially Actions can be really useful for this. After all, the "summary" of an article, at least news content, is often expressed using an action:

The president is giving a speech in Egypt.

If we find recurring actions, we should often find articles describing the same event.
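A minimal Python sketch of grouping articles by recurring actions, assuming each article has already been reduced to a set of normalized action strings such as "president give speech". That representation, and the exact-match rule, are assumptions.

```python
from collections import defaultdict


def group_by_action(articles):
    """Map each action shared by 2+ articles to the set of article ids.

    `articles` maps an article id to a set of normalized action strings.
    """
    by_action = defaultdict(set)
    for article_id, actions in articles.items():
        for action in actions:
            by_action[action].add(article_id)
    # Only recurring actions suggest that articles describe the same event.
    return {action: ids for action, ids in by_action.items() if len(ids) > 1}


# Made-up input for illustration.
articles = {
    "nyt": {"president give speech", "crowd gather"},
    "wapo": {"president give speech"},
    "blog": {"cat sleep"},
}
print(group_by_action(articles))
```

Exact string matching is the weak spot: "give speech" and "deliver address" would not be grouped without some normalization of the actions first.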

Now, can we think the other way around? Find all articles/pages/websites that talk about similar topics. One obvious use would be on Twitter (courtesy @RamS).

Yup!!! We can identify people whose interests are similar to yours, and you can decide whether or not to follow them. Rather than wasting time following and then unfollowing people, we could have an application that takes a user's tweets, amps them, and compares the result with yours (your tweets are also amped, so you have an Amplify output too).

We built an initial prototype, and the results were decent. The only problem was amplifying the tweets. We amplified the latest 25 tweets and compared them with those of the person you would like to follow. This also works well at times, because it reflects your current interests (in techie terms, your current trend).

A Firefox add-on for this would be awesome: it would take a username and give you

  1. The interest score
  2. A button with a Follow/UnFollow label based on 1
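A minimal Python sketch of the interest score and button label, assuming the topic names have already been extracted from both users' amplified tweets. The Jaccard overlap and the 0.2 threshold are illustrative choices, not what the prototype used.

```python
def interest_score(my_topics, their_topics):
    """Jaccard overlap of two collections of topic names, from 0.0 to 1.0."""
    a = {t.lower() for t in my_topics}
    b = {t.lower() for t in their_topics}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def button_label(score, threshold=0.2):
    """Suggest "Follow" for a high interest score and "UnFollow" otherwise."""
    return "Follow" if score >= threshold else "UnFollow"


# Hypothetical topics extracted from each user's latest 25 tweets.
mine = ["Python", "NLP", "coffee"]
theirs = ["nlp", "python", "football"]
score = interest_score(mine, theirs)
print(score, button_label(score))  # 0.5 Follow
```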

 

 

Is there any place I can go to see how you are comparing items for similarity? I am developing an application that I am integrating with OpenAmplify to determine document similarity, above and beyond the phrase matching that I already use.

Thanks!

J.Ja

According to Nielsen, Americans have nearly tripled the amount of time they spend at social networking and blog sites such as Facebook and MySpace in the past 12 months.  In August 2009, 17 percent of all time spent on the Internet was at social networking sites, up from 6 percent in August 2008.

These figures have advertisers jumping. In fact Nielsen also reports that even in a recession the estimated online advertising spending on the top social network and blogging sites increased 119 percent in the same time period.

Now that the masses are socializing online, will advertisers be able to deliver on the promise of putting the right ads in front of the audiences they need to reach? If so, what is a valid measure of the effectiveness of the ads placed?

Some are saying that the solution is in semantic advertising technology.   What do you think?

Even if the ad is placed in the right spot for the right audience, how are we going to avoid click fraud? A recent incident, the dorm ring1, shook everyone.

Hmm... maybe some kind of model similar to the original radio sponsorship thing, where shows frequently had the sponsor's name in the actual title.. "The Microsoft Software Quality Blog."

Sometimes you gotta look to the past to see the future...

While advertising on social networks has become more targeted, especially on Facebook, we as a technically literate audience have become more desensitized to those ads. The more we use the Internet, the more accustomed our eyes become to shifting around to find the content we were actually looking for, and we subconsciously learn to avoid both the legitimate ads and the spam ads that share those pages.

I agree.... I don't even see the ads on Facebook anymore (much like banner ads from the days of yore). Interstitial ads work only if the promised content is really THAT compelling (this usually only works for video content, though). The online ad game continues...

I don't notice the ads much either, although they try hard: Facebook has figured out I'm single and puts up supposedly single models to advertise eHarmony and the like....

Which is not to even mention ad blockers; they've gotten pretty effective. I don't think people's efforts to move print media or television's models of advertising over to the Internet will work well in the long run. I think that ultimately, Internet advertising is going to be much more like movies' product placement efforts.

I wonder how product placement would work effectively at Internet scale.

PR companies constantly pitch bloggers to review items or to go to events where they will develop a relationship with products/services/clients.  This is not a great model. 

I wonder what model will come forward.

I agree with John... I think some permutation of the product-placement model will be the next avenue to be explored. We quickly condition ourselves to not see ads in their expected places (sidebars, header & footer banners, etc.), so you gotta get 'em where their eyeballs already are, ya know?

The XML structure returned by OpenAmplify should lend itself to using the service to determine how similar texts are. This could be used, just to name a couple of examples, in automated polling or in brand management.

This fascinates me. Anyone else see this as interesting? Someone care to try it?

"An easy way to start might be to calculate the difference in weight assigned to each topic found in both, and discount for the weights of topics that aren't found in both. Likewise for actions and intentions."

I tried this approach this weekend with the Demographics and Style sections, with good results. For my purposes, I did not need to be terribly exact, I just ensured that the "Name" on each sub-element was the same.

On my sample documents, I am going to roll up my sleeves, compare the topics, actions, and intentions, and see how I feel about comparing the numbers. For my needs, I am being pretty darned specific in what I am looking for: I want to see whether two items could be extremely similar in content without necessarily being similar in wording. So if they cover the same topics with roughly the same actions and intentions, I feel pretty confident that they may be rewordings of the same document.

J.Ja

Do you mean in terms of common topics, polarity, etc.? This is a great idea; document similarity is huge for Google, Yahoo, etc., for search results, of course. The usual approach is lexical similarity (common words, or many words in a common semantic neighborhood). But OpenAmplify effectively does the same when it determines topics, and the addition of other metrics would be a bonus!

How does it calculate topic and intention similarity? I'm surprised by the zeros in Ram's example...

It should really be called "how many things from the first document appear in the second document" instead of similarity, because that's all it does :) And the topic intentions are buggy, I don't think it compares them all right now..have to take a look as soon as I find some time! The whole approach of this is really too simplistic, and it would be fun to take it a bit further and try clustering a bunch of documents using these features..

On the other end of the simplicity spectrum, here is a recent article on a document similarity measure. There are many different metrics used and some articles provide comparison statistics. 

http://web.jhu.edu/HLTCOE/Publications/acl08_elsayed_pairwise_sim.pdf

You should actually be seeing a series of articles coming out soon about this! In fact, I just figured out exactly how to do the comparison. :D Or, I should say, I just remembered something I read a long time ago and meant to do something with, but hadn't realized was 100% applicable to the task at hand.

For my purposes, I am looking for copies of documents beyond simple revisions, including full rewordings. That's where OpenAmplify really comes into play. If I find a document that has only a few phrases in common with my source document, but OpenAmplify shows that the two are nearly identical in topics, guidance, etc., then I can try things like an in-depth thesaurus comparison.

J.Ja

Hi Justin,

I'm not sure I follow your reasoning on weighting by depth. The domain nodes, for example, provide more granular information near the children; isn't a matching topic with matching polarity a stronger indicator of similarity than just a matching top domain? On the other hand, you can't do much by just matching the polarity nodes, since they only make sense in the context of their topic, so I think just matching nodes and scoring based on depth won't do you much good.

My guess is that the key is to find the right algorithm to compare each signal structure on its own; for topics, you could compare a complete topic subtree and award one point if the actual topic matches and bonus points for matching polarity, guidance, etc.

But maybe I didn't completely understand where you were going?
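A minimal Python sketch of the subtree-scoring idea: one point per matching topic plus a bonus per matching signal, normalized by the best possible score. The dictionary layout, the signal names, and the 0.5 bonus are assumptions, not OpenAmplify's output format.

```python
def topic_match_score(doc_a, doc_b, bonus=0.5):
    """Score how well doc_b's topics match doc_a's, from 0.0 to 1.0.

    Each document maps a topic name to its signals, e.g.
    {"polarity": "positive", "guidance": "offering"}.
    A matching topic earns 1 point; each matching signal earns `bonus`.
    """
    earned = possible = 0.0
    for name, signals_a in doc_a.items():
        possible += 1.0 + bonus * len(signals_a)
        signals_b = doc_b.get(name)
        if signals_b is None:
            continue
        earned += 1.0
        earned += bonus * sum(
            1 for key, value in signals_a.items() if signals_b.get(key) == value
        )
    return earned / possible if possible else 0.0


# Made-up signals for illustration.
a = {
    "Bernanke": {"polarity": "positive", "guidance": "offering"},
    "Obama": {"polarity": "neutral", "guidance": "offering"},
}
b = {
    "Bernanke": {"polarity": "positive", "guidance": "seeking"},
    "economy": {"polarity": "negative"},
}
print(topic_match_score(a, b))  # 0.375
```

Because polarity and guidance only score when their topic matches, this avoids the problem of matching polarity nodes out of context. The score is measured against the first document, so it is not symmetric.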

 

janeS:

On the other end of the simplicity spectrum, here is a recent article on a document similarity measure. There are many different metrics used and some articles provide comparison statistics. 

http://web.jhu.edu/HLTCOE/Publications/acl08_elsayed_pairwise_sim.pdf

The algorithm is really simple and easy to understand, as it just counts the number of occurrences of each word across the input documents. But that doesn't solve the problem, because we are really interested in how each document treats a topic, i.e., is X spoken of positively in both doc 1 and doc 2? etc.

 

Adam Svanberg:

My guess is that the key is to find the right algorithm to compare each signal structure on its own; for topics, you could compare a complete topic subtree and award one point if the actual topic matches and bonus points for matching polarity, guidance, etc.

This would be more accurate than just comparing topics, as it treats each topic in the context in which it is discussed in each document.

 

 

 

Justin,

I think your basic approach of taking each node from the first document and then searching the second document for matches should work.  A couple of thoughts:

1) The comparison function for Document level scores (like Styles and Demographics) is different than the function for Feature level scores (like Topics, Actions and Topic Intentions).  For the Document level scores you want to see if the double in the matching <value> tag is within a certain "similarity threshold". 

2) For the feature level scores you need to match the string in the <name> tags, but because the score in the <value> tag is not normalized across documents you would need to select another heuristic to determine an additional weight to the co-occurrence of terms.
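For the document-level case in point 1, a minimal Python sketch, assuming the name/value pairs have already been pulled out of the XML into a dictionary of floats. The 0.1 tolerance is an arbitrary illustrative threshold, and the example names are made up.

```python
def document_level_similarity(a, b, tolerance=0.1):
    """Share of scores present in both documents whose values are close.

    `a` and `b` map a score name to its float value, e.g. the Styles or
    Demographics entries of each document.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    close = sum(1 for name in shared if abs(a[name] - b[name]) <= tolerance)
    return close / len(shared)


# Made-up names and values for illustration.
doc1 = {"Formality": 0.50, "Slang": 0.30}
doc2 = {"Formality": 0.55, "Slang": 0.90, "Age": 0.10}
print(document_level_similarity(doc1, doc2))  # 0.5
```

Feature-level scores are deliberately left out here: as point 2 notes, their values are not normalized across documents, so a fixed tolerance would not mean the same thing from one document to the next.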

Adam -

Hmm. Part of the problem here is my poor memory! I last looked at OpenAmplify a number of months ago, and I am in the planning stages of my project, and I seem to recall there being a deep structure. I guess I was mistaken! I like your suggestion though, and I'll give it a whirl this weekend.

Thanks!

J.Ja

ramS -

Thanks! I will be sure to keep this in mind. I hope to do this implementation over the weekend.

J.Ja

I think we may have 2 threads going at the same time:

* One seems to be comparing two arbitrary documents -- sort of an XML diff -- I think that's what Justin's last post was referring to.

* The other is comparing the OpenAmplify analyses of two arbitrary documents. That's a much easier comparison, because you don't really have to think about the XML structure at all -- just compare some of the numbers it returns.

XML diff has a long history, and there are lots of known methods; Fabio Vitali published something on this recently. Comparing the OpenAmplify analyses has the advantage that you're actually comparing topics, goals, etc -- so it should have a decent chance of noticing similarity even if the whole piece has been re-written or re-phrased. I'd say that even a moderate similarity there is significant.

An easy way to start might be to calculate the difference in weight assigned to each topic found in both, and discount for the weights of topics that aren't found in both. Likewise for actions and intentions.
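A minimal Python sketch of this weight-difference idea, assuming each document's topics (or actions, or intentions) are available as a name-to-weight dictionary. Because raw weights are not necessarily comparable across documents, the sketch first rescales each document's weights to sum to 1; that normalization is an added assumption.

```python
def weighted_similarity(a, b):
    """1.0 for identical weight profiles, 0.0 for no features in common.

    `a` and `b` map a feature name (topic, action, ...) to a non-negative weight.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    # Rescale so each document's weights sum to 1 (an added assumption).
    pa = {name: w / total_a for name, w in a.items()}
    pb = {name: w / total_b for name, w in b.items()}
    # Shared features contribute their weight difference; a feature found in
    # only one document contributes its whole weight (the "discount").
    distance = sum(abs(pa.get(n, 0.0) - pb.get(n, 0.0)) for n in pa.keys() | pb.keys())
    return 1.0 - distance / 2.0


# Made-up weights for illustration.
print(weighted_similarity({"Bernanke": 1, "Obama": 1}, {"Bernanke": 1, "economy": 1}))  # 0.5
```

Summing absolute differences over the union of names handles both halves of the suggestion at once: a name missing from one document simply counts as weight 0 there, so its full weight is discounted.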

 

Just as a quick experiment I put a dead simple similarity tool here:

http://79.125.39.138:8888/

Nothing clever so far, just compares two URLs to see how many of the topics, actions and intentions the articles have in common.
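A minimal Python sketch of a containment measure of this kind, i.e. the share of the first article's items that also appear in the second. The function names and output format are assumptions; this is not taken from the tool itself.

```python
def containment(first, second):
    """Percentage of distinct items in `first` that also occur in `second`."""
    a, b = set(first), set(second)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


def report(first_doc, second_doc):
    """Print one line per section; each doc maps a section name to a list of items."""
    for section in ("Topic", "Action", "Topic intention"):
        pct = containment(first_doc.get(section, []), second_doc.get(section, []))
        print(f"{section} similarity: {pct:.1f}%")
```

The measure is asymmetric: swapping the two articles can change the percentage. With 4 of 9 items in common, for instance, it gives 100 × 4/9 ≈ 44.4%.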

Here's the output of Adam's similarity test on the Washington Post and New York Times articles above.

Topic similarity: 44.4%
  • president
  • Bernanke
  • crisis
  • Obama
Action similarity: 0%
Topic intention similarity: 0%

sderose:

How does it calculate topic and intention similarity? I'm surprised by the zeros in Ram's example...

I'm not too surprised by the zeros; wouldn't you expect intentions to be near zero for any news source?

Let us know how this turns out.  Sounds like a really interesting application of the OpenAmplify engine.  Tracking down copies and older versions of documents on a file/email/content server can result in huge reductions in storage and backup requirements.

mikepetit:

The XML structure returned by OpenAmplify should lend itself to using the service to determine how similar texts are. This could be used, just to name a couple of examples, in automated polling or in brand management.

This is definitely very interesting and could lead to new achievements in the field of IR. Even though useful document representations already exist for some traditional similarity tasks, such as the vector space model for finding duplicate or very similar documents in a collection, there are a number of much more sophisticated and challenging tasks that OpenAmplify can facilitate.

Some examples:

  • Finding similar attitudes towards a certain topic - even in quite unrelated texts - "Which users in this forum like X but dislike Y?" "Find users who tend to express very negative attitude towards Z." (e.g. detecting hate or racism).
  • Matching documents to one another, where some are seeking advice about a topic, while the others are offering guidance about it.
  • Question Answering.

Is there anyone - e.g. a forum moderator - who has experienced such a need or would like to experiment a little on their document collection? This could be fun :)
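For contrast with these tasks, the traditional vector space baseline mentioned above can be sketched in a few lines of Python: represent each text as a term-frequency vector and take the cosine of the angle between the two vectors. The tokenization regex is a simplifying assumption, and there is no TF-IDF weighting or stemming.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector: lowercase word tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two texts' term-frequency vectors."""
    va, vb = tf_vector(text_a), tf_vector(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

This baseline only sees shared words, so it cannot tell whether X is praised in one text and criticized in the other; that is exactly the gap the attitude- and guidance-based tasks above target.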

 

Slightly related thread about using OpenAmplify to find duplicate news articles:

  http://community.openamplify.com/forums/p/305/460.aspx#460

I think actions (and topic intentions) could be an interesting variable in general document similarity as well.

Seems to me a basic module that traverses two XML input streams and returns a percentage match would be a great way to test this. Anybody want to try this? "Topical match: 69% Intentional match: 85%". Styles could, I believe, be safely ignored, as could demographics. This would actually permit us to compare the core message of the two texts without regard to differences in authors' styles and characteristics. The more I think about this, the more I like it. Just run it against TopicIntentions. I have a hunch it would prove very enlightening.

Volunteers?

I think my initial idea was something closer to Microsoft Single Instance Storage.

http://www.microsoft.com/downloads/details.aspx?FamilyID=99f8ee58-4faf-4951-ba84-7237b5c639b5&DisplayLang=en

I am trying to do something just like this in my application. My idea is to walk the tree of the source document and, for each node in it, find an identical node in the target document. For each pair of matching nodes, walk them, tally up the number of matching children, and give the found node a score. Then add up every node's score in the document, weighted by depth so that nodes near the root carry more weight than nodes further down among the children.

Thoughts?

J.Ja
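One way to sketch the depth-weighted tree walk described above, using Python's `xml.etree.ElementTree`. What counts as an "identical" node is an assumption here (same tag and same attributes), each source node is compared with the first such node in the target, and the weight 1/(depth+1) is just one illustrative way to make nodes near the root count more. The total is normalized to a score between 0 and 1.

```python
import xml.etree.ElementTree as ET


def signature(el):
    """Assumed notion of 'identical' nodes: same tag and same attributes."""
    return (el.tag, tuple(sorted(el.attrib.items())))


def node_score(src, tgt):
    """Fraction of src's children that have a same-signature child under tgt (leaves score 1.0)."""
    children = list(src)
    if not children:
        return 1.0
    target_sigs = {signature(child) for child in tgt}
    matched = sum(1 for child in children if signature(child) in target_sigs)
    return matched / len(children)


def tree_similarity(src_xml, tgt_xml):
    """Depth-weighted average of node scores over the source tree, between 0 and 1."""
    src_root = ET.fromstring(src_xml)
    tgt_root = ET.fromstring(tgt_xml)
    # Index target nodes by signature, keeping the first one in document order.
    index = {}
    for el in tgt_root.iter():
        index.setdefault(signature(el), el)
    total = weight_sum = 0.0
    stack = [(src_root, 0)]
    while stack:
        node, depth = stack.pop()
        weight = 1.0 / (depth + 1)  # nodes near the root carry more weight
        match = index.get(signature(node))
        total += weight * (node_score(node, match) if match is not None else 0.0)
        weight_sum += weight
        stack.extend((child, depth + 1) for child in node)
    return total / weight_sum


print(tree_similarity("<a><b/><c/></a>", "<a><b/></a>"))  # 0.5
```

In the example, the root matches but only half of its children do (0.5 at weight 1), `<b/>` matches fully (1.0 at weight 0.5) and `<c/>` has no counterpart (0 at weight 0.5), giving 1.0 / 2.0 = 0.5.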

Here is the OpenAmplify TopicIntentions analysis of the coverage of the President's announcement that he was nominating Ben Bernanke to a second term as Chairman of the Federal Reserve in the New York Times and the Washington Post.

New York Times

Main Topics and Intentions: Attitude: Intention: Guidance: Timescale:
1 Bernanke Offering
     <- communicate Future
     -> communicate Past
     <- assess Offering Recent Past
     <- choose Offering Past
     <- help Past
     <- want Offering Past
     <- socialize Past
2 Obama Offering
     -> choose Offering Past
     -> vote Future
     -> communicate Future
     -> request Offering Past
3 president Offering
     <- believe Past
     -> believe Offering Present
     -> vote Future
4 U.S.
     -> communicate Present
     <- compete Present
5 economy Offering
     <- help Offering Present
     <- socialize Past

The Washington Post

Main Topics and Intentions: Attitude: Intention: Guidance: Timescale:
1 Bernanke Offering
     <- create Past
     -> create Offering Future
     <- move Past
     -> move Present
     -> help Future
     <- pursuing Present
     <- request Past
2 Obama Offering
     -> communicate Present
     <- move Past
     -> move Past
     <- request Offering Past
     -> pursuing Present
     -> socialize Present
     -> vote Past
     <- admire Present
     <- say Offering Present
3 chairman Offering
     -> move Offering Present
4 Ben Bernanke
     <- communicate Present
     <- want Present
5 economy Offering
     <- help Future
     <- move Past

Predicting stock price movement using human emotions? Sounds funny. Does it really make sense to do it?

I think stock prices move because of human reactions (attitudes, feelings, emotions) toward a certain event. If we can anticipate those reactions, we might hit the right spot. We are already predicting the mood of a text using Polarity; if we combine that with human emotions, shouldn't that solve our problem? We also have a thread running on emotions, http://community.openamplify.com/forums/t/303.aspx, which is on track.

The following site was the motivation for me to start this discussion:

BAM - The Behavioral Analysis Model predicts future price movements in human traded markets through the study of market participants’ emotional responses during periods of high emotion and “capitulation.” It uses fractal theory (and the Fibonacci sequence!) to predict emotional mood swings in the market.

You can follow them on twitter.

I think if you want the full aggregate picture, I'd go with the wire-taps... but then again, that's just me... :)

This is interesting... I've always thought that emotion was the driving force behind economic surges and lapses. I'm sure it was mostly responsible for the real estate hysteria of a couple years ago... I know I got caught up in it myself.

The interesting question is what subset of the news media would you have to pay attention to in order to generate decent predictions? And the flip side: What portion of the financial community's communications are you not going to get short of bugging every office and cell tower in lower Manhattan?

You don't think those offices and cell towers are bugged already? ;)

SteveS:

You don't think those offices and cell towers are bugged already? ;)

Beside the point. The interesting question is whether they need to be for this purpose: can you do well enough just watching public sources, or do you really need the phone and office bugs?

Besides, the lease rates on a couple dozen high-grade Wall Street phone taps are... non-trivial.

I couldn't resist justifying my reaction to "Time flies like an arrow, fruit flies like a banana" etc. as a bit too well-known, and did a Google search for the whole phrase. To my surprise, the originator of the quote is...Groucho Marx! It is in the Wikipedia entry for syntactic ambiguity and comes up as a complete phrase in several pages of Google results. So, maybe it has been heard by more than a few people.

 

How about "Time flies like a banana, fruit flies like an arrow"? :-)

Since I was amused by all this, I looked more into Groucho's quips. I wonder if he was an amateur computational linguist? He certainly understood the biggest problems we face, things like PP attachment, etc. Amazing.

I'd buy that shirt... one size Large, please! :)

Ha! I knew there was a reason I loved it...Groucho rules. I have agreed to hold back on making a shirt out of it until we can amp it more accurately...after all, a "time fly" is rather uncommon.

Mike :-)

I can't help but notice that Groucho Marx was a master of using syntactic ambiguity:

"One morning I shot an elephant in my pyjamas.
How he got into my pyjamas I'll never know."
-- Groucho Marx

Lucky for us that we are not resolving PP-attachment (yet!).

HA! That’s wonderful…perhaps we should consider that first sentence for the T-shirts…

Mike :-)

 

> I can't help but notice that Groucho Marx was a master of using
> syntactic ambiguity:

They all were; that's what makes Groucho/Chico conversations marvelous.

Also impressive was Harpo's ability to do puns without speaking.

-John

> Ha! I knew there was a reason I loved it...Groucho rules. I have
> agreed to hold back on making a shirt out of it until we can amp it
> more accurately...after all, a "time fly" is rather uncommon.

How exactly would you do this? Try to incorporate a rule that "time" as an adjective is rarer than "time" as a noun? Then you'd get "time card" etc. wrong. Maybe "time" can't modify animate things? Then mark all words in the lexicon as "+/-animate"... and hope that "time" never actually does modify anything animate. And this rule would take care of only one word, time...

Nancy

PS By the way, there is a third parse of the famous sentence: "Herd flies like mosquitos". ("Herd" as a verb with implicit "you" as subject.)

How about anagrams of OpenAmplify:

A few selected ones….

Famine Polyp
Amplify Nope
Amplify Peon
Foamy Nipple
Leaf Imp Pony
Flea Imp Pony
Piny Flea Mop
Leafy Pin Mop
Pony Lip Fame
Mealy Fin Pop
Ape Film Pony
Flay Pine Mop
Flay One Pimp
Flay Open Imp
Flay Imp Peon
Foam Pine Ply
Foamy Pen Lip
Manly Pie Fop
Pal Fine Mopy
Play Fine Mop
Lay Fine Pomp
Many File Pop
Many Life Pop
Many Pie Flop
Mayo Pen Flip
Map File Pony
Map Pony Life
Map Fine Ploy
Amp File Pony
Amp Life Pony
Fine Ploy Amp
Yelp Amp Info
May Open Flip
Any File Pomp
Any Life Pomp
Any Poem Flip
Any Pope Film
Pay Elfin Mop
Pay Felon Imp
Open Film Pay
Pay Peon Film

Someone pointed me to http://www.nickburcher.com/2009/06/search-engines-show-wrong-michael.html, which discusses how most major search engines utterly failed recently on searches for "Michael Jackson died," returning information about people other than the singer.

This is pretty understandable: it takes a while for search engines to crawl the news sites (though I presume they crawl major news sites far more often than they crawl pages in general, and probably grab some RSS news feeds in real time). But if you've got an established set of links to discussions of other Michael Jacksons who've died, how do you know which to return? Pages recently updated probably should be given greater relevance in general; but how much?
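The "but how much?" question can be made concrete with an exponential freshness decay. This is a generic sketch, not how any particular search engine ranks pages: `half_life_hours` and `weight` are hypothetical tuning knobs, and `relevance` stands for whatever text-match score the engine already computes.

```python
def recency_boosted(relevance, age_hours, half_life_hours=24.0, weight=0.5):
    """Blend a relevance score with an exponentially decaying freshness factor.

    freshness is 1.0 for a page updated just now and halves every half_life_hours;
    weight sets how much of the final score depends on freshness (0 = none, 1 = all).
    """
    freshness = 0.5 ** (age_hours / half_life_hours)
    return relevance * ((1 - weight) + weight * freshness)


# An old, well-linked page about another Michael Jackson vs. a fresh news story.
old_page = recency_boosted(relevance=0.9, age_hours=24 * 365)
fresh_story = recency_boosted(relevance=0.7, age_hours=1)
print(fresh_story > old_page)  # True
```

With these defaults, a page updated just now keeps its full relevance, a page exactly one half-life old keeps 75% of it, and a very old page bottoms out at 50%. "How much" then becomes a matter of choosing the two knobs, perhaps differently for news queries than for evergreen ones.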

Wikipedia has a nice solution within its scope: if there are multiple entries with the same name, you get a disambiguation page first, with a brief description, and you get to pick, in effect: "Did you mean the singer or the writer?" But with common names this won't scale; Wikipedia only has about 250,000 entries for individual people; if they decided (for whatever reason) to cover many more, less famous people, name collisions would rise quickly.

This is part of the problem known as "named entity recognition". First, you have to recognize that "Michael Jackson" is a person's name (and most likely male); and second, that the following "Michael" or "he" also refers to him. But it's quite another, also hard, problem to decide which real-world individual is meant, and then to return a meaningful identifier for the individual.
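For that last step, deciding which real-world individual is meant, a simple baseline is to compare the words around the mention with a short description of each candidate (a simplified Lesk-style word overlap). The candidate descriptions below are made up for illustration; a real system would need far richer context than a handful of words.

```python
import re

# Made-up candidate descriptions for two people who share the name.
CANDIDATES = {
    "Michael Jackson (singer)": "american singer songwriter dancer pop music thriller album",
    "Michael Jackson (writer)": "english writer critic beer whisky books",
}


def tokens(text):
    """Lowercased alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(context, candidates=CANDIDATES):
    """Pick the candidate whose description shares the most words with the context, or None."""
    context_words = tokens(context)
    scores = {name: len(context_words & tokens(desc)) for name, desc in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Pop singer Michael Jackson, the voice behind Thriller, has died"))
# Michael Jackson (singer)
```

Note that a bare "Michael Jackson died" returns None: with no distinguishing context words, this approach has nothing to go on, which is exactly the situation the search engines were in.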

 

 

I blogged about parsing Michael Jackson's obituaries in "Pretzel Logic".

http://community.openamplify.com/blogs/devteam/archive/2009/06/26/parsin...

One more pointer as well:

Check the comments on this blog post:

http://community.openamplify.com/blogs/devteam/archive/2009/06/25/Parsin...

 

I'd like to kick off the discussion by posing a question:

What are the key requirements/characteristics for a corpus that helps develop and evaluate measures for specific text characteristics?

As most readers probably know, Amplify produces measures of very specific text characteristics rather than trying to discover the entire semantics. What kinds of corpora are most effective for this? For example:

* What are the genres/styles of most interest to you?

* What size texts are most useful?

* How should texts be selected?

* Which existing corpora seem most relevant for these kinds of measures?

* What metadata should be attached to samples?

Short texts are much harder than longer ones in some ways, since sometimes you don't have enough information to determine things like topic or the correct sense of a word or term. At the same time they can be a lot easier because there is only one point to the text--so, for example, it should be pretty straightforward to see if someone is asking for guidance in a one- or two-sentence message.

I find the issue of text size to be especially intriguing these days. The advent of micromessaging in such forms as Twitter forces us to try to discover meaning in very small packages. Yet, as a human being, I can still discern a lot from 140 characters. In a way, our success in closing the gap between human and machine understanding could have no better benchmark than short texts.

Short texts are definitely a hot topic. They differ in so many ways: spelling, punctuation, word choice, syntax, how pronouns are used,...

I wonder how rapidly the characteristics of Tweets change; is the syntax a fairly predictable "adjustment" from normal syntax, or is it being invented continually in new and varying ways (vocabulary surely changes that way, but syntax? Maybe).

Regular English only has about 1.3 bits per letter of "real" information (that's why, for example, you can compress general text down to about that size, and no further). I'd bet the information rate is far higher for Tweets, but I haven't heard of anyone checking. Come to think of it, I'll go check now...
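A rough way to check this, sketched with Python's standard `zlib` module. Compressed size only gives an upper bound on the information rate (general-purpose compressors stay well above the roughly 1.3-bit estimate), and zlib's fixed overhead dominates for a single 140-character tweet, so tweets would need to be concatenated into a large sample before comparing them with regular prose.

```python
import zlib


def bits_per_char(text: str) -> float:
    """Compressed size in bits per character: an upper bound on the text's information rate."""
    if not text:
        raise ValueError("text must be non-empty")
    compressed = zlib.compress(text.encode("utf-8"), 9)  # 9 = maximum compression level
    return 8 * len(compressed) / len(text)
```

Running `bits_per_char` on a large sample of ordinary prose and on a same-sized concatenation of tweets would give a first, crude comparison; the absolute numbers will be higher than the entropy estimate, but the difference between the two samples is what matters here.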

 

Some time ago I saw a proposal for a study of online conversations, involving discourse analytic methods to look at the way social "position" evolves. I seem to recall that the work involved looking for evidence that certain people would become "leaders" and others would become "followers"--in particular, they were looking at the possibility that you could distinguish leaders by their tendency to introduce new topics, new terminology, even new abbreviations, etc. that the followers would then adopt.

It would be interesting to consider this kind of thing as a means to identify bullies and predators that can become problematic on some lists, etc. Is anyone aware of any work on something like this? 

The most direct way to identify roles in multi-party communication is network analysis: essentially, who talks to whom. Penny Eckert used it to identify the paths of linguistic innovation in a suburban Detroit high school. The analysis identified two groups--jocks and burnouts. Burnouts were more oriented to the inner-city African-American community, and imported new words and phrases from there to the 'burbs. If the innovation was cool enough (i.e., was used often enough and by the coolest burnouts--particularly those with the most communicative links to the jocks), the jocks might adopt the innovation.
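The "who talks to whom" idea can be sketched as a reply graph. This Python example assumes the forum data has already been reduced to hypothetical (author, replied_to) pairs and ranks people by how many distinct partners they talk with (degree centrality, the simplest network measure). Bridge roles like the burnouts with links to the jocks would probably be better captured by a measure such as betweenness centrality.

```python
from collections import defaultdict


def conversation_partners(replies):
    """Map each person to the set of people they exchanged replies with (direction ignored)."""
    partners = defaultdict(set)
    for author, replied_to in replies:
        if author != replied_to:
            partners[author].add(replied_to)
            partners[replied_to].add(author)
    return partners


def most_connected(replies, k=3):
    """The k people with the most distinct conversation partners (ties broken by name)."""
    partners = conversation_partners(replies)
    return sorted(partners, key=lambda person: (-len(partners[person]), person))[:k]


replies = [("ann", "bob"), ("carl", "ann"), ("dee", "ann"), ("bob", "carl")]
print(most_connected(replies, k=2))  # ['ann', 'bob']
```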

Twitterers have follower stats; do bloggers? If so, one might use those numbers to identify leaders within some fora and then use OpenAmplify to build a 'profile' of the stylistic characteristics most representative of leadership style.

Perhaps the Guidance signal provides a decent starting point for this? Those who offer a lot of guidance could be leaders; those who request a lot are perhaps followers (although asking for advice is a form of leadership). In combination with, say, topical diversity across a given author's posts, might we infer leadership?
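A sketch of combining the two signals per author. The post records are hypothetical: each is assumed to carry an author, a Guidance value already extracted by OpenAmplify (reduced here to "offering", "requesting" or None) and a list of topic names. The field names are illustrative, not the actual API output.

```python
from collections import defaultdict


def leadership_profile(posts):
    """Per-author share of guidance that is offered, and distinct topics per post."""
    stats = defaultdict(lambda: {"offering": 0, "requesting": 0, "topics": set(), "posts": 0})
    for post in posts:
        s = stats[post["author"]]
        s["posts"] += 1
        if post.get("guidance") in ("offering", "requesting"):
            s[post["guidance"]] += 1
        s["topics"].update(topic.lower() for topic in post.get("topics", []))
    profile = {}
    for author, s in stats.items():
        guided = s["offering"] + s["requesting"]
        profile[author] = {
            # 0.0 when the author never offered or requested guidance
            "offer_ratio": s["offering"] / guided if guided else 0.0,
            "topic_diversity": len(s["topics"]) / s["posts"],
        }
    return profile


posts = [
    {"author": "ann", "guidance": "offering", "topics": ["XML", "parsing"]},
    {"author": "ann", "guidance": "offering", "topics": ["sentiment"]},
    {"author": "bob", "guidance": "requesting", "topics": ["XML"]},
]
print(leadership_profile(posts))
```

Whether a high offer ratio plus high diversity really marks a leader is the open question in the post; the sketch only produces the numbers to test it against something like follower counts.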

If I may partially digress: one aspect of discourse analytics that I find intriguing is the concept of time. OpenAmplify currently views the content to be analyzed as a point in time. But flow over time is such an important element of our understanding. How can we introduce time into our signals, or, indeed, derive new signals from that introduction? The detection of predators is definitely time-based. The insidious nature of their method is to some degree progressive: establish rapport; identify a common gripe; make a subtle call to action; reinforce and close. Am I making sense?
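One simple way to bring order in time into the picture is to check whether a conversation passes through the stages in sequence. The sketch assumes each message has already been labeled with a stage (the labels are hypothetical, and producing them is the hard part) and reports how many of the four stages listed above have been reached in order, so an alert could fire before the final stage.

```python
# Hypothetical stage labels for the progression described above.
STAGES = ("rapport", "gripe", "call_to_action", "reinforce_and_close")


def stages_reached(labels, stages=STAGES):
    """How many stages appear in order in a time-ordered list of message labels."""
    reached = 0
    for label in labels:
        if reached < len(stages) and label == stages[reached]:
            reached += 1
    return reached


conversation = ["small_talk", "rapport", "rapport", "gripe", "small_talk", "call_to_action"]
print(stages_reached(conversation))  # 3: flag before the final stage
```

Other messages may appear between stages without resetting the count, which matches the gradual, progressive pattern described; a stage seen out of order (a gripe before any rapport) does not count.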

Time. In discussion forums, blogs, and such, time tells a lot. Someone who's posting at all hours of the day and night, or who replies to things very quickly after they're posted, is obviously really active, and that suggests leadership potential.

I remember a psych course in college where they mentioned that one of the most effective ways to predict who would emerge as leaders in new groups was absurdly simple: they talk more. That's really easy to measure in a forum (a couple of technical forums I've been in even sent out a yearly summary of who posted how often).
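Both of these time signals are easy to compute from post metadata. This sketch assumes hypothetical (author, posted_at, parent_posted_at) tuples, with `parent_posted_at` set to None for thread starters, and reports post count, the number of distinct hours of the day an author posts in, and the median reply latency in minutes.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def activity_summary(posts):
    """posts: (author, posted_at, parent_posted_at or None) tuples with datetime values."""
    counts = defaultdict(int)
    hours = defaultdict(set)
    latencies = defaultdict(list)
    for author, posted_at, parent_posted_at in posts:
        counts[author] += 1
        hours[author].add(posted_at.hour)
        if parent_posted_at is not None:
            minutes = (posted_at - parent_posted_at).total_seconds() / 60
            latencies[author].append(minutes)
    return {
        author: {
            "posts": counts[author],
            "distinct_hours": len(hours[author]),
            "median_reply_minutes": median(latencies[author]) if latencies[author] else None,
        }
        for author in counts
    }


posts = [
    ("ann", datetime(2009, 9, 1, 9, 0), None),
    ("bob", datetime(2009, 9, 1, 9, 5), datetime(2009, 9, 1, 9, 0)),
    ("ann", datetime(2009, 9, 1, 23, 30), datetime(2009, 9, 1, 23, 0)),
]
print(activity_summary(posts))
```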

I wonder whether the language of flamers, predators, and/or other categories tends to be stereotyped. Perhaps analyzing their conversations might show some clear markers, such as repetitive vocabulary, sentence length and type, etc., used at various stages.
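Two of the markers mentioned, repetitive vocabulary and sentence length, can be approximated with a type-token ratio and mean words per sentence. This is a crude sketch (regex-based sentence and word splitting); the type-token ratio also falls as texts get longer, so only texts of similar size should be compared, and whether these markers actually separate flamers or predators from other users is an open question the code does not settle.

```python
import re


def style_markers(text):
    """Type-token ratio (lower = more repetitive vocabulary) and mean words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        "type_token_ratio": len(set(words)) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }


print(style_markers("You are wrong. You are so wrong!"))
```

Tracking these numbers per message over time, rather than per author, would connect them to the "various stages" idea.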

Of course one difficulty here is the same as with profiling terrorists, making bomb-sniffing devices, putting face-recognition cameras in public places, and so on: mistakes in either direction can be very costly: "But the camera said he looked like bin Laden, so I shot him." versus "Oh, sorry, we didn't notice *that* bomb."

Still, used judiciously some measures like that could be really valuable.

Please read and follow the OpenAmplify Community DISCUSSION RULES before posting on our forums.

Thank you...

 

The OpenAmplify Community Team

The Calais team is concentrating on what we refer to as "izza's": New York izza State, Microsoft izza company, etc. This is an essential element of effective Semantic Web tagging, and is known in the business as "Named Entity Recognition." After all, if you can't tell that Procter and Gamble is a company, not just two guys, you don't have a very valid place to start understanding content that refers to P&G (it's also nice to know that P&G is Procter and Gamble: easy for a human, tough for a computer).
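The simplest form of this is a gazetteer lookup that maps surface forms such as "P&G" to a canonical entity and its type, sketched below. The entries are illustrative and hand-written, not OpenCalais data; real named entity recognition also has to handle names that are not on any list, and ambiguous ones, which a fixed list cannot.

```python
import re

# Illustrative, hand-written gazetteer: canonical name -> (entity type, surface forms).
GAZETTEER = {
    "Procter & Gamble": ("Company", ["Procter & Gamble", "Procter and Gamble", "P&G"]),
    "Microsoft": ("Company", ["Microsoft"]),
}


def find_entities(text):
    """Return (offset, surface form, canonical name, type) for each gazetteer hit, in text order."""
    hits = []
    for canonical, (entity_type, surface_forms) in GAZETTEER.items():
        for form in surface_forms:
            # Require non-word characters (or the text edges) around the match.
            pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                hits.append((m.start(), m.group(0), canonical, entity_type))
    return sorted(hits)


print(find_entities("P&G said Procter and Gamble will work with Microsoft."))
```

Both "P&G" and "Procter and Gamble" resolve to the same canonical entity, which is the "easy for a human, tough for a computer" step; here it is easy only because someone typed the alias in by hand.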

The release, in the latest version of the OpenCalais service, of so-called "Social Tags", is a solid step forward. They're employing a taxonomical approach that permits the creation of "meaningful" classifications of named entities, such as "sports cars", which publishers will find both relevant and monetizable. Bravo!

To me, this matters because it recognizes that classification schemes must support the pragmatic goals of those who apply them. This resonates strongly with the OpenAmplify view that forward motion in the Semantic Web must come through real-world applications.

I like what these folks are doing.

Could be true but base on some essay writing service we can move towards the opposite ideology of something that's not behind the fact of letting it be done for something that is not.

Hi James - not sure I understand. Could you explain your point of view more fully on that?

... considering the description given at http://www.crunchbase.com/company/wavii, it would seem quite obvious that Wavii might be pioneering practical Web 3.0. Don't you think so?

Is Openamplify involved? :)

Just guessing, since I did not find any comments on Wavii (the obvious soon-to-be Web 3.0 breakthrough) in OA's forums.

Alas, I wonder if the lack of immediacy implied by replacing direct statements with citations might not hinder adoption of such an application. Despite the lack of economy inherent in repetition, it requires less effort for both the author and the reader. And, of course, the use of citations prevents people from posing as authorities themselves, which certain types of people will consider disadvantageous ;-)

I guess it would be interesting to consider what the user profile would be for such an app...

I see your point, and I would agree entirely if I weren't convinced that good and bad come from context.

UX is crucial; it's all about how something is presented. It is true that I would find myself crippled if I were only able to cite others in order to express myself. But that would not be the case. I can still express myself uniquely, and in the cases where the application finds that my intentions are not unique (i.e. someone else already had this idea), it suggests that I cite or reference those other people.

This would be done in such a way that my article is reinforced by references, citations and even statistics in case I am one of thousands who have the same thought.

For example, when writing something about a politician P who should do X, I could use:

1. a statistic saying that 30% of social flows believe X is good for environment.

2. a citation from P saying that environment will be top priority after health care.

So what just happened?

a. I saved myself a lot of writing in a convincing way, since these are "hard facts".

b. I get "assisted empirical argumentation" and can focus more on emotional penetration.

c. I just clustered/associated 30% of the social flows about X with one or more articles of P. This is a net reduction of entropy (adding structure to data) achieved by adding new data.

The UX should be that:

1. I judge for myself what I need to express in my own words.

2. I continuously get assistance as I write my text. The application saves time and reinforces my text empirically.

3. Eventually I will be rewarded (here's the prestige and authority) by being acknowledged as a person who has an overall view of the "world" and sees the relations between events, opinions and the overall "world order". It is like a manager who lets other people do the hard work of finding things out, then uses those findings to draw conclusions, make decisions and gain insights.

The last point (3) would become more obvious when the added structure is put to use. My name would be rated higher the more useful structure (insights) I have created.

And lastly... I tend to write quite lengthy messages. With this message, as all other messages I write, I have already tried to reduce the length a lot. I could for instance write much more to explain exactly what I mean. But to keep it "short", I try to write what I think will make the reader extrapolate to what is unsaid. However this often fails :). Also, I have not said anything about how to technically achieve this application/concept. With such an application that I describe here, I would have been able to tell much more, since someone else has probably said it before me and also because I have previously written vaguely about this elsewhere.

HA! Here's what happened to threewords.me, just a little while ago: http://mashable.com/2011/01/19/threewords-me-acquired-by-domain-name-czar/

Hmm...so, tell me: what *is* the next, big social media news/search/hybrid/whatever we need out there? There's so much value inherent in all this social media content, but harvesting the value still seems to be held up by the sheer volume of it all, the poor signal to noise ratio, etc. What's the app we all need? Let's figure it out, and build it ;-)

I've been thinking about an application concept for about two years now. It addresses the signal-to-noise ratio that you mention.

I believe that we need a concept of applications that reuse information rather than create new information. The reuse of information should also contribute to a reduction of entropy. So, when you want to express something, you should express it mainly with the help of citations rather than rewriting it.

Ok, that sounds like a retweet, right? But a retweet does not add anything. So with this concept you will be able to add short information in a way that clusters information.

The result is that you have for instance drawn a conclusion from 4 pieces of information by adding a few sentences which explain how they relate.

Correction: a retweet might add information, but it does not contribute to entropy reduction.

Hi Arnold -

Good to hear from you!

Interesting company :-) What makes you believe they will have such an impact?

Although I can't ever comment on relationships, real or hypothetical, that are not publicly announced, I can't resist noting that there are thousands of API key holders out there. In most cases, we have no idea who they are or what they're doing with OA...so who knows?

Hi Mike!

Ok, I am not convinced they will succeed, but everyone's expectations seem to be sky-high, as if "they've already succeeded"; that's the reason I used the word "obvious" (slightly sarcastically).

However, considering the description of what Wavii would produce, i.e. an automated Wikipedia/meme tracker/social news hybrid... to my ears, that sounds like the dawn of Web 3.0, given that it succeeds, of course.

Personally, I think that what I have read about Wavii sounds too complicated. Complicated stuff has great depth of use, but no one cares to use it. UX is crucial. At the other extreme we have sites like threewords.me, which have great UX but shallow, pointless usage, giving high impact and a short life.

Only an Atlantic Ocean to cross for me, but wouldn't this be an interesting symposium for us?

Sentiment Analysis Symposium, April 13 in New York.

Seth Grimes: "The symposium is designed to bring together experts, business executives, and IT practitioners -- experienced and new users and those evaluating solutions -- in a single, unique event."

http://intelligent-enterprise.informationweek.com/blog/archives/2010/01/sentiment_analy.html

Here's one more sentiment-based application (we already have http://tweetsentiments.com/, which uses OpenAmplify): TweetFeel, which evaluates real-time tweets about whatever search term the user has entered for positive and negative feelings, presumably taking into account words like 'good', 'sucks', 'great', 'screw', 'love' and whatnot.

TweetFeel is a new web service by marketing research startup Conversition Strategies that combines real-time search for Twitter with sentiment detection algorithms.

You can also  just enter your name and see what people really think about you. In real-time.

url: http://tweetfeel.com

Twitter ID: http://twitter.com/tweetfeeldotcom

 

How good are the results?

Results are pretty OK. They don't have a Neutral signal, so tweets that should be neutral end up as Positive. For example, I saw this tweet:

" MJ vs. Madonna? oh that is too hard. mj ftw?? maybe "

TweetFeel shows it as a Positive signal, whereas TweetSentiments (which uses OpenAmplify) shows MJ and Madonna as Neutral.

You can check the following URL for results

http://tweetsentiments.com/analyze?q=MJ+vs.+Madonna%3F+oh+that+is+too+ha...

 

Their results are so good that either they have a breakthrough technology or it's done manually :-) The volume of tweets they process seemed very low and mostly outdated, not as real-time as they claim.

I appreciate that in the previous comment TipTop is mentioned as a new semantic real-time search engine.  We launched the beta version of this service recently at http://FeelTipTop.com  While we are far from perfect in terms of both the functionality and the interface, I would strongly encourage this community to take a look at this engine closely.  Our approach towards solving this problem is entirely different from anything that has been tried so far.  You can look at write-ups about us on various sites including our blog and at http://www.altsearchengines.com/2009/08/31/search-engine-tiptop-announces-new-features/  I am happy to chat with any of you, if you are interested in finding out more.  Thanks.

I'd like to pick up the discussion around something that came up in this thread earlier:

"RIP, R.I.P, or 'rest in peace': How should that be analyzed? How would a human analyze it? It is of course a negative thing that a person has died, but rest in peace is a nice thing you say.

So, a simple short tweet like: "RIP Patrick Swayze", should that be analyzed as expressing positive or negative sentiment?
What do you think?"

 

Twitter is absolutely flooded with RIP messages again today, this time dedicated to the young actress Brittany Murphy. Today I am again considering adding support for this kind of context to our Sentiment engine. However, opinions here in the office are quite diverse regarding the flavor of this expression.

What would you say?

RIP Brittany Murphy = Polarity: negative or positive?

Personally I am inclined to give it a score for negativity.

RIP Brittany Murphy = Polarity: negative or positive?

I feel like that statement is positive, but that is just my opinion.

I took a moment to test this application this morning. I ran a whole bunch of different queries, and, just as Balaji says, they have no Neutrals. When I run the same queries through TweetSentiments, I get a lot of Neutrals, so we know they exist.

I am thinking maybe they just output the stuff they feel secure about. Could it be that the only tweets presented are the ones that definitely match rules or wordlists for positive/negative? Not that they are right every time: the tweets below come out negative:

"I am organizing for health care. http://bit.ly/kJ0Gk #OFA Read this to know what's up with Health care reform."

"What's going on with Health care reform? Read today's health news here! http://bit.ly/Vh8St"

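For comparison, a bare wordlist classifier that does keep a Neutral class might look like the sketch below. The word lists are tiny and hypothetical, and this is not TweetFeel's or OpenAmplify's method; the point is that tweets with no sentiment-bearing words, like the two health care tweets above, fall through to Neutral instead of being forced into Positive or Negative.

```python
import re

# Tiny, hypothetical word lists for illustration only.
POSITIVE = {"good", "great", "love", "awesome"}
NEGATIVE = {"bad", "sucks", "hate", "awful"}


def polarity(tweet):
    """Positive minus negative word count; no hits, or hits that cancel out, means Neutral."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum((word in POSITIVE) - (word in NEGATIVE) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("What's going on with Health care reform? Read today's health news here!"))  # neutral
```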
TweetFeel was one of the companies mentioned in this morning's NY Times article on Sentiment Analysis.

http://www.nytimes.com/2009/08/24/technology/internet/24emotion.html

Of course I will keep an eye on TipTop!
And I wish you all the best and good luck!

 

Thanks for your messages, Alexandra.  Your points are all excellent.  I will review the links you sent.  It will be great if you'd continue to visit TipTop (http://FeelTipTop.com) from time to time to see how well we are doing.  More thoughts on how we can do even better will also be much appreciated.  Also, please feel free to reach out to me anytime by e-mail: shyam at tiptopbest.com

Hi Steve, that was the issue here:

Currently, when spelled out: Rest in peace Brittany Murphy, it is assigned a positive sentiment (polarity score), from the word "peace".

Missing in our resources were variants like RIP, R.I.P. and R.I.P, now added, with positive sentiment.
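A sketch of this kind of variant handling: normalize RIP, R.I.P and R.I.P. to the spelled-out phrase before lexicon lookup, so all forms receive the same score. The one-entry lexicon and its score are hypothetical stand-ins for the real resources, and the pattern would also catch the verb "rip" (as in "rip it up"), which a real rule would need to exclude.

```python
import re

# Hypothetical one-entry lexicon; the score is illustrative.
LEXICON = {"rest in peace": 1.0}

# RIP, R.I.P or R.I.P. (any case) as a standalone token, with an optional final dot.
RIP_VARIANTS = re.compile(r"(?<!\w)r\.?i\.?p\b\.?", re.IGNORECASE)


def normalize(text):
    """Rewrite RIP variants to the spelled-out phrase so they share one lexicon entry."""
    return RIP_VARIANTS.sub("rest in peace", text)


def polarity_score(text):
    """Sum the scores of lexicon phrases found in the normalized, lowercased text."""
    normalized = normalize(text).lower()
    return sum(score for phrase, score in LEXICON.items() if phrase in normalized)


for tweet in ["RIP Brittany Murphy", "R.I.P. Brittany Murphy", "Rest in peace Brittany Murphy"]:
    print(normalize(tweet), polarity_score(tweet))
```

Normalizing first keeps the lexicon small: any later decision about the polarity of "rest in peace" automatically applies to every abbreviated form.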

 

I can still see use for negative sentiment analysis in cases like these: if you get out on twitter and see RIP-messages all over, it's clear something tragic has happened. What do you think?

Nice to hear from you again, shyamkapur!

We have quite busy days ourselves, but I'll be happy to take a look at your new interface, as soon as I get the time. Thanks for reminding me, it's always nice to get some new views and ideas from other actors. Hope your interface is a success,

best, Alexandra

 

A few more for the list, though a little disorganized:

  1. RankSpeed -Search by sentiments.
  2. http://feeltiptop.com

RankSpeed is a search tool that does a sentiment analysis on the blogosphere/twittersphere to find the best websites, the most useful web apps, the most secure web services, etc.

I searched for "newspaper" and got Forbes and CNBC as the top results.

FeelTipTop wasn't that great, and the UI was horrible.



 

I also haven't been seeing results from TweetFeel for my username or any terms I've been trying out.  I wonder if they are still working out the kinks.

It appears to work on trends.  So unless someone has used #weinberg81 recently, a search for your username will inevitably fail.  Try other trend topics like "Obama" or "bicycling."

shyamkapur:
This is an extremely hard problem and no existing solution comes anywhere close to perfection.  I would argue that even humans would not agree in many cases on what should be positive or negative.

Yesterday Steve blogged about a new publication: Detecting Sadness in 140 Characters: Sentiment Analysis and Mourning Michael Jackson on Twitter, and I quote:

"With 6 raters rating whether these tweets expressed sadness (all contained the word 'sad'), they have a Fleiss kappa inter-rater agreement of only 0.561. Pretty low for what would seem like a really easy task."
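For reference, Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance from the overall category proportions. A minimal Python implementation follows. The ratings in the usage lines are invented, not the study's data; they show how a skewed category distribution can push kappa down even when nearly every rating is the same.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters n >= 2. The result is
    undefined (division by zero) if every rating falls into a single category.
    """
    if not counts:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    items, categories = len(counts), len(counts[0])
    # Observed agreement: share of agreeing rater pairs per item, averaged over items.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / items
    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (items * n) for j in range(categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Invented example: 6 raters, 5 tweets, categories [sad, not sad].
ratings = [[6, 0], [6, 0], [6, 0], [5, 1], [5, 1]]
print(round(fleiss_kappa(ratings), 3))  # -0.071
```

Here 28 of the 30 ratings say "sad", yet kappa comes out slightly below zero: with almost everything in one category, chance agreement is already very high, so the observed agreement barely exceeds (in this case falls short of) what chance would give.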

Well... as stated before: sentiment analysis maybe isn't such an easy task after all.

 

Thanks, Alexandra.  It has been a busy couple of months at TipTop as we continue to innovate in a number of directions.  Please come and check out our new interface soon.  We have recently added a real-time, semantic, social shopping experience which you must also check out.  Please send me your feedback.  Thanks.

Pretty cool! Unfortunately, I'm not popular enough to garner any feedback on my own name... however, I CAN follow what people think about Silversun Pickups (an incredibly AWESOME band)! Thanks for sharing this, Balaji!

Hi shyamkapur, and thanks for your input!

I played around with some of these tools again today. It is quite clear that sentiment analysis on short texts like tweets is not the easiest task to take on.

One of the main topics people are talking about on Twitter today is obviously the tragic death of Patrick Swayze. I ran searches on his name in our own analysis engine, OpenAmplify, and compared the results with those from TipTop above. I have to admit, though, that I was not overly impressed. Maybe you could tell us more about how your tool is unique.

Here's some of my observations from testing TipTop:
The latest set of hits that I got at one point were the following:

Positive:

  May Patrick Swayze be at peace and his family find comfort in the sweet memories they made together while he was here in this world. 

Negative:

  RIP Patrick Swayze. Dirty Dancing will always be one of my favorit movies and Swayze one of my first celebrity crushes.

 

Remaining messages: (According to the definition in the article above: "Tweets deemed less useful are placed in Remaining messages, and may include polls, advertisements and standard, plain facts".)

BREAKING NEWS: Kanye West just interrupted Patrick Swayze's death to say that Michael Jackson's death was much better.

 

How are the three tweets different from each other? Why is one negative and one positive, and what makes Kanye West's crazy ideas "deemed less useful"?

A completely different issue that came up during my testing was this:
RIP, R.I.P, or 'rest in peace': How should that be analyzed? How would a human analyze it? It is of course a negative thing that a person has died, but rest in peace is a nice thing you say.

So, a simple short tweet like: "RIP Patrick Swayze", should that be analyzed as expressing positive or negative sentiment?
What do you think?

Is this the hot topic in Web 3.0 now: real-time sentiment?

One more for the list:

Insttant finds real-time updates to compile detailed analytics, including sentiment, user analysis and real-time headlines. Insttant also generates unfiltered, real-time news based on individuals.

 

I'm convinced! :)

Hi Alexandra,

I am grateful for your feedback on TipTop. Your understanding is perfect. This is an extremely hard problem, and no existing solution comes anywhere close to perfection. I would argue that even humans would not agree in many cases on what should be positive or negative (cf. the "RIP Patrick Swayze" question you raise at the end of your note). I would like to make only a few general points.

1. We measure the accuracy of the TipTop semantics engine in two ways:

(a) Anecdotal evidence: people try it and give feedback. Positive opinions have so far greatly outnumbered negative ones. In fact, this is the first forum I have come across on the web where anyone (you and one other person) has expressed negative sentiment about TipTop's quality.

(b) We run a large number of randomly selected queries and then evaluate the results. The accuracy numbers are very impressive.

2. To see the sentiment about a specific concept or category, you have to click on it. The overall sentiment for the query is only an approximation, which we show to make the user experience a bit simpler. Asking users to click twice before they see any interesting results would not be a good user experience.

3. The TipTop search experience is about a lot more than sentiment analysis. To really understand what TipTop is and will be, you need to play around with it a lot more. Use all the features and see all the magical abilities it has (and the many more it will have, as we are working 24/7 to add more cool features). The value of some features won't be obvious to typical users for days, perhaps even weeks.

4. There are lots of feeble attempts at semantic analysis that end up drawing quite a bit of hype. TipTop, in contrast, is based on 20-plus years of highly sophisticated research. Behind it there is both academic rigor and an adequate practical understanding of the search industry. I hope this comes through in the product.

I am happy to chat with you or anyone else who is interested in finding out more.

Thanks again.


Shyamkapur, I agree that there are probably a lot of actors out there whose semantic analysis tools are less rigorous and serious than others', regarding both the research and the size and quality of the test corpora their analysis systems are based on. Everyone wants a piece of the cake.

OpenAmplify develops natural language processing and text analysis software. Our core technology is backed by over 20 years of academic research. The company has been operating since 1999, and in January I will celebrate my own tenth anniversary here :)

It is nice to hear that we're not the only ones trying to do this properly, and we do enjoy a bit of competition.

I encourage you to have a look around our site, openamplify.com, and of course: grab an API and test it!

Has this been analyzed for sentiment? Just curious as to the output of this.

End of Year Analysis: 2009 roundup and 2010 predictions

"We’re likely to see some big developments in search next year... Semantic search should develop further as will so-called decision engines."

2010, I can't wait to get there!

Nice summary, but the article does not mention the Nexus (Google Phone) and Android, which are going to revolutionize the carrier-dependent phone concept: it's going to be an unlocked phone.

A few things I would like to add:

1) Real-time streams with geo-data are going to change the way we consume feeds.

  • Think of a person tweeting from his phone with his location information tagged.
  • Think of getting an alert when your friend passes by the place where you currently are.
  • Think of getting a feed item: "Your friend is in the same restaurant you visited 2 days ago. Do you wanna suggest something to him?"

2) As mentioned, social media is going to be "THE" player.

3) With Google and Microsoft giving importance to microformats, RDF and linked data, yes, OpenAmplify is going to play a vital role in Web 3.0.


Hear, hear! 2010 will be our year, methinks... :)

This morning, Microsoft and Yahoo! announced a deal whereby

"Microsoft will now power Yahoo! search while Yahoo! will become the exclusive worldwide relationship sales force for both companies’ premium search advertisers." [from the official press release]

What do you think the implications of this search merger deal are?

Innovate or spend? They are attacking the search market with a lot of cash to spend, which is why their market share keeps growing.

Although they recently lost a couple of percentage points of market share: http://erictric.com/online/bing-us-market-share-dips-2-from-october-to-november

I really think Bing is going to have to innovate to overcome Google's user base advantage.

It means that Bing is going to be one of two major players in the search space. Install that Bing Greasemonkey script now, before it becomes a "non-unbundleable" piece of IE 9.

From a security blog that I read regularly: inferring private information about someone based on their social network connections.

I wonder how much more you could get out of this sort of analysis by throwing semantic analysis into the mix. I have friends in my social networks that I disagree with rather profoundly; a semantic analysis of our posts might be able to infer this and disregard the connections to them when trying to infer things about me.
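The idea of discarding connections to friends you disagree with can be sketched as a simple majority vote over friends' known attributes, skipping anyone whose posts semantically disagree with yours. This is a minimal sketch under loud assumptions: each friend already has a known attribute value and a hypothetical `agreement` score in [-1, 1] that some semantic analysis of your posts and theirs produced; `Friend`, `infer_attribute` and `min_agreement` are made-up names, and majority voting is just one simple inference rule, not the method from the security blog.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Friend:
    name: str
    attribute: str    # hypothetical: a publicly known attribute of this friend
    agreement: float  # hypothetical score in [-1, 1] from semantic analysis of posts


def infer_attribute(friends: list[Friend], min_agreement: float = 0.0) -> Optional[str]:
    """Majority vote over friends' attributes, ignoring friends whose
    agreement score falls below min_agreement."""
    votes = Counter(f.attribute for f in friends if f.agreement >= min_agreement)
    if not votes:
        return None  # nothing left to infer from
    value, _count = votes.most_common(1)[0]
    return value


friends = [
    Friend("ann", "A", 0.8),
    Friend("bob", "A", 0.6),
    Friend("cat", "B", -0.7),
    Friend("dan", "B", -0.9),
    Friend("eve", "B", -0.5),
]

# Counting every connection, the three friends you disagree with win the vote.
naive_guess = infer_attribute(friends, min_agreement=-1.0)
# Dropping the disagreeing friends changes the inference.
filtered_guess = infer_attribute(friends)
```

With every connection counted, the vote goes to "B"; once the three friends with negative agreement are discarded, it goes to "A". That is the effect the post speculates about: semantic disagreement acting as a filter on which connections say something about you.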

Reminds me of the adage: a friend is a reflection of yourself.

Interesting, though these days they'll have to correct for people inviting tons of "friends" to build up their teams for various games (Mafia Wars, Farmville, etc.), since people invited for that may have nothing else in common...

I blocked all the Farmville and Mafia Wars updates, so I now have a better vision of my friends...

Point your friends to this post.  I would love to hear what they have to say.

Have you ever visited Sweden?

July 2010 is the time to come here. Sweden in summer is really something, and this coming summer it will be extraordinary:

The 48th Annual Meeting of the Association for Computational Linguistics will be held in Uppsala, Sweden, July 11–16, 2010. The conference will be organized by the Department of Linguistics and Philology at Uppsala University.

The conference will take place at Uppsala University Campus in a genuine university environment, dating back as far as 1477. The city also holds a rich history, having for long periods been the political, religious and academic center of Sweden. The proximity to the capital, Stockholm, provides additional benefits as a potential site for arranging both pre- and post-conference tours, as well as for excursions or tourism during the conference. The city of Uppsala is easy to reach by plane, train or car. Read more here.

I know we will be there, and we would love to meet all of you there.

Sounds like a great conference.

Yes, very relevant to a lot of the things we are trying to do.

The workshops look very interesting! http://www.acl2010.org/workshops.html

 

It would be great if we could have a demo, of course. And then we need you guys to come over! :D

Amazing!!! I see domain categorization, named entity recognition and whatnot. It overlaps very well with what we do. It would be a great conference. Are you guys presenting a paper?

Would love to visit Sweden and, of course, meet you guys. :D

OpenAmplify field trip, anyone? :)