
Saturday, December 26, 2009

TRENDS 2 - Clustering

Clustering

Clustering is one of the methods used for data classification. Traditionally it works as an algorithm behind the information retrieval process, for example in assessing the relevance of documents. The innovation this approach could bring is to project the algorithm onto the presentational level of the information retrieval system.

Cluster

A cluster is defined as a number of similar items (things, persons or groups) grouped closely together. Clusters differ from thesaurus classes in that the classification is unsupervised: clusters are not predefined. The clustering process is triggered by the user’s need, expressed as an information retrieval query. Clusters can reveal the natural grouping or structure in a data set. Different clustering methods and models produce different forms of clusters (Zaïane, 1999):

  • Exclusive Clustering – each item belongs to exactly one definite cluster
  • Overlapping Clustering – fuzzy sets are used to cluster data; each item has a degree of membership and may belong to two or more clusters
  • Hierarchical Clustering – built by repeatedly uniting the two nearest clusters
  • Probabilistic Clustering – a completely probabilistic approach
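As an illustration of the hierarchical case, here is a minimal sketch of "union between the two nearest clusters". It assumes single linkage (cluster distance is the distance between the closest members) and Euclidean distance; the function name `single_linkage` and the sample points are made up for this example and do not come from the cited sources.

```python
from itertools import combinations
from math import dist


def single_linkage(points, target_clusters):
    """Merge the two nearest clusters until only target_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > target_clusters:
        # Single linkage (an assumption of this sketch): the distance between
        # two clusters is the distance between their two closest members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i].extend(clusters[j])  # union of the two nearest clusters
        del clusters[j]                  # i < j, so index i stays valid
    return clusters


# Hypothetical 2D points: two tight pairs and one isolated item.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = single_linkage(points, 3)
print(result)
```

Stopping at a target number of clusters is only one possible choice; a full hierarchical method would usually keep the whole merge history so the user can cut the tree at any level.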

Distance-based clustering

Clusters can be divided into groups according to the algorithm that defines the grouping. In the case of the first picture, we can easily identify 4 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance. This is called distance-based clustering (Zaïane, 1999). Items in a group share almost the same characteristics, expressed by their position in the information space. Items are depicted in 2D or 3D space (in our case 2D), and their position is determined by their properties.
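A distance-based grouping of this kind can be sketched with k-means (Lloyd's algorithm). This is one common distance-based method, chosen here for illustration; the post itself does not name a specific algorithm. The sample points, the hand-picked starting centroids and the name `k_means` are assumptions of this example.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recentre; repeat."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups, centroids


# Hypothetical 2D data with four visually separate clumps.
points = [(1, 1), (1, 2), (2, 1),
          (8, 1), (9, 1), (9, 2),
          (1, 8), (2, 9), (1, 9),
          (8, 8), (9, 9), (8, 9)]
# Starting centroids are picked by hand here, one per clump, to keep the run deterministic.
start = [points[0], points[3], points[6], points[9]]
groups, centroids = k_means(points, start)
for g in groups:
    print(g)
```

In practice the number of clusters and the starting centroids are not known in advance, which is one reason the result of a distance-based method depends so strongly on how items are placed in the information space.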

Conceptual clustering

Another kind of clustering is conceptual: two or more objects belong to the same cluster if it defines a concept common to all of those objects. Conceptual clustering is based not on a perfect match or similarity between objects but on conceptual likeness (Tutorial, 2000). The categories and features that determine the similarity of the group are fuzzier and more open than in the previous distance model; they could be defined as overlapping clusters, where items in the group have at least one “same” characteristic.

An example of the conceptual approach is Latent Semantic Indexing (LSI, see Deerwester et al., 1990). A query with one term (such as “pigs”) could have a high similarity with a document that has a related term (“hogs”). Rather than expanding queries based only on a small set of term relations, LSI considers all terms potentially related to each other, and all documents to be similarly related (Newby, 2002).
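The “pigs”/“hogs” effect can be reproduced on a toy scale. The sketch below builds a tiny term-document matrix, extracts its two leading latent concepts (left singular vectors) by power iteration, and compares a query with a document in that reduced space. The vocabulary, the four documents and all function names are invented for this example. Real LSI implementations normally use a full SVD routine and term weighting, so treat this as a simplified sketch rather than the method of Deerwester et al.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    n = sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a, b):
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def top_concepts(docs, k, iterations=200):
    """Leading k left singular vectors of the term-document matrix A,
    found by power iteration on A A^T with Gram-Schmidt deflation."""
    n_terms = len(docs[0])
    concepts = []
    for _ in range(k):
        v = normalise([1.0] * n_terms)
        for _ in range(iterations):
            # Multiply by A A^T without forming it: sum over documents of (d . v) d
            w = [0.0] * n_terms
            for d in docs:
                s = dot(d, v)
                w = [wi + s * di for wi, di in zip(w, d)]
            # Remove the directions that were already extracted.
            for c in concepts:
                s = dot(w, c)
                w = [wi - s * ci for wi, ci in zip(w, c)]
            v = normalise(w)
        concepts.append(v)
    return concepts


def project(vec, concepts):
    """Coordinates of a term-space vector in the latent concept space."""
    return [dot(vec, c) for c in concepts]


# Hypothetical vocabulary and documents (one row per document, raw term counts).
terms = ["pigs", "hogs", "farm", "car", "engine"]
docs = [
    [1, 0, 1, 0, 0],  # "pigs farm"
    [0, 1, 1, 0, 0],  # "hogs farm"
    [0, 0, 0, 1, 1],  # "car engine"
    [0, 0, 0, 1, 0],  # "car"
]
query = [1, 0, 0, 0, 0]  # the single term "pigs"

concepts = top_concepts(docs, k=2)
raw = cosine(query, docs[1])
latent = cosine(project(query, concepts), project(docs[1], concepts))
print("term space:", raw, "latent space:", round(latent, 3))
```

In the raw term space the query and the “hogs farm” document share no term, so their cosine similarity is zero. In the two-concept space both fall on the same “farm animals” concept, because “pigs” and “hogs” co-occur with “farm”.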

Model-based clustering

Another conceptual clustering approach is model-based clustering. It is based on the fit between two things: the data set and a model. The model emerges from the nonlinear m-dimensional inputs in the data set, whose positions are based on closeness. The data set is thus self-correcting with respect to the changeable mental model. This theory is connected with the SOM (self-organizing map) theory proposed by Kohonen in 1981.

Further development in clustering theories is based on their likeness to human information acquisition. According to this approach, learning, statistical and probabilistic theories precede clustering theory.
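To make the idea of a model that adapts itself to the data more concrete, here is a minimal one-dimensional self-organizing map in plain Python. It is a sketch under several assumptions that the post does not specify: four nodes arranged in a line, a Gaussian neighbourhood, and learning rate and radius that shrink linearly. The names `train_som` and `best_matching_unit` and the sample data are hypothetical.

```python
import random
from math import exp, sqrt


def distance(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to the input x."""
    return min(range(len(nodes)), key=lambda i: distance(nodes[i], x))


def train_som(data, n_nodes=4, epochs=200, seed=0):
    rng = random.Random(seed)  # seeded so that repeated runs give the same map
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac
        radius = max(n_nodes / 2 * frac, 0.5)
        for x in rng.sample(data, len(data)):  # visit the inputs in random order
            bmu = best_matching_unit(nodes, x)
            for i, node in enumerate(nodes):
                # Nodes near the winner on the line are pulled more strongly.
                influence = exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += rate * influence * (x[d] - node[d])
    return nodes


# Hypothetical 2D inputs forming two clumps inside the unit square.
data = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1),
        (0.9, 0.9), (0.85, 0.8), (0.8, 0.9)]
nodes = train_som(data)
for x in data:
    print(x, "-> node", best_matching_unit(nodes, x))
```

The map is never told how many groups exist. Each input simply pulls its closest node, and that node's neighbours, towards itself, so the model keeps correcting itself as inputs arrive. With data like this one would typically expect the two clumps to end up on different nodes, although the exact assignment depends on the seed and the schedule.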

Cognitive aspects

The advantage of clustering is its close similarity to the human way of thinking. It corresponds to the theory of inner mental modelling according to Wittgenstein and to the theory of term-based and conceptual thinking, which enables people to deal with large data sets and to classify their long-term memory more easily (Loukotová, 2009). The clustering method thus reflects higher mental activities and is well suited to information retrieval. Another important advantage is its relation to the changeable context of the real world. The structure of clusters is not fixed; it reflects the changes of the inner mental model depending on reality.

At the presentational level, the clustering method could then provide a tool for easier understanding of the data set’s environment and a deeper understanding of the relations between terms and objects, and ultimately of reality itself.

Problems

Using clustering in the Web environment brings problems, as does every method based on similarity to human thinking. Many unknown and changeable factors emerge and have to be taken into account: the bigger the data set, the more unknowns. Another problem is the changeability of the data set itself. In the Web environment, the amount and kind of data change greatly and quickly.

All kinds of clustering models are basically founded on some sort of “distance” between terms, so the correct identification of a cluster depends on their representation in the information space. It follows that the problem of filtering clusters depends primarily on the position of the clusters in the information space.

Conclusion

Nowadays clustering methods are widely used in the form of hidden algorithms. However, their potential is not fully exploited. That potential lies in the cognitive aspects of the method. As will be presented later, this approach is closest to cognitive perception and the ways of human thinking. In connection with search engines, it could serve as an excellent information retrieval and learning tool.

Examples

Standalone applications: Carrot2 Workbench

Web search engines: clusty.com


References

BORGMAN, Christine L. (1989). All Users of Information Retrieval Systems are Not Created Equal: An Exploration into Individual Differences. Information Processing and Management, vol. 25, no.3, pp. 237–251.

CARD, Stuart K., Mackinlay, Jock D., and Shneiderman, Ben. (1999). Readings in Information Visualization: Using Vision to Think. San Francisco: Morgan Kaufmann.

CEJPEK, J. (1998) Informace, komunikace a myšlení. Karolinum, Praha. 178

HULL, David A. (1999). The TREC-7 Filtering Track: Description and Analysis. In Voorhees, Ellen and Harman, Donna (Eds.), Proceedings of the 7th Text REtrieval Conference (TREC-7), Gaithersburg, Maryland: National Institute of Standards and Technology.

INGWERSEN, P. (1996). Cognitive Perspectives in Information Retrieval Interaction: Elements of a Cognitive IR Theory. J. Documentation, vol. 52, no. 1, pp. 3–50.

LOUKOTOVÁ, K. (2009) Úvod do problematiky uživatelského rozhraní. In Červenková, A. & Hořava, M. (Eds.), Uživatelsky přívětivá rozhraní. Horava &Associates, Praha.

NEWBY, G. B. (2002) Empirical Study of a 3D Visualization for Information Retrieval Tasks. Journal of Intelligent Information Systems, vol. 18, pp. 31–53.

SABOL, V. et al. (2002) Applications of a Lightweight, Web-Based Retrieval, Clustering, and Visualization Framework. In D. Karagiannis and U. Reimer (Eds.): PAKM 2002, LNAI 2569, pp. 359–368, 2002.

SHNEIDERMAN, Ben. (1996). The Eyes Have It: User Interfaces for Information Visualization. Technical Report No. CS-TR-3665, Human Computer Interface Laboratory. University of Maryland at College Park. Available at http://www.cs.umd.edu/TRs/groups/HCIL-no-abs.html

SCHAMBER, Linda, Eisenberg, Michael, and Nilan, Michael. (1991). Towards a Dynamic, Situational Definition of Relevance. Information Processing and Management, vol. 26, no. 2, pp. 755–776.

TUTORIAL on Clustering Algorithms (2000) Politecnico di Milano. Available at: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/

ZAÏANE, Osmar R. (1999) Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering. University of Alberta. Available at: http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/Chapter8/index.html

