I am just back from WWW'09. I presented the article "A Data Mashup Language for the Data Web" at the LDOW workshop. Here is my presentation:
4/23/09
A Data Mashup Language for the Data Web
4/4/09
Corruption in the research culture
You may want to read ypwong's post quoted below, which describes some of the corruption among our scientists. True!!! Even some good scientists blindly accept this corruption as "the rules we have to play by". I wish Google Scholar would start filtering out SPAM papers, even those that appeared in "A+" conferences, as the corruption there has other colors.
ypwong commenting on the "STOP THE NUMBERS GAME":
Many people seems to think that academic world (i.e. universities and institute of higher learning, particularly those with strong emphasis on research other than just teaching) is all perfect and free from all those negative thing associated with the industries, such as politics, corruptions, injustice, unfairness etc. Many people who have worked in both industries and academia told me that academic world (in general) can be as bad as in industries, even worst sometimes. Ok, before you jumped on me, let me say that i am not implying industries is a bad place to work, I simply want to state that whatever problems exist in industries, the same problems exist in academic world as well. Likewise, academic world is not as trouble-free as most people thought.
The above is even more true as nowadays educations have becoming so commercialized that degrees, awards, papers etc. are becoming commodities, at the price of neglecting quality.
As far as academicians are concern, I think there is nothing as sickening as the papers-chase (research papers, that is) where many 'researchers' (quotes intended) competing with one another on number of papers they have published as many universities are awarding their staff based on the number of papers they have published, because universities will get higher ranking based on these numbers, which in turn will translate into students enrollment and funding, at least that is what they thought. At the end of the day, just like any other industries, it is dollars and ringgits signs they all interested. So very often, all these are done REGARDLESS the actual content of the papers and i beg almost all the time, most universities will not even read those papers that being published! They argue that since those papers have been accepted in conferences and journals, the papers must surely be good.
One only need to read through some of the conference and journal papers to know that many of these conferences and journals are there to serve only two purpose:
- The authors published to add one more paper to his resume and better prospect for promotion, the university do not mind sponsoring them because one more paper credited to the university as well.
- The publishers laugh all the way to the bank
Not forgetting that as for conferences, it is also good for the economy of the country and city of the local organizer, I call this ACADEMIC TOURISM !!!
So often, 'research papers' now serve the writers and not the readers. There are even instances where RANDOMLY-GENERATED PAPERS are accepted in conference!!! See http://pdos.csail.mit.edu/scigen/ for a funny and yet serious site about this issue.
This is SO SICKENING !!!
STOP THE NUMBERS GAME !
See below a recent ACM article on the same issue, I felt so compelled to put the entire article here below (since we are given permission to do so, see notice at the bottom of the article):
Communications of the ACM
Volume 50, Number 11 (2007), Pages 19-21
Viewpoint: Stop the numbers game
David Lorge Parnas
As a senior researcher, I am saddened to see funding agencies, department heads, deans, and promotion committees encouraging younger researchers to do shallow research. As a reader of what should be serious scientific journals, I am annoyed to see the computer science literature being polluted by more and more papers of less and less scientific value. As one who has often served as an editor or referee, I am offended by discussions that imply that the journal is there to serve the authors rather than the readers. Other readers of scientific journals should be similarly outraged and demand change.
The cause of all of these manifestations is the widespread policy of measuring researchers by the number of papers they publish, rather than by the correctness, importance, real novelty, or relevance of their contributions. The widespread practice of counting publications without reading and judging them is fundamentally flawed for a number of reasons:
* It encourages superficial research. Those who publish many hastily written, shallow (and often incorrect) papers will rank higher than those who invest years of careful work studying important problems; that is, counting measures quantity rather than quality or value;
* It encourages overly large groups. Academics with large groups, who often spend little time with each student but put their name on all of their students' papers, will rank above those who work intensively with a few students;
* It encourages repetition. Researchers who apply the "copy, paste, disguise" paradigm to publish the same ideas in many conferences and journals will score higher than those who write only when they have new ideas or results to report;
* It encourages small, insignificant studies. Those who publish "empirical studies" based on brief observations of three or four students will rank higher than those who conduct long-term, carefully controlled experiments; and
* It rewards publication of half-baked ideas. Researchers who describe languages and systems but do not actually build and use them will rank higher than those who implement and experiment.
Paper-count-based ranking schemes are often defended as "objective." They are also less time-consuming and less expensive than procedures that involve careful reading. Unfortunately, an objective measure of contribution is frequently contribution-independent.
Proponents of count-based evaluation argue that only good papers get into the "best" journals, and there is no need to read them again. Anyone with experience as an editor knows there is tremendous variation in the seriousness, objectivity, and care with which referees perform their task. They often contradict one another or make errors themselves. Many editors don't bother to investigate and resolve; they simply compute an average score and pass the reviews to the author. Papers rejected by one conference or journal are often accepted (unchanged) by another. Papers that were initially rejected have been known to win prizes later, and some accepted papers turn out to be wrong. Even careful referees and editors review only one paper at a time and may not know that an author has published many papers, under different titles and abstracts, based on the same work. Trusting such a process is folly.
Measuring productivity by counting the number of published papers slows scientific progress; to increase their score, researchers must avoid tackling the tough problems and problems that will require years of dedicated work and instead work on easier ones.
Evaluation by counting the number of published papers corrupts our scientists; they learn to "play the game by the rules." Knowing that only the count matters, they use the following tactics:
* Publishing pacts. "I'll add your name to mine if you put mine on yours." This is highly effective when four to six researchers play as a team. On occasion, I have met "authors" who never read a paper they purportedly wrote;
* Clique building. Researchers form small groups that use special jargon to discuss a narrow topic that is just broad enough to support a conference series and a journal. They then publish papers "from the clique for the clique." Formation of these cliques is bad for scientific progress because it leads to poor communication and duplication, even while boosting the apparent productivity of clique members;
* Anything goes. Researchers publish things they know may be wrong, old, or irrelevant; they know that as long as the paper gets past some set of referees, it counts;
* Bespoke research. Researchers monitor conference and special-issue announcements and "custom tailor" papers (usually from "pre-cut" parts) to fit the call-for-papers;
* Minimum publishable increment (MPI). After completing a substantial study, many researchers divide the results to produce as many publishable papers as possible. Each one contains just enough new information to justify publication but may repeat the overall motivation and background. After all the MPIs are published, the authors can publish the original work as a "major review." Science would advance more quickly with just one publication; and
* Organizing workshops and conferences. Initiating specialized workshops and conferences creates a venue where the organizer's papers are almost certain to be published; the proceedings are often published later as a book with a "foreward" giving the organizer a total of three more publications: conference paper, book chapter, and foreward.
One sees the result of these games when attending conferences. People come to talk, not to listen. Presentations are often made to nearly empty halls. Some never attend at all.
Some evaluators try to ameliorate the obvious faults in a publication-counting system by also counting citations. Here too, the failure to read is fatal. Some citations are negative. Others are included only to show that the topic is of interest to someone else or to prove that the author knows the literature. Sometimes authors cite papers they have not studied; we occasionally see irrelevant citations to papers with titles that sound relevant but are not. One can observe researchers improving both their publication count and citation count with a sequence of papers, each new one correcting an error in the hastily written one that preceded it. Finally, the importance of some papers is not recognized for many years. A low citation count may indicate a paper that is so innovative it was not initially understood.
Accurate researcher evaluation requires that several qualified evaluators read the paper, digest it, and prepare a summary that explains how the author's work fits some greater picture. The summaries must then be discussed carefully by those who did the evaluations, as well as with the researcher being evaluated. This takes time (external evaluators may have to be compensated for that time), but the investment is essential for an accurate evaluation.
A recent article [1], which clearly described the methods used by many universities and funding agencies to evaluate researchers, offered software to support these methods. Such support will only make things worse. Automated counting makes it even more likely that the tactics I've described here will go undetected.
One fundamental counting problem raised in [1] is the allocation of credit for multiple-author papers. This is difficult because of the many author-ordering rules in use, including:
* Group leaders are listed first, whether or not they contributed;
* Group leaders are listed last, whether or not they contributed;
* Authors are listed in order of contribution, greatest contribution first;
* Authors are listed by "arrival," that is, the one who wrote the first draft is first; and
* Authors are listed alphabetically.
Attributing appropriate credit to individual authors requires either asking them (and believing their answers) or comparing the paper with previous papers by the authors. A paper occasionally contributes so much that several authors deserve full credit. No mechanical solution to this problem can be trusted. It was suggested in [1] that attention be restricted to a set of "leading" journals primarily distinguished by their broad coverage. However, there are often more substantive and important contributions in specialized journals and conferences. Even "secondary" journals publish papers that trigger an important new line of inquiry or contribute data that leads to a major result.
Only if experts read each paper carefully can they determine how an author's papers have contributed to their field. This is especially true in computer science where new terms frequently replace similar concepts with new names. The title of a paper may make old ideas sound original. Paper counting cannot reveal these cases.
Sadly, the present evaluation system is self-perpetuating. Those who are highly rated by the system are frequently asked to rate each other and others; they are unlikely to want to change a system that gave them their status. Administrators often act as if only numbers count, probably because their own evaluators do the same.
Those who want to see computer science progress and contribute to the society that pays for it must object to rating-by-counting schemes every time they see one being applied. If you get a letter of recommendation that counts numbers of publications, rather than commenting substantively on a candidate's contributions, ignore it; it states only what anyone can see. When serving on recruiting, promotion, or grant-award committees, read the candidate's papers and evaluate the contents carefully. Insist that others do the same.
References
1. Ren, J. and Taylor, R. Automatic and versatile publications ranking for research institutions and scholars. Commun. ACM 50, 6 (June 2007), 81–85.
Author
David Lorge Parnas is Professor of Software Engineering and Director of the Software Quality Research Laboratory in the Department of Computer Science and Information Systems at the University of Limerick, Limerick, Ireland.
Footnotes
I am grateful for suggestions made by Roger Downer and Pierre-Jacques Courtois after reading an early version of this "Viewpoint." Serious scientists, they did not ask to be co-authors.
©2007 ACM 0001-0782/07/1100 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2007 ACM, Inc.
6/5/08
ConferenceCalendar
I built a Conference Calendar ordered by submission date, which you may find useful. I included only some important conferences in the areas of Knowledge/Ontology/Data/Web engineering. I plan to update this calendar from time to time.
I wish Google and Yahoo would "volunteer" to build an advanced conference system for all research areas and activities. It is really strange that they still don't see the business value of leading the Innovation...Innovation...Innovation... life-cycle.
3/14/08
The problems of SPARQL
In the following I would like to summarise the main challenges that may hamper the utility of SPARQL:
- Formulating a query in SPARQL requires the query writer to understand the data structures of the RDF resources being queried, which is usually done by eye-parsing these resources. Since RDF resources are typically large and written in long, technical terms, eye-parsing is not practical, and it certainly hampers the utility of SPARQL. By comparison, writing a SQL query also requires the writer to look up and understand the underlying database schema; however, such schemas are typically concise and manageable, even in large organizations. In addition, the people who design a database for an organization are usually the same people who write the SQL queries; i.e., unlike RDF, the world of a database is closed.
- Learning SPARQL (and RDF) is not easy for the majority of the IT community. SPARQL is not yet known outside its small community. Although RDF and SPARQL are indeed simple technologies, I found that the majority of IT people (including many senior researchers) cannot understand or author simple RDF documents. I believe this is because the intuition of representing knowledge as directed labelled graphs and graph patterns is not familiar in IT education. Unlike databases and SQL, where the intuition and logic of relations have been taught in every IT program for more than 40 years, the description logic underpinning RDF and SPARQL, even though it is simple, needs many years of killer applications and tools to take hold.
- SPARQL expressivity seems unsatisfactory. Although SPARQL is an expressive language, one may notice in the SPARQL literature that many authors are requesting extensions to enrich SPARQL in different ways. On the one hand, I found that only a few of these suggestions are fundamental to the core of SPARQL; the majority are conductive, i.e. they do not affect the functionality or the expressiveness of SPARQL, as they can be emulated somehow, but they make SPARQL more natural, practical, and concise for different usage scenarios. On the other hand, the number of proposed extensions, even if they are conductive, indicates that the expressivity of the SPARQL operators is not satisfactory in practice. In my opinion, SPARQL should not be extended unless this is really necessary, as these extensions may harm the scalability and optimization of SPARQL. Instead, I believe an extra layer (or layers) on top of SPARQL for handling such extensions is necessary.
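The first challenge above (having to eye-parse RDF before writing a query) can be eased by summarizing which predicates are used on which classes. The following is only a minimal sketch in plain Python, not part of SPARQL or of any RDF library; the triples, the prefixes, and the function name `summarize` are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:paper1", "rdf:type", "ex:Article"),
    ("ex:paper1", "dc:title", "An Example Title"),
    ("ex:paper1", "dc:creator", "ex:author1"),
    ("ex:author1", "rdf:type", "foaf:Person"),
    ("ex:author1", "foaf:name", "Alice Example"),
]


def summarize(triples):
    """Map each rdf:type class to the sorted predicates used on its instances."""
    # First pass: record the declared classes of every subject.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    # Second pass: attach every other predicate to the subject's classes.
    summary = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        for cls in types.get(s, {"(untyped)"}):
            summary[cls].add(p)
    return {cls: sorted(preds) for cls, preds in summary.items()}


if __name__ == "__main__":
    for cls, preds in summarize(TRIPLES).items():
        print(cls, "->", ", ".join(preds))
```

Such a summary plays the role that a database schema plays for SQL: a short list of classes and predicates to consult before writing the query.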
The Importance of SPARQL
I believe SPARQL has made the web a database, where each RDF resource is seen as a table; all resources can be accessed and queried using SPARQL, as simply as querying tables using SQL. Imagine a researcher who would like to generate his list of publications from DBLP, Google Scholar, and CiteSeer. Suppose these scholarly libraries generate RDF of the results they provide, or a third service converts their results into RDF. Integrating and mashing up this content can then be done easily using SPARQL. I shall illustrate this example in later posts.
It is worth noting that Oracle 10g and 11g support RDF storage and querying. Oracle stores RDF triples in one simple table, in addition to some indices. Querying RDF in Oracle is done in a SPARQL style. As shown by Oracle, this implementation is scalable. For example, a query (of medium complexity) over 80 million RDF triples (5.2 GB) takes less than a second. It has been shown that the retrieval cost per result row remains almost the same as the dataset size changes. I believe this simple and scalable support of the RDF technology (by Oracle) will accelerate the adoption of RDF as the web metadata format, and thus SPARQL as the web query language.
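To make the publication-list scenario concrete, here is a rough sketch in plain Python that merges three hypothetical RDF sources and evaluates a small graph pattern over them, in the way a SPARQL basic graph pattern would be evaluated. It is not a SPARQL engine; the data, the prefixes, and the names `match`, `query`, and `publications_of` are assumptions made up for this illustration.

```python
# Hypothetical triples, as if exported from three scholarly libraries.
DBLP = [
    ("ex:p1", "dc:creator", "ex:alice"),
    ("ex:p1", "dc:title", "Paper One"),
]
SCHOLAR = [
    ("ex:p2", "dc:creator", "ex:alice"),
    ("ex:p2", "dc:title", "Paper Two"),
]
CITESEER = [
    ("ex:p1", "dc:creator", "ex:alice"),
    ("ex:p1", "dc:title", "Paper One"),
    ("ex:p3", "dc:creator", "ex:bob"),
    ("ex:p3", "dc:title", "Paper Three"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Evaluate a list of triple patterns (a basic graph pattern) over `triples`."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


def publications_of(author, triples):
    """Return the sorted, de-duplicated titles of all papers by `author`."""
    rows = query(
        [("?paper", "dc:creator", author), ("?paper", "dc:title", "?title")],
        triples,
    )
    return sorted({row["?title"] for row in rows})


# Merging the sources is a set union, so duplicated triples collapse.
MERGED = set(DBLP) | set(SCHOLAR) | set(CITESEER)

if __name__ == "__main__":
    print(publications_of("ex:alice", MERGED))
```

The two patterns share the variable `?paper`, which is what joins an author to the titles of that author's papers across all three sources.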
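The idea of one triple table plus a few indices can be sketched with SQLite from the Python standard library. This is only an illustration of the general storage idea, not Oracle's actual schema or query interface; the table name, column names, index names, and sample data are all assumptions.

```python
import sqlite3

# Hypothetical sample triples.
SAMPLE = [
    ("ex:p1", "dc:creator", "ex:alice"),
    ("ex:p1", "dc:title", "Paper One"),
    ("ex:p2", "dc:creator", "ex:alice"),
    ("ex:p2", "dc:title", "Paper Two"),
    ("ex:p3", "dc:creator", "ex:bob"),
    ("ex:p3", "dc:title", "Paper Three"),
]


def build_store(triples):
    """Create an in-memory store: one table of triples plus two indices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
    conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def titles_by(conn, author):
    """Titles of papers by `author`.

    The two triple patterns (?paper dc:creator author) and
    (?paper dc:title ?title) share ?paper, so they become a self-join
    of the triple table on the subject column.
    """
    rows = conn.execute(
        """
        SELECT t2.o
        FROM triples AS t1
        JOIN triples AS t2 ON t1.s = t2.s
        WHERE t1.p = 'dc:creator' AND t1.o = ?
          AND t2.p = 'dc:title'
        ORDER BY t2.o
        """,
        (author,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store(SAMPLE)
    print(titles_by(store, "ex:alice"))
```

Each additional triple pattern in a query adds one more self-join, which is why the choice of indices matters for this layout.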
2/12/08
"Semantic Pipes- Towards Web 3.0 Information Systems"
The first part of the lecture gives an introduction to Web 2.0 (Wikis, RSS, Blogs, Mashups, and a live demo of Yahoo Pipes). Although these technologies have shown great success in how people can collaborate to produce and share content on the web, they are still being used mainly for socially oriented web pages. To open these technologies up to business and enterprise applications (i.e. to use them in business), we discuss (in the second part) how to build a Web Information System, where the information sources are distributed over different locations and edited by different people. Similar to the idea of Yahoo Pipes (which processes and mashes up RSS feeds easily), Semantic Pipes can be built to process and query RDF information sources. RDF is not only a well-structured data format; it also enables the data semantics to be well defined.
Querying RDF can be done using SPARQL (the RDF query language). In this way, querying multiple RDF sources in SPARQL, processing and storing these queries, and reusing them later (similar to Yahoo Pipes) would not only enable information reuse and sharing, but would also enable meaningful integration, search, access, and interoperation, which are the necessary services for a Web Information System. One can thus imagine a Web Information System as a database, where "tables" are "RDF sources" (distributed over different locations), and "views" are stored SPARQL queries.
Transforming the data on the Web into such databases is called Web 3.0.
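The analogy of sources as "tables" and stored queries as "views" can be sketched in a few lines of plain Python. This is a toy model only, with simple filter functions standing in for stored SPARQL queries; the source names, the data, and the helpers `define_view` and `resolve` are hypothetical.

```python
# Hypothetical RDF sources ("tables"), each a list of triples.
SOURCES = {
    "lab_a": [
        ("ex:p1", "dc:title", "Data Web Mashups"),
        ("ex:p1", "dc:creator", "ex:alice"),
    ],
    "lab_b": [
        ("ex:p2", "dc:title", "Ontology Engineering"),
        ("ex:p3", "dc:title", "Querying the Web"),
    ],
}

# Stored queries ("views"): name -> (input names, filter function).
VIEWS = {}


def define_view(name, inputs, keep):
    """Store a query that keeps the triples from `inputs` for which keep(triple) is true."""
    VIEWS[name] = (inputs, keep)


def resolve(name):
    """Return the triples of a source, or evaluate a stored view on demand."""
    if name in SOURCES:
        return list(SOURCES[name])
    inputs, keep = VIEWS[name]
    result = []
    for source in inputs:
        # Inputs may themselves be views, so pipes can be chained.
        for triple in resolve(source):
            if keep(triple) and triple not in result:
                result.append(triple)
    return result


# A view over two sources, and a second view piped from the first.
define_view("all_titles", ["lab_a", "lab_b"], lambda t: t[1] == "dc:title")
define_view("web_titles", ["all_titles"], lambda t: "Web" in t[2])

if __name__ == "__main__":
    for subject, _, title in resolve("web_titles"):
        print(subject, title)
```

Because a view is evaluated only when it is resolved, a triple added to a source later shows up in every view built on that source, which is the reuse the pipes idea is after.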
2/6/08
Yahoo Pipes by its Developers

An interesting introduction to Yahoo Pipes, presented by its developers. Click to watch the video.
12/6/07
Web 2.0, Social Programming, and Mashups -What is in it for me!
I gave an invited lecture on Web 2.0 to students in the "Foundations of Internet Technologies" course at the University of Cyprus. This tutorial was extended and presented again to many colleagues at the computer science department, with a new title "Web 2.0, Social Programming, and Mashups -What is in it for me!" In this tutorial you can see many examples of Web 2.0 websites and mashups. In fact, it is very interesting to talk about Web 2.0 and how the big companies (e.g. Google, Yahoo, YouTube, and many others) are taking the lead, encouraging us to contribute, and providing all possible user-friendly interfaces.
As users, we find this amazing. We feel we have to join these social websites, contribute content, review products, upload photos and videos, write blogs, and socialize and date through our computers. However, I suspect many of the social websites will fail because of the privacy crisis that may come soon. Many users are not aware of the hacking and openness of those websites. Let's wait and see!
From a research viewpoint, I believe Web 2.0 has created a great opportunity for researchers to build on the RSS and Mashup technologies... which is currently leading to Web 3.0, or as I like to call it, Semantic Pipes. In other words, Semantic-based Mashups. One way to achieve this (but maybe there are other ways) is to mash up RDF data using SPARQL.
11/12/07
ER 2007, New Zealand
I am just back from New Zealand. It is nice to experience the end of the world; it is a very interesting country, but it's a long and difficult trip.
I attended the ER'07 conference on conceptual modeling. I presented not only "Towards Automated Reasoning on ORM Schemes, -Mapping ORM into the DLR_idf description logic" (which was accepted at this conference), but also a related paper called "Mapping ORM into the SHOIN/OWL Description Logic- Towards a Methodological and Expressive Graphical Notation for Ontology Engineering", which I will present again at the ORM workshop.
Thanks to many people who gave me comments and suggestions!
9/17/07
Moving to Cyprus
On the 2nd of October 2007 I will start my new Marie Curie Postdoc Fellowship at the University of Cyprus. I am joining the HPCLab, which is led by Prof. Marios Dikaiakos.
Soon I will write to you about my new research activities, which will combine Ontology Engineering, Logic, ORM, and (Web 2.0).
8/22/07
DogmaModeler is Open Source
DogmaModeler is open source now. You can download the tool, source files, and others from http://www.jarrar.info/Dogmamodeler/
8/1/07
The Customer Complaint Ontology is open now
Some years ago -and in collaboration with many lawyers (CCFORM project partners)- I developed an ontology about customer complaints. The ontology is open now, and you can browse its modules. The application scenario of this ontology was published in papers [1] and [2], and recently as a book chapter [3]. I think this scenario shows an example of an ontology-based application (that is NOT Semantic Web technology ;-)
11/13/06
Hello World!!
It is because of RSS technology that I started blogging. It is amazing that your browser collects the latest news and posts from interesting blogs, so that you can read them when you have time. I will be contributing too, on these keywords: Semantic Web, SPARQL, RDF, semantic mashups, Semantic Web Pipes, Web 3.0, Ontology Engineering, Domain ontology, Application ontology, Ontological Commitment, Conceptual modeling, business rules, Object Role Modeling (ORM), Description Logic, OWL, DOGMA, DogmaModeler,…