Archive for the ‘Data management and mining’ Category

Finding a better answer

Thursday, August 9th, 2007

Don’t you ever wish you could just open your favorite browser, type a question (in natural language, not computer speak) into the input box, and wait a millisecond for the right answer? Or, better yet, just turn on the computer, ask a question aloud, and wait for a response (think the Starship Enterprise here)? Spencer Tracy and Katharine Hepburn dealt with the “information search & retrieval” problem in the 1957 movie “Desk Set,” in which a giant computer was brought in to supplement a staff of librarians on the premise that it would improve efficiency. The computer took your question, submitted on a sheet of paper, did some crunching in the background, then spit out the answer, correctly (most of the time). Even better might be a virtual librarian like the one in Neal Stephenson’s 1992 novel Snow Crash: an avatar that takes your question, sifts through presumably yottabytes of data in milliseconds, and gives an answer. Of course, the avatar (nothing more than code brought to life) is incapable of thinking, which is where the real problem lies. We haven’t realized Stephenson’s or Gene Roddenberry’s vision yet, but plenty of folks are working on it. To get a glimpse of where things stand on this front and where we might be headed, check out “The Ultimate Answer Machine” in the Aug. 6th issue of InformationWeek, or read it online here (same article, different title).

What do E.T. and cancer research have in common?

Sunday, July 8th, 2007

For all the extraterrestrial fans out there, and in recognition of the 60th anniversary of the Roswell incident, it seems only natural to take a look at the latest in public distributed computing. Purchasing a supercomputer (or time on one) is one way to perform research that requires heavy computational power. Another is to harness the idle time and computing power of broadband-connected public PCs. One such project (and one you Roswell buffs should appreciate) is SETI@home, launched in 1999 to use donated PC computing cycles to search radio telescope data for signals of extraterrestrial origin. If you’re interested, here’s a pretty good primer on the project, though it’s a little dated. Beyond SETI, many projects of a humanitarian nature are cropping up to take advantage of the growing number of personal computers worldwide. Interested in letting your own computer help with cancer research, climate change research, etc.? This site might be of interest.

Managing Data

Tuesday, June 19th, 2007

If you’re interested in the previous post about emerging organizational structures and the challenges they face in using and managing the data lifecycle, especially within academia, then the upcoming issue of CTWatch Quarterly should be of interest. Among its articles is one by Herbert Van de Sompel and Carl Lagoze focusing on digital interoperability in scholarly communication. Expect some very interesting discussion of, and information about, the Object Re-Use and Exchange (ORE) project of the Open Archives Initiative (OAI).

Revisiting digitization

Monday, November 7th, 2005

In a post from last month, the digitization of books by Google was mentioned. Amazon and Microsoft are both in the picture as well. Bringing information to the masses, especially in the form of published material, is taking on new salience for many web-based businesses, and especially for the book publishing industry. This article on book digitization revisits the issue. What’s not being mentioned much is the role of hardware in the effort. E-books aren’t new, nor are the technologies created to view them. But e-books have never really caught on, and a big reason is display technology. Palm, Sony, and Philips Electronics are just three players who have tested the e-book waters, but their displays still can’t match the high contrast of print, at least not in a form that is widely portable and affordable. And haptics has yet to produce a replacement for the comfort people find in paper.

Data Intensive Science University Network

Thursday, August 4th, 2005

NSF recently awarded a group of universities $10 million over five years to set up and operate a grid that will allow researchers and students to access physics data produced by the Large Hadron Collider at CERN in Geneva, Switzerland. The Data Intensive Science University Network, or DISUN for short, will provide access to results from the Compact Muon Solenoid (CMS) experiment, which will account for a portion of the petabytes of data the Collider is expected to produce annually. The CMS effort will also contribute to other grid projects, including the Open Science Grid.

More detailed information about the project can be found in Supercomputing Online’s story about DISUN from last week.

What’s next for digital libraries?

Monday, July 18th, 2005

The latest issue of D-Lib Magazine has an interesting commentary on the future of digital libraries by Clifford Lynch, Executive Director of the Coalition for Networked Information. Tracing the evolution of digital libraries since the 1960s, his article examines some of the more recent accomplishments and concludes with a list of some of the more interesting issues facing digital library research. Digital libraries play an integral role in cyberinfrastructure but are often underemphasized compared to more glamorous components such as supercomputers and fast networks.

The moderators and/or administrators of this weblog reserve the right to edit or delete ANY content that appears on the site. In other words, the moderators and administrators have complete discretion over the removal of any content deemed by them to be inappropriate, in full or in part.

Any opinions expressed on this site belong to their respective authors and are not necessarily shared by the sponsoring institutions or the National Science Foundation.

Any trademarks or trade names, registered or otherwise, that appear on this site are the property of their respective owners and, unless noted, do not represent endorsement by the editors, publishers, sponsoring institutions, the National Science Foundation, or any other member of the CTWatch team.

No guarantee is granted by CTWatch that information appearing in the Blog is complete or accurate. Information on this site is not intended for commercial purposes.