Showing posts with label Google. Show all posts

Wednesday, December 22, 2010

Five-Million-Book Google Database Gets a Workout, and a Debate, in Its First Days

Ngram, Google’s new searchable dataset of words and phrases from 5.2 million published books, got quite a workout on its first day. Within 24 hours of its launch last Thursday afternoon, more than a million queries were run.

Various Web sites have had fun with the new technological toy since its unveiling, running idiosyncratic searches on topics of interest. For example, Tablet magazine focused on Jewish topics. The Atlantic compared “vampire” and “zombie,” and asked whether “pen” is mightier than “sword.” And Jezebel played with terminology about sex and relationships.

On an enormous scale, the database is the kind of resource that humanities scholars are increasingly using for their research, the subject of a New York Times series. And scholars and other interested observers have vigorously debated the reliability of this sort of data, pointing out previous problems with Google Books, including mistakes in dates, misattributed authors and errors in the actual texts as a result of misinterpretations by the automated scanning devices that copy the books.

Geoff Nunberg, a linguist at the University of California, Berkeley, who has been critical of Google Books data, still has his complaints, as he outlined in a Chronicle of Higher Education article. But he conceded that the error rate is much improved in this dataset.

Jean-Baptiste Michel, who designed the database with Google, said by e-mail this weekend that the team recognized that including information with errors was worse than not including it at all, so all books that did not pass strict standards for accurate labeling and scanning were filtered out.

“That is why we end up working with 5.2 million books and not the whole 15 million,” Mr. Michel wrote. (The 15 million figure refers to the number of published books that Google has digitally scanned so far.) “These filtering algorithms took us over a year to improve to our satisfaction. Indeed, if we hadn’t worked on them, we’d have published our very first version of the Ngrams, totally unfiltered, back in 2008.”

Their methodology is explained in detail in the supplemental materials attached to the paper by Mr. Michel and his collaborator, Erez Lieberman Aiden, published in the journal Science.

For their paper, Mr. Michel and Mr. Lieberman Aiden based their research on books published in English from 1800 to 2000. “We do not consider that trajectories outside of English 1800-2000 are scientifically validated,” Mr. Michel wrote. “In particular, before 1800 there are just too few books: one does not have enough statistical power.”

So while you can search back to 1500 on the Ngram database, don’t try using the information you might find to win tenure.

Mr. Lieberman Aiden, who has a Ph.D. in applied mathematics, also addressed the criticism that no humanists were on the research team. “I don’t think this is a very fair criticism,” he wrote in an e-mail on Tuesday. “I studied philosophy at Princeton as an undergrad, got a master’s degree in Jewish history, and actually took a leave of absence from a Ph.D. program in Jewish history when I went to grad school in the sciences (I did not return).

“Two of our other authors, Joseph Pickett (Ph.D., English language and literature, University of Michigan) and Dale Hoiberg (Ph.D., Chinese literature, University of Chicago), are the executive editor of the American Heritage Dictionary and the editor in chief of the Encyclopedia Britannica, respectively; although not academics, they are certainly humanists of profound influence whose expertise directly bears on the contents of the paper,” he added. “Furthermore, we spoke with dozens of other humanists throughout the development of the project, as can be seen in our acknowledgments.”

You can read more about the researchers’ work at www.culturomics.org.


Tuesday, December 7, 2010

Google and the Victorians: The History Goes Way Back

December 3, 2010, 5:49 pm

Most computer database searches, in which you use key words to retrieve documents, are based on something called Boolean logic. What you may not know is that the term refers to a 19th-century mathematician named George Boole, who developed his now indispensable theory in the 1854 book “The Laws of Thought.”
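For readers curious what a Boolean keyword search looks like in practice, here is a minimal sketch in Python. It is an illustration only, not how Google or any particular database implements search: the three sample documents and the function names (`build_index`, `search_and`, `search_or`, `search_not`) are invented for this example. The idea is that each key word maps to the set of documents containing it, and AND, OR and NOT become set intersection, union and difference.

```python
# Illustrative sketch only: the sample documents and all function names
# are hypothetical, chosen to show Boolean retrieval with Python sets.
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # Strip surrounding punctuation so "thought." matches "thought".
            index[word.strip('.,;:!?"\'')].add(doc_id)
    return index


def search_and(index, a, b):
    """Documents that contain both words (set intersection)."""
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    """Documents that contain at least one of the words (set union)."""
    return index.get(a, set()) | index.get(b, set())


def search_not(index, all_ids, a):
    """Documents that do not contain the word (set difference)."""
    return set(all_ids) - index.get(a, set())


docs = {
    1: "Boole published The Laws of Thought in 1854.",
    2: "Victorian readers debated science and faith.",
    3: "The laws of logic shaped Victorian mathematics.",
}
index = build_index(docs)

print(search_and(index, "laws", "victorian"))  # only document 3 has both
print(search_or(index, "boole", "faith"))      # documents 1 and 2
print(search_not(index, docs, "victorian"))    # document 1 only
```

Real search systems add ranking, stemming and much larger indexes, but the underlying combination of key words still comes down to these three operations.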

Boole is one of the Victorians who inspired Dan Cohen, a historian at George Mason University, whose work I discuss in an article today, the second part of a series on how technology is transforming humanities scholarship.

Mr. Cohen and a fellow historian have been relying on Boolean logic a lot these days, as they mine Google’s vast database of English books published in the 19th century to search for new insights into the Victorian mind.

In a keynote address at the Victorians Institute conference held at the University of Virginia in October, Mr. Cohen presented preliminary findings of that research. He also shared anecdotes about Boole, mathematical logic, and the sectarian conflict of his day.

“ ‘The Laws of Thought,’ ” Mr. Cohen said, “is as much a work of literary criticism as it is of mathematics.”


Friday, October 15, 2010

Microsoft’s Bing About to Include Information Regarding Facebook Users to Compete With Google

SAN FRANCISCO (DPA) Recent updates confirm that Microsoft’s Bing will combine search results with data available on Facebook, a network that already has more than 500 million users, with the aim of providing more relevant and personalized results.

Microsoft has launched a new strategy to compete with Google in the Internet advertising market, through an agreement between Bing and the social network Facebook. Under the deal, Microsoft and Facebook are expanding search with the options offered by social networks. The idea is that what appears in Bing’s list of results is combined with information from friends, acquaintances and relatives. Online searches should thus become more personalized and surface the most relevant results for the user more quickly.

Analysts see the partnership as an attempt by Microsoft to narrow the gap with its competitor Google. With its continued growth and its more than 500 million users, Facebook has a huge network of social connections. Google, in parallel, is also working to design the future of Internet search. Its project is ambitious: drawing on what it knows about users, it wants to offer, on its own initiative, information it considers may be of interest to them.

Searching the Internet is not just about the connections between data but also those between people, Microsoft said in jointly announcing the project. Thus, the new “Liked Results” feature includes results that Facebook contacts have marked with “Like,” such as events, proposals or pages. “People make decisions using information from their friends,” said Microsoft, which gave the example of personal recommendations on movies, restaurants or mobile phones. With the new feature, users can pick out from the results list the items their friends have liked.

It will also be easier to find friends and acquaintances in Bing. Both companies stressed that they respect the privacy of those involved and will use only public information.

Internet search accounts for more than 50 percent of the multimillion-dollar online advertising business. The greater the number of users, the more ads a search engine attracts. Microsoft still carries little weight in the market, even after taking over Yahoo!’s search operations. According to September data from the market research firm comScore, Bing had an 11.2 percent share of the United States market, compared with 16.7 percent for Yahoo and 66.1 percent for Google. In Europe, Google’s dominance is even greater.

Facebook’s chief executive, Mark Zuckerberg, believes that everything people do has a social component and therefore wants to turn his platform into a network of connections for all aspects of life. Three years ago, Microsoft bought a 1.6 percent stake in the company.

Posted by News Desk on Oct 15th, 2010 and filed under Computer and Internet, Technology.
