Curriculum Vitae :: Cornelius Puschmann, M.A.

Last updated 13 March 2007. You can also get this in PDF format.

Benzenbergstrasse 39
40219 Düsseldorf
Germany

+49 211 5413546 (home)
+49 177 3036590 (mobile)

cornelius.puschmann@uni-duesseldorf.de

Education

August 2006 - present
Doctoral candidate in English Linguistics, Heinrich-Heine Universität Düsseldorf, Germany
Thesis supervisor: Professor Dieter Stein

June 2003 - August 2003
University of California at Berkeley, USA

1999 - 2006
Heinrich-Heine Universität Düsseldorf, Germany
Magister Artium (equivalent of an American Master's degree) in English and Information Sciences.
Grade: 1,1 (A)

January 1990 - June 1990
A&M Junior High School, College Station (TX), USA

1988 - 1998
Abtei Gymnasium (High School), Duisburg, Germany

September 1986 - August 1987
Erlem Elementary School, Norwich, UK

1984 - 1988
Grundschule Gartenstrasse (Elementary School), Duisburg, Germany

Professional Experience

Starting in my time as an undergrad, I have interned and worked as a web applications developer for several companies and institutions. Among others, these include:


I mostly code in PHP (in a LAMP environment), although I have also realized one major project in CFML (ColdFusion). In addition to MySQL, I have worked briefly with MS SQL and PostgreSQL and am fluent in XHTML and CSS. I also have basic knowledge of JavaScript and like playing around with APIs such as Google's GData. Since most of my coding is research-related these days, I can afford to use jEdit as my IDE, instead of having to resort to a professional solution. My development experience has proven quite useful for my thesis work, as I am in the process of developing a web-based corpus based on RSS feeds (see below). Finally, I am familiar with the usual basic software for word processing, presentations etc. (though I use OpenOffice, not MSO). Recently I've begun to experiment with statistical analysis via R as well as common corpus-related tools such as WordSmith, TextSTAT and tOKo.

Research Interests

PhD Project

The corporate blog as an emerging genre of computer-mediated communication: features, constraints, discourse situation

"Genres are how things get done, when language is used to accomplish them" - this remark by James R. Martin aptly captures the role of discourse genres as more than just formal traditions. Purpose is always at the center of communication, and arguably nowhere is purpose as pronounced and as complex as when institutions communicate, be it externally, with clients and partners, or internally, across departments and hierarchies. The ability to speak with a single voice, to communicate as unambiguously as possible and to avoid any dissonance in the public awareness has long been central in corporate and institutional communication. Yet while such a carefully scripted and highly strategic approach to communication was well-adapted to traditional mass media, it is increasingly problematic in the age of online discourse, where any message that is dissociated from an identifiable speaker risks seeming displaced and fake. Blogs are a prime example of a new, highly individualized way of articulating oneself, and the association of blogging with individuals, not institutions, is part of the appeal they possess for corporations. How does the duality of corporate interest and individual communicative goals manifest itself in company blogs? Are blogs really "less formal" and "more personal" than other forms of (institutional) communication? And if yes, what exactly do these terms mean?

These are some of the questions I seek to address with my research, which analyzes a total of 130 English-language corporate blogs. I use both ethnographic-qualitative and empirical-quantitative data in order to describe what functions corporate blogs perform. Read on for a technical description of the project, or download the full thesis outline.

Technical description
I have developed a web-based corpus engine in PHP that checks a pre-determined list of RSS feeds and then stores the content in MySQL (with parsing done via MagpieRSS). It performs shallow parsing (tagging) via TreeTagger and stores the results in two separate tables (types/tokens). A number of statistical calculations are then performed to determine word count, sentence count, average word length, average sentence length, type-token ratio and other values. In addition to this, word- and POS-frequency lists are automatically compiled and concordances can be made. My main goal is to use a number of cross-source quantitative procedures to contrastively compare individual sources. Key issues specifically related to blogs are a) variation from one post to the next in the same blog (intra-source variation) and b) variation among different blogs (extra-source variation). My goal is thus not to assess a specific feature of language using blogs as a data source, but to assess blogs as a text type using their syntax and lexis as an indicator.
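To make the surface measures named above more concrete, here is a minimal sketch in Python (not the PHP engine itself). The function name text_statistics, the regular-expression tokenizer and the naive sentence splitter are assumptions made purely for illustration; the actual tool relies on TreeTagger output and may count things differently.

```python
import re

def text_statistics(text):
    """Compute simple surface statistics for one text (illustrative only)."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenization: lowercased runs of letters, digits and apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    types = set(tokens)  # distinct word forms
    word_count = len(tokens)
    sentence_count = len(sentences)
    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_word_length": sum(len(t) for t in tokens) / word_count if word_count else 0.0,
        "avg_sentence_length": word_count / sentence_count if sentence_count else 0.0,
        "type_token_ratio": len(types) / word_count if word_count else 0.0,
    }

stats = text_statistics("We love blogs. Blogs love us!")
print(stats)
```

Running the same function once per post and once per blog would give one rough way to set intra-source against extra-source variation, although type-token ratio in particular is sensitive to text length and should only be compared across samples of similar size.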

For a very brief example application of one of the above-mentioned measures - Heylighen and Dewaele's f-score - see this entry in my blog.
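As a rough illustration of this measure, the sketch below computes the F-score from part-of-speech counts, using the formula as commonly cited from Heylighen and Dewaele: the percentages of nouns, adjectives, prepositions and articles are added, the percentages of pronouns, verbs, adverbs and interjections are subtracted, 100 is added and the result is halved. The function name f_score and the category labels are hypothetical; mapping real tagger output (e.g. TreeTagger tags) onto these categories is assumed to happen elsewhere.

```python
def f_score(pos_counts):
    """Formality score from a dict of part-of-speech counts.

    pos_counts maps category labels (assumed names, see lead-in) to raw
    counts and should cover all words in the sample, including an
    'other' bucket for categories the formula does not use.
    """
    total = sum(pos_counts.values())
    if total == 0:
        raise ValueError("pos_counts must contain at least one word")

    def pct(tag):
        # Frequency of a category as a percentage of all words.
        return 100.0 * pos_counts.get(tag, 0) / total

    formal = pct("noun") + pct("adjective") + pct("preposition") + pct("article")
    deictic = pct("pronoun") + pct("verb") + pct("adverb") + pct("interjection")
    return (formal - deictic + 100) / 2

# Invented counts for a 100-word sample, for demonstration only.
sample = {
    "noun": 30, "adjective": 10, "preposition": 15, "article": 10,
    "pronoun": 5, "verb": 20, "adverb": 5, "interjection": 0, "other": 5,
}
score = f_score(sample)
print(score)
```

Higher values indicate a more formal, less context-dependent style; the score ranges from 0 to 100.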

My blog largely focuses on my ongoing research (i.e. it is a professional rather than a personal medium for me). However, note that I use it both to play with concepts relating to CMC that interest me and to discuss these ideas with practitioners (i.e. corporate bloggers). It is not written in the mode of an academic publication and does not claim that status for itself.

Projects

Building linguistic corpora using blog data
As a side-effect of my work on corporate blogs, I have developed a strong interest in using blogs as a source of language data. In my view, it is a significant advantage of blog data over "generic" web sources that the textual material comes with annotations (when something was posted, who wrote it, what the author considers it to be about etc.). This, various technical aspects of blogs (for example, that they are aggregated as XML vs. HTML for "regular" web pages) and the fact that blogs are not a single, clearly delineated genre have led me to believe that it makes sense to build blog-based corpora. Such corpora can retain valuable meta-data, especially about the author, and this meta-data can be used for linguistic investigation into areas such as language and gender and the development of individual linguistic patterns over the lifetime. Examples of such investigations are the "gender markers" extracted by Schler et al. and the research into style and personality traits conducted by Nowson and Oberlander.
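The point about annotations can be illustrated with a small sketch, here in Python using only the standard library. The sample feed, the function name extract_posts and the choice of fields are invented for illustration; real feeds vary considerably (many use dc:creator or Atom elements instead of the plain RSS 2.0 author element assumed here).

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Company Blog</title>
    <item>
      <title>Our new product</title>
      <author>jane@example.com (Jane Doe)</author>
      <category>products</category>
      <category>launch</category>
      <pubDate>Mon, 12 Mar 2007 09:30:00 GMT</pubDate>
      <description>We are happy to announce our new product.</description>
    </item>
  </channel>
</rss>"""

def extract_posts(feed_xml):
    """Return one record per post, keeping the meta-data next to the text."""
    root = ET.fromstring(feed_xml)
    source = root.findtext("channel/title", default="")
    posts = []
    for item in root.iter("item"):
        posts.append({
            "source": source,                                   # which blog
            "title": item.findtext("title", default=""),
            "author": item.findtext("author", default=""),      # who wrote it
            "published": item.findtext("pubDate", default=""),  # when it was posted
            # What the author considers the post to be about.
            "categories": [c.text or "" for c in item.findall("category")],
            "text": item.findtext("description", default=""),
        })
    return posts

posts = extract_posts(SAMPLE_FEED)
print(posts[0]["author"], posts[0]["categories"])
```

Because each record carries author, date and category fields alongside the text, the same data can later be grouped by person, by time span or by topic without any manual annotation.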

What's more, change and variation can be tracked for both individuals and groups, and observations can be made in both synchronic and diachronic terms (e.g. the chart to the left shows a simple measure of linguistic contextuality for three sources over time).

I have expanded the corpus tool that was initially developed for my PhD research accordingly, with the goal of producing a fully tagged 100-million-word corpus of English-language blogs in the near future.


eLanguage & Open Access
Apart from my academic work, I am also proud to be involved in a project that I hope will have a lasting impact on Linguistics: eLanguage. The goal of this project, which was initiated by the Linguistic Society of America, is to house a wide range of e-journals devoted to different subfields of the discipline under a single technical and organizational umbrella. The editors of the member journals inside such a hub are fully independent, but agree on a common set of rules (peer review, a common style sheet etc.). They benefit from pooling their resources, both in terms of tech support and because the content from all member journals can be centrally aggregated, increasing every single member's visibility.

eScience and iScience
Another of my interests that goes beyond linguistics is how the Internet is reshaping the way we do research, how we learn and communicate. We have previously perceived the Web largely as a "read-only" medium, a convenient channel for transporting information, but not much more. With the spread of wikis and blogs we are increasingly documenting our interactions online, be it in private or professional contexts.

Science is not unaffected by this. When a publication "lives" on the Web, it is not only accessible to people around the globe, it is also potentially interactive. The academic article or paper inevitably moves from a monologic to a more dialogic form in a context where everything can be linked, commented on and bookmarked. What's more, models, visualisations and other multimedia elements also become interactive, a change that fundamentally upsets our conceptualization of research publications as static documents (records).

One area I am especially interested in is the creation of personalized information ecologies using existing web services such as del.icio.us, Flickr, Many Eyes, CiteULike and many others (can you think of one?). The first paradigm in this model is the individual researcher, the second one is the network of peers and collaborators. The individual and the network can be enabled and supported with all sorts of tools, which by themselves perform only limited functions (share a bibliography, share dates of conferences, maintain a blog) but together form a powerful whole. While many complex issues (e.g. cancer research, nuclear physics) need specialized tools, much of what defines the "day to day" of science can be vastly improved using existing tools and services - if we stop perceiving them as "not for serious research". iScience is a pragmatic, straightforward approach to the social web and academia that says: let's play with what's there and make it ours, instead of building yet more monolithic scientific software architectures that mistake complexity for power and solve problems we don't actually have.

Teaching


I am in the process of developing an electronic text book for my introductory course, using (of course) a blog and working with feedback from my students. Have a look at the blog and at this overview of topics covered in class.

Presentations

Held as part of the workshop on genre and the Web, which was in turn part of the Corpus Linguistics 2007 conference. My presentation abstract was submitted for blind peer-review by three experts before being accepted.
http://www.slideshare.net/coffee001/schemacmd-an-xmlbased-storage-schema-for-the-compilation-of-mixedsource-cmd-corpora

I gave this presentation at the first PKP Scholarly Publishing Conference in Vancouver, Canada, on 12 July 2007.
http://www.slideshare.net/coffee001/elanguagenet-shifting-the-paradigm-in-linguistics/

This short presentation was held as part of the "Tag des wissenschaftlichen Nachwuchses" ("junior researchers' day") at the University of Duesseldorf on 22 June 2007.
http://www.slideshare.net/coffee001/what-mcdonalds-is-talking-about-a-computational-analysis-of-the-language-of-company-web-logs/

I was invited to present this methodological piece at the linguistic research colloquium of the University of Osnabrück.
http://www.slideshare.net/coffee001/quantitative-individuated-corpus-linguistics

Presentation held as part of the research colloquium of the Department of English Language and Linguistics.
http://www.slideshare.net/coffee001/lies-at-wal-mart-presented-at-duesseldorf-university-germany

Presentation held as part of the workshop "Syntactic Variation and Emerging Genres" at the 29th annual meeting of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS).
http://www.slideshare.net/coffee001/variation-and-genrefication-in-blogs-presented-at-dgfs-2007-siegen-germany

Publications

Miscellaneous

Languages
Apart from a few computer languages, I am also proficient in German (first language), English (near-native) and French (though actual speakers of French may dispute this).

Editorial work
Together with Khan-Duc Kuttig and Petra B. Schubert, I currently serve as general editor of Register and Context.

Conferences
In concert with our partners at the Max Planck Digital Library, I am involved in the organization of the Berlin 6 Open Access Conference, which will take place in November 2008 in Duesseldorf. I also serve as the panel chair for the section New Forms of Scholarly Communication: Blogs, Wikis and Web 2.0 in Academia. Have a look at this post in my blog for more information.

Scholarly Societies
I am a member of the Linguistic Society of America (LSA), the International Pragmatics Association (IPrA) and the Deutsche Gesellschaft für Sprachwissenschaft (DGfS). I am also proud to serve as an ex officio member in LSA's Technical Advisory Committee (TAC) and the Committee for Member Services and IT (COMSIT).