Curriculum Vitae :: Cornelius Puschmann, M.A.
Last updated 13 March 2007. You can also get this in PDF format.
Benzenbergstrasse 39
40219 Düsseldorf
Germany
+49 211 5413546 (home)
+49 177 3036590 (mobile)
cornelius.puschmann@uni-duesseldorf.de
Education
August 2006 - present
Doctoral candidate in English Linguistics, Heinrich-Heine Universität
Düsseldorf, Germany
Thesis supervisor: Professor Dieter Stein
June 2003 - August 2003
University of California at Berkeley, USA
1999 - 2006
Heinrich-Heine Universität Düsseldorf, Germany
Magister Artium (equivalent to an American Master's degree) in English and
Information Sciences.
Grade: 1,1 (A)
January 1990 - June 1990
A&M Junior High School, College Station (TX), USA
1988 - 1998
Abtei Gymnasium (High School), Duisburg, Germany
September 1986 - August 1987
Erlem Elementary School, Norwich, UK
1984 - 1988
Grundschule Gartenstrasse (Elementary School), Duisburg, Germany
Professional Experience
Starting in my time as an undergrad, I have interned and worked as a web
applications developer for several companies and institutions. Among others,
these include:
I mostly code in PHP (meaning in a LAMP environment), although I have also
realized one major project in CFML (ColdFusion). In addition to MySQL, I have
worked briefly with MS SQL and PostgreSQL and am fluent in XHTML and CSS. I
also have basic knowledge of JavaScript and like playing around with APIs such
as Google's GData.
Since most of my coding is research-related these days, I can afford to use
jEdit as my IDE, instead of having to resort to a professional solution. My
development experience has proven quite useful for my thesis work, as I am in
the process of developing a web-based corpus based on RSS feeds (see below).
Finally, I am familiar with the usual basic software for word processing,
presentations etc. (though I use OpenOffice, not MSO). Recently I've begun to
experiment with statistical analysis via R as well as common corpus-related
tools such as Wordsmith, TextSTAT and tOKo.
Research Interests
-
text linguistics, specifically the investigation of emerging modes of
computer-mediated communication (blogs, social media)
-
corpus linguistics, especially the development of web-based corpora
-
quantitative and computational approaches to style and variation
-
register / domain-specific language (business)
PhD Project
The corporate blog as an emerging genre of computer-mediated
communication: features, constraints, discourse situation
"Genres are how things get done, when language is used to accomplish them" -
this remark by James R. Martin aptly captures the role of discourse genres as
more than just formal traditions. Purpose is always at the center of
communication and arguably purpose is nowhere both as pronounced and complex as
when institutions communicate, be it externally, with clients and partners, or
internally, across departments and hierarchies. The ability to speak with a
single voice, to communicate as unambiguously as possible and avoid any
dissonance in the public awareness has long been central in corporate and
institutional communication. Yet while such a carefully scripted and highly
strategic approach to communication was well-adapted to traditional mass media,
it is increasingly problematic in the age of online discourse, where any
message that is dissociated from an identifiable speaker risks seeming
displaced and fake. Blogs are a prime example of a new, highly individualized
way of articulating oneself and the association of blogging with individuals,
not institutions, is part of the appeal they possess for corporations. How does
the duality of corporate interest and individual communicative goals manifest
itself in company blogs? Are blogs really "less formal" and "more personal" than
other forms of (institutional) communication? And if yes, what exactly do these
terms mean?
These are some of the questions I seek to address with my research, analyzing a
total of 130 English-language corporate weblogs. I use both
ethnographic-qualitative and empirical-quantitative data in order to describe
what functions corporate blogs perform. Read on for a technical description of
the project, or download the full thesis outline.
Technical description
I have developed a web-based corpus engine in PHP that checks a pre-determined
list of RSS feeds and then stores the content in MySQL (with parsing done via
MagpieRSS). It performs shallow parsing (tagging) via TreeTagger and stores the
results in two separate tables (types/tokens). A number of statistical
calculations are then performed to determine word count, sentence count,
average word length, average sentence length, type-token ratio and other
values. In addition to this, word- and POS-frequency lists are automatically
compiled and concordances can be made. My main goal is to use a number of
cross-source quantitative procedures to contrastively compare individual
sources. Key issues specifically related to blogs are a) variation from one
post to the next in the same blog (intra-source variation) and b) variation
among different blogs (extra-source variation). My goal is thus not to assess a
specific feature of language using blogs as a data source, but to assess blogs
as a text type using their syntax and lexis as an indicator.
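The surface measures named in the technical description can be illustrated with
a minimal Python sketch. This is not the PHP engine itself: the function name
text_statistics, the regex-based tokenization and the naive sentence splitting
are all assumptions made for illustration, and a tagger such as TreeTagger
would segment the text differently.

```python
import re

def text_statistics(text):
    """Compute simple surface statistics for one text (e.g. one blog post)."""
    # Naive sentence split after ., ! or ? followed by whitespace (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Tokens are lower-cased runs of letters, digits and apostrophes (assumption).
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    types = set(tokens)  # distinct word forms
    word_count = len(tokens)
    sentence_count = len(sentences)
    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_word_length": sum(len(t) for t in tokens) / word_count if word_count else 0.0,
        "avg_sentence_length": word_count / sentence_count if sentence_count else 0.0,
        "type_token_ratio": len(types) / word_count if word_count else 0.0,
    }

stats = text_statistics("Blogs are personal. Blogs are also public!")
print(stats)
```

Computing the same dictionary per post and per blog would give the raw values
needed to compare intra-source and extra-source variation.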
For a very brief example application of one of the above-mentioned measures -
Heylighen and Dewaele's f-score - see this entry in my blog.
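As a rough illustration of the f-score (formality score), the sketch below
follows the formula as I understand it from Heylighen and Dewaele: the
percentage frequencies of nouns, adjectives, prepositions and articles are
added, those of pronouns, verbs, adverbs and interjections are subtracted, 100
is added and the result is halved. The function name f_score, the coarse
category labels and the sample counts are hypothetical; mapping real tagger
output onto these categories is left out.

```python
def f_score(pos_counts):
    """Formality score from coarse part-of-speech counts.

    pos_counts maps a category name to its token count. Categories that are
    not part of the formula (e.g. "other") still count towards the total.
    """
    total = sum(pos_counts.values())
    if total == 0:
        raise ValueError("cannot compute an f-score for an empty text")

    def pct(category):
        # Frequency of a category as a percentage of all tokens.
        return 100.0 * pos_counts.get(category, 0) / total

    formal = sum(pct(c) for c in ("noun", "adjective", "preposition", "article"))
    deictic = sum(pct(c) for c in ("pronoun", "verb", "adverb", "interjection"))
    return (formal - deictic + 100.0) / 2.0

# Hypothetical counts for a 100-token text.
counts = {
    "noun": 30, "adjective": 10, "preposition": 15, "article": 10,
    "pronoun": 10, "verb": 15, "adverb": 5, "interjection": 0, "other": 5,
}
print(f_score(counts))  # 67.5
```

Higher values indicate a more formal, less context-dependent style; the score
always lies between 0 and 100.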
My blog largely focuses on my ongoing research (i.e. it is a professional
rather than a personal medium for me). However, note that I use it both to play
with concepts relating to CMC that
interest me and to discuss these ideas with practitioners (i.e. corporate
bloggers). It is not written in the mode of an academic publication and does not
claim that status for itself.
Projects
Building linguistic corpora using blog data
As a side-effect of my work on corporate blogs, I have developed a strong
interest in using blogs as a source of language data. In my view, it is a
significant advantage of blog data over "generic" web sources that the textual
material comes with annotations (when something was posted, who wrote it, what
the author considers it to be about etc.). This and various technical aspects
of blogs (for example, that they are aggregated as XML vs. HTML for "regular"
web pages) and the fact that blogs are not a single, clearly delineated genre
have led me to believe that it makes sense to build blog-based corpora. Such
corpora can retain valuable meta-data, especially about the author, and this
meta-data can be used for linguistic investigation into areas such as language
and gender and the development of individual linguistic patterns over a
lifetime. Examples of such
investigations are the "gender markers" extracted by Schler et al. and the
research into style and personality traits conducted by Nowson and Oberlander.
What is more, change and variation can be tracked for both individuals and
groups, and observations can be made in both synchronic and diachronic
terms (e.g. the chart to the left shows a simple measure of linguistic
contextuality for three sources over time).
I have expanded the corpus tool that was initially developed for my PhD research
accordingly, with the goal of producing a fully tagged 100-million-word corpus
of English-language blogs in the near future.
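To make the point about annotations concrete, here is a small Python sketch of
how the meta-data in an RSS 2.0 feed (when something was posted, who wrote it,
what the author considers it to be about) can be kept alongside the text. It
uses only the standard library; the sample feed, the function name
extract_posts and the dictionary keys are invented for illustration and do not
describe the actual corpus tool.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; real feeds vary in which elements they provide.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Corporate Blog</title>
    <item>
      <title>Hello world</title>
      <author>jane@example.com (Jane Doe)</author>
      <pubDate>Mon, 05 Mar 2007 10:00:00 GMT</pubDate>
      <category>announcements</category>
      <category>company</category>
      <description>Our first post.</description>
    </item>
  </channel>
</rss>"""

def extract_posts(feed_xml):
    """Return one dictionary per feed item, keeping text and meta-data together."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "author": item.findtext("author", default=""),      # who wrote it
            "published": item.findtext("pubDate", default=""),  # when it was posted
            # What the author considers the post to be about.
            "categories": [c.text or "" for c in item.findall("category")],
            "text": item.findtext("description", default=""),
        })
    return posts

posts = extract_posts(SAMPLE_FEED)
print(posts[0]["author"], posts[0]["categories"])
```

Storing author and date with every post is what allows variation to be grouped
by individual and tracked over time.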
eLanguage & Open Access
Apart from my academic work, I am also proud to be involved in a project that I
hope will have a lasting impact on Linguistics: eLanguage. The goal of this
project, which was initiated by the Linguistic Society of America, is to house
a wide range of e-journals devoted to
different subfields of the discipline under a single technical and
organizational umbrella. The editors of the member journals inside such a hub
are fully independent, but agree on a common set of rules (peer review, a common
style sheet etc). They benefit from pooling their resources, both in terms of
tech support and because the content from all member journals can be centrally
aggregated, increasing every single member's visibility.
eScience and iScience
Another of my interests that goes beyond linguistics is how the Internet is
reshaping the way we do research, how we learn and communicate. We have
previously perceived the Web largely as a "read-only" medium, a convenient
channel for transporting information, but not much more. With the spread of
wikis and blogs we are increasingly documenting our interactions online, be it
in private or professional contexts.
Science is not unaffected by this. When a publication "lives" on the Web, it is
not only accessible to people around the globe, it is also potentially
interactive. The academic article or paper inevitably moves from a monologic to
a more dialogic form in a context where everything can be linked, commented on
and bookmarked. What is more, models, visualisations and other multimedia
elements also become interactive, a change that fundamentally upsets our
conceptualization of research publications as static documents (records).
One area I am especially interested in is the creation of personalized
information ecologies using existing web services such as del.icio.us, Flickr,
Many Eyes, CiteULike and many others (can you think of one?). The first
paradigm in this model is the individual researcher, the second one is the
network of peers and collaborators.
The individual and the network can be enabled and supported with all sorts of
tools, which by themselves perform only limited functions (share a bibliography,
share dates of conferences, maintain a blog) but together form a powerful whole.
While many complex issues (e.g. cancer research, nuclear physics) need
specialized tools, much of what defines the "day to day" of science can be
vastly improved using existing tools and services - if we stop perceiving them
as "not for serious research". iScience is a pragmatic, straightforward approach
to the social web and academia that says let's play with what's there and
make it ours instead of building yet more monolithic scientific software
architectures that mistake complexity for power and solve problems we don't
actually have.
Teaching
-
Introduction to English Linguistics (Part I) (Winter 2007 / Spring 2008)
-
Introduction to English Linguistics (Part II) (Summer 2008)
-
Corpus Linguistics on and through the Internet (Summer 2005, with Theresa
Heyd)
I am in the process of developing an electronic textbook for my introductory
course, using (of course) a blog and working with feedback from my students.
Have a look at the blog and at this overview of topics covered in class.
Presentations
-
Corpora, Blogs and Linguistic Variation - Arguments for Using Structured
Web Data in Corpus Development (8 November 2007, University of
Paderborn, Germany)
A presentation on corpus development and the advantages of structured web
data (blogs) to gain new perspectives on style and individual differences in
language use.
http://www.slideshare.net/coffee001/corpora-blogs-and-linguistic-variation-paderborn
-
From Publishing to Communication - eLanguage, WALS and Digital
Linguistics (5 November 2007, Max Planck Institute for Evolutionary
Anthropology, Leipzig, Germany)
This presentation touched upon some of the similarities between eLanguage
and the WALS project, which aims to make information on the world's
languages accessible on the Net using some pretty neat Web 2.0 technology.
Keep an eye on
www.wals.info
if you happen to be a language typologist... Many thanks to Martin
Haspelmath for inviting me and for an immensely interesting discussion.
http://www.slideshare.net/coffee001/wals-and-elanguage-leipzig-168798
-
Institutional Blogging - Sharing and linking organizational
knowledge, one post at a time (5 September 2007, Max Planck Digital
Library, Munich, Germany)
Another practically oriented presentation, this one held at the Max Planck
Digital Library (MPDL) in Munich. I formulated a few general observations
about the use of blogs in organizations, pointing out (among other things)
that blogging is often misunderstood purely as a new method for PR and
marketing, when personal knowledge management (PKM) is in fact one important
way of utilizing blogs that may have far greater potential. Thanks to Robert
Forkel for inviting me.
http://www.slideshare.net/coffee001/institutional-blogging-sharing-and-linking-organizational-knowledge-one-post-at-a-time
-
Blogs or Flogs? Genre Conventions and Linguistic Practices in Corporate
Web Logs (31 August 2007, Telematica Instituut, Enschede, The Netherlands)
This practically-oriented presentation was held while I visited Telematica
Instituut in August 2007 on invitation from Lilia Efimova. The discussion
after the talk was very productive and I have closely integrated much of
what we talked about into my thesis research.
http://www.slideshare.net/coffee001/blogs-or-flogs-genre-conventions-and-linguistic-practices-in-corporate-web-logs/
-
SchemaCMD - An XML-based storage schema for the compilation of
mixed-source CMD corpora (27 July 2007, Birmingham, UK)
-
eLanguage.net: Shifting the paradigm in Linguistics (12 July 2007,
Vancouver, Canada)
-
What McDonald's is talking about: a computational analysis of the
language of company web logs (22 June 2007, Düsseldorf, Germany)
-
Quantitative Individuated Corpus Linguistics: A Speaker-Centric Approach
to Variation (5 June 2007, Osnabrück, Germany)
-
Lies at Wal-Mart: Style, Function and Discursive Strategy of a Corporate
Web Log (31 May 2007, Düsseldorf, Germany)
-
Variation and "Genrefication" in Blogs (28 February 2007, Siegen,
Germany)
Publications
Miscellaneous
Languages
Apart from a few computer languages, I am also proficient in German (first
language), English (near-native) and French (though actual speakers of
French may dispute this).
Editorial work
Together with Khan-Duc Kuttig and Petra B. Schubert, I currently serve as
general editor of Register and Context.
Conferences
In concert with our partners at the Max Planck Digital Library, I am involved
in the organization of the Berlin 6 Open Access Conference, which will take
place in November 2008 in Düsseldorf. I also serve as the panel chair for the
section New Forms of Scholarly Communication: Blogs, Wikis and Web 2.0 in
Academia. Have a look at this post in my blog for more information.
Scholarly Societies
I am a member of the Linguistic Society of America (LSA), the International
Pragmatics Association (IPrA) and the Deutsche Gesellschaft für
Sprachwissenschaft (DGfS). I am also proud to serve as an ex officio member in
LSA's Technical Advisory Committee (TAC) and the Committee for Member Services
and IT (COMSIT).