Language and AI

Artificial intelligence does not speak; it generates.
In a world disrupted by language models such as ChatGPT, EFL takes a scientific look at what these AIs reveal about our own language and cognitive structures, and at where their limits lie. Language becomes both an object of analysis and a technical tool, at the crossroads of the humanities and computer science.

Linking language and computers

Between segmentation and simulation, artificial intelligence attempts to reproduce the complexity of human language. But can it grasp its intention, variation, and contextual depth?


Co-managers:

Benoît Crabbé (LLF – Université Paris Cité)

Joseph Le Roux (LIPN – Université Sorbonne Paris Nord)

Why combine linguistics and AI today?

Since the emergence of large language models (LLMs), language has become a raw material, an interface, and a technical challenge for artificial intelligence. This theme aims to answer two questions:

  1. What can AI teach us about human language?
  2. What are the linguistic and cognitive limitations of current AI?

It is not just a question of improving the technical performance of models, but of understanding the epistemological, linguistic, social, and ethical issues related to their use.


Scientific questions related to the topic

Understanding language through models

  • Using AI as a tool to explore linguistic regularities (morphology, syntax, semantics, discourse).
  • Testing hypotheses on the cognitive structuring of language by comparing human and artificial models (a minimal probing sketch follows this list).
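
By way of illustration, here is a minimal sketch of the kind of probing experiment these bullets describe, under stated assumptions: the Hugging Face `transformers` library, the public `gpt2` checkpoint, and hand-picked agreement minimal pairs (none of which are EFL materials). The model's surprisal on grammatical vs. ungrammatical sentences is compared, the same contrast used in human acceptability studies.

```python
# Minimal sketch: scoring agreement minimal pairs with a pretrained LM.
# Assumptions: the `transformers` library, the public `gpt2` checkpoint,
# and illustrative example sentences (not EFL data).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(sentence: str) -> float:
    """Total negative log-probability (nats) the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids the model returns the mean loss over the
        # predicted tokens; rescale it to a per-sentence total.
        loss = model(ids, labels=ids).loss
    return loss.item() * (ids.shape[1] - 1)

# Grammatical / ungrammatical pairs differing only in subject-verb agreement.
pairs = [
    ("The keys to the cabinet are on the table.",
     "The keys to the cabinet is on the table."),
    ("The author that the critics praise writes well.",
     "The author that the critics praise write well."),
]

for good, bad in pairs:
    s_good, s_bad = surprisal(good), surprisal(bad)
    print(f"{s_good:7.2f} {s_bad:7.2f}  prefers grammatical: {s_good < s_bad}")
```

Human acceptability judgments on the same pairs provide the comparison point: where model preferences and speaker judgments diverge, the divergence itself becomes a linguistic datum.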

Analyzing models as linguistic objects

  • Studying the internal “grammar” of LLMs (e.g., GPT-4) and how they learn (a diagnostic-probing sketch follows this list).
  • Questioning their ability to handle variation, polyphony, ambiguity, and pragmatics.
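
As a concrete illustration of what studying a model's internal “grammar” can look like, here is a minimal diagnostic-probing sketch. The `gpt2` checkpoint, the toy phrases, and the scikit-learn probe are all illustrative assumptions; a real study would use controlled corpora and a held-out test set.

```python
# Minimal sketch of a diagnostic probe: can a linear classifier recover a
# grammatical feature (noun number) from a model's hidden states?
# Assumptions: `transformers`, scikit-learn, the `gpt2` checkpoint, toy data.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

# Toy phrases ending in a noun, labelled 0 = singular, 1 = plural.
examples = [
    ("The happy child", 0), ("The old houses", 1),
    ("A broken window", 0), ("Those noisy machines", 1),
    ("The sleepy cat", 0), ("Several angry voters", 1),
]

def final_token_state(phrase: str) -> torch.Tensor:
    ids = tokenizer(phrase, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids).last_hidden_state   # shape (1, seq_len, hidden_dim)
    return hidden[0, -1]                        # state of the final (noun) token

X = torch.stack([final_token_state(p) for p, _ in examples]).numpy()
y = [label for _, label in examples]

# If a simple linear probe separates the classes, the distinction is linearly
# recoverable from the representations (a toy fit; real probes need held-out data).
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe training accuracy:", probe.score(X, y))
```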

Exploring the social uses and applications of automated language

  • Generative AI and linguistic inclusion/exclusion.
  • Impacts of corpus biases on generated responses (illustrated by the sketch after this list).
  • Issues of responsibility, interpretability, and appropriation of language technologies.
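
As a small illustration of how corpus biases can surface in generation, the sketch below compares the next-word probabilities a pretrained model assigns to two occupation words after minimally different prompts. The `gpt2` checkpoint and the prompt and word choices are illustrative assumptions, not an EFL protocol.

```python
# Minimal sketch: probing how corpus biases can surface in generated text by
# comparing next-word probabilities after minimally different prompts.
# Assumptions: `transformers`, the `gpt2` checkpoint, illustrative prompts.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_word_prob(prompt: str, word: str) -> float:
    """Probability of `word` (its first subword) as the next token after `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    word_id = tokenizer(" " + word).input_ids[0]  # leading space: GPT-2 BPE convention
    return probs[word_id].item()

for prompt in ["The man worked as a", "The woman worked as a"]:
    for word in ["nurse", "mechanic"]:
        print(f"{prompt!r:30} {word:10} {next_word_prob(prompt, word):.5f}")
```

Asymmetries in these probabilities across otherwise identical prompts are one simple, measurable trace of what the training corpus has encoded.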

Methodologies used

  • Computational linguistics: formal modeling, extraction of regularities, corpus annotation.
  • Comparative experimental analysis: human performance vs. machine performance on specific linguistic tasks.
  • Generated vs. natural corpora: comparison of structures, registers, and coherence (see the comparison sketch after this list).
  • Critique of training data: linguistic diversity, representativeness, gray areas.
  • Interdisciplinarity: intersection of linguistics, computer science, philosophy, and sociology.
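
To make the generated vs. natural comparison concrete, here is a minimal sketch of the kind of surface profiling it refers to, computed on two tiny inline stand-in samples (placeholders, not actual corpora). A real study would use matched, documented corpora and richer syntactic and discursive measures.

```python
# Minimal sketch: surface statistics for comparing a natural text sample with
# a generated one. The two inline samples are placeholders, not real corpora.
import re
from statistics import mean

def corpus_profile(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    return {
        "sentences": len(sentences),
        "mean_sentence_length": round(
            mean(len(re.findall(r"\w+", s)) for s in sentences), 2),
        "type_token_ratio": round(len(set(tokens)) / len(tokens), 3),
    }

natural_sample = (
    "The committee met on Tuesday. Nobody expected the vote to be close. "
    "It was, and the chair had to break the tie herself."
)
generated_sample = (
    "The committee convened on Tuesday. The committee discussed the vote. "
    "The vote was close, and the vote was decided by the chair."
)

for name, text in [("natural", natural_sample), ("generated", generated_sample)]:
    print(name, corpus_profile(text))
```

Applied to matched corpora, the same profiling makes differences in register and repetition measurable rather than anecdotal.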

Concrete applications and societal issues

  • Education and training: development of intelligent teaching tools that are sensitive to linguistic variation.
  • Ethics and regulation: critical evaluation of the use of LLMs in sensitive contexts (education, health, justice).
  • Inclusion: exploration of AI solutions for audiences who are distant from standardized language.
  • System improvement: integration of linguistic contributions to design more robust, explainable, and fair AI systems.

Partnerships and involvement in the EFL consortium

  • Collaboration with teams specializing in formal and computational linguistics (LLF, LaTTiCe, LIPN).
  • Dialogue with other work packages: linguistic diversity (to enrich data), cognition (to compare processing models), society (to examine usage).
  • Implementation of shared platforms for natural language processing and for comparing generated and observed corpora.

In summary

AI offers a unique opportunity to rethink language as a system, a practice, and a technology. The EFL project aims to make it a critical testing ground, shedding light on both the cognitive foundations of language and the ethical contours of its automation.
