commit edcbb997c5b2264872c74ce04b266a43766ab3a4
parent 1936730c02e2ebe1d3c8cd4180e304a36b370200
Author: Antoine Amarilli <a3nm@a3nm.net>
Date: Mon, 3 Oct 2022 19:07:44 +0200
fixes following discussion with thomas
Diffstat:
 README.md   | 148 ++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
 addnoise.py |  16 ++++++++++++++++
 co2.py      |  29 ++++++++---------------------
 compute.py  |  37 +++++++++++++++++--------------------
 run.sh      |   4 +++-
5 files changed, 127 insertions(+), 107 deletions(-)
diff --git a/README.md b/README.md
@@ -6,38 +6,66 @@ an academic conference.
It was used to compute the footprint of the [Highlights'22
conference](https://highlights-conference.org/2022/).
-## Input data format
-
-The input data should be provided as a CSV field containing the following
-fields:
-- Field 1: Name of participant
-- Field 2: Institution of participant
-- Field 3: 3-letter airport or metropolitan area code of origin (first leg, before the conference)
-- Field 4: Transportation means of the first leg: "train", "plane", "bus/coach", or
- "other" or "" to mean it is unknown.
-- Field 5: 3-letter code of destination (second leg, after the conference)
-- Field 6: Transportation means of the second leg
-- Field 7: "True" if the participant is extending their stay, i.e., travelling for
- scientific reasons other than the conference. For such participants, the
- computation will only take the longest of the two legs into account.
+## Data collection
+
+We collected information about the travel plans of participants using a [web
+form](https://framaforms.org/highlights-participant-travel-information-1664806487)
+([archive](https://web.archive.org/web/20221003161159/https://framaforms.org/highlights-participant-travel-information-1664806487)).
+To ensure that everyone filled in the form, the link to payment was only given
+once the form was completed.
+
+We manually removed duplicate records and fake data.
+
+For people who did not fill in the details of their travel, we:
+
+- assumed that they were coming to/from the institution of their first
+ affiliation
+- when the transportation mechanism was unspecified, we assumed that trips of
+ <=400km were done by rail and trips of >400km were done by plane, following:
+ https://github.com/ConferenceCarbonTracker/CarbonFootprintAGU#44-mode-of-transport
+
+Afterwards, we discarded the name and institution of participants.
+
+We manually translated the free-form city and country to a
+machine-understandable location by searching by hand for the closest
+three-letter code (airport or metropolitan area). This step could be automated.
+
+The result is a CSV file in the following format:
+
+- Field 1: 3-letter airport or metropolitan area code of origin (first leg, before the conference)
+- Field 2: Transportation means of the first leg: "train", "plane", or "bus/coach".
+- Field 3: 3-letter code of destination (second leg, after the conference)
+- Field 4: Transportation means of the second leg
+- Additional fields, e.g., a field indicating if the participant is extending
+ their stay for scientific reasons other than the conference.
## Running the computation
You need python3, standard shell utilities, and `GeodSolve` from Debian package
`geographiclib-tools`.
-Run `./run.sh FILE CODE LAT LON` where:
+Run `./run.sh FILE CODE LAT LON NOISE` where:
- FILE is the CSV file above
- CODE is the 3-letter code used for local participants (their trips will be
ignored)
-- LAT and LON are the geographical coordinates
+- LAT and LON are the geographical coordinates where the conference is taking
+ place.
+- NOISE is the relative random error added to the distance, as a fraction
+ (e.g., 0.2 for 20%)
The script will generate:
- map.geojson: a Geojson file displaying the various points of travel with color
describing whether they are by plane or not. This can be plotted, e.g., with
[uMap](http://umap.openstreetmap.fr/fr/).
+- `anonymized_participants`, a comma-separated list of participants with headers
+ and with the following fields:
+ - Field 1: mode of first leg (as above)
+ - Field 2: distance of first leg in meters, with random error added
+ - Field 3: mode of second leg (as above)
+ - Field 4: distance of second leg in meters, with random error added
+ - All additional fields in the input are left as-is.
- `trips_with_footprint`, a comma-separated list of trips with the following
fields:
- Field 1: name (note that commas are dropped from names)
@@ -48,65 +76,22 @@ The script will generate:
- It will also output some aggregate values on the standard error output, and prepare temporary files `trips`
and `trips_with_dist`
-## Highlights'22 methodology
+## Footprint computation methodology
-### Registration form data collection
+### Local participants
-The Highlights registration form asked particiants:
-
-- "To estimate the carbon footprint of this edition of Highlights, please give
- us some information about your travel"
-- "Arriving from": city and country, free text
-- "Arriving by": other / plane / train / bus or coach / car / local transportation (for locals)
-- ditto for departure
-- Extended stays: we asked whether:
- - They participated to a co-located conference
- - They participated to an extended stay support scheme
- - They were "extending their stay for scientific reasons by another way"
-
-The fields were optional, but almost everyone filled them.
-
-### Processing and completing the registration form information
-
-We took the registration data and manually removed obviously fake submissions
-and apparent duplicates.
-
-We ignored local participants, for which we estimate a CO2 footprint of 0.
-
-For people who did not fill in the details of their travel, we:
-
-- assumed that they were coming to/from the institution of their first
- affiliation
-- when the transportation mechanism was unspecified, we assumed that trips of
- <=400km were done by rail and trips of >400km were done by plane, following:
- https://github.com/ConferenceCarbonTracker/CarbonFootprintAGU#44-mode-of-transport
-
-This gives us a list of trips: each participant has 2 trips, each trip has an origin and
-destination (one of them conference venue) and a transportation mode.
+We ignore local participants, for whom we estimate a CO2 footprint of 0.
### Geocoding and distance computation
-We manually translated the free-form city and country to a
-machine-understandable location by searching by hand for the closest
-three-letter code (airport or metropolitan area). We used the OpenFlights
+ We used the OpenFlights
database airport-extended.dat on [this page](https://openflights.org/data.html) to convert these
to geographical coordinates, and used known geographic coordinates for the
conference venue. We used GeodSolve to compute the distance of each trip.
-### Adjusting for other scientific reasons
-
-For participants whose stay had other scientific justifications (no matter
-which), we counted only the longest of the two trips. The effect is basically to
-halve their emissions by considering that the conference carries half the
-responsibility. The reason why we do this instead of dividing the total by two
-is to make sure that we account for one of the "long trips" required between
-their institution and conference venue: indeed, some participants gave details
-of these long trips, whereas other gave details of one long trip and one trip to
-a neighboring place, e.g., for an extended stay.
+### Carbon footprint
-### Footprint computation
-
-Given this list of trips, we then compute their CO2 fotprint following the
+We compute the CO2 footprint following the
[labos1point5](https://labos1point5.org/ges-1point5) data, which is adapted from
the French agency [Ademe](https://www.ademe.fr/).
@@ -129,3 +114,36 @@ the French agency [Ademe](https://www.ademe.fr/).
We then sum the total emissions to arrive at the final value.
+## Highlights'22 methodology
+
+### Data collection
+
+The [Highlights registration
+form](https://framaforms.org/highlights2022-on-site-registration-1652701135)
+([archive](https://web.archive.org/web/20220622164245/https://framaforms.org/highlights2022-on-site-registration-1652701135))
+asked participants:
+
+- "To estimate the carbon footprint of this edition of Highlights, please give
+ us some information about your travel"
+- "Arriving from": city and country, free text
+- "Arriving by": other / plane / train / bus or coach / car / local transportation (for locals)
+- ditto for departure
+- Extended stays: we asked whether:
+ - They participated in a co-located conference
+ - They participated in an extended stay support scheme
+ - They were "extending their stay for scientific reasons by another way"
+
+The fields were optional, but almost everyone filled them in.
+
+### Adjusting for other scientific reasons
+
+In the carbon footprint, to account for participants whose stay had other
+scientific justifications (no matter which), we counted only the longest of the
+two trips. The effect is basically to halve their emissions by considering that
+the conference carries half the responsibility. The reason why we do this
+instead of dividing the total by two is to make sure that we account for one of
+the "long trips" required between their institution and conference venue:
+indeed, some participants gave details of these long trips, whereas others gave
+details of one long trip and one trip to a neighboring place, e.g., for an
+extended stay.
+
diff --git a/addnoise.py b/addnoise.py
@@ -0,0 +1,16 @@
+#!/usr/bin/env python3
+
+import sys
+from random import uniform
+
+noise = float(sys.argv[1])
+
+print( "mode,distance in meters")
+
+for l in sys.stdin.readlines():
+    f = l.strip().split(',')
+    mode = f[0]
+    dist = float(f[3])
+    dist_anon = round(uniform(dist * (1-noise), dist * (1+noise)))
+    print(','.join((mode, str(dist_anon))))
+
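The noise step introduced by `addnoise.py` can be illustrated in isolation. The sketch below is not part of the commit: the function name `add_noise` and the explicit `rng` parameter are assumptions, introduced only so the behaviour can be checked with a fixed seed. It treats `noise` as a fraction (0.2 meaning 20%), as in the script.

```python
import random


def add_noise(dist, noise, rng=random):
    """Perturb dist (in meters) uniformly within +/- noise, a fraction such as 0.2."""
    # Same formula as addnoise.py: draw uniformly in
    # [dist * (1 - noise), dist * (1 + noise)] and round to whole meters.
    return round(rng.uniform(dist * (1 - noise), dist * (1 + noise)))


# Hypothetical usage: a 1000 km trip with 20% noise and a fixed seed.
rng = random.Random(0)
samples = [add_noise(1_000_000.0, 0.2, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Every sample stays within 20% of the true distance, so the noise blurs individual distances without shifting the aggregate much. Whether that is enough to prevent re-identification depends on the data.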
diff --git a/co2.py b/co2.py
@@ -8,8 +8,6 @@ import json
import sys
from collections import defaultdict
-seen = set()
-
places = defaultdict(lambda: (0, 0, None))
n_trips = 0
@@ -22,23 +20,11 @@ co2_by_type = defaultdict(lambda : 0)
for l in sys.stdin.readlines():
f = l.strip().split(',')
- person = f[0]
- inst = f[1]
- mode = f[2]
- multitrip = f[3] == "True"
- lat = f[4]
- lon = f[5]
-
- if multitrip and person in seen:
- # for a multi-purpose trip, only count the first transport leg of that
- # person
- # we assume that the input is sorted by decreasing distance so that it's
- # the longest leg
- continue
-
- seen.add(person)
+ mode = f[0]
+ lat = f[1]
+ lon = f[2]
- distance = float(f[6])
+ distance = float(f[3])
if mode.strip() not in ['plane', 'train', 'bus/coach']:
if distance > 400000:
@@ -52,7 +38,7 @@ for l in sys.stdin.readlines():
k = (lat,lon)
plane = mode == "plane"
- places[k] = (places[k][0] + (1 if plane else 0), places[k][1] + 1, inst)
+ places[k] = (places[k][0] + (1 if plane else 0), places[k][1] + 1)
dist_by_type[mode] += distance
n_trips += 1
@@ -72,7 +58,7 @@ for l in sys.stdin.readlines():
co2 = (g_km_person * (distance / 1000.))/1000.
co2_by_type[mode] += co2
total_co2 += co2
- print (','.join([person, inst, str(distance), mode, str(co2)]))
+ print (','.join([str(distance), mode, str(co2)]))
## OUTPUT GEOJSON
@@ -86,7 +72,8 @@ for k in places.keys():
feature = {
"type": "Feature",
"properties": {
- "name":places[k][2], "_umap_options": {"color": color}
+ #"name":places[k][2],
+ "_umap_options": {"color": color}
},
"geometry": {
"type": "Point",
diff --git a/compute.py b/compute.py
@@ -27,28 +27,25 @@ n_extend = 0
with open(datafile, 'r') as f:
csvreader = csv.reader(f)
for row in csvreader:
- name = row[0].replace(',', '')
- institution = row[1].replace(',', '')
- fromcode = row[2]
- frommode = row[3]
- tocode = row[4]
- tomode = row[5]
- extend = row[6] == "True"
- added = False
+ fromcode = row[0]
+ frommode = row[1]
+ tocode = row[2]
+ tomode = row[3]
n_participants += 1
- if extend:
- n_extend += 1
+ if fromcode == localcode:
+ assert (frommode in ["local", "other", ""])
+ assert (tomode in ["local", "other", ""])
+ assert (tocode == localcode)
+ continue
+ if frommode == "":
+ frommode = "other"
+ if tomode == "":
+ tomode = "other"
+ assert (frommode in ["bus/coach", "plane", "train", "other"])
+ assert (tomode in ["bus/coach", "plane", "train", "other"])
+ n_nonlocals += 1
for (mode, code) in [(frommode, fromcode), (tomode, tocode)]:
- if code == localcode:
- continue
- added = True
- n_nonlocal_trips += 1
- print (','.join((name, institution, mode, str(extend),
- airports[code][0], airports[code][1])))
- if added:
- n_nonlocals += 1
+ print (','.join((mode, airports[code][0], airports[code][1])))
print("%d total participants" % n_participants, file=sys.stderr)
print("%d nonlocal participants" % n_nonlocals, file=sys.stderr)
-print("%d nonlocal trips" % n_nonlocal_trips, file=sys.stderr)
-print("%d extending" % n_extend, file=sys.stderr)
diff --git a/run.sh b/run.sh
@@ -7,6 +7,7 @@ FILE="$1"
LOCALCODE="$2"
LAT="$3"
LON="$4"
+NOISE="$5"
if [ ! -f airports.dat ]
then
@@ -15,6 +16,7 @@ then
fi
./compute.py "$FILE" "$LOCALCODE" > trips
-paste -d, trips <(cut -d, -f5,6 trips | tr ',' ' ' | sed "s/^/$LAT $LON /" | GeodSolve -i| cut -d ' ' -f3 ) | sort -t',' -k7,7nr > trips_with_dist
+paste -d, trips <(cut -d, -f2,3 trips | tr ',' ' ' | sed "s/^/$LAT $LON /" | GeodSolve -i| cut -d ' ' -f3 ) | sort -t',' -k4,4nr > trips_with_dist
+./addnoise.py "$NOISE" < trips_with_dist > trips_anonymized.csv
python3 co2.py < trips_with_dist > trips_with_footprint
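As a reading aid for the pipeline above, here is a minimal sketch of the two per-trip rules that `co2.py` applies after this commit: the fallback for an unspecified transport mode (400 km threshold, as stated in the README) and the conversion from an emission factor to kilograms of CO2. The function names `infer_mode` and `co2_kg` are assumptions, not names from the repository. The emission factor in the usage line is a placeholder, not a labos1point5 value.

```python
KNOWN_MODES = ("plane", "train", "bus/coach")


def infer_mode(mode, distance_m):
    """Return the mode to use for a trip, guessing when it is unspecified."""
    mode = mode.strip()
    if mode in KNOWN_MODES:
        return mode
    # README rule: trips of more than 400 km are assumed to be by plane,
    # trips of 400 km or less by rail.
    return "plane" if distance_m > 400_000 else "train"


def co2_kg(g_per_km_person, distance_m):
    """Convert an emission factor (g CO2 per km per person) to kg for one trip."""
    # Mirrors co2.py: meters -> kilometers, then grams -> kilograms.
    return (g_per_km_person * (distance_m / 1000.0)) / 1000.0


# Hypothetical usage; 100.0 g/km is a placeholder factor chosen for easy arithmetic.
print(infer_mode("other", 350_000), co2_kg(100.0, 1_000_000))
```

Keeping the mode fallback separate from the footprint formula makes it possible to change the threshold or the factors independently.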